Firecrawl Tool

Advanced web scraping and crawling with JavaScript rendering

The Firecrawl tool provides advanced web scraping and crawling with JavaScript rendering and anti-bot bypass capabilities.

Installation

import "github.com/model-box/agent-kit/tool/firecrawl"

Setup

Requirements

  1. Firecrawl API Key: Sign up at Firecrawl (https://firecrawl.dev)
  2. API Access: Free tier includes 500 credits per month

Environment Variables

export FIRECRAWL_API_KEY="your-firecrawl-api-key"
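
Because NewFirecrawlTools in the Usage example below takes no key argument, the tool presumably reads FIRECRAWL_API_KEY from the environment. A minimal sketch that fails fast when the key is missing (standard library only; the check itself is not part of the AgentKit API):

package main

import (
    "log"
    "os"
)

func main() {
    // Fail fast if the Firecrawl key is not configured; the firecrawl
    // package is assumed to read FIRECRAWL_API_KEY from the environment.
    if os.Getenv("FIRECRAWL_API_KEY") == "" {
        log.Fatal("FIRECRAWL_API_KEY is not set; see the Setup section")
    }
}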

Usage

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/model-box/agent-kit/agent"
    "github.com/model-box/agent-kit/model"
    "github.com/model-box/agent-kit/session"
    "github.com/model-box/agent-kit/tool/firecrawl"
)

func main() {
    // Create Firecrawl tools; FIRECRAWL_API_KEY must be set (see Setup)
    firecrawlTools := firecrawl.NewFirecrawlTools()

    // Create the model. Note the local is named llm, not model: naming
    // locals after their packages (model, agent, session) would shadow
    // the packages and break references such as agent.ChatMessage below.
    llm := model.Model("gpt-4o").
        SetAPIKey(os.Getenv("OPENAI_API_KEY"))

    // Create an agent with the Firecrawl scrape and crawl tools
    scraper := agent.New().
        SetModel(llm).
        SetSystemPrompt("You are a web scraping assistant.").
        AddTool(firecrawlTools.Scrape()).
        AddTool(firecrawlTools.Crawl())

    // Create a session and run a single-turn request
    sess := session.New(scraper)
    ctx := context.Background()

    response, err := sess.Run(ctx, []agent.ChatMessage{
        agent.NewUserMessage("Scrape the main content from https://example.com in markdown format"),
    }, nil)
    if err != nil {
        panic(err)
    }

    fmt.Println(response.GetLastMessage().GetContent())
}

Available Tools

firecrawl_scrape

Scrape a single webpage with JavaScript rendering support.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The URL to scrape |
| formats | []string | No | Output formats: ["markdown", "html", "rawHtml", "content", "links", "screenshot"] |
| only_main_content | bool | No | Extract only main content (default: true) |
| include_tags | []string | No | HTML tags to include (e.g., ["article", "main"]) |
| exclude_tags | []string | No | HTML tags to exclude (e.g., ["nav", "footer"]) |
| wait_for | int | No | Wait time in milliseconds before scraping (max: 10000) |
| timeout | int | No | Timeout in milliseconds (default: 30000, max: 60000) |
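
These parameters are filled in by the model from the user's request. A hedged sketch that continues the Usage example above (reusing sess and ctx; the URL and phrasing are illustrative):

// Ask the agent to invoke firecrawl_scrape with include_tags,
// exclude_tags, and wait_for; the model maps the request onto
// the documented parameters.
response, err := sess.Run(ctx, []agent.ChatMessage{
    agent.NewUserMessage(
        "Scrape https://example.com/blog/post as markdown. " +
            "Keep only the article and main tags, drop nav and footer, " +
            "and wait 3000 ms before scraping."),
}, nil)
if err != nil {
    panic(err)
}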

firecrawl_crawl

Crawl multiple pages from a website.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | The starting URL to crawl |
| max_depth | int | No | Maximum crawl depth (default: 2, max: 5) |
| limit | int | No | Maximum number of pages to crawl (default: 10, max: 100) |
| allowed_domains | []string | No | Domains to restrict crawling to |
| exclude_paths | []string | No | URL paths to exclude from crawling |
| include_paths | []string | No | URL paths to include in crawling |
| only_main_content | bool | No | Extract only main content (default: true) |
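
As with scraping, crawl constraints can be stated in the prompt so the model sets max_depth, limit, allowed_domains, and exclude_paths accordingly. A sketch under the same assumptions as the scrape example above:

// A crawl capped at depth 2 and 20 pages, restricted to one domain
response, err := sess.Run(ctx, []agent.ChatMessage{
    agent.NewUserMessage(
        "Crawl https://example.com/docs to depth 2, at most 20 pages, " +
            "staying on example.com and skipping any /blog paths."),
}, nil)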

Output Formats

Markdown Format

Clean, readable markdown with proper formatting for easy processing.

HTML Format

Cleaned HTML with unnecessary elements removed.

Raw HTML Format

Complete HTML as rendered by the browser.

Content Format

Plain text content without any formatting.

Links Format

All links found on the page with their text and URLs.

Screenshot Format

Base64-encoded screenshot of the page.
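
Since the screenshot arrives as base64, it needs decoding before it can be saved or displayed. A minimal sketch using only the standard library; extracting the base64 string from the tool result is assumed to have happened already, and the input below is a placeholder:

package main

import (
    "encoding/base64"
    "os"
)

// decodeScreenshot writes a base64-encoded screenshot, as returned by
// the "screenshot" format, to a file on disk.
func decodeScreenshot(b64, path string) error {
    img, err := base64.StdEncoding.DecodeString(b64)
    if err != nil {
        return err
    }
    return os.WriteFile(path, img, 0o644)
}

func main() {
    // Placeholder input; a real screenshot string comes from the tool result
    if err := decodeScreenshot("aGVsbG8=", "screenshot.png"); err != nil {
        panic(err)
    }
}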

Features

  • JavaScript Rendering: Handles modern SPAs and dynamic content
  • Anti-Bot Bypass: Automatically handles many anti-scraping measures
  • Content Extraction: Intelligent extraction of main content
  • Metadata Extraction: Extracts title, description, Open Graph tags
  • Link Extraction: Collects all links with context
  • Screenshot Capture: Can capture page screenshots
  • Batch Crawling: Crawl entire websites efficiently

Rate Limits and Credits

  • Scraping: 1 credit per page
  • Crawling: 1 credit per page crawled
  • Free tier: 500 credits/month
  • Starter: 5,000 credits/month
  • Growth: 50,000 credits/month
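
As a sizing example, a single crawl with limit: 100 can consume up to 100 credits, so the free tier covers at most five such crawls, or 500 single-page scrapes, per month.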

Best Practices

  1. Use specific selectors: Include/exclude tags for better content extraction
  2. Set appropriate timeouts: For slow-loading pages
  3. Limit crawl depth: To avoid excessive API usage
  4. Filter domains: When crawling to stay within scope
  5. Use wait_for: For pages that load content dynamically
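
One way to apply these practices consistently is to encode them in the system prompt, so the model constrains every tool call it makes. A hedged variant of the Usage example (the prompt wording is illustrative, not part of the API):

scraper := agent.New().
    SetModel(llm).
    SetSystemPrompt(
        "You are a web scraping assistant. When scraping, extract only " +
            "main content, prefer article/main tags, and exclude nav/footer. " +
            "When crawling, keep max_depth at 2 or below, limit crawls to " +
            "20 pages, and stay on the domain the user names.").
    AddTool(firecrawlTools.Scrape()).
    AddTool(firecrawlTools.Crawl())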