# Firecrawl Tool

Advanced web scraping and crawling with JavaScript rendering

URL: https://agentkit.tech/docs/tools/built-in/firecrawl.mdx

The Firecrawl tool provides advanced web scraping and crawling with JavaScript rendering and anti-bot bypass capabilities.

## Installation

```go
import "github.com/model-box/agent-kit/tool/firecrawl"
```

## Setup

### Requirements

1. **Firecrawl API Key**: Sign up at [Firecrawl](https://www.firecrawl.dev/)
2. **API Access**: Free tier includes 500 credits per month

### Environment Variables

```bash
export FIRECRAWL_API_KEY="your-firecrawl-api-key"
```

## Usage

```go
package main

import (
	"context"
	"os"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/firecrawl"
)

func main() {
	// Create Firecrawl tools
	firecrawlTools := firecrawl.NewFirecrawlTools()

	// Create model
	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Create agent with Firecrawl tools
	webAgent := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web scraping assistant.").
		AddTool(firecrawlTools.Scrape()).
		AddTool(firecrawlTools.Crawl())

	// Create session and run
	sess := session.New(webAgent)
	ctx := context.Background()

	response, err := sess.Run(ctx, []agent.ChatMessage{
		agent.NewUserMessage("Scrape the main content from https://example.com in markdown format"),
	}, nil)
	if err != nil {
		panic(err)
	}

	println(response.GetLastMessage().GetContent())
}
```

## Available Tools

### firecrawl\_scrape

Scrape a single webpage with JavaScript rendering support.

| Parameter           | Type      | Required | Description                                                                         |
| ------------------- | --------- | -------- | ----------------------------------------------------------------------------------- |
| `url`               | string    | Yes      | The URL to scrape                                                                   |
| `formats`           | \[]string | No       | Output formats: \["markdown", "html", "rawHtml", "content", "links", "screenshot"]  |
| `only_main_content` | bool      | No       | Extract only main content (default: true)                                           |
| `include_tags`      | \[]string | No       | HTML tags to include (e.g., \["article", "main"])                                   |
| `exclude_tags`      | \[]string | No       | HTML tags to exclude (e.g., \["nav", "footer"])                                     |
| `wait_for`          | int       | No       | Wait time in milliseconds before scraping (max: 10000)                              |
| `timeout`           | int       | No       | Timeout in milliseconds (default: 30000, max: 60000)                                |

### firecrawl\_crawl

Crawl multiple pages from a website.

| Parameter           | Type      | Required | Description                                               |
| ------------------- | --------- | -------- | --------------------------------------------------------- |
| `url`               | string    | Yes      | The starting URL to crawl                                 |
| `max_depth`         | int       | No       | Maximum crawl depth (default: 2, max: 5)                  |
| `limit`             | int       | No       | Maximum number of pages to crawl (default: 10, max: 100)  |
| `allowed_domains`   | \[]string | No       | Domains to restrict crawling to                           |
| `exclude_paths`     | \[]string | No       | URL paths to exclude from crawling                        |
| `include_paths`     | \[]string | No       | URL paths to include in crawling                          |
| `only_main_content` | bool      | No       | Extract only main content (default: true)                 |

## Output Formats

### Markdown Format

Clean, readable markdown with proper formatting for easy processing.

### HTML Format

Cleaned HTML with unnecessary elements removed.

### Raw HTML Format

Complete HTML as rendered by the browser.

### Content Format

Plain text content without any formatting.

### Links Format

All links found on the page with their text and URLs.

### Screenshot Format

Base64-encoded screenshot of the page.
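The formats and filtering parameters above are chosen by the model when it calls `firecrawl_scrape`, so they are usually steered through the prompt. The sketch below is a minimal, illustrative variant of the Usage example that registers only the scrape tool and asks for markdown plus links while excluding navigation chrome; the URL and prompt wording are placeholders, not values the tool requires.

```go
package main

import (
	"context"
	"os"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/firecrawl"
)

func main() {
	firecrawlTools := firecrawl.NewFirecrawlTools()

	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Register only the scrape tool so the agent cannot start a crawl.
	scraper := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web scraping assistant. Prefer markdown output and skip navigation chrome.").
		AddTool(firecrawlTools.Scrape())

	sess := session.New(scraper)

	// The prompt steers the firecrawl_scrape arguments (formats, exclude_tags,
	// wait_for); the model fills them in from the tool schema.
	response, err := sess.Run(context.Background(), []agent.ChatMessage{
		agent.NewUserMessage("Scrape https://example.com as markdown and links, exclude nav and footer tags, and wait 2000 ms for dynamic content."),
	}, nil)
	if err != nil {
		panic(err)
	}

	println(response.GetLastMessage().GetContent())
}
```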
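Crawling works the same way from Go. The following sketch, assuming the same agent-kit API as the Usage example, registers only the crawl tool and lets the model pick the `firecrawl_crawl` arguments (`max_depth`, `limit`, `allowed_domains`, and so on) from the prompt; the docs URL and the limits in the prompt are illustrative.

```go
package main

import (
	"context"
	"os"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/firecrawl"
)

func main() {
	firecrawlTools := firecrawl.NewFirecrawlTools()

	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Register only the crawl tool; the model derives max_depth, limit,
	// and allowed_domains from the request below.
	crawler := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web crawling assistant.").
		AddTool(firecrawlTools.Crawl())

	sess := session.New(crawler)

	response, err := sess.Run(context.Background(), []agent.ChatMessage{
		agent.NewUserMessage("Crawl https://example.com/docs, stay on example.com, go at most 2 levels deep, and summarize the first 10 pages."),
	}, nil)
	if err != nil {
		panic(err)
	}

	println(response.GetLastMessage().GetContent())
}
```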
## Features

* **JavaScript Rendering**: Handles modern SPAs and dynamic content
* **Anti-Bot Bypass**: Automatically handles many anti-scraping measures
* **Content Extraction**: Intelligent extraction of main content
* **Metadata Extraction**: Extracts title, description, Open Graph tags
* **Link Extraction**: Collects all links with context
* **Screenshot Capture**: Can capture page screenshots
* **Batch Crawling**: Crawl entire websites efficiently

## Rate Limits and Credits

* **Scraping**: 1 credit per page
* **Crawling**: 1 credit per page crawled
* **Free tier**: 500 credits/month
* **Starter**: 5,000 credits/month
* **Growth**: 50,000 credits/month

## Best Practices

1. **Use specific selectors**: Include/exclude tags for better content extraction
2. **Set appropriate timeouts**: For slow-loading pages
3. **Limit crawl depth**: To avoid excessive API usage
4. **Filter domains**: When crawling to stay within scope
5. **Use wait\_for**: For pages that load content dynamically
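One way to apply several of these practices is to encode them in the system prompt and bound the whole run with a deadline. The sketch below is a minimal example, assuming the same agent-kit API as the Usage section; the depth, page, and timeout values in the prompt and in `context.WithTimeout` are illustrative defaults, not limits enforced by the tool.

```go
package main

import (
	"context"
	"os"
	"time"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/firecrawl"
)

func main() {
	firecrawlTools := firecrawl.NewFirecrawlTools()

	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Encode credit-saving defaults in the system prompt so the model
	// requests conservative crawl parameters.
	assistant := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web scraping assistant. Keep crawls to depth 2 and at most 10 pages, "+
			"stay on the requested domain, exclude nav and footer tags, and use wait_for only for dynamic pages.").
		AddTool(firecrawlTools.Scrape()).
		AddTool(firecrawlTools.Crawl())

	sess := session.New(assistant)

	// Bound the whole run so a slow page or large crawl cannot hang the caller.
	ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second)
	defer cancel()

	response, err := sess.Run(ctx, []agent.ChatMessage{
		agent.NewUserMessage("Crawl https://example.com/blog and summarize the latest posts."),
	}, nil)
	if err != nil {
		panic(err)
	}

	println(response.GetLastMessage().GetContent())
}
```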