# Spider Cloud Tool

Enterprise web scraping, crawling and search with proxy support

URL: https://agentkit.tech/docs/tools/built-in/spider-cloud.mdx

The Spider Cloud tool provides enterprise-grade web scraping, crawling, and search with advanced proxy support and intelligent content extraction.

## Installation

```go
import "github.com/model-box/agent-kit/tool/spider_cloud"
```

## Setup

### Requirements

1. **Spider Cloud API Key**: Sign up at [Spider Cloud](https://spider.cloud/)
2. **API Access**: Various pricing tiers are available

### Environment Variables

```bash
export SPIDER_CLOUD_API_KEY="your-spider-cloud-api-key"
```

## Usage

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/spider_cloud"
)

func main() {
	// Create Spider Cloud tools
	spiderTools := spider_cloud.NewSpiderCloudTools()

	// Create model
	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Create agent with Spider Cloud tools
	researcher := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web research assistant with advanced scraping capabilities.").
		AddTool(spiderTools.Scrape()).
		AddTool(spiderTools.Crawl()).
		AddTool(spiderTools.Search())

	// Create session and run
	sess := session.New(researcher)
	ctx := context.Background()

	response, err := sess.Run(ctx, []agent.ChatMessage{
		agent.NewUserMessage("Search for 'machine learning tutorials' and fetch content of top results"),
	}, nil)
	if err != nil {
		panic(err)
	}

	fmt.Println(response.GetLastMessage().GetContent())
}
```

## Available Tools

### spider\_scrape

Advanced webpage scraping with multiple options.

| Parameter        | Type               | Required | Description                                    |
| ---------------- | ------------------ | -------- | ---------------------------------------------- |
| `url`            | string             | Yes      | The URL to scrape                              |
| `return_formats` | \[]string          | No       | Content formats: \["markdown", "html", "text"] |
| `request_type`   | string             | No       | "http", "chrome", "smart" (default: "http")    |
| `custom_headers` | map\[string]string | No       | Custom HTTP headers                            |
| `cookies`        | \[]Cookie          | No       | Cookies to send with the request               |
| `proxy_config`   | ProxyConfig        | No       | Proxy configuration                            |
| `store_cookies`  | bool               | No       | Store cookies from the response                |
| `metadata`       | bool               | No       | Include metadata in the response               |
| `readability`    | bool               | No       | Use readability mode (default: true)           |

### spider\_crawl

Crawl websites with configurable depth and filters.

| Parameter            | Type      | Required | Description                                    |
| -------------------- | --------- | -------- | ---------------------------------------------- |
| `url`                | string    | Yes      | The starting URL to crawl                      |
| `limit`              | int       | No       | Maximum pages to crawl (default: 10, max: 500) |
| `depth`              | int       | No       | Maximum crawl depth (default: 3)               |
| `allowed_domains`    | \[]string | No       | Domains to restrict crawling to                |
| `blacklist_patterns` | \[]string | No       | URL patterns to exclude                        |
| `whitelist_patterns` | \[]string | No       | URL patterns to include                        |
| `return_formats`     | \[]string | No       | Content formats: \["markdown", "html", "text"] |
| `request_type`       | string    | No       | "http", "chrome", "smart"                      |
| `readability`        | bool      | No       | Use readability mode (default: true)           |

### spider\_search

Search the web with optional content fetching.

| Parameter            | Type   | Required | Description                                     |
| -------------------- | ------ | -------- | ----------------------------------------------- |
| `query`              | string | Yes      | Search query                                    |
| `search_type`        | string | No       | "search", "news", "images" (default: "search")  |
| `num_results`        | int    | No       | Number of results (default: 10, max: 100)       |
| `domain`             | string | No       | Limit results to a specific domain              |
| `lang`               | string | No       | Language code                                   |
| `country`            | string | No       | Country code                                    |
| `fetch_page_content` | bool   | No       | Fetch full page content for each result         |

## Request Types

* **HTTP**: Fast, basic HTTP requests
* **Chrome**: Full browser rendering for JavaScript-heavy sites
* **Smart**: Automatically detects whether browser rendering is needed
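The request type is a parameter the model fills in when it calls `spider_scrape` or `spider_crawl`, so in practice it is steered through instructions. Below is a minimal sketch that reuses only the constructors from the Usage example above; the system prompt wording and the URL are illustrative assumptions, not part of the tool's API.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/spider_cloud"
)

func main() {
	spiderTools := spider_cloud.NewSpiderCloudTools()

	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	// Register only the scrape tool and describe when to switch request types
	// in the system prompt; the wording here is illustrative.
	scraper := agent.New().
		SetModel(llm).
		SetSystemPrompt("You scrape single pages. For JavaScript-heavy sites, call spider_scrape with request_type \"chrome\"; otherwise use \"http\".").
		AddTool(spiderTools.Scrape())

	sess := session.New(scraper)

	// The URL is a placeholder.
	response, err := sess.Run(context.Background(), []agent.ChatMessage{
		agent.NewUserMessage("Scrape https://example.com/dashboard and return the content as markdown"),
	}, nil)
	if err != nil {
		panic(err)
	}

	fmt.Println(response.GetLastMessage().GetContent())
}
```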
## Advanced Features

### Proxy Support

Configure datacenter or residential proxies with country selection.

### Content Extraction

Intelligent extraction with readability mode that removes:

* Navigation menus
* Advertisements
* Sidebars
* Footer content
* Scripts and styles

### Pattern Matching

Use URL patterns to control crawling:

* Blacklist: `*/admin/*`, `*.pdf`, `*?print=true`
* Whitelist: `/blog/*`, `*/2024/*`

## Use Cases

1. **Competitive Analysis**: Monitor competitor websites
2. **Content Aggregation**: Collect articles from multiple sources
3. **Price Monitoring**: Track product prices across e-commerce sites
4. **SEO Analysis**: Analyze website structure and content
5. **Research**: Gather information from academic or news sites
6. **Lead Generation**: Extract business information
7. **Market Research**: Analyze industry trends

## Best Practices

1. **Respect robots.txt**: Check site policies before scraping
2. **Use appropriate delays**: Don't overwhelm servers
3. **Set a user agent**: Identify your bot
4. **Handle errors gracefully**: Implement retry logic (see the sketch after this list)
5. **Use proxies wisely**: Reserve them for sites with rate limits
6. **Filter URLs**: Use patterns to keep crawls focused
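As a sketch of the retry advice in point 4, the example below wraps `session.Run` in a simple exponential-backoff loop. It assumes a session can be re-run with the same messages; the attempt count, delays, and the crawl prompt are arbitrary illustrative choices, not Spider Cloud settings.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/model-box/agent-kit/agent"
	"github.com/model-box/agent-kit/model"
	"github.com/model-box/agent-kit/session"
	"github.com/model-box/agent-kit/tool/spider_cloud"
)

func main() {
	spiderTools := spider_cloud.NewSpiderCloudTools()

	llm := model.Model("gpt-4o").
		SetAPIKey(os.Getenv("OPENAI_API_KEY"))

	crawler := agent.New().
		SetModel(llm).
		SetSystemPrompt("You are a web research assistant.").
		AddTool(spiderTools.Crawl())

	sess := session.New(crawler)
	ctx := context.Background()

	msgs := []agent.ChatMessage{
		// The URL and crawl instructions are illustrative.
		agent.NewUserMessage("Crawl https://example.com/blog, at most 10 pages, and summarize the posts"),
	}

	// Simple retry loop with exponential backoff (1s, 2s, 4s); the attempt
	// count and delays are arbitrary, not Spider Cloud settings.
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		response, err := sess.Run(ctx, msgs, nil)
		if err == nil {
			fmt.Println(response.GetLastMessage().GetContent())
			return
		}
		lastErr = err
		time.Sleep(time.Duration(1<<attempt) * time.Second)
	}
	panic(lastErr)
}
```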