Google Search Tool

The Google Search Tool is a custom tool that allows the Newsletter AI Agent to search the web for relevant information using the Apify Google Search Scraper actor.

Overview

The Google Search Tool is primarily used by the Researcher Agent to gather general information about the specified topic. It provides a flexible interface for searching Google and extracting structured data from search results.

Implementation

The Google Search Tool is implemented as a CrewAI BaseTool that interacts with the Apify Google Search Scraper actor. Here’s the implementation:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional
from apify import Actor
from src.tools.base import RunApifyActor

class GoogleScraperInput(BaseModel):
    """Input schema for GoogleScraper tool."""
    queries: List[str] = Field(
        description="Search terms or Google Search URLs. Can use advanced techniques like 'AI site:twitter.com'. Limit 32 words per query."
    )

    resultsPerPage: Optional[int] = Field(
        description="Number of results to return per page",
        default=10
    )

    languageCode: Optional[str] = Field(
        default="en",
        description="Language for search results (passed as hl parameter)"
    )
    
    # Additional parameters...

class GoogleScraperTool(BaseTool):
    name: str = "Google Scraper"
    description: str = "Tool for scraping Google search results with configurable parameters"
    args_schema: type[BaseModel] = GoogleScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    def _run(
        self,
        queries: List[str],
        resultsPerPage: Optional[int] = 10,
        languageCode: Optional[str] = "en",
        # Additional parameters...
    ) -> str:
        run_inputs = {
            "queries": "\n".join(queries)
        }
        
        # Set additional parameters...
        
        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("apify/google-search-scraper", run_inputs)
        return dataset
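
The "Set additional parameters..." step is elided in the listing above. As a minimal sketch of what it might look like, the hypothetical helper `build_run_inputs` below (not part of the original code) joins the queries with newlines, as `_run` does, and forwards only the optional values that were actually set. That the actor falls back to its own defaults for omitted keys is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_run_inputs(
    queries: List[str],
    resultsPerPage: Optional[int] = 10,
    languageCode: Optional[str] = "en",
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the actor input dictionary."""
    # The tool joins all queries into one newline-separated string.
    run_inputs: Dict[str, Any] = {"queries": "\n".join(queries)}

    optional = {
        "resultsPerPage": resultsPerPage,
        "languageCode": languageCode,
        **extra,
    }
    # Skip None and empty lists (assumption: the actor then uses its defaults).
    for key, value in optional.items():
        if value is not None and value != []:
            run_inputs[key] = value
    return run_inputs


inputs = build_run_inputs(
    ["AI agents", "AI site:twitter.com"],
    resultsPerPage=5,
    site=None,
)
```

Keeping this logic in one function would make the mapping from tool arguments to actor input easy to test without calling Apify.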

Parameters

The Google Search Tool accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| queries | List[str] | Search terms or Google Search URLs | Required |
| resultsPerPage | int | Number of results to return per page | 10 |
| languageCode | str | Language for search results | "en" |
| forceExactMatch | bool | Wrap query in quotes for exact phrase matching | False |
| site | str | Limit search to specific site (e.g. site:example.com) | None |
| relatedToSite | str | Filter pages related to specific site | None |
| wordsInTitle | List[str] | Filter pages with specific words in title | [] |
| wordsInText | List[str] | Filter pages with specific words in text | [] |
| wordsInUrl | List[str] | Filter pages with specific words in URL | [] |
| quickDateRange | str | Filter by date range (e.g. d10, w2, m6, y1) | "d30" |
| beforeDate | str | Filter results before date (YYYY-MM-DD) | None |
| afterDate | str | Filter results after date (YYYY-MM-DD) | None |
| fileTypes | List[str] | Filter by file types | [] |
| mobileResults | bool | Return mobile version of search results | False |
| includeUnfilteredResults | bool | Include lower quality results | False |
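
The string-typed filters are easy to get wrong, so a caller may want to check them before starting an actor run. The sketch below is a hypothetical pre-flight check, not part of the tool. The `quickDateRange` pattern is inferred only from the examples given for that parameter (d10, w2, m6, y1) and the date check from the stated YYYY-MM-DD format; the actor itself may accept or reject other values.

```python
import re
from datetime import datetime

# Assumed format: one unit letter (d, w, m, y) followed by a positive count.
QUICK_DATE_RANGE = re.compile(r"[dwmy][1-9]\d*")


def is_valid_quick_date_range(value: str) -> bool:
    """Hypothetical check for values such as 'd30' or 'm1'."""
    return QUICK_DATE_RANGE.fullmatch(value) is not None


def is_valid_date(value: str) -> bool:
    """Hypothetical check for beforeDate/afterDate (YYYY-MM-DD)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True
```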

Usage

The Google Search Tool is used by the Researcher Agent to gather information about the specified topic:

# Initialize the tool
google_tool = GoogleScraperTool(actor=actor)

# Use the tool
search_params = {
    "queries": [topic],
    "resultsPerPage": 5,
    "maxPagesPerQuery": 2,
    "languageCode": "en",
    "quickDateRange": "m1"  # Last month
}

web_results = google_tool._run(**search_params)

Return Value

The tool returns a list of search results. Each result is a dictionary with the following fields:

  • title: The title of the search result
  • url: The URL of the search result
  • description: A snippet of text from the search result
  • position: The position of the result in the search results
  • Additional metadata about the search result
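
As an illustration of consuming this structure, the sketch below formats the top results as one line each. It assumes the return value is already a list of dictionaries with the fields listed above. `summarize_results` is a hypothetical helper, and the sample entries are placeholders rather than real search output.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]], limit: int = 3) -> List[str]:
    """Hypothetical helper: format the top-ranked results as one line each."""
    # Sort by position so the highest-ranked results come first;
    # results without a position sort last.
    ranked = sorted(results, key=lambda r: r.get("position", float("inf")))
    return [
        f"{r.get('position')}. {r.get('title')} ({r.get('url')})"
        for r in ranked[:limit]
    ]


# Placeholder data for illustration only.
sample = [
    {"title": "Second", "url": "https://example.com/b", "description": "...", "position": 2},
    {"title": "First", "url": "https://example.com/a", "description": "...", "position": 1},
]
lines = summarize_results(sample)
```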

Apify Integration

The tool uses the Apify Google Search Scraper actor, which provides several advantages:

  1. Scalability: The actor can handle large numbers of search queries efficiently
  2. Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping search results
  3. Structured Data: The actor returns search results in a structured format that is easy to process

Configuration

To use the Google Search Tool, set the following environment variable:

APIFY_API_KEY=your_apify_api_key_here
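
A missing key is easier to diagnose at startup than in the middle of an actor run. As a minimal sketch, the hypothetical helper below (not part of the original code) reads `APIFY_API_KEY` from the environment and fails with a clear message when it is absent or empty.

```python
import os


def get_apify_api_key() -> str:
    """Hypothetical helper: read APIFY_API_KEY or fail with a clear message."""
    key = os.environ.get("APIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("APIFY_API_KEY is not set; add it to your environment.")
    return key
```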
