Google News Scraper Tool

The Google News Scraper Tool is a custom tool that allows the Newsletter AI Agent to gather the latest news articles related to a specified topic using the Apify Super Fast Google News Scraper actor.

Overview

The Researcher Agent is the primary user of the Google News Scraper Tool. The tool takes a list of keywords, searches Google News for each one, and returns structured data for the matching articles.

Implementation

The Google News Scraper Tool is implemented as a CrewAI BaseTool that interacts with the Apify Super Fast Google News Scraper actor. Here’s the implementation:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional
from apify import Actor
from src.tools.base import RunApifyActor

class GoogleNewsScraperInput(BaseModel):
    """Input schema for GoogleNewsScraper tool."""
    keywords: List[str] = Field(
        description="The keywords used to search for news articles"
    )

    # Declared here so the schema matches the arguments accepted by _run.
    language: Optional[str] = Field(
        description="Language and country code for the search (e.g., 'US:en')",
        default="US:en"
    )

    maxItems: Optional[int] = Field(
        description="Set the maximum number of items you want to scrape for each keyword. If set to None, the actor will extract all available news.",
        default=20
    )

class GoogleNewsScraperTool(BaseTool):
    name: str = "Google News Scraper"
    description: str = "Tool for scraping Google News articles with configurable parameters"
    args_schema: type[BaseModel] = GoogleNewsScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    # Actor is not a pydantic type, so arbitrary types must be allowed.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    def _run(
        self,
        keywords: List[str],
        language: Optional[str] = "US:en",
        maxItems: Optional[int] = 20
    ) -> str:
        run_inputs = {
            "keywords": keywords,
            "language": language
        }

        # Only send maxItems when it is set; otherwise the actor applies no limit.
        if maxItems:
            run_inputs["maxItems"] = maxItems

        # Route requests through Apify's residential proxy group.
        proxy = {
            "useApifyProxy": True,
            "apifyProxyGroups": [
                "RESIDENTIAL"
            ]
        }
        run_inputs["proxy"] = proxy

        # Run the actor and return the dataset it produced.
        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("aymorato/super-fast-google-news-scraper-pay-per-result", run_inputs)
        return dataset

Parameters

The Google News Scraper Tool accepts the following parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| keywords | List[str] | The keywords used to search for news articles | Required |
| language | str | Language and country code for the search (e.g., “US:en”) | “US:en” |
| maxItems | int | Maximum number of items to scrape for each keyword | 20 |
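
To make the mapping from these parameters to the actor input concrete, here is a minimal sketch that mirrors the logic of `_run` as a standalone function. The name `build_run_inputs` is hypothetical and not part of the project. The dictionary keys follow the implementation shown earlier, and the defaults follow the parameter list above.

```python
from typing import Any, Dict, List, Optional


def build_run_inputs(
    keywords: List[str],
    language: str = "US:en",
    max_items: Optional[int] = 20,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the actor input the way `_run` does."""
    run_inputs: Dict[str, Any] = {"keywords": keywords, "language": language}
    # maxItems is only sent when it is set; otherwise the key is omitted.
    if max_items:
        run_inputs["maxItems"] = max_items
    # The tool always requests Apify's residential proxy group.
    run_inputs["proxy"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }
    return run_inputs


if __name__ == "__main__":
    print(build_run_inputs(["artificial intelligence"], max_items=5))
```

Keeping this step in a pure function would make it possible to check the payload without starting an actor run.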

Usage

The Researcher Agent initializes the tool with an Apify actor instance and passes the topic as the search keyword:

# Initialize the tool
news_tool = GoogleNewsScraperTool(actor=actor)

# Use the tool
news_results = news_tool._run(
    keywords=[topic],
    language="US:en",
    maxItems=20
)

Return Value

The tool returns a list of news articles. Each article is a dictionary with fields including:

  • title: The title of the news article
  • link: The URL of the news article
  • source: The source of the news article (e.g., “CNN”, “BBC”)
  • publishedAt: The date the article was published
  • snippet: A brief snippet or summary of the article
  • Additional metadata about the article
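
As an illustration of how a consumer might work with this return value, the sketch below removes duplicate articles and renders one line per article. It assumes only the `title`, `link`, and `source` fields listed above. The helper name `format_articles` and the sample data are made up for the example.

```python
from typing import Any, Dict, Iterable, List


def format_articles(articles: Iterable[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: drop duplicates and render one line per article."""
    seen_links = set()
    lines: List[str] = []
    for article in articles:
        link = article.get("link")
        title = article.get("title")
        # Skip entries that lack the fields needed to cite the article.
        if not link or not title:
            continue
        # The same story can surface under several keywords; keep the first copy.
        if link in seen_links:
            continue
        seen_links.add(link)
        source = article.get("source") or "unknown source"
        lines.append(f"{title} ({source}): {link}")
    return lines


if __name__ == "__main__":
    # Made-up sample data in the shape described above.
    sample = [
        {"title": "Example headline", "link": "https://example.com/a", "source": "Example News"},
        {"title": "Example headline", "link": "https://example.com/a", "source": "Example News"},
        {"title": "No link here"},
    ]
    for line in format_articles(sample):
        print(line)
```

Deduplicating on `link` is a design choice for the example; a real pipeline might prefer a different key if the actor returns redirect URLs.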

Apify Integration

The tool uses the Apify Super Fast Google News Scraper actor, which provides several advantages:

  1. Scalability: The actor can handle large numbers of news searches efficiently
  2. Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping Google News
  3. Structured Data: The actor returns news articles in a structured format that is easy to process
  4. Freshness: The actor focuses on retrieving the latest news articles, ensuring that the information is up-to-date
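
The internals of the project's `RunApifyActor` helper are not shown here, so the following is only a sketch of one possible way to run the actor and collect its structured output. It assumes a client object that behaves like `ApifyClient` from the `apify-client` package (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`). The function name `fetch_news` is hypothetical.

```python
from typing import Any, Dict, List

ACTOR_ID = "aymorato/super-fast-google-news-scraper-pay-per-result"


def fetch_news(client: Any, run_inputs: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the actor and return the items stored in its default dataset.

    `client` is assumed to behave like `apify_client.ApifyClient`.
    """
    # Start the actor and block until the run finishes.
    run = client.actor(ACTOR_ID).call(run_input=run_inputs)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Each run writes its results to a default dataset.
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

With the real package installed, the client would be created with `ApifyClient(token)` and passed in as `client`. Taking the client as an argument keeps the function easy to exercise with a stand-in object.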

Configuration

To use the Google News Scraper Tool, set the following environment variable:

APIFY_API_KEY=your_apify_api_key_here
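
How the project reads this variable is not shown. As a minimal sketch, a startup check along these lines would fail early with a clear message when the key is missing. The function name `get_apify_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def get_apify_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return APIFY_API_KEY or raise if it is missing."""
    # Default to the process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get("APIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "APIFY_API_KEY is not set; add it to your environment before running the tool"
        )
    return key
```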

Next Steps