Google News Scraper Tool
The Google News Scraper Tool is a custom tool that allows the Newsletter AI Agent to gather the latest news articles related to a specified topic using the Apify Super Fast Google News Scraper actor.
Overview
The Google News Scraper Tool is primarily used by the Researcher Agent to gather the latest news and developments about the specified topic. It searches Google News by keyword, with optional language and result-count settings, and returns structured data for each article found.
Implementation
The Google News Scraper Tool is implemented as a CrewAI BaseTool that interacts with the Apify Super Fast Google News Scraper actor. Here’s the implementation:
from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional, Literal
from apify import Actor
from src.tools.base import RunApifyActor


class GoogleNewsScraperInput(BaseModel):
    """Input schema for GoogleNewsScraper tool."""
    keywords: List[str] = Field(
        description="The keywords used to search for news articles"
    )
    maxItems: Optional[int] = Field(
        description="Set the maximum number of items you want to scrape for each keyword. If left unset, the actor will extract all available news.",
        default=20
    )


class GoogleNewsScraperTool(BaseTool):
    name: str = "Google News Scraper"
    description: str = "Tool for scraping Google News articles with configurable parameters"
    args_schema: type[BaseModel] = GoogleNewsScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    model_config = ConfigDict(arbitrary_types_allowed=True)

    def _run(
        self,
        keywords: List[str],
        language: Optional[str] = "US:en",
        maxItems: Optional[int] = None
    ) -> str:
        run_inputs = {
            "keywords": keywords,
            "language": language
        }
        if maxItems:
            run_inputs["maxItems"] = maxItems
        proxy = {
            "useApifyProxy": True,
            "apifyProxyGroups": [
                "RESIDENTIAL"
            ]
        }
        run_inputs["proxy"] = proxy
        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("aymorato/super-fast-google-news-scraper-pay-per-result", run_inputs)
        return dataset
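To make the payload that `_run` sends to the actor easier to inspect, the input-building step can be pulled out into a standalone function. The following is a minimal sketch that mirrors the logic of `_run` above; the name `build_run_input` is hypothetical and not part of the project.

```python
from typing import Any, Dict, List, Optional


def build_run_input(
    keywords: List[str],
    language: str = "US:en",
    max_items: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the actor input the same way `_run` does (hypothetical helper)."""
    run_input: Dict[str, Any] = {"keywords": keywords, "language": language}
    # Like the original `if maxItems:` check, None or 0 leaves the key out.
    if max_items:
        run_input["maxItems"] = max_items
    # The tool always routes requests through the residential proxy group.
    run_input["proxy"] = {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    }
    return run_input


if __name__ == "__main__":
    print(build_run_input(["artificial intelligence"], max_items=5))
```

Keeping this step separate from the actor call makes it possible to check the payload in a unit test without starting an Apify run.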
Parameters
The Google News Scraper Tool accepts the following parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| keywords | List[str] | The keywords used to search for news articles | Required |
| language | str | Language and country code for the search (e.g., "US:en") | "US:en" |
| maxItems | int | Maximum number of items to scrape for each keyword | 20 |
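The parameters can be sanity-checked before an actor run is started, which avoids paying for a run that was bound to fail. This is a rough sketch only: `check_params` is a hypothetical helper, and the language pattern is an assumption inferred from the single documented example "US:en"; the actor itself may accept other forms.

```python
import re
from typing import List, Optional

# Assumption: language values follow the "COUNTRY:language" shape of the
# documented example "US:en". The actor may accept values this rejects.
LANGUAGE_PATTERN = re.compile(r"^[A-Z]{2}:[a-z]{2,3}(-[A-Za-z0-9]+)?$")


def check_params(
    keywords: List[str],
    language: str = "US:en",
    max_items: Optional[int] = 20,
) -> List[str]:
    """Return a list of problems; an empty list means the input looks fine."""
    problems: List[str] = []
    if not keywords or any(not keyword.strip() for keyword in keywords):
        problems.append("keywords must be a non-empty list of non-blank strings")
    if not LANGUAGE_PATTERN.match(language):
        problems.append(f"language {language!r} does not look like 'US:en'")
    if max_items is not None and max_items < 1:
        problems.append("maxItems must be a positive integer when set")
    return problems


if __name__ == "__main__":
    print(check_params(["climate policy"]))
    print(check_params([], language="english", max_items=0))
```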
Usage
The Researcher Agent initializes the tool with an Apify Actor instance and calls it with the topic as the search keyword:
# Initialize the tool
news_tool = GoogleNewsScraperTool(actor=actor)

# Use the tool
news_results = news_tool._run(
    keywords=[topic],
    language="US:en",
    maxItems=20
)
Return Value
The tool returns a list of news articles, where each article is a dictionary containing information about the article, including:
- title: The title of the news article
- link: The URL of the news article
- source: The source of the news article (e.g., “CNN”, “BBC”)
- publishedAt: The date the article was published
- snippet: A brief snippet or summary of the article
- Additional metadata about the article
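Because results are collected per keyword, the same article may plausibly appear more than once when several keywords are searched. The sketch below shows one way to post-process the returned list using only the `title`, `link`, and `source` fields described above. The helper names and the sample records are hypothetical, not real scraper output.

```python
from collections import defaultdict
from typing import Any, Dict, List

Article = Dict[str, Any]


def dedupe_by_link(articles: List[Article]) -> List[Article]:
    """Keep the first article seen for each link; keep articles with no link."""
    seen = set()
    unique: List[Article] = []
    for article in articles:
        link = article.get("link")
        if link is not None:
            if link in seen:
                continue
            seen.add(link)
        unique.append(article)
    return unique


def group_titles_by_source(articles: List[Article]) -> Dict[str, List[str]]:
    """Map each source name to the titles it published."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        grouped[article.get("source", "unknown")].append(article.get("title", ""))
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample data shaped like the fields listed above.
    sample = [
        {"title": "A", "link": "https://example.com/a", "source": "CNN"},
        {"title": "A", "link": "https://example.com/a", "source": "CNN"},
        {"title": "B", "link": "https://example.com/b", "source": "BBC"},
    ]
    print(group_titles_by_source(dedupe_by_link(sample)))
```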
Apify Integration
The tool uses the Apify Super Fast Google News Scraper actor, which provides several advantages:
- Scalability: The actor can handle large numbers of news searches efficiently
- Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping Google News
- Structured Data: The actor returns news articles in a structured format that is easy to process
- Freshness: The actor focuses on retrieving the latest news articles, ensuring that the information is up-to-date
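For readers who want to call the actor outside the tool, the following sketch shows one possible path using the separate `apify-client` package. It does not describe how the project's `RunApifyActor` works, since that class is not shown here. The function `fetch_news` is hypothetical; it accepts any object that behaves like `apify_client.ApifyClient`, so it can be exercised with a stub.

```python
from typing import Any, Dict, List

ACTOR_ID = "aymorato/super-fast-google-news-scraper-pay-per-result"


def fetch_news(client: Any, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the actor and collect its dataset items (hypothetical helper).

    `client` is expected to behave like `apify_client.ApifyClient`.
    """
    # Start the actor and wait for the run to finish.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # Each run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example wiring, assuming the `apify-client` package is installed:
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_API_KEY"])
#   articles = fetch_news(
#       client, {"keywords": ["ai"], "language": "US:en", "maxItems": 5}
#   )
```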
Configuration
To use the Google News Scraper Tool, set the following environment variable:
APIFY_API_KEY=your_apify_api_key_here
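A small startup check can surface a missing key early instead of letting the first actor call fail. This is a minimal sketch; `get_apify_api_key` is a hypothetical helper, and how the project actually loads the variable is not shown here.

```python
import os
from typing import Mapping, Optional


def get_apify_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return APIFY_API_KEY, or raise a clear error if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("APIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "APIFY_API_KEY is not set; add it to your environment before running."
        )
    return key


if __name__ == "__main__":
    try:
        get_apify_api_key()
        print("APIFY_API_KEY found")
    except RuntimeError as error:
        print(error)
```

Passing the environment as an argument is only there to make the check easy to test; in normal use the function reads `os.environ`.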