Twitter Scraper Tool

The Twitter Scraper Tool is a custom tool that lets the Newsletter AI Agent collect tweets on a specified topic through the Apify Twitter Scraper Lite actor.

Overview

The Researcher Agent is the primary user of the tool and relies on it to gather social media insights about the topic. The tool searches Twitter by search terms, with an optional sort order and date range, and returns structured tweet data.

Implementation

The Twitter Scraper Tool is implemented as a CrewAI BaseTool that interacts with the Apify Twitter Scraper Lite actor. Here’s the implementation:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional
from apify import Actor
from src.tools.base import RunApifyActor

class TwitterScraperInput(BaseModel):
    """Input schema for TwitterScraper tool."""
    searchTerms: Optional[List[str]] = Field(
        description="Search terms to find tweets containing these terms. Alternative to using Twitter URLs.",
        default=None
    )

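    sort: Optional[str] = Field(
        description="How to sort the returned tweets. Setting to 'Latest' yields more results.",
        default="Top",
        # Pydantic v2 deprecates arbitrary Field kwargs; pass schema extras explicitly.
        json_schema_extra={"enum": ["Top", "Latest"]}
    )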

    start: Optional[str] = Field(
        description="Scrape tweets starting from this date (format: YYYY-MM-DD)",
        default=None
    )

    end: Optional[str] = Field(
        description="Scrape tweets until this date (format: YYYY-MM-DD)", 
        default=None
    )

class TwitterScraperTool(BaseTool):
    name: str = "Twitter Scraper"
    description: str = "Tool for scraping Twitter content with configurable parameters"
    args_schema: type[BaseModel] = TwitterScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    model_config = ConfigDict(arbitrary_types_allowed=True)

    def _run(
        self,
        searchTerms: Optional[List[str]] = None,
        sort: Optional[str] = "Top",
        start: Optional[str] = None,
        end: Optional[str] = None
    ) -> List[dict]:
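        # Build the actor input, including only the parameters that were provided.
        run_inputs = {}

        if searchTerms:
            run_inputs["searchTerms"] = searchTerms
        if sort:
            run_inputs["sort"] = sort
        # Fixed cap on the number of items requested per run.
        run_inputs["maxItems"] = 5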
        if start:
            run_inputs["start"] = start
        if end:
            run_inputs["end"] = end

        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("apidojo/twitter-scraper-lite", run_inputs)
        return dataset
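
The input assembly inside `_run` can be checked without calling Apify by isolating it as a pure function. The following is a minimal sketch under that assumption; `build_run_inputs` and its `max_items` parameter are hypothetical names used for illustration and are not part of the tool.

```python
from typing import Any, Dict, List, Optional


def build_run_inputs(
    search_terms: Optional[List[str]] = None,
    sort: Optional[str] = "Top",
    start: Optional[str] = None,
    end: Optional[str] = None,
    max_items: int = 5,
) -> Dict[str, Any]:
    """Assemble the actor input, skipping parameters that were not provided."""
    candidates = {
        "searchTerms": search_terms,
        "sort": sort,
        "start": start,
        "end": end,
    }
    # Keep only truthy values, mirroring the `if value:` checks in `_run`.
    run_inputs: Dict[str, Any] = {k: v for k, v in candidates.items() if v}
    # `_run` always sets maxItems, so it is added unconditionally here too.
    run_inputs["maxItems"] = max_items
    return run_inputs


print(build_run_inputs(["ai agents"], sort="Latest", start="2023-01-01"))
```

Keeping this step free of network calls makes it straightforward to unit test which keys reach the actor.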

Parameters

The Twitter Scraper Tool accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| searchTerms | List[str] | Search terms to find tweets containing these terms | None |
| sort | str | How to sort the returned tweets ("Top" or "Latest") | "Top" |
| start | str | Scrape tweets starting from this date (YYYY-MM-DD) | None |
| end | str | Scrape tweets until this date (YYYY-MM-DD) | None |
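
Because `start` and `end` are plain strings, a malformed or reversed date range is only detected once the actor runs. The sketch below shows one way to check the YYYY-MM-DD format before the call; `normalize_date` and `validate_date_range` are hypothetical helpers, not functions the tool provides.

```python
from datetime import datetime
from typing import Optional, Tuple

DATE_FORMAT = "%Y-%m-%d"


def normalize_date(value: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no date was given."""
    if not value:
        return None
    # strptime raises ValueError when the text does not match the format.
    return datetime.strptime(value, DATE_FORMAT).date().isoformat()


def validate_date_range(
    start: Optional[str], end: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Normalize both dates and reject a range whose start is after its end."""
    norm_start, norm_end = normalize_date(start), normalize_date(end)
    # Zero-padded ISO dates sort chronologically when compared as strings.
    if norm_start and norm_end and norm_start > norm_end:
        raise ValueError(f"start ({norm_start}) is after end ({norm_end})")
    return norm_start, norm_end


print(validate_date_range("2023-01-01", "2023-12-31"))
```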

Usage

Initialize the tool with an Apify Actor instance, then call it with the topic as the search term:

# Initialize the tool
twitter_tool = TwitterScraperTool(actor=actor)

# Use the tool
twitter_results = twitter_tool._run(
    searchTerms=[topic],
    sort="Latest",
    start="2023-01-01",  # Optional: start date
    end="2023-12-31"     # Optional: end date
)
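
For a recurring newsletter, a rolling date range is usually more practical than the fixed dates in the call above. A minimal sketch, assuming a trailing window measured in days; `trailing_window` is a hypothetical helper introduced here for illustration.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def trailing_window(days: int, today: Optional[date] = None) -> Dict[str, str]:
    """Return start/end strings (YYYY-MM-DD) covering the last `days` days."""
    end_day = today or date.today()
    start_day = end_day - timedelta(days=days)
    return {"start": start_day.isoformat(), "end": end_day.isoformat()}


# A fixed `today` keeps the example reproducible.
window = trailing_window(7, today=date(2023, 12, 31))
print(window)
```

The resulting dictionary can be unpacked into the call as keyword arguments, for example `twitter_tool._run(searchTerms=[topic], **window)`.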

Return Value

The tool returns a list of tweets. Each tweet is a dictionary whose fields include the following. The implementation sets `maxItems` to 5, so each call requests at most five tweets.

  • text: The text content of the tweet
  • url: The URL of the tweet
  • username: The username of the tweet author
  • timestamp: The timestamp when the tweet was posted
  • likes: The number of likes the tweet received
  • retweets: The number of retweets the tweet received
  • Additional metadata about the tweet
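
The fields listed above are enough to rank tweets and format them for a newsletter draft. The sketch below assumes the field names as documented here and treats missing counts as zero; `top_tweets` and the sample records are illustrative placeholders, not real actor output.

```python
from typing import Any, Dict, List


def top_tweets(tweets: List[Dict[str, Any]], limit: int = 3) -> List[str]:
    """Rank tweets by likes plus retweets and return one summary line each."""

    def engagement(tweet: Dict[str, Any]) -> int:
        # Missing or None counts are treated as zero.
        return (tweet.get("likes") or 0) + (tweet.get("retweets") or 0)

    ranked = sorted(tweets, key=engagement, reverse=True)
    return [
        f"@{t.get('username', 'unknown')}: {t.get('text', '')} ({t.get('url', '')})"
        for t in ranked[:limit]
    ]


# Placeholder records shaped like the documented return value.
sample = [
    {"text": "Agents are shipping fast", "url": "https://example.com/1",
     "username": "alice", "likes": 10, "retweets": 2},
    {"text": "New newsletter tooling", "url": "https://example.com/2",
     "username": "bob", "likes": 40, "retweets": 5},
    {"text": "No counts on this one", "url": "https://example.com/3",
     "username": "carol"},
]

for line in top_tweets(sample, limit=2):
    print(line)
```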

Apify Integration

The tool uses the Apify Twitter Scraper Lite actor, which provides several advantages:

  1. Scalability: The actor can handle large numbers of Twitter searches efficiently
  2. Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping Twitter
  3. Structured Data: The actor returns tweets in a structured format that is easy to process
  4. No Twitter API Key Required: Unlike the official Twitter API, the actor doesn’t require Twitter API keys for basic functionality. An Apify API key is still needed (see Configuration)

Configuration

To use the Twitter Scraper Tool, set the following environment variable:

APIFY_API_KEY=your_apify_api_key_here
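
A missing key is easier to diagnose when the check happens at startup rather than in the middle of an agent run. A minimal sketch of such a check; `load_apify_api_key` is a hypothetical helper, and how the key is then handed to the Apify Actor is outside what this section documents.

```python
import os
from typing import Mapping, Optional


def load_apify_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read APIFY_API_KEY from the environment and fail early if it is unset."""
    source = os.environ if env is None else env
    key = source.get("APIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("APIFY_API_KEY is not set; add it to your environment.")
    return key


# Passing a mapping keeps the example independent of the real environment.
print(load_apify_api_key({"APIFY_API_KEY": "your_apify_api_key_here"}))
```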

Next Steps