YouTube Scraper Tool

The YouTube Scraper Tool is a custom tool that lets the Newsletter AI Agent find relevant YouTube videos through an Apify actor. It supplies video-based information and insights on the specified topic.

Overview

The Researcher Agent is the tool's primary user. The tool provides a flexible interface for searching YouTube and extracting structured data from videos, channels, and playlists.

Implementation

The YouTube Scraper Tool is implemented as a CrewAI BaseTool that interacts with an Apify YouTube scraper actor. Here’s the implementation:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional
from apify import Actor
from src.tools.base import RunApifyActor

class YouTubeScraperInput(BaseModel):
    """Input schema for YouTubeScraper tool."""
    searchQueries: Optional[List[str]] = Field(
        description="Search terms just like you would enter in YouTube's search bar"
    )
    
    maxResultsShorts: Optional[int] = Field(
        default=0,
        description="Limit the number of Shorts videos to crawl"
    )
    
    maxResultStreams: Optional[int] = Field(
        default=0,
        description="Limit the number of Stream videos to crawl"
    )
    
    startUrls: Optional[List[str]] = Field(
        default=[],
        description="Direct URLs to YouTube videos, channels, playlists, hashtags or search results"
    )
    
    # Additional parameters...

class YouTubeScraperTool(BaseTool):
    name: str = "YouTube Scraper"
    description: str = "Tool for scraping YouTube videos, channels, playlists with configurable parameters"
    args_schema: type[BaseModel] = YouTubeScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    def _run(
        self,
        searchQueries: Optional[List[str]] = None,
        maxResultsShorts: Optional[int] = 0,
        maxResultStreams: Optional[int] = 0,
        startUrls: Optional[List[str]] = None,  # None avoids a shared mutable default
        # Additional parameters...
    ) -> List[dict]:
        # Forward only the parameters that were actually set
        run_inputs = {}

        if searchQueries:
            run_inputs["searchQueries"] = searchQueries
        if maxResultsShorts:
            run_inputs["maxResultsShorts"] = maxResultsShorts
        if maxResultStreams:
            run_inputs["maxResultStreams"] = maxResultStreams
        if startUrls:
            run_inputs["startUrls"] = startUrls
        # Set additional parameters...

        # Run the actor ("youtube-scraper-actor-name" is a placeholder) and return its dataset
        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("youtube-scraper-actor-name", run_inputs)
        return dataset

Parameters

The YouTube Scraper Tool accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| searchQueries | List[str] | Search terms for YouTube’s search bar | Required |
| maxResultsShorts | int | Limit the number of Shorts videos to crawl | 0 |
| maxResultStreams | int | Limit the number of Stream videos to crawl | 0 |
| startUrls | List[str] | Direct URLs to YouTube videos, channels, playlists | [] |
| downloadSubtitles | bool | Download subtitles for videos | False |
| saveSubsToKVS | bool | Save downloaded subtitles to key-value store | False |
| subtitlesLanguage | str | Language for subtitles download | "any" |
| preferAutoGeneratedSubtitles | bool | Prefer auto-generated subtitles | False |
| subtitlesFormat | str | Format for subtitle downloads | "srt" |
| sortingOrder | str | How to sort the results | None |
| dateFilter | str | Filter results by date | None |
| videoType | str | Filter by video type | None |
| lengthFilter | str | Filter by video length | None |
| isHD | bool | Filter for HD videos | None |
| hasSubtitles | bool | Filter for videos with subtitles | None |
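
The `_run` method forwards only the parameters that were set. The sketch below applies that pattern to the full parameter list. `build_run_inputs` is a hypothetical helper, not part of the tool, and it assumes the actor accepts input keys under the same names as the parameters above. Unlike the truthiness checks in `_run`, it drops only `None` and empty values, so an explicit `False` or `0` still reaches the actor.

```python
from typing import Any, Dict


def build_run_inputs(**params: Any) -> Dict[str, Any]:
    """Return only the parameters that carry a value (hypothetical helper)."""
    run_inputs: Dict[str, Any] = {}
    for name, value in params.items():
        # Skip unset values and empty lists/strings, but keep explicit 0 and False.
        if value is None or value == [] or value == "":
            continue
        run_inputs[name] = value
    return run_inputs


# Example: search with subtitle download enabled; unset filters are omitted.
inputs = build_run_inputs(
    searchQueries=["vector databases"],
    startUrls=[],
    downloadSubtitles=True,
    subtitlesLanguage="en",
    subtitlesFormat="srt",
    sortingOrder=None,
    isHD=False,
)
```

Keeping `False` matters for filters such as `isHD`, where "not set" and "explicitly off" may mean different things to the actor.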

Usage

The Researcher Agent initializes the tool with an Apify Actor instance and passes the topic as the search query:

# Initialize the tool
youtube_tool = YouTubeScraperTool(actor=actor)

# Use the tool
youtube_results = youtube_tool._run(
    searchQueries=[topic],
    maxResultsShorts=0,
    maxResultStreams=0,
    sortingOrder="relevance",
    dateFilter="last_month"
)

Return Value

The tool returns a list of YouTube videos. Each video is a dictionary with fields including:

  • title: The title of the video
  • url: The URL of the video
  • description: The description of the video
  • channelName: The name of the channel that uploaded the video
  • channelUrl: The URL of the channel
  • viewCount: The number of views the video has
  • publishedAt: The date the video was published
  • duration: The duration of the video
  • Additional metadata about the video
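
As an illustration of working with this structure, the sketch below ranks results by view count and formats them as short lines for a newsletter draft. `top_videos` is a hypothetical helper, the sample records are made up, and the code assumes `viewCount` arrives as an integer or is missing; the actor's real output may differ.

```python
from typing import Any, Dict, List


def top_videos(videos: List[Dict[str, Any]], limit: int = 3) -> List[str]:
    """Format the most-viewed videos as 'title (channel): url' lines."""
    # Treat a missing or None viewCount as 0 so sorting never fails.
    ranked = sorted(videos, key=lambda v: v.get("viewCount") or 0, reverse=True)
    return [
        f"{v.get('title', 'Untitled')} "
        f"({v.get('channelName', 'unknown channel')}): {v.get('url', '')}"
        for v in ranked[:limit]
    ]


# Sample data shaped like the fields listed above; all values are invented.
sample = [
    {"title": "Intro to RAG", "url": "https://www.youtube.com/watch?v=aaa",
     "channelName": "AI Weekly", "viewCount": 1200},
    {"title": "Agents in practice", "url": "https://www.youtube.com/watch?v=bbb",
     "channelName": "Dev Talks", "viewCount": 5400},
    {"title": "No stats yet", "url": "https://www.youtube.com/watch?v=ccc",
     "channelName": "New Channel", "viewCount": None},
]
lines = top_videos(sample, limit=2)
```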

Apify Integration

The tool uses an Apify YouTube scraper actor, which provides several advantages:

  1. Scalability: The actor can handle large numbers of YouTube searches efficiently
  2. Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping YouTube
  3. Structured Data: The actor returns YouTube videos in a structured format that is easy to process
  4. Advanced Filtering: The actor supports advanced filtering options to narrow down search results

Configuration

To use the YouTube Scraper Tool, set the following environment variable:

APIFY_API_KEY=your_apify_api_key_here
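
A missing key is easier to diagnose if it is checked at startup and not during the first actor run. The sketch below shows one way to do that. `load_apify_api_key` is a hypothetical helper, not part of the project, and how the key is then passed to Apify is left out.

```python
import os
from typing import Mapping, Optional


def load_apify_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return APIFY_API_KEY, or raise a clear error if it is missing or blank."""
    # Default to the process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get("APIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("APIFY_API_KEY is not set")
    return key


# Example with an explicit mapping; call load_apify_api_key() to read os.environ.
key = load_apify_api_key({"APIFY_API_KEY": "your_apify_api_key_here"})
```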

Next Steps