YouTube Scraper Tool

The YouTube Scraper Tool is a custom tool that allows the Newsletter AI Agent to find relevant video content on YouTube using an Apify actor. It provides a way to gather video-based information and insights related to the specified topic.

Overview

The YouTube Scraper Tool is primarily used by the Researcher Agent to gather video content about the specified topic. It provides a flexible interface for searching YouTube and extracting structured data from videos, channels, and playlists.

Implementation

The YouTube Scraper Tool is implemented as a CrewAI BaseTool that interacts with an Apify YouTube scraper actor. Here’s the implementation:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field, ConfigDict
from typing import List, Optional
from apify import Actor
from src.tools.base import RunApifyActor

class YouTubeScraperInput(BaseModel):
    """Input schema for YouTubeScraper tool."""
    searchQueries: Optional[List[str]] = Field(
        description="Search terms just like you would enter in YouTube's search bar"
    )
    
    maxResultsShorts: Optional[int] = Field(
        default=0,
        description="Limit the number of Shorts videos to crawl"
    )
    
    maxResultStreams: Optional[int] = Field(
        default=0,
        description="Limit the number of Stream videos to crawl"
    )
    
    startUrls: Optional[List[str]] = Field(
        default=[],
        description="Direct URLs to YouTube videos, channels, playlists, hashtags or search results"
    )
    
    # Additional parameters...

class YouTubeScraperTool(BaseTool):
    name: str = "YouTube Scraper"
    description: str = "Tool for scraping YouTube videos, channels, playlists with configurable parameters"
    args_schema: type[BaseModel] = YouTubeScraperInput
    actor: Actor = Field(description="Apify Actor instance")
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    def _run(
        self,
        searchQueries: Optional[List[str]] = None,
        maxResultsShorts: Optional[int] = 0,
        maxResultStreams: Optional[int] = 0,
        startUrls: Optional[List[str]] = None,  # None instead of a shared mutable default list
        # Additional parameters...
    ) -> List[dict]:
        run_inputs = {}
        
        if searchQueries:
            run_inputs["searchQueries"] = searchQueries
        if maxResultsShorts:
            run_inputs["maxResultsShorts"] = maxResultsShorts
        if maxResultStreams:
            run_inputs["maxResultStreams"] = maxResultStreams
        if startUrls:
            run_inputs["startUrls"] = startUrls
        # Set additional parameters...
        
        run_actor = RunApifyActor(self.actor)
        dataset = run_actor._run("youtube-scraper-actor-name", run_inputs)
        return dataset
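
The truthiness checks in `_run` work for lists and counts, but the same pattern would silently drop an explicit `False` for boolean options such as `isHD` or `hasSubtitles`. The sketch below shows one possible alternative that treats only `None` and empty lists as "not set". `build_run_inputs` is a hypothetical helper, not part of the source, and it assumes the elided parameters are forwarded to the actor under their own names. Unlike the original checks, it also forwards an explicit `0`.

```python
from typing import Any, Dict


def build_run_inputs(**params: Any) -> Dict[str, Any]:
    """Keep only the parameters the caller actually set.

    Hypothetical helper: None and empty lists/tuples count as "not set",
    while explicit False and 0 values are kept.
    """
    run_inputs: Dict[str, Any] = {}
    for name, value in params.items():
        if value is None:
            continue  # parameter was not provided
        if isinstance(value, (list, tuple)) and not value:
            continue  # empty collections carry no information
        run_inputs[name] = value
    return run_inputs


if __name__ == "__main__":
    print(
        build_run_inputs(
            searchQueries=["ai agents"],
            startUrls=[],
            isHD=False,
            dateFilter=None,
        )
    )
```

Inside `_run`, the resulting dictionary could be passed to `RunApifyActor` in place of the hand-built `run_inputs`.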

Parameters

The YouTube Scraper Tool accepts the following parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `searchQueries` | List[str] | Search terms for YouTube's search bar | Required |
| `maxResultsShorts` | int | Limit the number of Shorts videos to crawl | 0 |
| `maxResultStreams` | int | Limit the number of Stream videos to crawl | 0 |
| `startUrls` | List[str] | Direct URLs to YouTube videos, channels, playlists | [] |
| `downloadSubtitles` | bool | Download subtitles for videos | False |
| `saveSubsToKVS` | bool | Save downloaded subtitles to key-value store | False |
| `subtitlesLanguage` | str | Language for subtitles download | "any" |
| `preferAutoGeneratedSubtitles` | bool | Prefer auto-generated subtitles | False |
| `subtitlesFormat` | str | Format for subtitle downloads | "srt" |
| `sortingOrder` | str | How to sort the results | None |
| `dateFilter` | str | Filter results by date | None |
| `videoType` | str | Filter by video type | None |
| `lengthFilter` | str | Filter by video length | None |
| `isHD` | bool | Filter for HD videos | None |
| `hasSubtitles` | bool | Filter for videos with subtitles | None |

Usage

The YouTube Scraper Tool is used by the Researcher Agent to gather video content about the specified topic:

# Initialize the tool
youtube_tool = YouTubeScraperTool(actor=actor)

# Use the tool
youtube_results = youtube_tool._run(
    searchQueries=[topic],
    maxResultsShorts=0,
    maxResultStreams=0,
    sortingOrder="relevance",
    dateFilter="last_month"
)

Return Value

The tool returns a list of YouTube videos, where each video is a dictionary containing information about the video, including:

title: The title of the video
url: The URL of the video
description: The description of the video
channelName: The name of the channel that uploaded the video
channelUrl: The URL of the channel
viewCount: The number of views the video has
publishedAt: The date the video was published
duration: The duration of the video
Additional metadata about the video
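
As an illustration of how the returned records might be consumed, the sketch below ranks videos by `viewCount` and formats one summary line per video. `top_videos` is a hypothetical helper, the sample records are made up, and the code assumes `viewCount` is numeric when present; the actual field types depend on the actor's output.

```python
from typing import Any, Dict, List


def top_videos(videos: List[Dict[str, Any]], limit: int = 3) -> List[str]:
    """Return one summary line per video, most viewed first."""
    # Records without a view count are ranked as if they had zero views.
    ranked = sorted(videos, key=lambda v: v.get("viewCount") or 0, reverse=True)
    return [
        f"{v.get('title', 'Untitled')} "
        f"({v.get('channelName', 'unknown channel')}): {v.get('url', '')}"
        for v in ranked[:limit]
    ]


if __name__ == "__main__":
    # Made-up records that mimic the documented fields.
    sample = [
        {"title": "Intro", "url": "https://example.com/a",
         "channelName": "Channel One", "viewCount": 10},
        {"title": "Deep dive", "url": "https://example.com/b",
         "channelName": "Channel Two", "viewCount": 50},
    ]
    for line in top_videos(sample, limit=2):
        print(line)
```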

Apify Integration

The tool uses an Apify YouTube scraper actor, which provides several advantages:

Scalability: The actor can handle large numbers of YouTube searches efficiently
Reliability: The actor is designed to handle rate limiting and other issues that can arise when scraping YouTube
Structured Data: The actor returns YouTube videos in a structured format that is easy to process
Advanced Filtering: The actor supports advanced filtering options to narrow down search results
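
The source does not show how `RunApifyActor` talks to Apify. As a rough sketch of the general run-then-read-dataset flow, the function below assumes a client object that behaves like `ApifyClient` from the `apify-client` package, where `actor(...).call(run_input=...)` returns run metadata and `dataset(...).iterate_items()` yields the scraped records. `fetch_dataset_items` is a hypothetical name, and the project's own wrapper may work differently.

```python
from typing import Any, Dict, List


def fetch_dataset_items(
    client: Any, actor_id: str, run_input: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Run an actor and collect the items from its default dataset.

    `client` is expected to behave like apify_client.ApifyClient
    (an assumption; the source uses its own RunApifyActor wrapper).
    """
    # Start the actor and wait for the run to finish.
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        return []  # no run metadata came back, so there is nothing to read
    # Each finished run exposes its results through a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the real package this would be called as `fetch_dataset_items(ApifyClient(token), actor_id, run_inputs)`, where `actor_id` is whichever YouTube scraper actor the project uses.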

Configuration

To use the YouTube Scraper Tool, set the following environment variable:

APIFY_API_KEY=your_apify_api_key_here
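
A missing key is easier to diagnose if it is caught at startup rather than on the first actor run. The sketch below reads `APIFY_API_KEY` from the environment and fails early with a clear message. `get_apify_api_key` is a hypothetical helper; the source does not show how the project loads the key.

```python
import os


def get_apify_api_key() -> str:
    """Read APIFY_API_KEY from the environment, failing early if it is unset."""
    api_key = os.environ.get("APIFY_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("APIFY_API_KEY is not set in the environment.")
    return api_key
```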

Next Steps

Learn about the Google News Scraper Tool
Explore the Researcher Agent that uses this tool
See how this tool contributes to the newsletter generation process
