Building a YouTube Video Analysis Tool with OpenAI GPT and OpenAI Whisper

In this post, we'll build a YouTube video analysis tool that summarizes video content and extracts insights from it. The tool uses two OpenAI models: Whisper for audio transcription and GPT-3.5 Turbo for analyzing the resulting text.

Our tool will follow a three-step process:

  1. Download the audio from a YouTube video
  2. Transcribe the audio using OpenAI Whisper
  3. Analyze and summarize the transcription using OpenAI GPT-3.5 Turbo

Step 1: Download Audio from YouTube

To download audio from YouTube, we will use the pytube library. Install it using pip:

pip install pytube

Here's a basic example of how to download the audio from a YouTube video:

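from pytube import YouTube

# Placeholder URL; replace it with a real video link
video_url = "https://www.youtube.com/watch?v=example"
yt = YouTube(video_url)

# Pick the first audio-only stream (usually an .mp4 or .webm file, not .mp3)
audio = yt.streams.filter(only_audio=True).first()

# download() returns the path of the saved file
audio_path = audio.download(output_path="path/to/save/audio")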
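
Before handing a link to pytube, it can help to check that it actually looks like a YouTube video URL. The sketch below is one possible way to do that with the standard library only; the function name extract_video_id and the list of accepted hosts are assumptions made for illustration, not part of pytube.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch or youtu.be URL, or None if absent."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
    # Anything else is treated as "not a video link" in this sketch
    return None
```

A caller could skip the download and report a clear error whenever extract_video_id returns None, instead of waiting for the download step to fail.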

Step 2: Transcribe Audio using OpenAI Whisper

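To transcribe the audio, we will use OpenAI's Whisper ASR (Automatic Speech Recognition) model through the OpenAI API. The snippets in this post use the module-level interface of the openai Python library (openai.Audio, openai.ChatCompletion), which was removed in version 1.0, so install a pre-1.0 release:

pip install "openai<1.0"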

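You need an OpenAI API key. The organization ID is optional and only matters if your account belongs to more than one organization. Set them in your script:

import openai

# Placeholders; avoid committing real credentials to source control
openai.api_key = "your_openai_api_key"
openai.organization = "your_openai_organization"  # optional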
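
Hardcoding credentials is fine for a quick experiment, but reading them from the environment is usually safer. Here is a minimal sketch of that approach; the helper name read_openai_settings is hypothetical, and the variable names OPENAI_API_KEY and OPENAI_ORGANIZATION are assumed here as a convention.

```python
import os
from typing import Mapping


def read_openai_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect API settings from environment-style variables."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of a later API error
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first")
    # The organization is optional, so a missing value becomes None
    organization = env.get("OPENAI_ORGANIZATION", "").strip() or None
    return {"api_key": api_key, "organization": organization}
```

The returned values can then be assigned to openai.api_key and, when present, openai.organization.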

Here's how to transcribe an audio file using Whisper:

with open("path/to/audio/file.mp3", "rb") as audio_file:
    # "whisper-1" is the name of the Whisper model exposed by the API
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

# The response carries the transcribed text under the "text" key
transcription = transcript["text"]
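
The transcription endpoint restricts what it accepts, so a quick local check before uploading can save a failed request. The sketch below assumes a 25 MB upload limit and the extension list documented by OpenAI at the time of writing; both values, along with the helper name check_audio_file, are assumptions to verify against the current API documentation.

```python
import os

# Assumed limits; confirm them against the current API documentation
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def check_audio_file(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > max_bytes:
        problems.append("file larger than the upload limit")
    return problems
```

If the list is not empty, the audio would need to be converted, compressed, or split into shorter segments before transcription.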

Step 3: Analyze and Summarize Transcription using OpenAI GPT

To analyze the transcription, we'll use OpenAI's GPT-3.5 Turbo model. Here's a basic example of how to send a prompt to GPT:

prompt = f"Pretend you are an expert at understanding video transcriptions and extracting meaning from them. I will provide you with a video's transcription and you will provide an outline and/or summary of the transcription including important points. Try your best to extract your own meaning from the transcript.\n{transcription}"

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}]
)

response = completion.choices[0].message.content
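
Long videos can produce transcripts that exceed the model's context window, in which case a single prompt will be rejected. One common workaround is to split the transcript into pieces, summarize each piece, and then summarize the summaries. Below is a rough sketch of the splitting step only; split_transcript is a hypothetical helper, and the 8000-character default is an arbitrary stand-in for a real token count.

```python
def split_transcript(text: str, max_chars: int = 8000) -> list:
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the space that joins this word to the chunk
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            # The chunk is full: store it and start a new one with this word
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can be sent through the same ChatCompletion call shown above, and the partial summaries can be joined into a final prompt.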

Combined, these snippets form a small YouTube video analysis tool: it downloads a video's audio, transcribes it with Whisper, and asks GPT-3.5 Turbo for an outline or summary of the transcription. That can help with content creation, research, or simply getting the gist of a long video.
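
One way to wire the three steps together is to treat each step as a function and pass them into a small driver. The sketch below is an illustration rather than a finished tool: analyze_video and its three callable parameters are hypothetical names, and the real pytube and OpenAI calls from the earlier steps would live inside the functions you pass in.

```python
from typing import Callable


def analyze_video(
    video_url: str,
    download_audio: Callable[[str], str],
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
) -> dict:
    """Run the download, transcribe, and summarize steps in order."""
    # Step 1: fetch the audio and get back the path of the saved file
    audio_path = download_audio(video_url)
    # Step 2: turn the audio file into text
    transcription = transcribe(audio_path)
    # Step 3: ask the language model for a summary of the text
    summary = summarize(transcription)
    return {
        "audio_path": audio_path,
        "transcription": transcription,
        "summary": summary,
    }
```

Keeping the steps as parameters makes it possible to try the flow with stand-in functions before spending API credits, and to swap any step later without touching the others.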