How to convert PDFs into Audiobooks using OpenAI’s Text-to-Speech API

By akohad, Nov 29, 2023


Given the character limit of 4096 for the OpenAI text-to-speech API, let’s create a function called split_text designed to divide the cleaned text into smaller chunks. Each chunk adheres to the maximum character limit, ensuring compatibility with the API. The process is as follows:

  1. The function splits the text into sentences.
  2. It then iteratively adds sentences to a chunk until adding another sentence would exceed the character limit.
  3. When the next sentence would not fit, the current chunk is saved, and a new chunk starts with that sentence.
  4. This process continues until all sentences are allocated to chunks.

def split_text(text, max_chunk_size=4096):
    chunks = []  # List to hold the chunks of text
    current_chunk = ""  # String to build the current chunk

    # Split the text into sentences and iterate through them
    # Note: a single sentence longer than max_chunk_size still yields an oversized chunk
    for sentence in text.split('.'):
        sentence = sentence.strip()  # Remove leading/trailing whitespace
        if not sentence:
            continue  # Skip empty sentences
        sentence += "."  # Restore the period removed by split

        if not current_chunk:
            current_chunk = sentence  # First sentence of a new chunk
        # Check if the sentence plus a separating space still fits in the chunk
        elif len(current_chunk) + 1 + len(sentence) <= max_chunk_size:
            current_chunk += " " + sentence  # Add sentence to current chunk
        else:
            chunks.append(current_chunk)  # Add the current chunk to the list
            current_chunk = sentence  # Start a new chunk

    # Add the last chunk if it's not empty
    if current_chunk:
        chunks.append(current_chunk)

    return chunks

# Function Usage
chunks = split_text(plain_text)

# Printing each chunk with its number
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:\n{chunk}\n---\n")
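
Splitting on '.' alone also breaks at decimal points such as 3.14 and ignores sentences that end in ? or !. One possible refinement, sketched here, splits only where sentence-ending punctuation is followed by whitespace. The name split_sentences is hypothetical and not part of the original code, and abbreviations such as "Dr. Smith" would still be split, so treat it as a starting point rather than a finished sentence tokenizer.

```python
import re

def split_sentences(text):
    """Hypothetical helper: split after '.', '!' or '?' when whitespace follows."""
    # The lookbehind keeps the punctuation attached to its sentence
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [part for part in parts if part]

sentences = split_sentences("Pi is about 3.14. Is that exact? No!")
print(sentences)
# ['Pi is about 3.14.', 'Is that exact?', 'No!']
```

The resulting list could be fed into the same chunk-building loop in place of text.split('.').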

Next, let’s create a text_to_speech function, which uses OpenAI’s text-to-speech API to convert text into audio. The function performs the following steps:

  1. Initializes an OpenAI client to interact with the API.
  2. Sends a request to the Audio API with the specified text, model, and voice parameters. The model parameter defines the quality of the text-to-speech conversion, while the voice parameter selects the voice type.
  3. Receives the audio response from the API and streams it to a specified output file.

⚠️ Please note that I have set my OpenAI API key in the OPENAI_API_KEY environment variable, which the client reads automatically. Otherwise, you will need to pass the API key to the client yourself.

# Importing necessary modules
from pathlib import Path
import openai

def text_to_speech(input_text, output_file, model="tts-1-hd", voice="nova"):
    # Initialize the OpenAI client
    client = openai.OpenAI()

    # Make a request to OpenAI's Audio API with the given text, model, and voice
    response = client.audio.speech.create(
        model=model,  # Model for text-to-speech quality
        voice=voice,  # Voice type
        input=input_text  # The text to be converted into speech
    )

    # Define the path for the output audio file
    speech_file_path = Path(output_file)

    # Stream the audio response to the specified file
    response.stream_to_file(speech_file_path)

    # Print confirmation message after saving the audio file
    print(f"Audio saved to {speech_file_path}")
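
If the key is not picked up from the environment automatically, it can be passed to the client explicitly. The sketch below is one way to fail early with a readable message when the variable is missing. The helper name get_openai_api_key is an assumption made for illustration, not part of the OpenAI library.

```python
import os

def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Hypothetical helper: return the API key from the environment or fail early."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return api_key

# Possible usage with the client shown above:
# client = openai.OpenAI(api_key=get_openai_api_key())
```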

Converting Text Chunks to Audio Files

Let’s define the convert_chunks_to_audio function, which processes each text chunk through the text_to_speech function and saves the resulting audio files. The steps are as follows:

  1. Iterate over the chunks of text.
  2. For each chunk, create a filename for the output audio file, ensuring it is saved in the specified output folder.
  3. Convert each text chunk to an audio file using the text_to_speech function defined earlier.
  4. Store the path of each generated audio file in a list.

# Importing necessary modules
import os

def convert_chunks_to_audio(chunks, output_folder):
    audio_files = []  # List to store the paths of generated audio files

    # Iterate over each chunk of text
    for i, chunk in enumerate(chunks):
        # Define the path for the output audio file
        output_file = os.path.join(output_folder, f"chunk_{i+1}.mp3")

        # Convert the text chunk to speech and save as an audio file
        text_to_speech(chunk, output_file)

        # Append the path of the created audio file to the list
        audio_files.append(output_file)

    return audio_files  # Return the list of audio file paths

# Function Usage
output_folder = "chunks"  # Define the folder to save audio chunks
audio_files = convert_chunks_to_audio(chunks, output_folder)  # Convert chunks to audio files
print(audio_files)  # Print list of all the audio files generated

Note: Make sure the output folder exists before running the code. In our example it is called chunks.
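
Instead of creating the folder by hand, it can be created from Python before the conversion runs. A minimal sketch, assuming a hypothetical helper named ensure_folder that is not part of the original code:

```python
import os

def ensure_folder(folder_path):
    """Hypothetical helper: create the folder if it is missing and return its path."""
    # exist_ok=True makes the call safe to repeat when the folder already exists
    os.makedirs(folder_path, exist_ok=True)
    return folder_path

output_folder = ensure_folder("chunks")
```

The returned path can then be passed to convert_chunks_to_audio as output_folder.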

The combine_audio_with_moviepy function combines multiple audio clips into a single audio file using the moviepy library. The function follows these steps:

  1. Iterate through the files in the specified folder, filtering for .mp3 files.
  2. For each audio file, create an AudioFileClip object and add it to a list.
  3. Once all audio clips are collected, use concatenate_audioclips to merge them into a single continuous audio clip.
  4. Write the combined clip to an output file.

# Importing necessary modules from moviepy
# (moviepy 1.x import path; moviepy.editor was removed in moviepy 2.0)
from moviepy.editor import concatenate_audioclips, AudioFileClip
import os
import re

def chunk_index(file_name):
    # Extract the chunk number so chunk_10.mp3 sorts after chunk_2.mp3
    match = re.search(r'\d+', file_name)
    return int(match.group()) if match else 0

def combine_audio_with_moviepy(folder_path, output_file):
    audio_clips = []  # List to store the audio clips

    # Iterate through the files in the given folder in numeric chunk order
    for file_name in sorted(os.listdir(folder_path), key=chunk_index):
        if file_name.endswith('.mp3'):
            # Construct the full path of the audio file
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")

            try:
                # Create an AudioFileClip object for each audio file
                clip = AudioFileClip(file_path)
                audio_clips.append(clip)  # Add the clip to the list
            except Exception as e:
                # Print any errors encountered while processing the file
                print(f"Error processing file {file_path}: {e}")

    # Check if there are any audio clips to combine
    if audio_clips:
        # Concatenate all the audio clips into a single clip
        final_clip = concatenate_audioclips(audio_clips)
        # Write the combined clip to the specified output file
        final_clip.write_audiofile(output_file)
        print(f"Combined audio saved to {output_file}")
    else:
        print("No audio clips to combine.")

# Function Usage
combine_audio_with_moviepy('chunks', 'combined_audio.mp3')  # Combine audio files in 'chunks' folder

I created an image in Canva which I will render as a video while audio plays in the background.

The create_mp4_with_image_and_audio function combines an image and an audio file to create an MP4 video. This can be particularly useful for presentations or other scenarios where an audio track needs to be accompanied by a static image, like a YouTube video. The function performs the following steps:

  1. Load the audio file as an AudioFileClip.
  2. Create a video clip from the specified image using ImageClip, setting its duration to match the length of the audio.
  3. Set the frames per second (fps) for the video clip.
  4. Assign the audio clip as the audio track of the video clip.
  5. Write the final video clip to an output file, specifying the video and audio codecs.

# (moviepy 1.x import path; moviepy.editor was removed in moviepy 2.0)
from moviepy.editor import AudioFileClip, ImageClip

def create_mp4_with_image_and_audio(image_file, audio_file, output_file, fps=24):
    # Load the audio file
    audio_clip = AudioFileClip(audio_file)

    # Create a video clip from an image
    video_clip = ImageClip(image_file, duration=audio_clip.duration)

    # Set the fps for the video clip
    video_clip = video_clip.set_fps(fps)

    # Set the audio of the video clip as the audio clip
    video_clip = video_clip.set_audio(audio_clip)

    # Write the result to a file
    video_clip.write_videofile(output_file, codec='libx264', audio_codec='aac')

# Example usage
image_file = 'cover_image.png'  # Replace with the path to your image
audio_file = 'combined_audio.mp3'  # The combined audio file
output_file = 'output_video.mp4'  # Output MP4 file
create_mp4_with_image_and_audio(image_file, audio_file, output_file)

And that’s it. Once this code finishes running, we have an audiobook. Here is the output from the example code above.
