Automating YouTube Shorts with Python and AI

Once again, I found myself a bit bored — and when that happens, I usually end up building something random. After chatting with an AI for a while, I decided what my next mini project would be: automating the creation of short videos.

The initial idea was simple:

Use AI to generate a short, curiosity-driven text

Generate an image related to the topic

Convert the text to speech using tools like gTTS or ElevenLabs

Combine everything into a short video
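
The text-to-speech step from the list above can be sketched with gTTS. This is a minimal sketch, not the project's actual code: the `synthesize_speech` name, the `tts_factory` parameter, and the output path are assumptions made for illustration. By default it relies on `gTTS(text=..., lang=...)` and its `save()` method, which need the gTTS package and network access. The factory parameter lets any object with a compatible `save(path)` method stand in, for example an ElevenLabs wrapper or a test double.

```python
import os


def synthesize_speech(text, out_path, lang="en", tts_factory=None):
    """Write narration for `text` to `out_path` and return the path.

    `tts_factory` is a hypothetical hook: any callable that accepts
    `text=` and `lang=` and returns an object with a `save(path)` method.
    """
    if not text.strip():
        raise ValueError("text must not be empty")

    if tts_factory is None:
        # Imported lazily so this module loads even without gTTS installed.
        from gtts import gTTS
        tts_factory = gTTS

    # Create the output folder if the path has one and it is missing.
    folder = os.path.dirname(out_path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    tts_factory(text=text, lang=lang).save(out_path)
    return out_path
```

With gTTS installed, a call such as `synthesize_speech("Some curious fact.", "content/speech.mp3")` would produce the audio file that the video step consumes.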

🛠️ First Attempt: Static Image + Audio

Here’s the basic code that generates a short video from an image and an audio file:

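import os

from moviepy import ImageClip, AudioFileClip, CompositeVideoClip

def create_video(image_path, audio_path):
    # Load the narration; its length sets the length of the video
    audio = AudioFileClip(audio_path)
    # Show the image for the whole narration, scaled to 1280 px tall
    image = ImageClip(image_path).with_duration(audio.duration).resized(height=1280)
    image = image.with_position("center").with_audio(audio)

    video = CompositeVideoClip([image])
    video_path = "content/short.mp4"
    # Make sure the output folder exists before writing
    os.makedirs(os.path.dirname(video_path), exist_ok=True)
    video.write_videofile(video_path, fps=24)
    return video_path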

The result? It worked — but it was just a static image with background narration.
[Image: result of the video]

AI-generated text:

Cleopatra lived closer in time to the invention of the iPhone than to the building of the Great Pyramid.

Image suggestion:

iPhone

Not bad for a first try — a basic but functional automated Shorts generator!
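
One detail worth checking in the function above: `resized(height=1280)` keeps the image's aspect ratio, so a landscape image gives a landscape frame rather than a vertical one. A possible way to get a 9:16 frame is to scale the image until it covers the target size and then crop the overflow. The helper below only does the arithmetic. The `cover_crop` name and the 720x1280 target are assumptions for illustration, not part of the original project.

```python
def cover_crop(src_w, src_h, target_w=720, target_h=1280):
    """Return (scale, x1, y1, x2, y2) for a centered crop that fills the target.

    The image is first scaled by `scale` so that it covers the target frame;
    the box (x1, y1, x2, y2), in scaled pixels, is the part to keep.
    """
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # Using the larger ratio guarantees both sides reach the target size.
    scale = max(target_w / src_w, target_h / src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)

    # Center the crop window inside the scaled image.
    x1 = (scaled_w - target_w) // 2
    y1 = (scaled_h - target_h) // 2
    return scale, x1, y1, x1 + target_w, y1 + target_h
```

For a 1920x1280 source, `cover_crop(1920, 1280)` returns `(1.0, 600, 0, 1320, 1280)`: no scaling is needed, and the middle 720 pixels of the width are kept. In moviepy 2.x these numbers should map onto `resized(scale)` followed by `cropped(x1=..., y1=..., x2=..., y2=...)`.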

📝 Adding Text to the Video

Next, I wanted to overlay the generated text on top of the video. I ran into a small font issue, which I fixed by explicitly setting a font path:

import os

font_path = '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'
if not os.path.exists(font_path):
    font_path = None  # fallback if not found
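
The same fallback can be extended to try several locations before giving up. This is a minimal sketch: the `find_font` name is hypothetical, and the candidate paths are common defaults that I am assuming here, so adjust them to the fonts actually installed on your machine.

```python
import os

# Assumed common font locations; none is guaranteed to exist on a given system.
CANDIDATE_FONTS = [
    "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",  # Debian/Ubuntu
    "/System/Library/Fonts/Supplemental/Arial.ttf",     # macOS
    "C:/Windows/Fonts/arial.ttf",                       # Windows
]


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Calling `font_path = find_font()` keeps the behaviour of the snippet above: a usable path when one is found, and `None` as the fallback otherwise.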

With that, the video creation function evolved:

import os

from moviepy import ImageClip, AudioFileClip, CompositeVideoClip, TextClip

def create_video(image_path, audio_path, text):
    # Load the narration; its length sets the length of the video
    audio = AudioFileClip(audio_path)
    image = ImageClip(image_path).with_duration(audio.duration).resized(height=1280)
    image = image.with_position("center").with_audio(audio)

    # Use an explicit font file, falling back to the default if it is missing
    font_path = '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'
    if not os.path.exists(font_path):
        font_path = None

    # Overlay the full text at the top for the whole video
    txt_clip = TextClip(
        text=text,
        font=font_path,
        font_size=48,
        color='white'
    ).with_position('top').with_duration(audio.duration)

    video = CompositeVideoClip([image, txt_clip])
    video_path = "content/short.mp4"
    video.write_videofile(video_path, fps=24)
    return video_path

Now, we had videos with text overlays!
[Image: example of video with text on it]

Still not perfect — the text stayed static throughout the video — but progress nonetheless.

🎬 Making Text Dynamic (Like Subtitles)

I wanted the text to appear gradually, in sync with the narration. I decided to break the text into sentences and display each one sequentially. Here’s how I handled that:

import re

# Runs inside create_video, where text, audio, image and font_path are defined.
# Split the text into sentences at ., ! or ? followed by whitespace
sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
# Give every sentence an equal share of the narration
duration_per_sentence = audio.duration / max(len(sentences), 1)

subtitle_clips = []
for i, sentence in enumerate(sentences):
    start = i * duration_per_sentence
    subtitle = TextClip(
        text=sentence,
        font=font_path,
        font_size=20,
        color='black'
    ).with_position('center').with_start(start).with_duration(duration_per_sentence)
    subtitle_clips.append(subtitle)

# Layer the subtitles over the background image
video = CompositeVideoClip([image] + subtitle_clips)

The result? A much more engaging video, with subtitles that change roughly in step with the narration.

[Image: example of video with correct subtitles]
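
Splitting the audio evenly assumes every sentence takes the same time to read, so the subtitles drift when sentence lengths differ. A possible refinement is to weight each sentence by its word count. This is a sketch under that assumption (speaking time roughly proportional to word count); `subtitle_timings` is a hypothetical helper, not part of the project.

```python
import re


def subtitle_timings(text, total_duration):
    """Return a list of (sentence, start, duration) tuples.

    Each sentence gets a share of `total_duration` proportional to its
    word count, as a rough stand-in for how long it takes to speak.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []

    weights = [len(s.split()) for s in sentences]
    total_words = sum(weights)

    timings = []
    start = 0.0
    for sentence, weight in zip(sentences, weights):
        duration = total_duration * weight / total_words
        timings.append((sentence, start, duration))
        start += duration
    return timings
```

Each tuple can then feed `.with_start(start).with_duration(duration)` in the subtitle loop, in place of the fixed `duration_per_sentence`.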

🧠 What I’ve Learned So Far

This mini project isn’t finished — but here’s what I’ve picked up along the way:

🎥 How to create videos in Python using moviepy

🗣️ How to convert text to speech with gTTS and ElevenLabs

🕒 How to sync subtitles with narration

🤖 How to integrate simple AI-generated content

🖼️ How to add multiple images in a slideshow format (WIP)

There’s still plenty of room to improve — syncing voice and subtitles more precisely, adding transitions, animations, or even background music — but this foundation already opens up a lot of possibilities.

If you’re curious, the project is on GitHub:
👉 shortomated GitHub repo

💭 Final Thoughts

Automation is becoming increasingly accessible, especially with the help of AI. While this project isn’t fully AI-powered, it shows how combining tools like gTTS, the Unsplash API, and moviepy can produce a complete narrated video with relatively little code.

Hope you found this article useful or at least a little inspiring. Stay curious — and keep building!
