Once again, I found myself a bit bored — and when that happens, I usually end up building something random. After chatting with an AI for a while, I decided what my next mini project would be: automating the creation of short videos.
The initial idea was simple:
Use AI to generate a short, curiosity-driven text
Generate an image related to the topic
Convert the text to speech using tools like gTTS or ElevenLabs
Combine everything into a short video
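The four steps above can be sketched as one small pipeline function. This is only an illustration, not the project's actual structure: `build_short`, `ShortAssets`, and the four callables are hypothetical names, and the stand-in lambdas return fixed paths instead of calling a real text model, image API, TTS tool, or moviepy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ShortAssets:
    """Everything produced along the way (hypothetical container)."""
    text: str
    image_path: str
    audio_path: str
    video_path: str


def build_short(
    topic: str,
    generate_text: Callable[[str], str],
    fetch_image: Callable[[str], str],
    synthesize_speech: Callable[[str], str],
    render_video: Callable[[str, str], str],
) -> ShortAssets:
    """Run the four steps in order; each step is supplied by the caller."""
    text = generate_text(topic)            # 1. short, curiosity-driven text
    image_path = fetch_image(topic)        # 2. image related to the topic
    audio_path = synthesize_speech(text)   # 3. narration (gTTS, ElevenLabs, ...)
    video_path = render_video(image_path, audio_path)  # 4. combine into a video
    return ShortAssets(text, image_path, audio_path, video_path)


# Usage with stand-in functions; real ones would do the actual work.
assets = build_short(
    "iPhone",
    generate_text=lambda topic: f"A curious fact about the {topic}.",
    fetch_image=lambda topic: f"content/{topic}.jpg",
    synthesize_speech=lambda text: "content/narration.mp3",
    render_video=lambda image_path, audio_path: "content/short.mp4",
)
print(assets.video_path)
```

Passing the steps in as functions keeps each tool swappable, for example trying gTTS first and ElevenLabs later without touching the rest of the pipeline.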
🛠️ First Attempt: Static Image + Audio
Here’s the basic code that generates a short video from an image and an audio file:
from moviepy import ImageClip, AudioFileClip, CompositeVideoClip


def create_video(image_path, audio_path):
    # The narration decides how long the video runs
    audio = AudioFileClip(audio_path)
    # Show the image for the whole narration, scaled to 1280 px tall
    image = ImageClip(image_path).with_duration(audio.duration).resized(height=1280)
    image = image.with_position("center").with_audio(audio)
    video = CompositeVideoClip([image])
    # Note: the "content" directory must already exist
    video_path = "content/short.mp4"
    video.write_videofile(video_path, fps=24)
    return video_path
The result? It worked — but it was just a static image with background narration.
AI-generated text:
Cleopatra lived closer in time to the invention of the iPhone than to the building of the Great Pyramid.
Image suggestion:
iPhone
Not bad for a first try — a basic but functional automated Shorts generator!
📝 Adding Text to the Video
Next, I wanted to overlay the generated text on top of the video. I ran into a small font issue, which I fixed by explicitly setting a font path:
import os

font_path = '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'
if not os.path.exists(font_path):
    font_path = None  # fallback if not found
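The check above only knows about one Linux path. As a hedged variation, the same idea can walk a list of candidates and return the first one that exists. `find_font` and `CANDIDATE_FONTS` are hypothetical names, and the macOS and Windows paths are assumptions about typical installs rather than anything the project uses.

```python
import os
from typing import Iterable, Optional

# Candidate locations are assumptions; adjust them for your own system.
CANDIDATE_FONTS = (
    "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",  # path used in this article
    "/System/Library/Fonts/Helvetica.ttc",              # typical macOS location
    "C:/Windows/Fonts/arial.ttf",                       # typical Windows location
)


def find_font(candidates: Iterable[str] = CANDIDATE_FONTS) -> Optional[str]:
    """Return the first candidate that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


font_path = find_font()
print(font_path)  # a path on this machine, or None if no candidate exists
```

Returning `None` keeps the original fallback behavior, so the result can be passed along the same way as `font_path` in the snippet above.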
With that, the video creation function evolved:
import os

from moviepy import ImageClip, AudioFileClip, CompositeVideoClip, TextClip


def create_video(image_path, audio_path, text):
    audio = AudioFileClip(audio_path)
    image = ImageClip(image_path).with_duration(audio.duration).resized(height=1280)
    image = image.with_position("center").with_audio(audio)

    # Use an explicit font file when available, otherwise fall back to None
    font_path = '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf'
    if not os.path.exists(font_path):
        font_path = None

    # Overlay the full text at the top for the whole duration of the audio
    txt_clip = TextClip(
        text=text,
        font=font_path,
        font_size=48,
        color='white'
    ).with_position('top').with_duration(audio.duration)

    # Layers are drawn in order, so the text sits on top of the image
    video = CompositeVideoClip([image, txt_clip])
    video_path = "content/short.mp4"
    video.write_videofile(video_path, fps=24)
    return video_path
Now, we had videos with text overlays!
Still not perfect — the text stayed static throughout the video — but progress nonetheless.
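One related wrinkle, which is my assumption rather than something the project reports: a long sentence rendered on a single line at a large font size can end up wider than the frame. A minimal sketch of one workaround is to insert line breaks with the standard library before handing the string to `TextClip`. `wrap_caption` and the 28-character limit are hypothetical and would need tuning for the font and frame width.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 28) -> str:
    """Break text into lines of at most max_chars characters, on word boundaries."""
    return textwrap.fill(text, width=max_chars)


fact = (
    "Cleopatra lived closer in time to the invention of the iPhone "
    "than to the building of the Great Pyramid."
)
wrapped = wrap_caption(fact)
print(wrapped)
```

The wrapped string could then be passed as `text=wrapped` in place of the raw text. Counting characters is only a rough stand-in for measuring the rendered width in pixels.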
🎬 Making Text Dynamic (Like Subtitles)
I wanted the text to appear gradually, in sync with the narration. I decided to break the text into sentences and display each one sequentially. Here’s how I handled that:
import re

# This fragment assumes text, audio, font_path, and image are defined
# as in create_video above.

# Split text into sentences
sentences = re.split(r'(?<=[.!?]) +', text)
n = len(sentences)
# Give every sentence an equal share of the narration
duration_per_sentence = audio.duration / n if n > 0 else audio.duration

subtitle_clips = []
for i, sentence in enumerate(sentences):
    start = i * duration_per_sentence
    subtitle = TextClip(
        text=sentence,
        font=font_path,
        font_size=20,
        color='black'
    ).with_position('center').with_start(start).with_duration(duration_per_sentence)
    subtitle_clips.append(subtitle)

# One subtitle clip per sentence, layered over the image
video = CompositeVideoClip([image] + subtitle_clips)
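The result? A much more engaging video, with subtitles that advance roughly in step with the narration. Each sentence gets an equal share of the audio duration, so the timing is approximate rather than exact.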
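The loop above gives every sentence the same amount of time, so a long sentence stays on screen no longer than a short one. One possible refinement, not taken from the project, is to weight each sentence by its character count as a rough proxy for how long it takes to speak. `split_sentences` and `subtitle_timings` are hypothetical helpers, and the sketch only computes timings without touching moviepy.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?', dropping empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def subtitle_timings(text: str, total_duration: float) -> List[Tuple[str, float, float]]:
    """Return (sentence, start, duration) tuples, weighted by sentence length."""
    sentences = split_sentences(text)
    total_chars = sum(len(sentence) for sentence in sentences)
    if total_chars == 0:
        return []
    timings = []
    start = 0.0
    for sentence in sentences:
        # Longer sentences get a proportionally larger share of the audio
        duration = total_duration * len(sentence) / total_chars
        timings.append((sentence, start, duration))
        start += duration
    return timings


for sentence, start, duration in subtitle_timings("Hi. This is longer!", 18.0):
    print(f"{start:5.2f}s +{duration:5.2f}s  {sentence}")
```

Each tuple maps directly onto the `with_start(start)` and `with_duration(duration)` calls from the subtitle loop. Character counts are still a heuristic; exact sync would need word or sentence timestamps from the TTS engine or a speech alignment step.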
🧠 What I’ve Learned So Far
This mini project isn’t finished — but here’s what I’ve picked up along the way:
🎥 How to create videos in Python using moviepy
🗣️ How to convert text to speech with gTTS and ElevenLabs
🕒 How to sync subtitles with narration
🤖 How to integrate simple AI-generated content
🖼️ How to add multiple images in a slideshow format (WIP)
There’s still plenty of room to improve — syncing voice and subtitles more precisely, adding transitions, animations, or even background music — but this foundation already opens up a lot of possibilities.
If you’re curious, the project is on GitHub:
👉 shortomated GitHub repo
💭 Final Thoughts
Automation is becoming increasingly accessible, especially with the help of AI. While this project isn’t fully AI-powered, it shows how combining tools like gTTS, the Unsplash API, and moviepy can produce a working Shorts generator with relatively little code.
Hope you found this article useful or at least a little inspiring. Stay curious — and keep building!