Automated Short-Form Video Production: The Complete Technical Pipeline from HTML Templates to FFmpeg
The Raw Efficiency of Short-Form Video
A 15-second video can convey far more than a 200-word block of text. On Instagram Reels, YouTube Shorts, and TikTok, short-form video tends to earn some of the highest reach and engagement of any content format.
The problem: producing a short-form video is incredibly time-consuming.
Manually editing a 15-second video in Premiere Pro or CapCut, from concept to completion, typically takes 30-60 minutes. What if you need to post 3-5 per day?
Our solution: automate the production with code.
System Architecture Overview
Ultra Lab's automated short-form video production system has three stages:
HTML Animation Templates -> Playwright Capture -> FFmpeg Compositing
Stage 1: HTML Animation Templates
Use web technologies (HTML + CSS + JavaScript) to create every frame of the video animation.
Why HTML instead of After Effects?
- Programmable: Text, numbers, and colors can all be controlled with variables
- Templatized: One template can be used with hundreds of different content variations
- No Adobe license needed: Open-source tech, zero cost
- Version controlled: Templates are code -- manageable with Git
A typical template structure:
<div class="video-container" style="width:1080px; height:1920px;">
  <div class="background-animation">...</div>
  <div class="text-layer">
    <h1 class="hook-text">Did you know?</h1>
    <p class="main-content">90% of people don't know this...</p>
  </div>
  <div class="cta-layer">
    <span>Follow @ultralab.tw</span>
  </div>
</div>
CSS animations handle all entrance, emphasis, and transition effects:
.hook-text {
  animation: slideUp 0.6s ease-out 0.5s both;
}
.main-content {
  animation: fadeIn 0.8s ease-out 1.5s both;
}
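To make the "programmable" and "templatized" points concrete, here is a minimal Python sketch of filling one template with different copy. The `$hook`, `$body`, and `$handle` placeholders and the `render_template` helper are assumptions made for illustration; the production templates may inject content differently.

```python
from html import escape
from string import Template

# Hypothetical template: the $hook, $body, and $handle placeholders are
# assumptions for this sketch, not names from the production templates.
TEMPLATE = Template("""\
<div class="video-container" style="width:1080px; height:1920px;">
  <div class="text-layer">
    <h1 class="hook-text">$hook</h1>
    <p class="main-content">$body</p>
  </div>
  <div class="cta-layer"><span>Follow $handle</span></div>
</div>
""")


def render_template(hook: str, body: str, handle: str) -> str:
    """Fill the placeholders, HTML-escaping each value first."""
    return TEMPLATE.substitute(
        hook=escape(hook), body=escape(body), handle=escape(handle)
    )


if __name__ == "__main__":
    print(render_template("Did you know?", "3 steps to automate your IG posting", "@ultralab.tw"))
```

Escaping each value keeps generated copy that contains `<` or `&` from breaking the markup.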
Stage 2: Playwright Capture
Playwright is a browser automation library that can drive browsers in headless mode. We use it to:
- Open the HTML template page
- Wait for the page, fonts, and images to finish loading so the animations start from a known state
- Capture screenshots frame by frame (30 FPS = 30 images per second)
- Output as an image sequence
Why Playwright over Puppeteer?
- Supports more browser engines (Chromium, Firefox, and WebKit)
- More consistent CSS animation rendering in our experience
- Built-in waiting mechanisms, which makes dropped frames less likely
Each frame is a 1080x1920 PNG image. A 15-second video produces approximately 450 images.
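The frame math and the capture loop can be sketched in Python as follows. This is one possible approach, not necessarily the one used in production: it assumes the animations are CSS or Web Animations, and it steps them with `document.getAnimations()` so each screenshot lands on an exact timestamp instead of depending on wall-clock timing. The `capture_frames` helper and the output file naming are assumptions chosen to match the `frame_%04d.png` pattern.

```python
from pathlib import Path

FPS = 30
WIDTH, HEIGHT = 1080, 1920


def frame_count(duration_s: float, fps: int = FPS) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(duration_s * fps)


def frame_time_ms(index: int, fps: int = FPS) -> float:
    """Animation-timeline position of frame `index`, in milliseconds."""
    return index * 1000 / fps


# Pauses every animation on the page and moves it to an exact point in time.
SEEK_JS = """(t) => {
  for (const a of document.getAnimations()) { a.pause(); a.currentTime = t; }
}"""


def capture_frames(template_url: str, out_dir: Path, duration_s: float) -> int:
    """Write frame_0000.png, frame_0001.png, ... and return the frame count."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_dir.mkdir(parents=True, exist_ok=True)
    total = frame_count(duration_s)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": WIDTH, "height": HEIGHT})
        page.goto(template_url)
        for i in range(total):
            page.evaluate(SEEK_JS, frame_time_ms(i))
            page.screenshot(path=str(out_dir / f"frame_{i:04d}.png"))
        browser.close()
    return total
```

For a 15-second clip, `frame_count(15)` gives 450, matching the figure above. Animations driven by JavaScript timers would not be covered by this seeking approach and would need a different strategy.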
Stage 3: FFmpeg Compositing
FFmpeg is the Swiss Army knife of audio/video processing. We use it to composite the image sequence into the final video:
ffmpeg -framerate 30 -i frame_%04d.png \
  -i background_music.mp3 \
  -c:v libx264 -pix_fmt yuv420p \
  -shortest output.mp4
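For batch use, the same command can be assembled and run from Python. The sketch below is an assumption about how that wrapper might look: `build_ffmpeg_cmd` and `composite` are hypothetical helper names, and the `-y` (overwrite output) and `-c:a aac` (explicit audio codec) flags are additions that the command above does not include.

```python
import subprocess


def build_ffmpeg_cmd(frames_pattern: str, music_path: str, output_path: str,
                     fps: int = 30) -> list[str]:
    """Return the FFmpeg argument list for one video."""
    return [
        "ffmpeg", "-y",                       # -y: overwrite existing output (added here)
        "-framerate", str(fps), "-i", frames_pattern,
        "-i", music_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",                        # explicit audio codec (added here)
        "-shortest", output_path,
    ]


def composite(frames_pattern: str, music_path: str, output_path: str) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    cmd = build_ffmpeg_cmd(frames_pattern, music_path, output_path)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.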
During this stage, we also add:
- Background music: Automatically selected from a preset music library
- Sound effects: Notification sounds when text appears
- Subtitle tracks: Auto-generated SRT subtitles
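As an illustration of the subtitle step, an SRT file can be generated directly from the timed copy. This is a minimal sketch: the `build_srt` helper is hypothetical, and the cue timings in the usage example are assumptions that simply reuse the CSS animation delays shown earlier.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) cues."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Each block ends with a newline, so joining with "\n" leaves the
    # blank line that separates cues in the SRT format.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Assumed timings: the hook appears at 0.5 s and the main text at 1.5 s.
    print(build_srt([
        (0.5, 1.5, "Did you know?"),
        (1.5, 15.0, "90% of people don't know this..."),
    ]))
```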
Three Psychological Trigger Templates
We've designed three categories of proven short-form video templates, each targeting a different psychological trigger:
Fear Type
Open with alarming data or facts to trigger "Do I have this problem too?" anxiety.
Examples:
- "90% of people won't have enough retirement savings"
- "Your password may have already been leaked"
Efficiency Type
Show a quick way to solve a problem, making viewers think "It's that simple?"
Examples:
- "3 steps to automate your IG posting"
- "This tool saves me 2 hours every day"
Greed Type
Showcase potential gains or opportunities to trigger "I want that too" desire.
Examples:
- "This side hustle earns $1,500/month"
- "A single SaaS tool generating $30,000/year in revenue"
Each category has 3-5 visual variations, for roughly 9-15 templates in total, which rotate to prevent viewer fatigue.
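One simple way to rotate templates is a round-robin across categories that also cycles through the variations within each category. The sketch below assumes that scheme; the template names are placeholders, and the actual rotation logic may differ.

```python
from itertools import cycle

# Placeholder names: three variations per category, assumed for illustration.
TEMPLATES = {
    "fear": ["fear_a", "fear_b", "fear_c"],
    "efficiency": ["efficiency_a", "efficiency_b", "efficiency_c"],
    "greed": ["greed_a", "greed_b", "greed_c"],
}


def rotation(templates: dict[str, list[str]], count: int) -> list[str]:
    """Pick `count` templates, alternating categories and cycling variations."""
    per_category = {name: cycle(variants) for name, variants in templates.items()}
    categories = cycle(templates)
    return [next(per_category[next(categories)]) for _ in range(count)]


if __name__ == "__main__":
    print(rotation(TEMPLATES, 5))
```

With this scheme, consecutive videos never share a category, and a given variation repeats only after every other variation in its category has been used.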
Batch Production Workflow
The complete workflow in practice:
- Once per week: Set the week's topics and content direction
- AI auto-generates: Gemini creates copy based on template type
- Auto-template insertion: Code injects the copy into HTML templates
- Batch capture: Playwright captures each template sequentially
- Batch compositing: FFmpeg batch-processes all videos
- Scheduled publishing: Videos automatically enter the publishing queue
Batch-processing 20 videos takes approximately 15-20 minutes (depending on machine performance).
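The batch steps above can be organized around a small job list. This sketch only plans the work and estimates throughput; the `VideoJob` structure, the directory layout, and the slug names are assumptions for illustration, not the production design.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VideoJob:
    slug: str            # short identifier for one video
    frames_dir: Path     # where the captured PNG frames go
    output_path: Path    # final MP4 location


def plan_jobs(slugs: list[str], work_dir: Path) -> list[VideoJob]:
    """Create one job per video, each with its own frame directory."""
    return [
        VideoJob(slug, work_dir / "frames" / slug, work_dir / "out" / f"{slug}.mp4")
        for slug in slugs
    ]


def seconds_per_video(batch_minutes: float, videos: int) -> float:
    """Average processing time per video for a whole batch."""
    return batch_minutes * 60 / videos


if __name__ == "__main__":
    jobs = plan_jobs(["password-leak", "ig-automation"], Path("work"))
    for job in jobs:
        print(job.slug, job.frames_dir, job.output_path)
    print(seconds_per_video(15, 20), seconds_per_video(20, 20))
```

By this arithmetic, 20 videos in 15-20 minutes works out to roughly 45-60 seconds per video.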
Cost Structure
| Item | Cost |
|---|---|
| HTML template development | One-time (included in service) |
| Playwright + FFmpeg | Open-source, free |
| AI copy generation | NT$300-500/month |
| Server / local compute | Existing hardware is sufficient |
| Background music licensing | Free asset libraries |
| Monthly total | NT$300-500 |
Outsourcing typically costs NT$1,000-3,000 per video. At 3-5 videos per day (roughly 90-150 per month), the automated system works out to about NT$2-6 per video, well under 1/100th of the outsourced price.
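The comparison can be checked with a few lines of arithmetic. This sketch assumes a 30-day month and the 3-5 posts per day mentioned at the start of the article; the `per_video_cost` helper is hypothetical.

```python
def per_video_cost(monthly_cost: float, videos_per_day: int, days: int = 30) -> float:
    """Average cost per video, in NT$, for a given monthly spend."""
    return monthly_cost / (videos_per_day * days)


# Best case: lowest monthly cost spread over the most videos.
low = per_video_cost(300, 5)
# Worst case: highest monthly cost spread over the fewest videos.
high = per_video_cost(500, 3)

# How many times cheaper than outsourcing at NT$1,000-3,000 per video.
smallest_ratio = 1000 / high
largest_ratio = 3000 / low

if __name__ == "__main__":
    print(f"Per-video cost: NT${low:.2f} to NT${high:.2f}")
    print(f"Cheaper than outsourcing by {smallest_ratio:.0f}x to {largest_ratio:.0f}x")
```

Under these assumptions the automated cost per video stays in the single digits of NT$, so the savings grow with posting volume.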
Who Is This For?
- Brand owners: Need consistent short-form video output but don't have an editing team
- Content creators: One person managing short-form video across multiple platforms
- Marketing agencies: Batch-producing short-form videos for clients
- E-commerce sellers: Product showcases, promotional countdowns, unboxing videos
Conclusion
Automated short-form video production doesn't require After Effects skills or expensive software licenses. With the open-source combination of HTML + Playwright + FFmpeg, you can build a high-efficiency short-form video production pipeline.
Want to learn more about the technical details, or ready to start using our system? Free consultation -- we reply within 24 hours.