A complete guide to the Captify video captioning application
Captify is a modern web application that automatically generates and overlays captions on video files. It uses AssemblyAI for speech-to-text transcription and Remotion for video rendering with customizable caption styles.
Uses AssemblyAI API for accurate speech-to-text conversion
4 predefined caption styles (TikTok, Bottom-Centered, Top-Bar, Karaoke)
Renders mixed Hindi (Devanagari) and English text correctly
Client-side rendering with canvas overlay and audio capture
    git clone <repository-url>
    cd catfy
    npm install

Create a `.env.local` file in the root directory:

    AWS_ACCESS_KEY_ID=your_aws_access_key
    AWS_SECRET_ACCESS_KEY=your_aws_secret_key
    AWS_REGION=your_aws_region
    AWS_S3_BUCKET_NAME=your_bucket_name
    ASSEMBLYAI_API_KEY=your_assemblyai_api_key

    npm run dev

| Variable | Description | Example |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS access key for S3 | .... |
| AWS_SECRET_ACCESS_KEY | AWS secret key for S3 | ... |
| AWS_REGION | AWS region for S3 bucket | ap-northeast-3 |
| AWS_S3_BUCKET_NAME | S3 bucket name | assembly-ai-bucket |
| ASSEMBLYAI_API_KEY | AssemblyAI API key | your_api_key_here |
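
Since every variable in the table is required, it can help to fail fast when one is missing. The following is a minimal sketch, not code from the project: `loadEnv`, `REQUIRED_ENV_VARS`, and `AppEnv` are hypothetical names introduced here for illustration.

```typescript
// Names of the variables listed in the table above.
const REQUIRED_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET_NAME",
  "ASSEMBLYAI_API_KEY",
] as const;

type EnvName = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<EnvName, string>;

// Hypothetical helper: returns the validated values, or throws an error
// that lists every variable that is missing or blank.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  for (const name of REQUIRED_ENV_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}
```

In server-side code this would typically be called once at startup with `process.env` as the argument, so a misconfigured `.env.local` surfaces immediately rather than on the first upload.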
Click the "Upload Video" button on the home page, select an MP4 file from your device, and wait for the upload to complete (the file is uploaded to S3).
Navigate to the generator page, click the "Auto-generate captions" button, and wait for transcription to complete (this may take a few minutes).
Choose one of the 4 available caption styles (TikTok Style, Standard Subtitles, News Bar, Karaoke Highlight). The preview updates in real time.
Use the Remotion Player to preview the video with captions. Scrub through the timeline to see captions at different times.
Click the "Render Video" button and wait for rendering to complete (progress is shown in the button). The rendered video then appears below.
Click the "Download Video" button. The file downloads as `Captify_by_Vishal.mp4`.
`/api/upload-url`

Generates a presigned URL for uploading videos to S3.

    {
      "fileName": "video.mp4",
      "fileType": "video/mp4"
    }

`/api/transcription`

Initiates transcription with AssemblyAI.

    {
      "audio_url": "https://s3.amazonaws.com/...",
      "speaker_labels": true,
      "format_text": true,
      "punctuate": true,
      "speech_model": "universal",
      "language_detection": true
    }

`/api/polling`

Polls for transcription status and returns results when complete.

`/api/render`

Renders video with captions (currently returns placeholder).
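
The request bodies documented above and the polling step can be sketched from the client side as follows. This is an illustrative sketch under stated assumptions, not the project's actual client code: the HTTP method (POST with JSON), the response shape `PollResult`, and the helpers `postJson`, `buildTranscriptionRequest`, and `pollUntilDone` are all hypothetical. The status values are the ones AssemblyAI reports for a transcript; that `/api/polling` relays them unchanged is an assumption.

```typescript
// Request body for /api/transcription, as documented above.
interface TranscriptionRequest {
  audio_url: string;
  speaker_labels: boolean;
  format_text: boolean;
  punctuate: boolean;
  speech_model: string;
  language_detection: boolean;
}

// Builds the /api/transcription body with the options shown above.
function buildTranscriptionRequest(audioUrl: string): TranscriptionRequest {
  return {
    audio_url: audioUrl,
    speaker_labels: true,
    format_text: true,
    punctuate: true,
    speech_model: "universal",
    language_detection: true,
  };
}

// ASSUMPTION: the routes accept POST with a JSON body and answer with JSON.
async function postJson<T>(path: string, body: unknown): Promise<T> {
  const res = await fetch(path, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${path} failed with HTTP ${res.status}`);
  return (await res.json()) as T;
}

// ASSUMPTION: /api/polling relays AssemblyAI's transcript status values.
type TranscriptStatus = "queued" | "processing" | "completed" | "error";

// Hypothetical response shape; the real field names may differ.
interface PollResult {
  status: TranscriptStatus;
  text?: string;
  error?: string;
}

// Calls checkStatus repeatedly until the transcript completes or fails.
// checkStatus is injected so the details of the /api/polling call stay
// outside this helper.
async function pollUntilDone(
  checkStatus: () => Promise<PollResult>,
  intervalMs = 3000,
  maxAttempts = 100,
): Promise<PollResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await checkStatus();
    if (result.status === "completed") return result;
    if (result.status === "error") {
      throw new Error(result.error ?? "Transcription failed");
    }
    // Still queued or processing: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for transcription");
}
```

A caller might first request a presigned URL with `postJson("/api/upload-url", { fileName: "video.mp4", fileType: "video/mp4" })`, upload the file to that URL, send `buildTranscriptionRequest(...)` to `/api/transcription`, and then pass a function that queries `/api/polling` to `pollUntilDone`. Capping the number of attempts keeps a stuck transcription from polling forever.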
tiktok
Position: Bottom center
Text: Bold, large font
Background: Transparent (text shadow for readability)
Font: Noto Sans (supports Hinglish)
Best for: Social media videos, short-form content
bottom-centered
Position: Bottom center
Text: White with strong shadow
Background: Transparent
Font: Noto Sans (supports Hinglish)
Best for: Professional videos, documentaries
top-bar
Position: Top of video
Text: White text
Background: Transparent bar
Font: Noto Sans (supports Hinglish)
Best for: News videos, informational content
karaoke
Position: Center
Text: Words highlight as they are spoken
Background: Transparent with glow effect
Font: Noto Sans (supports Hinglish)
Best for: Music videos, karaoke content
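
The four styles above can be modeled as a lookup table keyed by style ID, and the karaoke behavior reduces to finding which word is being spoken at the current playback time. The sketch below is illustrative only: `CAPTION_STYLES`, `TimedWord`, `frameToMs`, and `activeWordIndex` are hypothetical names, and the word timings are assumed to be in milliseconds, as in AssemblyAI's word-level timestamps.

```typescript
type CaptionStyleId = "tiktok" | "bottom-centered" | "top-bar" | "karaoke";

interface CaptionStyle {
  position: string;
  text: string;
  background: string;
  font: string;
  bestFor: string;
}

// Values copied from the style descriptions above.
const CAPTION_STYLES: Record<CaptionStyleId, CaptionStyle> = {
  tiktok: {
    position: "Bottom center",
    text: "Bold, large font",
    background: "Transparent (text shadow for readability)",
    font: "Noto Sans",
    bestFor: "Social media videos, short-form content",
  },
  "bottom-centered": {
    position: "Bottom center",
    text: "White with strong shadow",
    background: "Transparent",
    font: "Noto Sans",
    bestFor: "Professional videos, documentaries",
  },
  "top-bar": {
    position: "Top of video",
    text: "White text",
    background: "Transparent bar",
    font: "Noto Sans",
    bestFor: "News videos, informational content",
  },
  karaoke: {
    position: "Center",
    text: "Words highlight as they are spoken",
    background: "Transparent with glow effect",
    font: "Noto Sans",
    bestFor: "Music videos, karaoke content",
  },
};

// ASSUMPTION: start and end are in milliseconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Converts a video frame number to a playback time in milliseconds.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}

// Returns the index of the word being spoken at timeMs, or -1 when the
// time falls in a gap between words (nothing is highlighted).
function activeWordIndex(words: TimedWord[], timeMs: number): number {
  return words.findIndex((w) => timeMs >= w.start && timeMs < w.end);
}
```

In a Remotion composition, the frame and frame rate would usually come from `useCurrentFrame()` and `useVideoConfig()`, and the karaoke style would highlight the word at the returned index. A linear scan is adequate for a single caption line; a binary search would only matter for very long transcripts.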
The project uses webpack (not Turbopack) for Remotion compatibility.
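    webpack: (config, { isServer }) => {
      if (!isServer) {
        // Keep Remotion's server-only packages out of the client bundle.
        config.resolve.alias = {
          ...config.resolve.alias, // preserve the aliases Next.js already set
          "@remotion/bundler": false,
          "@remotion/renderer": false,
          "esbuild": false,
        };
      }
      // Next.js requires the webpack function to return the modified config.
      return config;
    },

Implement full Remotion server-side rendering, generate true MP4 files, background job processing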
Custom font selection, color customization, position adjustment, animation effects
Upload multiple videos, queue system for rendering, progress tracking for multiple videos
Save projects, history of rendered videos, cloud storage integration
Subtitle file export (SRT, VTT), translation support, multiple language captions
Video compression options, rendering quality presets, caching for faster previews
Drag-and-drop upload, timeline editor for captions, real-time caption editing