Captify Documentation

Complete guide to the Captify video captioning application

Project Overview

Captify is a modern web application that automatically generates and overlays captions on video files. It uses AssemblyAI for speech-to-text transcription and Remotion for video rendering with customizable caption styles.

Features

Automatic Transcription

Uses AssemblyAI API for accurate speech-to-text conversion

Multiple Caption Styles

4 predefined caption styles (TikTok, Bottom-Centered, Top-Bar, Karaoke)

Hinglish Support

Renders mixed Hindi (Devanagari) and English text correctly

Video Rendering

Client-side rendering with canvas overlay and audio capture

Tech Stack

Frontend

  • Next.js 16.0.3 - React framework with App Router
  • React 19.2.0 - UI library
  • TypeScript 5 - Type safety
  • Tailwind CSS 4 - Styling
  • Zustand 5.0.8 - State management
  • Remotion 4.0.375 - Video rendering library

Backend

  • Next.js API Routes - Server-side endpoints
  • AWS SDK v3 - S3 integration
  • AssemblyAI 4.19.0 - Speech-to-text API

Installation & Setup

Prerequisites

  • Node.js 18+ and npm/yarn/pnpm
  • AWS S3 bucket with appropriate permissions
  • AssemblyAI API key
  • FFmpeg (for server-side rendering, optional)

Installation Steps

1. Clone the repository

git clone <repository-url>
cd catfy

2. Install dependencies

npm install

3. Set up environment variables

Create a `.env.local` file in the root directory:

AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_S3_BUCKET_NAME=your_bucket_name
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

4. Run the development server

npm run dev

Environment Variables

Variable                 Description                 Example
AWS_ACCESS_KEY_ID        AWS access key for S3       ...
AWS_SECRET_ACCESS_KEY    AWS secret key for S3       ...
AWS_REGION               AWS region for S3 bucket    ap-northeast-3
AWS_S3_BUCKET_NAME       S3 bucket name              assembly-ai-bucket
ASSEMBLYAI_API_KEY       AssemblyAI API key          your_api_key_here
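
A missing variable tends to surface late, as a failed upload or transcription request. The sketch below shows one way to check the five variables up front. The helper name `missingEnvVars` is hypothetical and not part of the project; it takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Variable names as documented in this section.
const REQUIRED_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET_NAME",
  "ASSEMBLYAI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object, in the order listed above.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

On the server this could be called as `missingEnvVars(process.env)`, failing fast with a clear message if the returned list is not empty.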

Usage Guide

1. Upload Video

Click the "Upload Video" button on the home page, select an MP4 file from your device, and wait for the upload to complete. The file is uploaded to S3.

2. Generate Transcription

Navigate to the generator page, click the "Auto-generate captions" button, and wait for transcription to complete. This may take a few minutes.

3. Select Caption Style

Choose one of the 4 available caption styles (TikTok Style, Standard Subtitles, News Bar, Karaoke Highlight). The preview updates in real time.

4. Preview Video

Use the Remotion Player to preview the video with captions. Scrub through the timeline to see captions at different times.

5. Render Video

Click the "Render Video" button and wait for rendering to complete (progress is shown in the button). The rendered video then appears below.

6. Download Video

Click the "Download Video" button. The file downloads as `Captify_by_Vishal.mp4`.

API Endpoints

POST /api/upload-url

Generates a presigned URL for uploading videos to S3.

Request Body:

{
  "fileName": "video.mp4",
  "fileType": "video/mp4"
}
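
As a rough client-side illustration, the request above could be built and sent as sketched here. The helper names, the `FetchLike` type, and the JSON `Content-Type` header are assumptions for illustration. The response shape is not documented in this guide, so the sketch returns the parsed JSON as-is rather than guessing at field names.

```typescript
interface UploadUrlRequest {
  fileName: string;
  fileType: string;
}

type RequestOptions = {
  method: string;
  headers: Record<string, string>;
  body: string;
};

// Minimal structural type so a real `fetch` or a test double can be passed in.
type FetchLike = (
  url: string,
  init: RequestOptions,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the request options for POST /api/upload-url from the documented body.
function buildUploadUrlRequest(body: UploadUrlRequest): RequestOptions {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed: route parses JSON
    body: JSON.stringify(body),
  };
}

// Sends the request and returns the parsed response untouched.
async function requestUploadUrl(
  fetchFn: FetchLike,
  fileName: string,
  fileType: string,
): Promise<unknown> {
  const response = await fetchFn(
    "/api/upload-url",
    buildUploadUrlRequest({ fileName, fileType }),
  );
  if (!response.ok) {
    throw new Error(`upload-url request failed with status ${response.status}`);
  }
  return response.json();
}
```

In the browser this would be called as `requestUploadUrl(fetch, "video.mp4", "video/mp4")`; the file itself is then uploaded to the presigned URL contained in the response.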

POST /api/transcription

Initiates transcription with AssemblyAI.

Request Body:

{
  "audio_url": "https://s3.amazonaws.com/...",
  "speaker_labels": true,
  "format_text": true,
  "punctuate": true,
  "speech_model": "universal",
  "language_detection": true
}
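
To keep callers from mistyping these option names, the body can be given a type. The following is a minimal sketch: the interface mirrors the fields shown above, while the helper `buildTranscriptionRequest` and its `overrides` parameter are hypothetical additions for illustration.

```typescript
// Field names mirror the documented request body for POST /api/transcription.
interface TranscriptionRequest {
  audio_url: string;
  speaker_labels: boolean;
  format_text: boolean;
  punctuate: boolean;
  speech_model: string;
  language_detection: boolean;
}

// Hypothetical helper: fills in the documented option values for a given
// audio URL and lets individual options be overridden.
function buildTranscriptionRequest(
  audioUrl: string,
  overrides: Partial<Omit<TranscriptionRequest, "audio_url">> = {},
): TranscriptionRequest {
  return {
    audio_url: audioUrl,
    speaker_labels: true,
    format_text: true,
    punctuate: true,
    speech_model: "universal",
    language_detection: true,
    ...overrides,
  };
}
```

The result can be passed through `JSON.stringify` and sent as the POST body.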

POST /api/polling

Polls for transcription status and returns results when complete.
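
A polling loop on the client might look like the sketch below. The status values are the ones AssemblyAI reports for a transcript (`queued`, `processing`, `completed`, `error`); that this endpoint passes them through unchanged is an assumption, as are the helper names, the 3-second interval, and the attempt limit. The actual call to the endpoint is supplied by the caller as `check`, since its request body is not documented here.

```typescript
type TranscriptStatus = "queued" | "processing" | "completed" | "error";

interface PollResult {
  status: TranscriptStatus;
}

// True once no further polling is needed.
function isTerminal(status: TranscriptStatus): boolean {
  return status === "completed" || status === "error";
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `check` repeatedly until it reports a terminal status or the attempt
// limit is reached. `check` is expected to wrap the POST /api/polling call.
async function pollUntilDone<T extends PollResult>(
  check: () => Promise<T>,
  intervalMs = 3000,
  maxAttempts = 100,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (isTerminal(result.status)) {
      return result;
    }
    await sleep(intervalMs);
  }
  throw new Error(`transcription still pending after ${maxAttempts} attempts`);
}
```

Bounding the number of attempts keeps a stalled job from polling indefinitely; the caller still needs to inspect the returned status to distinguish `completed` from `error`.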

POST /api/render

Renders the video with captions (currently returns a placeholder).

Caption Styles

TikTok Style

tiktok

Position: Bottom center

Text: Bold, large font

Background: Transparent (text shadow for readability)

Font: Noto Sans (supports Hinglish)

Best for: Social media videos, short-form content

Standard Subtitles

bottom-centered

Position: Bottom center

Text: White with strong shadow

Background: Transparent

Font: Noto Sans (supports Hinglish)

Best for: Professional videos, documentaries

News Bar

top-bar

Position: Top of video

Text: White text

Background: Transparent bar

Font: Noto Sans (supports Hinglish)

Best for: News videos, informational content

Karaoke Highlight

karaoke

Position: Center

Text: Words highlight as they are spoken

Background: Transparent with glow effect

Font: Noto Sans (supports Hinglish)

Best for: Music videos, karaoke content
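
Highlighting words as they are spoken comes down to finding which word's time range contains the current playback time. The sketch below assumes word timings in milliseconds with `start` and `end` fields, as AssemblyAI returns for each transcribed word; the function names are hypothetical and this is not necessarily how the project's karaoke component is written.

```typescript
// Word timing in milliseconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Returns the index of the word being spoken at `timeMs`, or -1 if the time
// falls in a gap between words or outside the caption.
function activeWordIndex(words: TimedWord[], timeMs: number): number {
  return words.findIndex((w) => timeMs >= w.start && timeMs < w.end);
}

// Converts a frame number to milliseconds at the given frame rate.
function frameToMs(frame: number, fps: number): number {
  return (frame / fps) * 1000;
}
```

Inside a Remotion component, the current frame and fps would be converted with `frameToMs` and the resulting index used to style the active word differently from the rest.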

Configuration

Next.js Configuration

The project uses webpack (not Turbopack) for Remotion compatibility.

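// Excerpt from the Next.js config
webpack: (config, { isServer }) => {
  if (!isServer) {
    // Keep server-only Remotion packages out of the client bundle.
    config.resolve.alias = {
      ...config.resolve.alias, // preserve aliases Next.js already defines
      "@remotion/bundler": false,
      "@remotion/renderer": false,
      "esbuild": false,
    };
  }
  // Next.js requires the (modified) config to be returned.
  return config;
}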

Remotion Configuration

  • Resolution: 1080x1920 (vertical video)
  • Frame Rate: 30 fps
  • Duration: Dynamic (based on video length)
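
Since Remotion expresses composition length in frames, a dynamic duration has to be derived from the video length and the frame rate. A minimal sketch, assuming the video length is available in seconds; the `composition` object and the `durationInFrames` helper are illustrative names, not taken from the project source:

```typescript
// Values from the list above.
const composition = { width: 1080, height: 1920, fps: 30 };

// Number of frames needed to cover a video of the given length, rounded up so
// the final partial frame is not cut off. Never returns less than 1, because
// a composition needs a positive whole number of frames.
function durationInFrames(
  videoSeconds: number,
  fps: number = composition.fps,
): number {
  return Math.max(1, Math.ceil(videoSeconds * fps));
}
```

For example, a 12.5-second clip at 30 fps needs 375 frames.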

Troubleshooting

Future Improvements

Server-Side Rendering

Full Remotion server-side rendering, true MP4 file generation, background job processing

Additional Caption Styles

Custom font selection, color customization, position adjustment, animation effects

Batch Processing

Multi-video upload, queue system for rendering, progress tracking for multiple videos

User Accounts

Saved projects, history of rendered videos, cloud storage integration

Advanced Features

Subtitle file export (SRT, VTT), translation support, multiple language captions

Performance Optimizations

Video compression options, rendering quality presets, caching for faster previews

UI/UX Improvements

Drag-and-drop upload, timeline editor for captions, real-time caption editing
