Captify

Captify Documentation

Complete guide to the Captify video captioning application

Project Overview

Captify is a modern web application that automatically generates and overlays captions on video files. It uses AssemblyAI for speech-to-text transcription and Remotion for video rendering with customizable caption styles.

Automatic Transcription

Uses AssemblyAI API for accurate speech-to-text conversion

Multiple Caption Styles

Four predefined caption styles: TikTok Style (`tiktok`), Standard Subtitles (`bottom-centered`), News Bar (`top-bar`), and Karaoke Highlight (`karaoke`)

Hinglish Support

Renders mixed Hindi (Devanagari) and English text correctly

Video Rendering

Client-side rendering with canvas overlay and audio capture

Features

Tech Stack

Frontend

Next.js 16.0.3 - React framework with App Router
React 19.2.0 - UI library
TypeScript 5 - Type safety
Tailwind CSS 4 - Styling
Zustand 5.0.8 - State management
Remotion 4.0.375 - Video rendering library

Backend

Next.js API Routes - Server-side endpoints
AWS SDK v3 - S3 integration
AssemblyAI 4.19.0 - Speech-to-text API

Installation & Setup

Prerequisites

Node.js 20.9+ (the minimum supported by Next.js 16) and npm, yarn, or pnpm
AWS S3 bucket with appropriate permissions
AssemblyAI API key
FFmpeg (optional; only needed for server-side rendering)

Installation Steps

1. Clone the repository

git clone <repository-url>
cd catfy

2. Install dependencies

npm install

3. Set up environment variables

Create a `.env.local` file in the root directory:

AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_REGION=your_aws_region
AWS_S3_BUCKET_NAME=your_bucket_name
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

4. Run the development server

npm run dev

Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | ... |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | ... |
| `AWS_REGION` | AWS region of the S3 bucket | `ap-northeast-3` |
| `AWS_S3_BUCKET_NAME` | S3 bucket name | `assembly-ai-bucket` |
| `ASSEMBLYAI_API_KEY` | AssemblyAI API key | `your_api_key_here` |
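
A missing variable usually surfaces only when an upload or transcription call fails, so it can help to check all five at startup. The following is a minimal sketch, not part of Captify: `REQUIRED_ENV_VARS`, `EnvConfig`, and `loadEnvConfig` are hypothetical names, and the helper takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Names of the variables documented in the table above.
const REQUIRED_ENV_VARS = [
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_REGION",
  "AWS_S3_BUCKET_NAME",
  "ASSEMBLYAI_API_KEY",
] as const;

type EnvVarName = (typeof REQUIRED_ENV_VARS)[number];
type EnvConfig = Record<EnvVarName, string>;

// Hypothetical helper (not part of Captify): returns the validated values,
// or throws a single error that names every missing or blank variable.
function loadEnvConfig(env: Record<string, string | undefined>): EnvConfig {
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as EnvConfig;
  for (const name of REQUIRED_ENV_VARS) {
    config[name] = env[name] as string;
  }
  return config;
}
```

In server-side code this would typically be called once as `loadEnvConfig(process.env)`.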

Usage Guide

Upload Video

Click the "Upload Video" button on the home page, select an MP4 file from your device, and wait for the upload to complete. The file is uploaded to S3.

Generate Transcription

Navigate to the generator page, click the "Auto-generate captions" button, and wait for the transcription to complete. This may take a few minutes.

Select Caption Style

Choose one of the four available caption styles (TikTok Style, Standard Subtitles, News Bar, Karaoke Highlight). The preview updates in real time.

Preview Video

Use the Remotion Player to preview the video with captions. Scrub through the timeline to see captions at different times.
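
Scrubbing amounts to asking, for each frame, which caption text is active at that moment. The sketch below illustrates that lookup under stated assumptions: the app keeps per-word timings in milliseconds (the unit AssemblyAI uses for word `start` and `end`), and the preview runs at 30 fps. `TimedWord`, `frameToMs`, and `activeWordIndex` are illustrative names, not Captify's actual code.

```typescript
// One transcribed word with start/end times in milliseconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// Converts a player frame number to milliseconds of media time.
// Multiplying before dividing keeps whole-frame results exact.
function frameToMs(frame: number, fps: number): number {
  return (frame * 1000) / fps;
}

// Returns the index of the word spoken at `frame`, or -1 during silence.
// A word is active from its start (inclusive) to its end (exclusive).
function activeWordIndex(words: TimedWord[], frame: number, fps = 30): number {
  const t = frameToMs(frame, fps);
  return words.findIndex((w) => t >= w.start && t < w.end);
}
```

A karaoke-style renderer could use the returned index to decide which word to highlight, while the other styles only need to know whether any word in the current caption line is active.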

Render Video

Click the "Render Video" button and wait for rendering to complete (progress is shown in the button). The rendered video then appears below.

Download Video

Click the "Download Video" button. The file downloads as `Captify_by_Vishal.mp4`.

API Endpoints

POST `/api/upload-url`

Generates a presigned URL for uploading videos to S3.

Request Body:

{
  "fileName": "video.mp4",
  "fileType": "video/mp4"
}
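
The response format of this endpoint is not documented here, so the following client sketch rests on assumptions: the route is taken to return JSON of the form `{ "url": "<presigned URL>" }`, and the presigned URL is taken to accept an HTTP PUT. `HttpCall`, `buildUploadUrlRequest`, and `uploadVideo` are hypothetical names. The HTTP function is passed in as a parameter, which keeps the sketch independent of the browser and easy to test.

```typescript
// Minimal fetch-compatible signature; the global `fetch` can be passed as-is.
type HttpCall = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string | ArrayBuffer },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface UploadUrlRequest {
  fileName: string;
  fileType: string;
}

// Builds the documented request body. Rejects anything but MP4,
// because the usage guide only mentions MP4 uploads.
function buildUploadUrlRequest(fileName: string, fileType: string): UploadUrlRequest {
  if (fileType !== "video/mp4") {
    throw new Error(`Unsupported file type: ${fileType}`);
  }
  return { fileName, fileType };
}

// ASSUMPTION: the route answers with { url } and S3 expects a PUT.
async function uploadVideo(
  http: HttpCall,
  fileName: string,
  fileType: string,
  data: ArrayBuffer,
): Promise<string> {
  const res = await http("/api/upload-url", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildUploadUrlRequest(fileName, fileType)),
  });
  if (!res.ok) throw new Error(`upload-url failed with status ${res.status}`);
  const { url } = (await res.json()) as { url: string };

  // Send the file bytes straight to S3 using the presigned URL.
  const put = await http(url, {
    method: "PUT",
    headers: { "Content-Type": fileType },
    body: data,
  });
  if (!put.ok) throw new Error(`S3 upload failed with status ${put.status}`);

  // The object URL is the presigned URL without its signature query string.
  const queryStart = url.indexOf("?");
  return queryStart === -1 ? url : url.slice(0, queryStart);
}
```

In the browser, a call might look like `uploadVideo(fetch, file.name, file.type, await file.arrayBuffer())`.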

POST `/api/transcription`

Initiates transcription with AssemblyAI.

Request Body:

{
  "audio_url": "https://s3.amazonaws.com/...",
  "speaker_labels": true,
  "format_text": true,
  "punctuate": true,
  "speech_model": "universal",
  "language_detection": true
}
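
For callers written in TypeScript, the request body above can be captured as a type with the documented values as defaults. This is only a sketch: `TranscriptionRequest` and `buildTranscriptionRequest` are hypothetical names, and the URL check is an added safeguard rather than documented behavior of the route.

```typescript
// Field names and default values mirror the request body documented above.
interface TranscriptionRequest {
  audio_url: string;
  speaker_labels: boolean;
  format_text: boolean;
  punctuate: boolean;
  speech_model: string;
  language_detection: boolean;
}

// Hypothetical helper: fills in the documented defaults for a given audio URL
// and lets the caller override individual options.
function buildTranscriptionRequest(
  audioUrl: string,
  overrides: Partial<Omit<TranscriptionRequest, "audio_url">> = {},
): TranscriptionRequest {
  // The transcription service fetches the file itself, so it needs a reachable URL.
  if (!/^https?:\/\//.test(audioUrl)) {
    throw new Error("audio_url must be an http(s) URL");
  }
  return {
    audio_url: audioUrl,
    speaker_labels: true,
    format_text: true,
    punctuate: true,
    speech_model: "universal",
    language_detection: true,
    ...overrides,
  };
}
```

The result can be serialized with `JSON.stringify` and sent as the body of the POST request.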

POST `/api/polling`

Polls for transcription status and returns results when complete.
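
Neither the request body nor the response of this endpoint is specified here, so the polling loop below is a sketch under assumptions. It takes a hypothetical `checkStatus` callback that is expected to call `/api/polling` and return an object with a `status` field using AssemblyAI's transcript states (`queued`, `processing`, `completed`, `error`). `PollResult`, `pollDelayMs`, and `waitForTranscript` are illustrative names.

```typescript
type TranscriptStatus = "queued" | "processing" | "completed" | "error";

// ASSUMED response shape; only `status` is relied on for control flow.
interface PollResult {
  status: TranscriptStatus;
  text?: string;
  error?: string;
}

// Delay before the next poll: grows linearly and is capped at `maxMs`.
function pollDelayMs(attempt: number, baseMs = 3000, maxMs = 15000): number {
  return Math.min(baseMs * (attempt + 1), maxMs);
}

// Calls `checkStatus` until the transcript completes, fails, or the
// attempt budget runs out. `sleep` is injectable to make testing fast.
async function waitForTranscript(
  checkStatus: () => Promise<PollResult>,
  maxAttempts = 60,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<PollResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await checkStatus();
    if (result.status === "completed") return result;
    if (result.status === "error") {
      throw new Error(result.error ?? "Transcription failed");
    }
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`Transcription still pending after ${maxAttempts} polls`);
}
```

Capping the number of attempts keeps the UI from waiting indefinitely if a job never leaves the queue.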

POST `/api/render`

Renders the video with captions. The endpoint currently returns a placeholder response.

Caption Styles

TikTok Style

tiktok

Position: Bottom center

Text: Bold, large font

Background: Transparent (text shadow for readability)

Font: Noto Sans (supports Hinglish)

Best for: Social media videos, short-form content

Standard Subtitles

bottom-centered

Position: Bottom center

Text: White with strong shadow

Background: Transparent

Font: Noto Sans (supports Hinglish)

Best for: Professional videos, documentaries

News Bar

top-bar

Position: Top of video

Text: White text

Background: Transparent bar

Font: Noto Sans (supports Hinglish)

Best for: News videos, informational content

Karaoke Highlight

karaoke

Position: Center

Text: Words highlight as they are spoken

Background: Transparent with glow effect

Font: Noto Sans (supports Hinglish)

Best for: Music videos, karaoke content
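
The four presets above can be summarized as a typed lookup table keyed by style identifier. The sketch below only restates the properties listed in this section. The type names, field names, and the `isCaptionStyleId` guard are hypothetical and do not reflect how Captify stores its styles.

```typescript
type CaptionStyleId = "tiktok" | "bottom-centered" | "top-bar" | "karaoke";

interface CaptionStylePreset {
  label: string;
  position: "bottom" | "top" | "center";
  text: string;
  background: string;
  fontFamily: string;
}

// Values copied from the style descriptions in this section.
const CAPTION_STYLES: Record<CaptionStyleId, CaptionStylePreset> = {
  tiktok: {
    label: "TikTok Style",
    position: "bottom",
    text: "Bold, large font",
    background: "Transparent (text shadow for readability)",
    fontFamily: "Noto Sans",
  },
  "bottom-centered": {
    label: "Standard Subtitles",
    position: "bottom",
    text: "White with strong shadow",
    background: "Transparent",
    fontFamily: "Noto Sans",
  },
  "top-bar": {
    label: "News Bar",
    position: "top",
    text: "White text",
    background: "Transparent bar",
    fontFamily: "Noto Sans",
  },
  karaoke: {
    label: "Karaoke Highlight",
    position: "center",
    text: "Words highlight as they are spoken",
    background: "Transparent with glow effect",
    fontFamily: "Noto Sans",
  },
};

// Type guard for untrusted input such as a query parameter or saved setting.
function isCaptionStyleId(value: string): value is CaptionStyleId {
  return Object.prototype.hasOwnProperty.call(CAPTION_STYLES, value);
}
```

Keying the table by a union type means the compiler flags any preset that is missing when a new identifier is added.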

Configuration

Next.js Configuration

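The project uses webpack instead of Turbopack for Remotion compatibility. Turbopack is the default bundler in Next.js 16, so webpack has to be selected explicitly (for example with the `--webpack` flag on `next dev` and `next build`). On the client, the webpack configuration stubs out the server-only packages: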

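webpack: (config, { isServer }) => {
  if (!isServer) {
    // Keep server-only packages out of the browser bundle.
    config.resolve.alias = {
      ...config.resolve.alias, // preserve the aliases Next.js already set
      "@remotion/bundler": false,
      "@remotion/renderer": false,
      "esbuild": false,
    };
  }
  // Next.js requires the webpack function to return the modified config.
  return config;
},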

Remotion Configuration

Resolution: 1080x1920 px (vertical video, 9:16)
Frame Rate: 30 fps
Duration: Dynamic (based on video length)
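
Remotion expresses a composition's length as a whole number of frames, so a dynamic duration has to be derived from the video length in seconds and the frame rate. The sketch below shows one way to do that with the values listed above. `COMPOSITION` and `durationInFrames` are illustrative names, and how Captify actually obtains the video length is not covered here.

```typescript
// Values from the Remotion configuration listed above.
const COMPOSITION = { width: 1080, height: 1920, fps: 30 } as const;

// Converts a video length in seconds to a frame count.
// Rounds up so the final partial frame is not cut off.
function durationInFrames(videoSeconds: number, fps: number = COMPOSITION.fps): number {
  if (!Number.isFinite(videoSeconds) || videoSeconds <= 0) {
    throw new Error("videoSeconds must be a positive, finite number");
  }
  // Round to 1/1000 of a frame first, so floating-point noise such as
  // 0.1 * 30 = 3.0000000000000004 does not add a spurious extra frame.
  const frames = Math.round(videoSeconds * fps * 1000) / 1000;
  return Math.max(1, Math.ceil(frames));
}
```

For example, a 10-second clip at 30 fps yields 300 frames, and a 12.34-second clip yields 371.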

Troubleshooting

Future Improvements

Server-Side Rendering

Full Remotion server-side rendering, generation of true MP4 files, background job processing

Additional Caption Styles

Custom font selection, color customization, position adjustment, animation effects

Batch Processing

Multiple video uploads, a queue system for rendering, progress tracking across videos

User Accounts

Save projects, history of rendered videos, cloud storage integration

Advanced Features

Subtitle file export (SRT, VTT), translation support, multiple language captions

Performance Optimizations

Video compression options, rendering quality presets, caching for faster previews

UI/UX Improvements

Drag-and-drop upload, timeline editor for captions, real-time caption editing

