Remotion Lambda Migration: Cost, Quota Math, skip-worktree

Migrating Remotion from local rendering to AWS Lambda — the quota trap (10, not 1000), real cost at $15/video, and the git skip-worktree fix.

Migrating Remotion From Local Rendering to Lambda: Quota Math, Cost, and the skip-worktree Surprise

April 11, 2026 · 12 min read · AI for Solo Founders

Last Updated: 2026-04-11

Migrating Remotion video rendering from a local machine to AWS Lambda is straightforward in code but hides three traps. First, new AWS accounts have a default concurrent Lambda execution limit of 10 — not 1,000 — which means you can only render 2-3 videos simultaneously since Remotion uses 3-5 parallel chunks per render. Second, the AWS Service Quotas console has a UI bug that shows approved increases as "pending" for days. Third, Remotion's build step generates large binary bundles in public/ that break git commits on iCloud-synced repos, requiring git update-index --skip-worktree to unblock deployment.

Why move from local rendering to Lambda at all?

I built a SaaS application that generates personalized video walkthroughs for each user after their filing is complete. Each video is roughly 90 seconds — a narrated walkthrough of their results, key numbers, and next steps. For the first few months, I rendered these locally on my development machine using Remotion's CLI.

Local rendering works when you have 5 users. It does not work when you have 50. Each 90-second video takes approximately 3-4 minutes to render on a modern MacBook Pro (M-series chip, 16GB RAM). At 50 videos per week, that is 2.5-3.3 hours of continuous rendering — during which my machine is effectively unusable. I could not attend calls, run the dev server, or respond to support requests while renders were running.

The economics were clear: either buy a dedicated render server ($150-300/month for a GPU instance on AWS or Hetzner) or use Remotion's built-in Lambda integration. Lambda won because it scales to zero — I pay nothing when nobody needs a video — and Remotion's documentation for Lambda setup is genuinely good. I wrote about the initial video pipeline architecture in an earlier post.

What does the 6-step video pipeline look like?

Before diving into the Lambda migration, here is the full pipeline. Each step feeds the next, and understanding the flow matters because it explains why Lambda concurrency is so critical.

  1. Script generation — An AI model analyzes the user's data and generates a narration script broken into segments (intro, key findings, action items, closing). This runs on the application server.
  2. Asset creation — Text-to-speech converts each segment to MP3. Simultaneously, the application renders data visualization PNGs for each segment (charts, summary cards, comparisons). All assets upload to cloud storage.
  3. Remotion composition — The application assembles a JSON payload describing scenes, timing, audio URLs, and visual asset URLs. This becomes the inputProps for the Remotion composition.
  4. Lambda render — renderMediaOnLambda() sends the composition to AWS. Lambda splits the video into chunks, renders them in parallel across multiple invocations, and stitches the final MP4.
  5. S3 upload — The rendered MP4 lands in an S3 bucket. A webhook fires when rendering is complete, triggering a transfer from S3 to the application's primary storage.
  6. Email delivery — The application sends the user an email with a link to view their personalized video walkthrough in-app.

Steps 1-3 take about 15-25 seconds. Step 4 takes 45-90 seconds depending on video length. Steps 5-6 take under 5 seconds. Total pipeline: roughly 60-120 seconds from trigger to delivery. On my local machine, step 4 alone took 3-4 minutes. I covered the cost optimization story for multi-provider AI pipelines in a separate post.
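
To make the hand-offs concrete, here is a minimal sketch of how those six steps chain together in application code. The helper names and data shapes (generateScript, createAssets, buildInputProps) are placeholders rather than my real modules, but the control flow mirrors the list above: steps 1-3 run in sequence on the app server, step 4 is an asynchronous dispatch to Lambda, and steps 5-6 happen later in the webhook handler.

// lib/video/pipeline.ts — illustrative sketch; helper names and shapes are placeholders
import { renderVideo } from "./lambda-client";
import type { VideoInputProps } from "./types"; // app-defined props shape (path illustrative)

// Stand-ins for the real implementations of steps 1-3 described above.
declare function generateScript(returnId: string): Promise<string[]>;
declare function createAssets(
  segments: string[]
): Promise<{ audioUrls: string[]; imageUrls: string[] }>;
declare function buildInputProps(
  returnId: string,
  segments: string[],
  assets: { audioUrls: string[]; imageUrls: string[] }
): VideoInputProps;

export async function runVideoPipeline(returnId: string) {
  const segments = await generateScript(returnId);   // step 1: narration script segments
  const assets = await createAssets(segments);       // step 2: TTS MP3s + chart PNGs -> storage
  const inputProps = buildInputProps(returnId, segments, assets); // step 3: Remotion payload

  // Step 4: async dispatch to Lambda. Steps 5-6 (S3 transfer, email) run
  // later, in the webhook handler, once the MP4 is ready.
  return renderVideo(inputProps);
}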

How does Remotion Lambda actually work under the hood?

Remotion's Lambda integration is not just "run Remotion on Lambda." It is a distributed rendering system. When you call renderMediaOnLambda(), Remotion does the following:

  1. Uploads your composition bundle to S3 (a "site" — a static deployment of your Remotion project).
  2. Invokes a "main" Lambda function that orchestrates the render.
  3. The orchestrator splits your video timeline into chunks (typically 10-20 second segments).
  4. Each chunk is rendered by a separate Lambda invocation — this is where the parallelism comes from.
  5. After all chunks render, the orchestrator stitches them into a single MP4 using FFmpeg (which runs inside the Lambda function).
  6. The final MP4 is written to S3, and a webhook fires to notify your application.

For a 90-second video at 30fps, Remotion typically spawns 3-5 parallel Lambda invocations (the concurrency parameter in your render call controls the exact number). Each invocation needs 2-3 GB of memory and runs for 15-30 seconds. The key insight: each video render consumes multiple concurrent Lambda executions, not just one.

What does the render call look like in code?

// lib/video/lambda-client.ts
import { renderMediaOnLambda } from "@remotion/lambda/client";
import type { VideoInputProps } from "./types"; // app-defined props shape (path illustrative)

export async function renderVideo(inputProps: VideoInputProps) {
  const { renderId, bucketName } = await renderMediaOnLambda({
    region: "us-east-1",
    functionName: process.env.REMOTION_FUNCTION_NAME!,
    serveUrl: process.env.REMOTION_SERVE_URL!, // S3 site URL
    composition: "TaxWalkthrough",
    inputProps,
    codec: "h264",
    concurrency: 3,           // parallel chunk renders per video
    timeoutInMilliseconds: 900000, // 15 min max
    webhook: {
      url: `${process.env.NEXT_PUBLIC_APP_URL}/api/webhooks/remotion`,
      secret: process.env.REMOTION_WEBHOOK_SECRET!,
    },
    customData: {
      returnId: inputProps.returnId,
      compositionId: "TaxWalkthrough",
    },
  });
  return { renderId, bucketName };
}

Output on success:

{
  "renderId": "r-abc123def456",
  "bucketName": "remotionlambda-useast1-jguuxrk7ia"
}

The render is asynchronous. Your application returns immediately and waits for the webhook callback. The Lambda configuration I landed on: 3 GB memory, 900-second timeout, concurrency of 3 chunks per video. That means each video render uses 3 concurrent Lambda executions.
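
The other half of that asynchronous handshake is the webhook route. Here is a rough sketch of the receiver — the payload fields (type, renderId, outputUrl, customData) follow how they are used in this post, but verify the exact shape against your Remotion version, and the storage, email, and status helpers are placeholders for the application's own code.

// app/api/webhooks/remotion/route.ts — minimal sketch of the webhook receiver
export const maxDuration = 60; // the S3 -> app storage transfer can exceed Vercel's 10s default

// Placeholders for the app's own storage/email/status code.
declare function transferToAppStorage(outputUrl: string, returnId: string): Promise<string>;
declare function markVideoFailed(returnId: string, reason: string): Promise<void>;
declare function sendVideoEmail(returnId: string, videoUrl: string): Promise<void>;

export async function POST(req: Request) {
  // In production, verify the webhook signature that Remotion derives from
  // REMOTION_WEBHOOK_SECRET before trusting the body (omitted here for brevity).
  const payload = await req.json();
  const returnId = payload.customData?.returnId as string;

  if (payload.type !== "success") {
    // Covers error and timeout outcomes — surface them instead of leaving the video stuck.
    await markVideoFailed(returnId, JSON.stringify(payload.errors ?? payload.type));
    return Response.json({ ok: true });
  }

  // Step 5: pull the MP4 from the Remotion S3 bucket into primary storage.
  const videoUrl = await transferToAppStorage(payload.outputUrl, returnId);
  // Step 6: notify the user.
  await sendVideoEmail(returnId, videoUrl);
  return Response.json({ ok: true });
}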

Why does the AWS Lambda concurrent execution limit matter so much?

This is where I lost two days.

AWS Lambda has a concept called "concurrent executions" — the maximum number of Lambda function instances running simultaneously across your entire AWS account. The AWS documentation states the default limit is 1,000 concurrent executions. Every tutorial, every blog post, every StackOverflow answer assumes you have 1,000.

You do not have 1,000. New and low-usage AWS accounts are silently throttled to 10 concurrent executions.

The quota math that nobody warns you about: Each Remotion render uses 3-5 concurrent Lambda invocations (for parallel chunk rendering). With a default limit of 10, you can render at most 2-3 videos simultaneously. If a fourth render starts while three are running, Lambda throttles the excess invocations, and your render either times out or produces a corrupted video with missing chunks.

I discovered this after my first three test renders worked perfectly (because they ran sequentially) and the fourth one failed with a cryptic timeout. The Lambda logs showed TooManyRequestsException buried inside Remotion's chunk orchestration logic. It took me a full day to connect "render timeout" to "account concurrency limit" because the error message does not mention quotas.
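
Until a quota increase is approved, the only safe workaround is to gate dispatch in the application so you never exceed the account limit. Below is a minimal sketch using this post's numbers (a 10-execution account limit and 3 chunks per render); the in-memory counter is illustrative — a real deployment should persist it and release the slot when the completion webhook fires, since renderMediaOnLambda() returns long before the Lambda executions finish.

// lib/video/render-gate.ts — quota-aware dispatch sketch
const ACCOUNT_CONCURRENCY_LIMIT = 10; // from `aws lambda get-account-settings`
const CHUNKS_PER_RENDER = 3;          // the concurrency passed to renderMediaOnLambda()
export const MAX_PARALLEL_RENDERS =
  Math.floor(ACCOUNT_CONCURRENCY_LIMIT / CHUNKS_PER_RENDER); // = 3 simultaneous videos

// Illustrative in-memory counter; persist this (DB row, Redis) in production.
let rendersInFlight = 0;

export function tryAcquireRenderSlot(): boolean {
  // A fourth simultaneous render would be throttled by Lambda and either
  // time out or come back with missing chunks, so queue it instead.
  if (rendersInFlight >= MAX_PARALLEL_RENDERS) return false;
  rendersInFlight += 1;
  return true;
}

export function releaseRenderSlot(): void {
  // Call from the webhook handler when the render succeeds or fails.
  rendersInFlight = Math.max(0, rendersInFlight - 1);
}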

How do you check your actual Lambda concurrency limit?

# Check your ACTUAL limit (not the documented default)
aws lambda get-account-settings

# Output — note "ConcurrentExecutions" is the real number
{
    "AccountLimit": {
        "TotalCodeSize": 80530636800,
        "CodeSizeUnzipped": 262144000,
        "CodeSizeZipped": 52428800,
        "ConcurrentExecutions": 10,        # <-- NOT 1000
        "UnreservedConcurrentExecutions": 10
    },
    "AccountUsage": {
        "TotalCodeSize": 148473018,
        "FunctionCount": 3
    }
}

If ConcurrentExecutions shows 10, you are throttled. You need to request an increase through AWS Service Quotas.
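
The same check can run from a deploy script or a health check via the AWS SDK. A sketch, assuming @aws-sdk/client-lambda v3; the required-concurrency floor here is an arbitrary example (3 chunks x 5 simultaneous renders), not a Remotion requirement.

// scripts/check-lambda-quota.ts — SDK equivalent of `aws lambda get-account-settings`
import { LambdaClient, GetAccountSettingsCommand } from "@aws-sdk/client-lambda";

const REQUIRED_CONCURRENCY = 15; // illustrative floor: 3 chunks x 5 simultaneous renders

async function checkConcurrencyQuota(): Promise<void> {
  const client = new LambdaClient({ region: "us-east-1" });
  const { AccountLimit } = await client.send(new GetAccountSettingsCommand({}));
  const limit = AccountLimit?.ConcurrentExecutions ?? 0;

  console.log(`Account concurrent execution limit: ${limit}`);
  if (limit < REQUIRED_CONCURRENCY) {
    // Fail loudly before the first production render does.
    throw new Error(
      `Lambda concurrency limit is ${limit}; request an increase via Service Quotas before going live.`
    );
  }
}

checkConcurrencyQuota();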

What is the AWS Service Quotas UI bug?

Requesting a quota increase should be straightforward: go to Service Quotas in the AWS console, find Lambda concurrent executions, enter the desired value, submit. In practice, there are two problems.

Problem 1: The validation bug. The quota increase form validates that your requested value is greater than the current default limit (1,000), not your current applied limit (10). So if you request 200 — a perfectly reasonable number — the form rejects it because 200 < 1,000. You have to request at least 1,001, even if you only need 200. I requested 1,000 (which was rejected by the form), then tried 1,001 (which submitted successfully). AWS support granted 1,000.

Problem 2: The status display bug. After submitting, the Service Quotas console showed my request status as "Pending" for four days. I assumed it was still being reviewed. It was not. The quota had been approved and applied within 24 hours, but the console never updated to show "Approved." I only discovered it was live by checking via CLI:

# Check the ACTUAL current quota value (not the request status)
aws service-quotas get-service-quota \
    --service-code lambda \
    --quota-code L-B99A9384

# Output
{
    "Quota": {
        "ServiceCode": "lambda",
        "QuotaName": "Concurrent executions",
        "QuotaCode": "L-B99A9384",
        "Value": 1000.0,        # <-- Approved! Console still says "Pending"
        "Unit": "None",
        "Adjustable": true,
        "GlobalQuota": false
    }
}

The lesson: never trust the AWS console for quota request status. Always verify via CLI or API. I filed a case with AWS support (case #177568943600135), chatted with a support agent, and had it resolved. Total time from request to approval: approximately 36 hours. The console UI did not update for another 3 days after that.
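
If you want a script to verify approval rather than trusting the console, the same lookup works through the AWS SDK. A sketch, assuming @aws-sdk/client-service-quotas v3 and the quota code shown above:

// scripts/check-quota-approved.ts — trusts the API value, not the console status
import { ServiceQuotasClient, GetServiceQuotaCommand } from "@aws-sdk/client-service-quotas";

async function getLambdaConcurrencyQuota(): Promise<number> {
  const client = new ServiceQuotasClient({ region: "us-east-1" });
  const { Quota } = await client.send(
    new GetServiceQuotaCommand({
      ServiceCode: "lambda",
      QuotaCode: "L-B99A9384", // "Concurrent executions"
    })
  );
  return Quota?.Value ?? 0;
}

getLambdaConcurrencyQuota().then((value) => {
  // This value updated within ~36 hours even while the console still said "Pending".
  console.log(`Applied Lambda concurrency quota: ${value}`);
});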

What is the skip-worktree surprise, and how does it fix git?

With Lambda working and quotas sorted, the last blocker was deployment. Remotion's build process generates files in the public/ directory — large binary bundles (site artifacts, compiled compositions, asset manifests) that should not be committed to git. In a normal repo, you would add public/ to .gitignore and move on.

My situation was different. The public/ directory already contained tracked files — OG images, logos, favicons — that needed to stay in git. I could not gitignore the entire directory. And the Remotion-generated files were large enough (some over 300KB) that they made git commit slow. But the real problem was worse.

Why did git commit hang for 40 minutes?

The repository was hosted on iCloud Drive (inside ~/Documents/GitHub/). macOS iCloud Drive uses "dataless" file placeholders — files that exist in the filesystem metadata but whose contents are stored in iCloud, not on local disk. When git runs refresh_index during a commit, it reads every tracked file to check for modifications. If a file is a dataless placeholder, macOS initiates an iCloud download, which can take seconds per file. With dozens of files in public/, this caused git commit to hang for 40+ minutes.

I tried several approaches that failed:

  • brctl download public/ — returned exit 0 but left files dataless (brctl is best-effort, not guaranteed).
  • rsync to a non-iCloud directory — hung indefinitely on a 369KB PNG file because rsync has no per-file timeout.
  • Disabling iCloud sync temporarily — caused ~/Documents to appear empty, which is terrifying even if the files are not actually lost.

The fix that worked: git update-index --skip-worktree.

# Tell git to skip checking public/ files during refresh_index
cd ~/Documents/GitHub/my-project
git ls-files -z public/ | xargs -0 git update-index --skip-worktree

# Now git commit works instantly — no iCloud reads
git commit -m "feat: migrate video rendering to Lambda"

# After commit, restore normal git behavior
git ls-files -z public/ | xargs -0 git update-index --no-skip-worktree

The --skip-worktree flag is a bit on the git index entry. It tells git: "assume this file has not changed — do not read it from the working tree during refresh." It is a metadata-only operation that completes in milliseconds. The file stays tracked, its last-committed content stays in git history, but git skips the filesystem read that triggers the iCloud download. I wrote about git patterns for production codebases in an earlier post.

When to use skip-worktree vs. assume-unchanged: Both flags prevent git from checking a file for changes. --skip-worktree is the correct choice when you intentionally want to keep local modifications invisible to git (build artifacts, local config overrides). --assume-unchanged is intended for performance optimization on large repos and can be overridden by git operations like git reset. For build tool output in tracked directories, always use --skip-worktree.

What does the real cost breakdown look like?

After running the Lambda pipeline for two weeks in production, here are the actual costs per video. These are real numbers from my AWS billing dashboard, not estimates.

| Component | Cost per Video | Details |
| --- | --- | --- |
| Lambda compute | $0.08-0.12 | 3 concurrent invocations, 3GB memory, 20-30s each. Billed per 1ms at $0.0000166667/GB-second. |
| S3 storage (chunks) | $0.002 | Temporary chunk storage during render. Cleaned up after stitching. Negligible. |
| S3 storage (final MP4) | $0.003/month | ~120MB per video at $0.023/GB/month. Accumulates over time. |
| ElevenLabs TTS | $5.00/month (flat) | Starter plan covers ~100 videos/month of narration at current script lengths. |
| CloudFront delivery | $0.01 | ~120MB transfer per view at $0.085/GB for first 10TB. |
| Total per video | ~$0.10-0.15 | Lambda + S3 + CloudFront per render. |
| Total with TTS amortized | ~$15/video at 10/month | $5 TTS + $1-1.50 infra, amortized over low volume. |

The cost story changes dramatically with volume. At 10 videos per month, the TTS subscription dominates and the effective cost is roughly $15/video. At 100 videos per month, TTS amortizes to $0.05/video and the total drops to roughly $0.15-0.20/video. At 1,000 videos per month, you would need the ElevenLabs Scale plan ($99/month) and total cost would be roughly $0.20/video.

The critical insight: Lambda compute is nearly free. The real cost is text-to-speech and storage, not rendering. This surprised me — I expected compute to dominate. In practice, a 90-second video render costs less than a single ChatGPT API call for the script generation that precedes it.

How does Lambda compare to other rendering approaches?

| Approach | Cost/video (100/mo) | Max concurrent | Setup time | Scale to zero | Best for |
| --- | --- | --- | --- | --- | --- |
| Local machine | $0 (your time) | 1 | Minutes | N/A | Prototyping, <10 videos/month |
| Remotion Lambda | $0.15-0.20 | 200+ (with quota) | 2-4 hours | Yes | Production SaaS, variable load |
| Dedicated GPU server (Hetzner) | $1.50-3.00 | 4-8 | 1-2 days | No ($150-300/mo fixed) | High volume, predictable load |
| Shotstack API | $0.40-1.00 | Unlimited (managed) | 1-2 hours | Yes | Simple templates, non-React |
| Creatomate API | $0.50-1.50 | Unlimited (managed) | 1-2 hours | Yes | Template-based, drag-and-drop |
| Remotion Cloud Run | $0.10-0.15 | Configurable | 3-5 hours | Yes | GCP shops, longer videos |

Remotion Lambda wins for my use case because: (1) I already have the Remotion composition code, so no rewrite needed; (2) my volume is variable — some weeks 3 videos, some weeks 30 — and scale-to-zero means I pay nothing during quiet weeks; (3) the 200+ concurrent capacity (post-quota increase) means I can handle bursts without queueing.

The dedicated server option only makes sense above roughly 500 videos per month, where the fixed $150-300/month cost amortizes below Lambda's per-render pricing. At my current volume (30-50/month), Lambda is 3-5x cheaper than a dedicated server.

What are the three lessons from this migration?

Lesson 1: Always check actual Lambda quotas, not documented defaults

Run aws lambda get-account-settings before writing a single line of Lambda code. If ConcurrentExecutions shows 10, file the quota increase request immediately — it takes 24-48 hours. Do not wait until your first production render fails. The AWS Service Quotas documentation describes the process, but does not mention the UI validation bug. Use the CLI to verify approval status.

Lesson 2: The skip-worktree trick is essential for any build tool that generates files in tracked directories

This is not specific to Remotion. Any build tool that writes output to a directory that git already tracks — Next.js writing to .next/, Webpack writing to dist/, Remotion writing to public/ — can create the same problem. The pattern is: git ls-files -z <dir> | xargs -0 git update-index --skip-worktree before the operation, then --no-skip-worktree after. Keep this in your toolbox.

Lesson 3: Lambda cost scales linearly — there are no economies of scale

Unlike a dedicated server where the per-video cost decreases as volume increases (fixed cost amortized over more renders), Lambda pricing is perfectly linear. Your 1,000th video costs exactly the same as your 1st. This is a feature at low volume (no idle cost) and a liability at high volume (no volume discount). The crossover point in my calculation is roughly 500 videos per month — above that, a dedicated render server starts winning on unit economics.

Frequently Asked Questions

How long does a Remotion Lambda render take for a 90-second video?

With concurrency set to 3 and 3GB memory per invocation, a 90-second 1080p video renders in 45-90 seconds on Lambda. This is roughly 3-4x faster than local rendering on an M-series MacBook Pro because the chunks render in parallel across multiple Lambda instances. Longer videos (3-5 minutes) take 2-4 minutes on Lambda.

Do I need to deploy a Remotion "site" to S3 separately?

Yes. Before rendering, you must deploy your Remotion project as a static site to S3 using deploySite() or a deploy script. This site contains your bundled composition code and static assets. Lambda downloads this site at render time. You only need to redeploy when your composition code changes — not for every render. In my setup, I run a deploy script (deploy-remotion-lambda.ts) after each code change to the Remotion project.
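
My deploy-remotion-lambda.ts is not reproduced here, but a minimal version looks roughly like the sketch below, built on Remotion's documented getOrCreateBucket() and deploySite() helpers. The entry point path and site name are illustrative; check the @remotion/lambda docs for the options available in your version.

// scripts/deploy-remotion-lambda.ts — rough sketch of a site deploy script
import path from "node:path";
import { deploySite, getOrCreateBucket } from "@remotion/lambda";

const REGION = "us-east-1";

async function deploy(): Promise<void> {
  // Reuses the Remotion-managed bucket if it already exists.
  const { bucketName } = await getOrCreateBucket({ region: REGION });

  const { serveUrl } = await deploySite({
    region: REGION,
    bucketName,
    entryPoint: path.join(process.cwd(), "remotion", "index.ts"), // your Remotion entry file
    siteName: "tax-walkthrough", // a stable name keeps the serve URL constant across deploys
  });

  // Store this value as REMOTION_SERVE_URL for renderMediaOnLambda().
  console.log(`Deployed Remotion site: ${serveUrl}`);
}

deploy();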

Can I use Remotion Lambda with Vercel serverless functions?

Yes, but with caveats. The renderMediaOnLambda() call returns immediately (it is an async dispatch, not a synchronous render), so it fits within Vercel's default 10-second timeout. However, the webhook that Lambda fires when rendering is complete hits your Vercel API route, which needs to download the MP4 from S3 and upload it to your storage — that transfer can exceed 10 seconds for large videos. Set maxDuration: 60 on the webhook route.

What happens if a Lambda render fails mid-way?

Remotion's orchestrator handles chunk-level retries automatically. If a single chunk invocation fails (out of memory, timeout), the orchestrator retries it up to 3 times before marking the entire render as failed. If the render fails, the webhook fires with a type: "error" payload instead of "success". Your webhook handler should update the video status to "failed" and surface the error to the user or an admin dashboard.

Is the $15/video cost competitive with other video generation services?

At low volume (under 30 videos/month), $15/video is high compared to API-based services like Shotstack ($0.40-1.00/video). But $15 includes the TTS subscription amortized at low volume — the actual infrastructure cost is $0.10-0.15/video, which is cheaper than every managed alternative. At 100+ videos/month, the amortized cost drops to $0.15-0.20/video, making Remotion Lambda the cheapest option that gives you full control over the composition code.


Dinesh Challa is an AI Product Manager building production software with Claude Code. Follow him on LinkedIn.

Published April 11, 2026. Part of a series on building production AI products as a solo founder.