Inngest vs Temporal vs Step Functions vs Airflow: A 4-Layer Mental Model for Workflow Engines
April 11, 2026 · 14 min read · AI Agents / Infrastructure
Last Updated: 2026-04-11
The workflow engine market is confusing because four fundamentally different layers of abstraction all call themselves "workflow orchestrators." Layer 1 is await-inline (just await the async call). Layer 2 is background queues (Redis, SQS). Layer 3 is workflow engines (Inngest, Temporal). Layer 4 is orchestration platforms (Airflow, Dagster). Most web apps should start at L1 and only move up when they have seen the specific pain that justifies the complexity. Choosing the wrong layer is a category error, not a tooling preference.
Why is the workflow engine landscape so confusing in 2026?
Search "best workflow engine" and you get a list that includes Inngest, Temporal, AWS Step Functions, Airflow, Prefect, Dagster, BullMQ, Trigger.dev, and a dozen others. They all claim to handle "workflows." They all have retry logic. They all have dashboards. But they solve fundamentally different problems, and picking one for the wrong category creates pain that no amount of configuration can fix.
The confusion exists because the word "workflow" spans four layers of abstraction. A Stripe webhook handler that takes 200ms is a "workflow." A 47-step tax document extraction pipeline is a "workflow." A nightly ETL job that moves 2TB from S3 to Snowflake is a "workflow." The tools built for each category overlap in marketing but barely overlap in optimal use cases.
I learned this the hard way while building a tax-tech application on Vercel and Next.js. We had three background jobs that kept dying silently -- Vercel killed the serverless container before the async work finished. The natural instinct was to reach for the most powerful workflow engine available. Instead, I spent a week researching every major option, deployed one to production, and rejected another after calculating operational costs. This post is the mental model that emerged from that process. I wrote separately about the await-inline migration heuristic that informed these decisions.
What are the four layers of workflow abstraction?
Every background job in a web application belongs to one of four layers. The layers are ordered by complexity, operational cost, and capability. The single most important decision is choosing the right layer -- not the right tool within a layer.
L1 Await inline -- the simplest option most teams skip
Just await the async work directly in your request handler. No new dependencies. No new infrastructure. No new deployment pipeline. If the work completes in under 300 seconds on a serverless platform (Vercel's Pro tier ceiling, per their documentation), this is often all you need.
// L1: Await inline — simplest possible pattern
// app/api/returns/[id]/confirm/route.ts
export async function POST(req: Request) {
const { returnId } = await req.json();
// Just await the work. No queue. No worker. No new vendor.
const extraction = await extractDocument(returnId); // ~8s
const intelligence = await runIntelligence(returnId); // ~12s
await sendConfirmationEmail(returnId); // ~2s
// Total: ~22s. Well within Vercel's 300s maxDuration.
return Response.json({ status: "confirmed" });
}
export const maxDuration = 300; // Vercel Pro ceiling
L1 works when: the task completes in under 300 seconds, the caller can wait (admin paths, internal tools, API-to-API calls), and you have fewer than five background job types. According to Vercel's 2025 usage data, 78% of serverless function invocations complete in under 10 seconds. Most teams are at L1 and do not know it.
L1 fails when: the work takes longer than your platform's timeout, the caller is a user who cannot tolerate a 30-second spinner, or you need retry-with-backoff on external API failures. When any of those become true -- and not before -- you move to L2.
L2 Background job queue -- fire-and-forget with retry
A queue accepts a message, persists it, and delivers it to a worker with retry semantics. Redis/BullMQ, AWS SQS, Supabase pgmq, and Upstash QStash all live here. The mental model is: "send this message, retry on failure, I do not need to coordinate multiple steps."
L2 is right for: email sending, webhook delivery, image processing, search indexing, single-step jobs where the only orchestration is "retry if it fails." You do not need Temporal for sending an email. You do not need Airflow for resizing an image.
L2 fails when: you need multi-step coordination (step A feeds into step B, and if step B fails, you need to compensate step A), fan-out (one event triggers N parallel handlers), or human-in-the-loop approval gates. That is when you need L3.
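The core semantic an L2 queue buys you is retry with backoff. As a rough sketch of what tools like BullMQ implement for you (the helper and delays here are illustrative, not any library's API):

```typescript
// Minimal sketch of retry-with-exponential-backoff, the semantic an
// L2 queue provides out of the box. Illustrative only — a real queue
// also persists the job so it survives process death.
async function retryWithBackoff<T>(
  job: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await job();
    } catch (err) {
      lastError = err;
      // Backoff doubles each attempt: 100ms, 200ms, 400ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Note what is missing: there is no way to retry step B without re-running step A, because the unit of retry is the whole job. That gap is exactly what L3 adds.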
L3 Workflow engine -- step functions with durable execution
Inngest, Temporal, and AWS Step Functions live here. The defining characteristic is step-level retry and state persistence: if step 3 of a 5-step workflow fails, the engine retries step 3 without re-running steps 1 and 2. The workflow's state is durable across failures, deploys, and container restarts.
L3 is right for: multi-step pipelines (document upload, extraction, intelligence, notification), sagas with compensation (charge the card, book the hotel, if the flight fails then refund the card and cancel the hotel), long-running processes that span hours or days, and agent workflows where an AI makes decisions across multiple steps.
I deployed Inngest to production for a 5-step walkthrough generation pipeline: prepare context, generate script via Claude, render images, generate audio via ElevenLabs, persist to storage. Each step has independent retry. When ElevenLabs rate-limits at step 4, Inngest retries step 4 three times with exponential backoff. Steps 1-3 are not re-executed. The total setup time was 45 minutes from npm install inngest to the first successful production run. This pipeline is part of a broader agentic framework where AI systems act autonomously on behalf of users.
L4 Orchestration platform -- DAG-based scheduling for data pipelines
Airflow, Dagster, Prefect, Luigi, and Argo Workflows live here. The defining characteristic is DAG-based scheduling: directed acyclic graphs of tasks that run on a schedule, moving data between systems. The mental model is not "an event fires a function" but "a graph of tasks runs at 2 AM every night."
L4 is right for: nightly ETL (extract from Postgres, transform, load to Snowflake), ML training pipelines, BI report generation, dbt orchestration, batch processing of millions of records. If the tool's homepage example code moves data from S3 to a data warehouse, you are looking at L4.
L4 is wrong for: web application background jobs. Using Airflow to send a confirmation email after a user action is a category error. It works, technically, but you are paying for a scheduler + executor + metadata database + web server running 24/7 to handle event-driven work that should be a function invocation.
The key insight: "Workflow engines for apps" (L3) and "workflow engines for data" (L4) are sibling categories, not parent-child. They both got called "workflow orchestrators" by their marketing teams, which created the category confusion. Their optimal use cases barely overlap. Picking one for the other category is a category error that creates pain on day one.
How do Inngest, Temporal, Step Functions, Airflow, and BullMQ actually compare?
This table reflects my evaluation as of April 2026, including hands-on deployment of Inngest and detailed cost analysis of Temporal for a production tax-tech application running on Vercel + Supabase.
| Dimension | Inngest | Temporal | AWS Step Functions | Airflow | BullMQ |
|---|---|---|---|---|---|
| Layer | L3 (app workflow) | L3 (app workflow) | L3 (app workflow) | L4 (data orchestration) | L2 (queue) |
| Learning curve | 45 min to first function | 3-5 days (determinism model) | 1-2 days (ASL JSON) | 1+ week (DAGs, executors) | 2-4 hours |
| Pricing at startup scale | Free (50K steps/mo) | ~$200/mo + worker hosting | Free tier + AWS overhead | $200-300/mo managed | Redis cost only (~$10/mo) |
| Self-host option | Yes (open-source) | Yes (open-source) | No (AWS-only) | Yes (Apache project) | Yes (Redis) |
| Retry granularity | Per step | Per activity | Per state | Per task | Per job (whole job) |
| Observability | Built-in dashboard, per-step traces | Temporal UI + Grafana | CloudWatch + X-Ray | Airflow UI + logs | Bull Board or custom |
| TypeScript support | Native, first-class | Good (TS SDK) | CDK only (ASL is JSON) | Python-first | Native |
| Serverless compatible | Yes (designed for it) | No (needs worker process) | Yes (AWS Lambda native) | No (scheduler + workers) | No (needs worker process) |
| Cold start impact | None (runs in your existing functions) | Worker always warm | Lambda cold start applies | Workers always warm | Worker always warm |
| Fan-out | Yes (event-driven) | Yes (child workflows) | Yes (parallel states) | Yes (dynamic tasks) | Manual (publish N jobs) |
| Human-in-the-loop | step.waitForEvent | Signals + queries | Callback tasks | Sensors (polling) | Not built-in |
| Community (GitHub stars) | ~5K (growing fast) | ~12K (mature) | N/A (AWS product) | ~37K (very mature) | ~6K |
| Vendor lock-in | Medium (TS code portable) | Medium (open-source) | High (AWS-only, ASL JSON) | Medium (Apache standard) | Low (Redis standard) |
| Best for | TS teams on serverless, <10K workflows/day | Backend teams needing durable execution at scale | AWS-native shops | Data engineering teams | Simple retry queues |
What does real Inngest deployment look like in production?
Here is exactly what happened when I deployed Inngest to a production Vercel application in April 2026. No vendor cherry-picking -- the good and the rough edges.
Setup: 45 minutes from install to first production run
The total file diff was 6 files changed. One new API route (/api/inngest) to serve as the Inngest endpoint. One client factory file with lazy initialization. One function definition file. Three modified files to replace fire-and-forget async calls with inngest.send() calls. No new infrastructure. No worker process. No separate deployment pipeline.
// lib/inngest/client.ts — lazy factory (no module-level init)
import { Inngest } from "inngest";
let _client: Inngest | null = null;
export function getInngest(): Inngest {
if (!_client) {
_client = new Inngest({ id: "my-tax-app" });
}
return _client;
}
// lib/inngest/functions/walkthrough.ts — 5-step durable pipeline
export const generateWalkthrough = getInngest().createFunction(
{
id: "generate-walkthrough",
retries: 3,
// Each step gets independent retry + its own lambda lifetime
triggers: [{ event: "app/walkthrough.requested" }],
},
async ({ event, step }) => {
// Step 1: Prepare context (~2s)
const context = await step.run("prepare-context", async () => {
return await fetchReturnContext(event.data.returnId);
});
// Step 2: Generate script via Claude (~15s)
const script = await step.run("generate-script", async () => {
return await generateScript(context);
});
// Step 3: Render images (~20s)
const images = await step.run("render-images", async () => {
return await renderSlideImages(script);
});
// Step 4: Generate audio via ElevenLabs (~30s)
// If ElevenLabs rate-limits, Inngest retries THIS step only
const audio = await step.run("generate-audio", async () => {
return await generateAudio(script);
});
// Step 5: Persist everything to storage (~5s)
await step.run("persist-walkthrough", async () => {
return await persistWalkthrough(event.data.returnId, {
script, images, audio,
});
});
}
);
The Vercel integration installed cleanly. The Inngest dashboard showed the function registered within 30 seconds of deploy. Local development with npx inngest-cli dev provided a local dashboard with step-by-step execution traces -- genuinely useful for debugging, not just a status page.
What went well
- Zero new infrastructure: Inngest runs inside the existing Vercel deployment. No worker process, no separate cloud account, no parallel CI/CD pipeline. For a small team (PM + AI agents), this is the deciding factor.
- Per-step observability: When ElevenLabs returned a 429 at step 4, the Inngest dashboard showed exactly which step failed, what the payload was, and when the retry would fire. In the old fire-and-forget model, this failure was invisible.
- Free tier covers startup scale: 50,000 step executions per month. Our tax application processes roughly 200 walkthroughs per month at 5 steps each = 1,000 step executions. We are at 2% of the free tier.
- v4 API is clean: createFunction takes 2 args (options + handler), triggers go inside options, and step.run callbacks must be idempotent (which is good discipline anyway).
What was rough
- v3-to-v4 migration docs are thin: The function signature changed from 3 args to 2 args between major versions. I hit a TypeScript error that took 20 minutes to debug because the migration guide did not highlight this clearly.
- Event payload typing requires manual effort: Inngest v4 does not auto-infer event payload types from step.run. You either set up Zod schemas (more boilerplate) or cast inline (event.data as { returnId: string }).
- Debugging a stuck function required dashboard familiarity: When a function hung, I needed to navigate the Inngest dashboard's Runs tab, expand the failed step, and read the stack trace. The dashboard is good, but it is a new tool to learn.
Why did I reject Temporal for this specific application?
Temporal is the technically superior workflow engine. It handles millions of workflows per day at Uber, Netflix, and Stripe. Its durable execution model is battle-tested across a decade of production usage (originating as Cadence at Uber). If I were hiring a platform engineering team and building a system that processed 10,000+ workflows per day, Temporal would be my first choice.
I rejected it for one structural reason: Temporal requires a long-running worker process. My application runs on Vercel, which is serverless. There is no place to run a persistent worker without adding a second compute provider (Fly.io, Railway, ECS, Kubernetes). That single architectural mismatch disqualified Temporal regardless of its technical superiority.
The cost analysis reinforced the decision:
- Temporal Cloud: ~$200/month minimum + worker hosting (~$20-50/month on Fly.io) = ~$2,700-3,000/year
- Inngest: $0/year at current scale (free tier)
- Difference: ~$3,000/year for 3 background jobs processing ~200 events/month
The Temporal deterministic workflow constraint added cognitive overhead: you cannot use Date.now(), Math.random(), or direct HTTP calls inside workflow functions. All side effects must go through activities. This is architecturally sound but represents a 1-2 week learning curve and ongoing cognitive tax for a small team. Temporal's documentation on workflow determinism is thorough but underscores the paradigm shift required.
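The determinism constraint exists because Temporal re-executes workflow code on replay and must reach identical decisions each time, so every nondeterministic value has to be recorded once and read back from history. A toy illustration of the replay idea (my own sketch, not Temporal's actual API):

```typescript
// Illustrative sketch of record-then-replay — the reason Date.now()
// and Math.random() are banned inside workflow code. Not Temporal's
// API; a real engine persists the history and versions it.
type History = unknown[];

function makeActivityRunner(history: History, replaying: boolean) {
  let cursor = 0;
  return async function activity<T>(fn: () => Promise<T>): Promise<T> {
    if (replaying) return history[cursor++] as T; // read recorded result
    const result = await fn(); // side effect runs exactly once
    history.push(result); // record for future replays
    return result;
  };
}
```

A nondeterministic call like Math.random() wrapped in an activity yields the same value on replay as it did on first execution, which is what keeps the replayed workflow on the same code path.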
Important nuance: Rejecting Temporal for THIS application is not rejecting Temporal. If I hire a platform engineer, scale to 10,000+ workflows per day, or need polyglot workflows (Go + Python + TypeScript), Temporal becomes the right answer. The migration from Inngest to Temporal is feasible because the business logic inside step.run callbacks is plain TypeScript -- only the wiring is Inngest-specific. Choosing Inngest now does not lock you out of Temporal later.
When should you use each option? A decision tree
After evaluating all five options against a real production application, here is the decision tree I now use for every new background job:
- Can the work complete in <300s and the caller can wait? Use L1 (await inline). Add maxDuration = 300 on Vercel. No new dependencies. This covers the vast majority of admin-path and API-to-API workflows.
- Is it a single-step job that just needs retry? Use L2 (BullMQ, SQS, QStash, or Supabase pgmq). Email sends, webhook deliveries, image resizing. One message, one handler, retry on failure.
- Is it a multi-step pipeline where steps have independent failure modes? Use L3 (Inngest if serverless/TS-first, Temporal if you have workers and need scale). Document extraction pipelines, payment sagas, AI agent workflows.
- Is it a scheduled data pipeline (ETL, ML training, BI reports)? Use L4 (Airflow, Dagster, Prefect). Note: L4 coexists alongside L3. A mature application might use Inngest for app workflows AND Airflow for data pipelines.
The critical heuristic: start at L1, only move up when you have seen the specific pain that justifies the complexity. Not "we might need retry someday" but "we lost 3 walkthrough generations last week because Vercel killed the container at second 45." Real pain, not anticipated pain. I documented this heuristic in detail after living through the migration.
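The decision tree above condenses into a small function. The four input questions mirror this post's heuristics, and the 300-second threshold is Vercel Pro's ceiling discussed earlier (the type and names are my own shorthand):

```typescript
// The L1-L4 decision tree as a function. Inputs mirror the questions
// in the tree; names are this post's shorthand, not any library's.
type Job = {
  durationSeconds: number;
  callerCanWait: boolean;
  multiStep: boolean;
  scheduledDataPipeline: boolean;
};

function chooseLayer(job: Job): "L1" | "L2" | "L3" | "L4" {
  if (job.scheduledDataPipeline) return "L4"; // Airflow, Dagster, Prefect
  if (job.multiStep) return "L3"; // Inngest, Temporal, Step Functions
  if (job.durationSeconds < 300 && job.callerCanWait) return "L1"; // await inline
  return "L2"; // BullMQ, SQS, QStash, pgmq
}
```

The ordering matters: scheduled data pipelines short-circuit to L4 regardless of duration, and multi-step coordination forces L3 even when each individual step is fast.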
What are the three most common workflow engine mistakes?
Mistake 1: Starting at L3 when L1 would suffice
I see this in nearly every "how I built my SaaS" blog post: the team adopts Temporal or Inngest on day one for three background jobs that each take 8 seconds. They spend a week on setup, learn a new mental model, add monitoring for a system that processes 50 events per day, and debug queue-specific edge cases that would not exist if they had just awaited the work inline.
The await-inline pattern has zero operational surface area. No dashboard to monitor. No dead-letter queue to drain. No event schema to maintain. No vendor dependency to evaluate. For a bootstrapping startup with fewer than 5 background job types, this simplicity is a feature, not a limitation. I ran an entire tax application on await-inline for months before the first workflow engine was justified.
Mistake 2: Using Airflow for web application background jobs
Airflow is a phenomenal tool for data engineering. It is the wrong tool for handling a Stripe webhook. The mismatch is structural:
- Airflow is schedule-first; web app jobs are event-first
- Airflow requires a scheduler + executor + metadata database running 24/7; web app jobs should be ephemeral
- Airflow is Python-first; most modern web apps are TypeScript
- Managed Airflow (Astronomer, MWAA, Cloud Composer) starts at $200-300/month; Inngest is free at startup scale
I learned to spot this category error by reading the homepage example code. If the example moves data from S3 to Snowflake, it is data orchestration (L4). If the example handles a Stripe webhook or processes a user upload, it is application orchestration (L3). The example shape reveals the tool's true category better than any feature comparison.
Mistake 3: Building a custom queue when Inngest exists
Before Inngest existed (pre-2022), building a custom queue on top of Redis or Postgres was the only option for small teams on serverless. You would create a jobs table, write a cron that polls for pending rows, process the oldest row, handle retries manually, build your own dead-letter logic, and instrument your own monitoring. This worked, but it was 300-500 lines of undifferentiated glue code.
In 2026, building a custom queue for a web application is like writing your own ORM: technically possible, educationally valuable, and almost always the wrong business decision. Inngest's free tier covers 50,000 step executions per month. Unless you have compliance requirements that prevent using a third-party service, the build-vs-buy math has shifted decisively toward buy. This mirrors a pattern I have seen across my career: the infrastructure layer commoditizes, and teams that build undifferentiated infrastructure instead of product features lose to teams that do not.
How should you evaluate ANY infrastructure choice?
The workflow engine evaluation taught me a three-question framework that applies to every infrastructure decision:
- Does it match the deployment topology you have already committed to? If you are on Vercel and the tool requires a long-running process, the answer is no, regardless of how technically superior the tool is. Operational tax of running parallel infrastructure topologies wipes out feature advantages.
- Does it match the cognitive bandwidth your team has? A PM building with AI agents has different bandwidth than a 10-person backend team. The tool should feel like "writing the same code I already write, with one new wrapper" rather than "learning a new paradigm."
- What is the lock-in cost if you are wrong? Business logic should be portable; only the wiring should be vendor-specific. Inngest's step.run callbacks contain plain TypeScript. If Inngest disappears tomorrow, the migration to Temporal is rewriting the wiring, not the logic.
When you compare two tools and they share the same marketing keyword ("workflow," "agent," "queue," "automation") but solve different problems, read the homepage example code. The example is the most important customer's most common use case. It tells you more about the tool's true category than any feature comparison matrix.
What does migration between layers look like in practice?
One concern teams raise: "If I start at L1, will it be painful to migrate to L3 later?" In my experience, no. The migration from await-inline to Inngest took 45 minutes for the first function. Here is why:
// BEFORE: L1 await-inline in the route handler
export async function POST(req: Request) {
const { returnId } = await req.json();
await extractDocument(returnId); // Just awaited
await runIntelligence(returnId); // Just awaited
await sendEmail(returnId); // Just awaited
return Response.json({ ok: true });
}
// AFTER: L3 Inngest — same business logic, new wiring
// Route handler (7 lines replacing 47-line fire-and-forget IIFE):
export async function POST(req: Request) {
const { returnId } = await req.json();
await getInngest().send({
name: "app/document.confirmed",
data: { returnId },
});
return Response.json({ ok: true });
}
// Inngest function (business logic is identical):
export const processDocument = getInngest().createFunction(
{ id: "process-document", triggers: [{ event: "app/document.confirmed" }] },
async ({ event, step }) => {
await step.run("extract", () => extractDocument(event.data.returnId));
await step.run("intelligence", () => runIntelligence(event.data.returnId));
await step.run("notify", () => sendEmail(event.data.returnId));
}
);
The business logic functions (extractDocument, runIntelligence, sendEmail) did not change. They moved from being called in the route handler to being called inside step.run wrappers. The migration cost was wiring, not logic. This is why starting at L1 is low-risk: the upgrade path is cheap and well-defined.
What does the 2026 landscape look like for each category?
| Layer | Leading tools (2026) | Trend |
|---|---|---|
| L1 (Await inline) | Native platform support (Vercel maxDuration, Lambda timeout) | Platform limits keep increasing; Vercel Pro now supports 300s, up from 60s in 2023 |
| L2 (Queue) | BullMQ, Upstash QStash, SQS, Supabase pgmq | Serverless queues (QStash, pgmq) gaining over self-hosted Redis |
| L3 (Workflow engine) | Inngest, Temporal, Trigger.dev, AWS Step Functions | Inngest and Trigger.dev winning serverless teams; Temporal dominates enterprise; Step Functions stable but AWS-locked |
| L4 (Orchestration platform) | Airflow, Dagster, Prefect | Dagster gaining on Airflow for new projects; Prefect strong on developer experience; Airflow still dominant by installed base |
The most significant trend: L3 tools are moving down-market. Inngest's free tier and 45-minute setup time make workflow engines accessible to solo developers and tiny teams. Two years ago, the smallest team that could justify a workflow engine was 3-5 engineers. Today, a PM building with AI coding assistants can deploy Inngest in under an hour. This compression is changing when teams should move from L1 to L3 -- the complexity cost is dropping, which means the pain threshold for adoption is lower.
Frequently Asked Questions
Is Inngest production-ready in 2026?
Yes. Inngest has been generally available since 2023, raised a Series A, and serves thousands of production applications. I deployed it to a production tax-tech application in April 2026. The v4 SDK is stable, the dashboard provides per-step observability, and the Vercel integration installs cleanly. The main risk is vendor viability (Series A company), but the business logic inside step.run is portable TypeScript, so migration cost is bounded.
Can Temporal run on Vercel or other serverless platforms?
No. Temporal requires a long-running worker process that polls the Temporal server for tasks. Serverless platforms like Vercel and Cloudflare Workers do not support persistent processes. You would need a separate compute layer (Fly.io, Railway, ECS, or Kubernetes) to run Temporal workers. This is the structural reason Inngest wins for serverless-first teams.
Should I use Airflow for my web application's background jobs?
Almost certainly not. Airflow is optimized for scheduled data pipelines (ETL, ML training, BI reports), not event-driven application workflows. If your jobs are triggered by user actions (uploads, purchases, form submissions), use Inngest or Temporal (L3). If your jobs are nightly batch processing that moves data between systems, Airflow is excellent. Many mature applications use both: L3 for app workflows, L4 for data pipelines.
What is the migration path from Inngest to Temporal if I outgrow Inngest?
The business logic inside Inngest step.run callbacks is plain TypeScript. Migrating to Temporal means rewriting the wiring (event triggers become workflow starters, step.run becomes activities, the serve route becomes a worker process) while keeping the business logic unchanged. Estimated effort: 1-2 days per function for a team familiar with Temporal. The key is to keep step.run callbacks as thin wrappers around standalone business logic functions.
How do I know when to move from L1 (await inline) to L3 (workflow engine)?
Move when you have experienced at least one of these concrete pains: (1) your serverless platform killed a container before async work completed, losing user data, (2) you need to retry step 3 of a 5-step pipeline without re-running steps 1-2, (3) you have a workflow that spans hours or days and needs to survive deploys. Do not move because you anticipate needing it someday. The migration from L1 to L3 takes under an hour, so the cost of waiting is near zero.
Dinesh Challa is an AI Product Manager building production software with Claude Code. Follow him on LinkedIn.
Published April 11, 2026. Based on deploying Inngest to a production Vercel application, rejecting Temporal after cost analysis, and evaluating Airflow, Step Functions, and BullMQ for a tax-tech platform with 3 background job types processing ~200 events/month.