Functions
Azure Functions is the Lambda of Azure, but the comparison undersells the one idea it gets more right than anyone: bindings. You declare where the data comes from and where it goes, and the runtime moves it. No client construction, no retry loops, no serialisation code. This page covers the bindings model, the four hosting plans and how to choose, the scale controller that decides how many instances you get, cold starts, Durable Functions and the replay rule that trips everyone once, and what it all costs. It ends with a CLI lab you can run in fifteen minutes.
One app, many functions
Before anything else, get the unit of deployment right, because it is not the function. It is the function app. An app is a collection of functions that deploy together, share a hosting plan, share configuration and connection strings, run in the same process (or worker), and scale together as one unit. If your app has an HTTP function and three queue workers, a flood of queue messages scales out instances that also carry the HTTP function, and vice versa. That is a real architectural decision: group functions that share a lifecycle and a scaling profile into one app, and split anything with a different traffic shape into its own.
The app has one host.json at its root, and it configures the runtime for every
function inside: queue batch sizes, retry policies, logging and sampling, Durable Functions
settings, HTTP concurrency limits. Per-function configuration lives next to the code, either
as a function.json file in the older programming models or as decorators and
attributes in the newer ones. The split matters in practice. If one queue function needs a
batch size of 1 because it does heavy work per message, that host.json setting
applies to every queue function in the app, which is another reason to split apps by workload
shape rather than by org chart.
This is the first big divergence from AWS, where each Lambda function is its own unit with its own memory size, concurrency settings, and scaling behaviour. Lambda gives you fine control per function; Functions gives you cohesion per app. Neither is strictly better. The app model means one deploy ships a whole microservice, shared startup code runs once per instance, and a single connection pool serves every function in the app. The cost is that you cannot tune one function's instance size independently of its neighbours.
Triggers and bindings, the actual differentiator
Every function has exactly one trigger: the event that causes it to run. A message landing on a storage queue, a blob being created, a Cosmos DB change-feed entry, an Event Hubs batch, an HTTP request, a timer. The trigger also delivers the payload, already deserialised, as the function's input parameter. So far this is the same shape as Lambda's event sources.
The part Lambda does not have is input and output bindings. A binding is a declarative connection to data: "also fetch me this blob," "write my return value to this Cosmos container," "send this object to that Service Bus topic." You declare the binding, and the runtime owns the client, the connection, the retries, and the serialisation. The equivalent Lambda function imports an SDK, builds a client outside the handler, handles pagination and encoding, and wraps it in error handling. With bindings, that plumbing simply is not in your file. Here is a complete function that triggers on a queue message and writes a result blob, in the Python v2 model:
```python
import json

import azure.functions as func

app = func.FunctionApp()

@app.queue_trigger(arg_name="msg", queue_name="orders",
                   connection="AzureWebJobsStorage")
@app.blob_output(arg_name="receipt",
                 path="receipts/{id}.json",
                 connection="AzureWebJobsStorage")
def process_order(msg: func.QueueMessage, receipt: func.Out[str]) -> None:
    # The trigger has already fetched and decoded the queue message.
    order = json.loads(msg.get_body())
    # Setting the output binding is the whole write; the host uploads the blob.
    receipt.set(json.dumps({
        "id": order["id"],
        "total": order["qty"] * order["price"],
        "status": "confirmed",
    }))
```

Notice what is absent. There is no storage client, no account URL, no credential handling, no upload call, no base64 decode of the queue message. The path on the output binding even contains a binding expression: {id} is resolved at runtime from the JSON body of the triggering message, so each order writes a blob named after itself. Binding expressions can pull from trigger metadata, message properties, and app settings, which covers a surprising share of the routing logic you would otherwise write by hand.
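To make the {id} substitution concrete, here is a toy model of what a binding expression does with a JSON trigger payload. It is a sketch, not how the Functions host implements it: resolve_path is a hypothetical helper that only handles top-level JSON fields, whereas the real runtime also draws on trigger metadata and app settings.

```python
import json
import re

# Matches {name} placeholders in a binding path such as "receipts/{id}.json".
_EXPR = re.compile(r"\{(\w+)\}")


def resolve_path(template: str, message_body: str) -> str:
    """Fill {name} placeholders from top-level fields of a JSON message body."""
    payload = json.loads(message_body)

    def lookup(match):
        key = match.group(1)
        if key not in payload:
            # This sketch simply raises; it does not model the host's error handling.
            raise KeyError(f"binding expression {{{key}}} not found in message")
        return str(payload[key])

    return _EXPR.sub(lookup, template)


print(resolve_path("receipts/{id}.json", '{"id": "42", "qty": 3, "price": 9.5}'))
# receipts/42.json
```

The point of the model is that the routing decision (which blob name to write) is data in the decorator, not code in the function body.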
Bindings exist for most of the platform: storage queues and blobs, Cosmos DB, Service Bus and Event Hubs, Event Grid, SignalR, SendGrid, Kafka on some plans, and more through extension packages. Triggers and bindings are implemented as extensions to the Functions host, referenced as packages in .NET projects and pulled in through extension bundles (configured in host.json) for other languages. That is why a queue-triggered .NET app and a queue-triggered Python app behave identically at the polling layer: it is the same host code underneath.
The hosting plans
Functions separates what your code does from where it runs, and the "where" is the hosting plan. There are four mainline choices, and the decision drives cost, cold-start behaviour, networking, and how long a single execution may run.
Consumption is the classic serverless deal: the platform allocates instances when events arrive, bills per execution, and scales to zero when traffic stops. The catch is the cold start when scaling from zero, a 10-minute ceiling on execution time, an instance size fixed at 1.5 GB, and no virtual-network integration. Flex Consumption is the newer take on the same idea: still scale-to-zero and pay-per-use, but with per-function-group scaling, a choice of instance memory sizes, VNet support, much faster scale-out, and optional always-ready instances that hold a floor of warm capacity for a baseline fee. For new event-driven apps it is usually the right default, and Microsoft positions it as the successor to classic Consumption.
Premium (the Elastic Premium SKUs) keeps at least one instance warm at all times and maintains a pool of pre-warmed spares, so requests never see a cold start. It adds VNet integration, larger instances up to 14 GB, and executions that can run for an hour or more. You pay for the warm instances around the clock, so an idle Premium app costs real money every month.

Dedicated means running the Functions host on an App Service plan you already own: the same VMs that serve your web apps. There is no scale controller magic; scaling follows the plan's autoscale rules, and you switch on Always On so the host is not unloaded at idle. It makes sense when the plan already exists and has spare capacity, or when you want fully predictable cost.
| Plan | Scale to zero | Cold starts | VNet | Max execution time | Pick it when |
|---|---|---|---|---|---|
| Consumption | Yes | Yes | No | 10 min | spiky or low traffic, latency tolerant, lowest cost |
| Flex Consumption | Yes | Yes, unless always-ready | Yes | Hours | new apps; serverless economics plus networking |
| Premium | No | No | Yes | An hour or more | latency-sensitive APIs, steady traffic, big workloads |
| Dedicated | No | No (Always On) | Yes | Unbounded | existing App Service plan, fixed-cost budgeting |
One more option exists for completeness: hosting the Functions runtime in a container on Azure Container Apps or your own Kubernetes with KEDA. That buys portability and custom images at the price of running more of the stack yourself; treat it as the escape hatch, not the default.
The scale controller
On Consumption and Premium, the thing deciding how many instances your app gets is the scale controller, a platform component that watches your event sources from the outside. It does not wait for your function to be slow; it inspects the sources directly. For a storage queue it tracks the queue length and the age of the oldest message. For Service Bus it watches the message count on the subscribed entity. For Event Hubs it looks at unprocessed events per partition. For HTTP it reacts to request concurrency. From those signals it votes to scale out, scale in, or hold, and the platform allocates or reclaims instances accordingly.
Modern runtimes use target-based scaling, which replaces the older incremental vote with simple arithmetic: desired instances equals unprocessed events divided by a per-instance target (for example, a default target of 16 messages per instance for storage queues). That lets the platform jump from 1 to 8 instances in one decision instead of stepping up one at a time. There are still rate limits on how fast new instances appear, and ceilings on how far it goes: classic Consumption tops out around 200 instances on Windows and 100 on Linux, Flex goes to 1,000, and an Event Hubs trigger can never usefully exceed the partition count of the hub, because each partition is owned by one instance at a time.
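The target-based arithmetic fits in a few lines. This is a simplified sketch, not the scale controller: desired_instances is a hypothetical helper, and it ignores the rate limits on adding instances and any minimum or always-ready floor. The target of 16 mirrors the storage-queue default quoted above; the Event Hubs target of 100 in the last call is an assumed example value.

```python
import math
from typing import Optional


def desired_instances(unprocessed: int, target_per_instance: int,
                      max_instances: int, partitions: Optional[int] = None) -> int:
    """Simplified target-based scaling: ceil(backlog / target), then apply the caps."""
    if unprocessed <= 0:
        return 0  # nothing waiting: a scale-to-zero plan can drop to zero
    wanted = math.ceil(unprocessed / target_per_instance)
    if partitions is not None:
        # Event Hubs: each partition is owned by one instance, so extras would sit idle.
        wanted = min(wanted, partitions)
    return min(wanted, max_instances)


print(desired_instances(128, 16, max_instances=200))                     # 8
print(desired_instances(10_000, 16, max_instances=200))                  # 200, plan ceiling
print(desired_instances(5_000, 100, max_instances=1_000, partitions=4))  # 4, partition ceiling
```

The first call is the "1 to 8 in one decision" case: the controller does not need eight rounds of voting to get there.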
Two practical consequences. First, each instance runs the whole app and each queue-triggered
function pulls messages in batches (configured in host.json), so a single
instance can chew through hundreds of messages before scale-out is ever needed; do not expect
one instance per message. Second, because the controller watches source metrics rather than
your code, a poison message that keeps failing still looks like "queue not draining" and keeps
instances allocated. Keep the maximum dequeue count in host.json low (it defaults to 5) so
poison messages move to the poison queue instead of recycling.
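Both consequences show up in a toy model of one instance draining a queue. It is a sketch under stated assumptions, not the queue extension: drain and handler are hypothetical names, the batch size of 16 and dequeue limit of 5 mirror the host.json defaults, and visibility timeouts and the host's prefetching of the next batch are not modelled.

```python
from collections import deque
from typing import Callable, List, Tuple


def drain(messages: List[str], handler: Callable[[str], None],
          batch_size: int = 16, max_dequeue_count: int = 5) -> Tuple[List[str], List[str], int]:
    """Fetch up to batch_size messages at a time, retry failures, and move a message
    to the poison list once it has been dequeued max_dequeue_count times."""
    pending = deque((body, 0) for body in messages)  # (body, dequeue count so far)
    processed: List[str] = []
    poison: List[str] = []
    batches = 0
    while pending:
        batches += 1
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        for body, dequeues in batch:
            dequeues += 1
            try:
                handler(body)
                processed.append(body)
            except Exception:
                if dequeues >= max_dequeue_count:
                    poison.append(body)  # the real host writes it to <queue>-poison
                else:
                    pending.append((body, dequeues))  # visible again for a later fetch
    return processed, poison, batches


def handler(body: str) -> None:
    if body == "13":
        raise ValueError("cannot process order 13")


done, poisoned, batches = drain([str(i) for i in range(1, 51)], handler)
print(len(done), poisoned, batches)  # 49 ['13'] 7
```

Fifty messages need only four batches of real work on a single instance; the last three batches exist only because one bad message is retried until it is poisoned.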
Cold starts, and what actually helps
When the controller scales from zero, the first event pays for instance allocation: the platform finds a worker, mounts or downloads your app package, starts the Functions host, loads your language worker, runs your imports and any module-level initialisation, and only then invokes the function. On classic Consumption that is commonly a few hundred milliseconds to several seconds, with the spread driven mostly by your own package: a small .NET or JavaScript app lands near the bottom, a Python app importing pandas lands near the top. The physics are the same as a Lambda cold start; the mitigation menu is different.
What works, in rough order of effectiveness: move to Premium and let the pre-warmed pool absorb scale-out, so no request ever sees a cold host; use Flex Consumption with always-ready instances to keep a warm floor while still paying serverless rates for the burst; deploy with run-from-package so the app starts from a single zip instead of thousands of small files; and cut import-time work, lazy-loading heavy modules inside the function body when only some invocations need them. Timer-based "keep warm" pings, the folk remedy, only keep one instance warm and do nothing for the instances created during scale-out, which is precisely when users notice.
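Lazy loading in Python needs no special machinery, because a module is cached in sys.modules after its first import. The sketch below assumes a hypothetical handler named handle that a decorated function would call; the standard-library statistics module stands in for a genuinely heavy dependency such as pandas.

```python
import json


def handle(body: str) -> dict:
    order = json.loads(body)
    if order.get("kind") != "report":
        # Fast path: most invocations never pay for the heavy import.
        return {"id": order["id"], "status": "ok"}
    # Slow path: the import cost is paid on first use per instance, then cached.
    import statistics  # stand-in for a heavy module such as pandas
    return {"id": order["id"], "mean": statistics.mean(order["values"])}


print(handle('{"id": "1"}'))
print(handle('{"id": "2", "kind": "report", "values": [1, 2, 3]}'))
```

Moving the import out of module scope shortens every cold start, at the cost of a one-off delay on the first invocation that takes the slow path.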
Durable Functions, workflows as code
Plain functions are stateless and short-lived, which is exactly wrong for multi-step processes: call three APIs in order, fan out a thousand work items and collect the results, wait three days for a human to approve something. Durable Functions is an extension that adds stateful workflows on top of the same runtime, and unlike Step Functions, which describes the workflow as a JSON state machine, Durable workflows are ordinary code in your language.
Three function types do the work. An orchestrator function is the workflow: it calls other functions, awaits their results, branches, loops, and sleeps, using the language's own control flow. Activity functions are the steps: ordinary functions that do the actual I/O and computation. Entity functions are small named state holders, an actor-ish model for things like per-device counters that many orchestrations read and update. A classic fan-out/fan-in orchestrator looks like this:
```python
import azure.durable_functions as df

app = df.DFApp()

# list_days, rebuild_day and write_summary are activity functions
# registered elsewhere in the app with @app.activity_trigger.
@app.orchestration_trigger(context_name="context")
def rebuild_reports(context: df.DurableOrchestrationContext):
    days = yield context.call_activity("list_days", None)
    tasks = [context.call_activity("rebuild_day", d) for d in days]
    results = yield context.task_all(tasks)  # fan-out, then join
    yield context.call_activity("write_summary", results)
    return len(results)
```

How can an orchestrator "wait three days" when the platform kills idle code? It does not stay resident. Durable Functions is event-sourced: every action the orchestrator takes ("scheduled activity X," "activity X completed with result Y," "timer fired") is appended to a history table in a storage account, in a structure called the task hub. When the orchestrator awaits something, the function exits and the instance can be reclaimed. When the awaited result arrives, the framework runs the orchestrator again from the top, and every await it has already passed returns instantly from the recorded history instead of re-running the work. Execution proceeds to the first await with no recorded result, schedules it, and exits again. The orchestrator is therefore replayed many times, and its position in the workflow is reconstructed from history rather than held in memory.
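The replay loop is easier to believe once you have watched it run. What follows is a toy model, not the Durable SDK: ToyContext, run and three_steps are hypothetical names, activities execute synchronously in-process, and the history is a plain list rather than a task hub. It shows the orchestrator restarting from the top on every episode, recorded results being answered from history, and why unguarded logging repeats.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generator, List


@dataclass
class ToyContext:
    """Hypothetical stand-in for an orchestration context."""
    history: List[Any]  # activity results recorded so far, in scheduling order
    logs: List[str]
    position: int = 0   # how many recorded results this episode has consumed

    @property
    def is_replaying(self) -> bool:
        # True while re-running code that an earlier episode already ran.
        return self.position < len(self.history)


def run(orchestrator: Callable[[ToyContext], Generator],
        activities: Dict[str, Callable[[Any], Any]]):
    history: List[Any] = []
    logs: List[str] = []
    episodes = 0
    while True:
        episodes += 1
        ctx = ToyContext(history, logs)
        gen = orchestrator(ctx)  # every episode starts again from the top
        try:
            name, arg = next(gen)
            while ctx.position < len(history):
                result = history[ctx.position]  # answered from history, work not re-done
                ctx.position += 1
                name, arg = gen.send(result)
        except StopIteration as finished:
            return finished.value, episodes, logs
        # First call with no recorded result: run it, record it, end this episode.
        history.append(activities[name](arg))


def three_steps(ctx: ToyContext):
    results = []
    for city in ["Tokyo", "Seattle", "London"]:
        ctx.logs.append(f"naive: calling {city}")
        if not ctx.is_replaying:
            ctx.logs.append(f"guarded: calling {city}")
        results.append((yield ("greet", city)))
    return results


result, episodes, logs = run(three_steps, {"greet": lambda city: f"Hello {city}"})
print(result)  # ['Hello Tokyo', 'Hello Seattle', 'Hello London']
print(episodes,
      sum(line.startswith("naive") for line in logs),
      sum(line.startswith("guarded") for line in logs))  # 4 9 3
```

Three activities take four episodes, the unguarded log line fires nine times, and the guarded one fires exactly three times. If three_steps read the clock or drew a random number to choose its next call, a replay could diverge from the recorded history, which is exactly the failure the determinism rule exists to prevent.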
The replay rule follows directly: orchestrator code must be deterministic, because every replay has to make exactly the same decisions as the first run. No reading the clock (use context.current_utc_datetime), no random values or new GUIDs (use context.new_uuid() for IDs), no I/O, no network calls, no environment-dependent branching inside the orchestrator. All real work belongs in activities, which are not replayed and have no such restriction. They do run at least once per scheduling, not exactly once, so make them idempotent. The other surprise: anything you log in an orchestrator is logged again on every replay. Check is_replaying before logging, or you will think your workflow ran five times.

The comparison with Step Functions comes down to where the definition lives. Step Functions gives you a managed state machine with a visual diagram, IAM-governed service integrations, and a definition that cannot have a nondeterminism bug because it is data, not code. Durable gives you real code: loops, exception handling, helper functions, unit tests in your normal framework, and patterns like eternal orchestrations (continue_as_new) and human-interaction waits via external events. The state lives in your own storage account, so there is no per-state-transition fee as with Standard Step Functions; you pay normal Functions execution plus storage transactions, which Durable produces in volume.
Bindings or the SDK
Bindings are not always the right call, and seasoned teams use a simple split. Triggers: always. There is no sane way to hand-roll the polling, leasing, checkpointing, and scaling integration a trigger gives you, so the trigger side of the model is essentially mandatory. Output and input bindings: use them while they are doing simple things, and drop to the service SDK the moment you need control the binding does not expose.
Concretely, output bindings struggle when you need per-item error handling on a batch (writing only on some code paths is fine), fine-grained options (blob tiers, Cosmos transactional batches, Service Bus sessions and scheduled messages), or the outcome of a write to decide what to do next. A binding failure typically fails the whole invocation; an SDK call gives you the exception, the retry policy, and the ability to compensate. The good news is that the two compose: keep the queue trigger, build one client at module scope (or inject it in .NET), and write through the SDK inside the function. You lose nothing of the scaling model by skipping output bindings.
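The module-scope-client pattern, and the per-item control it buys, can be sketched without any Azure packages. FakeBlobClient, CLIENT and process_batch are hypothetical: in a real app CLIENT would be a service SDK client built once at import time, and process_batch would be called from the triggered function with the decoded message bodies.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeBlobClient:
    """Hypothetical in-memory stand-in for a service SDK client."""
    fail_names: Set[str] = field(default_factory=set)
    blobs: Dict[str, str] = field(default_factory=dict)

    def upload(self, name: str, data: str) -> None:
        if name in self.fail_names:
            raise OSError(f"write failed for {name}")  # simulated failure
        self.blobs[name] = data


# Built once at module scope: every invocation on this instance reuses it.
CLIENT = FakeBlobClient(fail_names={"receipts/13.json"})


def process_batch(bodies: List[str]) -> Dict[str, List[str]]:
    """Write each order separately so one failure does not fail the whole batch."""
    written: List[str] = []
    failed: List[str] = []
    for body in bodies:
        order = json.loads(body)
        name = f"receipts/{order['id']}.json"
        try:
            CLIENT.upload(name, body)
            written.append(order["id"])
        except OSError:
            # Per-item handling an output binding cannot give you:
            # retry, compensate, or re-queue just this item.
            failed.append(order["id"])
    return {"written": written, "failed": failed}


print(process_batch(['{"id": "12"}', '{"id": "13"}', '{"id": "14"}']))
# {'written': ['12', '14'], 'failed': ['13']}
```

With an output binding the same failure would fail the invocation and put every item in the batch back on the retry path.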
Watching it run
Monitoring is Application Insights, wired in by default when you create the app. Every invocation produces a request record with duration and success, dependency calls (the storage writes, the HTTP calls, the Cosmos operations) are tracked automatically in most stacks, and your log lines are correlated to the invocation that produced them. Live Metrics gives a near-real-time view while you test, and the failures blade groups exceptions by problem. For queue-driven apps the metric worth alarming on is not function duration but queue depth and oldest-message age on the source, because that is what tells you the consumers are falling behind.
One default to know about: adaptive sampling is on. Under load, App Insights
deliberately drops a fraction of telemetry to cap volume and cost, so "I count fewer
invocation traces than queue messages" is usually sampling, not lost messages. Tune it under
logging.applicationInsights.samplingSettings in host.json, exclude
the event types you cannot afford to lose, and remember the telemetry bill scales with what
you keep. A busy Functions app can produce more spend in App Insights ingestion than in
compute, which surprises teams the first time they read the invoice.
What it costs
Classic Consumption bills two meters: executions (about $0.20 per million) and gigabyte seconds (about $0.000016 per GB-s), with a monthly free grant of one million executions and 400,000 GB-s. The arithmetic is friendlier than people expect. A function averaging 400 ms at 512 MB consumes 0.2 GB-s per run; a million such runs are 200,000 GB-s, inside the free grant entirely. The real bill for small event-driven systems is usually the storage account next to the app: queue transactions, the Azure Files share, and, if you use Durable Functions, the torrent of table and queue operations the task hub performs. Always look at the storage line when a "free" Functions app costs money.
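The free-grant arithmetic is easy to script. This is a rough sketch assuming the list prices quoted above, which vary by region and change over time. consumption_monthly_cost is a hypothetical helper; it ignores the per-execution rounding of memory and duration described on the pricing page and, as the paragraph warns, the storage account entirely.

```python
def consumption_monthly_cost(executions: int, avg_seconds: float, memory_gb: float,
                             price_per_million: float = 0.20,
                             price_per_gb_s: float = 0.000016,
                             free_executions: int = 1_000_000,
                             free_gb_s: float = 400_000) -> float:
    """Rough classic Consumption compute bill in USD, after the monthly free grant."""
    gb_seconds = executions * avg_seconds * memory_gb
    billable_execs = max(0, executions - free_executions)
    billable_gb_s = max(0.0, gb_seconds - free_gb_s)
    return billable_execs / 1_000_000 * price_per_million + billable_gb_s * price_per_gb_s


print(consumption_monthly_cost(1_000_000, 0.4, 0.5))             # 0.0, inside the free grant
print(round(consumption_monthly_cost(10_000_000, 0.4, 0.5), 2))  # 27.4
```

Ten times the traffic of the worked example (2,000,000 GB-s) comes to roughly $1.80 of executions plus $25.60 of GB-seconds, still small next to what a busy task hub can do to the storage line.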
Flex Consumption bills per-instance execution time (plus a small per-execution meter), and always-ready instances bill at a reduced baseline rate whether or not they are working. Premium drops the per-execution meters entirely: you pay vCPU-seconds and GB-seconds for every instance allocated, all day, which is why an idle EP1 costs well over a hundred dollars a month while an idle Consumption app costs nearly nothing. The crossover logic mirrors the Lambda-versus-Fargate argument: a steady, busy workload favours pre-provisioned capacity, while spiky, idle-heavy traffic favours the per-execution meters.
Build it yourself
The lab: create a function app on the Consumption plan, deploy a queue-triggered function
with the func CLI, drop a message on the queue, watch the function fire in the
log stream, and tear everything down. You need the Azure CLI, Azure Functions Core Tools v4,
and Python 3.11. Everything lands in one resource group (see
foundations for why that makes
teardown a one-liner).
- Create the resource group, storage account, and function app. The storage
  account is not optional; the app keeps its state and its code package there.

  ```shell
  RG=fn-lab
  LOC=westeurope
  SA=fnlab$RANDOM
  APP=fnlab-app-$RANDOM
  az group create -n $RG -l $LOC
  az storage account create -n $SA -g $RG -l $LOC --sku Standard_LRS
  az functionapp create -n $APP -g $RG --storage-account $SA \
    --consumption-plan-location $LOC --os-type Linux \
    --runtime python --runtime-version 3.11 --functions-version 4
  ```

- Scaffold the project and write the function. The v2 Python model is one
  file; replace the generated function_app.py wholesale.

  ```shell
  func init orders-app --python
  cd orders-app
  cat > function_app.py << 'EOF'
  import azure.functions as func
  import logging

  app = func.FunctionApp()

  @app.queue_trigger(arg_name="msg", queue_name="orders",
                     connection="AzureWebJobsStorage")
  def on_order(msg: func.QueueMessage) -> None:
      logging.info("processing order: %s", msg.get_body().decode())
  EOF
  ```

- Create the queue and deploy. The connection name AzureWebJobsStorage
  already points at the app's own storage account, so the queue goes there.

  ```shell
  CONN=$(az storage account show-connection-string -n $SA -g $RG -o tsv)
  az storage queue create -n orders --connection-string "$CONN"
  func azure functionapp publish $APP
  ```

- Send a message. The storage-queue trigger expects base64-encoded message
  bodies by default; a raw string fails to decode and ends up in the poison queue, so encode it.

  ```shell
  az storage message put -q orders \
    --content "$(echo -n '{"id":"42","qty":3}' | base64)" \
    --connection-string "$CONN"
  ```

- Watch it run. Open the log stream and you should see the cold start (host
  startup lines) followed by your "processing order" line within a few seconds. Send a
  second message and notice the warm invocation is near-instant. Core Tools does not
  support the console log stream for Linux Consumption apps, so pass --browser to open
  Live Metrics in the portal instead.

  ```shell
  func azure functionapp logstream $APP --browser
  # in another shell, repeat step 4 and watch the invocation appear
  ```

- Poke the scale signals. Put 50 messages on the queue in a loop and watch
  the trigger drain them in batches, then list the queues to see whether a poison queue
  has appeared.

  ```shell
  for i in $(seq 1 50); do
    az storage message put -q orders \
      --content "$(echo -n '{"id":"'$i'"}' | base64)" \
      --connection-string "$CONN" -o none
  done
  az storage queue list --connection-string "$CONN" -o table
  # orders drains; an orders-poison queue appears only if messages fail 5 times
  ```

- Tear down. One resource group, one delete. Nothing in this lab survives
  it, so nothing keeps billing.

  ```shell
  az group delete -n $RG --yes --no-wait
  ```
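Why step 4 pipes the body through base64 is easy to demonstrate. This sketch uses Python's strict decoder as an analogue for what a base64-expecting trigger does with the message; the host itself is not Python, and encode_for_queue and decode_like_trigger are hypothetical names. The compact separators make the encoded body byte-identical to the echo -n | base64 pipeline in the lab.

```python
import base64
import binascii
import json


def encode_for_queue(payload: dict) -> str:
    """Base64-encode compact JSON, matching echo -n '...' | base64 in the lab."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(body).decode("ascii")


def decode_like_trigger(content: str) -> dict:
    """Strictly decode a message body the way a base64-expecting consumer would."""
    raw = base64.b64decode(content, validate=True)  # binascii.Error on non-base64 input
    return json.loads(raw)


msg = encode_for_queue({"id": "42", "qty": 3})
print(decode_like_trigger(msg))  # {'id': '42', 'qty': 3}
try:
    decode_like_trigger('{"id":"42","qty":3}')
except binascii.Error as err:
    print("raw string rejected:", err)
```

The raw JSON contains braces and quotes, which are outside the base64 alphabet, so the decode fails on every attempt until the message is moved to the poison queue.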
If nothing fires, check three things: the message was base64-encoded (a raw string fails to decode and lands in
orders-poison after five attempts), the queue name in the decorator matches exactly, and the
deploy actually registered the function (func azure functionapp list-functions $APP should
show on_order). Those three cover nearly every first-run failure.

Further reading
- Triggers and bindings concepts — the official reference for the model, with the full matrix of which bindings exist per service.
- Hosting plans compared — Microsoft's own decision table, with the current limits per plan.
- Event-driven scaling — how the scale controller and target-based scaling decide instance counts.
- Durable orchestrator code constraints — the determinism rules, with examples of each violation and its fix.
- Durable Functions overview — the patterns (chaining, fan-out/fan-in, monitors, human interaction, entities) in one page.