Lambda cold starts measured: Quarkus, Native, Node, SnapStart

01 One number is not enough

"Java on Lambda is slow." True. The question is by how much, where, and at what point it stops being true.

Most cold-start posts out there benchmark a Hello World with a 1 KB payload and slap a headline on top. That helps nobody facing a real choice between Quarkus JVM, Quarkus Native, Node.js, and SnapStart. So I measured it myself, cleanly, with a clear methodology and no marketing layer.

What you get in this post: hard numbers from eu-central-2 (Zurich), arm64, four runtimes side by side, plus two findings that have not appeared in the AWS blog posts so far.

02 Setup, briefly

Few variables, many repetitions.

Same workload in all four runtimes: JSON in, validate UUID, SHA-256 over the payload, write to DynamoDB, read back, JSON out. Identical in Quarkus 3.34 and Node 24. The Java codebase is a single source; the only difference is mvn package versus mvn package -Dnative.
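To make the workload concrete, here is a minimal sketch of its CPU-only part (UUID validation and SHA-256). It is an illustration under assumptions, not the code from the repo: the class name WorkloadSketch and its methods are hypothetical, and the DynamoDB write and read are left out so the example runs without AWS credentials.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

/** Hypothetical sketch of the CPU-only part of the benchmark workload. */
final class WorkloadSketch {

    // Canonical 8-4-4-4-12 hex form. A strict pattern is used because
    // UUID.fromString() can accept non-canonical input.
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$");

    static boolean isValidUuid(String candidate) {
        return candidate != null && UUID_PATTERN.matcher(candidate).matches();
    }

    /** Returns the lowercase hex SHA-256 digest of the payload bytes. */
    static String sha256Hex(byte[] payload) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(payload);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    /** Validates the id, then hashes the payload. The DynamoDB round trip is omitted. */
    static String process(String id, String payload) {
        if (!isValidUuid(id)) {
            throw new IllegalArgumentException("not a UUID: " + id);
        }
        return sha256Hex(payload.getBytes(StandardCharsets.UTF_8));
    }
}
```

The hash cost grows with payload size, which is one reason the 1 MB rows in the tables below look different from the 1 KB rows.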

Region: eu-central-2 (Zurich), arm64 (Graviton)
Runtimes: Quarkus JVM on Java 25, Quarkus Native via GraalVM, Node.js 24, JVM with SnapStart
Memory: 512, 1024, 1769 MB (1769 MB equals 1 vCPU)
Payloads: 1 KB, 100 KB, 1 MB
Iterations: 50 cold + 50 warm per config (25 + 25 for SnapStart)
Forcing cold starts: update-function-configuration with a nonce env var, then wait function-updated
Measurement source: REPORT line from aws lambda invoke --log-type Tail, no waiting for CloudWatch ingestion
Code and data: full repository at github.com/k-i-soft/lambda-coldstart-bench, including the raw CSVs from every run under results/raw/
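The two measurement steps in the list above can be sketched as follows. This is a hedged illustration, not the harness from the repo: the class ReportLineSketch, its method names, and the env var name NONCE are assumptions. It builds the two CLI calls that force a cold start, decodes the base64 LogResult that --log-type Tail returns, and adds Init Duration (or Restore Duration for SnapStart) to Duration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for forcing cold starts and reading the REPORT line. */
final class ReportLineSketch {

    /** CLI calls that force a cold start; functionName and nonce come from the caller. */
    static List<List<String>> coldStartCommands(String functionName, String nonce) {
        return List.of(
            // Caution: --environment replaces the function's whole variable set.
            List.of("aws", "lambda", "update-function-configuration",
                    "--function-name", functionName,
                    "--environment", "Variables={NONCE=" + nonce + "}"),
            List.of("aws", "lambda", "wait", "function-updated",
                    "--function-name", functionName));
    }

    /** Decodes the base64 LogResult and returns the REPORT line, or null if absent. */
    static String reportLine(String base64LogResult) {
        byte[] raw = Base64.getDecoder().decode(base64LogResult);
        for (String line : new String(raw, StandardCharsets.UTF_8).split("\n")) {
            if (line.startsWith("REPORT ")) {
                return line.strip();
            }
        }
        return null;
    }

    /** Maps each tab-separated "Label: 12.34 ms" field to its value in milliseconds. */
    static Map<String, Double> millisFields(String reportLine) {
        Map<String, Double> fields = new HashMap<>();
        for (String field : reportLine.split("\t")) {
            int colon = field.indexOf(": ");
            if (colon > 0 && field.endsWith(" ms")) {
                String label = field.substring(0, colon).trim();
                String number = field.substring(colon + 2, field.length() - 3).trim();
                fields.put(label, Double.parseDouble(number));
            }
        }
        return fields;
    }

    /** Init (or Restore) Duration plus Duration; a warm invoke has neither startup field. */
    static double totalColdMillis(String reportLine) {
        Map<String, Double> fields = millisFields(reportLine);
        Double duration = fields.get("Duration");
        if (duration == null) {
            throw new IllegalArgumentException("no Duration field in: " + reportLine);
        }
        double startup = fields.getOrDefault("Init Duration",
                fields.getOrDefault("Restore Duration", 0.0));
        return startup + duration;
    }
}
```

Each inner list can be handed to a ProcessBuilder as is. Reading the REPORT line from the invoke response is what removes the wait for CloudWatch ingestion.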

Terminology note: "Quarkus JVM" and "Quarkus JVM with SnapStart" run on the JVM that AWS Lambda ships, which is Amazon Corretto JDK 25 (Lambda runtime java25). Quarkus itself does not bundle a JVM; it is only the framework. "Quarkus Native" is compiled via GraalVM/Mandrel into a static binary with no JVM underneath, running on the custom runtime provided.al2023. If you want a different JDK vendor (Temurin, Zulu, Liberica), you have to build your own custom runtime, which is out of scope for this study.

Deliberately not measured: VPC Lambdas (different cold-start profile), Provisioned Concurrency (no cold start to measure), Lambda@Edge (different stack). Honest caveats matter more than broad claims.


03 Init Duration: memory is (almost) irrelevant

The first surprise comes from the pure initialization times.

If you expect more memory to speed up the cold start because Lambda allocates CPU proportionally to memory: that holds for only one of the four runtimes. For the other three, Init Duration is remarkably flat.

Cold start Init Duration p50 (ms), by payload (SnapStart rows show Restore Duration)
Payload	Runtime	512 MB	1024 MB	1769 MB
1 KB	Quarkus JVM	1109	1112	1125
1 KB	JVM with SnapStart	941	705	658
1 KB	Quarkus Native	390	387	390
1 KB	Node.js 24	320	316	319
100 KB	Quarkus JVM	1117	1087	1115
100 KB	JVM with SnapStart	931	715	647
100 KB	Quarkus Native	389	389	392
100 KB	Node.js 24	316	317	317
1 MB	Quarkus JVM	1102	1092	1106
1 MB	JVM with SnapStart	949	698	626
1 MB	Quarkus Native	387	392	386
1 MB	Node.js 24	319	321	321

Quarkus JVM needs ~1100 ms whether it gets 512 or 1769 MB. Native stays at ~390 ms, Node at ~320 ms. More CPU evidently does not help with class loading and CDI bootstrap.
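All figures in this post are p50 values over the cold or warm samples of one configuration. As a minimal sketch of how such a value can be derived from the raw samples, here is a nearest-rank percentile. Which percentile definition the scripts in the repo use is not stated here, so treat the method choice and the name PercentileSketch as assumptions.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over raw duration samples. */
final class PercentileSketch {

    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank definition).
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With 50 samples, p50 is the 25th smallest value under this definition, so a handful of slow outliers does not move it. That is why p50 suits a "typical cold start" statement and says nothing about the tail.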

The only exception is SnapStart, whose Restore Duration does depend on memory. A plausible reason: snapshot deserialization is CPU-bound, and more memory brings more threads for the unpacking.

First takeaway: if your bottleneck really is init, more memory will not help. You have to switch runtime.

04 Total cold duration: this is where the picture tips

Init Duration is only half the story. What the user experiences is init plus first execution.

The JVM has a second problem that Init Duration does not show: the JIT only compiles on the first real invocation. That compile work shows up as Duration in the REPORT line, not as Init Duration. And it gets shorter as the available CPU grows.

Cold start total duration p50 (ms), by payload (init + first execution)
Payload	Runtime	512 MB	1024 MB	1769 MB
1 KB	Quarkus JVM	5778	3320	2397
1 KB	JVM with SnapStart	10315	5463	3553
1 KB	Quarkus Native	437	425	432
1 KB	Node.js 24	541	430	407
100 KB	Quarkus JVM	5835	3326	2399
100 KB	JVM with SnapStart	10775	5569	3535
100 KB	Quarkus Native	446	432	435
100 KB	Node.js 24	536	434	407
1 MB	Quarkus JVM	5832	3375	2401
1 MB	JVM with SnapStart	11014	5797	3661
1 MB	Quarkus Native	499	471	449
1 MB	Node.js 24	639	464	422

Something important happens here: Quarkus JVM at 512 MB needs 5.8 seconds to the first response. At 1769 MB it is still 2.4 seconds. More CPU lets the JIT work faster and pulls total cold latency down sharply.

Native and Node play in a completely different league, both under 600 ms at 1 KB, because there is no JIT compile phase. Between Native and Node, only a bit of bootstrap overhead decides. Native is slightly faster on total cold at 512 and 1024 MB; Node is slightly ahead at 1769 MB.

The key finding: more memory only helps indirectly, through the JIT, not through init. If you live with the JVM on Lambda and need sub-second cold starts, you have to go beyond 1769 MB or switch to native.
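A quick worked example shows how much of the total sits in the first invoke. The sketch below subtracts the init p50 from the total cold p50 for the 1 KB rows. Two caveats: a difference of medians only approximates the median of the first-invoke durations, and the class name ColdBreakdownSketch is hypothetical.

```java
/** Hypothetical helper: splits total cold p50 into init and first-invoke parts. */
final class ColdBreakdownSketch {

    /** Approximate first-invoke cost in ms: total cold p50 minus init (or restore) p50. */
    static int firstInvokeMillis(int totalColdP50, int initP50) {
        return totalColdP50 - initP50;
    }

    /** Fraction of the total cold duration that is spent in the first invoke. */
    static double firstInvokeShare(int totalColdP50, int initP50) {
        return (double) firstInvokeMillis(totalColdP50, initP50) / totalColdP50;
    }
}
```

For Quarkus JVM at 512 MB this gives 5778 - 1109 = 4669 ms, roughly 81 percent of the total. At 1769 MB it is 2397 - 1125 = 1272 ms, roughly 53 percent. For Quarkus Native at 512 MB it is 437 - 390 = 47 ms. The part that shrinks with memory is the first invoke, not init.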

05 Warm: everything converges

Once the containers are warm, the difference almost disappears.

Warm duration p50 (ms), by payload
Payload	Runtime	512 MB	1024 MB	1769 MB
1 KB	Quarkus JVM	23	14	15
1 KB	JVM with SnapStart	51	21	17
1 KB	Quarkus Native	11	11	10
1 KB	Node.js 24	10	11	10
100 KB	Quarkus JVM	29	16	16
100 KB	JVM with SnapStart	57	27	20
100 KB	Quarkus Native	11	13	12
100 KB	Node.js 24	11	10	12
1 MB	Quarkus JVM	99	53	39
1 MB	JVM with SnapStart	88	41	31
1 MB	Quarkus Native	59	37	27
1 MB	Node.js 24	36	20	24

At 1 KB, Native and Node sit at 10-11 ms. From 1024 MB up, the JVM is at 14-15 ms and SnapStart slightly behind at 17-21 ms; only at 512 MB do the two JVM variants fall back (23 and 51 ms). For real loads with containers kept warm, the runtime choice is almost irrelevant for performance. The whole cold-start debate reduces to the init profile and the first one or two invocations.

The practical conclusion: if your function gets enough traffic to stay warm, the JVM on Lambda is acceptable too. If it runs sporadically (webhooks, cron triggers, low-latency requirements on the first request), you need Native or Node.

06 SnapStart, the unexpected loser

AWS markets SnapStart as a 10x faster cold start for Java. My naive out-of-the-box measurement says: without priming, it is actually slower than the standard JVM.

Look at the table in section 04 again: SnapStart total cold at 512 MB is 10315 ms, almost twice as long as the standard JVM at 5778 ms. At 1024 and 1769 MB, SnapStart is behind as well.

How can that be? Restore Duration (table in section 03) is clearly better than Init Duration, so what is going wrong?

The mechanics explain it: Lambda takes the snapshot right after Quarkus' init phase, before the JIT has seen any handler code paths. On restore, Lambda jumps straight into handler execution. The JIT starts at zero. On top of that, AWS SDK connections inside the snapshot are dead sockets, so the DynamoDB client has to reconnect from scratch.

The result: restore is fast, but the first invoke costs more than in a standard JVM cold start, because there the init loop has already compiled a few paths.

Update: priming measured, no difference. We also measured SnapStart with CRaC priming (the handler implements org.crac.Resource, registers itself at boot, and calls itself once before the snapshot). Cold total at 1024 MB / 1 KB: 5698 ms primed versus 5463 ms unprimed. That is a 4 percent difference, within measurement noise. Priming did not help in this configuration.

Why? AWS SDK connections in the snapshot are dead after restore. The first invoke after restore has to open a fresh TCP connection to DynamoDB, complete a TLS handshake, and resolve the endpoint, regardless of whether priming ran. JIT compilation and class loading, which the snapshot does preserve, evidently do not contribute enough to the first-invoke cost. The expensive parts (network setup plus AWS SDK connection state) are structurally not primable.

Pragmatically: switching SnapStart on and adding priming is not enough in our configuration. On a workload dominated by DDB round trips, SnapStart cannot deliver on the promise. The "10x faster with priming" claim applies to workloads that mostly work locally (CPU-bound, not IO-bound). Once external services sit in the hot path, native remains the more reliable route to sub-second cold starts.

07 What to take away

Four recommendations, all derived from the data above.

Cold-start critical and sporadic: Quarkus Native or Node. Sub-second cold, no JIT risk, comparable effort
High traffic, containers stay warm: all four runtimes are usable. Choose by team skill and ecosystem, not by cold-start marketing
JVM on Lambda: at least 1024 MB, preferably 1769 MB. At 512 MB the JIT phase is brutal
SnapStart: do not rely on it for IO-heavy functions. Enabled naively it made latency worse in these measurements, and CRaC priming did not change that (section 06). Measure your own workload first

And one final point: arm64 instead of x86. All measurements above run on Graviton. You save cost and get slightly better numbers. In 2026 there is no good reason left for x86 Lambda, except legacy libraries that are not built for arm64.

08 Caveats: what makes priming hard

Priming sounds easy. Once you use it, you learn the lifecycle details.

What we did above: the handler implements org.crac.Resource, registers with the CRaC core, and calls itself once in beforeCheckpoint with a dummy payload. Lambda triggers that before snapshot creation. The idea is that JIT state and SDK connections end up inside the snapshot, so that the first invoke after restore jumps straight into warm code.
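The shape of such a priming hook can be sketched like this. It is a sketch under stated assumptions, not our handler: to keep it compilable without the org.crac dependency, the nested interface CheckpointHooks is a stand-in for org.crac.Resource, whose real callbacks take a Context<? extends Resource> argument. In a real function you would implement org.crac.Resource and register the instance with Core.getGlobalContext().register(this). All class, field, and method names below are hypothetical.

```java
import java.security.SecureRandom;
import java.util.logging.Level;
import java.util.logging.Logger;

final class PrimingSketch {

    /** Stand-in for org.crac.Resource; the real callbacks take a Context argument. */
    interface CheckpointHooks {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    static final class PrimedHandler implements CheckpointHooks {
        static final String PRIMING_ID = "00000000-0000-0000-0000-000000000000";
        private static final Logger LOG = Logger.getLogger(PrimedHandler.class.getName());

        private SecureRandom tokenRng = new SecureRandom(); // an "own" RNG, not the JDK default
        int invocations;
        boolean primingFailed;

        /** Placeholder for the real handler logic (validate, hash, DynamoDB round trip). */
        String handle(String id) {
            invocations++;
            if (id == null || id.isBlank()) {
                throw new IllegalArgumentException("missing id");
            }
            return "ok:" + id;
        }

        @Override
        public void beforeCheckpoint() {
            try {
                handle(PRIMING_ID); // one real call so the hot path runs before the snapshot
            } catch (Exception e) {
                // Keep the snapshot alive, but record and log the failure instead of hiding it.
                primingFailed = true;
                LOG.log(Level.WARNING, "priming failed; snapshot is taken unprimed", e);
            }
        }

        @Override
        public void afterRestore() {
            tokenRng = new SecureRandom(); // fresh seed for every restored instance
        }

        long nextToken() {
            return tokenRng.nextLong();
        }
    }
}
```

The two hooks map directly to the lifecycle: beforeCheckpoint runs once per published version, afterRestore runs in every restored execution environment.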

Sounds clean. It is delicate in several places:

Real call, real side effects. Our priming writes a dummy item with ID 00000000-0000-0000-0000-000000000000 to DynamoDB. One such item lands in the table per published version. In a serious system you either build a separate priming path that does NOT write to the real data store, or you filter the items out later
Connections in the snapshot are dead after restore. TLS sessions, sockets, and HTTP keep-alives are valid at snapshot time; at restore hours later nothing is attached to them anymore. The AWS SDK v2 mostly reconnects lazily, but not always cleanly. Classic JDBC drivers over TCP often need an explicit reconnect
SecureRandom must be reseeded. If the snapshot contains a seeded SecureRandom, all restored instances generate the same sequence. UUIDs collide, JWT IDs collide, sessions collide. Lambda takes care of the default SecureRandom, but only that one. Your own RNGs must be reseeded in afterRestore
Cache contents go stale. In-memory caches in the snapshot are still there after restore, but their data may be outdated. Something that does not happen in a normal cold start (empty cache after init) becomes a maintenance topic with SnapStart
Priming exceptions block the snapshot. Any unhandled exception in beforeCheckpoint propagates to snapshot creation and makes it fail. We catch everything (try { ... } catch (Exception ignored) {}), which is pragmatic but masks real problems. In production you want to at least log and alert
Deploy time goes up. Every publish-version now runs through init plus priming plus snapshot. Instead of 5-10 seconds you count 20-40. With canary releases or blue-green switches that adds up across all functions
Test coverage gap. Your priming code runs in a different lifecycle than your handler code, and in most test setups it does not run at all. If you refactor the handler and rename a class along the way, the breakage in beforeCheckpoint may only show up on the first cold restore in production

Pragmatically: priming is not an on/off switch but a deliberate architecture decision with its own maintenance load. If you do not want to carry that, Quarkus Native or Node is the calmer ride, without surprises on the first restore.

09 Caveats: native builds and the reflection trap

In the tables, native looks like the clear winner. The homework for that is done by the build, not by the runtime.

GraalVM native-image is ahead-of-time compilation: your code, your libraries, the runtime, everything is compiled into a static binary at build time. The big advantage is startup under 100 ms (in the Quarkus best case) and a low memory footprint. The price is the "closed-world assumption": everything that can execute at runtime must be visible at build time. What the compiler does not see is not in the image. Period.

What that means in practice:

Reflection is the main trap. Classes that are only reached via Class.forName(), getDeclaredField(), or similar reflective APIs are missing from the image by default. The build does not crash and the binary runs, but at runtime you get a ClassNotFoundException or NoSuchFieldException, often deep inside a library, often only on a production code path that your tests did not cover
Gson is the classic example. Gson serializes and deserializes via reflection on fields. Without reflection hints it does not find your classes and either returns empty objects or crashes at runtime. Workaround: @RegisterForReflection on every bean that Gson touches, or switch to Jackson, because the Quarkus Jackson extension ships the reflection configs
Jackson, Hibernate, JAX-RS, and Quarkus' own code work without trouble, because the extensions generate the reflection configs at build time. As soon as you use a library for which no Quarkus extension exists (e.g. a niche crypto lib, a custom XML parser, an in-house ORM), you are on your own. This is where native migrations typically get stuck for weeks
Dynamic proxies, resources, and Java serialization each need their own configuration. Resources (application.properties is included by default, your config.json is not) are registered via -H:IncludeResources or the Quarkus equivalents. Java serialization requires an explicit list of serializable classes, otherwise you get class-not-found errors at runtime
JNI libraries are problematic. Dependencies with native glibc bindings (some crypto libraries, some XML parsers, old file-IO wrappers, JNI-based DB drivers) give you either linker errors at build time or crashes at runtime. Some libraries are simply not native-image compatible
Tracing agent as first aid. GraalVM has a tracing agent that records, during a normal JVM run, which reflection and which resources are actually used. The output is a reflect-config.json that you feed into the build. It works well, but only for code paths your tests actually cover. What the tests do not run through is missing from the image. Coverage suddenly becomes a build-safety question as well
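To illustrate why the reflection cases above escape the build-time analysis, here is a minimal plain-Java sketch. The class name in the lookup only exists as a string computed at run time, so a static analysis cannot tell which class will be needed. The names ReflectionTrapSketch, pluginClassName, and lookup are hypothetical, and java.util.ArrayList merely stands in for a library class that nothing else references.

```java
import java.util.Optional;

/** Hypothetical demo of a class lookup that static analysis cannot follow. */
final class ReflectionTrapSketch {

    /** Builds the class name at run time, e.g. from configuration or user input. */
    static String pluginClassName(String shortName) {
        return "java.util." + shortName;
    }

    /** Reflective lookup; returns empty when the class is not available. */
    static Optional<Class<?>> lookup(String className) {
        try {
            return Optional.of(Class.forName(className));
        } catch (ClassNotFoundException e) {
            // On the JVM this means "not on the classpath". In a native image it
            // can also mean "never registered for reflection at build time".
            return Optional.empty();
        }
    }
}
```

On the JVM, lookup(pluginClassName("ArrayList")) finds the class because the whole classpath is available at run time. In a native image, a class that is reachable only through such a computed name has to be registered explicitly, for example with @RegisterForReflection in Quarkus or an entry in reflect-config.json.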

Plus the build reality, which no longer has anything to do with the runtime:

Build time: 5-15 minutes for Quarkus native, versus 30 seconds for JVM. In CI you plan a separate pipeline stage for it, otherwise it blocks every pull request
Build RAM: native-image needs 8-12 GB of RAM at compile time. CI runners with 4 GB kill the build with an OOM
Container build: cross-compiling (linux/arm64 binary from macOS) requires Docker or Podman with the matching builder image. quarkus.native.container-build=true makes that transparent, but costs you the image pull on the first run (~1.5 GB Mandrel image)
Debugging is different: no JMX, no jmap, no jstack, no live profiling with the usual tools. Stack traces are shorter because many methods are inlined. -g helps, but makes the binary bigger
Peak throughput: AOT optimizations are static. The JIT in a warmed-up JVM can optimize adaptively and often beats the native binary by 10-30 percent on long-lived workloads. On Lambda with short sessions that does not matter; for long-running services it can be relevant
Version lock-in: the native binary is built against a specific GraalVM JDK version and a specific glibc/Linux image. Changing the Lambda runtime (e.g. Amazon Linux 2 to 2023) requires a rebuild, otherwise you get linker errors. That is a coupling to keep an eye on in a multi-service stack

My practical flow: JVM mode in dev and in CI tests (fast build cycle, full debugging comfort, fast feedback), native only for the final build that goes to Lambda. Quarkus makes this easy because both builds come from the same Maven project, so the same code state is validated in both modes. If you turn that around and build exclusively native, you lose the fast iteration loop and discover reflection problems only in CI.

What native structurally does not have: snapshot-restore problems. There is no snapshot, every cold start initializes connections fresh, and no dead pool sits in memory. What becomes a burden with SnapStart as "stale state after restore" (see section 06) does not apply to native. On IO-heavy workloads (DDB, RDS, other AWS services in the hot path) this is the difference to SnapStart, and priming cannot close it.

10 Comparison at a glance

Three runtimes, twelve aspects. Which row tips the balance depends on your workload.

Aspect	Quarkus JVM	JVM with SnapStart	Quarkus Native
Build time	~30 s	~30 s	5-15 min
Build setup	JDK	JDK	8-12 GB RAM, Mandrel container
Handler changes	none	CRaC hook plus priming path	possibly `@RegisterForReflection`
Cold init/restore p50 (1024 MB)	~1100 ms	~700 ms	~390 ms
Cold total p50 (1024 MB, 1 KB)	3320 ms	5463 ms*	425 ms
Warm p50 (1024 MB, 1 KB)	14 ms	21 ms	11 ms
Reflection	unproblematic	unproblematic	hints mandatory
Library compatibility	full JVM	full JVM plus CRaC awareness	native-compatible only
Debugging	JMX, jstack, JFR	JMX, JFR	severely limited
Deploy per version	~5 s	~20-40 s (snapshot)	~5 s
Lambda runtime	`java25`	`java25` with SnapStart	`provided.al2023`
Maintenance load	low	medium (priming lifecycle)	medium (reflection, build pipeline)
* Value from the measurement without CRaC priming. A second series with priming enabled came in at 5698 ms (1024 MB / 1 KB), i.e. +4 percent, within measurement noise. Priming did not help in this configuration; see section 06 for the explanation.

If you compare only one row: cold total at 1024 MB. That is the time your user sees between click and response on a cold function. Native wins by a factor of 8 against the JVM and by a factor of 13 against SnapStart. CRaC priming did not measurably improve the SnapStart value, because connection state has to be rebuilt after restore and that is the dominant part of the first-invoke cost.

Cold starts are not a binary problem. They are a three-dimensional one: runtime, memory, workload. Measure only one axis and you measure too little. Switch on SnapStart or native naively and you measure the wrong one.

Code, methodology, and raw data: github.com/k-i-soft/lambda-coldstart-bench


01One number is not enough

"Java on Lambda is slow." True. The question is by how much, where, and from when on it stops being true.

Most cold-start posts out there benchmark a Hello World with a 1 KB payload and slap a headline on top. That helps nobody facing a real choice between Quarkus JVM, Quarkus Native, Node.js, and SnapStart. So I measured it myself, cleanly, with clear methodology and no marketing layer.

What you get in this post: hard numbers from eu-central-2 (Zurich), arm64, four runtimes side by side, plus two findings that haven't been in the AWS blog posts so far.

02Setup, briefly

Few variables, many repetitions.

Same workload in all four runtimes: JSON in, validate UUID, SHA-256 over the payload, write to DynamoDB, read back, JSON out. Identical in Quarkus 3.34 and Node 24. The Java codebase is a single source; the only difference is mvn package versus mvn package -Dnative.

Region: eu-central-2 (Zurich), arm64 (Graviton)
Runtimes: Quarkus JVM on Java 25, Quarkus Native via GraalVM, Node.js 24, JVM with SnapStart
Memory: 512, 1024, 1769 MB (1769 equals 1 vCPU)
Payloads: 1 KB, 100 KB, 1 MB
Iterations: 50 cold + 50 warm per config (25 + 25 for SnapStart)
Forcing cold starts: update-function-configuration with a nonce env var, then wait function-updated
Measurement source: REPORT line from aws lambda invoke --log-type Tail, no waiting for CloudWatch ingestion
Code and data: full repository at github.com/k-i-soft/lambda-coldstart-bench, including the raw CSVs from every run under results/raw/

Terminology note "Quarkus JVM" and "Quarkus JVM with SnapStart" run on the JVM that AWS Lambda ships, which is Amazon Corretto JDK 25 (Lambda runtime java25). Quarkus itself doesn't bundle a JVM, it is just the framework. "Quarkus Native" is compiled via GraalVM/Mandrel into a static binary, with no JVM underneath, running on the custom runtime provided.al2023. If you want a different JDK vendor (Temurin, Zulu, Liberica) you have to roll your own custom runtime, which is out of scope for this study.

Deliberately not measured: VPC Lambdas (different cold-start profile), Provisioned Concurrency (no cold start to measure), Lambda@Edge (different stack). Honest caveats matter more than broad claims.

Payload

03Init Duration: memory is (almost) irrelevant

First surprise comes from the pure initialization times.

If you expect more memory to speed up cold start because Lambda allocates CPU proportionally to memory, that's only true for one of the four runtimes. For the other three, init duration is remarkably flat.

Cold start init duration p50 (ms), 1 KB payload
Runtime	512 MB	1024 MB	1769 MB
Quarkus JVM	1109	1112	1125
JVM with SnapStart	941	705	658
Quarkus Native	390	387	390
Node.js 24	320	316	319
Quarkus JVM	1117	1087	1115
JVM with SnapStart	931	715	647
Quarkus Native	389	389	392
Node.js 24	316	317	317
Quarkus JVM	1102	1092	1106
JVM with SnapStart	949	698	626
Quarkus Native	387	392	386
Node.js 24	319	321	321

Quarkus JVM needs ~1100 ms regardless of 512 or 1769 MB. Native stays around ~390 ms, Node around ~320 ms. More CPU does not help with class loading or CDI bootstrap.

The only exception is SnapStart, whose Restore Duration does scale with memory. Plausible reason: snapshot deserialization is CPU-bound, more memory means more threads doing the unpacking.

First takeaway: if your bottleneck is genuinely init, more memory will not save you. You have to switch runtime.

04Total cold duration: this is where it tilts

Init Duration is only half the story. What the user actually feels is init plus first execution.

The JVM has a second problem that init duration doesn't show: the JIT only compiles on the first real invocation. That compile work shows up as Duration in REPORT, not Init Duration. And it scales with available CPU.

Cold start total duration p50 (ms), 1 KB payload (init + first execution)
Runtime	512 MB	1024 MB	1769 MB
Quarkus JVM	5778	3320	2397
JVM with SnapStart	10315	5463	3553
Quarkus Native	437	425	432
Node.js 24	541	430	407
Quarkus JVM	5835	3326	2399
JVM with SnapStart	10775	5569	3535
Quarkus Native	446	432	435
Node.js 24	536	434	407
Quarkus JVM	5832	3375	2401
JVM with SnapStart	11014	5797	3661
Quarkus Native	499	471	449
Node.js 24	639	464	422

Here something important happens: Quarkus JVM at 512 MB needs 5.8 seconds to first response. At 1769 MB it's still 2.4 seconds. More CPU lets the JIT work faster and pulls total cold latency down sharply.

Native and Node play in a completely different league, both under 600 ms, because there's no JIT compile phase. Between Native and Node, only the bit of bootstrap overhead remains. Native is even slightly faster on total cold.

The key finding: more memory only helps indirectly, through the JIT, not through init. If you live with JVM on Lambda and need sub-second cold starts, you have to go beyond 1769 MB or switch to native.

05Warm: everything converges

Once containers are warm, the difference almost disappears.

Warm duration p50 (ms), 1 KB payload
Runtime	512 MB	1024 MB	1769 MB
Quarkus JVM	23	14	15
JVM with SnapStart	51	21	17
Quarkus Native	11	11	10
Node.js 24	10	11	10
Quarkus JVM	29	16	16
JVM with SnapStart	57	27	20
Quarkus Native	11	13	12
Node.js 24	11	10	12
Quarkus JVM	99	53	39
JVM with SnapStart	88	41	31
Quarkus Native	59	37	27
Node.js 24	36	20	24

Native and Node at 10-11 ms, JVM at 14-15 ms, SnapStart slightly behind. With warm containers, runtime choice is almost performance-irrelevant. The whole cold-start debate reduces to the init profile and the first one or two invocations.

Practical conclusion: if your function gets enough traffic to stay warm, JVM on Lambda is fine. If it runs sporadically (webhooks, cron triggers, low-latency requirements on the first request), you need Native or Node.

06SnapStart, the unexpected loser

AWS markets SnapStart as 10x faster cold starts for Java. My naive out-of-the-box measurement says: without priming, it's actually slower than standard JVM.

Look at table 04 again: SnapStart total cold at 512 MB is 10315 ms, almost twice as long as standard JVM at 5778 ms. At 1024 and 1769 MB, SnapStart is also behind.

How can that be? Restore Duration (table 03) is clearly better than Init Duration, so what's going wrong?

The mechanic explains it: Lambda takes the snapshot right after Quarkus' init phase, before the JIT has seen any handler code paths. On restore, Lambda jumps straight into handler execution. The JIT is at zero. Plus: AWS SDK connections inside the snapshot are dead sockets, the DynamoDB client has to reconnect from scratch.

The result: restore is fast, but the first invocation costs more than a standard JVM cold start, because the standard cold start at least had its init loop compile a few paths.

Update: priming measured, no difference We did measure SnapStart with CRaC priming as well (handler implements org.crac.Resource, registers itself at boot, calls itself once before the snapshot). Cold total at 1024 MB / 1 KB: 5698 ms primed vs 5463 ms unprimed. 4 percent delta, within measurement noise. Priming did not help in this configuration.

Why? AWS SDK connections in the snapshot are dead after restore. The first invoke after restore has to build a fresh TCP connection to DynamoDB, do a TLS handshake, and resolve the endpoint, regardless of whether priming ran. JIT compilation and class loading, which the snapshot does preserve, evidently do not contribute enough to first-invoke cost. The expensive parts (network setup plus AWS SDK connection state) are structurally not primable.

Pragmatically: flipping SnapStart on and adding priming is not enough in our setup. On a DDB-roundtrip-heavy workload SnapStart cannot deliver on the promise. The "10x faster with priming" claim applies to workloads that are mostly local (CPU-bound, not IO-bound). Once external services sit in the hot path, native remains the more reliable path to sub-second cold starts.

07What to take away

Four recommendations, all derived from the data above.

Cold-start critical and sporadic: Quarkus Native or Node. Sub-second cold, no JIT risk, comparable engineering effort
High traffic, container stays warm: any of the four runtimes is fine. Pick by team skill and ecosystem, not by cold-start marketing
JVM on Lambda: at least 1024 MB, ideally 1769 MB. At 512 MB the JIT phase is brutal
SnapStart: only with priming. Turn it on naively and it makes latency worse, not better

And one final note: arm64 over x86. All measurements above run on Graviton. You save money and get slightly better numbers. In 2026 there's no good reason left for x86 Lambda, except for legacy libraries not built for arm64.

08Caveats: what makes priming hard

Priming sounds easy. The moment you actually use it, you learn the lifecycle details.

What we did above: the handler implements org.crac.Resource, registers with CRaC Core, and calls itself in beforeCheckpoint with a dummy payload. Lambda triggers that before snapshot creation, JIT and SDK connections end up inside the snapshot, the first invoke after restore jumps straight into warm code.

Sounds clean. It's actually delicate in several spots:

Real call, real side effects. Our priming writes a dummy item with id 00000000-0000-0000-0000-000000000000 to DynamoDB. One row per published version lands in the table. For a real system you either build a separate priming path that does not hit your real data, or filter the dummies out later
Connections in the snapshot are dead after restore. TLS sessions, sockets, HTTP keepalives are valid at snapshot time. Hours later at restore, nothing on the other end is listening anymore. The AWS SDK v2 mostly reconnects lazily, but not always cleanly. Classic JDBC drivers over TCP usually need explicit reconnect
SecureRandom must be reseeded. If the snapshot contains a seeded SecureRandom, all restored instances generate the same sequence. UUIDs collide, JWT IDs collide, sessions collide. Lambda handles this for the default SecureRandom, but only that one. Your own RNGs must be reseeded in afterRestore
Cache contents go stale. In-memory caches in the snapshot are still there after restore, but their data may be outdated. Something a normal cold start avoids (empty cache after init) becomes a real problem with SnapStart
Priming exceptions block the snapshot. Any unhandled exception in beforeCheckpoint propagates and fails snapshot creation. We swallow everything (try { ... } catch (Exception ignored) {}), which is pragmatic but masks real problems. In production you at least want to log and alert
Deploy time increases. Every publish-version now goes through init plus priming plus snapshot. Instead of 5-10 seconds you count 20-40. With canary releases or blue-green switches that adds up across all functions
Test coverage gap. Priming code runs in a different lifecycle than handler code, and in most test setups it does not run at all. Refactor the handler, rename a class, and the change in beforeCheckpoint may only show up on the first cold restore in production

Pragmatically: priming is not an on-off switch. It is a deliberate architecture decision with its own maintenance cost. If you cannot carry that, Quarkus Native or Node is calmer and free of restore-time surprises.

09Caveats: native builds and the reflection trap

In the cold-start tables, native looks like the clear winner. The build does the homework that makes it possible, not the runtime.

GraalVM native-image is ahead-of-time compilation: your code, your libraries, the runtime, all compiled to a static binary at build time. The big upside is startup under 100 ms (in the Quarkus best case) and a low memory footprint. The price is the "closed-world assumption": everything that can run at runtime must be visible at build time. Whatever the compiler does not see is not in the image. Period.

What that means in practice:

Reflection is the main trap. Classes only reached via Class.forName(), getDeclaredField(), or similar reflective APIs are missing from the image by default. The build does not crash, the binary runs, but at runtime you get a ClassNotFoundException or NoSuchFieldException, often deep inside some library, often only on a production code path your tests never covered
Gson is the classic example. Gson serializes and deserializes via reflection on fields. Without reflection hints it cannot find your classes and either returns empty objects or crashes at runtime. Workaround: @RegisterForReflection on every bean Gson touches, or switch to Jackson, because the Quarkus Jackson extension ships the reflection configs
Jackson, Hibernate, JAX-RS, Quarkus-native code all work fine because the respective extensions generate the reflection configs at build time. The moment you use a library for which no Quarkus extension exists (a niche crypto lib, a custom XML parser, an in-house ORM), you are on your own. That's the spot where native migrations typically get stuck for weeks
Dynamic proxies, resources, Java serialization all need their own configuration. Resources (application.properties is in by default, your config.json is not) you register via -H:IncludeResources or the Quarkus equivalent, quarkus.native.resources.includes. Java serialization needs an explicit list of serializable classes, otherwise you get class-not-found errors at runtime
JNI libraries are problematic. Dependencies with native-glibc bindings (some crypto libs, certain XML parsers, old file-IO wrappers, JNI-based DB drivers) get you linker errors at build time or runtime crashes. Some libraries simply are not native-image-compatible
Tracing Agent as first aid. GraalVM ships a tracing agent that records, during a normal JVM run, which reflection and which resources are actually used. Output is a reflect-config.json you feed into the build. Works well, but only for code paths your tests actually exercise. Anything tests don't run won't be in the image. Coverage suddenly becomes a build-safety question, not just a quality question
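To make the reflection trap concrete, here is a minimal sketch in plain JDK code of how a reflection-based serializer such as Gson walks the fields of a bean. OrderDto and ReflectiveJson are hypothetical names for illustration, not classes from the benchmark repo. On the JVM this works out of the box. In a native image without reflection metadata for OrderDto, the same loop typically finds no fields or fails outright, depending on the GraalVM version. That is the empty-object symptom described in the list.

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Hypothetical DTO for illustration. In a Quarkus native build, a class like
// this is a candidate for @RegisterForReflection, because nothing below
// references its fields statically.
final class OrderDto {
    private final String id = "a1";
    private final int quantity = 3;
}

final class ReflectiveJson {
    // Serializes declared fields the way reflection-based libraries do:
    // field names and types are only discovered at runtime.
    static String toJson(Object bean) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                String rendered = (value instanceof String)
                        ? "\"" + value + "\""
                        : String.valueOf(value);
                json.add("\"" + field.getName() + "\":" + rendered);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field.getName(), e);
            }
        }
        return json.toString();
    }

    // Class.forName only receives a String, so a closed-world build cannot
    // tell from this call site which class has to be in the image.
    static Object newByName(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

The point of the sketch is the call sites, not the serializer: neither toJson nor newByName mentions OrderDto, so the compiler has no static evidence that the class or its fields are needed.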

Plus the build reality, which has nothing to do with runtime:

Build time: 5-15 minutes for Quarkus native, versus 30 seconds for JVM. In CI you plan a separate pipeline stage for it, otherwise it blocks every pull request
Build RAM: native-image needs 8-12 GB RAM at compile time. On CI runners with 4 GB the build runs out of memory
Container build: cross-compile (linux/arm64 binary from macOS) requires Docker or Podman with the right builder image. quarkus.native.container-build=true makes it transparent but costs you the image pull on first run (~1.5 GB Mandrel image)
Debugging is different: no JMX, no jmap, no jstack, no live profiling with the usual tools. Stack traces are shorter because many methods got inlined. -g helps at the cost of a bigger binary
Peak throughput: AOT optimizations are static. The JIT in a warmed-up JVM can adaptively optimize and beats the native binary on long-lived workloads, often by 10-30 percent. On Lambda with short sessions that's irrelevant, for long-running services it can matter
Version lock-in: the native binary is built against a specific GraalVM JDK version and a specific glibc/Linux image. A Lambda runtime switch (e.g. Amazon Linux 2 to 2023) requires a rebuild, otherwise you hit linker errors. That's a binding you have to track in a multi-service stack

My practical flow: JVM mode in dev and CI tests (fast build cycle, full debug comfort, fast feedback), native only for the final build that ships to Lambda. Quarkus makes that simple: it is one code base, and the only difference is mvn package versus mvn package -Dnative, so the same code state is validated on both sides. Reverse it and build native exclusively, and you lose the fast iteration loop and only catch reflection problems in CI.

What native structurally does not have: snapshot/restore issues. There is no snapshot. Every cold start initializes connections fresh. There is no dead pool sitting in memory. What is a burden for SnapStart ("stale state after restore", see Section 06) simply does not apply to native. On IO-heavy workloads (DDB, RDS, other AWS services in the hot path) that distinction is decisive, and priming cannot bridge it.

10Side-by-side comparison

Three runtimes, twelve aspects. Which row is the deciding one depends on your workload.

Aspect	Quarkus JVM	JVM with SnapStart	Quarkus Native
Build time	~30 s	~30 s	5-15 min
Build setup	JDK	JDK	8-12 GB RAM, Mandrel container
Handler change	none	CRaC hook plus priming path	maybe `@RegisterForReflection`
Cold init/restore p50 (1024 MB)	~1100 ms	~700 ms	~390 ms
Cold total p50 (1024 MB, 1 KB)	3320 ms	5463 ms*	425 ms
Warm p50 (1024 MB, 1 KB)	14 ms	21 ms	11 ms
Reflection	fine	fine	hints required
Library compatibility	full JVM	full JVM plus CRaC awareness	native-compatible only
Debugging	JMX, jstack, JFR	JMX, JFR	heavily constrained
Deploy per version	~5 s	~20-40 s (snapshot)	~5 s
Lambda runtime	`java25`	`java25` with SnapStart	`provided.al2023`
Maintenance burden	low	medium (priming lifecycle)	medium (reflection, build pipeline)
* Value from the measurement without CRaC priming. A second run with priming enabled gave 5698 ms (1024 MB / 1 KB), so +4 percent within measurement noise. Priming did not help in this configuration, see Section 06 for the why.

If you compare just one row, make it cold total at 1024 MB. That's the time your user sees between clicking and the response on a cold function. Native wins by a factor of about 8 over JVM and about 13 over SnapStart. CRaC priming did not measurably improve the SnapStart number, because connection state has to be rebuilt after restore, and that is the dominant first-invoke cost.
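For anyone who wants to check the factors against the table, the arithmetic is short. The sketch below uses only the four cold-total p50 values quoted above (1024 MB, 1 KB); the class and method names are my own for illustration.

```java
import java.util.Locale;

final class ColdStartRatios {
    // Cold total p50 at 1024 MB / 1 KB in milliseconds, from the table and its footnote.
    static final double JVM_MS = 3320;
    static final double SNAPSTART_MS = 5463;
    static final double SNAPSTART_PRIMED_MS = 5698;
    static final double NATIVE_MS = 425;

    // How many times slower the first value is than the second.
    static double factor(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    // Relative change from baseline to the new value, in percent.
    static double percentChange(double baselineMs, double newMs) {
        return (newMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "JVM vs native: %.1fx", factor(JVM_MS, NATIVE_MS)));
        System.out.println(String.format(Locale.ROOT,
                "SnapStart vs native: %.1fx", factor(SNAPSTART_MS, NATIVE_MS)));
        System.out.println(String.format(Locale.ROOT,
                "Priming vs no priming: %+.1f%%",
                percentChange(SNAPSTART_MS, SNAPSTART_PRIMED_MS)));
    }
}
```

Rounded, that is roughly 7.8x, 12.9x and +4.3 percent, which is where the "factor of 8", "factor of 13" and "+4 percent" come from.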

Cold starts are not a binary problem. They are three-dimensional: runtime, memory, workload. Anyone who measures only one axis measures too little. Anyone who flips on SnapStart or native naively is measuring the wrong one.

Code, methodology and raw data: github.com/k-i-soft/lambda-coldstart-bench
