My first attempt at sending a verification email from AWS Cognito was a full SMTP server of my own. A complete CDK stack, EC2 instance, Postfix configuration, the works. It ran locally. In the AWS cloud, not a single mail got through. Three traps later, the system was running in production. Here is the story, the architecture, and what I learned along the way.

The DACH onboarding problem

AWS Cognito ships with working email delivery out of the box. On sign-up, verification, and password reset, the service sends mail automatically. Sounds done. Until you see the first mail in your inbox.

Sender: no-reply@verificationemail.com. Language: English, hardcoded. Layout: spartan plain text with a code. Branding: none. For a SaaS product targeting the DACH region, that is a conversion killer. A Swiss user who just signed up for your compliance tool looks at an English mail from a domain they do not recognize and wonders whether it is phishing. The verification rate drops measurably.

The solution is in the AWS docs: a Custom Email Sender Lambda. Cognito triggers a Lambda function that you write, and you take over delivery entirely. Your own domain, your own language, your own template. Sounds straightforward.
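The part you actually own is small. Here is a minimal sketch of how such a Lambda might pick a localized template. It assumes the user's `locale` standard attribute arrives in `event["request"]["userAttributes"]` and that only the sign-up, resend, and password-reset triggers are handled. `TEMPLATES`, `MAIL_KIND`, and `render_mail` are hypothetical names, and the texts are placeholders.

```python
# Localized template selection for a Cognito Custom Email Sender event (sketch)
from typing import Dict, Optional, Tuple

# Hypothetical placeholder templates: (mail kind, language) -> (subject, body)
TEMPLATES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("verify", "de"): ("Ihr Bestätigungscode", "Ihr Code lautet: {code}"),
    ("verify", "fr"): ("Votre code de vérification", "Votre code : {code}"),
    ("verify", "en"): ("Your verification code", "Your code is: {code}"),
    ("reset", "de"): ("Passwort zurücksetzen", "Ihr Code zum Zurücksetzen: {code}"),
    ("reset", "fr"): ("Réinitialiser le mot de passe", "Votre code de réinitialisation : {code}"),
    ("reset", "en"): ("Reset your password", "Your reset code is: {code}"),
}

# Cognito trigger sources handled here, mapped to the kind of mail to send
MAIL_KIND: Dict[str, str] = {
    "CustomEmailSender_SignUp": "verify",
    "CustomEmailSender_ResendCode": "verify",
    "CustomEmailSender_ForgotPassword": "reset",
}

DEFAULT_LANGUAGE = "en"


def language_of(locale: Optional[str]) -> str:
    """Reduce a locale such as 'de-CH' or 'fr_CH' to its language part."""
    if not locale:
        return DEFAULT_LANGUAGE
    return locale.replace("_", "-").split("-")[0].lower()


def render_mail(event: dict, code: str) -> Tuple[str, str]:
    """Return (subject, body) for the event, falling back to English."""
    kind = MAIL_KIND.get(event["triggerSource"])
    if kind is None:
        raise ValueError("unsupported trigger source: " + event["triggerSource"])
    attributes = event["request"].get("userAttributes", {})
    language = language_of(attributes.get("locale"))
    subject, body = TEMPLATES.get((kind, language), TEMPLATES[(kind, DEFAULT_LANGUAGE)])
    return subject, body.format(code=code)
```

Raising on an unknown trigger source surfaces a misconfiguration early. In production, a logged fallback to a generic template may be the safer choice, depending on which triggers your User Pool actually uses.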

It is not straightforward. On the way to a production-ready setup, three traps wait, and the official documentation politely skips two of them.

Trap 1: The SMTP server that was never allowed to send

The first idea when you need to send email from a Lambda is usually SMTP. Java has JavaMail, Node has Nodemailer, Python has smtplib. All proven, all documented, all working locally in two hours.

My reflex at the time: spin up a lean SMTP server in the cloud and have the Lambda send to it. Postfix on a small EC2 instance, everything behind a VPC, cleanly provisioned via CDK. I wrote the whole stack. Security Group, Elastic IP, Route 53 entry, Postfix configuration via User Data. Deployment finished. Lambda wired. Sign-up flow triggered.

Nothing arrived.

What I did not know: AWS blocks outbound port 25 by default, both on EC2 and on Lambda. It is a platform-wide anti-spam measure that has been in place for years. The official rationale: spam prevention and reputation protection for the AWS IP ranges. For EC2, you can have the block lifted via AWS Support, but that is its own process, with a justification, domain proof, and waiting time. For a production system that sends only a modest volume of transactional mail anyway, that is an unnecessary detour.
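When mail simply does not arrive, a quick probe from inside the Lambda or EC2 instance helps separate a blocked port from a broken server. Below is a minimal sketch that uses only the Python standard library. `probe_port` is a hypothetical helper. Its interpretation rests on a general networking heuristic, not an AWS guarantee: a filtered port usually times out, while a reachable host with no listener usually refuses immediately.

```python
# Outbound TCP reachability probe (diagnostic sketch, standard library only)
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', 'timeout', or 'error: <type>' for a TCP connect.

    A port that is silently dropped on the way out typically shows up as
    'timeout'; a reachable host without a listener answers with 'refused'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # e.g. DNS failure or network unreachable
        return "error: " + type(exc).__name__
```

If port 25 times out while other outbound connections from the same function work, the platform block is the likely culprit rather than the Postfix configuration.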

The lesson: SMTP from AWS is a dead end unless you have a business reason to run your own mail server. For transactional mail, the path is a different one.

Takeaway from trap 1: When a standard solution does not work on AWS, your code is usually not the problem. More often it is a platform default that the documentation for your approach does not mention, because the path AWS intends is a different service.

Trap 2: SES, but only via the API

The intended AWS path is Simple Email Service (SES). Fully managed, with domain verification, DKIM, bounce tracking, a sandbox limit to start, and a production request when you are ready. Solid. My next step was rewriting the Lambda to send via SES instead of Postfix.

SES offers two interfaces. First, SMTP over SES, a classic SMTP endpoint you can hit with JavaMail or Nodemailer. Second, the SES API, called directly via the AWS SDK with no SMTP protocol in between.

SMTP over SES was the obvious next step because the existing code from trap 1 would have been reusable. The catch: I host in eu-central-2 (Zurich). The region has been available since 2022, and for Swiss data protection requirements it is the right home. But SES in eu-central-2 is API-only. The SMTP endpoint exists in larger regions like eu-central-1 (Frankfurt) or us-east-1, but not in Zurich.

That pushed the decision back to the code level: SES API directly from the Lambda. Which turned out to be the better choice anyway. The API variant is faster (no SMTP handshake overhead), more debuggable (clean JSON responses instead of SMTP status codes), and needs no additional credentials beyond the Lambda’s IAM role. Once you have integrated SES via API, you do not want SMTP back.
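Here is a minimal sketch of the API call. It assumes a boto3 SES client (`boto3.client("ses", region_name="eu-central-2")`) is passed in and that the Lambda's IAM role allows `ses:SendEmail`. The helper name `send_verification_mail` and the injected client are my own choices for testability, not a prescribed pattern. Setting the charset explicitly to UTF-8 matters for umlauts and accents in DACH templates.

```python
# Send a verification mail via the SES API (sketch, boto3-style client injected)
from typing import Any


def send_verification_mail(ses: Any, sender: str, recipient: str,
                           subject: str, text_body: str, html_body: str) -> str:
    """Call SES SendEmail and return the MessageId from the response.

    `ses` is expected to be a boto3 SES client; injecting it keeps the
    function testable without AWS access.
    """
    response = ses.send_email(
        Source=sender,  # must belong to an identity verified in SES
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {
                "Text": {"Data": text_body, "Charset": "UTF-8"},
                "Html": {"Data": html_body, "Charset": "UTF-8"},
            },
        },
    )
    return response["MessageId"]
```

No SMTP credentials appear anywhere. Authentication comes from the execution role, which is one of the reasons the API variant is simpler to operate.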

One detail you should not skip with SES: every new SES account starts in sandbox mode. In the sandbox you can only send to verified recipient addresses, with a hard daily limit. For production you need to request Production Access via AWS Support, including a use case description, expected volume, and a bounce handling concept. The request usually goes through within a day when the concept is clean. It does not go through when you have nothing to say about bounce handling. Plan this in before the system goes live.
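Because the sandbox only bites at go-live, a small guard in the deployment pipeline can catch it earlier. The sketch below assumes a boto3 SES v2 client (`boto3.client("sesv2", region_name="eu-central-2")`) whose `get_account()` response includes `ProductionAccessEnabled` and `SendingEnabled`. `check_ses_ready` is a hypothetical helper. Sandbox status is tracked per region, so the check has to run against the region you actually send from.

```python
# Pre-go-live guard: is SES out of the sandbox in this region? (sketch)
from typing import Any, List


def check_ses_ready(sesv2: Any) -> List[str]:
    """Return blocking problems for go-live; an empty list means ready.

    `sesv2` is expected to be a boto3 SES v2 client, injected for testability.
    """
    account = sesv2.get_account()
    problems = []
    if not account.get("ProductionAccessEnabled", False):
        quota = account.get("SendQuota", {})
        problems.append(
            "still in the SES sandbox (max {} mails per 24h, verified recipients only)".format(
                quota.get("Max24HourSend", "?")
            )
        )
    if not account.get("SendingEnabled", False):
        problems.append("sending is disabled for this account")
    return problems
```

In a pipeline, a non-empty list would fail the production deployment, while a staging environment can tolerate the sandbox.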

Mails arrived now. With our own domain, our own branding, a localized template. Looked like the system was done.

Trap 3: KMS encryption meets cold start

A week later, the first support tickets came in. Users reported receiving two verification codes, some even three. Both codes worked, but it looked chaotic and unprofessional. The Lambda logs showed the same sign-up request triggering multiple times, seconds apart.

Here comes the trap that almost no tutorial covers. Cognito does not hand the verification code to your Custom Email Sender Lambda in plain text. The code is encrypted with a KMS key that you configure on the User Pool together with the Custom Email Sender trigger. That makes sense from a security perspective: the code must never appear in plain text in CloudWatch, where anyone who can read the Lambda logs would see it.

Decryption needs more than a plain kms.Decrypt call because Cognito uses envelope encryption. You need the AWS Encryption SDK, a much heavier package with its own Materials Providers, Keyrings, and Cryptographic Materials Manager. On the JVM, this library brings a significant class footprint. During a Java Lambda cold start, all of that needs to load, initialize, and verify. In practice, cold starts with the Encryption SDK on JVM land in the multi-second range, depending on memory setting and region.

Cognito has its own retry logic for Custom Email Sender Lambdas. If the Lambda does not respond within a certain time window, Cognito calls it again. And again. Until it succeeds or the limit is reached. For a slow JVM Lambda with the Encryption SDK, that means the first invocation is still running, the second one starts, both send a mail, and the user receives two codes. In the worst case three, if the Lambda scales out in parallel.

From Cognito’s perspective, the retry logic is correct. Cognito cannot know whether the Lambda is just taking a while or whether it actually crashed and the mail will never go out. When in doubt, better one retry than a user without a code. The logic is built on the assumption that the Lambda answers in the usual range, under a second.

The obvious fix would be to keep the Lambda warm, via Provisioned Concurrency or scheduled pings. Both help, and both cost extra, but neither closes the gap. Scheduled pings keep only as many instances warm as they hit and do nothing for the first calls after a deployment. Provisioned Concurrency only covers the number of instances you pay for, so a burst beyond that still starts new instances cold. The more robust fix is to build the Lambda so it has no relevant cold start to begin with.

My stack for this: Quarkus Native. Java code compiled to a native binary with GraalVM. Cold start in the low three-digit milliseconds range, well under the Cognito retry window. The same principle holds for other native frameworks (Micronaut, Spring Native), or alternatively for Node.js and Python, which do not need JVM warm-up.

One detail deserves a mention: native compilation and the AWS Encryption SDK do not get along out of the box. The combination needs additional configuration for reflection and a handful of class initializations that tutorials rarely document. Anyone going down this path should expect some trial and error, especially during the first Native Image build. Once it runs, it runs fast and stable.

Idempotency as a second line of defense

Even with a fast Lambda, the retry case can still happen during peak load. A second line of defense is worthwhile: making the mail-sending logic idempotent. Concretely, that means checking via the Cognito trigger source and user identity whether a mail with the same code went out in the last few seconds. A small DynamoDB table with TTL is enough, or a marker on an existing user record. The check has to be a single atomic conditional write rather than a read followed by a write; otherwise two parallel invocations can both pass it. Low effort, high impact: duplicate mails are then ruled out by the architecture, not just unlikely.
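Here is a minimal sketch of that check, with an in-memory dict standing in for the DynamoDB table. The class name, key format, and TTL default are hypothetical. In DynamoDB, the same claim would be one conditional PutItem with a condition such as `attribute_not_exists(pk) OR expires_at < :now`, plus a TTL attribute for cleanup. DynamoDB deletes expired items in the background with some delay, so the condition compares the stored expiry itself instead of relying on the item being gone.

```python
# Idempotency guard for Custom Email Sender invocations (in-memory sketch)
import hashlib
import hmac
import time
from typing import Callable, Dict


class SendOnceStore:
    """Records which (trigger source, user, code) combinations were already mailed.

    Hypothetical stand-in for a small DynamoDB table. The dict is only valid
    within one process; across Lambda instances the claim must be a single
    conditional write in a shared store.
    """

    def __init__(self, secret: bytes, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._secret = secret      # key for the HMAC in key()
        self._ttl = ttl_seconds    # hypothetical; should cover Cognito's retry span
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def key(self, trigger_source: str, user_sub: str, code: str) -> str:
        # A plain hash of a short numeric code is easy to brute-force, so use an
        # HMAC: the stored key reveals nothing about the code without the secret.
        digest = hmac.new(self._secret, code.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"{trigger_source}#{user_sub}#{digest}"

    def claim(self, key: str) -> bool:
        """True if this invocation may send; False if a send was recorded within the TTL."""
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expires_at[key] = now + self._ttl
        return True
```

Including the code in the key means a genuinely new code for the same user and trigger is never suppressed. Only a repeat of the same delivery is.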

In practice, a fast Lambda plus an idempotency check is the robust position. A fast Lambda alone relies on Cognito never retrying, which is not guaranteed. Idempotency alone stops duplicate mails, but every retried invocation still runs in full and costs Lambda time. Both together are clean.

What the docs leave out: The KMS encryption of the verification code appears in the AWS Cognito docs as a detail, not as a central hurdle. Tutorials and blog posts almost always demonstrate it with JVM Lambdas, without mentioning how the cold start interacts with Cognito's retry logic. Anyone running this in production learns it from their support ticket volume.

Architecture overview

The components that end up running in production:

Cognito triggers a Lambda with an encrypted code, the Lambda decrypts via KMS and sends via the SES API.

The flow in detail. The user signs up through your app against Cognito (1). Cognito generates a verification code, encrypts it with the configured KMS key, and triggers the Custom Email Sender Lambda with the encrypted payload (2). The Lambda decrypts the code via the AWS Encryption SDK (3), builds the mail with the localized template, and calls SES.SendEmail (4). SES delivers the mail through the verified domain (5).
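The five steps map onto a handler with very little code of its own. This sketch injects every external dependency as a plain callable, so the wiring can be tested without AWS. In production, `decrypt` would wrap the AWS Encryption SDK, `claim` the idempotency table, and `send` the SES call. `make_handler` and the callable signatures are hypothetical. The event fields `triggerSource`, `request.code`, and `request.userAttributes` follow the Custom Email Sender event format.

```python
# End-to-end wiring of a Custom Email Sender handler (sketch with injected steps)
from typing import Any, Callable, Dict, Tuple

Event = Dict[str, Any]


def make_handler(
    decrypt: Callable[[str], str],
    claim: Callable[[str, str, str], bool],
    render: Callable[[Event, str], Tuple[str, str]],
    send: Callable[[str, str, str], str],
) -> Callable[[Event, Any], None]:
    """Build a Lambda handler from injected steps (all hypothetical callables).

    decrypt: encrypted code from Cognito -> plaintext code
    claim:   (trigger source, user sub, code) -> True if this invocation may send
    render:  (event, code) -> (subject, body) from the localized template
    send:    (recipient, subject, body) -> message id
    """

    def handler(event: Event, context: Any) -> None:
        request = event["request"]
        attributes = request["userAttributes"]
        code = decrypt(request["code"])                           # step 3
        if not claim(event["triggerSource"], attributes["sub"], code):
            return                                                # retry: mail already sent
        subject, body = render(event, code)                       # step 4: template
        send(attributes["email"], subject, body)                  # steps 4-5: SES delivers

    return handler
```

The claim sits after decryption because the key includes the code. One caveat of this ordering: if `send` fails after a successful `claim`, the retry that could have fixed it is suppressed. A production version would release the claim on error.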

Four components: three are AWS-managed, one is your own code. That one piece of code decides whether the system holds up in production.

Trade-offs: which runtime for the Lambda

The runtime choice is the only non-trivial architecture decision in the whole setup. A rough overview:

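| Option | Cold start | Encryption SDK | Recommendation |
|---|---|---|---|
| Java JVM | several seconds | works out of the box | avoid |
| Java Native | under 500 ms | needs reflection config | recommended |
| Node.js | under 500 ms | JS SDK available | recommended |
| Python | under 500 ms | pip package available | recommended |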

The table is a rule of thumb, not a benchmark. Actual cold start times depend on memory setting, region, package size, and VPC configuration. With an existing Java codebase and a Java team, native is the right call. Starting from a clean slate, Node.js or Python skip the reflection discussion entirely.

A real benchmark is on my list. I am working on a cold start benchmark that compares Quarkus Native, Java JVM, and Node.js under the same conditions: same memory setting, same region (eu-central-2), same workload with the Encryption SDK. Once the numbers are in, a separate post will follow with a reproducible setup and a public repository.

When you do not need this

If your product is English-only, US-centric, and has no branding requirements, the Cognito default is fine. If you only use it internally with no external recipients, the question is moot. The Custom Email Sender Lambda answers a concrete marketing and compliance need: localized, branded, privacy-compliant delivery from your own domain.

The bigger pattern

The story has a point that goes beyond Cognito. Three times in this project, my first reflex was to build something myself. An SMTP server. An SMTP client against SES. A fast JVM Lambda with elaborate warm-up. Three times the better solution was to use an AWS-managed service or apply an established framework correctly, instead of accumulating my own code.

That is not a Cognito-specific phenomenon. It is the recurring shape of cloud architecture. AWS offers several hundred services, and most exist because a concrete problem occurs at a scale that makes DIY uneconomic. Auth, mail delivery, encryption, queue processing. Building it yourself means building software that someone else has already built, and taking on the maintenance for it.

In an earlier post I made the same point for authentication: do not roll your own logins. Cognito plus Authorization@Edge plus API Gateway solve the problem in a fraction of the time, with significantly better security. Here is the equivalent for email delivery: Cognito plus SES plus KMS, with a lean Lambda as glue. The custom code share shrinks to what is genuinely product-specific. The templates, the localization, the branding decisions.

The rule of thumb that emerges: every AWS service is a question. Do I need this capability? If yes, the answer is almost always to use the service, not rebuild it. The exceptions can be counted on one hand, and they are usually regulatory, not technical.
