The Invisible Character That Broke Our S3 Uploads

Michael Kim

How a non-breaking space in an Outlook attachment filename triggered an S3 signing bug, caused hours of SES retry storms, and was fixed with a one-character regex change.

TL;DR
  • Emails from Outlook were silently failing; no bounce, no complaint, just a "try again later" error for the sender.
  • The culprit was a non-breaking space (\u00A0) injected by Outlook into attachment filenames.
  • S3's SigV4 signing breaks on non-ASCII bytes in headers, a known but poorly documented bug.
  • SES retried the failed Lambda for 8 hours, flooding logs with hundreds of duplicate invocations.
  • A one-character regex fix (\xFF → \x7E) resolved the entire cascade.

A user reported that emails sent to their AgentMail inbox from Outlook weren't showing up in list-threads.

The sender wasn't getting a bounce. No complaint notification either. Just a vague "could not be delivered at this time, please try later" error on the sender's (Outlook) side.

Something strange was happening between our mail servers receiving the email and our Lambda processing it.


The Investigation

Dropping messages is a serious issue, so I flew to the CloudWatch logs to diagnose it.

The inbox clearly existed and we were receiving the email, but our Lambda was throwing an error before the message could be stored. Specifically, S3 was rejecting the attachment upload with a SignatureDoesNotMatch error.

SignatureDoesNotMatch: The request signature we calculated does not match the signature you provided.

That's weird. Our S3 credentials were fine. Every other attachment upload was working.

I looked at the raw .eml files for every email that hit this error, and the first thing that caught my eye was the Message-ID. It was unusually long and folded across multiple lines, and I wondered whether multi-line Message-IDs were breaking our header parsing. I spent a decent amount of time investigating whether our MIME parser was choking on the folded header. It was a false alarm. Damn.

Then I looked at the attachment's Content-Disposition header more carefully. Specifically, the filename. It looked normal:

Content-Disposition: attachment; filename="Q1 Report.pdf"

But when I took a closer look at the hex values around the filename, I saw something interesting:

51 31 C2 A0 52 65 70 6F 72 74 2E 70 64 66
Q  1  ?? ?? R  e  p  o  r  t  .  p  d  f

0xC2 0xA0 is a UTF-8 encoded non-breaking space (U+00A0) sitting where a regular space (0x20) should be.
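A character like this is easy to miss by eye, so a small scanner can help when inspecting suspicious filenames. The following is a minimal sketch, not code from our pipeline: the helper name `findNonPrintableAscii` and the `SuspectChar` shape are made up for illustration. It reports every character outside the printable ASCII range along with its position.

```typescript
// Hypothetical diagnostic helper (an illustration, not part of the original codebase).
interface SuspectChar {
  index: number;     // UTF-16 offset of the character within the string
  codePoint: string; // formatted as "U+XXXX"
}

// Returns every character outside printable ASCII (0x20 through 0x7E).
function findNonPrintableAscii(text: string): SuspectChar[] {
  const hits: SuspectChar[] = [];
  let index = 0;
  // for...of walks the string by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp < 0x20 || cp > 0x7e) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;
  }
  return hits;
}

// The filename from this incident, with the non-breaking space written as an escape.
console.log(findNonPrintableAscii("Q1\u00A0Report.pdf"));
// prints one entry: index 2, codePoint "U+00A0"
```

Running the same helper on a filename with a regular space returns an empty array, which makes the difference between the two visually identical names obvious.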

A non-breaking space had been silently injected into the filename when the user forwarded the email from Outlook. I'm not sure whether this is an Outlook thing or whether Word documents automatically carry these characters along.


The Problem

After some digging online, I found out why everything was failing. S3 uses Signature Version 4 (SigV4) for request authentication. The signing process involves creating a canonical request string from the HTTP headers, hashing it, and comparing signatures.

When a header value contains non-ASCII bytes (like our 0xC2 0xA0 sitting inside the Content-Disposition header), the SDK and S3 compute different canonical strings. The SDK signs the value as is, while S3's server-side signer appears to treat the non-ASCII bytes differently, so it arrives at a different signature. The signatures don't match, and the request is rejected.
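To make the mismatch concrete, here is a rough illustration of how two parties can turn the same header text into different bytes. This is a sketch under a loud assumption: it uses a UTF-8 versus Latin-1 disagreement as one plausible example, and it does not claim to reproduce what S3 actually does internally. The helper names are hypothetical, and the UTF-8 encoder is written out by hand only to keep the snippet dependency-free.

```typescript
// Illustration only: shows that one string can serialize to different bytes.
// It is an assumption that a disagreement like this resembles the S3 mismatch.

// Encode a string as UTF-8 bytes.
function utf8Bytes(text: string): number[] {
  const bytes: number[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return bytes;
}

// Encode a string as Latin-1 bytes; characters above U+00FF become "?" (0x3F).
function latin1Bytes(text: string): number[] {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0)!;
    return cp <= 0xff ? cp : 0x3f;
  });
}

function toHex(bytes: number[]): string {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

const sample = "Q1\u00A0R";
console.log(toHex(utf8Bytes(sample)));   // prints "51 31 C2 A0 52"
console.log(toHex(latin1Bytes(sample))); // prints "51 31 A0 52"
```

Any hash computed over those two byte sequences will differ, which is all it takes for a signature comparison to fail. For pure ASCII input both functions return identical bytes, which is why only the odd filename was affected.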

This is a known but poorly documented behavior. S3 SigV4 effectively only works reliably with ASCII header values. Someone reported this exact issue on Stack Overflow back in 2018. Almost 8 years later, AWS still hasn't fixed it.

We didn't really anticipate needing to account for non-ASCII values in attachment names.


The Root Cause

We had a helper function that sanitizes filenames for use in Content-Disposition headers. Here's what it looked like before:

function contentDisposition(filename: string): string {
    // ASCII-safe: only allow printable ASCII in the quoted filename
    const asciiSafe = filename.replace(/[^\x20-\xFF]/g, '_')

    // RFC 5987 encoded filename for UTF-8 support
    const encoded = encodeRFC5987(filename)

    return `attachment; filename="${asciiSafe}"; filename*=UTF-8''${encoded}`
}

See the regex? [^\x20-\xFF] allows any character from 0x20 (space) through 0xFF. That range includes the Latin-1 Supplement block, which contains \u00A0 (the non-breaking space, at 0xA0).

The regex was supposed to strip non-ASCII characters, but it was letting the entire Latin-1 range through. The non-breaking space passed the filter, ended up in the filename="..." quoted string, and went straight into the S3 request header.


The Fix

One character:

function contentDisposition(filename: string): string {
    // ASCII-safe: only allow printable ASCII in the quoted filename
    const asciiSafe = filename.replace(/[^\x20-\x7E]/g, '_')

    // RFC 5987 encoded filename for UTF-8 support
    const encoded = encodeRFC5987(filename)

    return `attachment; filename="${asciiSafe}"; filename*=UTF-8''${encoded}`
}

\xFF → \x7E. That's it.

0x7E is the tilde (~), the last printable ASCII character. By capping the range at 0x7E instead of 0xFF, we now replace every non-ASCII character with an underscore, including \u00A0, accented characters, emoji, and anything else that could trip up SigV4 signing.
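The difference between the two character classes is easy to check side by side. The snippet below is a small sketch that applies both patterns to the same inputs. The constant names and the `sanitize` wrapper are made up for this comparison; only the two regexes come from the function above.

```typescript
// The two character classes from the before and after versions of the helper.
const OLD_PATTERN = /[^\x20-\xFF]/g; // lets the whole Latin-1 range through
const NEW_PATTERN = /[^\x20-\x7E]/g; // printable ASCII only

// Hypothetical wrapper so both patterns can be applied the same way.
function sanitize(filename: string, pattern: RegExp): string {
  return filename.replace(pattern, "_");
}

const nbspName = "Q1\u00A0Report.pdf";       // contains a non-breaking space
const accentedName = "r\u00E9sum\u00E9.pdf"; // "résumé.pdf"

// Old pattern: both names pass through unchanged, non-ASCII characters included.
console.log(sanitize(nbspName, OLD_PATTERN) === nbspName);         // prints true
console.log(sanitize(accentedName, OLD_PATTERN) === accentedName); // prints true

// New pattern: every non-ASCII character is replaced.
console.log(sanitize(nbspName, NEW_PATTERN));     // prints "Q1_Report.pdf"
console.log(sanitize(accentedName, NEW_PATTERN)); // prints "r_sum_.pdf"
```

A handful of cases like these would make a reasonable regression test for the helper, since the failure is otherwise invisible in logs and code review.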


Why This Works

The Content-Disposition header we generate has two parts:

Content-Disposition: attachment; filename="Q1_Report.pdf"; filename*=UTF-8''Q1%C2%A0Report.pdf

The first filename="..." is the ASCII-safe fallback. After our fix, any non-ASCII byte gets replaced with an underscore. This is what S3 sees in the header, and it's now guaranteed to be pure ASCII.

The second filename*=UTF-8''... uses RFC 5987 encoding. The original filename is preserved with full Unicode support, but it's percent-encoded, so even \u00A0 becomes %C2%A0, which is ASCII-safe. Modern browsers and email clients use this form to display the correct filename to users.
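The post doesn't show the body of encodeRFC5987, so the following is only a guess at what such a helper might look like, named `encodeRFC5987Sketch` to make clear it is a stand-in. It builds on the built-in encodeURIComponent and additionally escapes the four characters that encodeURIComponent leaves alone but RFC 5987 does not allow unencoded.

```typescript
// Hypothetical stand-in for the encodeRFC5987 helper referenced in this post.
// Assumption: percent-encode as UTF-8, then also escape ' ( ) and *, which
// encodeURIComponent leaves untouched.
function encodeRFC5987Sketch(value: string): string {
  return encodeURIComponent(value).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

console.log(encodeRFC5987Sketch("Q1\u00A0Report.pdf")); // prints "Q1%C2%A0Report.pdf"
console.log(encodeRFC5987Sketch("a'b(c)*.txt"));        // prints "a%27b%28c%29%2A.txt"
```

Because every non-ASCII byte ends up as a %XX triplet, the output is always plain ASCII regardless of what the original filename contained. One caveat: encodeURIComponent throws a URIError on malformed input such as a lone surrogate, so a production helper would probably want to handle that case.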

Both paths are now safe. The ASCII fallback is strict. The RFC 5987 form is permissive but encoded. S3 gets clean headers, users see correct filenames.


Conclusion

One invisible character (a non-breaking space injected by Outlook during a forward) cascaded into S3 signing failures, hours of SES retries, hundreds of wasted Lambda invocations, and a user wondering why their emails were vanishing.

The fix was a single character in a regex range: \xFF → \x7E. That's the kind of bug that makes you stare at the trash can for an hour and then crash out on the entire office when you finally find it.

These are the kinds of edge cases you only run into at scale, and we're super excited to keep documenting engineering incidents and the questions we ask ourselves as we ship more features. One theme we plan to explore is how an agent would use our API, and how we optimize the product for agents rather than humans.

© 2026 AgentMail, Inc. All rights reserved.
