Email threads accumulate quoted replies that clutter the actual content. When processing emails programmatically, you need just the new message, not the entire conversation history.
Talon solves this problem by extracting clean reply content through sophisticated pattern matching and structural analysis.
Handles Gmail, Outlook, Apple Mail, Thunderbird HTML structures
93.8% success rate across 64 real-world test cases
Supports English, Japanese, Swedish, Polish, Dutch, German
1.92ms average processing time, 488 emails/second
Talon uses two complementary approaches depending on email format:
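Recognizes patterns like “On [date] [name] wrote:” headers, Outlook separator lines followed by “From:”/“Sent:” headers, and quote markers (>).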
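A minimal sketch of this kind of pattern matching for plain-text bodies. It is not Talon's implementation: the three regular expressions (an “On ... wrote:” header, an Outlook-style underscore separator, and lines starting with >) and the function name `strip_quoted_text` are simplified assumptions chosen for illustration.

```python
import re

# Hypothetical, simplified patterns; a real library covers many more
# clients and languages than these three.
REPLY_HEADER = re.compile(r"^On\s.+\swrote:\s*$")
OUTLOOK_SEPARATOR = re.compile(r"^_{5,}\s*$")
QUOTED_LINE = re.compile(r"^\s*>")


def is_quote_start(line: str) -> bool:
    """True if the line looks like the beginning of quoted history."""
    return bool(
        REPLY_HEADER.match(line)
        or OUTLOOK_SEPARATOR.match(line)
        or QUOTED_LINE.match(line)
    )


def strip_quoted_text(body: str) -> str:
    """Return the text that precedes the first recognised quote marker."""
    kept = []
    for line in body.splitlines():
        if is_quote_start(line):
            break
        kept.append(line)
    return "\n".join(kept).strip()
```

Because this sketch cuts at the first marker it finds, it only suits top-posted replies; replies interleaved with quoted lines need a different strategy.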
Quotation Removal (Primary)
Talon has been tested on 64 real-world emails from various clients and languages.
Insight: For production systems, 1.92ms average is negligible. Even at worst case (21.55ms), Talon is faster than most network requests.
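To check how figures like these hold up on your own mail, a small timing harness is enough. The sketch below is an assumption about how such numbers could be gathered, not the benchmark used for the results above; `benchmark` is a hypothetical helper, and `extract` stands for whatever extraction function you are measuring.

```python
import time
from typing import Callable, Dict, Iterable


def benchmark(extract: Callable[[str], str], samples: Iterable[str]) -> Dict[str, float]:
    """Time `extract` once per sample and summarise the results in milliseconds."""
    timings_ms = []
    for body in samples:
        start = time.perf_counter()
        extract(body)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    if not timings_ms:
        raise ValueError("need at least one sample")
    average = sum(timings_ms) / len(timings_ms)
    return {
        "count": float(len(timings_ms)),
        "average_ms": average,
        "worst_ms": max(timings_ms),
        # Throughput derived from the average alone; it ignores any
        # per-batch overhead such as loading or decoding messages.
        "emails_per_second": 1000.0 / average if average > 0 else float("inf"),
    }
```

Run it over a few hundred representative messages rather than a handful, since a single slow HTML email can dominate a small sample.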
Talon failed 4 out of 64 test cases. Here’s what didn’t work:
Input:
Expected Output: First 5 lines only (up to Christopher Edwards)
Talon’s Output: Returns the entire email, including the quoted text starting with “On Mon, Jun 3…” and all “> quoted text” lines
Processing Time: 2.55ms
Issue: Signature placement before quotes confuses detection logic
Input:
Expected Output: Just the inline responses (“I will reply under this one” and “and under this.”)
Talon’s Output: Returns everything including “On Tue, Apr 29…” header and all quoted lines
Processing Time: 0.48ms
Issue: Interleaved inline responses not recognized as the reply pattern
Input:
Expected Output: Just testblah (before the forward marker)
Talon’s Output: Includes “---------- Forwarded message ----------” and the forwarded content
Processing Time: 3.41ms
Issue: HTML forward headers not removed by Gmail quote detection
Input:
Expected Output: Empty (no new content, just forward)
Talon’s Output: Includes “-------- Forwarded Message --------” and the forwarded content
Processing Time: 4.34ms
Issue: Thunderbird’s moz-forward-container class not recognized
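Summary: 2 of the 4 failures are forwarded messages; the other two are a signature placed before the quoted text and interleaved inline replies. Standard top-posted replies are handled reliably.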
Input:
Talon’s Output: Awesome! I haven't had another problem with it.
Processing Time: 0.2ms
What Worked: Standard “On [date] [name] wrote:” pattern detected, quote marker (>) recognized
Input:
Talon’s Output: Outlook with a reply directly above line
Processing Time: 0.51ms
What Worked: Outlook separator line (underscores) and “From:”/“Sent:” headers detected as splitter
Input:
Talon’s Output: Reply
Processing Time: 4.02ms
What Worked: Outlook’s OLK_SRC_BODY_SECTION span ID detected and removed structurally
Tradeoff: Talon is more comprehensive but slower than plain-text-only libraries
As shown in test results, forwarded messages (especially HTML) are challenging:
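One way to work around this is to detect forwards before trusting the quote-removal result. The sketch below is an assumption, not part of Talon: the helper names are hypothetical, the text marker is modelled on the two “Forwarded message” separators from the failing cases, and the HTML check looks only for Thunderbird's `moz-forward-container` class.

```python
import re

# Matches separators such as "---------- Forwarded message ----------"
# and "-------- Forwarded Message --------".
FORWARD_MARKER = re.compile(r"-{2,}\s*Forwarded [Mm]essage\s*-{2,}")
# Class name Thunderbird puts on forwarded HTML content.
FORWARD_HTML_CLASS = "moz-forward-container"


def looks_forwarded(body: str) -> bool:
    """True if the body (plain text or HTML) contains a known forward marker."""
    return bool(FORWARD_MARKER.search(body)) or FORWARD_HTML_CLASS in body


def text_before_forward(plain_body: str) -> str:
    """Return the plain text written above the forward separator.

    Intended for plain-text bodies only; slicing raw HTML this way would
    cut through the markup.
    """
    match = FORWARD_MARKER.search(plain_body)
    if match is None:
        return plain_body
    return plain_body[: match.start()].strip()
```

A pipeline could call `looks_forwarded` first and route forwards to separate handling, keeping the regular extraction path for ordinary replies.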
Always handle potential parsing failures:
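A minimal sketch of defensive wrapping, assuming the extractor is passed in as a callable that takes a body and a content type. `safe_extract` is a hypothetical helper; falling back to the full body is one possible policy, not a recommendation from the library.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def safe_extract(
    extract: Callable[[str, str], str],
    body: str,
    content_type: str = "text/plain",
) -> str:
    """Run `extract`, falling back to the unmodified body when it fails
    or returns nothing usable."""
    try:
        reply = extract(body, content_type)
    except Exception:
        # A parser error should not take down the whole pipeline.
        logger.exception("reply extraction failed; using the full body")
        return body
    if not isinstance(reply, str) or not reply.strip():
        # Treat an empty result as a failed extraction.
        return body
    return reply
```

With Talon installed, `talon.quotations.extract_from` accepts a message body and a content type, so it can be passed as `extract`. If an empty result is meaningful in your data (for example a forward with no new text), drop the empty-result fallback.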
Always test with your specific email formats:
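A small evaluation harness makes this repeatable. The sketch below is an assumption about how such a check could look: `evaluate` is a hypothetical helper, `extract` is whichever extraction function you use, and each case is a `(name, raw_body, expected_reply)` tuple built from your own mail. An accuracy figure such as 93.8% corresponds to 60 passing cases out of 64.

```python
from typing import Callable, Iterable, List, Tuple

Case = Tuple[str, str, str]  # (name, raw_body, expected_reply)


def evaluate(extract: Callable[[str], str], cases: Iterable[Case]) -> Tuple[float, List[str]]:
    """Run `extract` over the cases and return (accuracy, failed_case_names).

    Results are compared after stripping leading and trailing whitespace.
    """
    total = 0
    failed: List[str] = []
    for name, raw_body, expected in cases:
        total += 1
        if extract(raw_body).strip() != expected.strip():
            failed.append(name)
    accuracy = (total - len(failed)) / total if total else 0.0
    return accuracy, failed
```

Keeping the names of failed cases, not just the percentage, shows which clients or message types need a workaround.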
Test with real emails from your users’ actual email clients. Talon’s accuracy is based on diverse real-world samples, but your specific use case may have unique patterns.
For TypeScript/JavaScript projects, use TalonJS, a JavaScript port of Talon with similar functionality.
TalonJS reaches 90.6% accuracy with a slightly lower average processing time (1.88ms versus Talon’s 1.92ms), and it avoids a Python dependency.
When to use TalonJS vs Python Talon: