Overview
MIME (Multipurpose Internet Mail Extensions) is the standard that allows email to carry formatted text, attachments, and multiple body parts. A typical MIME email looks deceptively simple as text but can be deeply nested:
- multipart/mixed: top-level container for body + attachments
  - multipart/alternative: contains text and HTML versions
    - text/plain: plain text body (quoted-printable encoded)
    - text/html: HTML body (base64 encoded)
  - application/pdf: attachment (base64 encoded)
  - image/png: inline image with Content-ID reference
Each part has its own headers, content transfer encoding, and character set. Walking this tree correctly, decoding each part, and normalizing the output to a consistent structure is what a MIME parser does. JsonHook uses a battle-tested MIME parsing library and handles edge cases such as malformed MIME, missing boundaries, non-standard encodings, and broken mail client output.
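Normalizing the tree above amounts to a depth-first walk that recurses into multipart/* containers and classifies each leaf. The sketch below assumes the parts have already been parsed and decoded; the MimePart and NormalizedEmail shapes and the walkMime function are hypothetical illustrations, not JsonHook's or mailparser's API. It keeps the first text/plain and text/html bodies it finds and turns every other leaf, or any part marked as an attachment, into an attachment entry.

```typescript
// Hypothetical shape of an already-parsed, already-decoded MIME part.
interface MimePart {
  contentType: string;                    // e.g. "multipart/mixed", "text/plain"
  disposition?: "inline" | "attachment";  // from Content-Disposition, if present
  filename?: string;
  contentId?: string;
  body?: string;                          // decoded text for leaf parts
  children?: MimePart[];                  // sub-parts for multipart/* containers
}

// Hypothetical normalized output, similar in spirit to a JSON email payload.
interface NormalizedEmail {
  textBody: string | null;
  htmlBody: string | null;
  attachments: { filename: string; contentType: string; contentId: string | null }[];
}

// Depth-first walk: containers recurse into their children, leaves are classified.
function walkMime(part: MimePart, out: NormalizedEmail): NormalizedEmail {
  if (part.contentType.startsWith("multipart/")) {
    for (const child of part.children ?? []) walkMime(child, out);
    return out;
  }
  const isText = part.contentType === "text/plain" || part.contentType === "text/html";
  if (part.disposition === "attachment" || !isText) {
    out.attachments.push({
      filename: part.filename ?? "unnamed",
      contentType: part.contentType,
      contentId: part.contentId ?? null,
    });
  } else if (part.contentType === "text/plain") {
    if (out.textBody === null) out.textBody = part.body ?? ""; // keep the first one found
  } else {
    if (out.htmlBody === null) out.htmlBody = part.body ?? "";
  }
  return out;
}

// The example tree from the overview, with made-up bodies and filenames.
const example: MimePart = {
  contentType: "multipart/mixed",
  children: [
    {
      contentType: "multipart/alternative",
      children: [
        { contentType: "text/plain", body: "Hello" },
        { contentType: "text/html", body: "<p>Hello</p>" },
      ],
    },
    { contentType: "application/pdf", disposition: "attachment", filename: "invoice.pdf" },
    { contentType: "image/png", disposition: "inline", filename: "logo.png", contentId: "logo@example" },
  ],
};

const result = walkMime(example, { textBody: null, htmlBody: null, attachments: [] });
console.log(JSON.stringify(result, null, 2));
```

The inline image still lands in attachments, but with its contentId preserved so an HTML body referencing cid:logo@example can be resolved later.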
Prerequisites
If you are using JsonHook, you do not need to write a MIME parser — JsonHook handles it. This guide is useful for:
- Understanding what JsonHook does under the hood
- Diagnosing unexpected output for edge-case emails
- Writing your own MIME parser in cases where JsonHook is not applicable
If you want to experiment with MIME parsing directly, useful libraries include: mailparser (Node.js), email.parser (Python stdlib), the mail gem (Ruby), and the mime, mime/multipart, and net/mail packages (Go standard library).
Step-by-Step Instructions
Understanding how MIME becomes JSON (what JsonHook does for you):
1. Parse the RFC 5322 headers. Split the raw message into headers and body at the first blank line. Parse each header, handling folded headers (continuation lines starting with whitespace).
2. Detect the Content-Type. If it is multipart/*, extract the boundary parameter. Otherwise (for example text/plain or text/html), the body is a single part. A missing Content-Type header defaults to text/plain; charset=us-ascii per RFC 2045.
3. Walk the multipart tree recursively. Split the body on the boundary marker. For each part, parse its headers and repeat from step 2. This handles arbitrarily nested multipart structures.
4. Decode each body part. Apply the Content-Transfer-Encoding (base64 or quoted-printable decoding), then convert from the charset given in the charset= parameter of Content-Type to produce a UTF-8 string.
5. Classify each leaf part. text/plain becomes textBody and text/html becomes htmlBody. Any part with Content-Disposition: attachment or a non-text content type becomes an entry in attachments.
6. Serialize to JSON using a consistent schema.
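The header-parsing, Content-Type detection, and tree-walking steps can be sketched in plain TypeScript without a library. This is a simplified illustration, not JsonHook's implementation: it assumes the raw message is already a JavaScript string, leaves bodies transfer-encoded (decoding is a separate step), ignores RFC 2047 encoded-word headers, and joins repeated headers with newlines. The names RawPart, parseHeaders, getContentType, and parsePart are hypothetical.

```typescript
// Hypothetical minimal part type for this sketch.
interface RawPart {
  headers: Record<string, string>; // lowercase header name -> unfolded value
  body: string;                    // still transfer-encoded
  children: RawPart[];             // non-empty only for multipart/* parts
}

// Step 1: split at the first blank line and unfold continuation lines.
function parseHeaders(raw: string): { headers: Record<string, string>; body: string } {
  const text = raw.replace(/\r\n/g, "\n");
  const sep = text.indexOf("\n\n");
  const headerBlock = sep === -1 ? text : text.slice(0, sep);
  const body = sep === -1 ? "" : text.slice(sep + 2);
  // RFC 5322 unfolding: drop a line break that is followed by a space or tab.
  const unfolded = headerBlock.replace(/\n(?=[ \t])/g, "");
  const headers: Record<string, string> = {};
  for (const line of unfolded.split("\n")) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip malformed lines instead of failing
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    headers[name] = name in headers ? `${headers[name]}\n${value}` : value;
  }
  return { headers, body };
}

// Step 2: media type plus the (possibly quoted) boundary parameter.
function getContentType(headers: Record<string, string>): { type: string; boundary: string | undefined } {
  const value = headers["content-type"] ?? "text/plain; charset=us-ascii"; // RFC 2045 default
  const type = (value.split(";")[0] ?? "").trim().toLowerCase();
  const m = /;\s*boundary\s*=\s*(?:"([^"]+)"|([^;\s]+))/i.exec(value);
  return { type, boundary: m ? (m[1] ?? m[2]) : undefined };
}

// Step 3: split on whole delimiter lines and recurse into each part.
function parsePart(raw: string): RawPart {
  const { headers, body } = parseHeaders(raw);
  const { type, boundary } = getContentType(headers);
  const part: RawPart = { headers, body, children: [] };
  if (!type.startsWith("multipart/") || !boundary) return part;

  const delimiter = "--" + boundary;
  let current: string[] | null = null; // lines of the part being collected
  for (const line of body.split("\n")) {
    const trimmed = line.trimEnd(); // trailing whitespace after a delimiter is allowed
    if (trimmed === delimiter + "--") { // closing delimiter
      if (current) part.children.push(parsePart(current.join("\n")));
      current = null;
      break;
    }
    if (trimmed === delimiter) { // start of the next part
      if (current) part.children.push(parsePart(current.join("\n")));
      current = [];
    } else if (current) {
      current.push(line); // lines before the first delimiter (preamble) are ignored
    }
  }
  if (current) part.children.push(parsePart(current.join("\n"))); // tolerate a missing closing delimiter
  return part;
}

const raw = [
  "Content-Type: multipart/alternative;",
  ' boundary="b1"',
  "",
  "preamble",
  "--b1",
  "Content-Type: text/plain",
  "",
  "Hello",
  "--b1",
  "Content-Type: text/html",
  "",
  "<p>Hello</p>",
  "--b1--",
].join("\r\n");

const tree = parsePart(raw);
console.log(tree.children.map(c => getContentType(c.headers).type)); // [ 'text/plain', 'text/html' ]
```

Comparing whole lines against the delimiter, rather than building a regular expression from the boundary, avoids escaping problems with boundary characters such as + and ?, and only matches delimiters at the start of a line.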
Code Example
A minimal MIME-to-JSON parser using Node.js's mailparser library (what you would write if you were doing this yourself without JsonHook):
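```typescript
import { simpleParser, ParsedMail } from "mailparser";

// mailparser returns header values as strings, string arrays, Dates, address
// objects ({ value, html, text }) or structured headers ({ value, params }).
// Flatten each to a string; String(v) alone would print "[object Object]".
function headerToString(v: unknown): string {
  if (typeof v === "string") return v;
  if (Array.isArray(v)) return v.map(headerToString).join("\n");
  if (v instanceof Date) return v.toISOString();
  if (v !== null && typeof v === "object") {
    const obj = v as { text?: unknown; value?: unknown };
    if (typeof obj.text === "string") return obj.text;   // address header
    if (typeof obj.value === "string") return obj.value; // structured header (params dropped)
  }
  return String(v);
}

async function mimeToJson(rawMime: string | Buffer) {
  const parsed: ParsedMail = await simpleParser(rawMime);
  return {
    email: {
      from: parsed.from?.text ?? null,
      // "to" may be a single AddressObject or an array of them
      to: parsed.to
        ? (Array.isArray(parsed.to)
          ? parsed.to.map(a => a.text)
          : [parsed.to.text])
        : [],
      subject: parsed.subject ?? null,
      date: parsed.date?.toISOString() ?? null,
      messageId: parsed.messageId ?? null,
      textBody: parsed.text ?? null,
      htmlBody: parsed.html || null, // mailparser uses false when there is no HTML part
      headers: Object.fromEntries(
        [...parsed.headers.entries()].map(([k, v]) => [k.toLowerCase(), headerToString(v)])
      ),
      attachments: (parsed.attachments ?? []).map(a => ({
        filename: a.filename ?? "unnamed",
        contentType: a.contentType,
        size: a.size,
        contentId: a.cid ?? null,
      })),
    },
  };
}

// With JsonHook, you never need to call this: it is done for you.
```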
This is essentially what JsonHook does for every inbound email, at scale and as a managed service with retries, logging, and HMAC signatures included.
Common Pitfalls
If you are parsing MIME yourself (rather than using JsonHook), watch for:
- Incorrect boundary detection. Each delimiter line in a multipart body is the boundary string preceded by --, and the closing delimiter additionally ends with --. A delimiter only counts at the start of a line, so splitting on the boundary text wherever it appears can cut a part in the wrong place. Boundaries may also contain characters such as (, ), +, ., and ? that need no escaping in the MIME spec but are special in regular expressions, so escape them (or compare whole lines) rather than building a raw regex from the boundary.
- Missing charset handling. If you skip charset conversion, non-ASCII characters in email bodies become garbage. Always respect the charset parameter of the Content-Type header.
- Treating quoted-printable as base64. These are different encodings. Content-Transfer-Encoding: quoted-printable keeps most printable ASCII characters as-is and writes other bytes (including = itself) as =XX hex sequences, with a trailing = marking a soft line break; base64 encodes binary data into the A-Z, a-z, 0-9, +, and / alphabet, with = as padding.
- Not handling degenerate messages. Real-world email frequently violates RFC standards. A robust parser must handle missing boundary markers, missing Content-Type headers, text/plain bodies sent without any multipart wrapping, and extremely long header lines.
- Memory issues with large attachments. 10 MB of base64 text decodes to roughly 7.5 MB of binary data (every 4 base64 characters carry 3 bytes), so a parser that buffers both copies holds about 17.5 MB for a single attachment. Stream large attachments to disk or object storage rather than holding them in memory.
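The quoted-printable, base64, and charset pitfalls all surface in the decode step. Below is a minimal sketch of that step, assuming Node.js (for Buffer and the global TextDecoder) and a body held as a JavaScript string in which each character is one byte (for example, a message read with the "latin1" encoding). decodeBody and decodeQuotedPrintable are hypothetical helper names, and falling back to UTF-8 for unknown charset labels is just one possible policy.

```typescript
// Decode a quoted-printable body into raw bytes (RFC 2045, section 6.7).
function decodeQuotedPrintable(input: string): Uint8Array {
  // Soft line breaks: "=" at the end of a line joins it to the next line.
  const text = input.replace(/=\r?\n/g, "");
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=XX" encodes exactly one byte
      i += 2;
    } else {
      // Literal character; a stray "=" is kept as-is (lenient handling).
      bytes.push(text.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Transfer-decode to bytes first, then convert the bytes using the declared charset.
function decodeBody(body: string, transferEncoding: string | undefined, charset: string | undefined): string {
  const cte = (transferEncoding ?? "7bit").trim().toLowerCase();
  let bytes: Uint8Array;
  if (cte === "base64") {
    bytes = Buffer.from(body, "base64"); // Node skips line breaks in base64 input
  } else if (cte === "quoted-printable") {
    bytes = decodeQuotedPrintable(body);
  } else {
    bytes = Buffer.from(body, "latin1"); // 7bit/8bit/binary: one character per byte
  }
  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(charset ?? "us-ascii");
  } catch {
    decoder = new TextDecoder("utf-8"); // unknown charset label
  }
  return decoder.decode(bytes);
}

console.log(decodeBody("caf=C3=A9", "quoted-printable", "utf-8"));   // café
console.log(decodeBody("Y2Fmw6k=", "base64", "utf-8"));              // café
console.log(decodeBody("caf=E9", "quoted-printable", "iso-8859-1")); // café
```

The last two calls show why both layers matter: the same word arrives as different bytes depending on the charset, and those bytes are wrapped differently depending on the transfer encoding.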