How to Parse MIME Email to JSON

MIME parsing is complex — nested multipart structures, multiple encoding schemes, and inconsistent client behavior. JsonHook handles all of it and delivers clean JSON to your endpoint.

Table of Contents
  1. Overview
  2. Prerequisites
  3. Step-by-Step Instructions
  4. Code Example
  5. Common Pitfalls

Overview

MIME (Multipurpose Internet Mail Extensions) is the standard that allows email to carry formatted text, attachments, and multiple body parts. A typical MIME email looks deceptively simple as text but can be deeply nested:

  • multipart/mixed — top-level container for body + attachments
  •   multipart/alternative — contains text and HTML versions
  •     text/plain — plain text body (quoted-printable encoded)
  •     text/html — HTML body (base64 encoded)
  •   application/pdf — attachment (base64 encoded)
  •   image/png — inline image with Content-ID reference

Each part has its own headers, content transfer encoding, and character set. Walking this tree correctly, decoding each part, and normalizing the output to a consistent structure is what a MIME parser does. JsonHook uses a battle-tested MIME parsing library that tolerates the common failure modes: malformed MIME, missing boundaries, non-standard encodings, and broken mail client output.

Prerequisites

If you are using JsonHook, you do not need to write a MIME parser — JsonHook handles it. This guide is useful for:

  • Understanding what JsonHook does under the hood
  • Diagnosing unexpected output for edge-case emails
  • Writing your own MIME parser in cases where JsonHook is not applicable

If you want to experiment with MIME parsing directly, useful libraries include: mailparser (Node.js), email.parser (Python stdlib), mail gem (Ruby), and net/mail with mime/multipart (Go stdlib).

Skip the MIME Parser — Get Clean JSON

JsonHook handles MIME parsing for every inbound email. Free up to 100 emails/month.

Step-by-Step Instructions

Understanding how MIME becomes JSON (what JsonHook does for you):

  1. Parse the RFC 5322 headers. Split the raw message into headers and body at the first blank line. Parse each header, handling folded headers (continuation lines starting with whitespace).
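  2. Detect the Content-Type. If multipart/*, extract the boundary parameter. If text/plain or text/html, the body is a single part. If the header is missing, treat the part as text/plain with charset us-ascii, the default defined in RFC 2045.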
  3. Walk the multipart tree recursively. Split the body on the boundary marker. For each part, parse its headers and repeat from step 2. This handles arbitrarily nested multipart structures.
  4. Decode each body part. Undo the Content-Transfer-Encoding: base64-decode or quoted-printable-decode as declared, and pass 7bit, 8bit, and binary bodies through unchanged. Then convert from the character set named in the charset parameter of Content-Type to produce a UTF-8 string.
  5. Classify each leaf part. text/plain becomes textBody. text/html becomes htmlBody. Any part with a Content-Disposition: attachment or non-text content type becomes an entry in attachments.
  6. Serialize to JSON using a consistent schema.
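
To make steps 1 to 3 concrete, here is a minimal sketch of the header split, header unfolding, boundary extraction, and recursive walk, with no dependencies. It is an illustration under simplifying assumptions, not JsonHook's implementation: the names MimeNode, parseHeaders, splitOnBoundary, and parsePart are hypothetical, repeated headers overwrite each other, and bodies are left transfer-encoded.

```typescript
// Hypothetical tree node for this sketch; not a JsonHook or mailparser type.
interface MimeNode {
  headers: Record<string, string>; // lowercased name -> last value seen
  contentType: string;             // media type only, e.g. "multipart/mixed"
  body: string;                    // still transfer-encoded; "" for multipart nodes
  children: MimeNode[];
}

// Step 1a: headers end at the first blank line.
function splitHeadersAndBody(raw: string): [string, string] {
  const lead = /^\r?\n/.exec(raw);
  if (lead) return ["", raw.slice(lead[0].length)]; // part has no headers at all
  const m = /\r?\n\r?\n/.exec(raw);
  return m ? [raw.slice(0, m.index), raw.slice(m.index + m[0].length)] : [raw, ""];
}

// Step 1b: unfold continuation lines, then split each header at its first colon.
function parseHeaders(block: string): Record<string, string> {
  // Simplification: the line break plus leading whitespace becomes one space.
  const unfolded = block.replace(/\r?\n[ \t]+/g, " ");
  const headers: Record<string, string> = {};
  for (const line of unfolded.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) headers[line.slice(0, idx).trim().toLowerCase()] = line.slice(idx + 1).trim();
  }
  return headers;
}

// Step 3a: split on delimiter lines. Compares whole lines, so no regex escaping is needed.
function splitOnBoundary(body: string, boundary: string): string[] {
  const open = "--" + boundary;
  const close = open + "--";
  const parts: string[] = [];
  let current: string[] | null = null; // null until the first delimiter (skips the preamble)
  for (const line of body.split(/\r?\n/)) {
    const trimmed = line.trimEnd();
    if (trimmed === close) {
      if (current) parts.push(current.join("\r\n"));
      current = null;
      break; // everything after the closing delimiter is epilogue
    }
    if (trimmed === open) {
      if (current) parts.push(current.join("\r\n"));
      current = [];
      continue;
    }
    if (current) current.push(line);
  }
  if (current) parts.push(current.join("\r\n")); // tolerate a missing closing delimiter
  return parts;
}

// Steps 2 and 3b: read the Content-Type, then recurse into multipart bodies.
function parsePart(raw: string): MimeNode {
  const [headerBlock, body] = splitHeadersAndBody(raw);
  const headers = parseHeaders(headerBlock);
  const ct = headers["content-type"] ?? "text/plain";
  const contentType = ct.split(";")[0].trim().toLowerCase();
  const bm = /boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(ct);
  const boundary = bm?.[1] ?? bm?.[2];
  if (contentType.startsWith("multipart/") && boundary) {
    const children = splitOnBoundary(body, boundary).map(parsePart);
    return { headers, contentType, body: "", children };
  }
  return { headers, contentType, body, children: [] };
}
```

Calling parsePart on a raw message string returns the root node; leaf nodes are the ones with an empty children array. Comparing whole lines against the delimiter, rather than splitting on a regex, sidesteps boundary strings that contain regex metacharacters.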

Code Example

A minimal MIME-to-JSON parser using Node.js's mailparser library (what you would write if you were doing this yourself without JsonHook):

import { simpleParser, ParsedMail } from "mailparser";

async function mimeToJson(rawMime: string | Buffer) {
  const parsed: ParsedMail = await simpleParser(rawMime);

  return {
    email: {
      from:        parsed.from?.text ?? null,
      to:          parsed.to
                     ? (Array.isArray(parsed.to)
                         ? parsed.to.map(a => a.text)
                         : [parsed.to.text])
                     : [],
      subject:     parsed.subject ?? null,
      date:        parsed.date?.toISOString() ?? null,
      messageId:   parsed.messageId ?? null,
      textBody:    parsed.text ?? null,
      htmlBody:    parsed.html || null,
      headers:     Object.fromEntries(
        [...parsed.headers.entries()].map(([k, v]) => [
          k.toLowerCase(),
          Array.isArray(v) ? v.join("\n") : String(v),
        ])
      ),
      attachments: (parsed.attachments ?? []).map(a => ({
        filename:    a.filename ?? "unnamed",
        contentType: a.contentType,
        size:        a.size,
        contentId:   a.cid ?? null,
      })),
    },
  };
}

// With JsonHook, you never need to call this — it is done for you.

This is essentially what JsonHook does at scale for every inbound email — but as a managed service with retries, logging, and HMAC signatures included.

Common Pitfalls

If you are parsing MIME yourself (rather than using JsonHook), watch for:

  • Incorrect boundary detection. A delimiter line is the boundary string preceded by --, and the closing delimiter also ends with --. Delimiters only count at the start of a line. Boundary strings may contain characters such as +, ?, (, and ) that need no escaping in the MIME spec but break naive regex-based splitting.
  • Missing charset handling. If you skip charset conversion, non-ASCII characters in email bodies become garbage. Always respect the charset parameter of the Content-Type header.
  • Treating quoted-printable as base64. These are different encodings. Content-Transfer-Encoding: quoted-printable writes non-ASCII and other unsafe bytes (including = itself) as =XX hex sequences; base64 encodes binary data using the characters A-Z, a-z, 0-9, +, and /, with = as padding.
  • Not handling degenerate messages. Real-world email frequently violates the RFCs or arrives in the simplest valid form. A robust parser must handle: missing boundary markers, missing Content-Type headers, text/plain bodies sent without any multipart wrapping, and extremely long header lines.
  • Memory issues with large attachments. Base64 inflates data by about a third, so 10 MB of base64 text decodes to roughly 7.5 MB of binary, and a naive decoder holds both copies in memory at once. Stream large attachments to disk or object storage rather than holding them in memory.
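
The charset and transfer-encoding pitfalls above come down to doing two separate decodes in the right order: first bytes from the transfer encoding, then text from the charset. The following is a rough sketch of that order, assuming a Node.js runtime (Buffer and TextDecoder are globals there). The function names decodeQuotedPrintable, decodeTransfer, and decodeBody are hypothetical, and the fallback to UTF-8 for unknown charset labels is a choice made for this example, not a rule from the MIME spec.

```typescript
// Quoted-printable: "=XX" is one raw byte, "=" at end of line is a soft line break.
function decodeQuotedPrintable(input: string): Uint8Array {
  const text = input.replace(/=\r?\n/g, ""); // remove soft line breaks
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(text.charCodeAt(i) & 0xff); // literal character (ASCII in valid input)
    }
  }
  return Uint8Array.from(bytes);
}

// Undo the Content-Transfer-Encoding and return raw bytes.
function decodeTransfer(body: string, encoding: string): Uint8Array {
  switch (encoding.trim().toLowerCase()) {
    case "base64":
      // Line breaks inside base64 bodies carry no data.
      return Buffer.from(body.replace(/\s+/g, ""), "base64");
    case "quoted-printable":
      return decodeQuotedPrintable(body);
    default:
      // 7bit, 8bit, binary: no transformation; map each character to one byte.
      return Buffer.from(body, "latin1");
  }
}

// Bytes -> string using the charset parameter of Content-Type.
function decodeBody(body: string, encoding: string, charset = "us-ascii"): string {
  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(charset); // throws RangeError for unknown labels
  } catch {
    decoder = new TextDecoder("utf-8"); // assumption for this sketch: fall back to UTF-8
  }
  return decoder.decode(decodeTransfer(body, encoding));
}
```

For example, decodeBody("caf=C3=A9", "quoted-printable", "utf-8") and decodeBody("Y2Fmw6k=", "base64", "utf-8") both yield "café". This sketch decodes whole bodies in memory, so it is only suitable for text parts; large attachments should still be streamed.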

Frequently Asked Questions

Does JsonHook handle malformed MIME messages?

Yes. JsonHook's parser is designed to handle the full range of real-world email, including messages that technically violate the MIME RFCs. For severely malformed messages where automatic recovery is not possible, the email is still delivered to your webhook with whatever fields could be parsed, and a parseWarnings array in the payload describes any issues encountered.

What happens to embedded (inline) images in HTML email?

Inline images referenced by cid: URLs in the HTML body appear in email.attachments with a non-null contentId field. The htmlBody retains the original cid: references. If you are rendering the HTML, you would replace cid:xxx references with the actual image URLs after downloading the inline attachment content.
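
A small sketch of that replacement step, assuming you have already stored each inline image somewhere and know its URL. Only contentId comes from the payload described above; the InlineImage shape, the url field, and the function name rewriteCidReferences are hypothetical. The sketch strips angle brackets from Content-ID values in case they are present, and it does not percent-decode cid: URLs.

```typescript
// Hypothetical shape: contentId from the payload, url from wherever you stored the image.
interface InlineImage {
  contentId: string;
  url: string;
}

function rewriteCidReferences(html: string, images: InlineImage[]): string {
  // Content-ID headers are often wrapped in <...>; cid: URLs never are.
  const urlById = new Map<string, string>();
  for (const img of images) {
    urlById.set(img.contentId.replace(/^<|>$/g, ""), img.url);
  }
  // Replace each cid:xxx with its URL; unknown ids are left untouched.
  return html.replace(/cid:([^"'\s)>]+)/gi, (match: string, id: string) => {
    return urlById.get(id) ?? match;
  });
}
```

For example, with an image whose contentId is "<logo@example.com>", the markup `<img src="cid:logo@example.com">` has its src rewritten to the stored URL, while references to ids you did not supply stay as they were.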

How does JsonHook handle emails with only an HTML body and no plain-text body?

The textBody field will be null and htmlBody will contain the HTML content. JsonHook does not auto-generate a text version from the HTML — if you need a text version for display or processing, strip the HTML tags from htmlBody in your handler using a library like html-to-text.
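
As an illustration of that fallback, the sketch below prefers textBody and derives text from htmlBody only when textBody is null. The crudeHtmlToText function is a deliberately naive regex-based stand-in written for this example, not the html-to-text library's API; it ignores most entities, links, and tables, so swap in a real converter for production use. The EmailBodies type and displayText name are hypothetical.

```typescript
// Only the two fields this sketch needs from the payload's email object.
interface EmailBodies {
  textBody: string | null;
  htmlBody: string | null;
}

// Naive tag stripper for demonstration only.
function crudeHtmlToText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "") // drop non-visible content
    .replace(/<br\s*\/?>|<\/(p|div|li|tr|h[1-6])>/gi, "\n")  // block ends become newlines
    .replace(/<[^>]+>/g, "")                                 // remove remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&")                                  // last, to avoid double-decoding
    .replace(/[ \t]+/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

function displayText(email: EmailBodies): string {
  if (email.textBody !== null) return email.textBody;
  return email.htmlBody !== null ? crudeHtmlToText(email.htmlBody) : "";
}
```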

Can I get the raw MIME message from JsonHook?

Yes, on the Pro plan. Use the delivery log API to download the original raw MIME message for any delivery within the retention period. This is useful for archival, debugging, or re-parsing with a different library.