Skip to content

ehamiter/mbox2db

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mbox2db

A fast, simple Rust-based tool to convert large mbox email archives into optimized SQLite databases. Built for handling gigabyte-sized Gmail exports with maximum performance.

Installation

cargo install mbox2db

Quick Start

# Convert mbox to SQLite (excludes Spam/Trash by default)
mbox2db all-mail.mbox

# Output: 2025-11-04-emails.db (in current directory)

Basic SQL Queries

-- Count all emails
SELECT COUNT(*) FROM emails;

-- Get most recent emails
SELECT subject, from_addr, date_parsed
FROM emails
ORDER BY date_parsed DESC
LIMIT 10;

-- Search subject lines
SELECT subject, date_parsed, from_addr
FROM emails
WHERE subject LIKE '%keyword%'
ORDER BY date_parsed DESC;

-- Count emails by year
SELECT strftime('%Y', date_parsed) as year, COUNT(*)
FROM emails
WHERE date_parsed IS NOT NULL
GROUP BY year
ORDER BY year;

Usage Options

mbox2db [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Input mbox file path

Options:
  -o, --output <OUTPUT>              Custom output database path
  -d, --destructive                  Overwrite existing database instead of auto-incrementing
      --include-spam                 Include emails marked as Spam
      --include-trash                Include emails marked as Trash
      --include-spam-and-trash       Include both Spam and Trash emails
  -h, --help                         Print help

How to Export Gmail to mbox

  1. Go to Google Takeout
  2. Deselect all products, then select Mail
  3. Click "All Mail data included" and select specific labels if desired
  4. Choose "Export once" and "Send download link via email"
  5. Select file format: .zip or .tgz
  6. Click "Create export"
  7. Download and extract the .mbox file
Technical Details

Features

  • Lightning Fast: Single-transaction writes with optimized SQLite settings (WAL mode, memory mapping, large cache)
  • Smart Filtering: Automatically excludes Spam and Trash by default (configurable)
  • Auto-Incrementing Filenames: Creates dated databases (e.g., 2025-11-03-emails.db) that auto-increment to avoid overwriting
  • Robust Date Parsing: Handles 20+ malformed date formats commonly found in email archives
  • Progress Indicator: Modern spinner shows real-time progress and skipped email counts
  • Full-Text Search Ready: Creates indexes on common fields for instant queries

Building from Source

# Build release binary
cargo build --release

# Binary will be at ./target/release/mbox2db

Examples

Basic Conversion (Default Behavior)

# Filters out Spam/Trash, creates dated output file
mbox2db all-mail.mbox
# Output: 2025-11-04-emails.db

# Running again on the same day creates incremented file
mbox2db all-mail.mbox
# Output: 2025-11-04-emails-0001.db

Include Spam/Trash

# Include spam emails only
mbox2db all-mail.mbox --include-spam

# Include trash emails only
mbox2db all-mail.mbox --include-trash

# Include both spam and trash
mbox2db all-mail.mbox --include-spam-and-trash

Custom Output Path

# Specify custom output location
mbox2db all-mail.mbox -o ~/Documents/my-emails.db

# Overwrite existing file (destructive mode)
mbox2db all-mail.mbox -d -o emails.db

Database Schema

CREATE TABLE emails (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_addr TEXT,
    to_addr TEXT,
    cc TEXT,
    bcc TEXT,
    subject TEXT,
    date TEXT,              -- Original email date header
    date_parsed TEXT,       -- Parsed datetime in SQLite format (YYYY-MM-DD HH:MM:SS)
    message_id TEXT,
    in_reply_to TEXT,
    refs TEXT,              -- "references" header
    content_type TEXT,
    body_plain TEXT,
    body_html TEXT
);

-- Indexes for fast queries
CREATE INDEX idx_from ON emails(from_addr);
CREATE INDEX idx_date ON emails(date);
CREATE INDEX idx_date_parsed ON emails(date_parsed);
CREATE INDEX idx_subject ON emails(subject);

More SQL Query Examples

Search by Date

-- Get emails from 2025
SELECT * FROM emails 
WHERE date_parsed LIKE '2025%'
ORDER BY date_parsed DESC;

-- Get emails from date range
SELECT subject, date_parsed, from_addr 
FROM emails 
WHERE date_parsed BETWEEN '2020-01-01' AND '2020-12-31'
ORDER BY date_parsed DESC;

-- Count emails from specific sender
SELECT COUNT(*) FROM emails WHERE from_addr LIKE '%[email protected]%';

Full-Text Search

-- Search email body
SELECT subject, from_addr, date_parsed 
FROM emails 
WHERE body_plain LIKE '%search term%' 
   OR body_html LIKE '%search term%'
ORDER BY date_parsed DESC;

Email Threads

-- Find email threads by message_id/in_reply_to
SELECT * FROM emails 
WHERE in_reply_to = '<some-message-id>'
ORDER BY date_parsed;

Performance Notes

  • Optimized SQLite Settings:

    • WAL (Write-Ahead Logging) mode for better concurrency
    • NORMAL synchronous mode for fast writes
    • 64MB cache size
    • 30GB memory mapping
    • Single transaction for all inserts (~10-100x faster)
  • Handles Large Files: Tested with multi-GB mbox files containing 80,000+ emails

  • Date Parsing: Handles malformed dates including:

    • Double-dash timezones (--0400)
    • Single-digit time components (9:47:11)
    • Two-digit years (Jun 09)
    • Named timezones (Eastern Daylight Time, GMT-0700)
    • Various date formats (7/19/2005 8:11:52 AM)

License

MIT

Author

Eric Hamiter

About

Convert a Gmail (.mbox) export into a SQLite db

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages