A fast, simple Rust-based tool to convert large mbox email archives into optimized SQLite databases. Built for handling gigabyte-sized Gmail exports with maximum performance.
```bash
cargo install mbox2db
```

```bash
# Convert mbox to SQLite (excludes Spam/Trash by default)
mbox2db all-mail.mbox
# Output: 2025-11-04-emails.db (in current directory)
```

```sql
-- Count all emails
SELECT COUNT(*) FROM emails;

-- Get most recent emails
SELECT subject, from_addr, date_parsed
FROM emails
ORDER BY date_parsed DESC
LIMIT 10;

-- Search subject lines
SELECT subject, date_parsed, from_addr
FROM emails
WHERE subject LIKE '%keyword%'
ORDER BY date_parsed DESC;

-- Count emails by year
SELECT strftime('%Y', date_parsed) as year, COUNT(*)
FROM emails
WHERE date_parsed IS NOT NULL
GROUP BY year
ORDER BY year;
```

```
mbox2db [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Input mbox file path

Options:
  -o, --output <OUTPUT>       Custom output database path
  -d, --destructive           Overwrite existing database instead of auto-incrementing
      --include-spam          Include emails marked as Spam
      --include-trash         Include emails marked as Trash
      --include-spam-and-trash
                              Include both Spam and Trash emails
  -h, --help                  Print help
```
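The default naming scheme (`2025-11-04-emails.db`, then `2025-11-04-emails-0001.db` on a second run the same day) and the `-d` override can be sketched as below. This is a hypothetical illustration, not mbox2db's source: the `next_db_name` helper, the injected `exists` callback, and a four-digit counter starting at 1 are assumptions inferred from the example output, and a custom `-o` path is not modeled.

```rust
/// Hypothetical helper (not mbox2db's source): pick the output database name
/// for a given date, following the naming shown in this README.
/// `exists` is injected so the logic can be tested without touching the disk;
/// a real caller might pass `|p| std::path::Path::new(p).exists()`.
fn next_db_name(date: &str, destructive: bool, exists: impl Fn(&str) -> bool) -> String {
    let base = format!("{date}-emails.db");
    // -d/--destructive: always reuse (and overwrite) the base name.
    if destructive || !exists(base.as_str()) {
        return base;
    }
    // Otherwise append a zero-padded counter until an unused name is found.
    (1u32..)
        .map(|n| format!("{date}-emails-{n:04}.db"))
        .find(|candidate| !exists(candidate.as_str()))
        .expect("no free file name found")
}

fn main() {
    let taken = ["2025-11-04-emails.db"];
    let name = next_db_name("2025-11-04", false, |p| taken.iter().any(|t| *t == p));
    println!("{name}"); // 2025-11-04-emails-0001.db
}
```

Injecting the existence check keeps the naming rule pure, so it can be tested without creating files.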
- Go to Google Takeout
- Deselect all products, then select Mail
- Click "All Mail data included" and select specific labels if desired
- Choose "Export once" and "Send download link via email"
- Select file format: `.zip` or `.tgz`
- Click "Create export"
- Download and extract the `.mbox` file
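The extracted `.mbox` is a single text file in which each message begins with a `From ` separator line. The sketch below splits such a file and applies the default Spam/Trash filter. It assumes, rather than documents, how mbox2db filters: Gmail Takeout writes an `X-Gmail-Labels` header listing each message's labels, and the hypothetical helpers `split_mbox`, `gmail_labels`, and `should_skip` check that header for `Spam` and `Trash`. Folded (multi-line) headers and `>From ` escaping are deliberately not handled.

```rust
/// Split an mbox export into raw messages. A line starting with "From "
/// begins a new message; that separator line itself is dropped.
/// Simplification: real mbox readers also expect a blank line before the
/// separator and may un-escape ">From " body lines; this sketch does neither.
fn split_mbox(mbox: &str) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current: Option<String> = None;
    for line in mbox.lines() {
        if line.starts_with("From ") {
            if let Some(msg) = current.take() {
                messages.push(msg);
            }
            current = Some(String::new());
        } else if let Some(msg) = current.as_mut() {
            msg.push_str(line);
            msg.push('\n');
        }
    }
    if let Some(msg) = current {
        messages.push(msg);
    }
    messages
}

/// Labels from the (assumed) X-Gmail-Labels header, split on commas.
/// Folded continuation lines are not handled in this sketch.
fn gmail_labels(message: &str) -> Vec<String> {
    message
        .lines()
        .take_while(|line| !line.is_empty()) // headers end at the first blank line
        .find_map(|line| {
            let (name, value) = line.split_once(':')?;
            name.eq_ignore_ascii_case("X-Gmail-Labels").then(|| value.to_string())
        })
        .map(|value| value.split(',').map(|l| l.trim().to_string()).collect::<Vec<String>>())
        .unwrap_or_default()
}

/// Default behaviour described in this README: skip Spam and Trash unless
/// the matching --include-* flag is set.
fn should_skip(message: &str, include_spam: bool, include_trash: bool) -> bool {
    let labels = gmail_labels(message);
    let has = |name: &str| labels.iter().any(|l| l.as_str() == name);
    (!include_spam && has("Spam")) || (!include_trash && has("Trash"))
}

fn main() {
    let mbox = "From 1@xxx Mon Jan 01 00:00:00 2024\nX-Gmail-Labels: Inbox,Opened\nSubject: hi\n\nbody\n\
                From 2@xxx Mon Jan 01 00:00:00 2024\nX-Gmail-Labels: Spam\nSubject: buy now\n\nspam\n";
    let kept: Vec<String> = split_mbox(mbox)
        .into_iter()
        .filter(|m| !should_skip(m, false, false))
        .collect();
    println!("kept {} of 2 messages", kept.len()); // kept 1 of 2 messages
}
```

For multi-gigabyte exports, reading the file incrementally with `std::io::BufRead::lines` instead of loading it into one `&str` would keep memory use flat.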
Technical Details
- Lightning Fast: Single-transaction writes with optimized SQLite settings (WAL mode, memory mapping, large cache)
- Smart Filtering: Automatically excludes Spam and Trash by default (configurable)
- Auto-Incrementing Filenames: Creates dated databases (e.g., `2025-11-03-emails.db`) that auto-increment to avoid overwriting
- Robust Date Parsing: Handles 20+ malformed date formats commonly found in email archives
- Progress Indicator: Modern spinner shows real-time progress and skipped email counts
- Indexed Queries: Creates indexes on sender, date, and subject for fast filtering and sorting (substring searches such as `LIKE '%keyword%'` still scan the whole table)
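The SQLite tuning listed under Optimized SQLite Settings maps onto a handful of `PRAGMA` statements plus one transaction around all inserts. The sketch below only builds the SQL text so it runs without a database crate. The values (64 MiB cache, 30 GiB `mmap_size`) are assumptions taken from this README's settings list, and `tuning_pragmas` and `single_transaction` are hypothetical names. Note the units: a negative `cache_size` is read as KiB, and `mmap_size` is in bytes.

```rust
// Hypothetical sketch: values assumed from this README, not read from mbox2db.
const CACHE_MIB: i64 = 64;
const MMAP_GIB: i64 = 30;

/// Build the PRAGMA statements for fast bulk loading.
fn tuning_pragmas() -> Vec<String> {
    vec![
        // WAL journal: readers are not blocked while the writer appends.
        "PRAGMA journal_mode = WAL;".to_string(),
        // NORMAL: fewer fsync calls than FULL; in WAL mode a power loss can
        // roll back the latest commits but should not corrupt the database.
        "PRAGMA synchronous = NORMAL;".to_string(),
        // A negative cache_size means "this many KiB" instead of pages.
        format!("PRAGMA cache_size = -{};", CACHE_MIB * 1024),
        // mmap_size is given in bytes.
        format!("PRAGMA mmap_size = {};", MMAP_GIB * 1024 * 1024 * 1024),
    ]
}

/// Wrap many INSERTs in one BEGIN/COMMIT so SQLite commits once, not per row.
fn single_transaction(inserts: &[String]) -> String {
    let mut sql = String::from("BEGIN;\n");
    for stmt in inserts {
        sql.push_str(stmt);
        sql.push('\n');
    }
    sql.push_str("COMMIT;\n");
    sql
}

fn main() {
    for pragma in tuning_pragmas() {
        println!("{pragma}");
    }
    let batch = vec!["INSERT INTO emails (subject) VALUES ('hello');".to_string()];
    print!("{}", single_transaction(&batch));
}
```

With the `rusqlite` crate, strings like these could be run through `Connection::execute_batch`. SQLite clamps `mmap_size` to the compile-time `SQLITE_MAX_MMAP_SIZE` limit, which on many builds is about 2 GB, so a 30 GB request may be reduced in practice.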
```bash
# Build release binary
cargo build --release
# Binary will be at ./target/release/mbox2db
```

```bash
# Filters out Spam/Trash, creates dated output file
mbox2db all-mail.mbox
# Output: 2025-11-04-emails.db

# Running again on the same day creates an incremented file
mbox2db all-mail.mbox
# Output: 2025-11-04-emails-0001.db
```

```bash
# Include Spam (Trash is still excluded)
mbox2db all-mail.mbox --include-spam

# Include Trash (Spam is still excluded)
mbox2db all-mail.mbox --include-trash

# Include both Spam and Trash
mbox2db all-mail.mbox --include-spam-and-trash
```

```bash
# Specify custom output location
mbox2db all-mail.mbox -o ~/Documents/my-emails.db

# Overwrite existing file (destructive mode)
mbox2db all-mail.mbox -d -o emails.db
```

```sql
CREATE TABLE emails (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    from_addr TEXT,
    to_addr TEXT,
    cc TEXT,
    bcc TEXT,
    subject TEXT,
    date TEXT,          -- Original email Date header
    date_parsed TEXT,   -- Parsed datetime in SQLite format (YYYY-MM-DD HH:MM:SS)
    message_id TEXT,
    in_reply_to TEXT,
    refs TEXT,          -- "References" header
    content_type TEXT,
    body_plain TEXT,
    body_html TEXT
);

-- Indexes for fast queries
CREATE INDEX idx_from ON emails(from_addr);
CREATE INDEX idx_date ON emails(date);
CREATE INDEX idx_date_parsed ON emails(date_parsed);
CREATE INDEX idx_subject ON emails(subject);
```

```sql
-- Get emails from 2025
SELECT * FROM emails
WHERE date_parsed LIKE '2025%'
ORDER BY date_parsed DESC;

-- Get emails from a date range
-- (half-open range: BETWEEN ... AND '2020-12-31' would miss Dec 31 after 00:00:00)
SELECT subject, date_parsed, from_addr
FROM emails
WHERE date_parsed >= '2020-01-01' AND date_parsed < '2021-01-01'
ORDER BY date_parsed DESC;

-- Count emails from a specific sender
SELECT COUNT(*) FROM emails WHERE from_addr LIKE '%sender@example.com%';
```

```sql
-- Search email bodies
SELECT subject, from_addr, date_parsed
FROM emails
WHERE body_plain LIKE '%search term%'
   OR body_html LIKE '%search term%'
ORDER BY date_parsed DESC;
```

```sql
-- Find direct replies to a message (in_reply_to holds the parent's message_id)
SELECT * FROM emails
WHERE in_reply_to = '<some-message-id>'
ORDER BY date_parsed;
```

- Optimized SQLite Settings:
  - WAL (Write-Ahead Logging) mode for better concurrency (readers are not blocked while inserts run)
  - NORMAL synchronous mode for fast writes
  - 64MB cache size
  - 30GB memory mapping
  - Single transaction for all inserts (~10-100x faster than committing each insert separately)
- Handles Large Files: Tested with multi-GB mbox files containing 80,000+ emails
- Date Parsing: Handles malformed dates, including:
  - Double-dash timezones (`--0400`)
  - Single-digit time components (`9:47:11`)
  - Two-digit years (`Jun 09`)
  - Named timezones (`Eastern Daylight Time`, `GMT-0700`)
  - Various date formats (`7/19/2005 8:11:52 AM`)
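One way to handle the date malformations listed above is a normalization pass that rewrites them before a strict parser sees the header. This is a hypothetical sketch, not mbox2db's implementation: `normalize_date` and its three-entry named-timezone table are illustrative assumptions, and two-digit years follow the RFC 5322 rule (00 to 49 become 20xx, 50 to 99 become 19xx).

```rust
/// Illustrative subset of spelled-out zone names (hypothetical, not exhaustive).
const NAMED_ZONES: [(&str, &str); 3] = [
    ("Eastern Daylight Time", "-0400"),
    ("Eastern Standard Time", "-0500"),
    ("Pacific Daylight Time", "-0700"),
];

const MONTHS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// RFC 5322 rule for two-digit years: 00-49 -> 2000-2049, 50-99 -> 1950-1999.
fn expand_year(yy: u32) -> u32 {
    if yy < 50 { 2000 + yy } else { 1900 + yy }
}

/// True for a numeric zone offset such as "-0400" or "+0530".
fn is_offset(s: &str) -> bool {
    s.len() == 5
        && (s.starts_with('+') || s.starts_with('-'))
        && s[1..].bytes().all(|b| b.is_ascii_digit())
}

fn is_digits(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit())
}

/// Rewrite common Date-header malformations into RFC 2822-style tokens.
fn normalize_date(raw: &str) -> String {
    // Zone names span several words, so replace them before tokenizing.
    let mut text = raw.trim().to_string();
    for (name, offset) in NAMED_ZONES {
        text = text.replace(name, offset);
    }
    // Only treat "Jun 09" as a year when no four-digit year is present.
    let has_full_year = text.split_whitespace().any(|t| is_digits(t, 4));

    let mut out: Vec<String> = Vec::new();
    for token in text.split_whitespace() {
        let after_month = out
            .last()
            .map_or(false, |p| MONTHS.iter().any(|m| *m == p.as_str()));
        let bytes = token.as_bytes();
        let fixed = if let Some(zone) = token.strip_prefix('-').filter(|r| is_offset(r)) {
            zone.to_string() // "--0400" -> "-0400"
        } else if let Some(zone) = token.strip_prefix("GMT").filter(|r| is_offset(r)) {
            zone.to_string() // "GMT-0700" -> "-0700"
        } else if after_month && !has_full_year && is_digits(token, 2) {
            expand_year(token.parse().unwrap()).to_string() // "Jun 09" -> "Jun 2009"
        } else if bytes.len() == 7 && bytes[1] == b':' && bytes[4] == b':' {
            format!("0{token}") // "9:47:11" -> "09:47:11"
        } else {
            token.to_string()
        };
        out.push(fixed);
    }
    out.join(" ")
}

fn main() {
    println!("{}", normalize_date("Tue, 9 Jun 2009 9:47:11 --0400"));
    // Tue, 9 Jun 2009 09:47:11 -0400
    println!("{}", normalize_date("Mon, 15 Jun 09 10:00:00 GMT-0700"));
    // Mon, 15 Jun 2009 10:00:00 -0700
}
```

The cleaned string could then be handed to an RFC 2822 parser such as chrono's `DateTime::parse_from_rfc2822`; layouts like `7/19/2005 8:11:52 AM` are not RFC 2822 and would still need a format-specific parser.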
MIT
Eric Hamiter