f2md

Convert PDF, DOCX, and image files to Markdown using AI. This CLI tool extracts text and images and preserves table structure while converting documents to clean, well-formatted Markdown. It also supports OCR text extraction from images.

Features

  • PDF Support - Full text extraction, image extraction, and page screenshots for layout understanding
  • DOCX Support - Text and image extraction with structure preservation
  • Image OCR - Extract text from images (PNG, JPG, JPEG, GIF, WEBP) using AI-powered OCR
  • AI-Powered Conversion - Uses Google's Gemini AI to intelligently convert content to Markdown
  • Configurable Model - Choose which Gemini model to use via .env
  • Interactive CLI - Friendly prompts using clack.js
  • Easy Setup - Built-in configuration wizard for API keys

Installation

Using npx (no installation required)

npx f2md document.pdf

Using bunx

bunx f2md document.pdf

Using pnpm dlx

pnpm dlx f2md document.pdf

Global installation

npm install -g f2md
# or
bun install -g f2md

Setup

Before using the tool, you need to configure your Google AI API key.

Run the setup wizard

f2md setup
# or with npx
npx f2md setup

The setup wizard will:

  1. Show you where to get a Google AI API key (https://aistudio.google.com/apikey)
  2. Prompt you to enter your API key
  3. Prompt you to enter the Gemini model to use (default: gemini-2.5-flash)
  4. Ask where to save the configuration (local project or global for all projects)

Manual setup

Alternatively, set environment variables:

export GOOGLE_GENERATIVE_AI_API_KEY="your-api-key-here"
export GOOGLE_GENERATIVE_AI_MODEL="gemini-2.5-flash"

Or create a .env file in your project:

GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
GOOGLE_GENERATIVE_AI_MODEL=gemini-2.5-flash

GOOGLE_GENERATIVE_AI_MODEL is optional. If omitted, gemini-2.5-flash is used.
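
The fallback described above can be sketched as follows. This is a minimal illustration, not f2md's actual code: `resolveConfig` and `GeminiConfig` are hypothetical names, and only the two environment variable names and the default model come from this README.

```typescript
// Hypothetical helper: illustrates the documented fallback, not f2md internals.
const DEFAULT_MODEL = "gemini-2.5-flash";

interface GeminiConfig {
  apiKey: string;
  model: string;
}

function resolveConfig(env: Record<string, string | undefined>): GeminiConfig {
  const apiKey = env.GOOGLE_GENERATIVE_AI_API_KEY?.trim();
  if (!apiKey) {
    // The key is required; the model is not.
    throw new Error(
      "GOOGLE_GENERATIVE_AI_API_KEY is not set. Run `f2md setup` or export it."
    );
  }
  // An empty or missing model falls back to the default.
  const model = env.GOOGLE_GENERATIVE_AI_MODEL?.trim() || DEFAULT_MODEL;
  return { apiKey, model };
}
```

In a script you would typically call it as `resolveConfig(process.env)`.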

Usage

Interactive Mode

f2md

The tool will prompt you for:

  • Input file path (PDF, DOCX, or image)
  • Output file path

CLI Mode

# Convert with auto-generated output name
f2md document.pdf

# Convert with custom output path
f2md document.pdf output.md

# Extract text from an image (OCR)
f2md screenshot.png

# Extract text from image with custom output
f2md image.jpg output.md

Supported File Types

  • PDF (.pdf)
  • Word Documents (.docx)
  • Images (.png, .jpg, .jpeg, .gif, .webp) - OCR text extraction
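
If you call f2md from a script, it can help to check a file's type up front. The sketch below classifies a path by the extensions listed above. `detectKind` and `defaultOutputPath` are hypothetical helpers, not part of f2md. Two things are assumptions: that extensions are matched case-insensitively, and that the auto-generated output name is the input name with a `.md` extension. The README does not specify either.

```typescript
import { basename, dirname, extname, join } from "node:path";

type InputKind = "pdf" | "docx" | "image";

const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".webp"]);

// Returns the kind of input, or null when the extension is not supported.
// Assumption: matching is case-insensitive.
function detectKind(filePath: string): InputKind | null {
  const ext = extname(filePath).toLowerCase();
  if (ext === ".pdf") return "pdf";
  if (ext === ".docx") return "docx";
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  return null;
}

// Assumption: the default output swaps the input extension for ".md".
function defaultOutputPath(filePath: string): string {
  const ext = extname(filePath);
  return join(dirname(filePath), basename(filePath, ext) + ".md");
}
```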

Options

f2md --help     # Show help
f2md --version  # Show version
f2md setup      # Configure API key and model

How It Works

For PDF and DOCX files:

  1. Extraction - Reads the input file and extracts text, images, and layout information
  2. Processing - For PDFs, captures page screenshots to understand visual layout
  3. AI Conversion - Sends extracted content to your configured Gemini model
  4. Markdown Generation - Receives AI-generated Markdown with proper formatting
  5. Cleanup - Removes unused images and saves the final output
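
To make the cleanup step concrete, here is one way to decide which images are unused: collect every image target referenced in the generated Markdown and compare it with the list of saved files. This is only a sketch under that assumption. The README does not describe how f2md does it, and `referencedImages` and `unusedImages` are hypothetical names.

```typescript
// Collects targets of Markdown image syntax: ![alt](path) or ![alt](path "title").
function referencedImages(markdown: string): Set<string> {
  const refs = new Set<string>();
  const pattern = /!\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;
  for (const match of markdown.matchAll(pattern)) {
    const target = match[1];
    if (target) refs.add(target);
  }
  return refs;
}

// Returns the saved image paths that the Markdown never references.
// Assumption: saved paths are written exactly as they appear in the Markdown.
function unusedImages(markdown: string, savedImages: string[]): string[] {
  const refs = referencedImages(markdown);
  return savedImages.filter((path) => !refs.has(path));
}
```

A caller could then delete each returned path and report how many were removed.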

For image files:

  1. Image Processing - Reads the image file and encodes it for AI processing
  2. OCR Analysis - Sends the image to your configured Gemini model with prompts specialized for text extraction
  3. Text Extraction - AI extracts all visible text while preserving structure (headings, lists, tables)
  4. Markdown Generation - Converts extracted content to well-formatted Markdown
  5. Output - Saves the final Markdown file
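
The first step, encoding the image, might look like the sketch below. It assumes the image is sent as base64 data with a MIME type, which is a common shape for multimodal requests. The README does not say how f2md actually encodes images, and `encodeImage` and `EncodedImage` are hypothetical names.

```typescript
// MIME types for the image extensions this README lists as supported.
const MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

interface EncodedImage {
  mimeType: string;
  base64: string;
}

// Encodes raw image bytes as base64 and pairs them with a MIME type.
function encodeImage(bytes: Uint8Array, extension: string): EncodedImage {
  const mimeType = MIME_TYPES[extension.toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported image extension: ${extension}`);
  }
  return { mimeType, base64: Buffer.from(bytes).toString("base64") };
}
```

In practice the bytes would come from reading the file, for example with `readFile` from `node:fs/promises`, and the extension from `extname` in `node:path`.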

Development

Prerequisites

  • Bun installed

Setup

# Clone the repository
git clone <repo-url>
cd f2md

# Install dependencies
bun install

# Run in development mode
bun run dev

Build

bun run build

Project Structure

src/
  cli.ts      - CLI entry point with clack prompts
  convert.ts  - Core conversion logic
  index.ts    - Public API exports
dist/         - Built output (generated)

API Usage

You can also use f2md as a library in Node.js or Bun projects:

import { convert } from "f2md";

const result = await convert("input.pdf", "output.md", {
  onProgress: (message) => console.log(message),
  respectPages: false,
  model: "gemini-2.5-flash",
});

console.log(`Saved to: ${result.outputPath}`);
console.log(`Images saved: ${result.imagesSaved}`);
console.log(`Images cleaned: ${result.imagesDeleted}`);

License

MIT