link-draft: Add config file and ML-based tag/author classifier #53

@metaist

Summary

Refactor link-draft.py to use a config file (link-draft.toml) as the single source of truth for mappings, and add a lightweight ML classifier for predicting tags and other metadata.

Design

Config file: link-draft.toml

Committed to the repo. Training updates it in place; the user reviews the git diff and commits what is correct.

# Auto-updated by: link-draft.py --train
# Last trained: 2026-02-03T15:30:00Z

[author_domains]
"simonwillison.net" = "Simon Willison"  # seen: 12x
"astralcodexten.com" = "Scott Alexander"  # seen: 8x

[domain_tags]
"x.com" = ["tweet"]
"github.com" = ["programming"]
"amazon.com" = ["book"]

[keyword_tags]
"\\bAI\\b" = ["ai"]
"\\bLLM\\b" = ["ai", "llm"]

[known_people]
names = ["Jeremy Wertheimer", "Shalev NessAiver"]

[rejected.author_domains]
# Entries here won't be re-suggested by training

Training pipeline

  • Scans all existing link posts
  • Learns patterns weighted by recency (recent edits override older patterns)
  • Updates link-draft.toml in-place
  • Trains sklearn classifier for tag prediction, saves to _ignore_link-draft-model.pkl
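
One way the recency weighting could work for the `author_domains` mapping is an exponential decay on post age, sketched below with the standard library only. The half-life value, the `(domain, author, posted_at)` tuple shape, and the function name are assumptions for illustration; the sklearn classifier is out of scope for this sketch.

```python
from collections import defaultdict
from datetime import datetime, timezone

HALF_LIFE_DAYS = 180  # assumed: a post's vote halves every ~6 months


def learn_author_domains(posts, now, rejected=None):
    """Pick the highest-scoring author per domain, weighting recent posts more.

    posts:    iterable of (domain, author, posted_at) tuples (assumed shape)
    rejected: domain -> author pairs that must not be re-suggested
    """
    rejected = rejected or {}
    scores = defaultdict(lambda: defaultdict(float))
    for domain, author, posted_at in posts:
        age_days = (now - posted_at).total_seconds() / 86400
        scores[domain][author] += 0.5 ** (age_days / HALF_LIFE_DAYS)

    learned = {}
    for domain, by_author in scores.items():
        best = max(by_author, key=by_author.get)
        if rejected.get(domain) != best:  # honor the [rejected] section
            learned[domain] = best
    return learned


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


posts = [
    ("example.com", "Old Name", utc(2023, 1, 10)),
    ("example.com", "Old Name", utc(2023, 2, 10)),
    ("example.com", "New Name", utc(2026, 1, 4)),  # one recent edit
]
learned = learn_author_domains(posts, now=utc(2026, 2, 3))
print(learned)
```

With these sample posts the single recent entry outweighs the two three-year-old ones, which matches the "recent edits override older patterns" intent.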

Training flags

  • Auto-train if link-draft.toml missing or older than newest post
  • --skip-train to bypass auto-training
  • --train to force retrain
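
The flag behavior above could be reduced to a small decision function, sketched here. The flag names come from this issue; comparing file modification times, making the two flags mutually exclusive, and the helper names are assumptions about how it might be wired up.

```python
import argparse


def needs_training(config_mtime, post_mtimes, *, force=False, skip=False):
    """Decide whether to train before drafting.

    config_mtime: mtime of link-draft.toml, or None if the file is missing
    post_mtimes:  mtimes of the existing link posts
    """
    if force:  # --train
        return True
    if skip:  # --skip-train
        return False
    if config_mtime is None:  # config missing
        return True
    newest_post = max(post_mtimes, default=None)
    return newest_post is not None and config_mtime < newest_post


def build_parser():
    parser = argparse.ArgumentParser(prog="link-draft.py")
    # Assumption: asking for both flags at once is an error.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--train", action="store_true", help="force retrain")
    group.add_argument("--skip-train", action="store_true", help="bypass auto-training")
    return parser


args = build_parser().parse_args(["--skip-train"])
print(needs_training(5.0, [10.0], force=args.train, skip=args.skip_train))
```

In real use the mtimes would come from `Path.stat().st_mtime` on the config file and the post files; passing them in as plain numbers keeps the decision itself trivially testable.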

Draft generation flow

  1. Load static data from link-draft.toml (user overrides take precedence)
  2. Load ML model for fuzzy predictions
  3. Combine: static rules first, model fills gaps
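
The "static rules first, model fills gaps" step might look like the sketch below. It assumes the model exposes a per-tag score between 0 and 1; the threshold, the tag cap, and both function names are hypothetical.

```python
def combine_tags(static_tags, model_scores, threshold=0.5, max_tags=5):
    """Keep every static tag, then append confident model tags not already present.

    static_tags:  tags from link-draft.toml rules, in rule order
    model_scores: tag -> score in [0, 1] from the classifier (assumed interface)
    """
    tags = list(dict.fromkeys(static_tags))  # de-duplicate, preserve order
    ranked = sorted(model_scores.items(), key=lambda item: item[1], reverse=True)
    for tag, score in ranked:
        if len(tags) >= max_tags:
            break
        if score >= threshold and tag not in tags:
            tags.append(tag)
    return tags


def pick_author(static_author, model_author):
    """A static mapping always wins; the model is consulted only when it is absent."""
    return static_author if static_author is not None else model_author


print(combine_tags(["ai"], {"llm": 0.8, "ai": 0.9, "book": 0.2}))
print(pick_author(None, "Simon Willison"))
```

Because static tags are never removed or reordered, anything the user has committed to the config stays authoritative regardless of what the model predicts.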

Tasks

  • Define link-draft.toml schema
  • Extract hardcoded mappings from link-draft.py to config
  • Add training pipeline to scan posts and update toml
  • Add recency weighting (recent posts weighted higher)
  • Train sklearn classifier (e.g., SGDClassifier or MultinomialNB) for tags
  • Add --train, --skip-train flags
  • Add auto-train logic (train if config older than newest post)
  • Add [rejected] section support
  • Update draft generation to use config + model

Notes

  • Training should be fast (<5 seconds for ~324 posts)
  • Model file is gitignored; static config is committed
  • If the user deletes a suggestion and training keeps re-adding it, add it to the [rejected] section

Issue created by Claude Opus 4.5
