Proofs written in Lean4 for the core katydid validation algorithm
The goal is to formalize the core katydid validation algorithm. This algorithm allows us to validate millions of serialized data structures per second on a single core. The algorithm is based on derivatives for regular expressions and extends this to Visibly Pushdown Automata (VPA), by splitting the derivative function into two functions. It also includes several basic optimizations, such as memoization, simplification, laziness, zipping of multiple states, short circuiting, evaluation at compilation, symbolic derivatives and a pull based parser for serialized data structures that allows us to skip over some of the parsing. You can play around with the validation language on its playground.
This is required reading for understanding the underlying algorithm, the subject of the proofs in this repo:
- Derivatives of Regular Expressions explained
- Derivatives of Context-Free Grammars explained (only the simplification rules, smart constructors and memoization are important)
- Derivatives of Symbolic Automata explained
This repo also requires the following background:
- Knowledge of dependent types, induction and understanding the difference between a property
True
and a booleantrue
. We recommend reading The Little Typer to gain an understanding of the basics. - Experience with an Interactive Theorem Prover, like Coq or Lean, including using tactics and Inductive Predicates. If you are unfamiliar with interactive theorem provers you can watch our talk for a taste. We recommend reading Coq in a Hurry as a quick overview and Coq Art up to
Chapter 8: Inductive Predicates
for a proper understanding.
Optionally the following will also be helpful, but this is not required:
- Experience with Lean4, since this project is written in Lean4. We recommend reading:
- Theorem Proving in Lean4 to close the gap between Coq and Lean.
- Lean Manual for programming in Lean and Monads.
- Lean Tactics File
- Coq Lean Tactic Cheat Sheet
- Lean Standard Libary Documentation
- Lean4 Meta Programming Book
Questions about Lean4 can be asked on proofassistants.stackexchange by tagging questions with lean
and lean4
or in the Zulip Chat.
We have pair programming sessions between 18:00 - 20:00 UK time, on dates that are announced on meetup.com.
- If you would like to watch, this will be streamed on Twitch.
- If you would like to do more than watch and join us inside the zoom session, please make sure you have met the prerequisites and email Walter.
This is just a quick overview of the steps towards our goal.
- Create Symbolic Predicates
- Create expression language as described in the post: Derivatives of Symbolic Automata explained
- Prove simplification rules for
or
,and
,false
, etc. - Prove that non-reader functions can be pre-computed before evaluating time
- Prove that the optimized comparison method using a hash is comparable (transitive, associative, etc.)
- Create the symbolic regular expressions for trees
- Steal as much as we can from our previous work in Coq
- Prove that the induction predicate is decidable.
- Create derivative functions for the symbolic tree expressions.
- Prove that the simple tree function and the VPA functions are equivalent and equivalent to the inductive predicate.
- Prove simplification rules
- Prove all optimizations of the katydid algorithm
- ...
Please check the prerequisites and read the contributing guidelines. The contributing guidelines are short and shouldn't be surprising.
- Lean4 has exceptional instructions for installing Lean4 in VS Code.
- Remember to also add
lake
(the build system for lean) to yourPATH
. You can do this on mac by addingexport PATH=~/.elan/bin/:${PATH}
to your~/.zshrc
file - Use mathlib's cache to speed up building time by running:
$lake exe cache get