-
Notifications
You must be signed in to change notification settings - Fork 15
Parser Combinators
We began the meeting by welcoming attendees new and old and James kicked things off by recapping his introduction to parser combinators.
The idea behind parser combinators is that each common operation in a parser can be implemented by a function, and then those functions can be combined into more elaborate operations. In general, a combinator is a function that takes an input state, typically the text to be parsed and an offset representing how far into the string you’ve already scanned. If the combinator matches the input at the current offset, it returns a parse tree and a new state: the text with the offset moved to the end of the text matched by the combinator. If it fails, it returns nothing, or the state it was given, depending on what’s most convenient for your implementation.
After explaining the core concept of parser combinators (covered more thoroughly in the aforementioned blog post) and touching on the issue of precedence and associativity in recursive descent, James introduced six useful combinators that he'd already implemented ahead of the meeting for us to mob with:
-
str(string, &action)
: match a literal stringstring
and perform someaction
with the resulting tree; -
chr(pattern, &action)
: match some character classpattern
(e.g.a-z
) and perform someaction
with the resulting tree; -
seq(*parsers, *action)
: match a contiguous sequence of other combinatorsparsers
and perform someaction
with the resulting tree; -
rep(parser, n, *action)
: match another combinatorparser
at leastn
times and perform someaction
with the resulting tree; -
alt(*parsers, &action)
: attempt to match the given combinatorsparsers
in order and perform someaction
on the first matching combinator; -
ref(name)
: an indirect reference to a named combinator, allowing us to form recursive matches.
With the core concepts in place, we decided to mob a parser for the arith
language introduced in the early chapters of "Types and Programming Languages". Specifically, this minimal language:
t = true
false
t
succ t
pred t
iszero t
if t then t else t
Our first goal being to parse the following expression:
if iszero pred succ 0 then true else false
Into our own representation using Ruby classes like so:
If.new(IsZero.new(Pred.new(Succ.new(Zero))), True, False)
We began by implementing the "constants" of our language, 0
, true
and false
by writing some RSpec tests and choosing some representation of our terms as plain Ruby objects.
Using James' existing ParserCombinators
library as a starting point, we wrote our own Parser
class with the following minimal rules:
class Parser
include ParserCombinators
def root
alt(zero, tru, fals)
end
def zero
str('0') { Zero }
end
def tru
str('true') { True }
end
def fals
str('false') { False }
end
end
We then went on to add support for a slightly more complicated term: succ t
which can contain any other term as its argument. We began with a failing test, detailing our desired outcome:
Succ.new(Zero)
And then added the rules to parse it:
class Parser
def root
alt(zero, tru, fals, succ)
end
def succ
seq(str('succ '), ref(:root)) { |node| Succ.new(node.last) }
end
end
As the node
yielded to us for the term succ 0
would be the following Ruby array:
[:seq, [:str, 'succ '], Zero]
We simply take the last
element of this array as the inner term we want to wrap in a Succ
.
We then decided to relax the whitespace requirement so that we could also parse expressions such as succ 0
by extracting a separate whitespace
rule:
class Parser
def succ
seq(str('succ'), whitespace, ref(:root)) { |node| Succ.new(node.last) }
end
def whitespace
rep(str(' '), 1)
end
end
We initially ignored Chris' pleas to abbreviate this to ws
but later succumbed to defining an alias for whitespace
called _
.
With the liberal application of copy-paste technology, we had similar support for pred
and iszero
and then decided to do a Computation Club first: a little refactoring.
Specifically, we decided to introduce a brand new combinator just for this function call-like terms:
class Parser
def function_call(name, klass)
seq(str(name), whitespace, ref(:root)) { |node| klass.new(node.last) }
end
end
This meant we could express succ
, pred
and iszero
more tersely:
class Parser
def succ
function_call('succ', Succ)
end
def pred
function_call('pred', Pred)
end
def iszero
function_call('iszero', IsZero)
end
end
Thanks to Charlie for organising the meeting, to Geckoboard and Leo for hosting and providing beverages, to James for the huge amount of preparation and leading the meeting and to Laura for volunteering the organise the next meeting.
- Home
- Documentation
- Choosing a Topic
- Shows & Tells
- Miscellaneous
- Opt Art
- Reinforcement Learning: An Introduction
- 10 Technical Papers Every Programmer Should Read (At Least Twice)
- 7 More Languages in 7 Weeks
- Lua, Day 1: The Call to Adventure
- Lua, Day 2: Tables All the Way Down
- Lua, Day 3
- Factor, Day 1: Stack On, Stack Off
- Factor, Day 2: Painting the Fence
- Factor, Day 3: Balancing on a Boat
- Elm, Day 1: Handling the Basics
- Elm, Day 2: The Elm Architecture
- Elm, Day 3: The Elm Architecture
- Elixir, Day 1: Laying a Great Foundation
- Elixir, Day 2: Controlling Mutations
- Elixir, Day 3: Spawning and Respawning
- Julia, Day 1: Resistance Is Futile
- Julia, Day 2: Getting Assimilated
- Julia, Day 3: Become One With Julia
- Minikanren, Days 1-3
- Minikanren, Einstein's Puzzle
- Idris Days 1-2
- Types and Programming Languages
- Chapter 1: Introduction
- Chapter 2: Mathematical Preliminaries
- Chapter 3: Untyped Arithmetic Expressions
- Chapter 4: An ML Implementation of Arithmetic Expressions
- Chapter 5: The Untyped Lambda-Calculus
- Chapters 6 & 7: De Bruijn Indices and an ML Implementation of the Lambda-Calculus
- Chapter 8: Typed Arithmetic Expressions
- Chapter 9: The Simply-Typed Lambda Calculus
- Chapter 10: An ML Implementation of Simple Types
- Chapter 11: Simple Extensions
- Chapter 11 Redux: Simple Extensions
- Chapter 13: References
- Chapter 14: Exceptions
- Chapter 15: Subtyping – Part 1
- Chapter 15: Subtyping – Part 2
- Chapter 16: The Metatheory of Subtyping
- Chapter 16: Implementation
- Chapter 18: Case Study: Imperative Objects
- Chapter 19: Case Study: Featherweight Java
- The New Turing Omnibus
- Errata
- Chapter 11: Search Trees
- Chapter 8: Random Numbers
- Chapter 35: Sequential Sorting
- Chapter 58: Predicate Calculus
- Chapter 27: Perceptrons
- Chapter 9: Mathematical Research
- Chapter 16: Genetic Algorithms
- Chapter 37: Public Key Cryptography
- Chapter 6: Game Trees
- Chapter 5: Gödel's Theorem
- Chapter 34: Satisfiability (also featuring: Sentient)
- Chapter 44: Cellular Automata
- Chapter 47: Storing Images
- Chapter 12: Error-Correcting Codes
- Chapter 32: The Fast Fourier Transform
- Chapter 36: Neural Networks That Learn
- Chapter 41: NP-Completeness
- Chapter 55: Iteration and Recursion
- Chapter 19: Computer Vision
- Chapter 61: Searching Strings
- Chapter 66: Church's Thesis
- Chapter 52: Text Compression
- Chapter 22: Minimum spanning tree
- Chapter 64: Logic Programming
- Chapter 60: Computer Viruses
- Show & Tell
- Elements of Computing Systems
- Archived pages