-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreadme
31 lines (25 loc) · 1.84 KB
/
readme
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
I have implemeted the regex matcher using a finite state machine (FSM)
Using a FSM came about after I created a recursive grammar parser that was really slow at larger inputs
I remered creating a syntax parser in some programming language theory classes, and leveraged that experience here
The FSM was faster, but I didn't like the O(2^n) worst/average case runtime.
So, I allowed the FSM to be in multiple states at once, cutting the runtime down to O(n) worst/average case
To speed this up even more go's goroutine functons could be leveraged.
First, each line could be in a goroutine. With the file being outside and a worker pool being created
The worker pool would be an arbiter and allow for the line numbers to be leveraged.
Next, I would add the FSM parsing into a goroutine. Each move to a new character in the outermost parse loop would be a new goroutine.
Doing this, one could use channels to pass a successful parse back up the chain.
Usage:
./regex <filename> 'regex'
./regex <(echo 'sometext') 'regex'
Compiled for linux x86_64 arch.
I have renamed the binary for convience.
V2 notes:
- Added clearer transition logic
- Leveraged more of Go's features
- Got rid of a bug or two
After some testing and static code analysis I've noticed the code I have written is correct for all non ambigious matches.
If the match is ambigious, a correct result will be displayed. The algorithm is deterministic, but which solution is chosen depends on how the FSM is compiled.
For matching that is very specific on how ambigious matches are handled, a O(2^n) runtime is required.
A more Go like version could be created by creating an interface for nodes, type aliases for each node type,
and object methods for each node type that advance to the next character correctly.
This may be implemented Friday 8/17 in the evening if I haven't heard anything back about the code.