He attac, he protec, but most importantly, he dissec.
dissec is a Python module used for implementing string dissection patterns,
compatible with Elasticsearch's Dissect processor.
The project is present at the following locations:
For example, you can dissect a string using this module with the following snippet:
from dissec.patterns import Pattern

pattern = Pattern.parse(
    r'%{clientip} %{ident} %{auth} [%{@timestamp}] \"%{verb} %{request} '
    + r'HTTP/%{httpversion}\" %{status} %{size}',
)
result = pattern.dissect(
    r'1.2.3.4 - - [30/Apr/1998:22:00:52 +0000] '
    + r'\"GET /english/venues/cities/images/montpellier/18.gif '
    + r'HTTP/1.0\" 200 3171',
)
print(result)
This prints the following (pretty-printed here for readability):
{'@timestamp': '30/Apr/1998:22:00:52 +0000',
 'auth': '-',
 'clientip': '1.2.3.4',
 'httpversion': '1.0',
 'ident': '-',
 'request': '/english/venues/cities/images/montpellier/18.gif',
 'size': '3171',
 'status': '200',
 'verb': 'GET'}
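Every value in the result shown above is a string. If you need typed values, one possible follow-up step is sketched below using only the standard library. The `to_typed` helper is hypothetical and not part of dissec, and the timestamp format string is an assumption based on the sample value:

```python
from datetime import datetime

# Result copied from the output shown above; every dissected value is a string.
result = {
    '@timestamp': '30/Apr/1998:22:00:52 +0000',
    'auth': '-',
    'clientip': '1.2.3.4',
    'httpversion': '1.0',
    'ident': '-',
    'request': '/english/venues/cities/images/montpellier/18.gif',
    'size': '3171',
    'status': '200',
    'verb': 'GET',
}


def to_typed(fields):
    """Hypothetical helper (not part of dissec): convert selected fields."""
    typed = dict(fields)  # copy, so the dissected result stays untouched
    typed['status'] = int(fields['status'])
    typed['size'] = int(fields['size'])
    # Assumed format, matching the sample timestamp above.
    typed['@timestamp'] = datetime.strptime(
        fields['@timestamp'], '%d/%b/%Y:%H:%M:%S %z',
    )
    return typed


typed = to_typed(result)
print(typed['status'], typed['size'], typed['@timestamp'].isoformat())
```

Fields that are left out of the conversion, such as 'clientip', remain strings.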
See "Dissecting a string using dissect patterns" for more details on this usage.
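For intuition about what a dissect pattern does, the toy function below treats the text between %{key} references as delimiters and cuts the input at each one, from left to right. It is a simplified illustration written for this README, not dissec's implementation: the name `simple_dissect` is hypothetical, and it ignores the key modifiers defined by Elasticsearch's Dissect processor.

```python
import re

# Matches a %{name} key reference inside a pattern.
KEY = re.compile(r'%\{([^}]*)\}')


def simple_dissect(pattern, text):
    """Toy dissection with plain keys only; returns None on mismatch."""
    # With one capturing group, split() alternates literal, key, literal, ...
    parts = KEY.split(pattern)
    literals, keys = parts[0::2], parts[1::2]

    # The input must begin with the literal that precedes the first key.
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])

    result = {}
    for key, delimiter in zip(keys, literals[1:]):
        if delimiter:
            # The value runs up to the next occurrence of the delimiter.
            end = text.find(delimiter, pos)
            if end < 0:
                return None
        else:
            # No delimiter after the key: take the rest of the input.
            end = len(text)
        result[key] = text[pos:end]
        pos = end + len(delimiter)

    # Reject inputs with unmatched trailing text.
    return result if pos == len(text) else None


fields = simple_dissect(
    '%{verb} %{request} HTTP/%{httpversion}',
    'GET /index.html HTTP/1.0',
)
print(fields)
```

Because each value ends at the first occurrence of the following delimiter, no regular-expression backtracking is involved, which is the main appeal of dissection over regex-based parsing for fixed-layout lines.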