Threat hunting with Polars and flaws.cloud AWS CloudTrail datasets.
Check out the threat hunting notebook in nbviewer, or rerun the hunt yourself in JupyterLab.
Normalized datasets and alerts can be found as Parquet files in the results directory. You can load these for further exploration using your OLAP database of choice.
Polars is an OLAP query engine written in Rust. It is highly memory efficient, uses Apache Arrow as its memory model, and consistently tops database speed benchmarks against distributed OLAP engines such as PySpark and Snowflake.
At Tracecat, we use Polars as an alternative to jq or grep for quick-and-dirty threat hunting. It offers:
- Ridiculously fast and efficient string operations
- Piped query language
- Highly parallelized window operations
- Powerful aggregation functions to compute metrics
- Small binary with zero dependencies (~70ms import time)
If your logs fit in memory and you use Python or Jupyter notebooks as part of your threat hunting process, Polars should be your go-to tool.
Note: for every 1 GB of gzipped JSON logs on disk, expect Polars' in-memory data model to take up approximately 500 MB of RAM.
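The rule of thumb above can be turned into a quick back-of-the-envelope check. The helper name estimated_ram_mb is hypothetical, and the 500 MB per GB ratio is the approximate figure from the note, not a guarantee; actual usage depends on the schema and the data.

```python
def estimated_ram_mb(gzipped_gb: float, mb_per_gb: float = 500.0) -> float:
    """Rough RAM estimate (in MB) for a given size of gzipped JSON logs (in GB)."""
    return gzipped_gb * mb_per_gb


# Example: 4 GB of gzipped JSON logs on disk.
estimate = estimated_ram_mb(4)
print(f"Expect roughly {estimate:.0f} MB of RAM")
```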
Requires python>3.9, pip, and git lfs to be installed.
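If you want to verify these prerequisites before cloning, a small stdlib-only check could look like the sketch below. The function name check_prerequisites is hypothetical, and the version test assumes that "python>3.9" means Python 3.9 or newer.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether each required tool appears to be available."""
    return {
        # Assumption: "python>3.9" is read here as Python 3.9 or newer.
        "python": sys.version_info[:2] >= (3, 9),
        # pip may be exposed as either `pip` or `pip3` on PATH.
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        # Git LFS installs an executable named `git-lfs`.
        "git-lfs": shutil.which("git-lfs") is not None,
    }


for tool, ok in check_prerequisites().items():
    print(f"{tool}: {'found' if ok else 'missing'}")
```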
First, clone the repository and download the datasets from Git LFS (Large File Storage).
git clone git@github.com:TracecatHQ/hunts.git
cd hunts
git lfs fetch
git lfs pull
Create a new Python environment using venv or conda (optional), then install the required dependencies via pip install -r requirements.txt.
Finally, spin up JupyterLab using jupyter lab to view the aws_flaws.ipynb and aws_flaws_2.ipynb notebooks inside the notebooks directory.
Interested in our work bringing low-cost but powerful data engineering tools to cybersecurity? We'd love to hear your thoughts over email at [email protected], or find us in the Tracecat Discord community!