This is a fork of the light-oauth2 system by networknt, adapted for the needs of creating the dataset.
The original README is available in `OLD_README.md`.
For other Dockerfiles and more documentation, see the original repository.
This is the latest development version of the data collection scripts.
For the version corresponding to the PROMISE25 paper "LO2: Microservice API Anomaly Dataset of Logs and Metrics", see the corresponding release. The resulting dataset is available at https://doi.org/10.5281/zenodo.14257989.
This repository contains the following:

- Original source code: most directories in this repository contain the original source code of the `light-oauth2` components;
- Locust tests created for the `light-oauth2` APIs;
- The Docker Compose file (`docker-compose-oauth2-mysql.yml`) adapted to deploy the `light-oauth2` system as well as the additional components needed to gather the data. We use the MySQL database for deployment, while `light-oauth2` supports other options; see the original repository for other deployment files;
- `prometheus.yml`: configuration of Prometheus used in the deployment;
- `prometheus_metrics.txt`: list of available and queried Prometheus metrics (from our research server);
- `opentelemetry-javaagent.jar`: Java agent for Jaeger that we attempt to inject into each container for trace collection;
- Scripts:
  - `prometheus_metrics.sh`: script used to query all available Prometheus metrics;
  - `fetch_data.py`: script to fetch data from the Prometheus and Jaeger agents (a sketch of the underlying queries follows this list);
  - `data_run.sh`: main script that deploys the system, runs the Locust tests, and collects all data.
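To illustrate what `fetch_data.py` collects, the shell commands below sketch roughly equivalent queries against the Prometheus and Jaeger HTTP APIs. This is not the actual script: the addresses, the metric name, and the service name are placeholder assumptions and must be adapted to your deployment.

```sh
# Sketch only; fetch_data.py implements this logic in Python via the requests library.
# The Prometheus/Jaeger addresses, the metric name and the service name are placeholders.
START=$(date -d '10 minutes ago' +%s)   # GNU date: start of the fetch interval
END=$(date +%s)                         # end of the fetch interval

# Values of one Prometheus metric over the interval (query_range API)
curl -sG "http://localhost:9000/api/v1/query_range" \
  --data-urlencode "query=process_cpu_seconds_total" \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" \
  --data-urlencode "step=15s" \
  > metric_process_cpu_seconds_total.json

# Traces recorded by Jaeger for one service (the Jaeger query API expects microsecond timestamps)
curl -s "http://localhost:16686/api/traces?service=oauth2-token&start=${START}000000&end=${END}000000" \
  > traces_oauth2-token.json
```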
To replicate the data collection process, set up the following things:

- Clone this repository
- Install MySQL (the `mysqladmin` command should be available)
- Install Locust (the `locust` command should be available)
- Install the `requests` Python library
- Install Docker (the `docker` and `docker compose` commands should be available)
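For reference, on a Debian/Ubuntu host the prerequisites can be installed roughly as follows; the package name for the MySQL client is an assumption and may differ on your distribution, and Docker is best installed following its official instructions.

```sh
# Assumes a Debian/Ubuntu host; adapt package names for other distributions.
sudo apt-get update
sudo apt-get install -y default-mysql-client   # provides the mysqladmin command
pip install locust requests                    # Locust CLI and the requests library
# Install Docker Engine and the Compose plugin per https://docs.docker.com/engine/install/

# Quick check that all required commands are available
mysqladmin --version && locust --version && docker --version && docker compose version
```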
The file `prometheus_metrics.txt` contains the list of all metrics that should be queried from Prometheus during data gathering. Currently, it contains the metrics that were available on our research server.

It is possible to use the `prometheus_metrics.sh` script to query all metrics available on your host system (see the sketch after this list):

- Start your Prometheus instance (container)
- If it is deployed somewhere other than `localhost:9000`, change the URL in the script
- Run the script
- The list of metrics will be saved into `prometheus_metrics.txt`, to be used by the main script
- If you need only a subset of metrics, edit the file accordingly
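For reference, the core of such a query is a single call to Prometheus's label-values endpoint. The sketch below assumes the `localhost:9000` address mentioned above and that `jq` is available; it is not necessarily identical to what `prometheus_metrics.sh` does.

```sh
# List the names of all metrics known to Prometheus, one per line.
# Assumes Prometheus at localhost:9000 (as above) and that jq is installed.
curl -s "http://localhost:9000/api/v1/label/__name__/values" \
  | jq -r '.data[]' > prometheus_metrics.txt
```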
To perform a single data run, i.e. deploy the system and execute all the Locust tests, run the `data_run.sh` script.
The script performs the following:
- Get the list of all tagged tasks from the Locust files
- Get the list of all Prometheus metrics from `prometheus_metrics.txt`
- Deploy all the containers using `docker compose -f docker-compose-oauth2-mysql.yml up --force-recreate -d`
- Wait for the MySQL database to be ready and read the configuration for `light-oauth2`
- Shuffle all discovered tests into a random order
- For each tag, run a test with a random duration of 20-180 seconds (see the sketch after this section):
  - If the tag is `correct`, run only the tasks tagged `correct`
  - If the tag is any other tag, run all `correct` tasks plus the tagged task
- Wait 1-5 seconds between tests
- Fetch all logs, metrics, and traces
- The data has the following directory structure:
  - `LO2_run_UNIX`: root folder of the data; `UNIX` is the Unix timestamp of the beginning of the run
    - `run_log.log`: log of the `data_run.sh` script for the entire run
    - `correct`/`ERROR`: directory for the `correct` or `correct`+`ERROR` test execution
      - `*.log` files: log files for each container and for Locust
      - `metrics`: folder containing all data from Prometheus and Jaeger
        - `metric_*.json`: a JSON file for each Prometheus metric from `prometheus_metrics.txt` with the metric values
        - `traces_*.csv`: a CSV file for each container with Jaeger traces
        - `last_fetch_time.txt`: timestamp of the end of the interval the data was fetched for
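The fragment below is a simplified sketch of the test loop described above, not the actual `data_run.sh`: the locust file path, target host, tag list, and MySQL connection details are placeholders, and the shuffling and data-fetching steps are omitted.

```sh
# Simplified sketch of the per-tag test loop; not the actual data_run.sh.
# LOCUST_FILE, HOST and TAGS are placeholders for values the real script discovers.
LOCUST_FILE=locustfile.py
HOST=http://localhost:8080
TAGS="correct tag_a tag_b"

docker compose -f docker-compose-oauth2-mysql.yml up --force-recreate -d

# Wait until the MySQL container accepts connections
until mysqladmin ping -h 127.0.0.1 --silent; do sleep 2; done

for tag in $TAGS; do
  duration=$(( RANDOM % 161 + 20 ))            # random run time: 20-180 seconds
  if [ "$tag" = "correct" ]; then
    tag_args="correct"                          # error-free run: only correct tasks
  else
    tag_args="correct $tag"                     # correct traffic plus one error tag
  fi
  locust -f "$LOCUST_FILE" --headless --host "$HOST" \
         --tags $tag_args --run-time "${duration}s"
  sleep $(( RANDOM % 5 + 1 ))                   # pause 1-5 seconds between tests
done
```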
If you use this dataset or the data collection package, please cite the following paper:
Bakhtin, A., Nyyssölä, J., Wang, Y., Ahmad, N., Ping, K., Esposito, M., Mäntylä, M., & Taibi, D. (2025). LO2: Microservice API Anomaly Dataset of Logs and Metrics. Proceedings of the 21st International Conference on Predictive Models and Data Analytics in Software Engineering, 1–10. https://doi.org/10.1145/3727582.3728682