Source code, datasets, and instructions for the paper "A Conversational Framework for Faithful Multi-Perspective Analysis of Production Processes".
Production systems call for analysis techniques yielding reliable diagnostic and prognostic insights in a timely fashion. To this end, numerous reasoning techniques have been exploited, mainly within the simulation and formal verification realms. However, the technological barrier between these approaches and the target end users remains a stumbling block to their effective adoption. This paper presents a framework interposing a natural language based interface between the interpretation of the user’s request and the reasoning tools. The user’s natural language request is automatically translated into a machine-readable problem. The latter is then dispatched to a proper reasoning engine and either solved through a simulation or a formal verification task, thus enabling a multi-perspective analysis of the production system and certifying the correctness and transparency of the obtained solutions. The outcome is then reprocessed to be human-interpretable. State-of-the-art Large Language Models (LLMs), with their robust capability to interpret the inherent ambiguity of natural language, perform both translations. We evaluate the framework on a lab-scale case study replicating a real production system. The results of the experiments suggest that LLMs are promising complements to derive insights from faithful reasoning engines, supporting accurate analysis.
The Figure shows the components of the framework and how they interact.
The framework is designed to provide grounded and interpretable answers to natural language requests concerning a production process, i.e., the representation of the activities performed within a production system. It achieves this through the integration of a Conversational Layer and a Reasoning Layer. The former tackles the formulation of the problem to be fed to the Reasoning Layer and the interpretation of the results in response to the user. The latter exploits either a digital twin simulating the production process or a formal verifier reasoning on its automaton. Therefore, the approach assumes the availability of the simulation parameters and the automaton modeling the production process, provided by a domain expert rather than being LLM-generated to ensure their correctness.
As illustrated in the Figure, the Conversational Layer includes a set of LLMs: the Gateway LLM, which routes the user’s questions, and the Translator LLMs for Simulation and Verification, which translate these requests into machine-readable representations compatible with the corresponding reasoners' syntax.
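For intuition, the sketch below shows how such a routing step could look in Python; the prompt, function names, and stub LLM are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of the Gateway LLM routing step. The prompt and function
# below are hypothetical and are NOT the repository's actual code.

ROUTING_PROMPT = (
    "You route questions about a production process to a reasoning engine.\n"
    "Answer with exactly one word: SIMULATION if the question asks for expected\n"
    "performance figures (e.g., throughput, cycle times), or VERIFICATION if it\n"
    "asks whether a property always/never holds on the automaton.\n\n"
    "Question: {question}\nAnswer:"
)


def route_request(question: str, llm) -> str:
    """Ask the Gateway LLM which reasoner should handle the user's request."""
    answer = llm(ROUTING_PROMPT.format(question=question)).strip().upper()
    return "verification" if answer.startswith("VERIF") else "simulation"


if __name__ == "__main__":
    # Stub standing in for one of the chat models listed in the table below.
    fake_llm = lambda prompt: "VERIFICATION"
    print(route_request("Can the robot ever deadlock while loading a pallet?", fake_llm))
```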
```
.
├── images        # figures for the README file
│   └── ...
├── models        # automaton and simulation parameters of the factory
│   └── ...
├── src           # source code of the proposed approach
│   ├── uppaal    # source code of the Uppaal verifier
│   └── ...
├── tests         # sources for the evaluation
│   ├── outputs   # outputs of the live conversations
│   ├── test_sets # test sets employed during the evaluation
│   └── validation # quantitative evaluation results for each run
└── ...
```
First, you need to clone the repository:
```bash
git clone https://github.com/angelo-casciani/conv_automata
cd conv_automata
```
Create a new conda environment:
```bash
conda create -n conv_automata python=3.9 --yes
conda activate conv_automata
```
Run the following command to install the necessary packages listed in requirements.txt, along with their dependencies, using pip:
```bash
pip install -r requirements.txt
```
Visit the official Uppaal downloads page and download the appropriate version for your OS.
Run the installer and follow the instructions on the website.
Upon first launch, request and register a valid license key when prompted.
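Once Uppaal is installed, its command-line verifier verifyta can also be invoked programmatically. The following is a minimal sketch of such a call, assuming verifyta is on your PATH; the file names are placeholders and the repository's own integration in src/uppaal may differ.

```python
# Illustrative sketch of dispatching a verification query to Uppaal's
# command-line verifier (verifyta), which ships with the Uppaal distribution.
# File names are placeholders; the repository's actual integration may differ.
import subprocess


def run_uppaal_query(model_path: str, query_path: str) -> str:
    """Run verifyta on an automaton model and a query file, returning its textual output."""
    result = subprocess.run(
        ["verifyta", model_path, query_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example with hypothetical file names:
# print(run_uppaal_query("../models/factory.xml", "throughput.q"))
```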
Set up a HuggingFace token and/or an OpenAI API key in the .env file in the root directory:
```env
HF_TOKEN=<your token, should start with hf_>
OPENAI_API_KEY=<your key, should start with sk->
```
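For reference, these variables can be read at runtime as in the sketch below (assuming the python-dotenv package; the framework may load them differently):

```python
# Minimal sketch of reading the keys from .env at runtime, assuming the
# python-dotenv package is available; the framework's own loading code may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN and OPENAI_API_KEY from the .env file
hf_token = os.getenv("HF_TOKEN")
openai_key = os.getenv("OPENAI_API_KEY")
```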
Please note that this software leverages the open-source and closed-source LLMs reported in the table:
| Model | HuggingFace Link |
|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | HF link |
| meta-llama/Meta-Llama-3.1-8B-Instruct | HF link |
| meta-llama/Llama-3.2-1B-Instruct | HF link |
| meta-llama/Llama-3.2-3B-Instruct | HF link |
| mistralai/Mistral-7B-Instruct-v0.2 | HF link |
| mistralai/Mistral-7B-Instruct-v0.3 | HF link |
| mistralai/Mistral-Nemo-Instruct-2407 | HF link |
| mistralai/Ministral-8B-Instruct-2410 | HF link |
| Qwen/Qwen2.5-7B-Instruct | HF link |
| google/gemma-2-9b-it | HF link |
| gpt-4o-mini | OpenAI link |
Request access to each Llama model in advance for your HuggingFace account. Retrieve your OpenAI API key to use the supported GPT model.
Please note that each of the selected models has specific requirements in terms of GPU availability. It is recommended to have access to a GPU-enabled environment meeting at least the minimum requirements of the chosen model to run the software effectively.
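As a quick sanity check before a full run, a sketch like the following can verify GPU availability and load one of the listed open-source models with Hugging Face transformers (illustrative only; the framework's own loading logic may differ):

```python
# Minimal sketch of checking GPU availability and loading one of the listed
# open-source models with Hugging Face transformers. Illustrative only;
# this is not the framework's own model-loading code.
import torch
from transformers import pipeline

if not torch.cuda.is_available():
    raise RuntimeError("A CUDA-capable GPU is recommended to run these models.")

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-7B-Instruct",  # any model from the table above
    device=0,                          # first CUDA GPU
    torch_dtype=torch.bfloat16,        # lower memory footprint on recent GPUs
)
print(generator("Hello!", max_new_tokens=16)[0]["generated_text"])
```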
Run the conversational framework:
```bash
cd src
python3 main.py
```
The complete conversation will be stored in a .txt file in the outputs folder.
The default parameters are:
- LLM: 'Qwen/Qwen2.5-7B-Instruct';
- Number of generated tokens: 512;
- Interaction Modality: 'live', i.e., the live chat with the conversational framework.
To customize these settings, modify the corresponding arguments when executing main.py:
- Use --llm_id to specify a different LLM (e.g., among the ones reported in the LLMs Requirements section).
- Adjust --max_new_tokens to change the number of generated tokens.
- Set --modality to alter the interaction modality (i.e., 'live', 'evaluation-simulation', 'evaluation-verification', and 'evaluation-routing').
A comprehensive list of commands can be found in src/cmd4tests.sh.
To reproduce the experiments for the simulation evaluation, for example:
```bash
cd src
python3 main.py --llm_id Qwen/Qwen2.5-7B-Instruct --modality evaluation-simulation --max_new_tokens 512
```
The results will be stored in the validation folder as a .txt file reporting all the information for the run and the corresponding results.
To reproduce the experiments for the verification evaluation, for example:
```bash
cd src
python3 main.py --llm_id gpt-4o-mini --modality evaluation-verification --max_new_tokens 512
```
The results will be stored in the validation folder as a .txt file reporting all the information for the run and the corresponding results.
To reproduce the experiments for the routing evaluation, for example:
```bash
cd src
python3 main.py --llm_id mistralai/Mistral-7B-Instruct-v0.3 --modality evaluation-routing --max_new_tokens 512
```
The results will be stored in the validation folder as a .txt file reporting all the information for the run and the corresponding results.
To generate new test sets for the three supported evaluations, run the script test_sets_generation.py before running an evaluation.
```bash
python3 test_sets_generation.py
```
Distributed under the GNU GPL License. See LICENSE for more information.
If you use this repository in your research, please cite:
```bibtex
@inproceedings{DBLP:conf/caise/CascianiLMM25,
  author    = {Angelo Casciani and
               Livia Lestingi and
               Andrea Marrella and
               Andrea Matta},
  editor    = {John Krogstie and
               Stefanie Rinderle{-}Ma and
               Gerti Kappel and
               Henderik A. Proper},
  title     = {A Conversational Framework for Faithful Multi-perspective Analysis
               of Production Systems},
  booktitle = {Advanced Information Systems Engineering - 37th International Conference,
               CAiSE 2025, Vienna, Austria, June 16-20, 2025, Proceedings, Part {I}},
  series    = {Lecture Notes in Computer Science},
  volume    = {15701},
  pages     = {163--181},
  publisher = {Springer},
  year      = {2025},
  url       = {https://doi.org/10.1007/978-3-031-94569-4\_10},
  doi       = {10.1007/978-3-031-94569-4\_10}
}
```