Code for the RANLP 2021 paper "Exploring German Multi-Level Text Simplification" by Nicolas Spring, Annette Rios and Sarah Ebling.
This repository contains the code needed to reproduce the "APA multi" model from the paper. The data is available at https://zenodo.org/record/5148163.
- Preprocessing makes use of scripts from mosesdecoder, subword-nmt and fairseq.
- The ATS model is trained with sockeye.
- A virtual Python environment is created with Anaconda.
- The scripts are designed to run with the Slurm workload manager and may need adjustments to run on your system.
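Because the scripts depend on several external tools (and on Slurm), it can help to confirm what is on your `PATH` before starting. The function below is a minimal, hypothetical preflight sketch; `check_tools` and the tool list you pass to it are our assumptions, not part of the repository.

```bash
# Hypothetical preflight check (not part of the repository):
# prints every command that is not found on PATH and returns
# the number of missing commands as the exit status.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_tools git conda unzip sbatch` reports each missing command. A missing `sbatch` usually means you are not on a Slurm system and will need to adapt the scripts.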
Download and install the code:
git clone https://github.com/ZurichNLP/RANLP2021-German-ATS.git
cd RANLP2021-German-ATS/
bash install/create_env.sh
# Activate the virtual environment with the "conda activate" command printed by the script above
# conda activate /...
bash install/install.sh
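The remaining steps assume the environment created above is active. As a hedged sketch, the hypothetical helper below relies on the fact that `conda activate` sets the `CONDA_PREFIX` variable; the optional argument is the environment path you expect (the `/...` path printed by `install/create_env.sh`). Note that an auto-activated `base` environment also sets `CONDA_PREFIX`, so passing the expected path is the stricter check.

```bash
# Hypothetical helper (not part of the repository): fail early if no
# conda environment is active, or if a different one than expected is.
# conda sets CONDA_PREFIX when an environment is activated.
require_conda_env() {
  local expected="${1:-}"
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "error: no conda environment is active" >&2
    return 1
  fi
  if [ -n "$expected" ] && [ "$CONDA_PREFIX" != "$expected" ]; then
    echo "error: active environment is $CONDA_PREFIX, expected $expected" >&2
    return 1
  fi
  echo "conda environment: $CONDA_PREFIX"
}
```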
Download the data:
mkdir -p data/aligned
cd data
# Download APA_sentence-aligned_LHA.zip from https://zenodo.org/record/5148163 into the current directory (data/)
unzip APA_sentence-aligned_LHA.zip -d aligned/
cd ..
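The data steps above can also be wrapped so they stop with a clear message when the ZIP has not been downloaded yet. This is a minimal sketch run from the repository root; `unpack_data` is a hypothetical name, and its default paths simply mirror the commands above (`data/APA_sentence-aligned_LHA.zip` unpacked into `data/aligned`).

```bash
# Hypothetical guarded version of the data steps (not part of the
# repository). Arguments: path to the ZIP file and target directory;
# the defaults mirror the manual commands in this README.
unpack_data() {
  local zip="${1:-data/APA_sentence-aligned_LHA.zip}"
  local dest="${2:-data/aligned}"
  if [ ! -f "$zip" ]; then
    echo "error: $zip not found; download it from https://zenodo.org/record/5148163" >&2
    return 1
  fi
  mkdir -p "$dest"
  # -q suppresses the per-file listing; -d sets the extraction directory
  unzip -q "$zip" -d "$dest"
}
```

Checking for the file before creating the target directory means a failed run leaves no empty `data/aligned` behind.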
Preprocess the data:
bash preprocess/preprocess.sh
Train the ATS model:
bash train/train.sh