
Helpful Utils

Cade Stocker edited this page Nov 29, 2025 · 7 revisions

download_small_aria.sh

./utils/download_small_aria.sh

downloads the unique version of the aria dataset (around 30,000 MIDI files).

You can use:

./utils/download_small_aria.sh shrink

to download the dataset, but have the script delete everything in the downloaded dataset except for two of its subdirectories. This is an easy way to test that everything is working.
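The shrink behavior can be sketched in Python (the actual script is shell; `plan_shrink` and `shrink_dataset` are hypothetical names, and "keep the first two subdirectories sorted by name" is an assumption about which two survive):

```python
import shutil
from pathlib import Path

def plan_shrink(subdir_names, keep=2):
    """Partition subdirectory names into (kept, deleted), keeping the
    first `keep` names in sorted order. Assumption: sorted-name order."""
    ordered = sorted(subdir_names)
    return ordered[:keep], ordered[keep:]

def shrink_dataset(dataset_dir, keep=2):
    """Delete all but `keep` subdirectories of the downloaded dataset."""
    subdirs = [p for p in Path(dataset_dir).iterdir() if p.is_dir()]
    _, delete_names = plan_shrink([p.name for p in subdirs], keep)
    for p in subdirs:
        if p.name in delete_names:
            shutil.rmtree(p)
```

Shrinking keeps preprocessing and training fast, so you can verify the full pipeline end to end before committing to the whole dataset.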

If you want to use a different dataset than aria or nottingham, refer to this page.

midi_to_seed.py

This util takes a MIDI file and a directory of preprocessed data. It then tokenizes the MIDI file with the vocab from the preprocessed directory. This is used in generation when the user provides the --seed_midi_file flag.

python utils/midi_to_seed.py --midi <path to midi you want to use> --dataset <path to dataset you want to use>

All args:

  • --midi: the path to the MIDI file you want to tokenize
  • --dataset: the path to the dataset whose vocab will be used to tokenize the MIDI
  • --seq_length: desired seed sequence length (if not specified, the entire file is used)
  • --output: path to save the seed sequence to (optional)
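The core of this util can be sketched as follows (a minimal sketch, assuming the vocab is a token-string-to-id mapping and that out-of-vocab tokens are simply skipped; `tokens_to_seed` is a hypothetical name, not the script's actual function):

```python
def tokens_to_seed(tokens, vocab, seq_length=None):
    """Map token strings to ids using the dataset's vocab.

    Tokens missing from the vocab are skipped (assumption); if
    seq_length is given, the seed is truncated to that length.
    """
    ids = [vocab[t] for t in tokens if t in vocab]
    return ids if seq_length is None else ids[:seq_length]
```

Tokenizing with the preprocessed dataset's own vocab matters because a model can only consume ids it saw during training; a seed built with a different vocab would be meaningless to it.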

find_best_midis.py

This util looks through all logs within the project and finds the CSV files that log MIDI details and MIDI evaluations. It then uses pandas to calculate scores for all generated MIDIs, comparing each generated file to the means, maxes, and mins of different metrics.

Just call

python utils/find_best_midis.py

args:

  • --output_dir: the directory you want the output to go to. A CSV file with rankings is saved to this directory, along with copies of the ranked MIDI files.
  • --top_n: number of top files to select. This is the number of MIDI files that will be copied to the output directory.
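The scoring idea can be sketched with the standard library (the real script uses pandas; `rank_generated`, the `"file"` key, and scoring by total absolute distance from the per-metric mean are all assumptions for illustration):

```python
from statistics import mean

def rank_generated(rows, metrics):
    """Rank generated MIDIs by how close their metrics sit to the
    per-metric means across all rows (lower total distance = better).

    rows: list of dicts, each with a "file" key plus metric values.
    """
    means = {m: mean(r[m] for r in rows) for m in metrics}
    scored = sorted(rows, key=lambda r: sum(abs(r[m] - means[m]) for m in metrics))
    return [r["file"] for r in scored]
```

Ranking by closeness to the aggregate statistics surfaces "typical" outputs and pushes outliers (e.g. degenerate one-note files) to the bottom.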

diagnose_generation.py

Takes a model and its data directory, then generates MIDI files with predetermined generation args. It then analyzes the generated MIDI files for things like the number of unique notes, how many repetitions occur, how much of the total vocab the model used, etc.

This is a good way to test your models after training and get a general sense of whether they can make music.

python utils/diagnose_generation.py --model_path <path to model checkpoint> --data_dir <path to preprocessed data for chosen model>

args:

  • --model_path: path to model checkpoint
  • --data_dir: path to preprocessed data for chosen model
  • --num_samples: number of samples to generate and analyze (default is 5)
  • --generate_length: length to be used when calling generate.py
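The kinds of checks this util runs can be sketched on a token sequence (a minimal sketch; `diagnose_tokens` and the exact metric definitions, such as counting only immediate repeats, are assumptions, not the script's real implementation):

```python
def diagnose_tokens(tokens, vocab_size):
    """Compute simple health metrics for a generated token sequence.

    - unique_tokens: distinct tokens used
    - immediate_repeats: adjacent pairs where the same token repeats
    - vocab_coverage: fraction of the vocab the model actually used
    """
    unique = len(set(tokens))
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return {
        "unique_tokens": unique,
        "immediate_repeats": repeats,
        "vocab_coverage": unique / vocab_size,
    }
```

Very low vocab coverage or a high repeat count is a quick red flag that a model has collapsed into looping a handful of tokens rather than making music.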

analyze_logs.py

Looks in all training, evaluation, and generation logs, then uses pandas to analyze things like "temperature effects", "quality by sampling strategy", "performance by model type", "dataset comparison", etc.

Examples:

# if you were in the csce585-midi directory:
mkdir analysis
python utils/analyze_logs.py --output_dir analysis

args:

  • --output_dir: output directory to use or create for the files this script produces (required)
  • --plot: specify to create the plot
  • --tables: specify to make the tables
  • --report: specify to make the report file

By default, if you don't include any of the non-required flags, the script makes all three (plot, tables, and report) and stores them in the directory you gave.
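An analysis like "temperature effects" boils down to a group-and-aggregate over the log rows; here is a stdlib sketch of the idea (the real script uses pandas, and the `"temperature"`/`"quality"` column names are assumptions about the log schema):

```python
from collections import defaultdict
from statistics import mean

def quality_by_temperature(rows):
    """Average a quality metric across generation-log rows,
    grouped by sampling temperature."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["temperature"]].append(r["quality"])
    return {t: mean(vals) for t, vals in sorted(groups.items())}
```

With pandas this collapses to a one-liner along the lines of `df.groupby("temperature")["quality"].mean()`, which is presumably what powers the tables and plots.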
