This directory provides scripts to run inference on Wav2Vec2ForCTC. These scripts are tested and maintained by Intel® Gaudi®. Before you get started, make sure to review the Supported Configurations.
For more information on training and inference of deep learning models using the Intel Gaudi AI accelerator, refer to developer.habana.ai.
For model performance data, refer to the Intel Gaudi Model Performance Data page.
This Wav2Vec2 model comes with a language modeling head on top for Connectionist Temporal Classification (CTC). The model is based on PreTrainedModel; for details on the generic methods the library implements for all its models (such as downloading or saving), refer to the Wav2Vec2 for CTC section.
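For orientation, the following is a minimal sketch of loading a Wav2Vec2ForCTC checkpoint and greedy-decoding its CTC output with the Hugging Face transformers API. The `facebook/wav2vec2-base-960h` checkpoint name and the dummy input are illustrative and are not taken from this script:

```python
# Minimal CTC inference sketch on CPU; checkpoint name and dummy input are
# illustrative, not taken from this repository's wav2vec.py.
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# One second of silent 16 kHz audio stands in for a real LibriSpeech sample.
waveform = np.zeros(16000, dtype=np.float32)
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

# Greedy CTC decode: argmax per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription)
```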
Please follow the instructions provided in the Gaudi Installation Guide to set up the environment, including the `$PYTHON` environment variable. To achieve the best performance, follow the methods outlined in the Optimizing Training Platform Guide. These guides walk you through setting up your system to run the model on Gaudi.
In the docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the `hl-smi` utility to determine the Intel Gaudi software version:

```bash
git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/Model-References
```
Note: If the repository is not in the `PYTHONPATH`, make sure to update it by running the following:

```bash
export PYTHONPATH=/path/to/Model-References:$PYTHONPATH
```
In the docker container, go to the model directory:

```bash
cd /root/Model-References/PyTorch/audio/wav2vec2/inference
```
Install the required packages using pip:

```bash
$PYTHON -m pip install -r requirements.txt
```
- Run inference on 1 HPU, mixed precision BF16, test-clean dataset (2620 samples), base model:

  ```bash
  $PYTHON wav2vec.py --dtype bf16 --buckets 5 --use_graphs --perf -a
  ```

- Run inference on 1 HPU, mixed precision BF16, test-clean dataset (2620 samples), large model:

  ```bash
  $PYTHON wav2vec.py --dtype bf16 --buckets 5 --use_graphs --perf -a --large
  ```

- Run inference on 1 HPU, mixed precision BF16, dev-clean dataset (73 samples), base model:

  ```bash
  $PYTHON wav2vec.py --dtype bf16 --buckets 5 --use_graphs --perf -a --dev_clean_ds --repeat 25
  ```

- Run inference on 1 HPU, FP32 precision, test-clean dataset (2620 samples), base model:

  ```bash
  $PYTHON wav2vec.py --dtype fp32 --buckets 5 --use_graphs --perf -a
  ```

- Run inference on 1 HPU, FP32 precision, test-clean dataset (2620 samples), large model:

  ```bash
  $PYTHON wav2vec.py --dtype fp32 --buckets 5 --use_graphs --perf -a --large
  ```

- Run inference on 1 HPU, FP32 precision, dev-clean dataset (73 samples), base model:

  ```bash
  $PYTHON wav2vec.py --dtype fp32 --buckets 5 --use_graphs --perf -a --dev_clean_ds --repeat 25
  ```
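The `-a` flag in the commands above reports accuracy as word error rate (WER): the word-level edit distance between hypothesis and reference transcripts, divided by the number of reference words. A minimal illustration using the jiwer package (not a dependency of this script):

```python
# Illustrative WER check with the jiwer package (not a dependency of this
# script). WER = (substitutions + insertions + deletions) / reference words.
from jiwer import wer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
print(f"WER: {wer(reference, hypothesis):.3f}")  # 2 substitutions / 9 words
```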
This model uses "HPU Graphs" feature by default to minimize the host time spent in the forward()
call.
If HPU Graphs are disabled, there could be noticeable host time spent in interpreting the lines in
the forward()
call, which can result in a latency increase.
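For context, below is a minimal sketch of wrapping a model in HPU Graphs. It assumes a Gaudi machine with the habana_frameworks PyTorch bridge installed and uses the `wrap_in_hpu_graph` helper; the toy model is illustrative and this is not the script's own wiring:

```python
# Minimal HPU Graphs sketch, assuming the habana_frameworks PyTorch bridge
# is installed; the toy Linear model stands in for Wav2Vec2ForCTC.
import torch
import habana_frameworks.torch.hpu as ht

model = torch.nn.Linear(16, 4).to("hpu").eval()

# wrap_in_hpu_graph records forward() into a graph on the first call and
# replays it afterwards, cutting per-op host dispatch time.
model = ht.wrap_in_hpu_graph(model)

x = torch.randn(1, 16).to("hpu")
with torch.no_grad():
    out = model(x)  # first call records; subsequent calls replay
print(out.to("cpu").shape)
```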
| Validated on | Intel Gaudi Software Version | PyTorch Version | Mode |
|---|---|---|---|
| Gaudi | 1.10.0 | 2.0.1 | Inference |
| Gaudi 2 | 1.11.0 | 2.0.1 | Inference |
- Performance improvements.
- Initial release.
The following lists the modifications applied to the script from huggingface/wav2vec.
- Added support for Gaudi devices:
  - Added dtype support.
  - Added perf measurement flag.
  - Added "large" model flavor.
  - Added `-a` flag for measuring accuracy (WER).
  - Added test-clean dataset support with 2620 samples.
- To improve performance (bucketing is sketched after this list):
  - Added bucketing support.
  - Added HPU Graphs support.
  - Enabled async D2H copy using HPU streams.
  - Enabled async HPU Graphs execution (HPU Graphs are launched on a separate thread to free up the main execution thread for CPU processing).
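Bucketing pads each variable-length utterance up to one of a small set of fixed lengths, so a graph-compiled model only ever sees a handful of input shapes instead of one per sample. The sketch below is illustrative only: the helper names are invented here, and the quantile-based bucket choice is one reasonable strategy (the `--buckets 5` flag above suggests five buckets, but the script's exact strategy may differ):

```python
# Illustrative bucketing sketch: pad each sample up to the smallest bucket
# length that fits it, so a graph-compiled model sees few distinct shapes.
import numpy as np
import torch

def make_buckets(lengths, num_buckets=5):
    """Pick bucket boundaries at evenly spaced quantiles of sample lengths."""
    qs = np.linspace(0, 100, num_buckets + 1)[1:]
    return sorted(int(np.percentile(lengths, q)) for q in qs)

def pad_to_bucket(waveform, buckets):
    """Zero-pad a 1-D waveform to the smallest bucket that holds it."""
    target = next(b for b in buckets if b >= waveform.numel())
    return torch.nn.functional.pad(waveform, (0, target - waveform.numel()))

lengths = [12000, 15500, 31000, 47000, 52000, 80000]
buckets = make_buckets(lengths)
sample = torch.randn(31000)
padded = pad_to_bucket(sample, buckets)
print(buckets, padded.shape)  # only len(buckets) distinct shapes reach the model
```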
For users who intend to modify this script, run new models, or use datasets other than those used in this reference script, the following is recommended (see the sketch after this list):

- Periodically synchronize all active threads (done every 2620 samples in the reference script). This frees resources such as host pinned memory and avoids failures due to resource exhaustion. The synchronization interval can be determined empirically for a given model and dataset.
- Ensure that no more than 3000 streams are created (the reference script creates 2620). Reuse streams if a larger number is required.
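One way to honor both recommendations is a fixed round-robin stream pool plus a periodic device synchronize. The sketch below assumes the HPU stream API mirrors torch.cuda (`Stream`, the `stream()` context manager, and `synchronize()` in `habana_frameworks.torch.hpu`); the pool size and loop body are illustrative:

```python
# Illustrative sketch: reuse a fixed pool of streams and synchronize
# periodically. Assumes the HPU stream API mirrors torch.cuda semantics.
import itertools
import torch
import habana_frameworks.torch.hpu as ht

POOL_SIZE = 8        # fixed pool, well under the ~3000-stream limit
SYNC_EVERY = 2620    # mirrors the reference script's sync interval

streams = [ht.Stream() for _ in range(POOL_SIZE)]
round_robin = itertools.cycle(streams)

for i in range(10000):
    s = next(round_robin)  # reuse streams instead of creating new ones
    with ht.stream(s):
        x = torch.randn(1, 16000).to("hpu", non_blocking=True)
        # ... run the model forward and async D2H copy here ...
    if (i + 1) % SYNC_EVERY == 0:
        ht.synchronize()   # free pinned-memory and stream resources
```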