Merged
Closed
lilithgrigoryan
requested changes
Jun 18, 2025
Collaborator
Hi @nithinraok. Thanks!
Overall looks good to me. I left some comments about the docs and docstrings that need to be fixed; I will then review the code once more. Also, please consider adding end-to-end tests.
    # Step 2: Populate Full Text for Manifest
    class CreateFullAudioManifestEarnings21(BaseParallelProcessor):
        """
Collaborator
Same here. Please, add proper docstrings and update api.rst
Collaborator
Author
Updated based on comments. @lilithgrigoryan pls have a look
lilithgrigoryan
requested changes
Jul 1, 2025
Collaborator
@nithinraok Thanks for the code and great docs!
Minor question, can we rename the earnings21 folders to just earnings? From what I understand, the configs cover both earnings21 and earnings22, right?
Otherwise LGTM
Signed-off-by: Nithin Rao Koluguri <nithinrao.koluguri@gmail.com>
Add Earnings21/22 Dataset Processing Pipeline with Forced Alignment
Overview
This PR introduces a complete 7-step processing pipeline for converting Earnings21 and Earnings22 datasets to NeMo format with advanced forced alignment capabilities. The pipeline supports both full dataset processing and evaluation subsets with optional speaker segmentation.
High-Level Changelog
New Features
Core Pipeline Processors:
- CreateInitialAudioAndManifest: Initial audio manifest creation with automatic audio conversion (MP3 → WAV, multi-channel → mono, any sample rate → 16 kHz)
- CreateFullAudioManifestEarnings21: Ground truth text reconstruction from NLP token files with punctuation/capitalization preservation
- NeMoForcedAligner: Word-level forced alignment using NeMo ASR models with CTC heads
- CreateSentenceSegmentedManifest: Intelligent sentence-level segmentation based on CTM files with punctuation-aware splitting
- SpeakerSegmentedManifest: Speaker-change detection and segmentation with optional metadata mapping

Dataset Support:
Audio Processing:
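The audio conversion described for CreateInitialAudioAndManifest (MP3 → WAV, multi-channel → mono, resampling to 16 kHz) could be approximated with an ffmpeg call, as in the minimal sketch below. It is an illustration only, not the processor's actual implementation: the helper names `build_ffmpeg_command` and `convert_to_wav` are hypothetical, and it assumes an `ffmpeg` binary is available on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argv that converts `src` to a mono WAV at `sample_rate` Hz.

    Hypothetical helper for illustration; not part of the PR.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", src,                 # input file, e.g. an MP3 recording
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the target rate
        dst,                       # output path; the .wav extension selects the WAV muxer
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 16000) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

Separating command construction from execution keeps the flag choices inspectable without ffmpeg installed.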
Pipeline Configuration
7-Step Processing Workflow:
Key Configuration Options:
- dataset_type: "earnings21" | "earnings22"
- subset: "full" | "eval10" (earnings21 only)
- forced_alignment_model: Configurable NeMo ASR model
- preserve_punctuation/preserve_capitalization: Text processing options
- include_speaker_info/include_tags: Optional metadata inclusion

Output Formats
Sentence-Level Segments (Primary Output):
    {
      "audio_filepath": "/path/to/audio.wav",
      "duration": 15.2,
      "offset": 45.3,
      "text": "This is a complete sentence with proper punctuation.",
      "alignment": [
        {"word": "This", "start": 45.3, "end": 45.6},
        {"word": "is", "start": 45.6, "end": 45.8}
      ]
    }

Speaker-Level Segments (Optional):
    {
      "audio_filepath": "/path/to/audio.wav",
      "duration": 0,
      "text": "Speaker segment text...",
      "speaker": "speaker_1",
      "segment_id": 0
    }

Usage Examples
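As one illustration of consuming the sentence-level output format (not taken from the PR), the sketch below parses a single manifest entry and checks that every aligned word falls inside the segment window. It makes two assumptions: the manifest stores one JSON object per line, and word `start`/`end` times are on the same absolute timeline as `offset`, as the sample entry suggests. The function name `alignment_within_segment` is hypothetical.

```python
import json


def alignment_within_segment(entry: dict, tol: float = 1e-6) -> bool:
    """Return True if all word timings lie within [offset, offset + duration].

    Assumes word times share the timeline of `offset` (an assumption based on
    the sample entry, not a documented guarantee). `tol` absorbs float rounding.
    """
    seg_start = entry["offset"]
    seg_end = seg_start + entry["duration"]
    return all(
        seg_start - tol <= word["start"] <= word["end"] <= seg_end + tol
        for word in entry.get("alignment", [])
    )


# One manifest line, mirroring the sentence-level sample entry.
line = (
    '{"audio_filepath": "/path/to/audio.wav", "duration": 15.2, "offset": 45.3, '
    '"text": "This is a complete sentence with proper punctuation.", '
    '"alignment": [{"word": "This", "start": 45.3, "end": 45.6}, '
    '{"word": "is", "start": 45.6, "end": 45.8}]}'
)
entry = json.loads(line)
is_consistent = alignment_within_segment(entry)
```

A check like this is a cheap sanity test to run over a generated manifest before using it for training or evaluation.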