Fixing Aggregate Episodes for >1 Video File Aggregation #2212
base: main
Conversation
| Thanks @nicholas-maselli! I'll test it out. I have a script to stress-test the merging (https://gist.github.com/michel-aractingi/d9e7a41f1738bf976518c7abdd63fe20). Maybe it wasn't comprehensive enough to catch this bug. | 
| Hey @michel-aractingi 👋 I actually remember testing this thing out myself back when I worked on #1264. I had a fairly comprehensive set of tests to verify aggregation of multiple data & video files, and tests check out things frame by frame so I am fairly surprised to read this @nicholas-maselli, but I am sure something is off. Could you perhaps provide a MRE of the issue so that we can see from there what exactly is off? Thank you so much for looking into this btw 🤗 | 
| 
 Is there someplace I can send you the series of datasets I tested this on? They are fairly large, with custom cameras/robots, so it's possible there were some corner cases as a result of that, but I'll be using this code a lot, so I'll be able to catch any oddities I see and let you guys know. | 
| ) |
| current_offset += src_duration |
| |
| videos_idx[key]["episode_offset"] = current_offset | 
The current offset must be saved on each iteration of the outer loop in the function above this.
Currently the current offset is not saved after it resets to 0; it continues on from the total frame count in the next iteration of the outer loop, which makes the bug tricky to spot.
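For readers following along, here is a minimal, hypothetical sketch of the bookkeeping described here (simplified names, not the actual aggregation code): the running offset must be written back into `videos_idx[key]["episode_offset"]` after the inner loop, otherwise the reset to 0 that happens when rotating to a new destination video file is lost, and the next outer-loop iteration resumes from the stale cumulative duration.

```python
# Hypothetical, simplified sketch of the offset bookkeeping (not the real code).
# "episode_offset" persists the running offset across outer-loop iterations;
# "src_to_offset" records where each source video starts in the destination file.

MAX_FILE_DURATION_S = 600.0  # assumed rotation threshold, for illustration only


def place_source_videos(videos_idx: dict, key: str, sources: list[tuple[str, float]]) -> None:
    """Assign each (src_path, duration_s) a start offset inside the destination video file."""
    current_offset = videos_idx[key]["episode_offset"]  # resume from the persisted value
    for src_path, src_duration in sources:
        if current_offset + src_duration > MAX_FILE_DURATION_S:
            videos_idx[key]["file"] += 1  # rotate to a new destination video file
            current_offset = 0.0          # offsets restart at zero in that new file
        videos_idx[key]["src_to_offset"][src_path] = current_offset
        current_offset += src_duration
    # The fix: persist the running offset so the next outer-loop iteration does
    # not fall back to the stale cumulative duration.
    videos_idx[key]["episode_offset"] = current_offset
```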
| Thanks for working on this, @nicholas-maselli. I've noticed a very similar problem. I'm not sure if I'm hitting a different edge case than you're describing here, but on both I've merged 2 datasets, one with ~1600 episodes and one with ~30 or so as a test here. Hoping to explore and solve this issue with you! | 
| data_idx = {"chunk": 0, "file": 0} |
| videos_idx = { |
| key: {"chunk": 0, "file": 0, "latest_duration": 0, "episode_duration": 0} for key in video_keys |
| key: {"chunk": 0, "file": 0, "episode_duration": 0, "episode_offset": 0, "src_to_offset": {} } for key in video_keys | 
| key: {"chunk": 0, "file": 0, "episode_duration": 0, "episode_offset": 0, "src_to_offset": {} } for key in video_keys | |
| key: {"chunk": 0, "file": 0, "episode_duration": 0, "episode_offset": 0, "src_to_offset": {} } | |
| for key in video_keys | 
To fix the Quality test
| Hey @brysonjones, could you confirm whether this PR fixes your issue? | 
| 
 Hey @michel-aractingi, I just tested and got this same error with the most recent changes: To give a bit more color about the test here: 
 | 
| Thanks @brysonjones, I'll try to reproduce your issue by merging a converted dataset with a recorded dataset. If you can provide any minimal way to reproduce the buggy behaviour you're getting, it would be a great help. @brysonjones @nicholas-maselli | 
| 
 @michel-aractingi I tried again using the most recent conversion script updates and unfortunately saw no change. I will continue to run some experiments and let you know what I find. If there's additional info in the stack trace or logging that would be helpful, let me know and I can get that! | 
| 
 My apologies, I was away. Let me send you the test files I've been using (custom robot with 4 cameras, 25 datasets, 10 episodes per set). The error you posted is exactly the same error I was getting. The reason is that when the video file rotates to file-001, the episode metadata starts at frame 0 again (which is correct), but after the first episode in that new video file, the frame index goes back to 1000+ because the current offset is only temporarily reset to 0, not for the entire loop. | 
| This was my explanation from the code, but I'll add it to the conversation here: "Currently the current offset is not saved after it resets to 0; it continues on from the total frame count in the next iteration of the outer loop, which makes the bug tricky to spot." | 
| @nicholas-maselli @michel-aractingi I've continued to do more experimentation, and the way I've been able to recreate this error is by taking a large V21 dataset, converting it to V30, and then merging it with another V30 dataset. Something in this process is corrupting the merge: the frame indices are incorrect, along with some potential video decoding problems. Unfortunately, this update doesn't seem to fix that issue from what I can see. | 
What this does
When combining more than 1 video file, the offset incorrectly resets back to the "latest_duration". This is a fairly major bug because it prevents training on the files at all.
This happens because, when rotating to a new file, we set the offset to 0 and set the current_offset to the src_duration, but then outside of the loop we reset the current_offset to the large value of latest_duration again.
This PR removes latest_duration, adds an episode_offset variable so the current_offset is saved cleanly, and removes dead code around latest_duration when saving the data.
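For intuition, here is a minimal, self-contained sketch (illustrative names and cap, not the PR's code) of the invariant the fix restores: each source video's offset must be the cumulative duration within its destination file, restarting at zero whenever the destination file rotates, with that running value carried forward rather than overwritten.

```python
# Illustrative sketch (assumed names and cap, not the PR's code): place source
# videos into destination files capped at FILE_CAP_S seconds and compute the
# offset each one starts at *within its destination file*.

FILE_CAP_S = 1000.0
durations = [400.0, 400.0, 400.0, 400.0, 400.0]  # toy per-source durations


def assign_offsets(durations: list[float], cap: float) -> list[tuple[int, float]]:
    """Return (destination_file_index, start_offset_s) for each source video."""
    placements, file_idx, offset = [], 0, 0.0
    for d in durations:
        if offset + d > cap:   # destination file is full: rotate to the next one
            file_idx += 1
            offset = 0.0       # offsets restart inside the new file
        placements.append((file_idx, offset))
        offset += d            # this running value must be carried forward
    return placements


print(assign_offsets(durations, FILE_CAP_S))
# [(0, 0.0), (0, 400.0), (1, 0.0), (1, 400.0), (2, 0.0)]
```

With the buggy bookkeeping, the running offset is overwritten by the overall cumulative duration after the reset, which is why indices jump past 1000 once a second destination file is opened.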
How it was tested
This code was tested on a set of 25 datasets with 10 episodes per dataset and 4 active cameras. Before the changes, the metadata indices were resetting to 1000+ after the first episode of the second video file.
I can provide the datasets on request, but they are very large for GitHub.
How to checkout & try? (for the reviewer)
You can use the dataset aggregator to test. I wrote a custom script for this (I can add it if you want).
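A rough sanity check for a merged dataset, assuming the standard v3.0 local cache layout and the usual `episode_index` / `frame_index` columns (adjust the root to point at your merged dataset): read the data parquet files and confirm that `frame_index` restarts at 0 and stays contiguous within each episode, since the reported symptom is indices jumping to 1000+.

```python
# Rough sanity check, assuming the v3.0 layout (data/chunk-*/file-*.parquet) and
# the usual episode_index / frame_index columns. The root path is a placeholder.

from pathlib import Path

import pandas as pd

root = Path("~/.cache/huggingface/lerobot/your-username/merged-dataset").expanduser()

frames = pd.concat(
    [
        pd.read_parquet(p, columns=["episode_index", "frame_index"])
        for p in sorted(root.glob("data/*/*.parquet"))
    ],
    ignore_index=True,
)

for ep_idx, group in frames.groupby("episode_index"):
    frame_indices = group["frame_index"].tolist()
    assert frame_indices == list(range(len(frame_indices))), (
        f"episode {ep_idx}: frame_index starts at {frame_indices[0]} instead of 0"
    )

print(f"Checked {frames['episode_index'].nunique()} episodes: frame indices are contiguous per episode.")
```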