Create new cookbook for utilizing supervision methods to easily create YOLO datasets for training #1388

xaristeidou · 2024-07-21T16:37:17Z

Search before asking

I have searched the Supervision issues and found no similar feature requests.

Description

I repeatedly find myself building dataset structures by hand: splitting the data into train, valid, and test sets and creating the images and labels folders for each. The whole process could easily be automated.

Use case

Methods already exist to load a dataset (sv.DetectionDataset.from_yolo()), split it by a chosen ratio (sv.DetectionDataset.split()), and export it to YOLO format (sv.DetectionDataset.as_yolo()).

Nevertheless, to create a YOLO training dataset structure, one must write a custom train/valid/test split (because split() can only split into two parts) and also manually create the train/valid/test folders. (To my knowledge, Ultralytics YOLO models require by default that the train and valid folders contain valid, non-empty annotations; the test folder can be empty.)

For that reason, I propose adding a new method to sv.DetectionDataset that combines the arguments of from_yolo(), split(), and as_yolo() and handles the whole process of creating the train/valid/test folders and their images/labels subfolders, along with the data.yaml file.

I have already developed an implementation of such a method, which lets the user create a YOLO dataset structure with a single call. For example:

import supervision as sv

dataset_directory = "/path/to/directory"

sv.DetectionDataset.create_yolo_dataset(
    images_directory_path=f"{dataset_directory}/images",
    annotations_directory_path=f"{dataset_directory}/labels",
    data_yaml_path=f"{dataset_directory}/data.yaml",
    train_ratio=0.7,
    valid_ratio=0.15,
    folders_export_path=dataset_directory,
    data_yaml_export_path=f"{dataset_directory}/data.yaml",
)

Additional

The test_ratio is calculated automatically from train_ratio and valid_ratio.
All arguments shown in the example are mandatory.
The user can also pass any optional argument accepted by from_yolo(), split(), or as_yolo().
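
The ratio arithmetic described above can be sketched in plain Python. This is only an illustration under assumptions: resolve_ratios and split_counts are hypothetical helper names, not part of supervision, and the rounding rule is one reasonable choice among several.

```python
def resolve_ratios(train_ratio: float, valid_ratio: float) -> tuple[float, float, float]:
    """Return (train, valid, test) ratios, deriving test as the remainder."""
    if not (0.0 < train_ratio < 1.0) or not (0.0 < valid_ratio < 1.0):
        raise ValueError("train_ratio and valid_ratio must each be in (0, 1)")
    test_ratio = 1.0 - train_ratio - valid_ratio
    if test_ratio < -1e-9:  # small tolerance for floating-point error
        raise ValueError("train_ratio + valid_ratio must not exceed 1")
    return train_ratio, valid_ratio, max(test_ratio, 0.0)


def split_counts(total: int, train_ratio: float, valid_ratio: float) -> tuple[int, int, int]:
    """Turn ratios into image counts; the test set receives whatever is left."""
    train, valid, _ = resolve_ratios(train_ratio, valid_ratio)
    n_train = round(total * train)
    # Clamp so train + valid never exceeds the number of available images.
    n_valid = min(round(total * valid), total - n_train)
    return n_train, n_valid, total - n_train - n_valid


if __name__ == "__main__":
    print(split_counts(100, 0.7, 0.15))
```

Giving the remainder to the test set keeps the three counts summing to the dataset size regardless of rounding.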

Let me know if you like this idea and whether you would like me to submit a PR with the initial implementation.

Are you willing to submit a PR?

Yes I'd like to help by submitting a PR!


SkalskiP · 2024-07-22T11:20:49Z

Hi @xaristeidou 👋🏻

To be honest, I would really prefer not to treat YOLO differently from other data formats. The Supervision API aims to provide reusable building blocks, like sv.DetectionDataset.split or sv.DetectionDataset.as_yolo, that you can compose together. Composing them yourself sounds like the expected usage of supervision.
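
As a rough sketch of what composing those building blocks might look like, the snippet below chains from_yolo(), two calls to split(), and as_yolo() to get a three-way split. The helper names (second_split_ratio, export_yolo_splits), the train/valid/test folder layout, and the default ratios are assumptions made for illustration; they are not part of supervision and are not the implementation discussed in this thread.

```python
def second_split_ratio(train_ratio: float, valid_ratio: float) -> float:
    """Ratio for the second two-way split, which divides the non-train remainder."""
    remainder = 1.0 - train_ratio
    if remainder <= 0.0 or valid_ratio > remainder + 1e-9:
        raise ValueError("train_ratio + valid_ratio must not exceed 1")
    return valid_ratio / remainder


def export_yolo_splits(images_dir, labels_dir, data_yaml, output_dir,
                       train_ratio=0.7, valid_ratio=0.15, seed=42):
    """Load a YOLO dataset, split it three ways, and export each part (hypothetical helper)."""
    import supervision as sv  # imported here so the ratio helper works without it

    dataset = sv.DetectionDataset.from_yolo(
        images_directory_path=images_dir,
        annotations_directory_path=labels_dir,
        data_yaml_path=data_yaml,
    )
    # split() only produces two parts, so it is applied twice.
    train_ds, rest_ds = dataset.split(
        split_ratio=train_ratio, random_state=seed, shuffle=True
    )
    valid_ds, test_ds = rest_ds.split(
        split_ratio=second_split_ratio(train_ratio, valid_ratio),
        random_state=seed,
        shuffle=True,
    )
    for name, subset in (("train", train_ds), ("valid", valid_ds), ("test", test_ds)):
        subset.as_yolo(
            images_directory_path=f"{output_dir}/{name}/images",
            annotations_directory_path=f"{output_dir}/{name}/labels",
            data_yaml_path=f"{output_dir}/data.yaml",  # rewritten on each pass
        )
```

With train_ratio=0.7 and valid_ratio=0.15, the second split divides the remaining 30% roughly in half, which yields the 70/15/15 proportions from the example in the issue description.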

xaristeidou · 2024-07-22T12:13:47Z

@SkalskiP That's fair; I was also wondering whether it is "too much" automation. Maybe I could create a cookbook instead, similar to 'Serialise Detections to a CSV File' and 'Serialise Detections to a JSON File', that walks through combining the aforementioned methods to construct a YOLO dataset easily.

SkalskiP · 2024-07-22T13:30:06Z

I think the cookbook makes a lot more sense. We also released this how-to guide last week. Maybe you could reuse some of those code snippets in your cookbook?

xaristeidou · 2024-07-22T14:09:58Z

Yeah sure!

SkalskiP · 2024-07-23T07:39:57Z

@xaristeidou should I expect cookbook PR? ;)

xaristeidou · 2024-07-23T08:09:36Z

@SkalskiP Yes, but not immediately; I don't have it ready yet. I will work on it mostly this weekend.

xaristeidou · 2024-08-05T07:52:53Z

@SkalskiP I have the cookbook ready. I think we should first proceed with PR #1422 so we can test the notebook properly, because in the final stage I run model.train() and it raises an error due to an improperly formatted data.yaml.
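
For context, a data.yaml that lists the train, val, and test image folders alongside the class information could be produced as in the sketch below. This is an assumption-laden illustration: the key names follow the Ultralytics-style layout (train, val, test, nc, names), the relative folder paths are examples, and render_data_yaml and write_data_yaml are hypothetical helpers, not the code from PR #1422.

```python
import json
from pathlib import Path


def render_data_yaml(class_names, train="train/images",
                     val="valid/images", test="test/images") -> str:
    """Build the text of a data.yaml that includes split paths (hypothetical helper)."""
    names = list(class_names)
    lines = [
        f"train: {train}",
        f"val: {val}",
        f"test: {test}",
        f"nc: {len(names)}",
        # A JSON list is also a valid YAML flow sequence.
        f"names: {json.dumps(names)}",
    ]
    return "\n".join(lines) + "\n"


def write_data_yaml(path, class_names, **split_paths) -> None:
    """Write the rendered YAML to disk, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_data_yaml(class_names, **split_paths), encoding="utf-8")


if __name__ == "__main__":
    print(render_data_yaml(["cat", "dog"]))
```

How relative paths are resolved depends on the training framework, so absolute paths may be the safer choice when in doubt.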

xaristeidou added the enhancement New feature or request label Jul 21, 2024

xaristeidou changed the title ~~Add new method to sv.DetectionDataset which will create/export proper/final structure of YOLO dataset for training~~ Create new cookbook for utilizing supervision methods to easily create YOLO datasets for training Jul 23, 2024

xaristeidou mentioned this issue Aug 1, 2024 in PR #1422, "Add train, val, test folder paths in data.yaml at save_data_yaml()" (Open, 3 tasks)
