Support for Datasets with elements of different length #1084

Open
BrokenDuck opened this issue Jan 6, 2025 · 2 comments


@BrokenDuck

Currently, the skorch.dataset.get_len function does not accept dictionary input whose elements have different sizes; it raises ValueError("Dataset does not have consistent lengths.").
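To make the failure mode concrete, here is a minimal plain-Python sketch of this kind of check. It is a simplified stand-in and not skorch's actual implementation: `check_consistent_length` is a hypothetical helper, and only the error message is taken from the library.

```python
def check_consistent_length(data):
    """Return the shared length of all values in ``data``.

    Hypothetical, simplified stand-in for the check described above;
    this is NOT skorch's real ``get_len``.
    """
    lengths = {len(value) for value in data.values()}
    if len(lengths) != 1:
        raise ValueError("Dataset does not have consistent lengths.")
    return lengths.pop()


# A single graph: 4 nodes with 2 features each, but only 3 edges.
graph = {
    "x": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "edge_index": [(0, 1), (1, 2), (2, 3)],
}

try:
    check_consistent_length(graph)
except ValueError as exc:
    print(exc)  # Dataset does not have consistent lengths.
```

The node features and the edge list are both valid inputs for one graph, yet a per-key length comparison rejects them because 4 != 3.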

This behavior is problematic in cases such as GNNs with PyTorch Geometric, where the forward method expects node features and an edge index whose sizes differ.

Accommodating this could involve modifying the get_len function so that the length of a batch can be specified explicitly when a custom collate_fn is used.
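One possible shape for such a change, sketched in plain Python under loud assumptions: `ExplicitLengthDataset` and its `length` argument are hypothetical names used only for illustration, and the sketch does not touch skorch's own classes. The idea is that a caller-supplied length takes precedence, and the consistency check only runs as a fallback.

```python
class ExplicitLengthDataset:
    """Hypothetical dataset wrapper with an optional, caller-supplied length.

    Illustration only: this class is an assumption and not part of skorch.
    """

    def __init__(self, X, length=None):
        self.X = X
        self._length = length

    def __len__(self):
        # An explicit length wins, so the values may differ in size.
        if self._length is not None:
            return self._length
        # Fallback: every value must have the same length.
        lengths = {len(value) for value in self.X.values()}
        if len(lengths) != 1:
            raise ValueError("Dataset does not have consistent lengths.")
        return lengths.pop()


# One graph with 4 nodes and 3 edges, treated as a single sample.
graph = {
    "x": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    "edge_index": [(0, 1), (1, 2), (2, 3)],
}

dataset = ExplicitLengthDataset(graph, length=1)
print(len(dataset))  # 1
```

Without `length=1`, `len(dataset)` would raise the same ValueError as before, so existing behavior stays the default.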

I could implement that change in the library.

@BenjaminBossan
Collaborator

Thanks for the report. We do have an example of using PyTorch Geometric with a section on data handling, but I'm not sure whether it can be applied to your use case. Also pinging @githubnemo since he added the notebook.

@DCoupry

DCoupry commented Feb 14, 2025

Been playing with this for a bit too. The main problem here is using torch_geometric within complex pipelines, which often expect tabular data.

Proposed solution: metadata routing. If we can successfully route the graph data (e.g., an array or pandas DataFrame of torch_geometric.data.Data objects), it would be trivial to pass it as additional fit params and use a collate_fn to handle things from there. This is also critical for complex, non-GNN architectures that receive multiple types of "X", so it would be a nice thing to have anyway.
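As a rough idea of what the collate_fn side could look like, here is a plain-Python sketch that merges several graphs into one batch as a disjoint union. Everything in it is an assumption made for illustration: `collate_graphs` is a hypothetical name, graphs are plain dicts rather than torch_geometric.data.Data objects, and real code would work on tensors and would probably rely on torch_geometric's own batching utilities.

```python
def collate_graphs(graphs):
    """Merge a list of graphs into one batch (disjoint union).

    Hypothetical sketch. Each graph is assumed to be a dict with
      "x":          list of per-node feature lists
      "edge_index": list of (source, target) node index pairs
    """
    xs, edges, batch = [], [], []
    offset = 0  # number of nodes already placed in the batch
    for graph_id, graph in enumerate(graphs):
        num_nodes = len(graph["x"])
        xs.extend(graph["x"])
        # Shift node indices so edges point at the right rows of ``xs``.
        edges.extend((src + offset, dst + offset)
                     for src, dst in graph["edge_index"])
        # Record which graph every node belongs to.
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return {"x": xs, "edge_index": edges, "batch": batch}


g1 = {"x": [[1.0], [2.0]], "edge_index": [(0, 1)]}
g2 = {"x": [[3.0], [4.0], [5.0]], "edge_index": [(0, 1), (1, 2)]}

merged = collate_graphs([g1, g2])
print(merged["edge_index"])  # [(0, 1), (2, 3), (3, 4)]
print(merged["batch"])       # [0, 0, 1, 1, 1]
```

The batch then holds 5 node rows and 3 edges for 2 samples, which is why the number of samples has to come from the list of graphs rather than from the size of any single array.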

Metadata routing is not currently implemented in skorch's NeuralNet, so it might be a good idea to open an issue dedicated to it?
Edit: added a dedicated issue: #1095


3 participants