
Conversation

@boeddeker
Member

Fix so that cache, from_dataset and new work properly with input datasets that do not support keys because the keys are duplicated.

The issue was that duplicate examples were silently dropped:

  • list(ds.items()) worked and created a list of key-value pairs.
  • dict(list(ds.items())) then removed the duplicated keys, losing examples.

Now the code checks that no examples get lost, and concatenate is fixed to raise an exception when items is called while some keys are duplicated.
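The failure mode described above can be reproduced with plain Python (the names here are illustrative, not the actual lazy_dataset internals): building a dict from key/value pairs silently keeps only the last value for each duplicated key.

```python
# Illustrative key/value pairs as ds.items() would produce them;
# "utt1" appears twice, as it can when keys are duplicated.
items = [
    ("utt1", {"audio": "a.wav"}),
    ("utt1", {"audio": "b.wav"}),  # duplicate key
    ("utt2", {"audio": "c.wav"}),
]

as_dict = dict(items)  # duplicate "utt1" collapses to the last value

assert len(items) == 3
assert len(as_dict) == 2  # one example was silently lost
```

This is why a length check after the dict conversion is enough to detect the loss.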

@boeddeker
Member Author

> Now the code checks that no examples get lost, and concatenate is fixed to raise an exception when items is called while some keys are duplicated.

Concatenate doesn't know whether a subsequent filter may remove the duplicates. Hence, the items support in concatenate was added back, and from_dataset now checks instead whether the input dataset has duplicates.
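The safeguard described above might look roughly like the following sketch (the function name and error message are hypothetical, not the actual lazy_dataset implementation): convert the pairs to a dict, and raise if the conversion lost any examples.

```python
import collections


def items_to_dict(items):
    """Hypothetical helper: convert (key, example) pairs to a dict,
    raising instead of silently dropping duplicated keys."""
    items = list(items)
    d = dict(items)
    if len(d) != len(items):
        # Identify which keys occur more than once for a useful message.
        counts = collections.Counter(key for key, _ in items)
        duplicates = [key for key, n in counts.items() if n > 1]
        raise AssertionError(
            f"Input dataset contains duplicated keys: {duplicates}"
        )
    return d
```

With this check in from_dataset, a concatenated dataset that still contains duplicates fails loudly, while one where a filter already removed the duplicates converts without error.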

@boeddeker boeddeker merged commit 0c32ff7 into fgnt:master Dec 3, 2024
6 checks passed
