Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for #108 - Improvements to data sampling logic #129

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

SophiaLi20
Copy link
Contributor

This PR addresses the issue described in #108 and incorporates the feedback mentioned in this comment.

Key Changes:

  • Refactored the data sampling logic to improve accuracy and performance.
  • Fixed incorrect sampling intervals that were causing mismatches in the dataset.
  • Updated sampler.py to ensure compatibility with the updated configuration.
  • Enhanced documentation in key areas to clarify the sampling methodology.
  • Added unit tests to validate the new sampling behavior and ensure edge cases are handled correctly.

Testing:

  • Tested the changes locally using sample datasets provided in the repository.
  • Added automated unit tests for the updated sampling logic:
  • Verified results for small datasets.
  • Confirmed correct handling of boundary conditions (e.g., missing values, irregular timestamps).
  • All tests passed successfully.

Additional Notes:

  • These updates aim to ensure more robust and accurate data handling for downstream processes.
  • The changes have been documented in README.md under the "Sampling Process" section.
  • Please review the changes, and let me know if any further adjustments are required.

Checklist:

  • [ x] My code follows OCF's coding style guidelines

  • [x ] I have performed a self-review of my own code

  • [x ] I have made corresponding changes to the documentation

  • I have added tests that prove my fix is effective or that my feature works

  • [x ] I have checked my code and corrected any misspellings

SophiaLi20 and others added 3 commits December 5, 2024 20:53
Updated the broken link to reference the correct torch datasets documentation
Co-authored-by: Sukhil Patel <[email protected]>
The key changes that are made in the code are -

Column Renaming: Used DataFrame.rename() for consistent column renaming.
Broadcasting Metadata: Simplified the creation of capacity_kwp using set_index() and loc[] to filter metadata efficiently.
Validation: Added checks for required metadata columns to ensure robustness.
Clarity: Reorganized and documented steps for better maintainability.
The code is more efficient, easier to read, and robust against common errors.
README.md Outdated
@@ -18,7 +18,8 @@ We are currently migrating to this repo from [ocf_datapipes](https://github.com/
## Documentation

**ocf-data-sampler** doesn't have external documentation _yet_; you can read a bit about how our torch datasets work in the Readme [here](https://github.com/openclimatefix/ocf-data-sampler/tree/readme-update/ocf_data_sampler/torch_datasets).
**ocf-data-sampler** doesn't have external documentation _yet_; you can read a bit about how our torch datasets work in the Readme [here]([torch datasets documentation](ocf_data_sampler/torch_datasets/README.md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**ocf-data-sampler** doesn't have external documentation _yet_; you can read a bit about how our torch datasets work in the Readme [here]([torch datasets documentation](ocf_data_sampler/torch_datasets/README.md)
**ocf-data-sampler** doesn't have external documentation _yet_; you can read a bit about how our torch datasets work in the Readme [here](ocf_data_sampler/torch_datasets/README.md)

@Sukh-P
Copy link
Member

Sukh-P commented Jan 30, 2025

Thanks for this @SophiaLi20, left a few small comments it would be good to address and then sort out the merge conflict, after that we can get this merged in, thanks!

Added comments to test_config.py for better clarity

Added inline comments to explain what each test case is checking. This should make it easier for others to understand the purpose of each test without having to go through the code in detail. No actual functionality was changed—just improving readability.
Improve Readability of test_config.py by Adding Inline Comments
Added inline comments to explain what each test case is checking. This should make it easier for others to understand the purpose of each test without having to go through the code in detail. No actual functionality was changed
Copy link
Contributor Author

@SophiaLi20 SophiaLi20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting looks much better, and the link is clear now. Everything looks good to me! Let me know if there's anything else I can update.

@SophiaLi20
Copy link
Contributor Author

Hey @Sukh-P , I sincerely apologize for the delay—I had exams and couldn’t get back to this sooner. I've now implemented the suggested changes and fixed the merge conflict. Please have a look when you get a chance, and let me know if there's anything else I should change. I’ll make any necessary changes right away. Thanks for your patience and guidance!

@Sukh-P
Copy link
Member

Sukh-P commented Mar 10, 2025

@SophiaLi20 sorry about the delay but is it okay to revert any changes you have made to the test_config file for now? I want to keep this PR to the original scope that you set out in the issue, then we can get this merged in, thanks!

Copy link
Member

@Sukh-P Sukh-P left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to reduce the scope of this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants