Draft
…commons, unidecode, httpx, requests-cache, orjson, and ftfy
- Updated the import from `CompositeWebScraper` to `CompositeScraper` for consistency.
- Introduced a `create_default_source_validator` function that creates a source validator with default parameters.
- Refactored `source_validator.py` to add DOI and ISSN validation with detailed error handling.
- Implemented a journal index for efficient fuzzy matching of source titles against a whitelist.
- Updated the `arun` method to handle validation results and exceptions more gracefully.
- Updated example scripts and tests to use "source validation" terminology in place of "journal validation".
- Improved logging to make the validation process easier to debug and trace.
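The DOI and ISSN format checks mentioned in this commit can be sketched as below. This is a minimal illustration, not the code in `source_validator.py`: `is_valid_doi` and `is_valid_issn` are hypothetical names, the DOI pattern is adapted from a regex Crossref has recommended for matching most modern DOIs (it checks syntax only and does no lookup), and the ISSN check uses the standard mod-11 check digit, where the first seven digits are weighted 8 down to 2 and `X` stands for a check value of 10.

```python
import re

# Adapted from a pattern Crossref has recommended for modern DOIs:
# "10." + a 4-9 digit registrant code + "/" + a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def is_valid_doi(doi: str) -> bool:
    """Return True if `doi` has the usual modern DOI syntax (format only)."""
    return bool(DOI_PATTERN.match(doi.strip()))


def is_valid_issn(issn: str) -> bool:
    """Validate an ISSN such as '0378-5955' via its mod-11 check digit."""
    chars = issn.strip().replace("-", "").upper()
    # Seven ASCII digits followed by a digit or 'X' as the check character.
    if not re.fullmatch(r"[0-9]{7}[0-9X]", chars):
        return False
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


if __name__ == "__main__":
    for value in ["0378-5955", "2434-561X", "0378-5956"]:
        print(value, is_valid_issn(value))
    print(is_valid_doi("10.1038/nature12373"), is_valid_doi("doi:10.1038/nature12373"))
```

`2434-561X` exercises the case where the computed check value is 10. Returning a plain boolean keeps the sketch short; a validator that needs detailed error handling would more likely return or raise a reason for each failure.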
- Introduced `source_validation_pipeline_test.py` to demonstrate the end-to-end source validation process.
- Implemented searching for research papers, extracting their DOIs, validating sources against a whitelist, and generating validation reports.
- Included sample queries for Earth Sciences and Astronomy to show the pipeline in action.
- Added inline comments and docstrings for clarity and maintainability.
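Two of the pipeline steps, pulling DOIs out of search-result text and fuzzy-matching a source title against a whitelist index, might look roughly like this. Everything here is an assumption for illustration: `extract_dois`, `normalize_title`, and `JournalIndex` are hypothetical names, the fuzzy matcher is the standard library's `difflib.get_close_matches` (not necessarily what the PR uses), and the 0.85 cutoff is arbitrary.

```python
import difflib
import re
from typing import List, Optional

# Unanchored DOI pattern for finding DOIs inside free text.
DOI_IN_TEXT = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> List[str]:
    """Return unique DOIs in order of appearance, minus trailing punctuation."""
    return list(dict.fromkeys(m.rstrip(".,;") for m in DOI_IN_TEXT.findall(text)))


def normalize_title(title: str) -> str:
    """Lowercase, spell out '&', drop punctuation, and collapse whitespace."""
    title = title.lower().replace("&", " and ")
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split())


class JournalIndex:
    """Hypothetical whitelist index mapping normalized titles to canonical ones."""

    def __init__(self, whitelist: List[str], cutoff: float = 0.85):
        self._by_key = {normalize_title(t): t for t in whitelist}
        self._keys = list(self._by_key)
        self._cutoff = cutoff

    def match(self, title: str) -> Optional[str]:
        """Return the canonical whitelisted title for `title`, or None."""
        key = normalize_title(title)
        if key in self._by_key:  # exact hit after normalization, no fuzzy search
            return self._by_key[key]
        close = difflib.get_close_matches(key, self._keys, n=1, cutoff=self._cutoff)
        return self._by_key[close[0]] if close else None


if __name__ == "__main__":
    print(extract_dois("See 10.1038/nature12373. Also 10.1126/science.1234567."))
    index = JournalIndex(["Earth and Planetary Science Letters", "The Astrophysical Journal"])
    print(index.match("Astrophysical Journal"))
```

Normalizing titles once when the index is built lets exact matches skip the fuzzy search entirely. Stripping trailing `.`, `,` and `;` in `extract_dois` is a heuristic: it handles DOIs at the end of a sentence but would truncate the rare DOI that legitimately ends in one of those characters.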
- Introduced `simple_source_validation_example.py` to demonstrate the core source validation functionality without complex dependencies.
- Implemented a test case that validates real DOIs against a whitelist and includes a paper without a DOI to demonstrate a validation failure.
- Added inline comments and docstrings for clarity and maintainability.
- Used asynchronous code to run the validation and print the results.
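An asynchronous validation flow like the one this example describes, including a paper without a DOI that fails validation, could be structured roughly as below. This is a hypothetical sketch: `lookup_journal` stands in for a real metadata request (for example to the Crossref API), the DOIs in `FAKE_METADATA` are made-up placeholders rather than the real DOIs used in the example script, and `Paper`, `ValidationResult`, and `validate_all` are illustrative names.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional

# Made-up DOI -> journal mapping standing in for a real metadata service.
FAKE_METADATA = {
    "10.0000/example.1": "Earth and Planetary Science Letters",
    "10.0000/example.2": "Journal of Unlisted Studies",
}
WHITELIST = {"Earth and Planetary Science Letters", "The Astrophysical Journal"}


@dataclass
class Paper:
    title: str
    doi: Optional[str]


@dataclass
class ValidationResult:
    paper: Paper
    is_valid: bool
    reason: str


async def lookup_journal(doi: str) -> str:
    """Hypothetical metadata lookup; raises KeyError for unknown DOIs."""
    await asyncio.sleep(0)  # yield control, as a real HTTP request would
    return FAKE_METADATA[doi]


async def validate_paper(paper: Paper) -> ValidationResult:
    """Validate one paper; a missing DOI is a failed result, not an exception."""
    if not paper.doi:
        return ValidationResult(paper, False, "missing DOI")
    journal = await lookup_journal(paper.doi)
    if journal in WHITELIST:
        return ValidationResult(paper, True, f"whitelisted: {journal}")
    return ValidationResult(paper, False, f"not whitelisted: {journal}")


async def validate_all(papers: List[Paper]) -> List[ValidationResult]:
    """Validate papers concurrently, converting lookup errors into failures."""
    outcomes = await asyncio.gather(
        *(validate_paper(p) for p in papers), return_exceptions=True
    )
    results = []
    for paper, outcome in zip(papers, outcomes):
        if isinstance(outcome, BaseException):
            results.append(ValidationResult(paper, False, f"error: {outcome!r}"))
        else:
            results.append(outcome)
    return results


if __name__ == "__main__":
    papers = [
        Paper("Whitelisted paper", "10.0000/example.1"),
        Paper("Non-whitelisted paper", "10.0000/example.2"),
        Paper("Paper without a DOI", None),
        Paper("Unknown DOI", "10.0000/unknown"),
    ]
    for result in asyncio.run(validate_all(papers)):
        status = "PASS" if result.is_valid else "FAIL"
        print(f"{status}  {result.paper.title}: {result.reason}")
```

Passing `return_exceptions=True` to `asyncio.gather` means one failed lookup is reported as a failed result for that paper instead of aborting the whole batch.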
Description
Title
Fix Source Validator Crossref API, Thread Safety and Performance Issues
Summary
Validation Improvements:
Performance: