Skip to content

Conversation

@anabossler
Copy link

Pull Request

Description

This PR implements the Neo4j Vector Store feature requested in #7.

Adds the following bibliometric analysis modules:

  • MetadataScopus: Semantic analysis of Scopus metadata across multiple research domains (Full Corpus, Cross-cutting, Environmental Assessment, Recycling Processes, Material Polymers, Regulatory Economics, Social Perception)
  • OpenAlex: Integration with OpenAlex API for bibliometric data retrieval and VOSviewer export
  • ScopusCrossRef: CrossRef integration with semantic analysis, funding data, and vector search capabilities
  • WebScrapperPatents: Patent web scraping functionality using Selenium
  • neo4j_vector_service: Vector store service with retrieval agents for graph-based search

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • ⚡ Performance improvement
  • 🧹 Code refactoring
  • 🧪 Test addition or update
  • 🔧 Configuration change
  • 🧬 Bioinformatics enhancement
  • 🔄 Workflow improvement

Component

  • Core Workflow Engine
  • PRIME Flow (Protein Engineering)
  • Bioinformatics Flow (Data Fusion)
  • DeepSearch Flow (Web Research)
  • Challenge Flow (Experimental)
  • Tool Registry
  • Agent System
  • Configuration (Hydra)
  • Pydantic Graph
  • Documentation
  • Tests
  • Other: Bibliometric Analysis Modules

Related Issues

Changes Made

  • Added MetadataScopus module with semantic analysis capabilities for multiple research domains
  • Added OpenAlex API integration with VOSviewer export functionality
  • Added ScopusCrossRef module with semantic analysis, funding data extraction, and vector search
  • Added WebScrapperPatents module for patent data scraping
  • Added neo4j_vector_service with retrieval agents and vector store implementation
  • Included configuration files and example scripts for all modules

Testing

  • I have tested these changes locally
  • I have added/updated tests for my changes
  • All existing tests pass
  • I have tested with different configurations
  • I have tested with different flows (PRIME, Bioinformatics, DeepSearch, etc.)

Test Configuration

# Testing vector search functionality
python src/ScopusCrossRef/script7_vector_search.py

# Testing Neo4j connection
python src/ScopusCrossRef/test_neo4j_connection.py

Configuration Changes

  • Added new configuration options
  • Modified existing configuration
  • Removed configuration options

Configuration Details

# Neo4j Vector Store configuration
neo4j_vector_service:
  enabled: true
  agentes:
    retrieval_agent: enabled
  service:
    embeddings: configured
    vector_store: configured

Documentation

  • No documentation changes needed
  • Updated README
  • Updated API documentation
  • Updated configuration documentation
  • Added code comments
  • Updated examples

Performance Impact

  • No performance impact
  • Performance improvement
  • Performance regression (explain below)

Performance Details

  • Graph-based vector search provides better performance for heterogeneous data
  • Improved retrieval accuracy through Neo4j vector store
  • Enhanced search capabilities across multiple bibliometric data sources

Breaking Changes

  • No breaking changes
  • Breaking change (describe below)

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Additional Notes

This implementation provides a foundation for graph-based bibliometric analysis and vector search capabilities. The modules are designed to work both independently and as part of integrated workflows.

Screenshots/Output

N/A - Backend modules for data processing and vector search

Reviewer Notes

Please review the following areas:

  • Neo4j vector store implementation and retrieval agent logic
  • Semantic analysis scripts and their integration with the vector store
  • Configuration management across modules
  • Error handling and logging practices

Josephrp and others added 27 commits October 1, 2025 08:20
Signed-off-by: Tonic <[email protected]>
* fix: resolve circular import chain and missing dependencies

- Extract ToolSpec and ToolCategory to separate tool_specs.py module
- Fix double src import paths throughout codebase
- Use TYPE_CHECKING and runtime imports to break agent circular dependencies
- Add missing dependencies: trafilatura, gradio, limits, python-dateutil
- Correct analytics module import paths
- Add Union import to app.py
- Create configs/__init__.py for Hydra
- Update graph instantiation to include workflow nodes

* fix: resolve CI failures for lint and test jobs

Fixes two CI failures blocking PR #23:

1. Lint failure - removed 34 unused imports from DeepResearch/agents.py
   - Removed unused typing imports (Union, Type, Callable, Tuple)
   - Removed unused pydantic imports (BaseModel, Field, validator)
   - Removed unused pydantic_ai imports (RunContext, ModelRetry)
   - Removed unused datatype imports across rag, bioinformatics, and deep_agent modules
   - Fixed using ruff --fix for F401 errors

2. Test failure - added missing pytest-cov dependency
   - Added pytest-cov>=4.0.0 to dev dependencies in pyproject.toml
   - Updated uv.lock with pytest-cov==7.0.0 and coverage==7.10.7
   - Resolves "unrecognized arguments: --cov" error in CI test jobs

These changes ensure the circular import fix (commit 12122b2) passes all CI checks.

* fix: resolve all remaining lint errors in codebase

Fixes all lint errors that were blocking CI checks in PR #23:

Automated fixes (ruff --fix):
- Removed 240+ unused imports (F401) across 40+ files
- Removed 52 unnecessary f-string prefixes (F541) in app.py
- Removed 7 unused variable assignments (F841)

Manual fixes:
- tools/__init__.py: Added noqa comments for intentional side-effect imports
- code_sandbox.py: Fixed Python 3.10 f-string syntax (no backslash in f-strings)
- workflow_orchestrator.py: Added missing WorkflowConfig import (F821)
- chroma_dataclass.py: Renamed count() method to get_count() to avoid redefinition (F811)
- chunk_dataclass.py: Changed bare except to except Exception (E722)

All ruff checks now pass. This completes the CI lint fixes for the circular import resolution.

* fix: apply ruff formatting and add tests directory

Addresses CI failures in PR #23:

- Applied ruff format to all 76 modified Python files for consistent code style
- Added tests/__init__.py to prevent test discovery errors in CI
- All files now pass ruff format --check validation

These changes ensure the circular import fix passes all CI checks.

* fix: add placeholder test to satisfy CI test requirements

Resolves pytest exit code 5 when no tests are collected with --cov flag.
The placeholder test allows CI to pass while the test suite is being developed.

* fix: reorder Hydra defaults and migrate to dependency-groups

- Move Hydra override directives to end of defaults list
- Add _self_ to defaults to prevent composition warnings
- Migrate from tool.uv.dev-dependencies to dependency-groups.dev
- Add bandit to dev dependencies for security scanning

Resolves ConfigCompositionException in integration tests and eliminates
deprecation warnings from uv.

* chore: update uv.lock after dependency-groups migration

Updates lockfile to reflect the migration from tool.uv.dev-dependencies
to dependency-groups.dev and the addition of bandit.

* fix: remove unsupported --dry-run flag from integration test

The --dry-run flag is not implemented in the application.
Removed from CI to allow integration tests to pass.
* adds prompt testing using my fork of testcontainers

* adds tests , testscontainers , vllm object , scripts
* adds prompt testing using my fork of testcontainers

* adds tests , testscontainers , vllm object , scripts

* adds agentic design patterns

* adds linting , tests pass

* Complete agent interaction design patterns implementation - all tests passing

* Fix CI dependencies - install pydantic and omegaconf before running tests

* adds linting , tests pass

---------

Signed-off-by: Tonic <[email protected]>
* adds prompt testing using my fork of testcontainers

* adds tests , testscontainers , vllm object , scripts

* adds agentic design patterns

* adds linting , tests pass

* Complete agent interaction design patterns implementation - all tests passing

* Fix CI dependencies - install pydantic and omegaconf before running tests

* adds linting , tests pass

* adds type checking , ruff , black , codecov on dev branch , fixes some linting errors

* adds type checking , ruff , black , codecov on dev branch , fixes some linting errors

* adds type checking , ruff , black , codecov on dev branch , fixes some linting errors

* Potential fix for code scanning alert no. 13: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Tonic <[email protected]>

* adds ruff formatting

---------

Signed-off-by: Tonic <[email protected]>
Signed-off-by: Tonic <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Removes all references to non-existent @defer decorator from codebase.
The @defer decorator never existed in Pydantic AI. Tools are correctly
implemented using standard Pydantic AI patterns.

Changes:
- Removed 16 @defer comments from tool files
- Updated README Known Issues section
- All tools continue to work correctly (no functional changes)

Fixes #2

Signed-off-by: marioaderman <[email protected]>
* adds readme improvements and adds ty and black to the dev tools

* adds pre-commit hooks , make file , contributing.md improvements

---------

Signed-off-by: Tonic <[email protected]>
Signed-off-by: Tonic <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* adds docssite

* adds documentation site, local deployment configs, github pages - fixes unused import errors

* fix: disable black formatter in pre-commit to avoid conflicts with ruff-format

* adds documentation site

* adds documentation site, removes black linter , adds ruff formater

* Type Cast Issues: Fixed ty type checker errors by adding proper type casting with cast(dict[str, Any], config_result) in DeepResearch/src/utils/deepsearch_utils.py
Callable Check: Added explicit callable() check for tools_attr.append in DeepResearch/src/statemachines/deep_agent_graph.py
Hash Method: Added __hash__ method to UsageDetails class in DeepResearch/src/datatypes/agent_framework_usage.py to resolve PLW1641 error
Import Organization: Fixed __all__ sorting issues using ruff --fix --unsafe-fixes
Dictionary Iteration: Fixed PLC0206 errors by using .items() for dictionary iteration
Configuration: Added PLC0415 (imports outside top-level) to the ignore list in pyproject.toml since these are common and acceptable in test files

* attempts to pass linting

* attempts to pass linting

* attempts to pass linting

* attempts to build site from actions

---------

Signed-off-by: Tonic <[email protected]>
Signed-off-by: Tonic <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* adds docs build to pre-commit config

---------

Signed-off-by: Tonic <[email protected]>
* fix: remove misleading @defer decorator comments

Removes all references to non-existent @defer decorator from codebase.
The @defer decorator never existed in Pydantic AI. Tools are correctly
implemented using standard Pydantic AI patterns.

Changes:
- Removed 16 @defer comments from tool files
- Updated README Known Issues section
- All tools continue to work correctly (no functional changes)

Fixes #2

* feat: add custom LLM model wrappers for Pydantic AI

- Implement VLLMModel wrapper around existing VLLMClient
- Add OpenAICompatibleModel for vLLM, llama.cpp, TGI servers
- Provide factory methods (from_vllm, from_llamacpp, from_tgi)
- Include streaming support and message conversion
- Add convenience aliases for VLLMModel and LlamaCppModel

* fix: update OpenAICompatibleModel to use OllamaProvider and add tests

- Replace non-existent OpenAIProvider with OllamaProvider from pydantic_ai
- Remove dataclass decorator to properly inherit from OpenAIChatModel
- Fix factory methods to pass model_name as positional argument
- Add comprehensive test suite with 8 passing tests
- Skip integration tests that require actual vLLM servers

* refactor: integrate LLM models with Hydra configuration system

- Add from_config() method to support Hydra DictConfig
- Update all factory methods (from_vllm, from_llamacpp, from_tgi, from_custom) to accept optional config
- Support config override via direct parameters
- Extract generation settings from config (temperature, max_tokens, etc.)
- Add environment variable fallbacks (LLM_BASE_URL, LLM_API_KEY)
- Create config files for llamacpp, tgi, and vllm
- Update tests to cover both config-based and direct parameter approaches
- All 10 tests passing

* feat: add LLM client support with Pydantic validation (#10)

- Add LLMModelConfig and GenerationConfig datatypes
- Remove redundant vllm_model.py
- Update openai_compatible_model.py with validation
- Rewrite tests to use actual config files (30 tests)

* fix: add LLM datatypes to __all__ export list

* solves type and style errors

* Add comprehensive LLM model configuration documentation

All code examples include proper type guards and annotations.

* Add Models section to core documentation

Auto-generates API reference for LLM model classes.
@anabossler anabossler closed this Oct 9, 2025
@anabossler anabossler reopened this Oct 9, 2025
* Start working on literature review agent - pubmed API tools

Adds a couple of dependencies, some mocking things to allow us to do API tests

This is work in progress, in particular the _build_paper function is not filling everything it needs to yet.

* Add fixture to disable ratelimiter during testing

May als be useful for the web_search rate limits, though the list of stuff to disable might get big

* ruff-format fixes

* ruff import order fixes

* More linter fixes

* Missed linter check on pytest config
Josephrp and others added 23 commits October 12, 2025 22:27
* initial commit - adds bio-informatics tools & mcp

* initial commit - adds bio-informatics tools & mcp

* improves code quality

* refactor bioinformatics tools , utils, prompts

* adds docs

* adds quite a lot of testing , for windows, docker, linux , testcontainers

* adds docker tests and related improvements

* Potential fix for code scanning alert no. 21: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Tonic <[email protected]>

* Potential fix for code scanning alert no. 17: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Tonic <[email protected]>

* adds optional bioinformatics tests

* adds optional bioinformatics tests per branch option to allow fail

* adds pytest to replace uv

* adds dockers , docker tests , tools tests , ci , make file improvements

* merge commit

* removes docker from ci

* removes docker from ci

* feat: add bioinformatics MCP servers and tools infrastructure

* fix linter types and checks version , fix tests

* improves ci
* trigger codecov report
* fix: remove misleading @defer decorator comments

Removes all references to non-existent @defer decorator from codebase.
The @defer decorator never existed in Pydantic AI. Tools are correctly
implemented using standard Pydantic AI patterns.

Changes:
- Removed 16 @defer comments from tool files
- Updated README Known Issues section
- All tools continue to work correctly (no functional changes)

Fixes #2

* feat: add custom LLM model wrappers for Pydantic AI

- Implement VLLMModel wrapper around existing VLLMClient
- Add OpenAICompatibleModel for vLLM, llama.cpp, TGI servers
- Provide factory methods (from_vllm, from_llamacpp, from_tgi)
- Include streaming support and message conversion
- Add convenience aliases for VLLMModel and LlamaCppModel

* fix: update OpenAICompatibleModel to use OllamaProvider and add tests

- Replace non-existent OpenAIProvider with OllamaProvider from pydantic_ai
- Remove dataclass decorator to properly inherit from OpenAIChatModel
- Fix factory methods to pass model_name as positional argument
- Add comprehensive test suite with 8 passing tests
- Skip integration tests that require actual vLLM servers

* refactor: integrate LLM models with Hydra configuration system

- Add from_config() method to support Hydra DictConfig
- Update all factory methods (from_vllm, from_llamacpp, from_tgi, from_custom) to accept optional config
- Support config override via direct parameters
- Extract generation settings from config (temperature, max_tokens, etc.)
- Add environment variable fallbacks (LLM_BASE_URL, LLM_API_KEY)
- Create config files for llamacpp, tgi, and vllm
- Update tests to cover both config-based and direct parameter approaches
- All 10 tests passing

* feat: add LLM client support with Pydantic validation (#10)

- Add LLMModelConfig and GenerationConfig datatypes
- Remove redundant vllm_model.py
- Update openai_compatible_model.py with validation
- Rewrite tests to use actual config files (30 tests)

* fix: add LLM datatypes to __all__ export list

* solves type and style errors

* initial commit - adds bio-informatics tools & mcp

* initial commit - adds bio-informatics tools & mcp

* improves code quality

* refactor bioinformatics tools , utils, prompts

* adds docs

* adds quite a lot of testing , for windows, docker, linux , testcontainers

* adds docker tests and related improvements

* Potential fix for code scanning alert no. 21: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Tonic <[email protected]>

* Potential fix for code scanning alert no. 17: Workflow does not contain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Tonic <[email protected]>

* adds optional bioinformatics tests

* adds optional bioinformatics tests per branch option to allow fail

* adds pytest to replace uv

* adds dockers , docker tests , tools tests , ci , make file improvements

* merge commit

* removes docker from ci

* removes docker from ci

* feat: add bioinformatics MCP servers and tools infrastructure

* fix linter types and checks version , fix tests

* improves ci

* trigger codecov report

* Update CI to upload test results to Codecov for test analytics

* Fix Codecov repository slug to use Josephrp/DeepCritical

* adds deepcritical/deepcritical repository slug

---------

Signed-off-by: Tonic <[email protected]>
Signed-off-by: Tonic <[email protected]>
Co-authored-by: MarioAderman <[email protected]>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* attempts codecov trigger
* attempts codecov trigger
* attempts codecov trigger
* adds codecov cli
* adds codecov components and upload
Signed-off-by: Tonic <[email protected]>
Signed-off-by: Tonic <[email protected]>
* fix permissions
- attempts ci fix
* attempts ci fix for upload
- attempts make upload optional
* Add workflow context and edge tests

* Add workflow events test

* Add workflow middleware test

* Add middleware test
* adds code quality improvements

* adds code execution tests,  documentation , agents , flows

* adds code execution tests,  documentation , agents , flows

* update tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants