@FortiShield (Contributor) commented Sep 26, 2025

User description

Category:

One of: Bugfix / Feature / Code style update / Refactoring Only / Build related changes / Documentation / Other (please specify)

Overview

Briefly outline your new changes...

Issue Number (if applicable) #00

New Vars (if applicable)

If you've added any new build scripts, environmental variables, config file options, dependency or devDependency, please outline here

Screenshot (if applicable)

If you've introduced any significant UI changes, please include a screenshot

Code Quality Checklist (Please complete)

  • All changes are backwards compatible
  • All lint checks and tests are passing
  • There are no (new) build warnings or errors
  • (If a new config option is added) Attribute is outlined in the schema and documented
  • (If a new dependency is added) Package is essential, and has been checked out for security or performance
  • (If significant change) Bumps version in package.json

PR Type

Enhancement, Documentation


Description

• Major architectural refactoring: renamed "assistant" to "agent" throughout the entire codebase for consistency
• Complete web interface: added a FastAPI-based REST API with WebSocket support, authentication, and a modern HTML/CSS/JS frontend
• Comprehensive authentication system: implemented JWT-based authentication with role-based access control and user management
• Persistent storage: added SQLite-based conversation and message storage with Alembic migrations
• Tool system architecture: created an extensible tool registry with automatic parameter generation from type hints
• Core agent framework: implemented base agent classes with OpenAI integration and conversation management
• Modern project structure: added pyproject.toml, streamlined dependencies, and comprehensive documentation
• Database infrastructure: added SQLAlchemy models, migration scripts, and initialization utilities
• Enhanced examples: updated the workday summarizer and added interactive usage examples


Diagram Walkthrough

flowchart LR
  A["Legacy Assistant"] --> B["Core Agent System"]
  B --> C["FastAPI Web Interface"]
  B --> D["Tool Registry"]
  B --> E["SQLite Storage"]
  C --> F["JWT Authentication"]
  C --> G["WebSocket Chat"]
  E --> H["Alembic Migrations"]
  F --> I["User Management"]
  G --> J["Modern Web UI"]

File Walkthrough

Relevant files
Enhancement
18 files
app.py
Add FastAPI web interface for agent interactions                 

src/gpt_computer_agent/api/app.py

• Created a complete FastAPI web application with conversation management
• Implemented REST API endpoints for conversations, messages, and WebSocket support
• Added authentication dependencies and CORS middleware configuration
• Integrated with the storage system and agent for real-time chat functionality

+305/-0 
storage.py
Add SQLite-based conversation storage system                         

src/gpt_computer_agent/utils/storage.py

• Created a persistent storage system using SQLite for conversations and messages (see the sketch below)
• Implemented Message, Conversation, and Storage classes with full CRUD operations
• Added database schema initialization and conversation management methods

+301/-0 
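A minimal sketch of the shape such a storage layer typically takes; the Message/Conversation/Storage naming follows the bullets above, but the actual schema and method signatures in storage.py may differ:

```python
# Minimal sketch of a SQLite-backed conversation store (assumed schema, not the
# PR's exact one). Parameterized queries keep user input out of the SQL string.
import json
import sqlite3
from datetime import datetime, timezone


class Storage:
    def __init__(self, db_path: str = "conversations.db"):
        self.db_path = db_path
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                """CREATE TABLE IF NOT EXISTS messages (
                       conversation_id TEXT, role TEXT, content TEXT,
                       timestamp TEXT, metadata TEXT)"""
            )

    def add_message(self, conversation_id: str, role: str, content: str,
                    metadata: dict | None = None) -> None:
        with sqlite3.connect(self.db_path) as conn:
            conn.execute(
                "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
                (conversation_id, role, content,
                 datetime.now(timezone.utc).isoformat(),
                 json.dumps(metadata or {})),
            )

    def get_messages(self, conversation_id: str) -> list[dict]:
        with sqlite3.connect(self.db_path) as conn:
            rows = conn.execute(
                "SELECT role, content, timestamp FROM messages "
                "WHERE conversation_id = ? ORDER BY timestamp",
                (conversation_id,),
            ).fetchall()
        return [{"role": r, "content": c, "timestamp": t} for r, c, t in rows]
```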
common.py
Add common utility functions for configuration and operations

src/gpt_computer_agent/utils/common.py

• Added comprehensive utility functions for configuration, logging, and file operations
• Implemented functions for timestamp generation, URL validation, and checksum calculation (see the sketch below)
• Created environment information gathering and directory management utilities

+246/-0 
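Illustrative helpers matching the bullets above (timestamp generation, URL validation, checksum calculation); the names and signatures here are assumptions, not the exact API in common.py:

```python
# Hedged sketch: illustrative utility helpers, not the PR's exact functions.
import hashlib
from datetime import datetime, timezone
from urllib.parse import urlparse


def utc_timestamp() -> str:
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def is_valid_url(url: str) -> bool:
    """Cheap structural check: scheme must be http(s) and a host must be present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def file_checksum(path: str, algo: str = "sha256") -> str:
    """Stream the file in chunks so large files are never loaded whole."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```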
tools.py
Add comprehensive tool system for agent capabilities         

src/gpt_computer_agent/core/tools.py

• Created a tool system with Tool class and ToolRegistry for agent capabilities
• Implemented automatic parameter generation from function type hints (see the sketch below)
• Added tool validation, execution, and OpenAI API format conversion

+208/-0 
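To make "automatic parameter generation from type hints" concrete, here is a minimal sketch of how a registry can derive an OpenAI-style tool schema from a plain function; the real ToolRegistry may map types and docstrings differently:

```python
# Sketch only: the type-to-JSON-Schema mapping below is an assumption.
import inspect
from typing import get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_openai_tool(func) -> dict:
    """Build an OpenAI function-calling schema from a function's type hints."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties = {name: {"type": _JSON_TYPES.get(tp, "string")}
                  for name, tp in hints.items()}
    # Parameters without defaults are treated as required.
    required = [name for name, p in inspect.signature(func).parameters.items()
                if p.default is inspect.Parameter.empty]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object",
                           "properties": properties,
                           "required": required},
        },
    }


def get_weather(city: str, units: str = "metric") -> str:
    """Return the current weather for a city."""
    return f"Sunny in {city}"


print(to_openai_tool(get_weather))
```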
agent.py
Add core agent architecture with OpenAI integration           

src/gpt_computer_agent/core/agent.py

• Created base agent architecture with BaseAgent and GPTChatAgent classes
• Implemented OpenAI integration with tool calling and conversation management
• Added agent configuration, state saving/loading, and async execution support

+142/-0 
users.py
Add user management API routes with authentication             

src/gpt_computer_agent/api/routers/users.py

• Created comprehensive user management API routes with CRUD operations
• Implemented authentication-protected endpoints with role-based access control
• Added user creation, updating, deletion, and profile management functionality

+152/-0 
security.py
Add JWT authentication and security utilities                       

src/gpt_computer_agent/core/security.py

• Implemented JWT-based authentication with access and refresh tokens (see the sketch below)
• Added password hashing, token creation/validation, and user authentication
• Created FastAPI dependencies for current user and active user retrieval

+115/-0 
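A compressed sketch of the access/refresh token flow, using the python-jose and passlib packages that appear elsewhere in this PR; the secret, algorithm, and expiry values here are placeholders, not the project's real settings:

```python
# Sketch of JWT access/refresh creation. SECRET_KEY and expiry values are
# placeholders; the real code reads them from the settings module.
from datetime import datetime, timedelta, timezone

from jose import jwt
from passlib.context import CryptContext

SECRET_KEY = "change-me"  # assumption: loaded from settings in the real code
ALGORITHM = "HS256"
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


def create_token(subject: str, expires_delta: timedelta, token_type: str) -> str:
    payload = {
        "sub": subject,
        "type": token_type,  # lets validators tell access from refresh tokens
        "exp": datetime.now(timezone.utc) + expires_delta,
    }
    return jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)


def create_access_token(subject: str) -> str:
    return create_token(subject, timedelta(minutes=30), "access")


def create_refresh_token(subject: str) -> str:
    return create_token(subject, timedelta(days=7), "refresh")


def verify_password(plain: str, hashed: str) -> bool:
    return pwd_context.verify(plain, hashed)
```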
logging.py
Add comprehensive logging configuration with loguru           

src/gpt_computer_agent/config/logging.py

• Created a logging configuration system using loguru
• Implemented file and console logging with rotation and retention
• Added an interceptor for standard library logging redirection

+108/-0 
user.py
Add user service layer for database operations                     

src/gpt_computer_agent/services/user.py

• Created a user service layer with database operations for user management
• Implemented user CRUD operations, authentication, and first superuser creation
• Added password hashing and user validation functionality

+103/-0 
user.py
Add Pydantic schemas for user data validation                       

src/gpt_computer_agent/schemas/user.py

• Created Pydantic schemas for user data validation and serialization
• Implemented user creation, update, authentication, and token schemas
• Added password strength validation and email validation

+94/-0   
__init__.py
Add authentication module with JWT and user management     

src/gpt_computer_agent/auth/__init__.py

• Created an authentication module with JWT token handling
• Implemented password hashing, token creation, and user authentication
• Added a fake user database for demonstration purposes

+106/-0 
initial_migration.py
Add initial database migration for user tables                     

alembic/versions/initial_migration.py

• Created the initial database migration for users and refresh_tokens tables
• Implemented table creation with proper indexes and foreign key constraints
• Added upgrade and downgrade functions for database schema management

+63/-0   
auth.py
Add authentication API routes for login and registration 

src/gpt_computer_agent/api/routers/auth.py

• Created authentication API routes for login and user registration (see the sketch below)
• Implemented an OAuth2-compatible token endpoint and user profile access
• Added JWT token generation and user creation functionality

+82/-0   
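A sketch of what an OAuth2-compatible token endpoint looks like in FastAPI; the two helpers below are stubs standing in for the project's services and security modules, not its exact API:

```python
# Sketch of an OAuth2-compatible /token route. The helpers are stubs so the
# snippet is self-contained; the real code delegates to the service layer.
from fastapi import APIRouter, Depends, HTTPException, status
from fastapi.security import OAuth2PasswordRequestForm

router = APIRouter()


def authenticate_user(username: str, password: str) -> bool:
    # Stub: the real code checks a hashed password against the database.
    return username == "admin" and password == "secret"


def create_access_token(subject: str) -> str:
    # Stub: the real code signs a JWT (see the security.py sketch above).
    return f"token-for-{subject}"


@router.post("/token")
async def login(form_data: OAuth2PasswordRequestForm = Depends()):
    if not authenticate_user(form_data.username, form_data.password):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect username or password",
            headers={"WWW-Authenticate": "Bearer"},
        )
    return {"access_token": create_access_token(form_data.username),
            "token_type": "bearer"}
```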
env.py
Add Alembic migration environment configuration                   

alembic/env.py

• Created Alembic environment configuration for database migrations
• Implemented offline and online migration modes with proper database connection
• Added model imports and metadata configuration for autogenerate support

+87/-0   
user.py
Add SQLAlchemy models for user authentication                       

src/gpt_computer_agent/models/user.py

• Created SQLAlchemy User and RefreshToken models for database storage
• Implemented user authentication properties and token management
• Added proper table relationships and database constraints

+64/-0   
init_db.py
Add database initialization script with admin user creation

scripts/init_db.py

• Created a database initialization script for setting up tables and the admin user
• Implemented automatic admin user creation with configurable credentials
• Added error handling and database session management

+60/-0   
index.html
Complete web interface implementation for GPT Computer Agent

src/gpt_computer_agent/api/templates/index.html

• Added a complete HTML template for the GPT Computer Agent web interface
• Implemented a responsive chat UI with a sidebar for conversation management
• Added JavaScript functionality for real-time messaging and API integration (see the client sketch below)
• Included Tailwind CSS styling and Font Awesome icons for a modern UI

+484/-0 
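For orientation, a hypothetical Python client for the WebSocket chat endpoint; the /ws path and the {"type": "chunk"} frame shape are inferred from snippets elsewhere in this thread, not confirmed against the template:

```python
# Assumptions: a ws://.../ws/{conversation_id} route and JSON frames carrying a
# "type" field, streamed chunk by chunk. Requires the third-party `websockets`
# package.
import asyncio
import json

import websockets


async def chat(conversation_id: str, text: str) -> str:
    uri = f"ws://localhost:8000/ws/{conversation_id}"  # assumed route
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"content": text}))
        parts = []
        while True:
            frame = json.loads(await ws.recv())
            if frame.get("type") == "chunk":
                parts.append(frame["content"])
            else:  # treat any other frame type as end-of-stream
                break
        return "".join(parts)


if __name__ == "__main__":
    print(asyncio.run(chat("demo", "Hello")))
```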
workday_summerizer.md
Enhanced workday summarizer with comprehensive monitoring features

example_use_cases/workday_summerizer.md

• Updated references from "GPT Computer Assistant" to "GPT Computer Agent"
• Enhanced the workday monitoring script with comprehensive activity tracking
• Added JSON data storage and human-readable summary generation
• Improved error handling and structured data collection

+154/-21
Refactoring
6 files
process.py
Rename assistant references to agent throughout processing

src/gpt_computer_agent/agent/process.py

• Renamed the import from the assistant to the agent module
• Updated function calls from assistant() to agent()
• Changed signal handler references from assistant_response_ to agent_response_

+17/-17 
api.py
Update imports to use relative package references               

src/gpt_computer_agent/api.py

• Updated import statements to use relative imports instead of absolute paths
• Changed references from the gpt_computer_agent module to current package imports

+18/-18 
button.py
Update GUI button signals from assistant to agent terminology

src/gpt_computer_agent/gui/button.py

• Updated signal handler connections from assistant_ to agent_ naming
• Changed method names and signal emissions to use agent terminology

+14/-14 
chat_history.py
Rename Assistant class to Agent in chat history                   

src/gpt_computer_agent/utils/chat_history.py

• Renamed the Assistant class to Agent and updated related references
• Changed the message type from "assistant" to "agent" throughout the codebase
• Updated chat history processing to use agent terminology

+8/-8     
gpt_computer_agent.py
Update main application to use agent terminology                 

src/gpt_computer_agent/gpt_computer_agent.py

• Updated imports from the assistant to the agent_executor module
• Changed variable names from assistant_stopped to agent_stopped
• Updated state management to use agent terminology

+8/-8     
start.py
Update startup script with agent terminology and GitHub URL

src/gpt_computer_agent/start.py

• Updated docstring references from "computer assistant" to "computer agent"
• Changed the GitHub URL from the khulnasoft to the khulnasoft-lab organization
• Updated import statements to use relative imports

+5/-5     
Documentation
5 files
basic_usage.py
Add basic usage example with interactive agent demo           

examples/basic_usage.py

• Created an example script demonstrating agent usage with custom tools
• Implemented sample tools for web search, calculator, and weather functions
• Added an interactive conversation loop for testing agent capabilities

+133/-0 
README_NEW.md
New comprehensive README with modern documentation             

README_NEW.md

• Created a new comprehensive README with modern project structure
• Added detailed installation and usage instructions
• Included API documentation and custom tool integration examples
• Added a project structure overview and contribution guidelines

+174/-0 
AUTHENTICATION.md
Authentication system documentation and implementation guide

AUTHENTICATION.md

• Added comprehensive authentication system documentation
• Documented JWT-based authentication with role-based access control
• Provided API endpoint documentation and usage examples
• Added security considerations and testing instructions

+119/-0 
README.md
Updated terminology from assistant to agent throughout README

README.md

• Updated the Product Hunt badge URL from "assistant" to "agent"
• Changed terminology from "assistant" to "agent" throughout the documentation
• Updated the developer persona reference to use "agent" terminology
• Fixed roadmap table references to use consistent "Agent" naming

+5/-5     
README.TR.md
Turkish README terminology update to use agent instead of assistant

README.TR.md

• Updated the title from "GPT Computer Assistant" to "GPT Computer Agent"
• Changed image alt text references from "Assistant" to "Agent"
• Updated terminology to maintain consistency with the main README

+3/-3     
Configuration changes
4 files
settings.py
Add comprehensive application settings and configuration 

src/gpt_computer_agent/config/settings.py

• Created comprehensive settings management using Pydantic BaseSettings
• Added configuration for API, authentication, database, and platform-specific settings
• Implemented logging configuration and environment variable support

+113/-0 
alembic.ini
Database migration configuration with Alembic setup           

alembic.ini

• Added the Alembic configuration file for database migrations
• Configured the SQLite database connection string
• Set up logging configuration for migration operations
• Included standard Alembic settings and hooks

+111/-0 
pyproject.toml
Modern Python project configuration with pyproject.toml   

pyproject.toml

• Added a modern Python project configuration file
• Defined project metadata, dependencies, and build system
• Included development dependencies and tool configurations
• Added a command-line script entry point for computeragent

+52/-0   
script.py.mako
Alembic migration script template for database versioning

alembic/script.py.mako

• Added the Alembic migration script template
• Included standard revision tracking and metadata
• Provided upgrade and downgrade function templates
• Set up proper imports and revision identifiers

+24/-0   
Dependencies
1 file
requirements.txt
Modernized dependencies for web framework and development

requirements.txt

• Streamlined dependencies by removing legacy packages
• Added modern web framework dependencies (FastAPI, uvicorn)
• Included database support with SQLAlchemy and Alembic
• Added development tools for testing and code quality

+37/-50 
Additional files
62 files
release_generation.yml +1/-1
README.zh_CN.md +0/-160
README.zh_TW.md +0/-208
build_macos.sh +132/-0
macos_build.sh +132/-30
Dockerfile +29/-0
entrypoint.sh +7/-0
mcp_langchain.ipynb +0/-272
setup.sh +42/-0
__init__.py
__init__.py +1/-1
agent_executor.py +1/-1
agent_tools.py
assistant.py +2/-2
background.py
chat_history.py
agentic.py
__init__.py
input_box.py
record.py +3/-3
stt.py
openai.py
openai_whisper_local.py
tts.py
microsoft_local.py
openai.py
wake_word.py
character.py +2/-2
classes.py
ask_anthropic.py
base.py
computer.py
run.py
custom_callback.py +1/-1
session.py +46/-0
display_tools.py +4/-2
__init__.py
llmsettings.py +1/-1
settings.py +2/-2
signal.py +6/-6
llm.py
llm_settings.py
tool.py
__init__.py +6/-0
remote.py
__init__.py
shot.py +2/-2
standard_tools.py
teams.py
tooler.py +1/-1
top_bar_wrapper.py +1/-1
tracing.py
db.py
folder.py
function.py
kot_db.py
icon.icns
pypi.py
telemetry.py
train.py
user_id.py
version.py

Summary by CodeRabbit

  • New Features
    • Introduces Aideck agent framework with graph workflows, caching, knowledge base (RAG), embeddings (OpenAI, Azure, Bedrock, HuggingFace, Gemini, Ollama), tools, and canvas.
    • Adds JWT auth, centralized settings, structured logging, API/dev scripts, and Docker image with entrypoint.
  • Documentation
    • Rebrands project to Aideck; updates README and badges.
    • Adds DEVELOPMENT, TESTING, CLAUDE guides and rich notebooks.
  • Tests
    • Adds pytest config, runner, basic test, and CI workflow.
  • Chores
    • Provides .env template, setup/dev scripts, pre-commit hooks.
    • Adds publish/TestPyPI workflows; removes legacy release/build pipelines.
  • Refactor
    • UI text/signal renames and branding updates across app.


@sourcery-ai sourcery-ai bot left a comment

Sorry @FortiShield, your pull request is larger than the review limit of 150000 diff characters


coderabbitai bot commented Sep 26, 2025

Walkthrough

Project rebranded to Aideck with extensive repo restructuring: new packaging via pyproject/uv, revamped CI (tests, publish) and Docker, large codebase additions (agent runtime, graph orchestration, embeddings, loaders, knowledge base, config/auth/db/logging), broad GUI/API import updates, cache/canvas utilities, and removal of legacy build/workflows, notebooks/examples, and setup scripts.

Changes

Cohort / File(s) Summary
Branding & Docs
README.md, README.TR.md, README.zh_CN.md, README.zh_TW.md, CLAUDE.md, DEVELOPMENT.md, TESTING.md, notebooks/README.md, notebooks/investment_full_report.md, basic_test.py
Rebrand to Aideck; update links/badges/commands; add developer/testing guides and notebooks; remove non-English READMEs; add a lightweight repo sanity test.
CI/CD Workflows
.github/workflows/publish.yml, .github/workflows/tests.yml, .github/workflows/test_publisher.yml, .github/workflows/claude.yml, .github/workflows/claude-code-review.yml, .github/workflows/release_generation.yml, .github/workflows/deploys.yml, .github/workflows/release.yml, .github/workflows/refactor.yml, .github/workflows/requirement_controller.yml, .github/workflows/auto-rebase-pr.yml
Add test, publish, TestPyPI, and Claude review workflows; remove legacy release/deploy/refactor/rebase/requirement controllers.
Packaging & Tooling
pyproject.toml, setup.py, gca_setup.py, gca_setup_generator.py, requirements.txt, requirements.in, project.toml, MANIFEST.in, .pre-commit-config.yaml, setup.sh, run_tests.py, pytest.ini
Migrate to pyproject+uv with extras/groups and indices; drop setup/requirements legacy; update MANIFEST asset path; add pre-commit hooks, env/setup script, pytest runner/config; remove Black block in project.toml.
Containerization
docker/Dockerfile, docker/entrypoint.sh
Add multi-stage Docker build with uv-based install and entrypoint wrapper.
Removed Build Scripts
build_scripts/openai/macos_build.sh, build_scripts/openai/windows_build.sh, .github/workflows/release_generation.yml
Delete legacy macOS/Windows packaging scripts and related workflow.
Core Agent Runtime
src/aideck/agent/base.py, src/aideck/agent/agent.py, src/aideck/agent/context_managers/*
Add BaseAgent and Direct agent with async execution, policies, guardrails, caching, storage/memory/LLM/call/task/system-prompt/reliability managers.
Graph Orchestration
src/aideck/graph/graph.py, src/aideck/context/*
Introduce task/decision graph with sequential/branching execution, shared state; add serialization helpers and default prompt model.
Embeddings Subsystem
src/aideck/embeddings/*
Add embedding framework (base/config/metrics), providers (OpenAI, Azure, Bedrock, HuggingFace, FastEmbed, Ollama, Gemini), factory and registry, helpers.
Loaders & Knowledge Base (RAG)
src/aideck/loaders/*, src/aideck/knowledge_base/*
Add pluggable loaders (text,csv,pdf,docx,json,xml,yaml,markdown,html) with configs/factory; add KnowledgeBase with ingest/split/embed/query lifecycle.
Config, Auth, DB, Security, Logging
src/aideck/config/settings.py, src/aideck/config/logging.py, src/aideck/auth/__init__.py, src/aideck/db/session.py, src/aideck/core/security.py
Add centralized settings with Loguru setup, JWT auth models/helpers, SQLAlchemy session bootstrap, security utilities and token flows.
Caching & Canvas
src/aideck/cache/*, src/aideck/canvas/canvas.py
Add session cache manager with vector/LLM matching; add persistent text Canvas with LLM-driven edits.
GUI/API Renames
src/aideck/gui/{button.py,llmsettings.py,settings.py,signal.py}, src/aideck/api.py, src/aideck/custom_callback.py, src/aideck/audio/{record.py,tts.py}, src/aideck/character.py
Rename assistant→agent signals/labels; update imports/messages to aideck; adjust UI strings; minor behavior remains.
Core Utilities
src/aideck/core/{agent.py,tools.py}, src/aideck/exceptions.py
Add generic agent/tool framework; add rich error/exception hierarchy.
App Entrypoints & Glue
src/aideck/__init__.py, src/aideck/gpt_computer_agent.py, src/aideck/display_tools.py, run.py, .gitignore, .env.example, .github/ISSUE_TEMPLATE/question.yml, bump.py, dev.sh
Adjust start import path; rebrand identifiers/messages; tweak imports; remove run.py side-effect; update ignore path; add env template; update issue template; version bump target; add dev server script.
Removed Notebooks/Examples
gpt_computer_agent/mcp/mcp_langchain.ipynb, example_use_cases/workday_summerizer.md
Remove MCP-LangChain notebook and example use case.
Removed Automation Script
refactor.py
Delete local refactor+commit automation script.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Client as Direct Agent
  participant Ctx as ContextManagers
  participant LLM as ModelProvider
  participant Mem as Memory
  participant Store as Storage
  participant Guard as Guardrail
  participant Cache as CacheManager

  User->>Client: do_async(task)
  Client->>Cache: lookup(task)
  alt Cache Hit
    Cache-->>Client: cached_response
    Client-->>User: response
  else Cache Miss
    Client->>Ctx: open Memory/Storage/LLM contexts
    par Load Memory
      Ctx->>Mem: prepare_inputs_for_task
      Mem-->>Ctx: history/context
    and Load History
      Ctx->>Store: load_session_history
      Store-->>Ctx: messages
    end
    Client->>Guard: validate(task)
    alt Guard OK
      Ctx->>LLM: call(prompt, tools?)
      LLM-->>Ctx: model_response
      Ctx->>Mem: update_after_task(model_response)
      Ctx->>Store: save_session(model_response)
      Client->>Cache: store(task, response)
      Client-->>User: response
    else Guard Fail
      Guard-->>Client: validation_error
      Client-->>User: error
    end
    Ctx-->>Client: close contexts
  end
sequenceDiagram
  autonumber
  participant Dev as Developer
  participant G as Graph
  participant A as Default Agent
  participant KB as KnowledgeBase

  Dev->>G: add(Task A) >> Decision >> Task B/Task C
  note right of G: Build nodes and edges
  Dev->>G: run()
  G->>A: execute(Task A)
  A-->>G: output A
  G->>G: evaluate Decision(output A)
  alt True
    G->>A: execute(Task B)
    A-->>G: output B
  else False
    G->>A: execute(Task C)
    A-->>G: output C
  end
  opt RAG context
    G->>KB: query_async(if configured)
    KB-->>G: context chunks
  end
  G-->>Dev: final output
sequenceDiagram
  autonumber
  participant GH as GitHub Release
  participant CI as Publish Workflow
  participant UV as uv
  participant PyPI as PyPI

  GH-->>CI: release published
  CI->>UV: uv sync
  CI->>UV: uv build
  CI->>PyPI: uv publish (token)
  PyPI-->>CI: success/failure

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~180+ minutes

Poem

In tunnels of code, I hop with delight,
New graphs and embeddings nibble the night.
A cache full of carrots (vectors so sweet),
Loaders bring lettuce, RAG’s quite a treat.
With paws on the console, I publish and test—
Aideck burrows forward—ship the best! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Title Check — ⚠️ Warning: The title "fix: issue and latest version pre-release" is too vague and does not accurately summarize the extensive architectural refactoring, feature additions, and documentation updates in this changeset. Resolution: provide a concise title that clearly captures the main focus of this pull request, such as "Refactor assistant to agent with FastAPI web interface and JWT authentication".

Description Check — ⚠️ Warning: The pull request description does not populate the required template sections (Category, Overview, New Vars, Screenshot, Code Quality Checklist) and instead includes placeholder text. Resolution: fill out the template by specifying the Category, providing a clear Overview of changes, listing any new variables or build scripts, including screenshots if applicable, and completing the Code Quality Checklist to reflect test and lint status.

✅ Passed checks (1 passed)

Docstring Coverage — ✅ Passed: Docstring coverage is 88.00%, which is sufficient; the required threshold is 80.00%.


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 No relevant tests
🔒 Security concerns

JWT/auth inconsistencies:
Multiple auth implementations exist ('auth/__init__.py' and 'core/security.py') with overlapping token creation and verification. Mixing these in routers can lead to incorrect validation paths and privilege checks. Ensure a single, consistent JWT flow and avoid exposing weak defaults (e.g., default admin credentials).
Unsafe eval in example: 'examples/basic_usage.py' uses eval for 'calculate', which is unsafe even in examples. Replace it with a safe parser or clearly gate it behind a warning to avoid copy-paste vulnerabilities.

⚡ Recommended focus areas for review

Syntax Error

Class name was changed but introduced a syntax error ('classAgent'). This will crash at import; verify proper renaming to 'class Agent' and consistent references.

classAgent:
    def __init__(self, content, the_time):
Import Error

The file imports tools from '..utils.tools' but the new tools module is at 'core/tools.py'. This mismatch will cause runtime import failures; update import path or module placement.

Missing Imports

The route uses 'get_current_active_user' in dependency but it's not imported; also uses schemas and services inconsistently (UserInDB vs create_user signature). Confirm correct imports and types to avoid runtime errors.

@router.get("/me", response_model=UserResponse)
async def read_users_me(current_user: User = Depends(get_current_active_user)) -> Any:
    """
    Get current user information.
    """
    return current_user


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 No relevant tests
🔒 Security concerns

SQL injection:
In src/gpt_computer_agent/utils/storage.py, raw SQL queries are used with string formatting for database operations. While parameters are properly passed using parameterized queries in most cases, there's a risk if any user input is directly concatenated into SQL statements. The code should be carefully reviewed to ensure all user inputs are properly sanitized or parameterized.
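For reference, the distinction the reviewer is pointing at, in miniature (illustrative snippet, not code from this PR):

```python
# Illustrative only: the unsafe pattern vs. the parameterized pattern.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (content TEXT)")

user_input = "x'); DROP TABLE messages; --"

# Unsafe: interpolating user input into the SQL string lets it be parsed as SQL.
# conn.executescript(f"INSERT INTO messages VALUES ('{user_input}')")

# Safe: the driver binds the value as data; it is never parsed as SQL.
conn.execute("INSERT INTO messages VALUES (?)", (user_input,))
print(conn.execute("SELECT content FROM messages").fetchone())
```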

⚡ Recommended focus areas for review

Syntax Error

There's a syntax error in the renamed Assistant class. Line 64 has classAgent: without a space between 'class' and 'Agent', which would cause a Python syntax error.

classAgent:
Undefined Function

The function get_user_by_username is called but not imported or defined in the file, which would cause a runtime error when creating a new user.

user = get_user_by_username(db, username=user_in.username)
if user:
Missing Import

The function get_current_active_user is used but not imported, which would cause a runtime error when accessing the /me endpoint.

async def read_users_me(current_user: User = Depends(get_current_active_user)) -> Any:
    """


qodo-merge-pro bot commented Sep 26, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level
Separate web service from core agent

The suggestion recommends separating the new web service from the core agent and
desktop application. This would decouple the components into a core library and
distinct applications for web and desktop, avoiding a complex monolith.

Examples:

src/gpt_computer_agent/api/app.py [1-305]
""
FastAPI web interface for the GPT Computer Agent.
"""
from fastapi import FastAPI, HTTPException, Depends, Request, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from fastapi.responses import HTMLResponse, JSONResponse, RedirectResponse
from pydantic import BaseModel, Field
from typing import List, Dict, Any, Optional

 ... (clipped 295 lines)
src/gpt_computer_agent/gui/button.py [1-173]

Solution Walkthrough:

Before:

# Current monolithic structure in the PR
src/gpt_computer_agent/
├── api/              # FastAPI web service
│   ├── app.py        # Imports core and GUI logic indirectly
│   └── routers/
│   └── templates/
├── core/             # Core agent logic
│   └── agent.py
├── gui/              # Desktop GUI (PyQt)
│   ├── button.py
│   └── settings.py
├── db/               # Database logic used by both
├── gpt_computer_agent.py # Main desktop app logic
└── start.py          # Entry point for both GUI and API

After:

# Proposed decoupled structure

# 1. Core Library (installable package)
# package: gpt-agent-core
# - Contains agent logic, tools, storage interfaces
# - No web or GUI dependencies

# 2. Web Application (depends on core library)
# package: gpt-agent-webapp
# - Contains FastAPI app, routers, auth
# - imports gpt-agent-core

# 3. Desktop Application (depends on core library)
# package: gpt-agent-desktop
# - Contains PyQt GUI code
# - imports gpt-agent-core
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a significant architectural issue by highlighting the tight coupling of a new web service with an existing desktop application, and the proposed decoupling would vastly improve maintainability and separation of concerns.

High
Possible issue
Implement missing agent streaming method

Implement the missing stream_response method in the GPTChatAgent class to enable
streaming responses for the WebSocket endpoint.

src/gpt_computer_agent/api/app.py [266-274]

-# Stream agent response
-response_text = ""
-async for chunk in agent.stream_response(conv.messages):
-    if chunk:
-        response_text += chunk
-        await websocket.send_text(json.dumps({
-            "type": "chunk",
-            "content": chunk
-        }))
+# In src/gpt_computer_agent/core/agent.py, add the stream_response method to GPTChatAgent:
 
+async def stream_response(self, messages: List[Dict[str, Any]]):
+    """Stream the agent's response."""
+    self.messages = messages
+    try:
+        stream = self.client.chat.completions.create(
+            model="gpt-4",
+            messages=self.messages,
+            temperature=self.config.temperature,
+            max_tokens=1000,
+            stream=True,
+        )
+        for chunk in stream:
+            content = chunk.choices[0].delta.content
+            if content:
+                yield content
+    except Exception as e:
+        logger.error(f"Error in agent stream_response: {e}")
+        raise
+
+# No changes are needed in src/gpt_computer_agent/api/app.py, but the above implementation is required for it to work.
+
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies that the stream_response method is missing in the GPTChatAgent class, which would cause a runtime error and break the WebSocket functionality introduced in this PR.

High
Avoid inefficient and risky database operations

Refactor the _save_conversation method to avoid inefficiently deleting and
re-inserting all messages when only updating conversation metadata. Create a
separate, more efficient method for metadata updates.

src/gpt_computer_agent/utils/storage.py [231-275]

+def update_conversation_metadata(self, conversation_id: str, title: str, updated_at: datetime) -> None:
+    """Update a conversation's metadata in the database."""
+    with sqlite3.connect(self.db_path) as conn:
+        cursor = conn.cursor()
+        cursor.execute(
+            """
+            UPDATE conversations
+            SET title = ?, updated_at = ?
+            WHERE id = ?
+            """,
+            (title, updated_at.isoformat(), conversation_id)
+        )
+        conn.commit()
+
 def _save_conversation(self, conversation: Conversation) -> None:
-    """Save a conversation to the database."""
+    """Save a new conversation to the database."""
     with sqlite3.connect(self.db_path) as conn:
         cursor = conn.cursor()
         
-        # Insert or update conversation
+        # This method should only be used for creating new conversations.
+        # Use INSERT and expect it to be a new conversation.
         cursor.execute(
             """
-            INSERT OR REPLACE INTO conversations 
+            INSERT INTO conversations 
             (id, title, created_at, updated_at, metadata)
             VALUES (?, ?, ?, ?, ?)
             """,
             (
                 conversation.id,
                 conversation.title,
                 conversation.created_at.isoformat(),
                 conversation.updated_at.isoformat(),
                 json.dumps(conversation.metadata)
             )
         )
-        
-        # Delete existing messages
-        cursor.execute(
-            "DELETE FROM messages WHERE conversation_id = ?",
-            (conversation.id,)
-        )
-        
-        # Insert messages
-        for msg in conversation.messages:
-            cursor.execute(
-                """
-                INSERT INTO messages 
-                (conversation_id, role, content, timestamp, metadata)
-                VALUES (?, ?, ?, ?, ?)
-                """,
-                (
-                    conversation.id,
-                    msg.role,
-                    msg.content,
-                    msg.timestamp.isoformat(),
-                    json.dumps(msg.metadata)
-                )
-            )
-        
+        # Assuming messages are added separately via add_message for new conversations.
         conn.commit()

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 8


Why: The suggestion correctly points out a major performance and data integrity issue in the _save_conversation method, which unnecessarily deletes and re-inserts all messages for a simple metadata update.

Medium
Security
Escape user messages to prevent XSS

Escape user-provided message content before rendering it as HTML to prevent a
Cross-Site Scripting (XSS) vulnerability.

src/gpt_computer_agent/api/templates/index.html [399-401]

+const escapeHtml = (unsafe) => {
+    return unsafe
         .replace(/&/g, "&amp;")
+         .replace(/</g, "&lt;")
+         .replace(/>/g, "&gt;")
+         .replace(/"/g, "&quot;")
+         .replace(/'/g, "&#039;");
+};
+
 const messageContent = role === 'agent' ? 
     marked.parse(content) : 
-    content.replace(/\n/g, '<br>');
+    escapeHtml(content).replace(/\n/g, '<br>');
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical Cross-Site Scripting (XSS) vulnerability where unescaped user input is rendered as HTML, and the proposed fix is a standard and effective mitigation.

High
Sanitize agent responses to prevent XSS

Sanitize the HTML output from the agent's markdown-parsed response to prevent a
potential Cross-Site Scripting (XSS) vulnerability.

src/gpt_computer_agent/api/templates/index.html [399-401]

+// Make sure to include DOMPurify script, e.g., from a CDN:
+// <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/purify.min.js"></script>
+
 const messageContent = role === 'agent' ? 
-    marked.parse(content) : 
+    DOMPurify.sanitize(marked.parse(content)) : 
     content.replace(/\n/g, '<br>');

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical Cross-Site Scripting (XSS) vulnerability from unsanitized agent output and proposes a standard, robust solution using a dedicated sanitization library.

High
Safely set title to prevent XSS

Set the conversation title using textContent instead of innerHTML to prevent a
Cross-Site Scripting (XSS) vulnerability from user-derived content.

src/gpt_computer_agent/api/templates/index.html [208-215]

 convElement.innerHTML = `
     <div class="flex items-center justify-between">
-        <span class="truncate">${title}</span>
+        <span class="truncate"></span>
         <button class="delete-conversation text-gray-400 hover:text-red-500" data-id="${conv.id}">
             <i class="fas fa-times"></i>
         </button>
     </div>
 `;
+convElement.querySelector('.truncate').textContent = title;
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical Cross-Site Scripting (XSS) vulnerability where a title derived from user input is unsafely rendered, and the proposed fix using textContent is the correct approach.

High
Avoid using eval for security

Replace the use of eval() in the calculate tool with a safer alternative like
the asteval library to prevent potential security vulnerabilities from untrusted
input.

examples/basic_usage.py [42-58]

+# You will need to add 'asteval' to your project's dependencies.
+# pip install asteval
+from asteval import Interpreter
+
+aeval = Interpreter()
+
 @tool
 def calculate(expression: str) -> float:
     """
     Evaluate a mathematical expression.
     
     Args:
         expression: The mathematical expression to evaluate.
         
     Returns:
         The result of the evaluation.
     """
-    # In a real implementation, use a safe evaluation
+    # Use a safe evaluation library instead of eval()
     print(f"Calculating: {expression}")
     try:
-        return eval(expression, {"__builtins__": {}}, {})
+        # The asteval Interpreter provides a safe way to evaluate expressions.
+        return aeval.eval(expression)
     except Exception as e:
         return f"Error: {str(e)}"
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a significant security vulnerability by using eval() on potentially untrusted input. Replacing it with a safer evaluation library is a critical improvement.

Medium


codiumai-pr-agent-free bot commented Sep 26, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level
Consolidate redundant architectural patterns

The PR contains multiple, conflicting implementations for core functionalities
like authentication, logging, and database initialization. These redundancies
should be consolidated into a single, unified pattern for each service to avoid
maintenance issues.

Examples:

src/gpt_computer_agent/auth/__init__.py [1-107]
""
Authentication and authorization module for the GPT Computer Agent.
"""
from datetime import datetime, timedelta
from typing import Optional

from jose import JWTError, jwt
from passlib.context import CryptContext
from pydantic import BaseModel, Field


 ... (clipped 96 lines)
src/gpt_computer_agent/core/security.py [1-115]

Solution Walkthrough:

Before:

# Multiple conflicting implementations exist across the codebase.

# Example: Authentication
# File: src/gpt_computer_agent/auth/__init__.py
def authenticate_user(username, password):
    # Uses a fake in-memory DB
    ...

# File: src/gpt_computer_agent/services/user.py
def authenticate(db: Session, username, password):
    # Uses a real database session
    ...

# Example: Logging
# File: src/gpt_computer_agent/config/logging.py
def configure_logging(...): ...

# File: src/gpt_computer_agent/config/settings.py
def configure_logging(...): ...

# File: src/gpt_computer_agent/utils/common.py
def setup_logging(...): ...

After:

# A single, unified implementation for each core service.

# Example: Authentication (consolidated)
# File: src/gpt_computer_agent/services/user.py
def authenticate(db: Session, username, password):
    # The single source of truth for authentication logic
    # Uses the real database.
    ...

# The old `auth/__init__.py` is removed or repurposed.
# Routers consistently use the service layer.

# Example: Logging (consolidated)
# File: src/gpt_computer_agent/config/logging.py
def setup_logging():
    # The single source of truth for logging setup.
    # Reads configuration from settings.
    ...

# Other logging setup functions are removed.
# `app.py` calls the single `setup_logging` function on startup.
Suggestion importance[1-10]: 10


Why: This suggestion correctly identifies a critical architectural flaw where multiple, conflicting implementations for core services like authentication, logging, and database initialization exist, which would cause bugs and severe maintenance issues.

High
Possible issue
Prevent data loss during conversation update

Refetch the conversation object from storage before saving it to prevent
overwriting it with stale data, which would cause the most recent agent message
to be lost.

src/gpt_computer_agent/api/app.py [214-224]

 # Update conversation title if it's the first message
 if len(conv.messages) == 1:
     # Generate a title based on the first message
     title = message.content[:50] + (message.content[50:] and '...')
-    storage._save_conversation(Conversation(
-        **{
-            **conv.dict(),
-            "title": title,
-            "updated_at": datetime.utcnow()
-        }
-    ))
+    
+    # Re-fetch conversation to get the latest state including the agent's message
+    updated_conv = storage.get_conversation(message.conversation_id)
+    if updated_conv:
+        updated_conv.title = title
+        updated_conv.updated_at = datetime.utcnow()
+        storage._save_conversation(updated_conv)
Suggestion importance[1-10]: 9


Why: This suggestion correctly identifies a critical bug where using a stale conv object leads to data loss of the agent's message, and the proposed fix of refetching the conversation is correct.

High
Ensure email uniqueness during user creation

Add a check to verify that the email address is unique before creating a new
user, in addition to the existing username check, to prevent duplicate accounts
with the same email.

src/gpt_computer_agent/api/routers/users.py [58-64]

-user = get_user_by_username(db, username=user_in.username)
-if user:
+existing_user = get_user_by_username(db, username=user_in.username)
+if existing_user:
     raise HTTPException(
         status_code=status.HTTP_400_BAD_REQUEST,
         detail="The user with this username already exists in the system.",
     )
+
+from ...services.user import get_user_by_email
+existing_email = get_user_by_email(db, email=user_in.email)
+if existing_email:
+    raise HTTPException(
+        status_code=status.HTTP_400_BAD_REQUEST,
+        detail="The user with this email already exists in the system.",
+    )
+
 user = create_user(db, user_in)
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies a potential data integrity issue by not checking for email uniqueness and provides a valid improvement to make the user creation process more robust.

Medium
Calculate time distribution from actual data

Calculate the time distribution for "work", "communication", and "breaks"
dynamically from the collected activity data instead of using hardcoded
percentages.

example_use_cases/workday_summerizer.md [105-118]

+# This is a conceptual fix. You would need to add logic to categorize activities.
+# For example, by asking the agent to classify each `user_activity`.
+
+# Placeholder for dynamic calculation
+work_minutes = 0
+communication_minutes = 0
+breaks_minutes = 0
+
+# Example categorization logic (to be implemented)
+# for entry in activity_log:
+#     category = remote.input(f"Categorize this activity: '{entry['user_activity']}' as work, communication, or break.")
+#     if category == 'work':
+#         work_minutes += SNAPSHOT_INTERVAL / 60
+#     ...
+
+# For now, acknowledge that it's a placeholder
+total_minutes = TOTAL_DURATION // 60
+
 # Generate summary report
 summary = {
     "date": datetime.now().strftime("%Y-%m-%d"),
-    "total_tracked_minutes": TOTAL_DURATION // 60,
+    "total_tracked_minutes": total_minutes,
     "apps_used": [{"app": app, "minutes": duration//60} for app, duration in time_spent.items()],
     "time_distribution": {
-        "work": {"minutes": int(TOTAL_DURATION * 0.7) // 60, "percentage": 70},
-        "communication": {"minutes": int(TOTAL_DURATION * 0.2) // 60, "percentage": 20},
-        "breaks": {"minutes": int(TOTAL_DURATION * 0.1) // 60, "percentage": 10}
+        # NOTE: The following is a placeholder and should be calculated dynamically
+        "work": {"minutes": int(total_minutes * 0.7), "percentage": 70},
+        "communication": {"minutes": int(total_minutes * 0.2), "percentage": 20},
+        "breaks": {"minutes": int(total_minutes * 0.1), "percentage": 10}
     },
     "key_activities": key_activities[:5],  # Top 5 key activities
     "productivity_score": 85,  # This could be calculated based on app usage patterns
     "detailed_breakdown": activities_by_app
 }

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 5


Why: The suggestion correctly points out that the time distribution in the example script is hardcoded and misleading, proposing a more accurate dynamic calculation.

Low
Security
Prevent XSS by escaping user messages

Escape user-provided message content before rendering it as HTML to prevent a
potential Cross-Site Scripting (XSS) vulnerability.

src/gpt_computer_agent/api/templates/index.html [398-401]

+function escapeHTML(str) {
+    const p = document.createElement("p");
+    p.textContent = str;
+    return p.innerHTML;
+}
+
 // Convert markdown to HTML if it's an agent message
 const messageContent = role === 'agent' ? 
     marked.parse(content) : 
-    content.replace(/\n/g, '<br>');
+    escapeHTML(content).replace(/\n/g, '<br>');
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical Cross-Site Scripting (XSS) vulnerability and provides a standard, effective solution to prevent it.

High
Prevent XSS in conversation titles

To prevent a Cross-Site Scripting (XSS) vulnerability, set the conversation
title using textContent instead of interpolating it into an innerHTML string.

src/gpt_computer_agent/api/templates/index.html [208-215]

 convElement.innerHTML = `
     <div class="flex items-center justify-between">
-        <span class="truncate">${title}</span>
+        <span class="truncate"></span>
         <button class="delete-conversation text-gray-400 hover:text-red-500" data-id="${conv.id}">
             <i class="fas fa-times"></i>
         </button>
     </div>
 `;
+convElement.querySelector('.truncate').textContent = title;
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a critical Cross-Site Scripting (XSS) vulnerability where conversation titles are rendered and provides a valid fix.

High
Replace unsafe eval with a safer alternative

Replace the insecure eval() function with the safer ast.literal_eval() to
prevent potential Denial of Service (DoS) attacks from malicious mathematical
expressions.

examples/basic_usage.py [53-58]

 # In a real implementation, use a safe evaluation
+import ast
+
 print(f"Calculating: {expression}")
 try:
-    return eval(expression, {"__builtins__": {}}, {})
-except Exception as e:
-    return f"Error: {str(e)}"
+    # Use ast.literal_eval for safe evaluation of simple expressions
+    return ast.literal_eval(expression)
+except (ValueError, SyntaxError, TypeError, MemoryError) as e:
+    return f"Error: Invalid or unsafe expression. {str(e)}"
Suggestion importance[1-10]: 8


Why: The suggestion correctly points out a significant security vulnerability by using eval() and proposes a safer alternative, which is crucial even in an example file to avoid promoting insecure practices.

Medium

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 55

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/gpt_computer_agent/agent/assistant.py (1)

126-238: Preserve the legacy assistant() entry point

Renaming the public function to agent() is fine, but we still have callers (internal and, more importantly, downstream users) that import or invoke assistant(). Without a compatibility alias this becomes an immediate breaking change. Please keep the old name pointing at the new implementation. For example:

 def agent(
     llm_input, client, screenshot_path=None, dont_save_image=False, just_screenshot=False
 ):
     ...
     return return_value
+
+assistant = agent
src/gpt_computer_agent/utils/chat_history.py (1)

64-85: Restore a valid Agent class declaration

classAgent: is invalid syntax, so importing this module now raises a SyntaxError. That blocks every code path that touches chat history.

-classAgent:
+class Agent:
src/gpt_computer_agent/top_bar_wrapper.py (1)

11-24: the_main_window no longer resolves.

Switching to from . import the_main_window causes an AttributeError because the package’s __init__ doesn’t expose that symbol. Every decorated call path will now hit the except block and silently skip the UI updates. Please keep importing from the concrete module.

Restore the explicit import:

-        try:
-            from . import the_main_window
+        try:
+            from .gpt_computer_agent import the_main_window
src/gpt_computer_agent/gui/settings.py (1)

25-257: Package-level import breaks runtime behavior.

the_main_window isn’t exported from the package __init__, so this call to from .. import the_main_window raises ImportError, stopping the settings dialog handlers. Please continue importing from the concrete module.

Revert to the working import:

-    from .. import the_main_window
+    from ..gpt_computer_agent import the_main_window
example_use_cases/workday_summerizer.md (1)

180-186: Drop the stale “Summery” snippet from the code block

The example now builds activity_log/summary, but this trailing block still references i and summery_results, which no longer exist. Anyone copying the snippet will hit a NameError. Please remove the leftover lines to keep the sample runnable.

🧹 Nitpick comments (14)
build_scripts/openai/macos_build.sh (1)

103-105: Harden the cleanup guard

If DMG_DIR ever ends up blank (env override, earlier edit, running from /), rm -rf "${DMG_DIR}"/* expands to rm -rf /* and wipes the workstation. Guard the variable before expanding.

-mkdir -p "${DMG_DIR}"
-rm -rf "${DMG_DIR}"/*
+mkdir -p "${DMG_DIR}"
+rm -rf "${DMG_DIR:?}/"*
build_macos.sh (1)

103-105: Apply the same guarded cleanup

For the reasons noted in the other script, add the :? guard here too so a blank DMG_DIR never expands to the filesystem root.

-mkdir -p "${DMG_DIR}"
-rm -rf "${DMG_DIR}"/*
+mkdir -p "${DMG_DIR}"
+rm -rf "${DMG_DIR:?}/"*
src/gpt_computer_agent/agent/__init__.py (1)

2-2: Remove the duplicate wildcard import

Line 2 is now an exact duplicate of line 1. This looks like an accidental replacement of the former .assistant export and just re-imports the same module twice. Please drop the extra import (or restore the intended module) to keep the package namespace clean and avoid confusing re-exports.

README_NEW.md (1)

102-126: Add language hint to project structure code block

Markdownlint flagged the project-structure block because the fenced code fence lacks a language hint. Tagging it as plain text keeps the lint job happy without changing rendering.

-```
+```text
AUTHENTICATION.md (1)

1-120: Replace the placeholder token in docs.

The curl example includes Authorization: Bearer YOUR_ACCESS_TOKEN. Static scanners interpret that as a real secret and flag the PR. Swap it for a dotted placeholder to silence leak detectors.

Use a dotted placeholder, e.g.:

-  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN'
+  -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>'
src/gpt_computer_agent/models/__init__.py (1)

1-6: Bring the module docstring to the top.

The triple quotes aren’t at the start of the file, so they don’t form a valid docstring and trigger lint errors. Move "Database models …" into a proper triple-quoted docstring immediately at the top.

Correct formatting:

-""
-Database models for the GPT Computer Agent.
-"""
+"""Database models for the GPT Computer Agent."""
src/gpt_computer_agent/api/templates/index.html (1)

196-233: Preserve selected conversation state by storing the id on the list item.

loadConversation expects each .conversation-item to carry dataset.id, but the elements never get one, so the active conversation highlight never re-applies. Set convElement.dataset.id = conv.id when building the list.

                 conversations.forEach(conv => {
                     const convElement = document.createElement('div');
                     convElement.className = 'conversation-item';
+                    convElement.dataset.id = conv.id;
src/gpt_computer_agent/core/agent.py (1)

138-139: Consider consolidating the return statement.

As flagged by the static analysis tool, consider moving the return statement to an else block for better code organization. While not critical, this improves readability.

Apply this diff:

-            return response_message.content
-            
-        except Exception as e:
+        except Exception as e:
             logger.error(f"Error in agent run: {e}")
             raise
+        else:
+            return response_message.content
src/gpt_computer_agent/schemas/user.py (1)

20-30: Consider extracting password validation logic to reduce duplication.

The password strength validation logic is duplicated between UserCreate and PasswordReset. Consider creating a reusable validator or custom field type.

Create a reusable password validator:

from pydantic import BaseModel, Field, field_validator
from typing import Annotated
# UserBase is assumed to be defined earlier in schemas/user.py

def validate_password_strength(v: str) -> str:
    """Reusable password strength validator."""
    if len(v) < 8:
        raise ValueError('Password must be at least 8 characters long')
    if not any(c.isupper() for c in v):
        raise ValueError('Password must contain at least one uppercase letter')
    if not any(c.islower() for c in v):
        raise ValueError('Password must contain at least one lowercase letter')
    if not any(c.isdigit() for c in v):
        raise ValueError('Password must contain at least one number')
    return v

# Define a custom password type
StrongPassword = Annotated[str, Field(min_length=8)]

class UserCreate(UserBase):
    """Schema for creating a new user."""
    password: StrongPassword
    
    @field_validator('password')
    @classmethod
    def password_strength(cls, v):
        return validate_password_strength(v)

class PasswordReset(BaseModel):
    """Schema for password reset."""
    token: str
    new_password: StrongPassword
    
    @field_validator('new_password')
    @classmethod
    def password_strength(cls, v):
        return validate_password_strength(v)

Also applies to: 84-94

src/gpt_computer_agent/core/security.py (2)

88-89: Remove redundant exception handling.

The decode_token function already handles JWTError and raises an appropriate HTTPException. The outer try-except in get_current_user catching JWTError again is redundant since decode_token won't propagate JWTError.

Apply this diff to simplify the error handling:

-    try:
-        payload = decode_token(token)
-        username: str = payload.get("sub")
-        if username is None:
-            raise credentials_exception
-    except JWTError:
-        raise credentials_exception
+    payload = decode_token(token)
+    username: str = payload.get("sub")
+    if username is None:
+        raise credentials_exception

106-115: Inconsistent error handling in verify_password_reset_token.

This function silently returns None on errors while other functions raise exceptions. Consider making error handling consistent or documenting why this difference exists.

Consider either raising an exception or adding documentation:

def verify_password_reset_token(token: str) -> Optional[str]:
    """
    Verify a password reset token and return the username if valid.
    
    Returns:
        str: Username if token is valid
        None: If token is invalid, expired, or not a password reset token
    
    Note: Returns None instead of raising exceptions to allow graceful
    handling of invalid reset attempts without exposing token validity details.
    """
    try:
        payload = decode_token(token)
        # ... rest of the function
src/gpt_computer_agent/config/logging.py (1)

99-99: Add explicit Optional type hint.

PEP 484 requires explicit Optional for parameters that can be None.

Apply this diff to make the type hint explicit:

-def get_logger(name: str = None) -> logger:
+def get_logger(name: Optional[str] = None) -> logger:
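For completeness, a minimal sketch of the corrected signature, assuming the function simply hands back the shared loguru logger (the bind call is illustrative, not necessarily what the module does):

from typing import Optional

from loguru import logger

def get_logger(name: Optional[str] = None):
    """Return the shared loguru logger, optionally bound to a component name."""
    if name is not None:
        # bind() attaches extra context that log format strings can reference
        return logger.bind(name=name)
    return logger

Note that the -> logger annotation in the original refers to an instance, not a type; dropping it (or annotating with loguru's actual Logger type) avoids a second lint complaint.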
src/gpt_computer_agent/api/app.py (1)

116-128: Avoid mutable defaults on the API response models.

Bare {} / [] defaults are a mutable-default smell; prefer Field(default_factory=...) so each response explicitly gets its own freshly built container and the intent is obvious to readers and linters.

 class MessageResponse(BaseModel):
     id: str
     role: str
     content: str
     timestamp: str
-    metadata: Dict[str, Any] = {}
+    metadata: Dict[str, Any] = Field(default_factory=dict)
@@
 class ConversationResponse(BaseModel):
     id: str
     title: str
     created_at: str
     updated_at: str
-    messages: List[MessageResponse] = []
+    messages: List[MessageResponse] = Field(default_factory=list)
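A tiny sketch of why default_factory is the safer idiom (names hypothetical): every instance gets a freshly constructed container, which you can verify directly.

from typing import Any, Dict

from pydantic import BaseModel, Field

class MessageResponse(BaseModel):
    content: str
    metadata: Dict[str, Any] = Field(default_factory=dict)

a = MessageResponse(content="hi")
b = MessageResponse(content="there")
a.metadata["seen"] = True
print(b.metadata)                 # {} - b is untouched
print(a.metadata is b.metadata)   # False - separate dicts per instance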
src/gpt_computer_agent/gpt_computer_agent.py (1)

3-7: Remove duplicate agent_executor imports.

We now import agent_executor twice in both the package and fallback branches. It doesn’t change runtime behavior but it’s noisy and keeps static analysis warnings alive. Please drop the second occurrence in each block.

@@
-    from .agent.agent_executor import *
     from .llm import *
     from .llm_settings import llm_settings
-    from .agent.agent_executor import *
@@
-    from agent.agent_executor import *
     from llm import *
     from llm_settings import llm_settings
-    from agent.agent_executor import *

Also applies to: 24-27

alembic/env.py Outdated
Comment on lines 1 to 35
""
Alembic migration environment configuration.
"""
import os
import sys
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
# Add the project root to the Python path
sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
# Import your models here for 'autogenerate' support
from src.gpt_computer_agent.models import Base
from src.gpt_computer_agent.config.settings import settings
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = config.get_section(config.config_ini_section)
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:

⚠️ Potential issue | 🔴 Critical

Fix syntactically invalid module docstring.

The opening delimiter is "" while the closing one is """, so the docstring is malformed, the file fails to import, and Alembic can’t start.

Apply this diff to restore a valid module docstring:

- ""
-Alembic migration environment configuration.
-"""
+"""Alembic migration environment configuration."""
📝 Committable suggestion


Suggested change
""
Alembic migration environment configuration.
"""
import os
import sys
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
# Add the project root to the Python path
sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
# Import your models here for 'autogenerate' support
from src.gpt_computer_agent.models import Base
from src.gpt_computer_agent.config.settings import settings
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = config.get_section(config.config_ini_section)
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
"""Alembic migration environment configuration."""
import os
import sys
from logging.config import fileConfig
from sqlalchemy import engine_from_config, pool
# Add the project root to the Python path
sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
# Import your models here for 'autogenerate' support
from src.gpt_computer_agent.models import Base
from src.gpt_computer_agent.config.settings import settings
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = config.get_section(config.config_ini_section)
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
🧰 Tools
🪛 Ruff (0.13.1)

2-2: Simple statements must be separated by newlines or semicolons (invalid-syntax, reported three times)

2-3: Expected an identifier (invalid-syntax)

🤖 Prompt for AI Agents
In alembic/env.py around lines 1 to 35, the module docstring opens with "" but
closes with """, which breaks the parse and causes import errors; replace the
opening delimiter with a matching triple quote (""") so the file has a valid
triple-quoted module docstring (keep the existing text "Alembic migration
environment configuration." inside), and ensure no stray quote characters remain
at the top of the file.

alembic/env.py Outdated
Comment on lines 14 to 83
from src.gpt_computer_agent.models import Base
from src.gpt_computer_agent.config.settings import settings
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = config.get_section(config.config_ini_section)
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode.

This configures the context with just a URL
and not an Engine, though an Engine is acceptable
here as well. By skipping the Engine creation
we don't even need a DBAPI to be available.

Calls to context.execute() here emit the given string to the
script output.
"""
url = config.get_main_option("sqlalchemy.url")
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
)
with context.begin_transaction():
context.run_migrations()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode.

In this scenario we need to create an Engine
and associate a connection with the context.
"""
configuration = config.get_section(config.config_ini_section)
configuration["sqlalchemy.url"] = settings.DATABASE_URL
connectable = engine_from_config(
configuration,
prefix="sqlalchemy.",
poolclass=pool.NullPool,
)

with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True,
compare_server_default=True,
)

with context.begin_transaction():
context.run_migrations()



⚠️ Potential issue | 🔴 Critical

Restore required Alembic context/config wiring.

context is never imported, and config is reassigned before it’s defined. As written, context/config raise NameError, so migrations can’t run. Pull in context from Alembic and keep the Config object before you read sections or inject the database URL.

A minimal fix would look like:

+from alembic import context
...
-config = config.get_section(config.config_ini_section)
+config = context.config
+section = config.get_section(config.config_ini_section)

Then use section when you need the mutable dict (e.g. in run_migrations_online). Without these corrections the migration environment will never start.
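For reference, a minimal sketch of the standard wiring (it only runs inside Alembic's env.py execution context; the settings and Base imports follow the ones already in this file):

from alembic import context
from sqlalchemy import engine_from_config, pool

from src.gpt_computer_agent.models import Base
from src.gpt_computer_agent.config.settings import settings

config = context.config          # the Alembic Config object, backed by alembic.ini
target_metadata = Base.metadata  # enables autogenerate

def run_migrations_online() -> None:
    # Copy the ini section into a plain dict so we can inject the runtime DB URL
    section = config.get_section(config.config_ini_section) or {}
    section["sqlalchemy.url"] = settings.DATABASE_URL
    connectable = engine_from_config(section, prefix="sqlalchemy.", poolclass=pool.NullPool)
    with connectable.connect() as connection:
        context.configure(connection=connection, target_metadata=target_metadata)
        with context.begin_transaction():
            context.run_migrations()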

🧰 Tools
🪛 Ruff (0.13.1)

36-63: a long cascade of invalid-syntax errors stemming from the malformed docstring and the config reassignment, including "Simple statements must be separated by newlines or semicolons", "Expected an identifier", "Unexpected indentation", dozens of "Expected ','" variants, "missing closing quote in string literal", and "Expected ':', found 'with'"

🤖 Prompt for AI Agents
In alembic/env.py around lines 14-83, restore proper Alembic wiring: import
context from alembic (e.g. from alembic import context) and keep the Config
object by assigning config = context.config (do not reassign config to a dict);
call fileConfig(config.config_file_name) as you already do; in
run_migrations_online call config.get_section(...) into a local mutable dict
(e.g. section = config.get_section(config.config_ini_section)), set
section["sqlalchemy.url"] = settings.DATABASE_URL, and pass that section into
engine_from_config (instead of overwriting config). This fixes the NameError and
ensures the DB URL is injected only into the mutable section used to build the
engine.

Comment on lines 1 to 7
""${message}

Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}

"""

⚠️ Potential issue | 🔴 Critical

Fix malformed docstring opener.

The template currently emits ""${message}, producing migration files that start with ""<text> and immediate syntax errors. The header must be a triple-quoted docstring.

Apply this diff to correct the docstring:

-""${message}
+"""${message}
📝 Committable suggestion


Suggested change
""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
🤖 Prompt for AI Agents
In alembic/script.py.mako around lines 1 to 7, the template emits a malformed
docstring opener `""${message}` causing syntax errors in generated migrations;
change the opener to triple quotes so it reads `"""${message}` and ensure the
closing triple-quote at the end of the header remains intact (i.e., a proper
triple-quoted docstring wrapping the message, revision, revises, and create date
lines).

Comment on lines 1 to 7
""initial migration

Revision ID: 0001_initial
Revises:
Create Date: 2023-11-01 00:00:00.000000

"""

⚠️ Potential issue | 🔴 Critical

Fix the malformed docstring—migration can’t even import right now.

The leading ""initial migration is invalid syntax, so alembic upgrade will crash before it starts. Replace it with a proper triple-quoted docstring (and feel free to drop the stray quote).

-""initial migration
+"""Initial migration."""

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.13.1)

1-5: invalid-syntax: "Simple statements must be separated by newlines or semicolons" (four times), "Invalid decimal integer literal" (twice), "Expected an expression", "Invalid annotated assignment target", "Expected a statement"

🤖 Prompt for AI Agents
In alembic/versions/initial_migration.py around lines 1 to 7, the file starts
with a malformed docstring (""initial migration) which is invalid Python and
prevents imports; replace the broken leading characters with a proper
triple-quoted docstring containing the header lines (e.g. """initial
migration\n\nRevision ID: 0001_initial\nRevises: \nCreate Date: 2023-11-01
00:00:00.000000\n\n"""), removing the stray quote so the file begins with a valid
"""...""" block.

build_macos.sh Outdated
Comment on lines 41 to 99
info "Installing Python dependencies..."
pip install ".[base,agentic]" || {
warn "Failed to install Python dependencies"
exit 1
}

# Install Homebrew if not installed
if ! command -v brew &> /dev/null; then
warn "Homebrew not found. Installing Homebrew..."
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" || {
warn "Failed to install Homebrew"
exit 1
}
# Add Homebrew to PATH if needed
export PATH="/opt/homebrew/bin:$PATH"
fi

# Install create-dmg if not installed
if ! command -v create-dmg &> /dev/null; then
info "Installing create-dmg..."
brew install create-dmg || {
warn "Failed to install create-dmg"
exit 1
}
fi

# Install appropriate PyInstaller version based on Python version
if [[ "$PYTHON_VERSION" == "3.13"* ]]; then
PYINSTALLER_VERSION="6.14.2" # Latest version for Python 3.13
elif [[ "$PYTHON_VERSION" == "3.12"* ]]; then
PYINSTALLER_VERSION="6.9.0" # Version for Python 3.12
else
PYINSTALLER_VERSION="5.13.2" # Version for Python 3.7-3.11
fi

info "Installing PyInstaller ${PYINSTALLER_VERSION}..."
pip install "pyinstaller==${PYINSTALLER_VERSION}" || {
warn "Failed to install PyInstaller"
exit 1
}

# Clean previous builds
info "Cleaning previous builds..."
rm -rf "${DIST_DIR}" "${WORKDIR}/build" "${WORKDIR}/__pycache__"

# Build the application
info "Building application with PyInstaller..."
pyinstaller \
--recursive-copy-metadata gpt_computer_agent \
run.py \
--windowed \
--add-data="${MEDIA_DIR}/*:${MEDIA_DIR}" \
--icon="${ICON_PATH}" \
--name="${APP_NAME}" \
--clean \
--noconfirm || {
warn "PyInstaller build failed"
exit 1
}

⚠️ Potential issue | 🟠 Major

Keep pip/PyInstaller tied to the queried interpreter

Same concern as in build_scripts/openai/macos_build.sh: we figure out python3’s version, but pip/pyinstaller may resolve to a different install. Please mirror the fix here as well so the build is deterministic.

-info "Installing Python dependencies..."
-pip install ".[base,agentic]" || {
+info "Installing Python dependencies..."
+python3 -m pip install ".[base,agentic]" || {
@@
-info "Installing PyInstaller ${PYINSTALLER_VERSION}..."
-pip install "pyinstaller==${PYINSTALLER_VERSION}" || {
+info "Installing PyInstaller ${PYINSTALLER_VERSION}..."
+python3 -m pip install "pyinstaller==${PYINSTALLER_VERSION}" || {
@@
-info "Building application with PyInstaller..."
-pyinstaller \
+info "Building application with PyInstaller..."
+python3 -m PyInstaller \
📝 Committable suggestion


Suggested change
info "Installing Python dependencies..."
pip install ".[base,agentic]" || {
warn "Failed to install Python dependencies"
exit 1
}
# Install Homebrew if not installed
if ! command -v brew &> /dev/null; then
warn "Homebrew not found. Installing Homebrew..."
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" || {
warn "Failed to install Homebrew"
exit 1
}
# Add Homebrew to PATH if needed
export PATH="/opt/homebrew/bin:$PATH"
fi
# Install create-dmg if not installed
if ! command -v create-dmg &> /dev/null; then
info "Installing create-dmg..."
brew install create-dmg || {
warn "Failed to install create-dmg"
exit 1
}
fi
# Install appropriate PyInstaller version based on Python version
if [[ "$PYTHON_VERSION" == "3.13"* ]]; then
PYINSTALLER_VERSION="6.14.2" # Latest version for Python 3.13
elif [[ "$PYTHON_VERSION" == "3.12"* ]]; then
PYINSTALLER_VERSION="6.9.0" # Version for Python 3.12
else
PYINSTALLER_VERSION="5.13.2" # Version for Python 3.7-3.11
fi
info "Installing PyInstaller ${PYINSTALLER_VERSION}..."
pip install "pyinstaller==${PYINSTALLER_VERSION}" || {
warn "Failed to install PyInstaller"
exit 1
}
# Clean previous builds
info "Cleaning previous builds..."
rm -rf "${DIST_DIR}" "${WORKDIR}/build" "${WORKDIR}/__pycache__"
# Build the application
info "Building application with PyInstaller..."
pyinstaller \
--recursive-copy-metadata gpt_computer_agent \
run.py \
--windowed \
--add-data="${MEDIA_DIR}/*:${MEDIA_DIR}" \
--icon="${ICON_PATH}" \
--name="${APP_NAME}" \
--clean \
--noconfirm || {
warn "PyInstaller build failed"
exit 1
}
info "Installing Python dependencies..."
python3 -m pip install ".[base,agentic]" || {
warn "Failed to install Python dependencies"
exit 1
}
# Install Homebrew if not installed
if ! command -v brew &> /dev/null; then
warn "Homebrew not found. Installing Homebrew..."
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" || {
warn "Failed to install Homebrew"
exit 1
}
# Add Homebrew to PATH if needed
export PATH="/opt/homebrew/bin:$PATH"
fi
# Install create-dmg if not installed
if ! command -v create-dmg &> /dev/null; then
info "Installing create-dmg..."
brew install create-dmg || {
warn "Failed to install create-dmg"
exit 1
}
fi
# Install appropriate PyInstaller version based on Python version
if [[ "$PYTHON_VERSION" == "3.13"* ]]; then
PYINSTALLER_VERSION="6.14.2" # Latest version for Python 3.13
elif [[ "$PYTHON_VERSION" == "3.12"* ]]; then
PYINSTALLER_VERSION="6.9.0" # Version for Python 3.12
else
PYINSTALLER_VERSION="5.13.2" # Version for Python 3.7-3.11
fi
info "Installing PyInstaller ${PYINSTALLER_VERSION}..."
python3 -m pip install "pyinstaller==${PYINSTALLER_VERSION}" || {
warn "Failed to install PyInstaller"
exit 1
}
# Clean previous builds
info "Cleaning previous builds..."
rm -rf "${DIST_DIR}" "${WORKDIR}/build" "${WORKDIR}/__pycache__"
# Build the application
info "Building application with PyInstaller..."
python3 -m PyInstaller \
--recursive-copy-metadata gpt_computer_agent \
run.py \
--windowed \
--add-data="${MEDIA_DIR}/*:${MEDIA_DIR}" \
--icon="${ICON_PATH}" \
--name="${APP_NAME}" \
--clean \
--noconfirm || {
warn "PyInstaller build failed"
exit 1
}
🤖 Prompt for AI Agents
In build_macos.sh around lines 41-99, pip and pyinstaller are invoked directly
which can target a different Python than the one used to determine
PYTHON_VERSION; change all pip installs to use the same interpreter (e.g.
replace pip install ... with "$PYTHON_BIN" -m pip install ... or define
PYTHON_BIN="$(which python3)" earlier) and invoke PyInstaller via the
interpreter (e.g. "$PYTHON_BIN" -m PyInstaller ...) so pip/pyinstaller are
installed and run against the exact Python interpreter used for the build;
update the create/install PyInstaller and dependency install calls accordingly
and ensure any exit handling remains the same.

class UserBase(BaseModel):
"""Base user schema with common attributes."""
username: str = Field(..., min_length=3, max_length=50, regex="^[a-zA-Z0-9_-]+$")

⚠️ Potential issue | 🔴 Critical

Migrate regex parameter to pattern for Pydantic v2 compatibility.

The regex parameter is deprecated in Pydantic v2 in favor of pattern. Since the PR includes Pydantic v2 (>=2.0.0,<3.0.0) in dependencies, this needs to be updated.

Apply this diff to use the correct parameter:

-username: str = Field(..., min_length=3, max_length=50, regex="^[a-zA-Z0-9_-]+$")
+username: str = Field(..., min_length=3, max_length=50, pattern="^[a-zA-Z0-9_-]+$")

Based on learnings.

📝 Committable suggestion


Suggested change
username: str = Field(..., min_length=3, max_length=50, regex="^[a-zA-Z0-9_-]+$")
username: str = Field(..., min_length=3, max_length=50, pattern="^[a-zA-Z0-9_-]+$")
🤖 Prompt for AI Agents
In src/gpt_computer_agent/schemas/user.py around line 10, the Field(...) call
uses the deprecated regex parameter; replace regex="^[a-zA-Z0-9_-]+$" with
pattern="^[a-zA-Z0-9_-]+$" to be Pydantic v2 compatible, keeping min_length and
max_length unchanged.
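A standalone sketch to confirm the pattern keyword behaves as expected under Pydantic v2:

from pydantic import BaseModel, Field, ValidationError

class UserBase(BaseModel):
    username: str = Field(..., min_length=3, max_length=50, pattern="^[a-zA-Z0-9_-]+$")

print(UserBase(username="agent_01").username)  # passes the pattern

try:
    UserBase(username="bad name!")  # space and "!" violate the pattern
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # string_pattern_mismatch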

Comment on lines 20 to 30
@validator('password')
def password_strength(cls, v):
if len(v) < 8:
raise ValueError('Password must be at least 8 characters long')
if not any(c.isupper() for c in v):
raise ValueError('Password must contain at least one uppercase letter')
if not any(c.islower() for c in v):
raise ValueError('Password must contain at least one lowercase letter')
if not any(c.isdigit() for c in v):
raise ValueError('Password must contain at least one number')
return v

⚠️ Potential issue | 🔴 Critical

Migrate validators to Pydantic v2 syntax.

The @validator decorator is deprecated in Pydantic v2. Use @field_validator instead for compatibility with the specified Pydantic version (>=2.0.0,<3.0.0).

Apply this diff to update both validators:

-from pydantic import BaseModel, EmailStr, Field, validator
+from pydantic import BaseModel, EmailStr, Field, field_validator

 class UserCreate(UserBase):
     """Schema for creating a new user."""
     password: str = Field(..., min_length=8)
     
-    @validator('password')
-    def password_strength(cls, v):
+    @field_validator('password')
+    @classmethod
+    def password_strength(cls, v):
         if len(v) < 8:
             raise ValueError('Password must be at least 8 characters long')
         # ... rest of validation logic
         return v

 class PasswordReset(BaseModel):
     """Schema for password reset."""
     token: str
     new_password: str = Field(..., min_length=8)
     
-    @validator('new_password')
-    def password_strength(cls, v):
+    @field_validator('new_password')
+    @classmethod
+    def password_strength(cls, v):
         if len(v) < 8:
             raise ValueError('Password must be at least 8 characters long')
         # ... rest of validation logic
         return v

Based on learnings.

Also applies to: 84-94

🤖 Prompt for AI Agents
In src/gpt_computer_agent/schemas/user.py around lines 20-30 (and similarly for
the validator at 84-94), replace the Pydantic v1 @validator usage with Pydantic
v2's @field_validator: import field_validator from pydantic, change the
decorator from @validator('password') to @field_validator('password') (keep the
method signature as classmethod-style: def password_strength(cls, v):), and
ensure the function raises the same ValueError messages and returns v; apply the
same replacement for the validator at lines 84-94.

Comment on lines 41 to 43
""
Base schema for user stored in the database.
"""

⚠️ Potential issue | 🔴 Critical

Fix the malformed docstring.

The docstring is split incorrectly causing syntax errors.

Apply this diff to fix the docstring:

-    ""
-    Base schema for user stored in the database.
-    """
+    """Base schema for user stored in the database."""
📝 Committable suggestion


Suggested change
""
Base schema for user stored in the database.
"""
"""Base schema for user stored in the database."""
🧰 Tools
🪛 Ruff (0.13.1)

33-43: Expected an identifier

(invalid-syntax)

🤖 Prompt for AI Agents
In src/gpt_computer_agent/schemas/user.py around lines 41 to 43, the
module/class docstring is split across lines causing a malformed string and
syntax errors; merge the split lines into a single properly quoted triple-quoted
docstring ("""Base schema for user stored in the database.""") so the docstring
opens and closes correctly on contiguous lines and remove any stray or unmatched
quote characters.

Comment on lines 49 to 50
class Config:
orm_mode = True

⚠️ Potential issue | 🔴 Critical

Migrate orm_mode to model_config for Pydantic v2.

The Config class with orm_mode is deprecated in Pydantic v2. Use model_config with from_attributes instead.

Apply this diff to update to Pydantic v2 configuration:

-    class Config:
-        orm_mode = True
+    model_config = {"from_attributes": True}

Based on learnings.

📝 Committable suggestion


Suggested change
class Config:
orm_mode = True
model_config = {"from_attributes": True}
🤖 Prompt for AI Agents
In src/gpt_computer_agent/schemas/user.py around lines 49-50, the Pydantic v1
Config class using orm_mode is deprecated; replace it with the Pydantic v2 style
by removing the Config class and adding a class-level model_config dict with
"from_attributes": True (e.g., model_config = {"from_attributes": True}) so the
model uses attribute access in v2.
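For context, the same configuration via ConfigDict (equivalent to the plain dict) together with model_validate, which replaces v1's from_orm; the UserRow stand-in is hypothetical:

from pydantic import BaseModel, ConfigDict

class UserInDBBase(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    username: str

class UserRow:  # stand-in for the SQLAlchemy model
    id = 1
    username = "admin"

user = UserInDBBase.model_validate(UserRow())  # reads attributes, not dict keys
print(user.username)  # admin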

Comment on lines +1 to +3
""
User service for handling user-related operations.
"""

⚠️ Potential issue | 🔴 Critical

Fix the file header string

Line 1 mirrors the same "" typo, yielding a SyntaxError that keeps the service module from loading.

Apply this diff:

-""
-User service for handling user-related operations.
-"""
+"""
+User service for handling user-related operations.
+"""
📝 Committable suggestion


Suggested change
""
User service for handling user-related operations.
"""
"""
User service for handling user-related operations.
"""
🧰 Tools
🪛 Ruff (0.13.1)

2-2: invalid-syntax: "Simple statements must be separated by newlines or semicolons", "Compound statements are not allowed on the same line as simple statements", "Expected 'in', found name", "Expected ':', found name"

2-3: Expected an identifier (invalid-syntax)

🤖 Prompt for AI Agents
In src/gpt_computer_agent/services/user.py around lines 1 to 3, the module-level
docstring starts with a broken double-quote pair ("") causing a SyntaxError;
change the opening quotes to a proper triple-quoted string (""") and ensure the
docstring is closed with matching triple quotes so the file has a valid module
docstring (e.g., replace the first line's "" with """ and confirm the closing
""" is present at the end of the header).

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

| Just Text Mode (Mute Speech) | Completed | Q2 2024 |
| Added profiles (Different Chats) | Completed | Q2 2024 |
| More Feedback About Assistant Status | Completed | Q2 2024 |
| More Feedback AboutAgent Status | Completed | Q2 2024 |

⚠️ Potential issue | 🟡 Minor

Fix typo in roadmap entry.

“AboutAgent” is missing a space; it should read “About Agent” in the roadmap table.

Apply this diff:

-| More Feedback AboutAgent Status                  | Completed    | Q2 2024        |
+| More Feedback About Agent Status                 | Completed    | Q2 2024        |
📝 Committable suggestion


Suggested change
| More Feedback AboutAgent Status | Completed | Q2 2024 |
| More Feedback About Agent Status | Completed | Q2 2024 |
🤖 Prompt for AI Agents
In README.md around line 257, the roadmap table entry reads "More Feedback
AboutAgent Status" — change the string to "More Feedback About Agent Status"
(insert a space between "About" and "Agent") so the table displays "About Agent"
correctly.

requirements.txt Outdated
Comment on lines 12 to 15
# LLM Integration
openai>=1.0.0,<2.0.0
anthropic>=0.3.0,<1.0.0


⚠️ Potential issue | 🔴 Critical

Restore the langchain requirement.

The restructure dropped the direct langchain dependency, but modules like src/gpt_computer_agent/display_tools.py and src/gpt_computer_agent/tooler.py still do from langchain.tools import tool. A fresh install from this requirements file will now raise ModuleNotFoundError: No module named 'langchain' because nothing else guarantees that package. Please add the dependency back.

 # LLM Integration
+langchain>=0.2.0,<0.3.0
 openai>=1.0.0,<2.0.0
 anthropic>=0.3.0,<1.0.0
📝 Committable suggestion


Suggested change
# LLM Integration
openai>=1.0.0,<2.0.0
anthropic>=0.3.0,<1.0.0
# LLM Integration
langchain>=0.2.0,<0.3.0
openai>=1.0.0,<2.0.0
anthropic>=0.3.0,<1.0.0
🤖 Prompt for AI Agents
In requirements.txt around lines 12 to 15, the repo removed the direct langchain
dependency but code (e.g., src/gpt_computer_agent/display_tools.py and
src/gpt_computer_agent/tooler.py) still imports from langchain, causing
ModuleNotFoundError on fresh installs; restore langchain to requirements.txt by
adding a suitable version pin (e.g., langchain>=0.x,<1.0.0 or match your tested
version) so pip installs langchain and prevents import errors.

Comment on lines 23 to 46
print("Creating database tables...")
Base.metadata.create_all(bind=engine)

# Create a session
db = Session(bind=engine)

try:
# Check if admin user already exists
admin = db.query(User).filter(User.username == "admin").first()

if not admin:
print("Creating admin user...")
# Create admin user
admin = User(
username="admin",
email=settings.FIRST_SUPERUSER,
hashed_password=get_password_hash(settings.FIRST_SUPERUSER_PASSWORD),
full_name="Admin User",
is_active=True,
is_superuser=True
)
db.add(admin)
db.commit()
print("Admin user created successfully!")

⚠️ Potential issue | 🟠 Major

Do not bypass Alembic migrations when initializing the DB.

Calling Base.metadata.create_all() here sidesteps the new Alembic migration flow. That leaves the schema unversioned (no alembic_version table) and skips any imperative migration logic you’ve added, so the next alembic upgrade head will try to recreate existing tables and likely blow up with “table already exists” errors. Please drive initialization via Alembic instead.

-from src.gpt_computer_agent.db.session import engine, Base
+from alembic import command
+from alembic.config import Config
+
+from src.gpt_computer_agent.db.session import engine
@@
-    print("Creating database tables...")
-    Base.metadata.create_all(bind=engine)
+    print("Applying database migrations...")
+    alembic_cfg = Config(str(Path(__file__).parent.parent / "alembic.ini"))
+    alembic_cfg.set_main_option("sqlalchemy.url", str(engine.url))
+    command.upgrade(alembic_cfg, "head")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In scripts/init_db.py around lines 23 to 46, the script calls
Base.metadata.create_all(bind=engine) which bypasses Alembic and leaves the DB
unversioned; remove that call and instead run the Alembic migration flow before
creating application data: ensure you programmatically invoke Alembic upgrade
head (either via alembic.command.upgrade with a Config pointing to your
alembic.ini/env or by running a subprocess "alembic upgrade head"), wait for it
to complete, then create a Session and check/create the admin user as you
already do; this ensures the alembic_version table and any imperative migration
logic are applied and prevents "table already exists" errors.
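A minimal sketch of what the Alembic-driven flow could look like (the alembic.ini path is an assumption; point it at the project's real config):

from pathlib import Path

from alembic import command
from alembic.config import Config

from src.gpt_computer_agent.config.settings import settings

def migrate_to_head() -> None:
    cfg = Config(str(Path(__file__).resolve().parent.parent / "alembic.ini"))
    # Override the ini URL so migrations hit the same database the app uses
    cfg.set_main_option("sqlalchemy.url", settings.DATABASE_URL)
    command.upgrade(cfg, "head")

Running migrate_to_head() before the admin-user bootstrap keeps the schema versioned and leaves the existing Session logic untouched.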

from cu.computer import *
from teams import *
from agent.agent_tools import get_tools
from agent_tools import get_tools

⚠️ Potential issue | 🟠 Major

Restore the absolute import path in the fallback branch.

When the relative imports fail (common when running modules from the repository root), the fallback path now does from agent_tools import get_tools, but the module actually lives under agent/agent_tools.py. This will raise ModuleNotFoundError, breaking the non-package execution path that previously worked. Please keep the agent. prefix here.

-    from agent_tools import get_tools
+    from agent.agent_tools import get_tools
📝 Committable suggestion


Suggested change
from agent_tools import get_tools
try:
from .agent_tools import get_tools
except ImportError:
from agent.agent_tools import get_tools
🤖 Prompt for AI Agents
In src/gpt_computer_agent/agent/agent_executor.py around line 21, the fallback
import uses "from agent_tools import get_tools" which will fail because the
module resides under the agent package; change the fallback to "from
agent.agent_tools import get_tools" so the absolute import matches the package
layout and restores the non-package execution path.

Comment on lines 1 to 3
""
FastAPI web interface for the GPT Computer Agent.
"""

⚠️ Potential issue | 🔴 Critical

Fix the module docstring quoting so the app module loads.

Same issue as the other files: the docstring starts with "" and triggers a SyntaxError on import. Switch to proper triple quotes.

-"" 
-FastAPI web interface for the GPT Computer Agent.
-"""
+"""FastAPI web interface for the GPT Computer Agent."""
📝 Committable suggestion


Suggested change
""
FastAPI web interface for the GPT Computer Agent.
"""
"""FastAPI web interface for the GPT Computer Agent."""
🧰 Tools
🪛 Ruff (0.13.1)

2-2: invalid-syntax: "Simple statements must be separated by newlines or semicolons" (three times), "Compound statements are not allowed on the same line as simple statements", "Expected 'in', found name", "Expected ':', found name"

2-3: Expected an identifier (invalid-syntax)

🤖 Prompt for AI Agents
In src/gpt_computer_agent/api/app.py around lines 1 to 3, the module docstring
is incorrectly started with two double quotes (""), causing a SyntaxError on
import; change the docstring to use proper triple quotes (""" ... """) so the
module loads — replace the opening "" with """ and ensure the docstring is
closed with """ as well.

Comment on lines +1 to +48
""
SQLAlchemy models for user authentication.
"""
from datetime import datetime
from typing import Optional

from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
"""User model for authentication and authorization."""
__tablename__ = "users"

id = Column(Integer, primary_key=True, index=True)
username = Column(String(50), unique=True, index=True, nullable=False)
email = Column(String(255), unique=True, index=True, nullable=True)
hashed_password = Column(String(255), nullable=False)
full_name = Column(String(100), nullable=True)
is_active = Column(Boolean(), default=True)
is_superuser = Column(Boolean(), default=False)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
last_login = Column(DateTime, nullable=True)

def __repr__(self) -> str:
return f"<User {self.username}>"

@property
def is_authenticated(self) -> bool:
""
Return True if the user is authenticated.
"""
return True

@property
def is_anonymous(self) -> bool:
""
Return True if the user is anonymous.
"""
return False

def get_id(self) -> str:
""
Return the user ID as a string.
"""
return str(self.id)

⚠️ Potential issue | 🔴 Critical

Fix invalid docstrings that currently break module import.

Every module/method docstring here starts with "" but closes with """, which makes the file fail to import with a SyntaxError. Please convert these to proper triple-quoted docstrings so the module can even load.

-"" 
-SQLAlchemy models for user authentication.
-"""
+"""SQLAlchemy models for user authentication."""
@@
     @property
     def is_authenticated(self) -> bool:
-        ""
-        Return True if the user is authenticated.
-        """
+        """Return True if the user is authenticated."""
         return True
@@
     @property
     def is_anonymous(self) -> bool:
-        ""
-        Return True if the user is anonymous.
-        """
+        """Return True if the user is anonymous."""
         return False
@@
     def get_id(self) -> str:
-        ""
-        Return the user ID as a string.
-        """
+        """Return the user ID as a string."""
         return str(self.id)
@@
 class RefreshToken(Base):
-    ""
-    Refresh token model for JWT token refresh.
-    """
+    """Refresh token model for JWT token refresh."""
📝 Committable suggestion


Suggested change
""
SQLAlchemy models for user authentication.
"""
from datetime import datetime
from typing import Optional
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class User(Base):
"""User model for authentication and authorization."""
__tablename__ = "users"
id = Column(Integer, primary_key=True, index=True)
username = Column(String(50), unique=True, index=True, nullable=False)
email = Column(String(255), unique=True, index=True, nullable=True)
hashed_password = Column(String(255), nullable=False)
full_name = Column(String(100), nullable=True)
is_active = Column(Boolean(), default=True)
is_superuser = Column(Boolean(), default=False)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
last_login = Column(DateTime, nullable=True)
def __repr__(self) -> str:
return f"<User {self.username}>"
@property
def is_authenticated(self) -> bool:
""
Return True if the user is authenticated.
"""
return True
@property
def is_anonymous(self) -> bool:
""
Return True if the user is anonymous.
"""
return False
def get_id(self) -> str:
""
Return the user ID as a string.
"""
return str(self.id)
"""SQLAlchemy models for user authentication."""
from datetime import datetime
from typing import Optional
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class User(Base):
"""User model for authentication and authorization."""
__tablename__ = "users"
id = Column(Integer, primary_key=True, index=True)
username = Column(String(50), unique=True, index=True, nullable=False)
email = Column(String(255), unique=True, index=True, nullable=True)
hashed_password = Column(String(255), nullable=False)
full_name = Column(String(100), nullable=True)
is_active = Column(Boolean(), default=True)
is_superuser = Column(Boolean(), default=False)
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
last_login = Column(DateTime, nullable=True)
def __repr__(self) -> str:
return f"<User {self.username}>"
@property
def is_authenticated(self) -> bool:
"""Return True if the user is authenticated."""
return True
@property
def is_anonymous(self) -> bool:
"""Return True if the user is anonymous."""
return False
def get_id(self) -> str:
"""Return the user ID as a string."""
return str(self.id)
class RefreshToken(Base):
"""Refresh token model for JWT token refresh."""
# ... rest of RefreshToken implementation ...
🧰 Tools
🪛 Ruff (0.13.1)

2-41: an invalid-syntax cascade from the malformed docstrings: "Simple statements must be separated by newlines or semicolons", "Compound statements are not allowed on the same line as simple statements", "Expected 'in', found name/string", "Expected an identifier", "Expected an indented block after for statement", "Invalid assignment target", "Expected ':', found newline", "unindent does not match any outer indentation level", "Expected class, function definition or async function definition after decorator", and "Expected 'else', found name"

🤖 Prompt for AI Agents
In src/gpt_computer_agent/models/user.py lines 1 to 48, several docstrings open
with two quotes ("" ) but close with three ("""), causing a SyntaxError; update
each malformed docstring (the module-level docstring and the docstrings for
is_authenticated, is_anonymous, and get_id) so they use proper triple-quote
delimiters at the start (""" ... """) and keep the existing content unchanged.

Comment on lines +7 to +10
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

⚠️ Potential issue | 🔴 Critical

Use the shared Declarative Base from db.session.

This file spins up its own declarative_base(), so init_db() (which uses the project’s shared Base from db.session) will never see these models and therefore won’t create the users/refresh_tokens tables. Import the shared Base instead so metadata stays unified.

-from sqlalchemy.orm import declarative_base
-
-Base = declarative_base()
+from ..db.session import Base
📝 Committable suggestion


Suggested change
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from sqlalchemy.orm import declarative_base
Base = declarative_base()
from sqlalchemy import Boolean, Column, DateTime, Integer, String, Text
from ..db.session import Base
🤖 Prompt for AI Agents
In src/gpt_computer_agent/models/user.py around lines 7 to 10, the file creates
a new declarative_base() locally which prevents init_db() from seeing these
models; replace the local declarative_base() with an import of the shared Base
from the project's db.session module (remove the local Base = declarative_base()
and import Base from db.session) so the models register on the unified metadata
and the users/refresh_tokens tables are created by init_db().

Comment on lines +127 to 130
from . import MainWindow
except ImportError:
from gpt_computer_agent import MainWindow
from . import MainWindow
os.environ["QT_AUTO_SCREEN_SCALE_FACTOR"] = "1"

⚠️ Potential issue | 🔴 Critical

Fix the MainWindow imports in both code paths.

from . import MainWindow binds the module (or fails outright if __package__ is unset), so MainWindow() becomes a TypeError when you launch the app. In addition, the fallback path no longer works for running the script directly because it now performs the same relative import. Please restore the qualified imports.

Apply this diff:

-        from . import MainWindow
+        from .gpt_computer_agent import MainWindow
@@
-        from . import MainWindow
+        from gpt_computer_agent.gpt_computer_agent import MainWindow
📝 Committable suggestion


Suggested change
from . import MainWindow
except ImportError:
from gpt_computer_agent import MainWindow
from . import MainWindow
os.environ["QT_AUTO_SCREEN_SCALE_FACTOR"] = "1"
from .gpt_computer_agent import MainWindow
except ImportError:
from gpt_computer_agent.gpt_computer_agent import MainWindow
os.environ["QT_AUTO_SCREEN_SCALE_FACTOR"] = "1"
🤖 Prompt for AI Agents
In src/gpt_computer_agent/start.py around lines 127 to 130, the imports
currently use "from . import MainWindow" in both try and except branches, which
binds the module (or fails if __package__ is unset) and makes MainWindow() raise
a TypeError; change the try branch to import the class from the package module
(from .gpt_computer_agent import MainWindow) and the except ImportError branch to
the absolute form for direct script execution (from
gpt_computer_agent.gpt_computer_agent import MainWindow), leaving the subsequent
os.environ line unchanged.

Comment on lines +167 to +216
def setup_logging(
log_dir: Union[str, Path] = "logs",
log_level: str = "INFO",
log_format: str = "{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {name}:{function}:{line} - {message}",
retention: str = "30 days",
rotation: str = "100 MB",
serialize: bool = False
) -> None:
"""
Configure logging with loguru.

Args:
log_dir: Directory to store log files.
log_level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
log_format: Log format string.
retention: Log retention period.
rotation: Log rotation condition.
serialize: Whether to output logs as JSON.
"""
from loguru import logger
import sys

# Remove default handler
logger.remove()

# Add stderr handler
logger.add(
sys.stderr,
level=log_level,
format=log_format,
colorize=True,
backtrace=True,
diagnose=True
)

# Add file handler
log_dir = ensure_directory(log_dir)
log_file = log_dir / f"gpt_agent_{get_timestamp('%Y%m%d')}.log"

logger.add(
str(log_file),
level=log_level,
format=log_format,
rotation=rotation,
retention=retention,
enqueue=True,
backtrace=True,
diagnose=True,
serialize=serialize
)

⚠️ Potential issue | 🟠 Major

Disable Loguru diagnose/backtrace by default.

diagnose=True (and backtrace=True) makes Loguru dump full stack locals on exceptions. That’s great for debugging but it will happily spill API keys, tokens, and other sensitive request context straight into your logs in production. Please default these to False (or gate them behind explicit opts) so we don’t widen the blast radius of an incident.

-def setup_logging(
-    log_dir: Union[str, Path] = "logs",
-    log_level: str = "INFO",
-    log_format: str = "{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {name}:{function}:{line} - {message}",
-    retention: str = "30 days",
-    rotation: str = "100 MB",
-    serialize: bool = False
-) -> None:
+def setup_logging(
+    log_dir: Union[str, Path] = "logs",
+    log_level: str = "INFO",
+    log_format: str = "{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {name}:{function}:{line} - {message}",
+    retention: str = "30 days",
+    rotation: str = "100 MB",
+    serialize: bool = False,
+    backtrace: bool = False,
+    diagnose: bool = False,
+) -> None:
@@
-        colorize=True,
-        backtrace=True,
-        diagnose=True
+        colorize=True,
+        backtrace=backtrace,
+        diagnose=diagnose,
@@
-        backtrace=True,
-        diagnose=True,
-        serialize=serialize
+        backtrace=backtrace,
+        diagnose=diagnose,
+        serialize=serialize,
📝 Committable suggestion


Suggested change
def setup_logging(
    log_dir: Union[str, Path] = "logs",
    log_level: str = "INFO",
    log_format: str = "{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {name}:{function}:{line} - {message}",
    retention: str = "30 days",
    rotation: str = "100 MB",
    serialize: bool = False
) -> None:
    """
    Configure logging with loguru.
    Args:
        log_dir: Directory to store log files.
        log_level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
        log_format: Log format string.
        retention: Log retention period.
        rotation: Log rotation condition.
        serialize: Whether to output logs as JSON.
    """
    from loguru import logger
    import sys
    # Remove default handler
    logger.remove()
    # Add stderr handler
    logger.add(
        sys.stderr,
        level=log_level,
        format=log_format,
        colorize=True,
        backtrace=True,
        diagnose=True
    )
    # Add file handler
    log_dir = ensure_directory(log_dir)
    log_file = log_dir / f"gpt_agent_{get_timestamp('%Y%m%d')}.log"
    logger.add(
        str(log_file),
        level=log_level,
        format=log_format,
        rotation=rotation,
        retention=retention,
        enqueue=True,
        backtrace=True,
        diagnose=True,
        serialize=serialize
    )
def setup_logging(
    log_dir: Union[str, Path] = "logs",
    log_level: str = "INFO",
    log_format: str = "{time:YYYY-MM-DD HH:mm:ss.SSS} | {level: <8} | {name}:{function}:{line} - {message}",
    retention: str = "30 days",
    rotation: str = "100 MB",
    serialize: bool = False,
    backtrace: bool = False,
    diagnose: bool = False,
) -> None:
    """
    Configure logging with loguru.
    Args:
        log_dir: Directory to store log files.
        log_level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
        log_format: Log format string.
        retention: Log retention period.
        rotation: Log rotation condition.
        serialize: Whether to output logs as JSON.
        backtrace: Whether to capture tracebacks.
        diagnose: Whether to include local variables in tracebacks.
    """
    from loguru import logger
    import sys
    # Remove default handler
    logger.remove()
    # Add stderr handler
    logger.add(
        sys.stderr,
        level=log_level,
        format=log_format,
        colorize=True,
        backtrace=backtrace,
        diagnose=diagnose
    )
    # Add file handler
    log_dir = ensure_directory(log_dir)
    log_file = log_dir / f"gpt_agent_{get_timestamp('%Y%m%d')}.log"
    logger.add(
        str(log_file),
        level=log_level,
        format=log_format,
        rotation=rotation,
        retention=retention,
        enqueue=True,
        backtrace=backtrace,
        diagnose=diagnose,
        serialize=serialize
    )
🤖 Prompt for AI Agents
In src/gpt_computer_agent/utils/common.py around lines 167-216, the loguru
handlers currently enable backtrace=True and diagnose=True by default which can
leak sensitive data; change both calls to set backtrace=False and diagnose=False
by default, and expose optional function parameters (e.g., enable_backtrace:
bool = False, enable_diagnose: bool = False) or a single debug flag so callers
can opt-in to turning them on in non-production/debug runs; update the two
logger.add invocations to use these parameters instead of hardcoded True.

Comment on lines +1 to +3
""
Persistent storage for conversation history and agent state.
"""

⚠️ Potential issue | 🔴 Critical

Repair the module docstring to avoid a SyntaxError.

The opening docstring uses "" instead of """, so this module cannot be imported. Please fix the quoting.

-"" 
-Persistent storage for conversation history and agent state.
-"""
+"""Persistent storage for conversation history and agent state."""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
""
Persistent storage for conversation history and agent state.
"""
"""Persistent storage for conversation history and agent state."""
🧰 Tools
🪛 Ruff (0.13.1)

2-2: Simple statements must be separated by newlines or semicolons

(invalid-syntax)


2-2: Compound statements are not allowed on the same line as simple statements

(invalid-syntax)


2-2: Expected 'in', found name

(invalid-syntax)


2-2: Expected ':', found name

(invalid-syntax)


2-3: Expected an identifier

(invalid-syntax)

🤖 Prompt for AI Agents
In src/gpt_computer_agent/utils/storage.py around lines 1 to 3, the module
docstring opens with two double quotes (""), causing a SyntaxError; replace the
opening delimiter with a proper triple-quote (""") and ensure there's a matching
closing triple-quote so the docstring is a valid triple-quoted string and the
existing text "Persistent storage for conversation history and agent state."
remains inside it.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 62

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
bump.py (1)

21-25: Fix: point version bump to the src-based package

The repository now uses a src/ layout (pyproject + uv packaging noted in the PR description), so the canonical __init__.py lives at src/aideck/__init__.py. aideck/__init__.py no longer exists at the repo root, and running python bump.py patch will raise FileNotFoundError. Please switch both the read and write paths to the src location (or otherwise derive the path from the installed module).

-    with open("aideck/__init__.py", "r") as file:
+    with open("src/aideck/__init__.py", "r") as file:
-    with open("aideck/__init__.py", "r+") as file:
+    with open("src/aideck/__init__.py", "r+") as file:

Also applies to: 66-70

src/aideck/gui/llmsettings.py (1)

268-299: Add missing optional-dependencies for local_tts and local_stt
Under [project.optional-dependencies] in pyproject.toml, define extras named local_tts and local_stt so that aideck[local_tts] and aideck[local_stt] resolve correctly.

src/aideck/api.py (1)

86-103: Revert to symbol-level imports for the input box

from . import the_input_box returns the aideck.the_input_box module. Lines 88‑101 then call the_input_box.toPlainText(), assuming it’s the widget instance, which triggers AttributeError: module 'aideck.the_input_box' has no attribute 'toPlainText'. Import the actual object instead of the module.

Apply this diff:

-    from . import the_main_window, the_input_box
+    from .the_main_window import the_main_window
+    from .the_input_box import the_input_box
src/aideck/audio/tts.py (2)

36-39: Fix potential IndexError when only one speaker is configured.

With a single entry in supported_openai_speakers (currently only "fable"), random.choice([]) will raise. Fall back to the excluded voice when no alternative exists.

Apply:

 def random_model(exclude):
-    models = supported_openai_speakers.copy()
-    models.remove(exclude)
-    return random.choice(models)
+    models = [m for m in supported_openai_speakers if m != exclude]
+    return random.choice(models) if models else exclude

43-51: Avoid cache collisions across voices/providers; include them in the hash.

Hashing only the text means a file generated with a different voice or provider can be reused by mistake. Include tts_setting and voice in the hash and remove the redundant the_model line.

-    sha = hashlib.sha256(text_chunk.encode()).hexdigest()
-    location = os.path.join(artifacts_dir, f"{sha}.mp3")
+    tts_setting = load_tts_model_settings()
+    sha = hashlib.sha256(f"{tts_setting}:{voice}:{text_chunk}".encode()).hexdigest()
+    location = os.path.join(artifacts_dir, f"{sha}.mp3")
@@
-        the_model = load_model_settings()
-        tts_setting = load_tts_model_settings()
+        # the_model not used; removing avoids confusion
🧹 Nitpick comments (90)
src/aideck/context/agent.py (1)

8-19: Add the missing return annotation.

This helper always returns a JSON string, but without a -> str annotation the signature degrades to Any, which weakens type checking for downstream code.

Apply this diff:

-def turn_agent_to_string(agent: Agent):
+def turn_agent_to_string(agent: Agent) -> str:
src/aideck/cache/cache_manager.py (1)

158-175: Limit the LLM comparison batch to prevent prompt blow-ups

This prompt grows without bound as the cache fills up. Once you have dozens of cached entries, every lookup pushes a huge prompt to the LLM—blowing context limits, driving costs, and turning cache hits into failures. Please cap the batch size (oldest-first eviction or top‑N selection) before building the prompt.
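
One way to cap it, as a minimal sketch (MAX_COMPARISON_ENTRIES and entry.created_at are assumed names, not the real cache API):

MAX_COMPARISON_ENTRIES = 20  # assumed budget; tune to the model's context window

def select_comparison_batch(entries):
    """Send at most N of the newest cache entries to the LLM comparison prompt."""
    newest_first = sorted(entries, key=lambda e: e.created_at, reverse=True)
    return newest_first[:MAX_COMPARISON_ENTRIES]

Older entries can be evicted at the same point to keep the cache itself bounded.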

.github/workflows/publish.yml (1)

30-31: Make publish non-interactive

Ensure CI doesn’t hang on confirmations.

-      - name: Publish
-        run: uv publish -t ${{ secrets.THE_PYPI_TOKEN }}
+      - name: Publish
+        run: uv publish -y -t ${{ secrets.THE_PYPI_TOKEN }}
src/aideck/agent/base.py (1)

3-12: Silence B024 or add a minimal abstract API

Ruff warns ABC with no abstract methods (B024). If this class is a pure marker to break cycles, silence the rule; otherwise, add a small abstract method.

Option A (silence rule):

-class BaseAgent(ABC):
+class BaseAgent(ABC):  # noqa: B024

Option B (enforce minimal interface):

-from abc import ABC
+from abc import ABC, abstractmethod
@@
-class BaseAgent(ABC):
+class BaseAgent(ABC):
@@
-    pass
+    @abstractmethod
+    def run(self, *args, **kwargs):
+        """Execute the agent; concrete subclasses must implement."""
+        raise NotImplementedError
.env.example (1)

15-16: Align DB filename with the new project name

Reflect the rebrand for clarity in local setups.

-DATABASE_URL=sqlite:///./gpt_agent.db
+DATABASE_URL=sqlite:///./aideck.db
src/aideck/agent/context_managers/memory_manager.py (1)

80-82: Avoid truthiness bug and unnecessary task; don’t let finally overshadow primary errors

  • If model_response is falsy but valid (e.g., empty list), updates are skipped. Check against None.
  • create_task(...); await task is redundant; just await the coroutine.
  • Consider protecting the finally block to avoid masking exceptions from inside the async with body.
-            if self.memory and self._model_response:
-                task = asyncio.create_task(self.memory.update_memories_after_task(self._model_response))
-                await task
+            if self.memory and self._model_response is not None:
+                try:
+                    await self.memory.update_memories_after_task(self._model_response)
+                except Exception as e:
+                    # TODO: replace with proper logger when available
+                    print(f"Warning: memory update failed: {e}")
src/aideck/agent/context_managers/storage_manager.py (3)

48-49: Avoid truthiness bug when deciding to persist

Persist should run when a response exists, even if it’s falsy (e.g., []). Check against None.

-            if self.agent.storage and self.model_response:
+            if self.agent.storage and self.model_response is not None:
                 self._save_to_storage()

40-42: Don’t catch broad Exception; avoid print in libraries

Limit to expected exceptions (e.g., pydantic ValidationError/TypeError) and log a warning instead of printing.

-                except Exception as e:
-                    print(f"Warning: Could not validate stored history. Starting fresh. Error: {e}")
+                except (Exception,) as e:  # TODO: narrow to pydantic ValidationError/TypeError
+                    # TODO: replace with structured logger
+                    print(f"Warning: invalid stored history; starting fresh. Error: {e}")
                     self.message_history = []

69-74: Consider appending instead of overwriting memory (if sessions are multi-run)

Currently replaces memory with the latest run, losing prior runs. If design requires multi-run history, append and cap size.

-            updated_session_data['memory'] = [all_messages_as_dicts]
+            existing = updated_session_data.get('memory') or []
+            updated_session_data['memory'] = (existing + [all_messages_as_dicts])[-10:]
src/aideck/embeddings/openai_provider.py (6)

168-190: Rate-limit state is not concurrency-safe

Concurrent calls mutate _request_times and _token_usage without coordination, leading to inaccurate throttling.

  • Add self._rl_lock = asyncio.Lock() in __init__.
  • Wrap the entire _check_rate_limits body with async with self._rl_lock:.
@@
     def __init__(self, config: Optional[OpenAIEmbeddingConfig] = None, **kwargs):
@@
         self._token_usage: List[tuple] = []
+        self._rl_lock = asyncio.Lock()
@@
     async def _check_rate_limits(self, estimated_tokens: int) -> None:
@@
-        current_time = time.time()
+        async with self._rl_lock:
+            current_time = time.time()
@@
-        self._request_times.append(current_time)
-        self._token_usage.append((current_time, estimated_tokens))
+            self._request_times.append(current_time)
+            self._token_usage.append((current_time, estimated_tokens))

229-241: Tighten exception handling for retry-after parsing

Avoid bare except, and prefer raise chaining for context.

-        except openai.RateLimitError as e:
+        except openai.RateLimitError as e:
             wait_time = 60
-            if "retry-after" in str(e):
-                try:
+            if "retry-after" in f"{e!s}":
+                try:
                     import re
-                    wait_match = re.search(r'retry-after[:\s]+(\d+)', str(e), re.IGNORECASE)
+                    wait_match = re.search(r'retry-after[:\s]+(\d+)', f"{e!s}", re.IGNORECASE)
                     if wait_match:
                         wait_time = int(wait_match.group(1))
-                except:
-                    pass
+                except (Exception) as _parse_err:
+                    # ignore parse errors; use default wait_time
+                    wait_time = wait_time
 
             await asyncio.sleep(min(wait_time, 300))
             raise

244-255: Preserve original tracebacks and avoid redundant str() in messages

Use raise ... from e and format with !s to satisfy linters.

-        except openai.AuthenticationError as e:
-            raise ConfigurationError(
-                f"OpenAI authentication failed: {str(e)}",
+        except openai.AuthenticationError as e:
+            raise ConfigurationError(
+                f"OpenAI authentication failed: {e!s}",
                 error_code="AUTHENTICATION_ERROR",
-                original_error=e
-            )
+                original_error=e
+            ) from e
@@
-        except openai.BadRequestError as e:
-            raise ModelConnectionError(
-                f"OpenAI API request error: {str(e)}",
+        except openai.BadRequestError as e:
+            raise ModelConnectionError(
+                f"OpenAI API request error: {e!s}",
                 error_code="API_REQUEST_ERROR", 
-                original_error=e
-            )
+                original_error=e
+            ) from e

257-262: Avoid catching bare Exception; preserve context

Catching all exceptions hides actionable error types. If keeping the catch-all, chain the exception.

-        except Exception as e:
-            raise ModelConnectionError(
-                f"OpenAI embedding failed: {str(e)}",
+        except Exception as e:
+            raise ModelConnectionError(
+                f"OpenAI embedding failed: {e!s}",
                 error_code="EMBEDDING_FAILED",
-                original_error=e
-            )
+                original_error=e
+            ) from e

270-272: Use logging instead of print in library code

Replace prints with structured logging or a provided logger.

-            print(f"OpenAI connection validation failed: {str(e)}")
+            # TODO: replace with structured logger
+            print(f"OpenAI connection validation failed: {e!s}")

35-36: Pydantic v2: prefer field_validator over validator

To avoid deprecation warnings and future breaks, use @field_validator("model_name").

Based on learnings

-from pydantic import Field, validator
+from pydantic import Field, field_validator
@@
-    @validator('model_name')
-    def validate_model_name(cls, v):
+    @field_validator('model_name')
+    def validate_model_name(cls, v: str):
         """Validate and auto-correct model names."""
src/aideck/loaders/json.py (4)

6-6: Guard optional jq dependency with a clear error

Importing jq at module import time will hard-fail projects that don’t use this loader. Provide a helpful message.

-import jq
+try:
+    import jq
+except ImportError as e:
+    raise ImportError("jq package is required for JSONLoader. Install with: pip install jq") from e

33-39: Avoid broad Exception and chain errors in jq execution

Narrow exception and preserve traceback.

-        except Exception as e:
-            raise ValueError(f"Error executing JQ query '{query}': {e}")
+        except Exception as e:
+            raise ValueError(f"Error executing JQ query '{query}': {e!s}") from e

72-78: Ensure record selector yields a list

If a selector returns a single object, iterating can misbehave (e.g., strings). Normalize to list.

-        records = self._execute_jq_query(self.config.record_selector, data)
+        records = self._execute_jq_query(self.config.record_selector, data)
+        if not isinstance(records, list):
+            records = [records]

105-106: Narrow exception and preserve context in loader

Catching all exceptions may hide actionable errors; at least chain the exception.

-        except Exception as e:
-            return self._handle_loading_error(str(file_path), e)
+        except Exception as e:
+            return self._handle_loading_error(str(file_path), e)

Note: If _handle_loading_error logs/handles types, keeping the broad catch is acceptable; ensure it reports the original exception.

src/aideck/loaders/config.py (1)

340-354: Annotate mutable class attribute with ClassVar

Prevents pydantic treating it as a model field.

-from typing import Dict, Any, Optional, List, Literal
+from typing import Dict, Any, Optional, List, Literal, ClassVar
@@
-    _config_map: Dict[str, type] = {
+    _config_map: ClassVar[Dict[str, type]] = {
src/aideck/loaders/text.py (4)

69-73: Bound concurrent file tasks to avoid too many open files.

Unbounded gather() can overwhelm the OS on large batches. Gate with a small semaphore (e.g., 64).

Example (no committable diff; a sketch follows the steps below):

  • Create a semaphore = asyncio.Semaphore(64)
  • Wrap _process_single_file with a small async that acquires/releases the semaphore before awaiting.
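
A minimal sketch of that wrapper, assuming _process_single_file(self, path) is the existing coroutine and 64 is just an illustrative limit:

import asyncio

_FILE_GATE = asyncio.Semaphore(64)

async def _process_single_file_limited(self, path):
    # Hold the gate while the file is open; released automatically on exit.
    async with _FILE_GATE:
        return await self._process_single_file(path)

# results = await asyncio.gather(*(self._process_single_file_limited(p) for p in paths))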

89-95: Duplicate-source handling is inconsistent vs other loaders.

Here duplicates become error docs via _handle_loading_error, while XMLLoader re-raises FileExistsError. Align behavior across loaders (prefer re-raise or quietly return []) to keep API predictable.
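
If you go the quiet-skip route, the shape is the same idempotent check suggested for HTMLLoader further down:

if document_id in self._processed_document_ids:
    return []  # already processed; skip quietly instead of emitting error docs
self._processed_document_ids.add(document_id)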


96-101: Prefer keyword args for Path.read_text for clarity.

Make the fallback explicit and consistent with the aiofiles path.

Apply this diff:

-                content = await asyncio.to_thread(path.read_text, self.config.encoding, "ignore")
+                content = await asyncio.to_thread(
+                    path.read_text, encoding=self.config.encoding, errors="ignore"
+                )

115-116: Avoid blanket except; narrow or at least log error type.

Catching Exception obscures actionable errors; at minimum, log exception class and path.

src/aideck/embeddings/azure_openai_provider.py (3)

103-113: Preserve exception chaining when wrapping managed identity errors.

Use “raise … from e” to keep the original traceback; drop the bare broad except.

Apply this diff:

-        try:
-            if self.config.client_id:
-                self._credential = ManagedIdentityCredential(client_id=self.config.client_id)
-            else:
-                self._credential = DefaultAzureCredential()
-        except Exception as e:
-            raise ConfigurationError(
-                f"Failed to setup Azure Managed Identity: {str(e)}",
-                error_code="AZURE_AUTH_ERROR",
-                original_error=e
-            )
+        try:
+            if self.config.client_id:
+                self._credential = ManagedIdentityCredential(client_id=self.config.client_id)
+            else:
+                self._credential = DefaultAzureCredential()
+        except Exception as e:
+            raise ConfigurationError(
+                f"Failed to setup Azure Managed Identity: {e}",
+                error_code="AZURE_AUTH_ERROR",
+                original_error=e
+            ) from e

325-330: Preserve original stack on generic error wrap.

Use exception chaining to keep context.

Apply this diff:

-        except Exception as e:
-            raise ModelConnectionError(
-                f"Azure OpenAI embedding failed: {str(e)}",
-                error_code="AZURE_EMBEDDING_FAILED",
-                original_error=e
-            )
+        except Exception as e:
+            raise ModelConnectionError(
+                f"Azure OpenAI embedding failed: {e}",
+                error_code="AZURE_EMBEDDING_FAILED",
+                original_error=e
+            ) from e

332-340: Avoid print in library code; use structured logging.

Replace prints with your logger (e.g., loguru) as per project logging setup.

src/aideck/loaders/xml.py (2)

57-61: Limit concurrent file parses.

As with TextLoader, bound gather() to a sensible concurrency to avoid FD exhaustion.


89-91: Trim long exception messages (TRY003).

Prefer a concise message or a custom exception; long f-strings in raises are noisy.
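
For example, a small dedicated exception keeps the raise site terse (XMLParseError is a hypothetical name):

class XMLParseError(ValueError):
    """Raised when an XML source cannot be parsed."""

    def __init__(self, path, cause):
        super().__init__(f"failed to parse {path}")
        self.cause = cause

# usage: raise XMLParseError(file_path, e) from e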

src/aideck/embeddings/ollama_provider.py (3)

101-104: requests.Session.timeout assignment has no effect.

Requests has no session-wide timeout; pass per-request (which you already do). Remove the dead assignment.

Apply this diff:

         self.session = requests.Session()
-        
-        self.session.timeout = (self.config.connection_timeout, self.config.request_timeout)

432-444: Preserve original errors when wrapping; specialize connection failures.

Map aiohttp connector/timeouts explicitly; chain exceptions to maintain tracebacks.

Apply this diff:

-        except Exception as e:
-            if "connection" in str(e).lower():
-                raise ModelConnectionError(
-                    f"Failed to connect to Ollama server at {self.config.base_url}: {str(e)}",
-                    error_code="OLLAMA_CONNECTION_ERROR",
-                    original_error=e
-                )
-            else:
-                raise ModelConnectionError(
-                    f"Ollama embedding failed: {str(e)}",
-                    error_code="OLLAMA_EMBEDDING_ERROR",
-                    original_error=e
-                )
+        except asyncio.TimeoutError as e:
+            raise ModelConnectionError(
+                f"Timed out connecting to {self.config.base_url}",
+                error_code="OLLAMA_TIMEOUT",
+                original_error=e
+            ) from e
+        except Exception as e:
+            msg = str(e).lower()
+            code = "OLLAMA_CONNECTION_ERROR" if "connection" in msg else "OLLAMA_EMBEDDING_ERROR"
+            raise ModelConnectionError(
+                f"Ollama embedding failed: {e}",
+                error_code=code,
+                original_error=e
+            ) from e

356-366: Consider batching concurrently when texts > 1.

Current loop sends requests sequentially; use asyncio.gather with bounded concurrency to speed up.
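
A sketch of the bounded fan-out, assuming a per-text helper such as self._embed_single(text) (not the current method name):

import asyncio

async def _embed_batch(self, texts, limit: int = 8):
    sem = asyncio.Semaphore(limit)

    async def embed_one(text):
        async with sem:
            return await self._embed_single(text)

    # gather preserves input order; at most `limit` requests are in flight.
    return await asyncio.gather(*(embed_one(t) for t in texts))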

src/aideck/loaders/yaml.py (3)

83-92: Duplicate-source handling: re-raise or return [].

As with Text/XML loaders, align behavior for duplicates. Right now you raise then immediately catch and convert to error docs.


6-15: Make jq dependency optional with a graceful fallback.

Hard error on missing pyjq limits adoption. If split_by_jq_query is trivial (e.g., “.”), allow operating without jq.
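
A sketch of the graceful degradation (run_selector is an illustrative helper name):

try:
    import pyjq
except ImportError:
    pyjq = None

def run_selector(query, data):
    if query in (None, "", "."):
        return data  # the identity selector needs no jq at all
    if pyjq is None:
        raise ImportError("pyjq is required for non-trivial selectors; install with: pip install pyjq")
    return pyjq.all(query, data)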


63-66: Bound concurrency when processing many YAML files.

As with other loaders, limit gather() fan-out to avoid FD exhaustion.

src/aideck/loaders/csv.py (4)

158-206: Avoid reimplementing CSV parsing in async; offload sync loader to a thread

The async implementation reads the entire file into memory and duplicates logic. Delegate to the tested sync path without blocking the event loop.

-    async def _aload_single_file(self, file_path: Path) -> List[Document]:
-        """Async helper to load documents from a single CSV file."""
-        documents = []
-        try:
-            document_id = self._generate_document_id(file_path)
-            if document_id in self._processed_document_ids:
-                raise FileExistsError(
-                    f"Source file '{file_path.resolve()}' has already been processed."
-                )
-            self._processed_document_ids.add(document_id)
-
-            if not self._check_file_size(file_path):
-                return []
-
-            async with aiofiles.open(file_path, mode="r", encoding=self.config.encoding or "utf-8", newline="") as f:
-                content_str = await f.read()
-                file_lines = content_str.splitlines()
-
-                if self.config.has_header:
-                    reader = csv.DictReader(
-                        file_lines,
-                        delimiter=self.config.delimiter,
-                        quotechar=self.config.quotechar,
-                    )
-                else:
-                    standard_reader = csv.reader(
-                        file_lines,
-                        delimiter=self.config.delimiter,
-                        quotechar=self.config.quotechar,
-                    )
-                    reader = (
-                        {f"column_{i}": val for i, val in enumerate(row)}
-                        for row in standard_reader
-                    )
-                
-                all_rows = []
-                for i, row in enumerate(reader):
-                    content = self._synthesize_content(row)
-                    if self.config.skip_empty_content and not content.strip():
-                        continue
-                    all_rows.append(content)
-                
-                if all_rows:
-                    documents.extend(self._create_documents_from_rows(all_rows, file_path, document_id))
-        except Exception as e:
-            docs_from_error = self._handle_loading_error(str(file_path), e)
-            documents.extend(docs_from_error)
-
-        return documents
+    async def _aload_single_file(self, file_path: Path) -> List[Document]:
+        """Async helper to load documents from a single CSV file without blocking the event loop."""
+        if not self._check_file_size(file_path):
+            return []
+        return await asyncio.to_thread(self._load_single_file, file_path)

131-136: Rename unused loop variable

Avoid B007 warnings; use _i to signal intentional unused variable.

-                for i, row in enumerate(reader):
+                for _i, row in enumerate(reader):
@@
-                for i, row in enumerate(reader):
+                for _i, row in enumerate(reader):

Also applies to: 194-199


139-141: Catch narrower exceptions

Catching Exception hides actionable errors. Limit to csv.Error, UnicodeDecodeError, OSError, ValueError.

-        except Exception as e:
+        except (csv.Error, UnicodeDecodeError, OSError, ValueError) as e:
             docs_from_error = self._handle_loading_error(str(file_path), e)
             documents.extend(docs_from_error)

106-112: Avoid marking files as processed before successful load

Adding to _processed_document_ids before parsing prevents retries after transient failures. Move the add into the success path.

@@
-            self._processed_document_ids.add(document_id)
+            # Mark as processed after successful parse
@@
-                if all_rows:
-                    documents.extend(self._create_documents_from_rows(all_rows, file_path, document_id))
+                if all_rows:
+                    self._processed_document_ids.add(document_id)
+                    documents.extend(self._create_documents_from_rows(all_rows, file_path, document_id))

Also applies to: 137-139

src/aideck/loaders/base.py (1)

77-81: Prefer SHA‑256 over MD5 for IDs

MD5 triggers linters and is discouraged. Document IDs don’t need speed over safety; sha256 is fine.

-        return hashlib.md5(absolute_path_str.encode("utf-8")).hexdigest()
+        return hashlib.sha256(absolute_path_str.encode("utf-8")).hexdigest()
src/aideck/knowledge_base/knowledge_base.py (3)

270-273: Prefer SHA‑256 for content hash IDs

Avoid MD5; use sha256 for deterministic content-based IDs.

-        content_hash = hashlib.md5(content.encode("utf-8")).hexdigest()
+        content_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()

371-371: Remove unnecessary f‑string

Avoid F541 lint: f-string without placeholders.

-                    print(f"      Detected direct content, creating document directly...")
+                    print("      Detected direct content, creating document directly...")

747-766: Avoid complex async teardown in __del__

Running event-loop operations in __del__ is fragile. Prefer an explicit close() in user code; in the destructor, only log a warning.
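
A sketch of that split, assuming the class keeps a _closed flag and exposes an async close():

import warnings

async def close(self) -> None:
    # Release vector-store handles, flush pending work, then mark done.
    self._closed = True

def __del__(self):
    if not getattr(self, "_closed", True):
        # No event-loop work here; just make the leak visible.
        warnings.warn("KnowledgeBase was garbage-collected without close()", ResourceWarning)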

src/aideck/embeddings/bedrock_provider.py (3)

363-373: Preserve exception chaining for diagnostics

Use “raise … from e” when wrapping to keep original traceback.

-                raise ConfigurationError(
+                raise ConfigurationError(
                     f"Bedrock model error: {error_message}",
                     error_code=f"BEDROCK_{error_code.upper()}",
                     original_error=e
-                )
+                ) from e
@@
-                raise ModelConnectionError(
+                raise ModelConnectionError(
                     f"Bedrock API error ({error_code}): {error_message}",
                     error_code=f"BEDROCK_{error_code.upper()}",
                     original_error=e
-                )
+                ) from e
@@
-            raise ModelConnectionError(
+            raise ModelConnectionError(
                 f"Bedrock embedding failed: {str(e)}",
                 error_code="BEDROCK_EMBEDDING_ERROR",
                 original_error=e
-            )
+            ) from e

Also applies to: 369-373, 376-380


283-284: Rename unused parameter

Avoid ARG002 lint by prefixing with underscore.

-    def _extract_embeddings(self, response_body: Dict[str, Any], num_texts: int) -> List[List[float]]:
+    def _extract_embeddings(self, response_body: Dict[str, Any], _num_texts: int) -> List[List[float]]:

41-75: Pydantic v2: prefer field_validator

validator works but is deprecated in v2; migrate to field_validator for forward compatibility.

Based on learnings

Also applies to: 63-75

src/aideck/embeddings/gemini_provider.py (2)

457-461: Batch path also blocks the event loop

Wrap embed_content in asyncio.to_thread in batch loop.

-                response = self.client.models.embed_content(
-                    model=self.config.model_name,
-                    contents=text,
-                    config=config
-                )
+                response = await asyncio.to_thread(
+                    lambda: self.client.models.embed_content(
+                        model=self.config.model_name,
+                        contents=text,
+                        config=config
+                    )
+                )

63-112: Pydantic v2: migrate validators

Switch to field_validator and model_validator for forward compatibility.

Based on learnings

DEVELOPMENT.md (1)

103-119: Annotate the project-structure fence with a language hint.

Adding an explicit language (e.g., text) quiets markdownlint MD040 and makes downstream renderers handle the block consistently.

-```
+```text
 aideck/
 ├── src/aideck/           # Main source code
 │   ├── api/             # FastAPI routes
@@
 └── scripts/            # Utility scripts
src/aideck/gpt_computer_agent.py (1)

3-28: Drop the duplicate agent_executor wildcard import.

We now pull in agent_executor twice in both the package and fallback branches. The second import buys us nothing and just adds noise.

-    from .agent.agent_executor import *
-
-    from .agent.agent_executor import *
+    from .agent.agent_executor import *
@@
-    from agent.agent_executor import *
-
-    from agent.agent_executor import *
+    from agent.agent_executor import *
CLAUDE.md (1)

71-75: Clarify the recovery command sequence.

uv run requires a command; running it bare will just error. Spell out the two steps (uninstall, then rerun the target command) so readers aren’t left with a broken snippet.

-```python
-uv pip uninstall aideck && uv run 
-```
+```bash
+uv pip uninstall aideck
+uv run python -m aideck
+```
notebooks/README.md (1)

97-105: Specify a language for the directory tree fence

markdownlint (MD040) is flagging this fence because it lacks a language hint. Please mark it as plain text so docs linting stays green. Based on static analysis hints.

-```
+```text
 notebooks/
 ├── stock_report.ipynb              # Original notebook
 └── stock_report_improved.ipynb     # Enhanced version
@@
-```
+```
.claude/agents/unittest-generator.md (2)

115-148: Add language tags to the guidance fences

Multiple instruction fences in this section are missing a language tag, which trips markdownlint (MD040). Please tag them as plain text so linting succeeds. Based on static analysis hints.

-```
+```text
 Generate unit tests for the OpenAIEmbedding class in embeddings/openai_provider.py
 - Include initialization tests
 - Test embedding generation
 - Test error handling
 - Test configuration validation
@@
-```
+```

-```
+```text
 Create integration tests for the chat API endpoints
 - Test successful chat requests
 - Test error responses
 - Test authentication
 - Test rate limiting
@@
-```
+```

-```
+```text
 Generate tests for the storage layer
 - Test CRUD operations
 - Test connection handling
 - Test transaction management
 - Test error scenarios
@@
-```
+```

-```
+```text
 Create tests for async functions in the reliability layer
 - Test retry mechanisms
 - Test circuit breaker patterns
 - Test async error handling
 - Test concurrent operations
-```
+```

313-329: Tag the directory layout fence as text

The directory structure fence also needs a language hint to satisfy markdownlint (MD040). Based on static analysis hints.

-```
+```text
 tests/
 ├── unit/
 │   ├── test_embeddings.py
@@
 └── conftest.py
-```
+```
src/aideck/audio/tts.py (2)

18-23: Implement a real availability check for local TTS.

This always returns True, defeating the guard. At minimum, probe that the provider module/deps are importable.

 def is_local_tts_available():
-    try:
-        return True
-    except:
-        return False
+    try:
+        import importlib
+        # Replace with the actual package(s) your microsoft_local provider needs
+        return (
+            importlib.util.find_spec("pyttsx3") is not None
+            or importlib.util.find_spec("edge_tts") is not None
+        )
+    except Exception:
+        return False

Optionally, also wrap tts_microsoft_local(...) in try/except and log a clear error if it fails.


2-5: Prefer explicit imports over wildcard imports.

Wildcard imports hinder readability and static analysis. Import only what’s needed from llm/utils and the providers.

Also applies to: 7-10

src/aideck/context/sources.py (1)

2-3: Constrain retrieval_mode with Literal for safer typing.

Limit to allowed values at type level; optional validation later if needed.

-from pydantic import BaseModel
-from typing import Optional, TYPE_CHECKING
+from pydantic import BaseModel
+from typing import Optional, Literal
@@
-    retrieval_mode: str = "full"  # Options: "full", "summary". Might be useful
+    retrieval_mode: Literal["full", "summary"] = "full"

Also applies to: 22-23

src/aideck/agent/context_managers/task_manager.py (1)

3-14: Add type hints and make end-of-task behavior explicit/safe on exceptions.

Ensure task_end always runs; only set response on successful path.

-class TaskManager:
-    def __init__(self, task, agent):
-        self.task = task
-        self.agent = agent
-        self.model_response = None
+from typing import Any, Optional
+
+class TaskManager:
+    def __init__(self, task: Any, agent: Any):
+        self.task = task
+        self.agent = agent
+        self.model_response: Optional[Any] = None
@@
-    async def manage_task(self):
+    async def manage_task(self):
         # Start the task
         self.task.task_start(self.agent)
-        
-        try:
-            yield self
-        finally:
-            # Set task response and end the task if we have a model response
-            if self.model_response is not None:
-                self.task.task_response(self.model_response)
-                self.task.task_end() 
+        ok = False
+        try:
+            yield self
+            ok = True
+        except Exception:
+            # Optionally: self.task.task_error(e) if your Task supports it
+            raise
+        finally:
+            if ok and self.model_response is not None:
+                self.task.task_response(self.model_response)
+            self.task.task_end()

Are task_start/task_response/task_end synchronous? If any are async, we should await them or provide async variants.

Also applies to: 16-27

src/aideck/gui/settings.py (1)

256-257: Consider matching the wake-word copy

Most of the UI copy now refers to “Aideck Tiger Tools,” but this message still mentions “Her Computer” and “jarvis.” Please double-check that these wake words are still correct for the current branding, or update them accordingly.

src/aideck/agent/context_managers/llm_manager.py (1)

20-31: Remove unused Celery locals and log failures

The Celery override block assigns task_id, task_args, and task_kwargs but only task_kwargs is used. Also, a bare except Exception: pass will swallow unexpected errors with no visibility. Please drop the unused assignments and add at least debug-level logging (or narrow the exception) so misconfigurations don’t go unnoticed.

-                task_id = current_task.request.id
-                task_args = current_task.request.args
-                task_kwargs = current_task.request.kwargs
-
-
-                if task_kwargs.get("bypass_llm_model", None) is not None:
-                    model = task_kwargs.get("bypass_llm_model")
-
-            except Exception as e:
-                pass
+                task_kwargs = current_task.request.kwargs
+                bypass_model = task_kwargs.get("bypass_llm_model")
+                if bypass_model:
+                    model = bypass_model
+            except ImportError:
+                # Celery not installed; ignore.
+                pass
+            except AttributeError:
+                # Not running inside a Celery task; ignore.
+                pass
src/aideck/agent/context_managers/__init__.py (1)

9-17: Alphabetize __all__ or justify ordering

The exported list isn’t sorted, which can cause churn in future diffs. Please alphabetize the names (or add a comment explaining intentional ordering) to keep the surface tidy.

src/aideck/loaders/docx.py (1)

117-118: Avoid silent blanket exception

Catching Exception and passing errors to _handle_loading_error without context obscures actionable details. Narrow the exception types you expect (e.g., docx.opc.exceptions.PackageNotFoundError, FileNotFoundError) or at least log before delegating so callers can triage failures.
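
For instance, with the exception named above (assuming the loader's existing _load_single_file and _handle_loading_error helpers):

import logging
from docx.opc.exceptions import PackageNotFoundError

logger = logging.getLogger(__name__)

try:
    documents = self._load_single_file(file_path)
except (PackageNotFoundError, FileNotFoundError, PermissionError) as e:
    logger.warning("DOCX load failed for %s: %s", file_path, e)
    documents = self._handle_loading_error(str(file_path), e)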

notebooks/stock_report_improved.ipynb (2)

621-673: Remove hidden global dependencies in save_report.

save_report reads market_report, research_report, portfolio_report from outer scope, which is brittle in notebooks and hard to reuse/test.

Apply this diff and adjust the call:

-def save_report(report, config):
+def save_report(report, config, market_report=None, research_report=None, portfolio_report=None):
@@
-        report_data = {
+        report_data = {
             "timestamp": timestamp,
             "companies": config.companies,
             "config": {
                 "risk_tolerance": config.risk_tolerance,
                 "investment_horizon": config.investment_horizon,
                 "report_format": config.report_format
             },
             "reports": {
-                "market_analysis": market_report,
-                "research_analysis": research_report,
-                "portfolio_strategy": portfolio_report
+                "market_analysis": market_report,
+                "research_analysis": research_report,
+                "portfolio_strategy": portfolio_report
             },
             "comprehensive_report": report
         }

And where it is called:

-    save_report(comprehensive_report, config)
+    save_report(
+        comprehensive_report,
+        config,
+        market_report=market_report,
+        research_report=research_report,
+        portfolio_report=portfolio_report,
+    )

626-631: Sanitize filenames to avoid OS path issues.

Ticker lists can include characters that aren’t safe for filenames. Sanitize to alphanumerics, dash and underscore.

Apply this diff:

-    companies_short = config.companies.replace(', ', '_')[:50]
+    import re
+    companies_short = re.sub(r'[^A-Za-z0-9_-]+', '_', config.companies.replace(', ', '_'))[:50]
src/aideck/loaders/html.py (5)

30-33: Avoid MD5 for identifiers; prefer SHA‑256 to satisfy security scanners.

Even if non-cryptographic, MD5 triggers SAST policies. SHA‑256 avoids noise with negligible cost.

Apply this diff:

-    def _generate_document_id(self, source_identifier: str) -> str:
-        """Creates a universal MD5 hash for any source identifier string."""
-        return hashlib.md5(source_identifier.encode("utf-8")).hexdigest()
+    def _generate_document_id(self, source_identifier: str) -> str:
+        """Creates a deterministic SHA-256 hash for any source identifier string (non-cryptographic use)."""
+        return hashlib.sha256(source_identifier.encode("utf-8")).hexdigest()

218-236: Use aiohttp.ClientTimeout instead of raw int.

Passing an int for timeout is brittle; aiohttp expects ClientTimeout.

Apply this diff:

-            async with session.get(url, timeout=20) as response:
+            timeout = aiohttp.ClientTimeout(total=getattr(self.config, "timeout", 20) or 20)
+            async with session.get(url, timeout=timeout) as response:
                 response.raise_for_status()

And when creating the session (Lines 252-256):

-            headers = {"User-Agent": self.config.user_agent}
-            async with aiohttp.ClientSession(headers=headers) as session:
+            headers = {"User-Agent": self.config.user_agent}
+            timeout = aiohttp.ClientTimeout(total=getattr(self.config, "timeout", 20) or 20)
+            async with aiohttp.ClientSession(headers=headers, timeout=timeout) as session:
                 for url in urls:
                     tasks.append(self._aload_from_url(url, session))

162-174: Don’t raise for already-processed sources; skip them.

Raising then catching to report an “error” for duplicates degrades UX. Prefer idempotent skip.

Apply this diff:

-            if document_id in self._processed_document_ids:
-                raise FileExistsError(f"Source file '{source_identifier}' has already been processed.")
+            if document_id in self._processed_document_ids:
+                return []
             self._processed_document_ids.add(document_id)

196-213: Guard against duplicate processing races in async paths.

Membership check then add on a shared set isn’t atomic; concurrent tasks can double‑process the same source.

Consider protecting _processed_document_ids with a lock (threading.Lock for sync, asyncio.Lock for async) around the check/add critical section. If BaseLoader already provides such protection, use it.
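
A sketch of the async variant (the method name is illustrative; initialize self._dedup_lock = asyncio.Lock() in __init__):

async def _claim_document(self, document_id: str) -> bool:
    # Check-and-add under the lock so two tasks cannot both claim the id.
    async with self._dedup_lock:
        if document_id in self._processed_document_ids:
            return False
        self._processed_document_ids.add(document_id)
        return True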


175-195: SSRF and content-size controls for URL loads.

If URLs come from users, add:

  • Allow/Deny lists or URL validation
  • Max content length checks
  • IP blacklisting (no RFC1918/cloud metadata), redirects limit
  • Respect Content-Type
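
A minimal pre-flight check for the IP point, standard library only (allow/deny lists and size caps would layer on top; IPv6 scope IDs are not handled here):

import ipaddress
import socket
from urllib.parse import urlparse

def url_targets_public_host(url: str) -> bool:
    """Reject URLs resolving to private, loopback, or link-local (metadata) ranges."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
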
src/aideck/loaders/pdf.py (4)

63-70: Remove unused variable and simplify event-loop handling.

loop is unused. Keep the thread-executor path for “loop already running” and direct asyncio.run otherwise.

Apply this diff:

-        try:
-            loop = asyncio.get_running_loop()
-            import concurrent.futures
+        try:
+            asyncio.get_running_loop()
+            import concurrent.futures
             with concurrent.futures.ThreadPoolExecutor() as executor:
                 future = executor.submit(asyncio.run, self.aload(source))
                 return future.result()
         except RuntimeError:
             return asyncio.run(self.aload(source))

136-137: Remove unused variable.

page_numbers is assigned but never used.

Apply this diff:

-            page_numbers = [num for content, num in page_contents_with_nums]

156-159: Re-raise without specifying the exception to preserve traceback.

Use bare raise inside except block.

Apply this diff:

-        except (PermissionError, FileExistsError) as e:
-
-            raise e
+        except (PermissionError, FileExistsError):
+            raise

241-245: Add explicit strict parameter to zip.

Make intent clear and satisfy linters.

Apply this diff:

-            correct_count = sum(1 for actual, expected in zip(page_numbers, expected_numbers) if actual == expected)
+            correct_count = sum(1 for actual, expected in zip(page_numbers, expected_numbers, strict=False) if actual == expected)
src/aideck/embeddings/factory.py (2)

215-216: Use warnings/logging instead of print for config conflicts.

Printing in libraries is noisy. Prefer warnings.warn or logging.

Apply this diff:

-    elif kwargs:
-        print(f"Warning: Both config object and kwargs provided. Using config object, ignoring kwargs: {list(kwargs.keys())}")
+    elif kwargs:
+        import warnings
+        warnings.warn(
+            f"Both config object and kwargs provided. Using config object, ignoring kwargs: {list(kwargs.keys())}",
+            stacklevel=2,
+        )

311-328: Replace print with logging for provider selection/fallback.

Library code should log, not print.

Apply this diff:

+import logging
+logger = logging.getLogger(__name__)
@@
-            print(f"Selected {provider} embedding provider for {use_case} use case with {preference} preference")
+            logger.info("Selected %s embedding provider for %s use case with %s preference", provider, use_case, preference)
@@
-        print(f"No preferred provider available, using {provider}")
+        logger.info("No preferred provider available, using %s", provider)
src/aideck/embeddings/huggingface_provider.py (3)

95-107: Avoid prints; use warnings/logging for auth feedback.

Printing in providers is noisy.

Apply this diff:

-        print(f"HuggingFace embedding using device: {self.device}")
+        import logging; logging.getLogger(__name__).info("HuggingFace embedding using device: %s", self.device)
@@
-                print("Successfully authenticated with HuggingFace Hub")
+                import logging; logging.getLogger(__name__).info("Authenticated with HuggingFace Hub")
             except Exception as e:
-                print(f"Warning: HuggingFace authentication failed: {e}")
+                import logging; logging.getLogger(__name__).warning("HuggingFace authentication failed: %s", e)

162-163: Drop extraneous f-string.

Minor Ruff F541 nit.

Apply this diff:

-            print(f"Model loaded successfully.")
+            print("Model loaded successfully.")

297-303: Preserve exception context when re-raising.

Use raise ... from e for wrapped exceptions.

Apply this diff:

-        except Exception as e:
-            raise ModelConnectionError(
+        except Exception as e:
+            raise ModelConnectionError(
                 f"Local embedding failed: {str(e)}",
                 error_code="LOCAL_EMBEDDING_ERROR",
                 original_error=e
-            )
+            ) from e
@@
-        except Exception as e:
-            raise ModelConnectionError(f"Failed to get embeddings via API: {str(e)}", original_error=e)
+        except Exception as e:
+            raise ModelConnectionError(f"Failed to get embeddings via API: {e!s}", original_error=e) from e

Also applies to: 330-332

src/aideck/embeddings/__init__.py (2)

41-76: Sort __all__ to quiet RUF022 and keep exports tidy.

Optional, but it reduces churn and keeps public API clear.


78-91: Export the registry via __all__ for a stable public API.

If consumers import via from ... import *, the registry won’t be exposed. Add it explicitly.

Apply this diff:

     "create_gemini_semantic_embedding",
     "create_gemini_cloud_embedding",
+    "PROVIDER_REGISTRY",
 ]
src/aideck/embeddings/base.py (9)

197-201: Remove unused variable and drop unused param from the helper.

Avoid assigning to progress_table; and _create_progress_display doesn’t use its total arg.

Apply this diff:

-        if show_progress and len(uncached_texts) > 1:
-            progress_table = self._create_progress_display(len(uncached_texts))
+        if show_progress and len(uncached_texts) > 1:
+            self._create_progress_display()

302-312: Drop unused total parameter (ARG002) from _create_progress_display.

Apply this diff:

-    def _create_progress_display(self, total: int) -> Table:
+    def _create_progress_display(self) -> Table:
         """Create a progress display table."""
         table = Table(show_header=True, header_style="bold magenta")
         table.add_column("Provider", style="cyan", no_wrap=True)
         table.add_column("Model", style="green")
         table.add_column("Progress", style="yellow")
         table.add_column("Time", style="blue")
         
         console.print(Panel(table, title="Embedding Progress", expand=False))
         return table

273-279: Add strict to zip (B905) in cache update loop.

Apply this diff:

-        for i, (text, embedding) in enumerate(zip(texts, embeddings)):
+        for i, (text, embedding) in enumerate(zip(texts, embeddings, strict=True)):
             meta_idx = indices[i] if indices else i
             cache_key = self._get_cache_key(text, metadata[meta_idx] if metadata else None)
             self._cache[cache_key] = embedding

253-256: Use explicit f-string conversion flag (RUF010) and trim message.

Keep the detailed cause in original_error; message can be concise.

Apply this diff:

-        raise ModelConnectionError(
-            f"Failed to embed texts after {self.config.max_retries} attempts: {str(last_error)}",
+        raise ModelConnectionError(
+            f"Failed to embed texts after {self.config.max_retries} attempts: {last_error!s}",
             error_code="EMBEDDING_FAILED",
             original_error=last_error
         )

321-327: Avoid catching blind Exception (BLE001); narrow to known failures.

Validation should catch expected configuration/connection errors only.

Apply this diff:

     async def validate_connection(self) -> bool:
         """Validate that the embedding provider is properly configured and accessible."""
         try:
             test_embedding = await self.embed_texts(["test"], show_progress=False)
             return len(test_embedding) > 0 and len(test_embedding[0]) > 0
-        except Exception:
+        except (ConfigurationError, ModelConnectionError, asyncio.TimeoutError):
             return False

315-319: Remove unused locals in _update_progress (F841).

These aren’t used; keep the method as a no-op for now.

Apply this diff:

-        progress_pct = (current / total) * 100
-        elapsed = self._metrics.embedding_time_ms / 1000
-        eta = (elapsed / current) * (total - current) if current > 0 else 0
-        
-        pass
+        pass

216-221: Consider normalizing after compression (semantics).

If compression alters vector space, normalize after compression so outputs remain unit length.
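
A pure-Python sketch of the post-compression step:

def l2_normalize(vec):
    # Re-normalize after compression so cosine similarity stays meaningful.
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec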


345-351: Avoid polluting cache during connection validation.

validate_connection can insert a “test” entry if caching is enabled. Temporarily bypass cache or use a unique non-cached path.
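
One way to express the bypass, assuming embed_texts grows a use_cache flag (it does not have one today):

async def validate_connection(self) -> bool:
    try:
        vecs = await self.embed_texts(["test"], show_progress=False, use_cache=False)
        return bool(vecs and vecs[0])
    except (ConfigurationError, ModelConnectionError):
        return False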


211-213: Add strict=True to zip call to catch length mismatches

-            for idx, embedding in zip(uncached_indices, all_embeddings):
+            for idx, embedding in zip(uncached_indices, all_embeddings, strict=True):
                 embeddings[idx] = embedding
TESTING.md (1)

151-155: Avoid teaching per-test sys.path mutations

Since the project is now packaged (pyproject + src layout), it’s cleaner to rely on the packaging/pytest configuration instead of inserting src into sys.path inside individual tests. Suggest removing this snippet or replacing it with guidance on using the configured import path.

Comment on lines +13 to +15
Thanks for using Aideck! Questions are welcome, but in the future will be moving over to
[Discussions](https://github.com/khulnaSoft/aideck/discussions) page.
Quick questions should be asked [here](https://github.com/khulnaSoft/aideck/discussions/148) instead.

⚠️ Potential issue | 🟠 Major

Update checklist URLs to Aideck

Nice branding update on Line 13, but the “Please tick the boxes” section (Line 59 onwards) still points to the old GPT-Computer-Agent repo, which will confuse reporters and send them through the wrong guidance. Let’s update those links to the new Aideck paths so the checklist stays accurate.

-        - label: You are using a [supported](https://github.com/khulnaSoft/gpt-computer-agent/blob/master/.github/SECURITY.md#supported-versions) version of GPT-Computer-Agent (check the first two digits of the version number)
+        - label: You are using a [supported](https://github.com/khulnaSoft/aideck/blob/master/.github/SECURITY.md#supported-versions) version of Aideck (check the first two digits of the version number)
           required: true
-        - label: You've checked that this [question hasn't already been raised](https://github.com/khulnaSoft/gpt-computer-agent/issues?q=is%3Aissue)
+        - label: You've checked that this [question hasn't already been raised](https://github.com/khulnaSoft/aideck/issues?q=is%3Aissue)
           required: true
-        - label: You've checked the [docs](https://github.com/khulnaSoft/gpt-computer-agent/tree/master/docs#readme) and [troubleshooting](https://github.com/khulnaSoft/gpt-computer-agent/blob/master/docs/troubleshooting.md#troubleshooting) guide
+        - label: You've checked the [docs](https://github.com/khulnaSoft/aideck/tree/master/docs#readme) and [troubleshooting](https://github.com/khulnaSoft/aideck/blob/master/docs/troubleshooting.md#troubleshooting) guide
           required: true
-        - label: You agree to the [code of conduct](https://github.com/khulnaSoft/gpt-computer-agent/blob/master/.github/CODE_OF_CONDUCT.md#contributor-covenant-code-of-conduct)
+        - label: You agree to the [code of conduct](https://github.com/khulnaSoft/aideck/blob/master/.github/CODE_OF_CONDUCT.md#contributor-covenant-code-of-conduct)
           required: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Thanks for using Aideck! Questions are welcome, but in the future will be moving over to
[Discussions](https://github.com/khulnaSoft/aideck/discussions) page.
Quick questions should be asked [here](https://github.com/khulnaSoft/aideck/discussions/148) instead.
        - label: You are using a [supported](https://github.com/khulnaSoft/aideck/blob/master/.github/SECURITY.md#supported-versions) version of Aideck (check the first two digits of the version number)
          required: true
        - label: You've checked that this [question hasn't already been raised](https://github.com/khulnaSoft/aideck/issues?q=is%3Aissue)
          required: true
        - label: You've checked the [docs](https://github.com/khulnaSoft/aideck/tree/master/docs#readme) and [troubleshooting](https://github.com/khulnaSoft/aideck/blob/master/docs/troubleshooting.md#troubleshooting) guide
          required: true
        - label: You agree to the [code of conduct](https://github.com/khulnaSoft/aideck/blob/master/.github/CODE_OF_CONDUCT.md#contributor-covenant-code-of-conduct)
          required: true
🤖 Prompt for AI Agents
In .github/ISSUE_TEMPLATE/question.yml around lines 13 to 15, the checklist
links later in the file still point to the old GPT-Computer-Agent repo; update
each checklist URL (from ~line 59 onwards) to the corresponding Aideck
documentation/discussion paths so reporters are directed to the correct
resources — search for any links referencing "GPT-Computer-Agent" or the old
repo path and replace them with the new Aideck URLs, double-check
relative/absolute link formats and update anchor text if needed to reflect
Aideck branding.

Comment on lines +34 to +37
      - name: Run Claude Code Review
        id: claude-review
        uses: anthropics/claude-code-action@v1
        with:

⚠️ Potential issue | 🟠 Major

Pin third-party action to an immutable revision

This workflow references anthropics/claude-code-action@v1, a mutable tag. If Anthropic ever force-pushes that tag, our reviews will execute unvetted code. Pin the action to a specific commit SHA (and document how to bump it) to keep the supply-chain surface bounded.

Apply this diff to pin the action:

-        uses: anthropics/claude-code-action@v1
+        uses: anthropics/claude-code-action@<commit-sha>
🤖 Prompt for AI Agents
.github/workflows/claude-code-review.yml lines 34-37: the workflow uses the
mutable tag anthropics/claude-code-action@v1 which can be force-updated; replace
the tag with the action pinned to a specific commit SHA (e.g.
anthropics/claude-code-action@<commit-sha>) and add a short comment in the
workflow file describing how to bump it (update the SHA to the new commit and
run CI/verify changes); commit the change so the workflow references the
immutable revision.

Comment on lines +33 to +36
      - name: Run Claude Code
        id: claude
        uses: anthropics/claude-code-action@v1
        with:
⚠️ Potential issue | 🟠 Major

Lock the Claude workflow action to a commit SHA

Same risk here—anthropics/claude-code-action@v1 is mutable. Please pin it to a vetted commit so CI can’t silently run altered code.

Apply this diff to constrain the action:

-        uses: anthropics/claude-code-action@v1
+        uses: anthropics/claude-code-action@<commit-sha>
🤖 Prompt for AI Agents
.github/workflows/claude.yml around lines 33 to 36: the workflow uses the
mutable tag anthropics/claude-code-action@v1 which can change; replace the tag
with a specific, vetted commit SHA (e.g.
anthropics/claude-code-action@<commit-sha>) so the action is pinned immutably;
obtain the desired commit SHA from the action repository, update the uses: line
to reference that SHA, and commit the change.

Comment on lines +17 to +22
      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
          cache-dependency-glob: uv.lock

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Pin a Python version and verify cache input name

  • Without an explicit Python, uv may use whatever is available on the runner. Pin for reproducibility.
  • Double-check the input name cache-dependency-glob; the action’s schema may use a different key.

Suggested change:

-      - name: Install uv 
-        uses: astral-sh/setup-uv@v3
-        with:
-          enable-cache: true
-          cache-dependency-glob: uv.lock
+      - name: Install uv
+        uses: astral-sh/setup-uv@v3
+        with:
+          enable-cache: true
+          python-version: "3.12"
+          cache-dependency-glob: uv.lock

Also, consider adding --frozen to uv sync to enforce the lock:

-      - name: Run UV Sync
-        run: |
-          uv sync
+      - name: Run UV Sync
+        run: |
+          uv sync --frozen

🌐 Web query:

What are the valid inputs for astral-sh/setup-uv@v3 (in particular, is the input named "cache-dependency-glob" or "cache-dependency-path")?

💡 Result:

The action uses the input named cache-dependency-glob (not cache-dependency-path). The v3 README documents cache-dependency-glob and other inputs such as enable-cache, cache-suffix, cache-local-path, prune-cache, etc. [1][2]

Sources:
[1] astral-sh/setup-uv README (v3).
[2] v3 release notes / changelog (cache-dependency-glob default change).


🌐 Web query:

What inputs are supported by astral-sh/setup-uv@v3 in its action.yml? Does it include a "python-version" input?

💡 Result:

Short answer: No — the v3 action.yml for astral-sh/setup-uv does not provide a python-version input. The python-version input was added in later releases (examples/docs show it in v4/v6). Sources: astral-sh setup-uv v3 docs/README and later docs showing python-version. [1][2]

Sources

  • setup-uv README / v3 examples (no python-version shown). [1]
  • setup-uv / repo docs showing python-version input in later releases (v4/v6). [2]

Pin Python version and enforce UV lock

  • Add a separate actions/setup-python@v4 step to pin Python 3.12 before running astral-sh/setup-uv@v3 (since v3 has no python-version input).
  • Confirmed cache-dependency-glob is correct.
  • Add --frozen flag to the uv sync step.
🤖 Prompt for AI Agents
.github/workflows/publish.yml lines 17-22: the workflow installs uv without
pinning Python and doesn't enforce the uv lock during sync; add a prior step
using actions/setup-python@v4 to pin python-version: '3.12' (placed before the
astral-sh/setup-uv@v3 step), keep cache-dependency-glob as-is, and update the uv
sync invocation to include the --frozen flag to ensure the lockfile is enforced.

Comment on lines +52 to +53
      - name: Publish to TestPyPI
        run: uv publish -t ${{ secrets.PYPI_TOKEN }}
⚠️ Potential issue | 🟠 Major

Point uv publish at TestPyPI

This job is labelled “Publish to TestPyPI,” but uv publish defaults to PyPI. Without explicitly targeting TestPyPI we’ll upload alpha builds to production if the token works, which is dangerous. Let’s wire the repository URL.

-      - name: Publish to TestPyPI
-        run: uv publish -t ${{ secrets.PYPI_TOKEN }}
+      - name: Publish to TestPyPI
+        run: uv publish --repository-url https://test.pypi.org/legacy/ -t ${{ secrets.PYPI_TOKEN }}
🤖 Prompt for AI Agents
.github/workflows/test_publisher.yml lines 52-53: the "Publish to TestPyPI" step
runs `uv publish` which defaults to the real PyPI; update the step to explicitly
point at TestPyPI by adding the repository URL flag (use --repository-url
https://test.pypi.org/legacy/) to the uv publish command and keep using the
TestPyPI token secret so uploads go to TestPyPI instead of production.

Comment on lines +73 to +82
            for chunk_idx in range(0, len(all_rows), self.config.rows_per_chunk):
                chunk_rows = all_rows[chunk_idx:chunk_idx + self.config.rows_per_chunk]
                chunk_content = "\n\n".join(chunk_rows)
                if self.config.include_metadata:
                    metadata = self._create_metadata(file_path)
                    metadata["chunk_index"] = chunk_idx // self.config.rows_per_chunk
                    metadata["rows_in_chunk"] = len(chunk_rows)
                    metadata["total_rows"] = len(all_rows)
                chunk_doc_id = f"{document_id}_chunk_{chunk_idx // self.config.rows_per_chunk}"
                if metadata:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Guard against rows_per_chunk <= 0

range(..., step=self.config.rows_per_chunk) will crash for 0/None. Compute a safe rows_per_chunk once and reuse it.

-            for chunk_idx in range(0, len(all_rows), self.config.rows_per_chunk):
-                chunk_rows = all_rows[chunk_idx:chunk_idx + self.config.rows_per_chunk]
+            rows_per_chunk = max(1, int(self.config.rows_per_chunk or 1))
+            for chunk_idx in range(0, len(all_rows), rows_per_chunk):
+                chunk_rows = all_rows[chunk_idx:chunk_idx + rows_per_chunk]
@@
-                chunk_doc_id = f"{document_id}_chunk_{chunk_idx // self.config.rows_per_chunk}"
+                chunk_doc_id = f"{document_id}_chunk_{chunk_idx // rows_per_chunk}"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/aideck/loaders/csv.py around lines 73 to 82, guard against
self.config.rows_per_chunk being 0 or None by computing a safe rows_per_chunk
once before the loop (e.g. convert to int and if <= 0 fall back to 1 or raise a
clear ValueError) and use that variable in the range(...) and chunk_index
calculations; also initialize metadata = None at the top of each iteration
before optionally populating it so subsequent checks like if metadata: are safe.
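For illustration, a minimal self-contained sketch of the guard described above (the chunk_rows helper name and signature are ours, not from the PR):

from typing import List, Optional

def chunk_rows(all_rows: List[str], rows_per_chunk: Optional[int]) -> List[List[str]]:
    """Split rows into fixed-size chunks, tolerating a missing or invalid size."""
    # Normalize once: None, zero, or negative values fall back to a chunk size of 1.
    safe_size = max(1, int(rows_per_chunk or 1))
    return [all_rows[i:i + safe_size] for i in range(0, len(all_rows), safe_size)]

# chunk_rows(["r1", "r2", "r3"], 2)  -> [["r1", "r2"], ["r3"]]
# chunk_rows(["r1", "r2"], None)     -> [["r1"], ["r2"]]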

Comment on lines +52 to +60
        try:
            loop = asyncio.get_running_loop()
            import concurrent.futures
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(asyncio.run, self.aload(source))
                return future.result()
        except RuntimeError:
            return asyncio.run(self.aload(source))

⚠️ Potential issue | 🟡 Minor

Remove unused variable and simplify event-loop detection in load().

Avoid assigning the running loop to an unused variable; keep the branching logic intact.

Apply this diff:

-        try:
-            loop = asyncio.get_running_loop()
-            import concurrent.futures
-            with concurrent.futures.ThreadPoolExecutor() as executor:
-                future = executor.submit(asyncio.run, self.aload(source))
-                return future.result()
-        except RuntimeError:
-            return asyncio.run(self.aload(source))
+        try:
+            asyncio.get_running_loop()
+            import concurrent.futures
+            with concurrent.futures.ThreadPoolExecutor() as executor:
+                future = executor.submit(asyncio.run, self.aload(source))
+                return future.result()
+        except RuntimeError:
+            return asyncio.run(self.aload(source))
🧰 Tools
🪛 Ruff (0.13.1)

53-53: Local variable loop is assigned to but never used

Remove assignment to unused variable loop

(F841)

🤖 Prompt for AI Agents
In src/aideck/loaders/text.py around lines 52 to 60, remove the unused
assignment of the running loop and keep the same branching: call
asyncio.get_running_loop() just to detect whether an event loop is running (no
need to store it), and if it raises RuntimeError run and return
asyncio.run(self.aload(source)); otherwise use
concurrent.futures.ThreadPoolExecutor to submit asyncio.run(self.aload(source))
and return the future result. Ensure imports remain local if preferred and no
unused variables are left.
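Since this same sync-over-async pattern recurs in the text, XML, and YAML loaders, one option is to factor it into a shared helper. A sketch, assuming the loaders can import from a common utility module (the run_sync name is ours):

import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code.

    asyncio.run() raises if a loop is already running in this thread,
    so in that case the coroutine runs on a fresh loop in a worker thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: safe to create one here.
        return asyncio.run(coro)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(asyncio.run, coro).result()

Each loader's load() would then reduce to return run_sync(self.aload(source)).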

Comment on lines +43 to +50
            loop = asyncio.get_running_loop()
            import concurrent.futures
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(asyncio.run, self.aload(source))
                return future.result()
        except RuntimeError:
            return asyncio.run(self.aload(source))

⚠️ Potential issue | 🟡 Minor

Remove unused variable and simplify event-loop detection in load().

Same pattern as TextLoader; eliminate unused assignment.

Apply this diff:

-        try:
-            loop = asyncio.get_running_loop()
-            import concurrent.futures
-            with concurrent.futures.ThreadPoolExecutor() as executor:
-                future = executor.submit(asyncio.run, self.aload(source))
-                return future.result()
-        except RuntimeError:
-            return asyncio.run(self.aload(source))
+        try:
+            asyncio.get_running_loop()
+            import concurrent.futures
+            with concurrent.futures.ThreadPoolExecutor() as executor:
+                future = executor.submit(asyncio.run, self.aload(source))
+                return future.result()
+        except RuntimeError:
+            return asyncio.run(self.aload(source))
🧰 Tools
🪛 Ruff (0.13.1)

43-43: Local variable loop is assigned to but never used

Remove assignment to unused variable loop

(F841)

🤖 Prompt for AI Agents
In src/aideck/loaders/xml.py around lines 43 to 50, remove the unused variable
assignment `loop = asyncio.get_running_loop()` and simplify the event-loop
detection to match TextLoader: call `asyncio.get_running_loop()` only to detect
a running loop (no assignment), keep the ThreadPoolExecutor branch to submit
`asyncio.run(self.aload(source))`, and keep the `except RuntimeError` branch to
call and return `asyncio.run(self.aload(source))`; also remove the unnecessary
inline `import concurrent.futures` if you prefer importing it at module top.

Comment on lines +48 to +56
        try:
            loop = asyncio.get_running_loop()
            import concurrent.futures
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(asyncio.run, self.aload(source))
                return future.result()
        except RuntimeError:
            return asyncio.run(self.aload(source))

⚠️ Potential issue | 🟡 Minor

Remove unused variable and simplify event-loop detection in load().

Same minor cleanup as other loaders.

Apply this diff:

-        try:
-            loop = asyncio.get_running_loop()
-            import concurrent.futures
-            with concurrent.futures.ThreadPoolExecutor() as executor:
-                future = executor.submit(asyncio.run, self.aload(source))
-                return future.result()
-        except RuntimeError:
-            return asyncio.run(self.aload(source))
+        try:
+            asyncio.get_running_loop()
+            import concurrent.futures
+            with concurrent.futures.ThreadPoolExecutor() as executor:
+                future = executor.submit(asyncio.run, self.aload(source))
+                return future.result()
+        except RuntimeError:
+            return asyncio.run(self.aload(source))
🧰 Tools
🪛 Ruff (0.13.1)

49-49: Local variable loop is assigned to but never used

Remove assignment to unused variable loop

(F841)

🤖 Prompt for AI Agents
In src/aideck/loaders/yaml.py around lines 48 to 56, remove the unused variable
and simplify event-loop detection in load(): call asyncio.get_running_loop()
only to detect a running loop (no assignment), keep the ThreadPoolExecutor
branch that submits asyncio.run(self.aload(source)) and returns the future
result, and keep the except RuntimeError branch that calls and returns
asyncio.run(self.aload(source)); ensure no unused variables remain and keep the
logic concise and consistent with the other loaders.

Comment on lines +98 to +144
    def _parse_and_extract(self, content: str, path: Path, document_id: str) -> List[Document]:
        """
        Synchronous helper that performs the actual parsing and document creation.
        """
        documents = []

        parsed_docs = yaml.safe_load_all(content) if self.config.handle_multiple_docs else [yaml.safe_load(content)]

        for doc_data in parsed_docs:
            if doc_data is None:
                continue

            try:
                data_chunks = jq.all(self.config.split_by_jq_query, doc_data)
            except Exception as e:
                raise ValueError(f"Invalid jq query '{self.config.split_by_jq_query}': {e}") from e

            for chunk in data_chunks:
                if self.config.content_synthesis_mode == "canonical_yaml":
                    doc_content = yaml.dump(chunk, indent=self.config.yaml_indent, sort_keys=False)
                elif self.config.content_synthesis_mode == "json":
                    doc_content = json.dumps(chunk, indent=self.config.json_indent)
                else:  # "smart_text"
                    doc_content = " ".join(self._extract_smart_text(chunk))

                if self.config.skip_empty_content and not doc_content.strip():
                    continue

                metadata = self._create_metadata(path)

                if self.config.flatten_metadata and isinstance(chunk, dict):
                    metadata.update(self._flatten_dict(chunk))

                if self.config.metadata_jq_queries and isinstance(chunk, (dict, list)):
                    for key, query in self.config.metadata_jq_queries.items():
                        try:
                            result = jq.first(query, chunk)
                            if result is not None:
                                metadata[key] = result
                        except Exception:
                            pass
                if self.config.include_metadata:
                    documents.append(Document(document_id=document_id, content=doc_content, metadata=metadata))
                else:
                    documents.append(Document(document_id=document_id, content=doc_content))

        return documents
⚠️ Potential issue | 🟠 Major

Document IDs collide across chunks; make per-chunk IDs unique.

Multiple Documents from one YAML file share the same document_id, causing collisions downstream.

Apply this diff:

-        for chunk in data_chunks:
+        for idx, chunk in enumerate(data_chunks):
@@
-                if self.config.include_metadata:
-                    documents.append(Document(document_id=document_id, content=doc_content, metadata=metadata))
-                else:
-                    documents.append(Document(document_id=document_id, content=doc_content))
+                doc_id = f"{document_id}:{idx}"
+                if self.config.include_metadata:
+                    documents.append(Document(document_id=doc_id, content=doc_content, metadata=metadata))
+                else:
+                    documents.append(Document(document_id=doc_id, content=doc_content))
🧰 Tools
🪛 Ruff (0.13.1)

113-113: Avoid specifying long messages outside the exception class

(TRY003)


137-138: try-except-pass detected, consider logging the exception

(S110)


137-137: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In src/aideck/loaders/yaml.py around lines 98 to 144, multiple Documents emitted
from the same YAML file reuse the same document_id causing collisions; change
the ID generation so each chunk/document gets a unique id (for example derive a
base id from the input path/document_id then append a per-parsed-doc and
per-chunk counter or a short uuid). Implement by tracking indexes while
iterating parsed_docs and data_chunks (or by generating uuid4 for each created
Document) and use that unique id when constructing Document instances instead of
the original shared document_id.

@coderabbitai coderabbitai bot left a comment
Review continued from previous batch...

Comment on lines +24 to +40
        # Test if we can import the main module
        sys.path.insert(0, os.path.dirname(__file__))

        # Try importing version
        try:
            with open(os.path.join(src_path, 'version.py'), 'r') as f:
                version_content = f.read()
                if '__version__' in version_content:
                    print("✅ Version file exists")
                else:
                    print("⚠️ Version file found but no version info")
        except Exception as e:
            print(f"⚠️ Could not read version file: {e}")

        print("\n🎉 Basic setup verification complete!")
        print("🚀 You can now run: python -m aideck")
        return True
⚠️ Potential issue | 🟠 Major

Actually import aideck so the setup test proves something.

The comment says we’re testing that the main module imports, but the code never does it—so the script happily reports success even if the package is broken. Import the package (or a lightweight module inside it) to make the check meaningful.

         # Test if we can import the main module
         sys.path.insert(0, os.path.dirname(__file__))
 
+        import importlib
+        importlib.import_module("aideck")
+        print("✅ aideck package imports successfully")
+
         # Try importing version
         try:
             with open(os.path.join(src_path, 'version.py'), 'r') as f:
🧰 Tools
🪛 Ruff (0.13.1)

35-35: Do not catch blind exception: Exception

(BLE001)


40-40: Consider moving this statement to an else block

(TRY300)

🤖 Prompt for AI Agents
In basic_test.py around lines 24 to 40, the test only reads version.py but never
imports the package so it can falsely succeed; modify the block to actually
import the package (e.g., import aideck or a minimal submodule like
aideck.__version__ or aideck.cli) after ensuring src_path is on sys.path,
perform the import inside a try/except, print a clear success message on import
success and print the exception and return False on failure so the test
meaningfully verifies the module can be imported.
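A sketch of what that prompt describes, assuming the check lives in a boolean-returning helper (the verify_import name is ours):

import importlib
import sys

def verify_import(src_path: str) -> bool:
    """Smoke-test that the aideck package actually imports."""
    if src_path not in sys.path:
        sys.path.insert(0, src_path)
    try:
        importlib.import_module("aideck")
    except Exception as exc:  # Broad on purpose: any import failure fails the check.
        print(f"❌ Could not import aideck: {exc}")
        return False
    print("✅ aideck package imports successfully")
    return True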

echo ""

# Start uvicorn server
uv run uvicorn src.aideck.api:app --host 0.0.0.0 --port 8000 --reload --log-level info
⚠️ Potential issue | 🟠 Major

Replace 'uv run uvicorn' with direct uvicorn invocation (or add a guard).

Activating .venv and then using uv’s runner is inconsistent and will fail if uv isn’t installed. Call uvicorn directly from the venv.

-uv run uvicorn src.aideck.api:app --host 0.0.0.0 --port 8000 --reload --log-level info
+uvicorn src.aideck.api:app --host 0.0.0.0 --port 8000 --reload --log-level info

(Optional) Fallback if uvicorn isn’t on PATH:

command -v uvicorn >/dev/null 2>&1 || exec python -m uvicorn src.aideck.api:app --host 0.0.0.0 --port 8000 --reload --log-level info
🤖 Prompt for AI Agents
In dev.sh around line 33, the script uses "uv run uvicorn ..." which will fail
if the "uv" runner isn't installed; replace that invocation with a direct
"uvicorn src.aideck.api:app ..." call so the venv's uvicorn is used, or add a
guard that checks for uvicorn on PATH and falls back to "python -m uvicorn ..."
if missing; update the line accordingly to either call uvicorn directly or
perform the presence check and fallback before executing.

Comment on lines +60 to +69
"# Add project root to path\n",
"project_root = Path.cwd().parent\n",
"if str(project_root) not in sys.path:\n",
" sys.path.insert(0, str(project_root))\n",
"\n",
"print(f\"🚀 Aideck Stock Analysis System\")\n",
"print(f\"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\")\n",
"print(f\"🔧 Project Root: {project_root}\")\n",
"print(f\"✅ Environment loaded successfully!\")"
]
🛠️ Refactor suggestion | 🟠 Major

Fix sys.path for a src/ layout and drop no-op f-strings.

  • In a src/ layout, inserting the repo root isn’t sufficient for from aideck import .... Add <repo>/src to sys.path.
  • Remove f prefix from constant prints (Lines 65, 68) flagged by Ruff F541.

Apply this diff:

-# Add project root to path
-project_root = Path.cwd().parent
-if str(project_root) not in sys.path:
-    sys.path.insert(0, str(project_root))
+# Add project root and src/ to path for src-layout imports
+project_root = Path.cwd().parent
+src_dir = project_root / "src"
+for p in (src_dir, project_root):
+    p_str = str(p)
+    if p_str not in sys.path:
+        sys.path.insert(0, p_str)

-print(f"🚀 Aideck Stock Analysis System")
+print("🚀 Aideck Stock Analysis System")
 print(f"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
 print(f"🔧 Project Root: {project_root}")
-print(f"✅ Environment loaded successfully!")
+print("✅ Environment loaded successfully!")
🤖 Prompt for AI Agents
In notebooks/stock_report_improved.ipynb around lines 60 to 69, the code only
adds the repository root to sys.path which breaks imports for a src/ layout and
also uses unnecessary f-strings for constant print calls; modify the path logic
to compute repo_root = Path.cwd().parent and src_path = repo_root / "src", then
insert str(src_path) into sys.path (ensuring you check for and avoid
duplicates), and change the constant print statements to plain strings by
removing the f prefix (e.g., print("✅ Environment loaded successfully!") and
print("🚀 Aideck Stock Analysis System") / print(f"📅 Analysis Date:
{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}") keep the date as an f-string
only where dynamic).

Comment on lines +205 to +259
"""
Execute a direct LLM call and print the result asynchronously.
Args:
task: The task to execute or list of tasks
model: The LLM model to use
debug: Whether to enable debug mode
retry: Number of retries for failed calls
Returns:
The response from the LLM
"""
result = await self.do_async(task, model, debug, retry)
return result


def do(self, task: Union["Task", List["Task"]], model: Optional[Union[str, BaseModelProvider]] = None, debug: bool = False, retry: int = 1):
"""
Execute a direct LLM call with the given task and model synchronously.
Args:
task: The task to execute or list of tasks
model: The LLM model to use
debug: Whether to enable debug mode
retry: Number of retries for failed calls
Returns:
The response from the LLM
"""
# Refresh price_id and tool call history at the start for each task
if isinstance(task, list):
for each_task in task:
each_task.price_id_ = None # Reset to generate new price_id
_ = each_task.price_id # Trigger price_id generation
each_task._tool_calls = [] # Clear tool call history
else:
task.price_id_ = None # Reset to generate new price_id
_ = task.price_id # Trigger price_id generation
task._tool_calls = [] # Clear tool call history

try:
loop = asyncio.get_event_loop()
if loop.is_running():
# Event loop is already running, we need to run in a new thread
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(asyncio.run, self.do_async(task, model, debug, retry))
return future.result()
else:
# Event loop exists but not running, we can use it
return loop.run_until_complete(self.do_async(task, model, debug, retry))
except RuntimeError:
# No event loop running, create a new one
return asyncio.run(self.do_async(task, model, debug, retry))

⚠️ Potential issue | 🔴 Critical

List inputs to do/print_do_async currently blow up

do (and print_do_async) are typed to accept Union[Task, List[Task]], but after resetting the IDs you still forward the original task object straight into do_async. When a caller really does pass a list, do_async immediately accesses task.is_paused (Line 620) and raises AttributeError: 'list' object has no attribute 'is_paused'. This breaks multi-task flows and contradicts the public signature. Repro:

agent = Direct()
agent.do([Task(description="t1"), Task(description="t2")])
# -> AttributeError inside do_async

Please normalize the iterable case before invoking do_async (and mirror the same handling in print_do_async), or explicitly drop list support from the public API.

@@
-        if isinstance(task, list):
-            for each_task in task:
-                each_task.price_id_ = None  # Reset to generate new price_id
-                _ = each_task.price_id  # Trigger price_id generation
-                each_task._tool_calls = []  # Clear tool call history
-        else:
-            task.price_id_ = None  # Reset to generate new price_id
-            _ = task.price_id  # Trigger price_id generation
-            task._tool_calls = []  # Clear tool call history
-
-        try:
-            loop = asyncio.get_event_loop()
-            if loop.is_running():
-                # Event loop is already running, we need to run in a new thread
-                import concurrent.futures
-                with concurrent.futures.ThreadPoolExecutor() as executor:
-                    future = executor.submit(asyncio.run, self.do_async(task, model, debug, retry))
-                    return future.result()
-            else:
-                # Event loop exists but not running, we can use it
-                return loop.run_until_complete(self.do_async(task, model, debug, retry))
-        except RuntimeError:
-            # No event loop running, create a new one
-            return asyncio.run(self.do_async(task, model, debug, retry))
+        if isinstance(task, list):
+            return [self.do(single_task, model, debug, retry) for single_task in task]
+
+        task.price_id_ = None  # Reset to generate new price_id
+        _ = task.price_id  # Trigger price_id generation
+        task._tool_calls = []  # Clear tool call history
+
+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_running():
+                import concurrent.futures
+                with concurrent.futures.ThreadPoolExecutor() as executor:
+                    future = executor.submit(asyncio.run, self.do_async(task, model, debug, retry))
+                    return future.result()
+            else:
+                return loop.run_until_complete(self.do_async(task, model, debug, retry))
+        except RuntimeError:
+            return asyncio.run(self.do_async(task, model, debug, retry))
🤖 Prompt for AI Agents
In src/aideck/agent/agent.py around lines 205 to 259, the synchronous do(...)
accepts a list but forwards the original list into do_async, which expects a
single Task and therefore raises AttributeError; fix this by normalizing the
list case before calling do_async: if task is a list, iterate over each_task and
invoke the async path for each item (resetting IDs already done), collect and
return a list of individual results (respect synchronous vs thread/existing-loop
branches), and mirror the same normalization logic in print_do_async so list
inputs are handled consistently.

Comment on lines +51 to +61
                call_end(
                    self.model_response.output,
                    self.model_provider,
                    self.task.response_format,
                    self.start_time,
                    self.end_time,
                    usage,
                    tool_usage_result,
                    self.debug,
                    self.task.price_id
                )
⚠️ Potential issue | 🟠 Major

Guard access to self.task attributes

call_end is invoked with self.task.response_format and self.task.price_id, but task is accepted without type guarantees. If a caller passes a task object that doesn’t define these attributes (or leaves task as None), this will raise an AttributeError during cleanup. Consider handling the absence of these fields explicitly (e.g., via getattr with sensible defaults) before calling call_end.

-                call_end(
-                    self.model_response.output,
-                    self.model_provider,
-                    self.task.response_format,
-                    self.start_time,
-                    self.end_time,
-                    usage,
-                    tool_usage_result,
-                    self.debug,
-                    self.task.price_id
-                )
+                response_format = getattr(self.task, "response_format", None)
+                price_id = getattr(self.task, "price_id", None)
+                call_end(
+                    self.model_response.output,
+                    self.model_provider,
+                    response_format,
+                    self.start_time,
+                    self.end_time,
+                    usage,
+                    tool_usage_result,
+                    self.debug,
+                    price_id,
+                )
🤖 Prompt for AI Agents
In src/aideck/agent/context_managers/call_manager.py around lines 51 to 61,
call_end is called with self.task.response_format and self.task.price_id without
guarding that self.task exists or has those attributes; update the code to
safely extract these values (e.g., response_format = getattr(self.task,
"response_format", None) and price_id = getattr(self.task, "price_id", None) or
other sensible defaults) and pass those guarded values to call_end so
AttributeError cannot occur during cleanup.

Comment on lines +47 to +53
            header = [cell.text for cell in table.rows[0].cells]
            markdown.append("| " + " | ".join(header) + " |")
            markdown.append("| " + " | ".join(["---"] * len(header)) + " |")
            for row in table.rows[1:]:
                row_text = [cell.text.replace("\n", " ") for cell in row.cells]
                markdown.append("| " + " | ".join(row_text) + " |")
        return "\n".join(markdown)
⚠️ Potential issue | 🟠 Major

Handle empty or malformed tables

table.rows[0] will raise IndexError if a DOCX table has no rows (which can happen with malformed or placeholder tables). Please guard against that before treating the first row as a header.

-            header = [cell.text for cell in table.rows[0].cells]
+            if not table.rows:
+                return ""
+            header = [cell.text for cell in table.rows[0].cells]
🤖 Prompt for AI Agents
In src/aideck/loaders/docx.py around lines 47 to 53, the code assumes
table.rows[0] exists which will raise IndexError for empty/malformed tables;
guard by checking if table.rows is empty before using rows[0] (e.g., if not
table.rows: return "" or skip this table), and only proceed to build header/rows
when table.rows and header cells exist, so you avoid IndexError and handle empty
tables gracefully.

Comment on lines +184 to +187
            if not source.startswith(('http://', 'https://', 'ftp://')) and not source.startswith('{') and not source.startswith('<') and not source.startswith('#'):
                if not os.path.exists(source):
                    raise FileNotFoundError(f"Source file does not exist: {source}")

⚠️ Potential issue | 🔴 Critical

Stop flagging JSON/YAML content as missing files

get_loader still treats JSON array payloads (e.g. '[{"text": "hi"}]') and YAML strings ('---\nkey: value') as filesystem paths because the guard only exempts {, <, and #. As soon as you call create_loader_for_content with those perfectly valid inputs, we raise FileNotFoundError before the loader is even constructed. This breaks the new content-loading APIs outright. Please expand the content-prefix checks (and strip leading whitespace) so that structured payloads don’t trip the file-existence branch.

-            if not source.startswith(('http://', 'https://', 'ftp://')) and not source.startswith('{') and not source.startswith('<') and not source.startswith('#'):
+            stripped_source = source.lstrip()
+            content_prefixes = ('{', '[', '<', '#', '---')
+            if not stripped_source.startswith(self._URL_PREFIXES) and not stripped_source.startswith(content_prefixes):
                 if not os.path.exists(source):
                     raise FileNotFoundError(f"Source file does not exist: {source}")

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.13.1)

186-186: Abstract raise to an inner function

(TRY301)


186-186: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In src/aideck/loaders/factory.py around lines 184 to 187, the guard that skips
file-existence checks doesn't handle JSON array payloads and YAML documents and
also doesn't ignore leading whitespace; update the check to lstrip the input
first (e.g. s = source.lstrip()) and then test s.startswith against
('http://','https://','ftp://','{','<','#','[','---') so JSON arrays (starting
with '[') and YAML document markers ('---') are treated as content rather than
filesystem paths.
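A minimal sketch of the expanded guard, with the prefix tuples and helper names being ours rather than the loader's actual attributes:

import os

_URL_PREFIXES = ("http://", "https://", "ftp://")
_CONTENT_PREFIXES = ("{", "[", "<", "#", "---")  # JSON, XML/HTML, comments, YAML docs

def is_non_path_source(source: str) -> bool:
    """True for URLs and inline structured payloads, which skip the file check."""
    stripped = source.lstrip()
    return stripped.startswith(_URL_PREFIXES + _CONTENT_PREFIXES)

def ensure_source_exists(source: str) -> None:
    """Raise only for plain path-like sources that are missing on disk."""
    if not is_non_path_source(source) and not os.path.exists(source):
        raise FileNotFoundError(f"Source file does not exist: {source}")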

Comment on lines +200 to +201
        urls_to_process = [s for s in sources if isinstance(s, str) and s.startswith(("http:", "https:"))]
        paths_to_process = [s for s in sources if s not in urls_to_process]
🛠️ Refactor suggestion | 🟠 Major

Fix URL scheme checks.

Use "http://" and "https://" consistently; current "http:"/“https:” checks can yield false positives.

Apply this diff:

-        urls_to_process = [s for s in sources if isinstance(s, str) and s.startswith(("http:", "https:"))]
+        urls_to_process = [s for s in sources if isinstance(s, str) and s.startswith(("http://", "https://"))]
-        paths_to_process = [s for s in sources if s not in urls_to_process]
+        paths_to_process = [s for s in sources if s not in urls_to_process]

And in aload:

-        urls = [s for s in sources if isinstance(s, str) and s.startswith(("http:", "https:"))]
+        urls = [s for s in sources if isinstance(s, str) and s.startswith(("http://", "https://"))]

Also applies to: 243-244

🤖 Prompt for AI Agents
In src/aideck/loaders/html.py around lines 200-201 (and similarly at lines
243-244 and in the aload function), the URL scheme checks use "http:" and
"https:" which can produce false positives; change the startswith checks to use
the full schemes "http://" and "https://" (e.g.,
startswith(("http://","https://"))) so only true URLs are classified as
urls_to_process and ensure paths_to_process is computed as the remaining sources
(exclude items matching those full-scheme prefixes).
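If prefix matching keeps causing trouble, a stricter alternative is to parse the scheme instead of string-matching. A sketch, not what the PR does:

from urllib.parse import urlparse

def is_http_url(source: str) -> bool:
    """Classify by parsed scheme so strings like 'http:foo' are not treated as URLs."""
    try:
        parsed = urlparse(source)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# is_http_url("https://example.com/page")   -> True
# is_http_url("http:relative-looking-path") -> False (no netloc)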

Comment on lines +128 to +134
            start_idx = (self.config.start_page - 1) if self.config.start_page else 0
            end_idx = self.config.end_page if self.config.end_page else len(reader.pages)
            pages_to_process = reader.pages[start_idx:end_idx]

            page_tasks = [self._extract_page_content(page, page_num=start_idx + i + 1) for i, page in enumerate(pages_to_process)]
            page_contents_with_nums = await asyncio.gather(*page_tasks)

🛠️ Refactor suggestion | 🟠 Major

Avoid slicing reader.pages; construct explicit page list.

Some pypdf versions don’t fully support slicing on reader.pages. Build the list by index.

Apply this diff:

-            pages_to_process = reader.pages[start_idx:end_idx]
+            pages_to_process = [reader.pages[i] for i in range(start_idx, end_idx)]
🤖 Prompt for AI Agents
In src/aideck/loaders/pdf.py around lines 128 to 134, avoid slicing reader.pages
(some pypdf versions don't support it); instead compute start_idx and end_idx as
before, build pages_to_process by iterating indices: pages_to_process =
[reader.pages[i] for i in range(start_idx, end_idx)], then create page_tasks
with enumerate(pages_to_process) using page_num=start_idx + i + 1 and await
asyncio.gather as before.
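Related hardening not covered by the suggestion above: out-of-range start_page/end_page values would still index past the document. A sketch of clamping the bounds (the helper is ours; only the config field names come from the loader):

from typing import Optional, Tuple

def resolve_page_range(total_pages: int,
                       start_page: Optional[int],
                       end_page: Optional[int]) -> Tuple[int, int]:
    """Return clamped (start_idx, end_idx) half-open indices into reader.pages."""
    start_idx = max(0, (start_page or 1) - 1)
    end_idx = min(total_pages, end_page or total_pages)
    # An inverted or empty range collapses to nothing rather than raising.
    return start_idx, max(start_idx, end_idx)

# resolve_page_range(10, None, None) -> (0, 10)
# resolve_page_range(10, 3, 99)      -> (2, 10)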

Comment on lines +181 to +199
    async def _perform_ocr(self, page: PageObject) -> str:
        """
        Performs OCR on all images within a single PDF page.
        """
        if not page.images:
            return ""

        image_data_list = [img.data for img in page.images]

        loop = asyncio.get_running_loop()
        with concurrent.futures.ThreadPoolExecutor() as pool:
            ocr_tasks = [
                loop.run_in_executor(pool, self._run_single_ocr, img_data)
                for img_data in image_data_list
            ]
            ocr_results = await asyncio.gather(*ocr_tasks)

        return "\n".join(filter(None, ocr_results))

⚠️ Potential issue | 🔴 Critical

OCR path can crash: depending on the pinned pypdf version, PageObject may not expose page.images.

On pypdf releases that predate the images API, accessing page.images raises AttributeError, which breaks ocr_only/hybrid modes.

Apply this defensive fix to avoid crashes and gracefully degrade OCR:

     async def _perform_ocr(self, page: PageObject) -> str:
         """
         Performs OCR on all images within a single PDF page.
         """
-        if not page.images:
-            return ""
-
-        image_data_list = [img.data for img in page.images]
+        # pypdf PageObject does not expose images directly; graceful degradation
+        images = getattr(page, "images", None)
+        if not images:
+            # TODO: Optionally integrate pdf rendering (e.g., pypdfium2/pdf2image) to OCR rasterized pages.
+            return ""
+        image_data_list = []
+        for img in images:
+            data = getattr(img, "data", None)
+            if data:
+                image_data_list.append(data)

Follow-up: If OCR is a must-have, integrate pypdfium2 to rasterize pages and feed bitmaps to RapidOCR. I can provide a patch.

🤖 Prompt for AI Agents
In src/aideck/loaders/pdf.py around lines 181-199, accessing page.images will
raise AttributeError for pypdf PageObject; change the implementation to
defensively check for images (e.g., use getattr(page, "images", None) or call
page.get_images() if that API exists) and handle any AttributeError with a
try/except so OCR gracefully degrades — if no images are found return an empty
string; also wrap the extraction loop so only valid image data is passed to the
threadpool and catch/log unexpected exceptions per-image to avoid crashing the
whole page OCR.
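For the rasterization follow-up, a rough sketch with pypdfium2 (assumptions: pypdfium2 v4's render API, Pillow installed, and an OCR entry point that accepts PNG bytes; none of this is verified against the codebase):

import io
import pypdfium2 as pdfium

def rasterize_page_to_png(pdf_path: str, page_index: int, scale: float = 2.0) -> bytes:
    """Render one PDF page to PNG bytes suitable for an OCR engine."""
    pdf = pdfium.PdfDocument(pdf_path)
    try:
        page = pdf[page_index]
        bitmap = page.render(scale=scale)  # higher scale generally improves OCR accuracy
        image = bitmap.to_pil()            # requires Pillow
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        return buffer.getvalue()
    finally:
        pdf.close()

The resulting bytes could then feed the existing _run_single_ocr path instead of relying on embedded image objects.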
