
Code Intelligence Status Report

Last Updated: 2026-02-04

Purpose: Track all code intelligence features, implementation details, and language support.

Executive Summary

| Category | Live | Partial | Missing | Total |
|---|---|---|---|---|
| Reference Analysis | 7 | 0 | 0 | 7 |
| Code Quality Markers | 5 | 0 | 1 | 6 |
| File Relationships | 1 | 1 | 1 | 3 |
| Auto-Triggers | 0 | 0 | 1 | 1 |
| Total | 13 | 1 | 3 | 17 |

Recent Changes (v0.13.4+)

  • Hybrid LSP Fallback - When LSP returns 0 refs, automatically falls back to ripgrep text search to detect cross-package references
  • Cross-Package Detection - Catches references that LSP misses because of lazy imports or because a package resolves to its installed copy rather than the workspace source
  • Enhanced Risk Assessment - Risk level upgraded based on text matches when LSP fails

How Code Intelligence Works

Relationship Flow

                              IMPORTS                           CALLS
                         ┌──────────────┐                 ┌──────────────┐
                         │              │                 │              │
                         ▼              │                 ▼              │
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  module_a   │───▶│  module_b   │───▶│  function   │◀───│  caller_1   │
│             │    │             │    │             │    │             │
│ imports     │    │ imports     │    │ calls ────────▶  │  caller_2   │
│ module_b    │    │ module_c    │    │ helper()    │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                         │                   │
                         │ imported_by       │ called_by
                         ▼                   ▼
                   "Who imports me?"    "Who calls me?"

                         │                   │
                         │ imports           │ calling
                         ▼                   ▼
                   "What do I import?"  "What do I call?"

Metrics Explained

| Metric | Direction | Question Answered | Example | Used For |
|---|---|---|---|---|
| called_by | ← incoming | "Who calls this function?" | get_user() ← login(), signup() | Impact analysis before refactoring |
| calling | → outgoing | "What does this function call?" | login() → get_user(), validate() | Understanding dependencies |
| imported_by | ← incoming | "What files import this module?" | utils.py ← api.py, cli.py | Safe to modify? Who depends on me? |
| imports | → outgoing | "What modules does this file import?" | api.py → utils, models, config | Dependency tracking |
| used_by | summary | "How widely used is this?" | "3f 12r c:45" = 3 files, 12 refs, complexity at the 45th percentile | Quick usage overview |
| refs | count | "How many references in total?" | 12 references across codebase | Raw usage metric |
| unused | flag | "Is this dead code?" | refs <= 2 → unused: true | Dead code detection |
| complexity | score | "How complex is this code?" | c:45 = 45th percentile | Risk assessment |
| risk | level | "How risky to change?" | HIGH (11+ refs), MEDIUM (3-10), LOW (0-2) | Change impact |

Hybrid LSP Fallback (NEW in v0.13.4)

When LSP returns 0 references for a symbol, Aurora automatically falls back to text search:

┌─────────────────────────────────────────────────────────────────────┐
│                    Hybrid Reference Detection                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. LSP Request                2. Fallback (if 0 refs)               │
│  ┌─────────────┐               ┌─────────────┐                       │
│  │ LSP Server  │──0 refs?──▶   │  ripgrep    │                       │
│  │ (jedi/ts)   │               │ -w symbol   │                       │
│  └─────────────┘               └─────────────┘                       │
│        │                              │                              │
│        ▼                              ▼                              │
│  ┌─────────────┐               ┌─────────────┐                       │
│  │  result =   │               │ text_matches│                       │
│  │  0 usages   │               │ text_files  │                       │
│  └─────────────┘               │ note:...    │                       │
│                                │ risk: adj.  │                       │
│                                └─────────────┘                       │
└─────────────────────────────────────────────────────────────────────┘

Why This Helps:

  • LSP tracks references within the analyzed workspace
  • Cross-package imports (e.g., from aurora_soar import SOAROrchestrator) may not resolve when the package is loaded from an installed copy rather than from source in the workspace
  • Text search catches these with ~85% accuracy

Response Fields (when fallback activates):

| Field | Description | Example |
|---|---|---|
| text_matches | Total text occurrences | 12 |
| text_files | Files containing matches | 5 |
| note | Explanation of divergence | "LSP found 0 refs but text search found 12 matches in 5 files - likely cross-package usage" |
| risk | Adjusted risk level | "medium" (upgraded from "low") |
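
The fallback flow can be sketched as follows. This is a minimal illustration, not Aurora's implementation: `parse_rg_counts` and `hybrid_result` are hypothetical names, the input is assumed to be the `path:count` lines that ripgrep prints for a directory search with `--count-matches`, and the "bump risk by one level" rule is an assumption chosen only to reproduce the "low" to "medium" example above.

```python
# Hypothetical helpers; names and the risk-bump rule are assumptions.
RISK_ORDER = ["low", "medium", "high"]


def parse_rg_counts(output: str) -> dict[str, int]:
    """Parse `path:count` lines, e.g. from `rg -w -F --count-matches SYMBOL DIR`."""
    counts: dict[str, int] = {}
    for line in output.splitlines():
        path, sep, count = line.rpartition(":")
        if sep and count.isdigit():
            counts[path] = int(count)
    return counts


def hybrid_result(lsp_refs: int, rg_output: str, base_risk: str = "low") -> dict:
    """Return the LSP result, enriched with text-search fields when LSP found nothing."""
    result = {"refs": lsp_refs, "risk": base_risk}
    if lsp_refs > 0:
        return result  # LSP answered; no fallback needed

    counts = parse_rg_counts(rg_output)
    text_matches = sum(counts.values())
    if text_matches == 0:
        return result  # text search agrees with LSP

    text_files = len(counts)
    # Assumed rule: raise the risk by one level when text search disagrees with LSP.
    bumped = min(RISK_ORDER.index(base_risk) + 1, len(RISK_ORDER) - 1)
    result.update(
        text_matches=text_matches,
        text_files=text_files,
        note=(
            f"LSP found 0 refs but text search found {text_matches} matches "
            f"in {text_files} files - likely cross-package usage"
        ),
        risk=RISK_ORDER[bumped],
    )
    return result


sample_output = "a.py:5\nb.py:3\npkg/c.py:4\n"
print(hybrid_result(0, sample_output))
```

Keeping the parsing separate from the subprocess call makes the decision logic testable without ripgrep installed.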

Reading the used_by Format

"3f 12r c:45"
 │   │    │
 │   │    └── complexity: 45th percentile (higher = more complex)
 │   └─────── refs: 12 total references across codebase
 └─────────── files: referenced in 3 different files
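
A consumer of this summary string might parse it as sketched below. The `UsedBy` type and `parse_used_by` function are hypothetical names introduced for illustration, and the sketch assumes the three fields are always present and separated by single spaces, as in the example above.

```python
import re
from typing import NamedTuple


class UsedBy(NamedTuple):
    files: int       # number of files that reference the symbol
    refs: int        # total references across the codebase
    complexity: int  # complexity percentile


# Assumed layout: "<files>f <refs>r c:<complexity>"
_USED_BY_RE = re.compile(r"^(\d+)f (\d+)r c:(\d+)$")


def parse_used_by(summary: str) -> UsedBy:
    """Split a used_by summary such as "3f 12r c:45" into its three numbers."""
    match = _USED_BY_RE.match(summary.strip())
    if match is None:
        raise ValueError(f"unrecognized used_by summary: {summary!r}")
    files, refs, complexity = (int(group) for group in match.groups())
    return UsedBy(files, refs, complexity)


print(parse_used_by("3f 12r c:45"))
```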

Risk Calculation

| Risk Level | Criteria | Action |
|---|---|---|
| LOW | 0-2 refs | Safe to change, minimal impact |
| MEDIUM | 3-10 refs | Review callers before changing |
| HIGH | 11+ refs | Careful refactoring needed, many dependents |
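
The thresholds in this table translate into a small function. The sketch below uses reference counts only; `risk_level` is a hypothetical name, and the real `_calculate_risk()` may also weigh complexity, which is not modeled here.

```python
def risk_level(refs: int) -> str:
    """Map a reference count to LOW / MEDIUM / HIGH using the table's thresholds."""
    if refs < 0:
        raise ValueError("refs must be non-negative")
    if refs <= 2:
        return "LOW"
    if refs <= 10:
        return "MEDIUM"
    return "HIGH"


for count in (0, 2, 3, 10, 11):
    print(count, risk_level(count))
```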

Usage Markers

| Marker | Condition | Meaning |
|---|---|---|
| #DEADCODE | 0 external refs | Safe to remove |
| #UNUSED | refs ≤ 2 | Low usage, consider removing |
| #REFAC | refs > 10 | High usage, careful refactoring needed |
| #COMPLEX | complexity > 80th percentile | High cyclomatic complexity |
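
As a rough illustration, the marker conditions could be combined as below. `usage_markers` is a hypothetical helper, and one choice here is an assumption: the table does not say whether a symbol with 0 refs receives both #DEADCODE and #UNUSED, so this sketch lets #DEADCODE take precedence.

```python
def usage_markers(external_refs: int, complexity_pct: int) -> list[str]:
    """Return the markers that apply to a symbol, per the conditions in the table."""
    markers: list[str] = []
    if external_refs == 0:
        markers.append("#DEADCODE")  # assumption: supersedes #UNUSED
    elif external_refs <= 2:
        markers.append("#UNUSED")
    elif external_refs > 10:
        markers.append("#REFAC")
    if complexity_pct > 80:
        markers.append("#COMPLEX")
    return markers


print(usage_markers(external_refs=12, complexity_pct=95))
```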

Language Support Matrix

| Feature | Python | JS/TS | Go | Rust | Java | Ruby |
|---|---|---|---|---|---|---|
| LSP references | ✅ Full | ⚠️ Via multilspy | ⚠️ Via multilspy | ⚠️ Via multilspy | ⚠️ Via multilspy | ⚠️ Via multilspy |
| Deadcode (fast) | ✅ Full | ✅ ripgrep | ✅ ripgrep | ✅ ripgrep | ✅ ripgrep | ✅ ripgrep |
| Deadcode (accurate) | ✅ Full | ⚠️ Untested | ⚠️ Untested | ⚠️ Untested | ⚠️ Untested | ⚠️ Untested |
| Complexity | ✅ tree-sitter | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl |
| Calling (outgoing) | ✅ tree-sitter | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl |
| Import filtering | ✅ Custom | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl | ❌ Not impl |
| Risk calculation | ✅ Full | ⚠️ No complexity | ⚠️ No complexity | ⚠️ No complexity | ⚠️ No complexity | ⚠️ No complexity |

Legend: ✅ Full support | ⚠️ Partial/Untested | ❌ Not implemented


Feature Implementation Details

Reference Analysis

| Feature | Status | Implementation | Languages | Speed | Notes |
|---|---|---|---|---|---|
| used_by (usage count) | ✅ LIVE | LSP get_usage_summary() via multilspy | Python (tested), others via multilspy | ~1000ms/symbol | Returns files + refs count |
| called_by (incoming) | ✅ LIVE | LSP get_callers() + import filtering | Python only (filter) | ~1500ms/symbol | Filters import statements |
| calling (outgoing) | ✅ LIVE | Tree-sitter AST parsing | Python only | ~50ms/symbol | Filters built-ins, shows meaningful calls |
| references (raw) | ✅ LIVE | LSP request_references() | All via multilspy | ~800ms/symbol | Raw LSP, no filtering |
| hybrid_fallback | ✅ LIVE | ripgrep text search when LSP=0 | All languages | ~100ms | Catches cross-package refs |
| definition | ✅ LIVE (unused) | LSP request_definition() | All via multilspy | ~200ms | Not exposed via MCP |
| hover | ✅ LIVE (unused) | LSP request_hover() | All via multilspy | ~200ms | Not exposed via MCP |
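
The import filtering applied to called_by can be pictured as dropping every reference whose source line is an import statement, so that only real call sites remain. The sketch below is an assumption-laden illustration: `is_import_line` and `filter_callers` are hypothetical names, references are modeled as `(file, line_number, line_text)` tuples, and the two regexes cover only the plain `import X` and `from X import Y` forms.

```python
import re

# Assumed patterns for Python import statements.
IMPORT_PATTERNS = [
    re.compile(r"^\s*import\s+"),
    re.compile(r"^\s*from\s+\S+\s+import\s+"),
]

Reference = tuple[str, int, str]  # (file, line number, text of that line)


def is_import_line(line: str) -> bool:
    """True when the line is an import statement rather than a usage."""
    return any(pattern.match(line) for pattern in IMPORT_PATTERNS)


def filter_callers(references: list[Reference]) -> list[Reference]:
    """Keep only references that are not import statements."""
    return [ref for ref in references if not is_import_line(ref[2])]


refs = [
    ("api.py", 1, "from users import get_user"),
    ("api.py", 14, "    user = get_user(name)"),
    ("cli.py", 2, "import users"),
    ("cli.py", 30, "print(users.get_user('x'))"),
]
for ref in filter_callers(refs):
    print(ref)
```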

Code Quality Markers

| Feature | Status | Implementation | Languages | Speed | Notes |
|---|---|---|---|---|---|
| #DEADCODE (fast) | ✅ LIVE | Batched ripgrep + within-file check | All (text search) | ~2s/dir | 85% accuracy |
| #DEADCODE (accurate) | ✅ LIVE | LSP references per symbol | Python (tested) | ~20s/dir | 95%+ accuracy |
| #REFAC (high usage) | ✅ LIVE | Usage count > 10 = "high" risk | Python (tested) | ~1s/symbol | Part of lsp impact |
| #COMPLEX | ✅ LIVE | Tree-sitter branch counting | Python only | <10ms/file | Shown as c:95 |
| #UNUSED (low usage) | ✅ LIVE | unused: true when refs <= 2 | All (uses LSP count) | ~1s/symbol | In mem_search + lsp check |
| #TYPE | ❌ MISSING | Would need type checker | - | - | Language-specific |
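
To illustrate what "branch counting" means for #COMPLEX, the sketch below counts branching nodes in a piece of Python source. It deliberately swaps in the standard-library `ast` module for tree-sitter so that it runs without extra dependencies; the node set is an assumption, and the conversion from a raw count to the percentile shown as `c:95` is not modeled.

```python
import ast

# Assumed set of branching constructs (stand-ins for tree-sitter node types
# such as "if_statement" and "for_statement").
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def count_branches(source: str) -> int:
    """Count branching nodes anywhere in the given source."""
    tree = ast.parse(source)
    return sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime" if n > 1 else "unit"
'''

print(count_branches(SAMPLE))  # 4: two if statements, one for loop, one conditional expression
```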

File Relationships

| Feature | Status | Implementation | Languages | Speed | Notes |
|---|---|---|---|---|---|
| imports (outgoing) | ⚠️ INDEXED | Tree-sitter _extract_imports() | Python only | <10ms/file | Not queryable via MCP |
| imported_by (incoming) | ✅ LIVE | lsp(action="imports") via ripgrep | All languages | <1s | Query-time search |
| calls_files | ❌ MISSING | Would derive from calling | Python (calling ready) | - | calling is ready; file mapping still needed |
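
The query-time search behind imported_by amounts to scanning files for import statements that name the module. The sketch below uses Python's `re` and `pathlib` in place of ripgrep so that it is self-contained; `import_pattern` and `imported_by` are hypothetical names, and the regex is an assumption that misses forms such as `import os, utils` or relative imports.

```python
import re
import tempfile
from pathlib import Path


def import_pattern(module: str) -> re.Pattern:
    """Regex matching `import <module>` or `from <module>[.sub] import ...`."""
    name = re.escape(module)
    return re.compile(
        rf"^\s*(?:import\s+{name}\b|from\s+{name}(?:\.\w+)*\s+import\b)",
        re.MULTILINE,
    )


def imported_by(module: str, root: Path) -> list[str]:
    """Return the .py files under root that import the module, as relative paths."""
    pattern = import_pattern(module)
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            hits.append(path.relative_to(root).as_posix())
    return hits


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "api.py").write_text("from utils import slugify\n")
    (root / "cli.py").write_text("import utils\n")
    (root / "models.py").write_text("import utilsx\n")
    print(imported_by("utils", root))  # ['api.py', 'cli.py']
```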

Implementation Stack

Current Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         MCP Tools Layer                              │
│  lsp_tool.py              mem_search_tool.py                        │
│  - lsp(action, path)      - mem_search(query, limit)                │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────────┐
│                         Analysis Layer                               │
│  aurora_lsp/analysis.py                                              │
│  - CodeAnalyzer.find_dead_code(accurate=False)                       │
│  - CodeAnalyzer.find_usages()                                        │
│  - CodeAnalyzer.get_callers()                                        │
│  - CodeAnalyzer.get_callees()         ← Python only (tree-sitter)    │
│  - _batched_ripgrep_search()          ← Language-agnostic            │
│  - _get_complexity()                  ← Python only (tree-sitter)    │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────────┐
│                         LSP Client Layer                             │
│  aurora_lsp/client.py (multilspy wrapper)                            │
│  - request_references()                                              │
│  - request_document_symbols()                                        │
│  - request_definition()                                              │
│  Supported: Python, JS/TS, Go, Rust, Java, Ruby, C#, Dart, Kotlin   │
└─────────────────────────────┬───────────────────────────────────────┘
                              │
┌─────────────────────────────▼───────────────────────────────────────┐
│                      Language Servers (via multilspy)                │
│  Python: jedi-language-server                                        │
│  JS/TS: typescript-language-server                                   │
│  Go: gopls                                                           │
│  Rust: rust-analyzer                                                 │
│  Java: jdtls                                                         │
│  Ruby: solargraph                                                    │
└─────────────────────────────────────────────────────────────────────┘

Component Breakdown

| Component | Technology | Custom Code | Language Support |
|---|---|---|---|
| LSP client | multilspy library | Thin wrapper | 10+ languages |
| Reference search | LSP protocol | None | All via LSP |
| Import filtering | Custom regex | aurora_lsp/filters.py | Multi-language patterns |
| Deadcode (fast) | ripgrep subprocess | _batched_ripgrep_search() | All languages |
| Deadcode (accurate) | LSP references | find_usages() loop | Python (tested) |
| Complexity | tree-sitter | aurora_lsp/languages/ | Python only (config-based) |
| Risk calculation | Custom formula | _calculate_risk() | All (uses counts) |
| Entry point filter | Config patterns | aurora_lsp/languages/python.py | Python (extensible) |

Language Abstraction Layer (NEW)

packages/lsp/src/aurora_lsp/languages/
  __init__.py       # Registry: get_config(), is_entry_point(), etc.
  base.py           # LanguageConfig dataclass
  python.py         # Python config (entry points, branch types, patterns)

LanguageConfig Fields

| Field | Type | Purpose | Example (Python) |
|---|---|---|---|
| name | str | Language identifier | "python" |
| extensions | list[str] | File extensions | [".py", ".pyi"] |
| tree_sitter_module | str \| None | Tree-sitter parser module | "tree_sitter_python" |
| branch_types | set[str] | AST nodes for complexity | {"if_statement", "for_statement", ...} |
| entry_points | set[str] | Skip in deadcode (exact) | {"main", "cli", "app", "setup"} |
| entry_patterns | set[str] | Skip in deadcode (glob) | {"pytest_*", "test_*"} |
| entry_decorators | set[str] | Decorator entry points | {"@click.command", "@app.route"} |
| nested_patterns | set[str] | Nested helper patterns | {"wrapper", "inner", "on_*"} |
| import_patterns | list[str] | Import regex patterns | [r"^\s*import\s+", ...] |
| call_node_type | str | AST node for calls | "call" |
| function_def_types | set[str] | AST nodes for defs | {"function_definition", "class_definition"} |
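
Read as a dataclass, the fields above might look like the sketch below. The field names and types come from the table; the default values and the `frozen=True` choice are assumptions, and the real `base.py` may differ. The example instance uses only values shown in the table, so its `branch_types` and `import_patterns` are shorter than the real Python config.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LanguageConfig:
    name: str
    extensions: list[str]
    tree_sitter_module: Optional[str] = None                     # None: no complexity support
    branch_types: set[str] = field(default_factory=set)          # AST nodes counted as branches
    entry_points: set[str] = field(default_factory=set)          # exact names skipped by deadcode
    entry_patterns: set[str] = field(default_factory=set)        # glob patterns skipped by deadcode
    entry_decorators: set[str] = field(default_factory=set)
    nested_patterns: set[str] = field(default_factory=set)
    import_patterns: list[str] = field(default_factory=list)     # regexes for import lines
    call_node_type: str = "call"                                 # assumed default
    function_def_types: set[str] = field(default_factory=set)


# Example instance built only from the values listed in the table.
PYTHON = LanguageConfig(
    name="python",
    extensions=[".py", ".pyi"],
    tree_sitter_module="tree_sitter_python",
    branch_types={"if_statement", "for_statement"},
    entry_points={"main", "cli", "app", "setup"},
    entry_patterns={"pytest_*", "test_*"},
    entry_decorators={"@click.command", "@app.route"},
    nested_patterns={"wrapper", "inner", "on_*"},
    import_patterns=[r"^\s*import\s+"],
    call_node_type="call",
    function_def_types={"function_definition", "class_definition"},
)

print(PYTHON.name, PYTHON.extensions)
```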

Registry API

from aurora_lsp.languages import (
    get_config,                    # Get full LanguageConfig for file
    get_language,                  # Get language name for file
    get_complexity_branch_types,   # Get branch types for complexity calc
    get_call_node_type,            # Get AST node type for calls
    get_function_def_types,        # Get AST node types for function defs
    is_entry_point,                # Check if name is entry point
    is_nested_helper,              # Check if name is nested helper
    supported_extensions,          # Get all supported extensions
)

# Usage
config = get_config("foo.py")           # Returns PYTHON config
config = get_config("foo.js")           # Returns None (not yet supported)

is_entry_point("foo.py", "main")        # True
is_entry_point("foo.py", "my_func")     # False
is_entry_point("foo.py", "pytest_configure")  # True (matches pytest_*)

branch_types = get_complexity_branch_types("foo.py")  # {"if_statement", ...}

call_type = get_call_node_type("foo.py")             # "call"
def_types = get_function_def_types("foo.py")         # {"function_definition", ...}
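
The entry-point results above are consistent with an exact-name check followed by glob matching. A minimal sketch of that logic, assuming the Python values from the LanguageConfig table and using a hypothetical `is_entry_point_name` helper (the real `is_entry_point` also takes a file path to select the config):

```python
from fnmatch import fnmatchcase

# Values taken from the Python column of the LanguageConfig table.
ENTRY_POINTS = {"main", "cli", "app", "setup"}
ENTRY_PATTERNS = {"pytest_*", "test_*"}


def is_entry_point_name(name: str) -> bool:
    """True when the name matches an exact entry point or one of the glob patterns."""
    if name in ENTRY_POINTS:
        return True
    return any(fnmatchcase(name, pattern) for pattern in ENTRY_PATTERNS)


print(is_entry_point_name("main"))              # True
print(is_entry_point_name("my_func"))           # False
print(is_entry_point_name("pytest_configure"))  # True
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.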

Adding a New Language

1. Create config file (languages/javascript.py):

from aurora_lsp.languages.base import LanguageConfig

JAVASCRIPT = LanguageConfig(
    name="javascript",
    extensions=[".js", ".jsx", ".mjs"],
    tree_sitter_module="tree_sitter_javascript",

    branch_types={
        "if_statement", "for_statement", "while_statement",
        "switch_statement", "ternary_expression", "catch_clause",
    },

    entry_points={"main", "default"},
    entry_patterns={"test_*", "spec_*"},
    entry_decorators=set(),  # JS doesn't use decorators same way

    nested_patterns={"callback", "handler", "wrapper"},

    import_patterns=[
        r"^\s*import\s+",
        r"^\s*import\s*\{",
        r"^\s*(const|let|var)\s+.*=\s*require\(",
    ],
)

2. Register in __init__.py:

from aurora_lsp.languages.javascript import JAVASCRIPT

LANGUAGES["javascript"] = JAVASCRIPT
EXTENSION_MAP.update({
    ".js": "javascript",
    ".jsx": "javascript",
    ".mjs": "javascript",
})
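
The effect of registration on lookups can be sketched with two dictionaries. This is an assumption about how the registry resolves a file, based only on the `LANGUAGES` and `EXTENSION_MAP` names above; configs are plain dicts here to keep the example self-contained, and this `get_config` is a stand-in for the real function.

```python
from pathlib import Path
from typing import Optional

# Stand-in registry with Python already registered.
LANGUAGES: dict[str, dict] = {"python": {"name": "python"}}
EXTENSION_MAP: dict[str, str] = {".py": "python", ".pyi": "python"}


def get_config(file_path: str) -> Optional[dict]:
    """Resolve a file to its language config via its extension, or None if unsupported."""
    language = EXTENSION_MAP.get(Path(file_path).suffix.lower())
    return LANGUAGES.get(language) if language else None


print(get_config("foo.js"))  # None before registration

# Registering JavaScript, mirroring step 2.
LANGUAGES["javascript"] = {"name": "javascript"}
EXTENSION_MAP.update({".js": "javascript", ".jsx": "javascript", ".mjs": "javascript"})

print(get_config("foo.js"))  # {'name': 'javascript'}
```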

3. Add the tree-sitter dependency (only if complexity is needed):

# pyproject.toml
dependencies = [
    "tree-sitter-javascript>=0.20",
]

Scaling to Other Languages

What Would Be Needed

| Feature | Current (Python) | To Add Language X |
|---|---|---|
| LSP references | ✅ Works | ✅ Already works (multilspy) |
| Deadcode (fast) | ✅ Works | ✅ Already works (ripgrep) |
| Deadcode (accurate) | ✅ Works | ⚠️ Needs testing with language X server |
| Complexity | tree-sitter-python | Need tree-sitter-X + parser code |
| Import filtering | Python regex | Need language-specific patterns |
| Entry point filter | Python patterns | Need language-specific patterns |

Effort Estimate Per Language

| Language | LSP | Deadcode | Complexity | Import Filter | Total |
|---|---|---|---|---|---|
| JavaScript/TS | ✅ Ready | ✅ Ready | 2 days | 1 day | 3 days |
| Go | ✅ Ready | ✅ Ready | 2 days | 1 day | 3 days |
| Rust | ✅ Ready | ✅ Ready | 2 days | 1 day | 3 days |
| Java | ✅ Ready | ✅ Ready | 2 days | 2 days | 4 days |

MCP Tool Parameters

lsp Tool

lsp(
    action: "check" | "impact" | "deadcode" | "imports",
    path: str,           # File or directory
    line: int | None,    # Required for check/impact (1-indexed)
    accurate: bool,      # For deadcode: True=LSP refs (slow), False=ripgrep (fast)
)

| Action | What It Does | Languages | Speed |
|---|---|---|---|
| check | Quick usage count before editing | Python (tested) | ~1s |
| impact | Full analysis with top callers | Python (tested) | ~2s |
| deadcode | Find all unused symbols | All (fast), Python (accurate) | 2-20s |
| imports | Find files that import this module | All (ripgrep) | <1s |
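
The parameter rules above (a 1-indexed `line` required for check and impact, `accurate` meaningful only for deadcode) could be enforced by a small validator. `validate_lsp_args` is a hypothetical helper, not part of the tool, and the `accurate=False` default is an assumption taken from `find_dead_code(accurate=False)`.

```python
from typing import Optional

VALID_ACTIONS = {"check", "impact", "deadcode", "imports"}
LINE_REQUIRED = {"check", "impact"}


def validate_lsp_args(
    action: str,
    path: str,
    line: Optional[int] = None,
    accurate: bool = False,
) -> dict:
    """Check the argument combination and return a normalized request dict."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if action in LINE_REQUIRED and (line is None or line < 1):
        raise ValueError(f"action {action!r} requires a 1-indexed line")
    return {
        "action": action,
        "path": path,
        "line": line,
        # accurate only affects deadcode; ignore it elsewhere
        "accurate": accurate if action == "deadcode" else False,
    }


print(validate_lsp_args("check", "src/users.py", line=42))
print(validate_lsp_args("deadcode", "src/", accurate=True))
```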

mem_search Tool

mem_search(
    query: str,          # Search query
    limit: int = 5,      # Max results
    enrich: bool = False # Add callers/callees/git
)

| Output Field | Source | Languages |
|---|---|---|
| type | Indexed metadata | All |
| file | Indexed metadata | All |
| name | Indexed metadata | All |
| lines | Indexed metadata | All |
| used_by | LSP + tree-sitter | Python (full), others (no complexity) |
| risk | Calculated | Python (full), others (partial) |
| score | Hybrid retrieval | All |

Performance Benchmarks

Deadcode Detection

| Mode | Symbols | Time | Accuracy | Method |
|---|---|---|---|---|
| Fast (default) | 50 | 2s | 85% | Batched ripgrep + within-file check |
| Accurate | 50 | 20s | 95%+ | LSP references per symbol |
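
The speed of fast mode comes from searching for all symbols in one pass instead of once per symbol. The sketch below imitates that idea with a single alternation regex over in-memory sources rather than a ripgrep subprocess. The function names are hypothetical, and the rule "a symbol whose only occurrence is its definition is dead" is an assumed simplification of the within-file check.

```python
import re
from collections import Counter


def batched_counts(symbols: list[str], sources: dict[str, str]) -> Counter:
    """Count whole-word occurrences of every symbol in one regex pass per file."""
    if not symbols:
        return Counter()
    # Longest names first so the alternation prefers the most specific match.
    ordered = sorted(symbols, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(s) for s in ordered) + r")\b")
    counts: Counter = Counter()
    for text in sources.values():
        counts.update(pattern.findall(text))
    return counts


def find_dead(symbols: list[str], sources: dict[str, str]) -> list[str]:
    """Assumed rule: one occurrence or fewer means only the definition exists."""
    counts = batched_counts(symbols, sources)
    return sorted(s for s in symbols if counts[s] <= 1)


sources = {
    "a.py": "def used():\n    pass\n\ndef orphan():\n    pass\n",
    "b.py": "from a import used\nused()\n",
}
print(find_dead(["used", "orphan"], sources))  # ['orphan']
```

Text matching of this kind cannot distinguish a real call from a mention in a comment or string, which is one plausible source of the gap between the fast and accurate modes.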

Reference Counting

| Approach | Symbols | Time | Per Symbol |
|---|---|---|---|
| Ripgrep (batched) | 50 | 0.1s | 2ms |
| LSP references | 50 | 15s | 300ms |

Complexity Calculation

| Method | Files | Time | Per File |
|---|---|---|---|
| Tree-sitter (Python) | 10 | 0.1s | 10ms |

Known Limitations

Python-Only Features

  1. Complexity calculation - Uses tree-sitter-python
  2. Import filtering - Python regex patterns (from X import, import X)
  3. Entry point detection - Python patterns (main, pytest_*, decorators)

Cross-Package References (Mitigated)

LSP may miss references when:

  • Packages are installed (site-packages) rather than source
  • Imports are deferred or guarded (e.g., under if TYPE_CHECKING:)
  • Dynamic imports (importlib.import_module())

Mitigation (v0.13.4+): Hybrid fallback uses ripgrep text search when LSP returns 0 refs, catching ~85% of cross-package references. Response includes text_matches, text_files, and adjusted risk.

External Callers Not Detected

Both fast and accurate modes miss:

  • MCP tool calls (lsp_client.find_dead_code())
  • CLI entry points called via python -m
  • Framework callbacks (Flask routes, pytest fixtures)

LSP Limitations

  • jedi-language-server doesn't provide diagnostics (no linting)
  • Outgoing calls (calling) now use tree-sitter AST parsing (Python only)
  • Some language servers (e.g., solargraph for Ruby) are less reliable
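
Extracting outgoing calls from the AST, with built-ins filtered out, can be illustrated as below. As a stated substitution, the sketch uses the standard-library `ast` module instead of tree-sitter, and `get_callees` here is a simplified stand-in for `CodeAnalyzer.get_callees()`, whose real signature is not shown in this report.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


def get_callees(source: str, function_name: str) -> list[str]:
    """Return the sorted names called inside function_name, excluding built-ins."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if not (is_def and node.name == function_name):
            continue
        names = set()
        for child in ast.walk(node):
            if not isinstance(child, ast.Call):
                continue
            func = child.func
            if isinstance(func, ast.Name):
                name = func.id        # plain call: helper()
            elif isinstance(func, ast.Attribute):
                name = func.attr      # attribute call: obj.method()
            else:
                continue
            if name not in BUILTIN_NAMES:
                names.add(name)
        return sorted(names)
    return []


SAMPLE = '''
def login(name):
    user = get_user(name)
    print(len(name))
    return validate(user)
'''

print(get_callees(SAMPLE, "login"))  # ['get_user', 'validate']
```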

Recommended Next Steps

Quick Wins (1 day each)

  1. Add --accurate flag to deadcode (DONE)
  2. Add complexity to mem_search output (DONE)
  3. Add risk calculation (DONE)
  4. Add #UNUSED marker (usage <= 2) (DONE)

Medium Term (3-5 days each)

  1. Add JavaScript/TypeScript complexity (tree-sitter-typescript)
  2. Add JS/TS import filtering patterns
  3. Build imported_by reverse lookup (DONE: lsp(action="imports"))
  4. Build calling (outgoing calls) (DONE: tree-sitter AST parsing, Python only)
  5. Add pre-edit hook for related files

Long Term

  1. Multi-language complexity support
  2. Type checking integration per language
  3. LSP warm-up / persistent daemon for faster cold start

See Also