4 changes: 4 additions & 0 deletions content/docs/2025/beer-bears/_index.md
@@ -0,0 +1,4 @@
---
bookCollapseSection: true
title: "Beer Bears"
---
154 changes: 154 additions & 0 deletions content/docs/2025/beer-bears/week1.md
@@ -0,0 +1,154 @@
---
title: "Week #1"
---

# Week #1

## Project description

### Project name: Scaffold

**Code repository**:
- [Course Repository](https://github.com/IU-Capstone-Project-2025/scaffold)
- [Our Repository](https://github.com/Beer-Bears/scaffold)

A scaffold is a temporary structure that supports a work crew and materials during the construction, maintenance, and repair of buildings.

Our Scaffold is a code management system that translates source code into a graph and maintains it in a graph database,
enabling context injection for large language models (LLMs). In the same spirit, it helps AI agents with the construction, maintenance, and repair of your project.

### **Team Members**

| Team Member | Telegram Alias | Email Address | Track | Responsibilities |
| :-------------- | :--------------- | :---------------------- | :------------ | :---------------------------------------- |
| Melnikov Sergei | @peplxx | [email protected] | Project Owner | Team Management, RAG Algorithms |
| Razmakhov Serhei | @onemoreslacker | [email protected] | Developer | Language parsers, AST generation |
| Prosvirkin Dmitry | @dmitry5567 | [email protected] | Developer | Vector, Graph Database Management |
| Mashenkov Timofei | @mashfeii | [email protected] | Developer | Context Fetching Algorithm |
| Glazov Sergei | @pushkin404 | [email protected] | QA | QA Research, MCP Analysis |

## Brainstorming

### Ideas during brainstorming

1. Graph-based code context platform for LLMs

   Translate source code into a graph database (AST, function/class relations) to serve as rich structured context for AI agents. This enables scalable, accurate retrieval of relevant information for code generation and QA.

2. AI codebase companion (Scaffold CLI)

   A CLI tool integrated into developer workflows that allows querying, summarizing, or modifying the codebase using LLMs with graph-backed context.

3. LLM-aware refactoring assistant

   Leverages code graphs and embeddings to propose or automate safe refactoring operations (rename symbols, split/merge functions, remove dead code).
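
To make the first idea more concrete, here is a minimal sketch of turning one Python module into graph nodes and edges using only the standard `ast` module. The function name `extract_graph`, the relation labels (`DEFINES`, `IMPORTS`, `CALLS`), and the tuple layout are illustrative assumptions, not the project's actual schema. Only calls by bare name are recorded; attribute calls such as `os.getcwd()` are skipped for brevity.

```python
import ast


def extract_graph(source: str, module: str = "example"):
    """Return (nodes, edges) describing one Python module.

    nodes: set of (kind, qualified_name)
    edges: set of (source_name, relation, target_name)
    """
    tree = ast.parse(source)
    nodes = {("module", module)}
    edges = set()

    def visit(node, owner):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                name = f"{owner}.{child.name}"
                nodes.add((kind, name))
                edges.add((owner, "DEFINES", name))
                # Everything inside the definition is owned by it.
                visit(child, name)
                continue
            if isinstance(child, ast.Import):
                for alias in child.names:
                    edges.add((module, "IMPORTS", alias.name))
            elif isinstance(child, ast.ImportFrom):
                edges.add((module, "IMPORTS", child.module or "."))
            elif isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                # The callee is stored by its bare name, not resolved to a definition.
                edges.add((owner, "CALLS", child.func.id))
            visit(child, owner)

    visit(tree, module)
    return nodes, edges


SAMPLE = """
import os

def helper():
    return os.getcwd()

def main():
    return helper()
"""

nodes, edges = extract_graph(SAMPLE, "app")
for edge in sorted(edges):
    print(edge)
```

A real parser would also need to resolve callee names to their definitions across files, which this sketch deliberately leaves out.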

### Brief market research / problem validation

Idea 1: Graph-based code context platform for LLMs

Problem: Modern LLMs operate on tokenized text and lack awareness of the structural and semantic organization of real-world codebases. Existing solutions (e.g., embedding chunks into a vector DB) do not capture hierarchical or reference-based relationships well.

Existing solutions: Tools like Sourcegraph Cody, Codeium, and GitHub Copilot use text embeddings but struggle with large-scale project structure and maintaining long-term context.

Validation: Research from OpenAI, Meta, and others highlights the importance of hierarchical and symbolic context in improving AI performance on large-scale code reasoning tasks. Graph-based representations are also used in tools like CodeQL for similar reasons.

Idea 3: LLM-aware refactoring assistant

Problem: Refactoring at scale (e.g., renaming a core service method used in hundreds of files) is high-risk and hard to reason about, especially across language boundaries.

Existing solutions: IDEs such as IntelliJ IDEA and VS Code offer refactorings based on local static analysis, but not AI-assisted reasoning or graph-level semantic refactoring.

Validation: Enterprise engineering teams report significant friction in large-scale refactoring, especially when team members are unfamiliar with legacy code or there’s poor documentation. GitHub Copilot lacks this structured reasoning.

## Basic requirements

- Parse code into an AST and build code graphs
- Store the results in a graph DB (e.g., Neo4j) and a vector DB (e.g., Qdrant)
- Extract structural relationships between code entities (calls, imports, etc.)
- Provide an API/CLI for context queries
- Support incremental updates (e.g., Git hooks or file watchers)
- Enable context injection into LLMs (RAG)
- Provide basic testing and validation tools
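
As a rough illustration of the graph storage requirement, the sketch below builds parameterized Cypher `MERGE` statements for code entities and their relationships. The helper names, the `name` property, and the example labels (`Function`, `CALLS`) are assumptions made for this example; the team's real schema may differ. Labels and relationship types cannot be passed as Cypher parameters, so they are validated and interpolated, while values travel as parameters.

```python
import re

# Labels and relationship types are interpolated into the query text,
# so only plain identifiers are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(identifier: str) -> str:
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"unsafe Cypher identifier: {identifier!r}")
    return identifier


def node_statement(label: str, name: str):
    """Build an idempotent MERGE for one code entity."""
    query = f"MERGE (n:{_check(label)} {{name: $name}})"
    return query, {"name": name}


def edge_statement(relation: str, source: str, target: str):
    """Build an idempotent MERGE for a relationship between two existing nodes."""
    query = (
        "MATCH (a {name: $source}), (b {name: $target}) "
        f"MERGE (a)-[:{_check(relation)}]->(b)"
    )
    return query, {"source": source, "target": target}


print(node_statement("Function", "app.main"))
print(edge_statement("CALLS", "app.main", "app.helper"))
```

With the official Neo4j Python driver, each `(query, params)` pair would typically be passed to `session.run(query, params)`; the sketch stops before any database connection so that it runs without a server.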

### Target users and their primary needs

| Target user | Primary need |
| :---------- | :----------- |
| Developers | Understand and refactor code faster using AI and graph context |
| AI Engineers | Provide structured context to LLMs for better accuracy |
| Tech Writers | Auto-generate or update documentation from code structure |
| QA Engineers | Understand dependencies and test impact of code changes |

### User stories

As a developer, I want to find all references to a function to safely rename it.

As an AI engineer, I want structured code context to improve RAG results.

As a tech writer, I want to auto-generate docs from code relationships.

As a QA engineer, I want to trace service dependencies for better test coverage.
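
The developer and QA stories both reduce to a reverse traversal of call edges: given a changed function, which callers are affected? Below is a minimal in-memory sketch of that query. The function name `affected_by` and the sample edge names are hypothetical; in the planned system the same question would presumably be answered by a graph database query rather than Python code.

```python
from collections import defaultdict, deque


def affected_by(changed: str, call_edges):
    """Return every function that directly or transitively calls `changed`.

    call_edges: iterable of (caller, callee) pairs.
    """
    callers = defaultdict(set)
    for caller, callee in call_edges:
        callers[callee].add(caller)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


EDGES = [
    ("api.handler", "service.load"),
    ("cli.main", "service.load"),
    ("service.load", "db.query"),
    ("utils.fmt", "utils.pad"),
]

print(sorted(affected_by("db.query", EDGES)))
```

The breadth-first search keeps a `seen` set, so recursive or mutually recursive functions do not cause an infinite loop.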

### Initial scope

- Python code parser → graph + vector DB
- Neo4j + Qdrant integration
- Basic API/CLI for context lookup
- LLM context injection (early RAG prototype)
- CLI tool for developers
- Basic graph update system (e.g., file watcher)
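
One simple way to drive the graph update system is to compare content hashes between two scans and re-parse only the files that changed. The sketch below assumes this polling approach instead of a real file watcher or Git hook; `snapshot` and `diff_snapshots` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each Python file under `root` (relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.py"))
        if path.is_file()
    }


def diff_snapshots(old: dict, new: dict) -> dict:
    """Classify files as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Files listed under `added` and `modified` would be re-parsed, and nodes belonging to `removed` files would be deleted from the graph.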


## Tech-stack

Python – Widely used in AI and tooling; ideal for building parsers, integrating LLMs, and rapid prototyping.

Neo4j – Purpose-built graph database optimized for modeling and querying complex code relationships.

VectorDB (e.g., Qdrant) – Enables high-performance semantic search over embedded code/document chunks.

Docker – Provides consistent, containerized environments for development, testing, and deployment.

LLM Chain (e.g., LangChain) – Modular framework for orchestrating Retrieval-Augmented Generation pipelines.
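
To show what context injection amounts to at its simplest, here is a framework-free sketch that ranks retrieved snippets by score and packs them into a prompt under a character budget. It does not use LangChain; `build_prompt`, the `(score, source, text)` tuple layout, and the prompt wording are assumptions made for illustration only.

```python
def build_prompt(question: str, snippets, max_chars: int = 2000) -> str:
    """Assemble a prompt from retrieved snippets.

    snippets: iterable of (score, source, text); a higher score means more relevant.
    """
    parts = []
    used = 0
    for score, source, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        block = f"# {source}\n{text}\n"
        if used + len(block) > max_chars:
            continue  # skip snippets that do not fit in the remaining budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return f"Context:\n{context}\nQuestion: {question}\n"


SNIPPETS = [
    (0.2, "b.py", "def b(): pass"),
    (0.9, "a.py", "def a(): pass"),
]

print(build_prompt("What does a() do?", SNIPPETS))
```

A production pipeline would budget in tokens rather than characters and would obtain the scores from the vector and graph queries.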

# Weekly commitments

## Individual contribution of each participant

Melnikov Sergei
- brainstorming,
- [README and structure of repository](https://github.com/Beer-Bears/scaffold/pull/2),
- [Excalidraw Scheme](https://excalidraw.com/#json=8DxFWGT66eisYHShw5etd,FTPXATItDi33jkpVtVr54A),
- information research

Razmakhov Sergey
- brainstorming,
- [Excalidraw Scheme](https://excalidraw.com/#json=8DxFWGT66eisYHShw5etd,FTPXATItDi33jkpVtVr54A),
- repository

Prosvirkin Dmitry
- brainstorming,
- writing report,
- [Excalidraw Scheme](https://excalidraw.com/#json=8DxFWGT66eisYHShw5etd,FTPXATItDi33jkpVtVr54A),
- [Google Docs description](https://docs.google.com/document/d/1K4CPKvia2kNnlKm9MNFnxmQRqHM1KS_lJMJzueEnQVE/edit?usp=sharing)

Mashenkov Timofei
- brainstorming,
- [Excalidraw Scheme](https://excalidraw.com/#json=8DxFWGT66eisYHShw5etd,FTPXATItDi33jkpVtVr54A)

Glazov Sergei
- brainstorming,
- information research,
- [Excalidraw Scheme](https://excalidraw.com/#json=8DxFWGT66eisYHShw5etd,FTPXATItDi33jkpVtVr54A)


## Confirmation of the code's operability

We confirm that the code in the main branch:
- [ ] Is in working condition.
- [ ] Runs via docker-compose (or another alternative described in the `README.md`).