Architecture Decision Records

Project: QAgent - Self-Healing QA Agent

This document captures key architecture decisions using the ADR (Architecture Decision Record) format.

ADR Index

ID	Title	Status	Date
ADR-001	Multi-Agent Architecture (ADK/A2A-compatible)	Accepted	2024
ADR-002	Browserbase + Stagehand for Browser Automation	Accepted	2024
ADR-003	Redis for Vector Knowledge Base	Accepted	2024
ADR-004	Vercel for Deployment Automation	Accepted	2024
ADR-005	W&B Weave for Observability	Accepted	2024
ADR-006	Marimo for Dashboard	Accepted	2024
ADR-007	OpenAI for Patch Generation	Accepted	2024
ADR-008	Next.js for Demo Application	Accepted	2024

ADR-001: Multi-Agent Architecture (ADK/A2A-compatible)

Status: Accepted Date: 2024 Deciders: Team

Context

QAgent requires coordination between multiple specialized components: testing, diagnosis, fixing, and verification. We need a way to structure these as independent agents that can communicate and hand off work reliably.

Decision

Implement a custom orchestrator now, keeping agent interfaces ADK/A2A-compatible and deferring full ADK/A2A integration until a dedicated integration phase.

Consequences

Positive:

Immediate integration with the existing TypeScript codebase
No new runtime dependency for the MVP
Clear agent boundaries for future ADK/A2A mapping

Negative:

Missing ADK/A2A features (standardized orchestration, tools, telemetry)
Additional migration work later

Mitigation:

Keep agent inputs/outputs explicit and stable
Document ADK/A2A mapping points
Revisit ADK/A2A integration post-MVP

Alternatives Considered

Adopt ADK/A2A now: Deferred - integration not implemented yet
Custom orchestration: Chosen - fastest path for MVP
LangGraph: Rejected - different paradigm, not sponsor

ADR-002: Browserbase + Stagehand for Browser Automation

Status: Accepted Date: 2024 Deciders: Team

Context

The Tester Agent needs to execute E2E tests in a real browser environment. Tests should be easy to write and maintain, ideally using natural language descriptions that get translated to browser actions.

Decision

Use Browserbase for cloud browser infrastructure and Stagehand for AI-powered browser automation.

Consequences

Positive:

Real browser testing (not jsdom)
Isolated, consistent environments
Stagehand enables natural language test writing
"Predictability of code and adaptability of AI"
Hackathon sponsor integration

Negative:

External dependency for test execution
Session limits on free tier
Network latency for browser operations

Mitigation:

Cache browser sessions where possible
Batch test operations
Have fallback to local Playwright if needed

Implementation Notes

// Stagehand natural language test
const stagehand = new Stagehand({ browserbase });

await stagehand.act("Go to the signup page");
await stagehand.act("Fill in the email field with test@example.com");
await stagehand.act("Click the Sign Up button");
await stagehand.assert("I see a welcome message");

Alternatives Considered

Playwright alone: Rejected - requires explicit selectors, brittle
Puppeteer: Rejected - less AI integration
Selenium: Rejected - dated, complex setup

ADR-003: Redis for Vector Knowledge Base

Status: Accepted Date: 2024 Deciders: Team

Context

QAgent's self-improvement capability requires storing past failures and fixes, then retrieving similar ones when new bugs are encountered. This needs semantic similarity search, not just keyword matching.

Decision

Use Redis Stack with vector search capabilities as the knowledge base.

Consequences

Positive:

Fast vector similarity search (HNSW)
Can also store structured metadata
Caching capabilities for other uses
Single database for multiple purposes
Hackathon sponsor integration

Negative:

Need to manage embeddings
Redis vector search is newer feature
Memory-bound storage

Mitigation:

Use OpenAI embeddings (1536 dim)
Keep index size reasonable for hackathon
Set TTL on old entries if needed

Schema Design

failure:{id}
├── embedding: VECTOR (1536 dim)
├── error_message: TEXT
├── stack_trace: TEXT
├── file: TAG
├── line: NUMERIC
├── failure_type: TAG
├── fix_description: TEXT
├── fix_diff: TEXT
├── success: TAG
└── created_at: NUMERIC

Query Pattern

// Find similar failures
const results = await redis.call(
  'FT.SEARCH', 'failure_idx',
  `*=>[KNN 5 @embedding $query_vec AS score]`,
  'PARAMS', 2, 'query_vec', embedding,
  'SORTBY', 'score',
  'RETURN', 4, 'error_message', 'fix_description', 'fix_diff', 'score'
);

Alternatives Considered

Pinecone: Rejected - another external service, not sponsor
Chroma: Rejected - less production-ready
PostgreSQL + pgvector: Rejected - more complex setup

ADR-004: Vercel for Deployment Automation

Status: Accepted Date: 2024 Deciders: Team

Context

The Verifier Agent needs to deploy code changes and get a running application to test against. This should be fast and automated.

Decision

Use Vercel for hosting the demo application and programmatic deployment after patches are applied.

Consequences

Positive:

Fast deployments (~30-60 seconds)
Git integration (push triggers deploy)
Preview URLs for each deployment
API for programmatic control
Hackathon sponsor integration

Negative:

Vercel-specific workflow
Build failures need handling
Deployment limits on free tier

Mitigation:

Monitor deployment status via API
Handle build failures gracefully
Use preview deployments to avoid prod issues

Deployment Flow

// After patch is applied and committed
const deployment = await vercel.deployments.create({
  name: 'qagent-demo',
  gitSource: {
    type: 'github',
    ref: 'main',
    repoId: process.env.REPO_ID
  }
});

// Poll for completion
while (deployment.readyState !== 'READY') {
  await sleep(5000);
  deployment = await vercel.deployments.get(deployment.id);
}

// Test against deployment URL
const testUrl = deployment.url;

Alternatives Considered

Netlify: Rejected - similar but not sponsor
Railway: Rejected - less frontend-focused
Self-hosted: Rejected - too much setup for hackathon

ADR-005: W&B Weave for Observability

Status: Accepted Date: 2024 Deciders: Team

Context

We need to trace every step of the agent pipeline for debugging, evaluation, and demonstrating improvement over time. This includes timing, inputs/outputs, and success metrics.

Decision

Use Weights & Biases Weave for tracing and evaluation of the multi-agent system.

Consequences

Positive:

Rich trace trees showing agent interactions
Automatic logging of function calls
Evaluation framework for measuring improvement
Links traces to final outcomes
Hackathon sponsor integration

Negative:

Requires wrapping functions with decorators
Data sent to W&B servers
Learning curve for trace analysis

Mitigation:

Use weave decorators consistently
Set up project early in development
Practice navigating traces before demo

Integration Pattern

import weave from 'weave';

weave.init({ project: 'qagent' });

// Wrap agent methods
class TesterAgent {
  @weave.op()
  async runTest(spec: TestSpec): Promise<TestResult> {
    // Implementation
  }
}

// Log metrics
weave.log({
  test_pass_rate: 0.87,
  avg_fix_time_seconds: 192,
  iterations_this_run: 2
});

Alternatives Considered

LangSmith: Rejected - LangChain ecosystem, not sponsor
OpenTelemetry: Rejected - lower-level, more setup
Custom logging: Rejected - reinventing tracing

ADR-006: Marimo for Dashboard

Status: Accepted Date: 2024 Deciders: Team

Context

We need a live dashboard to visualize agent performance, show metrics improvement, and provide a compelling demo experience.

Decision

Use Marimo, a reactive Python notebook, to create an interactive analytics dashboard.

Consequences

Positive:

Reactive UI updates automatically
Python-native (can use Weave SDK directly)
Easy to create charts and visualizations
Can be deployed as web app
Hackathon sponsor integration

Negative:

Python-based (rest of project is TypeScript)
Need to bridge data between systems
Less customizable than custom React app

Mitigation:

Keep dashboard logic simple
Fetch data from Weave API
Focus on key metrics only

Dashboard Structure

import marimo as mo
import weave

app = mo.App()

@app.cell
def metrics():
    # Fetch from Weave
    runs = weave.query(
        project='qagent',
        entity='team'
    )

    return mo.vstack([
        mo.stat("Pass Rate", f"{pass_rate}%"),
        mo.stat("Avg Fix Time", f"{avg_time}s"),
        mo.stat("Bugs Fixed", bugs_fixed)
    ])

@app.cell
def chart():
    return mo.ui.altair_chart(
        alt.Chart(data).mark_line().encode(
            x='run', y='pass_rate'
        )
    )

Alternatives Considered

Streamlit: Rejected - not sponsor, less reactive
Grafana: Rejected - overkill for demo
Custom React: Rejected - time consuming

ADR-007: OpenAI for Patch Generation

Status: Accepted Date: 2024 Deciders: Team

Context

The Fixer Agent needs an LLM to generate code patches based on the diagnosis and similar past fixes. The LLM should be reliable, fast, and capable of understanding code.

Decision

Use OpenAI GPT-4 as the primary LLM for patch generation, with Anthropic Claude as backup.

Consequences

Positive:

GPT-4 excellent at code generation
Well-documented API
Fast response times
Function calling for structured output

Negative:

API costs
Rate limits
External dependency

Mitigation:

Cache common patterns
Use GPT-3.5 for simple patches
Implement retry with backoff
Have Anthropic as fallback

Prompt Engineering

const patchPrompt = `
You are a senior developer fixing a bug.

## Diagnosis
${diagnosis}

## Source Code
\`\`\`${language}
${code}
\`\`\`

## Similar Past Fixes
${similarFixes.map(f => `- ${f.description}: ${f.diff}`).join('\n')}

Generate a minimal fix. Return JSON:
{
  "file": "path/to/file.ts",
  "changes": [
    { "line": 42, "old": "old code", "new": "new code" }
  ]
}
`;

Alternatives Considered

Anthropic only: Rejected - less code-focused historically
Open source (Llama): Rejected - hosting complexity
Fine-tuned model: Rejected - no time for hackathon

ADR-008: Next.js for Demo Application

Status: Accepted Date: 2024 Deciders: Team

Context

We need a demo web application that the QAgent agent will test and fix. It should be simple, have intentional bugs, and be easy to deploy on Vercel.

Decision

Create a simple Next.js application with 2-3 pages and intentional bugs that can be fixed by the agent.

Consequences

Positive:

First-class Vercel support
TypeScript support
Easy to introduce realistic bugs
Fast refresh for development

Negative:

Limited to React ecosystem
May be too simple to show full agent capability

Mitigation:

Make bugs representative of real issues
Include both frontend and API bugs
Keep scope minimal for reliability

Intentional Bugs

Bug 1: Missing onClick Handler
- File: components/CheckoutButton.tsx
- Issue: Button renders but click does nothing
- Fix: Add onClick callback
Bug 2: Wrong API Route
- File: pages/api/payment.ts → should be checkout.ts
- Issue: Form POST gets 404
- Fix: Rename file or change route
Bug 3: Typo Causing Error
- File: pages/cart.tsx
- Issue: cartItems spelled cartItmes
- Fix: Correct typo

Alternatives Considered

Real production app: Rejected - too unpredictable
Static HTML: Rejected - not representative
Vue/Svelte: Rejected - less Vercel integration

Technical Diagrams

System Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                              USER/DEVELOPER                              │
└───────────────────────────────────┬─────────────────────────────────────┘
                                    │
                                    │ Triggers QAgent
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                            QAGENT SYSTEM                             │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │              ORCHESTRATOR (Custom, ADK-compatible)                 │  │
│  │                                                                    │  │
│  │    ┌──────────┐      ┌──────────┐      ┌──────────┐               │  │
│  │    │  TESTER  │─────▶│  TRIAGE  │─────▶│  FIXER   │               │  │
│  │    └────┬─────┘      └────┬─────┘      └────┬─────┘               │  │
│  │         │                 │                 │                      │  │
│  │         │                 │                 │     ┌──────────┐    │  │
│  │         │                 │                 └────▶│ VERIFIER │    │  │
│  │         │                 │                       └────┬─────┘    │  │
│  │         │                 │                            │          │  │
│  └─────────┼─────────────────┼────────────────────────────┼──────────┘  │
│            │                 │                            │             │
│  ┌─────────▼─────┐   ┌───────▼───────┐   ┌────────────────▼──────────┐ │
│  │  Browserbase  │   │     Redis     │   │         Vercel            │ │
│  │  + Stagehand  │   │  (Vectors)    │   │      (Deploy)             │ │
│  └───────────────┘   └───────────────┘   └───────────────────────────┘ │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                     OBSERVABILITY                                │   │
│  │   ┌─────────────────────┐     ┌─────────────────────┐           │   │
│  │   │     W&B Weave       │────▶│   Marimo Dashboard  │           │   │
│  │   │    (Tracing)        │     │   (Visualization)   │           │   │
│  │   └─────────────────────┘     └─────────────────────┘           │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ Tests & Fixes
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         NEXT.JS DEMO APP (Vercel)                        │
│                                                                         │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐                │
│   │   Signup    │    │    Cart     │    │  Checkout   │                │
│   │    Page     │    │    Page     │    │    Page     │                │
│   └─────────────┘    └─────────────┘    └─────────────┘                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

QAgent Loop Sequence

    User          Orchestrator      Tester        Triage        Fixer       Verifier
      │                │              │             │             │             │
      │   Start Run    │              │             │             │             │
      │───────────────▶│              │             │             │             │
      │                │   Run Tests  │             │             │             │
      │                │─────────────▶│             │             │             │
      │                │              │  Execute    │             │             │
      │                │              │  in Browser │             │             │
      │                │              │◀────────────│             │             │
      │                │   Failure    │             │             │             │
      │                │◀─────────────│             │             │             │
      │                │              │             │             │             │
      │                │   Diagnose   │             │             │             │
      │                │─────────────────────────▶  │             │             │
      │                │              │             │  Query      │             │
      │                │              │             │  Redis      │             │
      │                │              │             │◀────────────│             │
      │                │   Diagnosis  │             │             │             │
      │                │◀────────────────────────── │             │             │
      │                │              │             │             │             │
      │                │   Generate Fix             │             │             │
      │                │──────────────────────────────────────▶  │             │
      │                │              │             │             │   Call LLM  │
      │                │              │             │             │◀────────────│
      │                │   Patch      │             │             │             │
      │                │◀─────────────────────────────────────── │             │
      │                │              │             │             │             │
      │                │   Verify     │             │             │             │
      │                │───────────────────────────────────────────────────▶   │
      │                │              │             │             │   Deploy    │
      │                │              │             │             │   to Vercel │
      │                │              │             │             │◀────────────│
      │                │              │             │             │   Re-test   │
      │                │              │             │             │◀────────────│
      │                │   Result     │             │             │             │
      │                │◀──────────────────────────────────────────────────────│
      │                │              │             │             │             │
      │   Complete     │              │             │             │             │
      │◀───────────────│              │             │             │             │
      │                │              │             │             │             │

Infrastructure

Environments

Environment	URL	Purpose
Development	localhost:3000	Local development
Preview	*.vercel.app	PR preview deployments
Production	qagent-demo.vercel.app	Live demo application
Dashboard	localhost:2718	Marimo dashboard

Services

Service	Provider	Purpose	Tier
Browser Automation	Browserbase	Cloud browsers	Free/Pro
AI Browser Control	Stagehand	Natural language to actions	Free
Vector Database	Redis Cloud	Knowledge base	Free
Hosting	Vercel	App deployment	Hobby
Tracing	W&B Weave	Observability	Free
Dashboard	Marimo	Visualization	Free
LLM	OpenAI	Patch generation	Pay-as-go
Orchestration	Custom (ADK/A2A-compatible; ADK planned)	Agent coordination	n/a

Resource Limits (Free Tier)

Service	Limit	Impact
Browserbase	100 sessions/month	Plan test runs carefully
Redis Cloud	30MB	Keep embeddings minimal
Vercel	100 deployments/day	Should be plenty
OpenAI	$5 free credit	~50-100 patches
W&B	100GB storage	More than enough

Last updated: 2024

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture Decision Records

Project: QAgent - Self-Healing QA Agent

ADR Index

ADR-001: Multi-Agent Architecture (ADK/A2A-compatible)

Context

Decision

Consequences

Alternatives Considered

ADR-002: Browserbase + Stagehand for Browser Automation

Context

Decision

Consequences

Implementation Notes

Alternatives Considered

ADR-003: Redis for Vector Knowledge Base

Context

Decision

Consequences

Schema Design

Query Pattern

Alternatives Considered

ADR-004: Vercel for Deployment Automation

Context

Decision

Consequences

Deployment Flow

Alternatives Considered

ADR-005: W&B Weave for Observability

Context

Decision

Consequences

Integration Pattern

Alternatives Considered

ADR-006: Marimo for Dashboard

Context

Decision

Consequences

Dashboard Structure

Alternatives Considered

ADR-007: OpenAI for Patch Generation

Context

Decision

Consequences

Prompt Engineering

Alternatives Considered

ADR-008: Next.js for Demo Application

Context

Decision

Consequences

Intentional Bugs

Alternatives Considered

Technical Diagrams

System Overview

QAgent Loop Sequence

Infrastructure

Environments

Services

Resource Limits (Free Tier)