Commit 7863ba4
Benchmarking (#240)
* benchmarking wip
* benchmarking wip2
* Refactor benchmarking setup and tests
  - Updated .gitignore to exclude benchmarking results and specific file types.
  - Modified pytest configuration to ignore benchmark tests during CI runs.
  - Enhanced database connection fixtures for PostgreSQL and MySQL, removing SQLite.
  - Updated benchmark tests to focus on relevant database types and improved retrieval logic.
  - Added scripts for setting up and running benchmarks on AWS EC2, including environment variable configurations.
  - Introduced a README for the benchmarks directory with setup instructions and usage examples.
  - Improved the semantic accuracy dataset for finer-grained recall evaluation.
* Remove benchmark result files and JSON outputs
* Remove empty benchmark and fixture initialization files
* Update pre-commit configuration and ty settings
  - Re-enable the ty type checker in the pre-commit configuration with proper settings.
  - Update pyproject.toml to specify source exclusions for ty and set the Python version to 3.12.
  - Add a type ignore comment in the benchmark fixture to resolve an attribute warning.
1 parent: 2eb0bdf

16 files changed: +1393 −12 lines changed

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,7 +84,7 @@ jobs:
         run: uv sync --dev

       - name: Run pytest with coverage
-        run: uv run pytest
+        run: uv run pytest --ignore=tests/benchmarks

       - name: Upload coverage to Codecov
         if: matrix.python-version == '3.12'
```

.gitignore

Lines changed: 8 additions & 2 deletions
```diff
@@ -52,5 +52,11 @@ AGENTS.md

 tests/examples/*

-# Integration test files (contain credentials/connection strings)
-tests/llm/clients/oss/openai/async_integration.py
+# Benchmarking results
+tests/benchmarks/results/
+results/
+*.json
+*.csv
+!pyproject.toml
+!package.json
+!composer.json
```

.pre-commit-config.yaml

Lines changed: 8 additions & 8 deletions
```diff
@@ -16,17 +16,17 @@ repos:

   - repo: local
     hooks:
-      #- id: ty
-      #name: ty type checker
-      #entry: uvx ty check --exclude 'tests/llm/clients/**/*.py'
-      #language: system
-      #types: [python]
-      #pass_filenames: false
-      #always_run: true
+      - id: ty
+        name: ty type checker
+        entry: uvx ty check
+        language: system
+        types: [python]
+        pass_filenames: false
+        always_run: true

       - id: pytest
         name: pytest
-        entry: uv run pytest
+        entry: uv run pytest --ignore=tests/benchmarks
         language: system
         pass_filenames: false
         always_run: true
```

pyproject.toml

Lines changed: 14 additions & 1 deletion
```diff
@@ -87,11 +87,14 @@ python_classes = ["Test*"]
 python_functions = ["test_*"]
 markers = [
     "asyncio: marks tests as async (deselect with '-m \"not asyncio\"')",
+    "benchmark: marks tests as performance benchmarks",
 ]
 asyncio_mode = "auto"
 addopts = [
     "-v",
     "--strict-markers",
+    "-m",
+    "not benchmark",
     "--cov=memori",
     "--cov-report=term-missing",
     "--cov-report=html",
@@ -116,7 +119,15 @@ exclude_lines = [
     "if TYPE_CHECKING:",
 ]

-[tool.ty]
+[tool.ty.src]
+exclude = [
+    "tests/llm/clients/**/*.py",
+    "**/__pycache__/**",
+]
+
+[tool.ty.environment]
+python-version = "3.12"
+

 [dependency-groups]
 dev = [
@@ -139,8 +150,10 @@ dev = [
     "pymysql>=1.1.2",
     "pytest>=8.4.2",
     "pytest-asyncio>=0.24.0",
+    "pytest-benchmark>=4.0.0",
     "pytest-cov>=6.0.0",
     "pytest-mock>=3.15.1",
+    "psutil>=5.9.0",
     "requests>=2.32.5",
     "ruff>=0.8.0",
     "sqlalchemy>=2.0.44",
```
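With `-m "not benchmark"` injected via `addopts`, a plain `pytest` run deselects every test carrying the new `benchmark` marker; running the benchmarks then requires overriding the marker expression on the command line. A hypothetical invocation, assuming the markers and paths introduced in this commit:

```
# Default run: benchmarks are deselected by addopts.
uv run pytest

# Run only the benchmarks by overriding the marker expression:
uv run pytest -m benchmark tests/benchmarks
```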

tests/benchmarks/README.md

Lines changed: 58 additions & 0 deletions
```diff
@@ -0,0 +1,58 @@
+# AWS EC2 Benchmark Guide
+
+This guide explains how to run Memori benchmarks on an EC2 instance in the same VPC as your AWS database (RDS Postgres or MySQL).
+
+## Setup on EC2
+
+1. **SSH into EC2**:
+   ```bash
+   ssh ec2-user@your-ec2-ip
+   ```
+
+2. **Run setup**:
+   Copy `tests/benchmarks/setup_ec2_benchmarks.sh` to your EC2 instance (or clone the repo), then run it:
+   ```bash
+   chmod +x tests/benchmarks/setup_ec2_benchmarks.sh
+   ./tests/benchmarks/setup_ec2_benchmarks.sh
+   ```
+
+## Running Benchmarks
+
+The `run_benchmarks_ec2.sh` script is flexible and handles automatic CSV generation.
+
+### Environment Variables
+
+- `DB_TYPE`: `postgres` (default) or `mysql`
+- `TEST_TYPE`: `all` (default), `end_to_end`, `db_retrieval`, `semantic_search`, `embedding`
+- `BENCHMARK_POSTGRES_URL`: Connection string for Postgres
+- `BENCHMARK_MYSQL_URL`: Connection string for MySQL
+
+### Examples
+
+#### Run all Postgres benchmarks
+```bash
+export BENCHMARK_POSTGRES_URL="CHANGEME"
+DB_TYPE=postgres TEST_TYPE=all ./tests/benchmarks/run_benchmarks_ec2.sh
+```
+
+#### Run only end-to-end MySQL benchmarks
+```bash
+export BENCHMARK_MYSQL_URL="CHANGEME"
+DB_TYPE=mysql TEST_TYPE=end_to_end ./tests/benchmarks/run_benchmarks_ec2.sh
+```
+
+## Results
+
+All results are saved automatically to the `./results` directory with a timestamp to prevent overwriting:
+- JSON output: `results_{db}_{type}_{timestamp}.json`
+- **CSV report**: `report_{db}_{type}_{timestamp}.csv`
+
+To download the CSV reports to your local machine:
+```bash
+scp ec2-user@your-ec2-ip:~/Memori/results/report_*.csv ./local_results/
+```
+
+## Database Connection Requirements
+
+Ensure the EC2 security group allows outbound traffic to the database on port 5432 (Postgres) or 3306 (MySQL).
+The database must be in the same VPC or accessible via VPC peering/Transit Gateway.
```
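The run script itself is not shown in this diff; a minimal sketch of the result-naming convention the README implies (variable names are hypothetical; the defaults mirror `DB_TYPE`/`TEST_TYPE` above):

```shell
# Hypothetical sketch of how run_benchmarks_ec2.sh likely derives its
# timestamped output paths; not the actual script from the commit.
DB_TYPE="${DB_TYPE:-postgres}"
TEST_TYPE="${TEST_TYPE:-all}"
STAMP="$(date +%Y%m%d_%H%M%S)"
mkdir -p results
# Filenames follow the patterns documented in the Results section.
JSON_OUT="results/results_${DB_TYPE}_${TEST_TYPE}_${STAMP}.json"
CSV_OUT="results/report_${DB_TYPE}_${TEST_TYPE}_${STAMP}.csv"
printf '%s\n%s\n' "$JSON_OUT" "$CSV_OUT"
```

Because the timestamp is part of the filename, repeated runs never overwrite earlier results.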

tests/benchmarks/conftest.py

Lines changed: 175 additions & 0 deletions
```python
"""Pytest fixtures for performance benchmarks."""

import os

import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from memori import Memori
from memori.llm._embeddings import embed_texts
from tests.benchmarks.fixtures.sample_data import (
    generate_facts_with_size,
    generate_sample_queries,
)


@pytest.fixture
def postgres_db_connection():
    """Create a PostgreSQL database connection factory for benchmarking (via AWS/Docker)."""
    postgres_uri = os.environ.get(
        "BENCHMARK_POSTGRES_URL",
        # Matches docker-compose.yml default DB name
        "postgresql://memori:memori@localhost:5432/memori_test",
    )

    from sqlalchemy import text

    # Support an SSL root certificate via environment variable (for AWS RDS).
    connect_args = {}
    sslrootcert = os.environ.get("BENCHMARK_POSTGRES_SSLROOTCERT")
    if sslrootcert:
        connect_args["sslrootcert"] = sslrootcert
        # Ensure sslmode is set when an SSL cert is used.
        if "sslmode" not in postgres_uri:
            # Add sslmode=require if not already in the URI.
            separator = "&" if "?" in postgres_uri else "?"
            postgres_uri = f"{postgres_uri}{separator}sslmode=require"

    engine = create_engine(
        postgres_uri,
        pool_pre_ping=True,
        pool_recycle=300,
        connect_args=connect_args if connect_args else None,
    )

    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
    except Exception as e:
        pytest.skip(
            f"PostgreSQL not available at {postgres_uri}: {e}. "
            "Set BENCHMARK_POSTGRES_URL to a database that exists."
        )

    Session = sessionmaker(autocommit=False, autoflush=False, bind=engine)

    yield Session
    engine.dispose()


@pytest.fixture
def mysql_db_connection():
    """Create a MySQL database connection factory for benchmarking (via AWS/Docker)."""
    mysql_uri = os.environ.get(
        "BENCHMARK_MYSQL_URL",
        "mysql+pymysql://memori:memori@localhost:3306/memori_test",
    )

    from sqlalchemy import text

    engine = create_engine(
        mysql_uri,
        pool_pre_ping=True,
        pool_recycle=300,
    )

    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
    except Exception as e:
        pytest.skip(f"MySQL not available at {mysql_uri}: {e}")

    Session = sessionmaker(autocommit=False, autoflush=False, bind=engine)

    yield Session
    engine.dispose()


@pytest.fixture(
    params=["postgres", "mysql"],
    ids=["postgres", "mysql"],
)
def db_connection(request):
    """Parameterized fixture for realistic database types (no SQLite)."""
    db_type = request.param

    if db_type == "postgres":
        return request.getfixturevalue("postgres_db_connection")
    elif db_type == "mysql":
        return request.getfixturevalue("mysql_db_connection")

    pytest.skip(f"Unsupported benchmark database type: {db_type}")


@pytest.fixture
def memori_instance(db_connection, request):
    """Create a Memori instance with the specified database for benchmarking."""
    mem = Memori(conn=db_connection)
    mem.config.storage.build()

    db_type_param = None
    for marker in request.node.iter_markers("parametrize"):
        if "db_connection" in marker.args[0]:
            db_type_param = marker.args[1][0] if marker.args[1] else None
            break

    # Try to infer the dialect from the connection itself.
    if not db_type_param:
        try:
            # A SQLAlchemy sessionmaker is callable, so detect it first by the presence of a bind.
            bind = getattr(db_connection, "kw", {}).get("bind", None)
            if bind is not None:
                db_type_param = bind.dialect.name
            else:
                db_type_param = "unknown"
        except Exception:
            db_type_param = "unknown"

    mem._benchmark_db_type = db_type_param  # ty: ignore[unresolved-attribute]
    return mem


@pytest.fixture
def sample_queries():
    """Provide sample queries of varying lengths."""
    return generate_sample_queries()


@pytest.fixture
def fact_content_size():
    """Fixture for fact content size.

    Note: embeddings are always 768 dimensions (3072 bytes binary) regardless of text size.
    """
    return "small"


@pytest.fixture(
    params=[5, 50, 100, 300, 600, 1000],
    ids=lambda x: f"n{x}",
)
def entity_with_n_facts(memori_instance, fact_content_size, request):
    """Create an entity with N facts for benchmarking database retrieval."""
    fact_count = request.param
    entity_id = f"benchmark-entity-{fact_count}-{fact_content_size}"
    memori_instance.attribution(entity_id=entity_id, process_id="benchmark-process")

    facts = generate_facts_with_size(fact_count, fact_content_size)
    fact_embeddings = embed_texts(facts)

    entity_db_id = memori_instance.config.storage.driver.entity.create(entity_id)
    memori_instance.config.storage.driver.entity_fact.create(
        entity_db_id, facts, fact_embeddings
    )

    db_type = getattr(memori_instance, "_benchmark_db_type", "unknown")

    return {
        "entity_id": entity_id,
        "entity_db_id": entity_db_id,
        "fact_count": fact_count,
        "content_size": fact_content_size,
        "db_type": db_type,
        "facts": facts,
    }
```
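One detail worth calling out in the Postgres fixture: when a root certificate is supplied but the URI does not already pin an `sslmode`, the fixture appends `sslmode=require`, choosing `&` or `?` depending on whether the URI already carries query parameters. Isolated as a plain function (the name `ensure_sslmode` is hypothetical, for illustration only), the logic is:

```python
def ensure_sslmode(uri: str, mode: str = "require") -> str:
    """Append sslmode to a Postgres URI if it is not already present.

    Mirrors the URI-rewriting logic inside postgres_db_connection;
    the function name is hypothetical.
    """
    if "sslmode" in uri:
        # An explicit sslmode in the URI always wins.
        return uri
    # Use '&' when the URI already has query parameters, else start one with '?'.
    separator = "&" if "?" in uri else "?"
    return f"{uri}{separator}sslmode={mode}"


print(ensure_sslmode("postgresql://memori@host/db"))
# → postgresql://memori@host/db?sslmode=require
```

This keeps an operator-supplied `sslmode=verify-full` intact while still defaulting RDS connections to encrypted transport.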
