34 changes: 34 additions & 0 deletions .vscode/launch.json
@@ -25,6 +25,40 @@
 "package": "${workspaceFolder}/server/package.json",
 "localRoot": "${workspaceFolder}/server"
 }
+},
+{
+"name": "Python: Debug Regression Tests",
+"type": "debugpy",
+"request": "launch",
+"module": "pytest",
+"args": [
+"regression_tests",
+"-v"
+],
+"console": "integratedTerminal",
+"justMyCode": false,
+"cwd": "${workspaceFolder}/delphi",
+"python": "${workspaceFolder}/.venv/bin/python",
+"env": {
+"PYTHONPATH": "${workspaceFolder}/delphi"
+}
+},
+{
+"name": "Python: Debug Current Test File",
+"type": "debugpy",
+"request": "launch",
+"module": "pytest",
+"args": [
+"${relativeFile}",
+"-v"
+],
+"console": "integratedTerminal",
+"justMyCode": false,
+"cwd": "${workspaceFolder}/delphi",
+"python": "${workspaceFolder}/.venv/bin/python",
+"env": {
+"PYTHONPATH": "${workspaceFolder}/delphi"
+}
 }
 ]
 }
46 changes: 45 additions & 1 deletion .vscode/tasks.json
@@ -36,7 +36,51 @@
 "node": {
 "package": "${workspaceFolder}/server/package.json",
 "enableDebugging": true
 }
-}
+},
+{
+"label": "Run Regression Tests",
+"type": "shell",
+"command": "${workspaceFolder}/.venv/bin/python",
+"args": [
+"-m",
+"pytest",
+"regression_tests",
+"-v"
+],
+"options": {
+"cwd": "${workspaceFolder}/delphi"
+},
+"group": {
+"kind": "build"
+},
+"presentation": {
+"reveal": "always",
+"panel": "new"
+},
+"problemMatcher": []
+},
+{
+"label": "Run Regression Tests (Parallel)",
+"type": "shell",
+"command": "${workspaceFolder}/.venv/bin/python",
+"args": [
+"-m",
+"pytest",
+"regression_tests",
+"-v",
+"-n",
+"auto"
+],
+"options": {
+"cwd": "${workspaceFolder}/delphi"
+},
+"group": "build",
+"presentation": {
+"reveal": "always",
+"panel": "new"
+},
+"problemMatcher": []
+}
]
}
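The `-n auto` arguments in the parallel task come from the pytest-xdist plugin rather than from pytest itself, so that task only works when the plugin is installed in `.venv`. A minimal sketch of assembling the same argument list with a fallback to a serial run, assuming a hypothetical helper `build_pytest_args` that is not part of this PR:

```python
import importlib.util


def build_pytest_args(target="regression_tests", parallel=True, xdist_available=None):
    """Return the `python ...` argument list used by the tasks above.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    if xdist_available is None:
        # pytest-xdist installs an importable top-level module named "xdist".
        xdist_available = importlib.util.find_spec("xdist") is not None

    args = ["-m", "pytest", target, "-v"]
    if parallel and xdist_available:
        # "-n auto" asks xdist to start one worker per available CPU.
        args += ["-n", "auto"]
    return args


if __name__ == "__main__":
    print(build_pytest_args())
```

Passing `xdist_available` explicitly keeps the helper deterministic; leaving it as `None` probes the current environment instead.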
3 changes: 3 additions & 0 deletions delphi/.gitignore
@@ -211,6 +211,9 @@ polis_data/
 bandit-report.json
 .coverage.*

+# Test outputs (hidden directory for all test-generated files)
+.test_outputs/
+
 # Generated dependency files (can be recreated from pyproject.toml)
 requirements-prod.txt
 requirements-dev.txt
4 changes: 2 additions & 2 deletions delphi/CLAUDE.md
@@ -57,8 +57,8 @@ Always use the commands above to determine the most substantial conversation whe

 ### Environment Files

-- Main project uses a `.env` file in the parent directory (`/Users/colinmegill/polis/.env`)
-- Example environment file is available at `/Users/colinmegill/polis/delphi/example.env`
+- Main project uses a `.env` file in the parent directory (`../polis/.env`)
+- Example environment file is available at `example.env`

 ### Key Environment Variables

27 changes: 13 additions & 14 deletions delphi/docs/algorithm_analysis.md
@@ -2,25 +2,24 @@

 This document analyzes the core mathematical algorithms in the Polis math codebase, focusing on implementation details that would be critical for Python conversion.

-## 1. Named Matrix
+## 1. DataFrames for Vote Matrices

 ### Overview
-The NamedMatrix is a fundamental data structure used throughout the codebase, providing a matrix with labeled rows and columns.
+The vote matrix is a fundamental data structure used throughout the codebase, providing a matrix with labeled rows (participants) and columns (comments).

 ### Implementation Details
-- Uses `clojure.core.matrix` library for matrix operations
-- Maintains separate indices for row and column names
+- Uses `pandas.DataFrame` for efficient labeled data operations
+- Maintains row indices for participant IDs and column indices for comment IDs
 - Provides efficient lookups, subsets, and updates
-- Handles sparse data efficiently
-
-### Python Conversion Considerations
-- **Alternatives**: `pandas.DataFrame` is the most natural equivalent, providing labeled rows and columns
-- Numpy arrays with separate row and column name mappings could be used for performance
-- Ensure efficient implementation of operations like:
-- Getting rows/columns by name
-- Updating values
-- Creating subsets
-- Handling sparse matrices
+- Handles sparse data efficiently through pandas' optimized operations
+
+### Key Operations
+- **DataFrame operations**: Direct pandas operations replace the legacy NamedMatrix class
+- Efficient implementation of operations like:
+- Getting rows/columns by name using `.loc[]` accessor
+- Updating values using `.at[]` for single values or `.loc[]` for slices
+- Creating subsets using boolean indexing or `.loc[]`
+- Handling sparse matrices with NaN values

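The operations listed under "Key Operations" can be sketched with plain pandas calls. This is an illustrative example on a made-up 3x3 matrix, not code from the PR; the participant and comment labels are hypothetical, and it assumes NaN marks a comment the participant has not voted on:

```python
import numpy as np
import pandas as pd

# Rows are participants, columns are comments.
# Assumption: NaN marks a comment the participant has not voted on.
votes = pd.DataFrame(
    [[1.0, -1.0, np.nan],
     [1.0, np.nan, 1.0],
     [-1.0, -1.0, 0.0]],
    index=["p1", "p2", "p3"],
    columns=["c1", "c2", "c3"],
)

p1_votes = votes.loc["p1"]                      # row by name (Series indexed by comment)
c2_votes = votes.loc[:, "c2"]                   # column by name

votes.at["p1", "c3"] = 1.0                      # update a single value
votes.loc["p3", ["c1", "c2"]] = [1.0, 1.0]      # update a slice of one row

disagreed_c2 = votes[votes["c2"] < 0]           # boolean indexing; NaN compares as False
subset = votes.loc[["p1", "p3"], ["c1", "c3"]]  # label-based subset

votes_cast = votes.notna().sum(axis=1)          # votes per participant, skipping NaN
```

Because comparisons against NaN evaluate to False, participants with a missing vote drop out of boolean filters automatically, which is usually the desired behavior for a sparse vote matrix.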
## 2. PCA (Principal Component Analysis)

4 changes: 2 additions & 2 deletions delphi/docs/conversion_plan.md
@@ -8,7 +8,7 @@ This document outlines the plan for converting the Pol.is math codebase from Clo

 | Component | Status | Notes |
 |-----------|--------|-------|
-| NamedMatrix | ✅ Completed | Implemented with pandas DataFrame |
+| Vote Matrix (DataFrame) | ✅ Completed | Using pandas DataFrame directly (legacy NamedMatrix class deprecated) |
 | Utility Functions | ✅ Completed | Implemented in utils/general.py |

 ### Mathematical Algorithms
@@ -37,7 +37,7 @@ This document outlines the plan for converting the Pol.is math codebase from Clo

 ### Core Data Structures

-The `NamedMatrix` implementation uses pandas DataFrames as the underlying data structure, providing efficient named indexing and compatibility with the NumPy ecosystem. Utility functions provide common operations needed throughout the system.
+Vote matrices now use pandas DataFrames directly, providing efficient named indexing and compatibility with the NumPy ecosystem. The legacy `NamedMatrix` class has been deprecated in favor of direct DataFrame usage. Utility functions provide common operations needed throughout the system.

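As a rough illustration of the NumPy compatibility mentioned above, a labeled vote matrix can be handed to NumPy functions and combined with their results without losing its labels. The data and the mean-centering step are assumptions chosen for the example; they are not taken from the project's PCA code:

```python
import numpy as np
import pandas as pd

# Hypothetical vote matrix; NaN is assumed to mean "no vote".
votes = pd.DataFrame(
    [[1.0, -1.0, np.nan],
     [1.0, np.nan, 1.0],
     [-1.0, -1.0, 0.0]],
    index=["p1", "p2", "p3"],
    columns=["c1", "c2", "c3"],
)

raw = votes.to_numpy()                  # plain float ndarray, labels dropped
col_means = np.nanmean(raw, axis=0)     # per-comment mean, ignoring missing votes

# Subtracting a 1-D array broadcasts across columns and keeps the labels.
centered = votes - col_means
```

The result `centered` is still a DataFrame with the same participant and comment labels, so downstream code can keep using `.loc[]` lookups.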
### Mathematical Algorithms

5 changes: 0 additions & 5 deletions delphi/docs/project_structure.md
@@ -33,12 +33,7 @@ polismath/

 ## Key Modules

 ### Core Data Structures

-**`named_matrix.py`**
-- Python implementation of the NamedMatrix class
-- Will likely use pandas DataFrame or numpy with custom indexing
-- Must support all operations from the original Clojure version
-
 ### Mathematical Algorithms

4 changes: 2 additions & 2 deletions delphi/docs/summary.md
@@ -5,7 +5,7 @@
 The conversion of the Pol.is math codebase from Clojure to Python is now complete. All components have been converted to Python with equivalent functionality:

 1. **Core Data Structures**
-- `NamedMatrix`: Implemented using pandas DataFrame as the underlying storage
+- **Vote Matrices**: Using pandas DataFrame directly (legacy `NamedMatrix` class deprecated)
 - Utility functions for matrix operations and data manipulation

 2. **Mathematical Algorithms**
@@ -96,7 +96,7 @@ tests/

 Key technical decisions that have guided the conversion:

-1. **Data Structures**: Using pandas DataFrame for the `NamedMatrix` implementation provides efficient named indexing and compatibility with the NumPy ecosystem.
+1. **Data Structures**: Using pandas DataFrame directly (deprecating the custom `NamedMatrix` wrapper) provides efficient named indexing and compatibility with the NumPy ecosystem.

 2. **Web Framework**: FastAPI for its modern features, automatic documentation, and performance.

26 changes: 13 additions & 13 deletions delphi/docs/usage_examples.md
@@ -74,13 +74,13 @@ imported_id = manager.import_conversation("/path/to/export.json")

 ## Advanced Usage

-### Working with the Named Matrix
+### Working with DataFrames

 ```python
-from polismath.pca_kmeans_rep.named_matrix import NamedMatrix
+import pandas as pd
 import numpy as np

-# Create a named matrix
+# Create a DataFrame with vote data
 data = np.array([
 [1, -1, 0],
 [1, 0, 1],
@@ -89,29 +89,29 @@ data = np.array([
 row_names = ["participant1", "participant2", "participant3"]
 col_names = ["comment1", "comment2", "comment3"]

-nmat = NamedMatrix(data, row_names, col_names)
+df = pd.DataFrame(data, index=row_names, columns=col_names)

 # Update a value
-nmat = nmat.update("participant1", "comment3", 1)
+df.at["participant1", "comment3"] = 1

 # Create a subset
-group1_matrix = nmat.rowname_subset(["participant1", "participant2"])
+group1_matrix = df.loc[["participant1", "participant2"]]

 # Get a row by name
-votes = nmat.get_row_by_name("participant1")
+votes = df.loc["participant1"].values
 ```

 ### PCA and Clustering

 ```python
-from polismath.pca_kmeans_rep.pca import pca_project_named_matrix
-from polismath.pca_kmeans_rep.clusters import cluster_named_matrix
+from polismath.pca_kmeans_rep.pca import pca_project_dataframe
+from polismath.pca_kmeans_rep.clusters import cluster_dataframe

 # Perform PCA
-pca_results, projections = pca_project_named_matrix(nmat)
+pca_results, projections = pca_project_dataframe(df)

-# Cluster the projections
-clusters = cluster_named_matrix(nmat, k=3)
+# Cluster the DataFrame
+clusters = cluster_dataframe(df, k=3)

 # Examine clusters
 for cluster in clusters:
@@ -124,7 +124,7 @@ for cluster in clusters:
 from polismath.pca_kmeans_rep.repness import conv_repness

 # Calculate representativeness
-repness = conv_repness(nmat, clusters)
+repness = conv_repness(df, clusters)

 # Get representative comments for each group
 for group_id, comments in repness["group_repness"].items():
26 changes: 3 additions & 23 deletions delphi/notebooks/biodiversity_analysis.ipynb
@@ -22,30 +22,10 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": [
-"import os\n",
-"import sys\n",
-"import pandas as pd\n",
-"import numpy as np\n",
-"import matplotlib.pyplot as plt\n",
-"import seaborn as sns\n",
-"import json\n",
-"from IPython.display import display, HTML\n",
-"\n",
-"# Add the parent directory to the path to import the polismath modules\n",
-"sys.path.append(os.path.abspath(os.path.join(os.path.dirname('__file__'), '..')))\n",
-"\n",
-"# Import polismath modules\n",
-"from polismath.conversation.conversation import Conversation\n",
-"from polismath.pca_kmeans_rep.named_matrix import NamedMatrix\n",
-"from polismath.pca_kmeans_rep.pca import pca_project_named_matrix\n",
-"from polismath.pca_kmeans_rep.clusters import cluster_named_matrix\n",
-"from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats\n",
-"from polismath.pca_kmeans_rep.corr import compute_correlation"
-]
+"source": "import os\nimport sys\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nimport json\nfrom IPython.display import display, HTML\n\n# Add the parent directory to the path to import the polismath modules\nsys.path.append(os.path.abspath(os.path.join(os.path.dirname('__file__'), '..')))\n\n# Import polismath modules\nfrom polismath.conversation.conversation import Conversation\nfrom polismath.pca_kmeans_rep.pca import pca_project_dataframe\nfrom polismath.pca_kmeans_rep.clusters import cluster_dataframe\nfrom polismath.pca_kmeans_rep.repness import conv_repness, participant_stats\nfrom polismath.pca_kmeans_rep.corr import compute_correlation"
 },
 {
 "cell_type": "markdown",
@@ -1019,4 +999,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
10 changes: 4 additions & 6 deletions delphi/notebooks/run_analysis.py
@@ -34,9 +34,8 @@ def check_environment():
     try:
         # Try importing key polismath modules
         from polismath.conversation.conversation import Conversation
-        from polismath.pca_kmeans_rep.named_matrix import NamedMatrix
-        from polismath.pca_kmeans_rep.pca import pca_project_named_matrix
-
+        from polismath.pca_kmeans_rep.pca import pca_project_dataframe
+
         print("Polismath modules imported successfully")
         return True
     except ImportError as e:
@@ -46,9 +45,8 @@ def check_environment():

 # Import polismath modules
 from polismath.conversation.conversation import Conversation
-from polismath.pca_kmeans_rep.named_matrix import NamedMatrix
-from polismath.pca_kmeans_rep.pca import pca_project_named_matrix
-from polismath.pca_kmeans_rep.clusters import cluster_named_matrix
+from polismath.pca_kmeans_rep.pca import pca_project_dataframe
+from polismath.pca_kmeans_rep.clusters import cluster_dataframe
 from polismath.pca_kmeans_rep.repness import conv_repness, participant_stats
 from polismath.pca_kmeans_rep.corr import compute_correlation
