2 changes: 2 additions & 0 deletions .gitignore
@@ -72,6 +72,8 @@ Pipfile.lock
*.pem
*.crt
*.key
*.pub
*-key

# Testing
.pytest_cache/
162 changes: 162 additions & 0 deletions DATA_LOADING_IMPLEMENTATION.md
@@ -0,0 +1,162 @@
# Data Loading Implementation - Step 3

## Summary
Modified `examples/nanda_agent.py` to read `DATA_PATH` from the environment and load the attached data (CSV and JSON are supported). The agent now loads the data on startup and makes it available via `AGENT_CONFIG["data"]`.

## Changes Made

### 1. Added Pandas Import (Lines 28-34)
```python
# Try to import pandas for CSV support
try:
    import pandas as pd
    PANDAS_AVAILABLE = True
except ImportError:
    PANDAS_AVAILABLE = False
    print("⚠️ Warning: pandas library not available. CSV data loading will be disabled. Install with: pip install pandas")
```

### 2. Added DATA_PATH Environment Variable (Line 100)
```python
# Data path from environment
DATA_PATH = os.getenv("DATA_PATH", None)
```

### 3. Added load_attached_data() Function (Lines 106-145)
- Supports CSV files (requires pandas)
- Supports JSON files (built-in json module)
- Graceful error handling
- Informative logging

**Function signature:**
```python
def load_attached_data(path):
    """
    Load data from a file path. Supports CSV and JSON formats.

    Args:
        path: File path to load data from

    Returns:
        Loaded data (DataFrame for CSV, dict/list for JSON) or None if failed
    """
```

**Features** (a minimal sketch of the full function body follows the list):
- CSV: Returns pandas DataFrame, logs shape
- JSON: Returns dict/list, logs keys or item count
- Error handling: FileNotFoundError, general exceptions
- Format validation: Only .csv and .json supported
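
Consistent with the features above and with the expected log lines in the Testing section, a minimal sketch of the function body (assuming the `PANDAS_AVAILABLE` flag and `pd` alias from section 1) might look like this; the actual implementation in `examples/nanda_agent.py` may differ in details:

```python
# Minimal sketch of load_attached_data(); assumes PANDAS_AVAILABLE and `pd`
# from section 1 are defined at module level. Illustrative, not verbatim.
import json
import os


def load_attached_data(path):
    """Load data from a file path. Supports CSV and JSON formats."""
    try:
        if not os.path.exists(path):
            raise FileNotFoundError(f"File not found at {path}")

        if path.endswith(".csv"):
            if not PANDAS_AVAILABLE:
                print("❌ Cannot load CSV: pandas is not installed")
                return None
            data = pd.read_csv(path)
            print(f"✅ Loaded CSV data with shape: {data.shape}")
            return data

        if path.endswith(".json"):
            with open(path, "r") as f:
                data = json.load(f)
            print("✅ Loaded JSON data")
            if isinstance(data, dict):
                print(f"   Keys: {list(data.keys())}")
            else:
                print(f"   Items: {len(data)}")
            return data

        print(f"❌ Unsupported format: {path} (only .csv and .json are supported)")
        return None

    except Exception as e:
        # Covers FileNotFoundError as well as parsing and I/O errors
        print(f"❌ Failed to load data: {e}")
        return None
```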

### 4. Updated main() Function (Lines 245-257)
Added data loading logic:
```python
# Load attached data if DATA_PATH is provided
attached_data = None
if DATA_PATH:
    print(f"📂 Loading data from: {DATA_PATH}")
    attached_data = load_attached_data(DATA_PATH)
    if attached_data is not None:
        if PANDAS_AVAILABLE and isinstance(attached_data, pd.DataFrame):
            print(f"📊 Loaded data with shape: {attached_data.shape}")
        AGENT_CONFIG["data"] = attached_data
    else:
        print("⚠️ Data loading failed, continuing without attached data")
else:
    print("ℹ️ No DATA_PATH provided; starting agent without attached data")
```

### 5. Updated setup.py
Added `pandas` to requirements (line 21):
```python
"pandas" # For CSV data loading
```

### 6. Created Sample Test Data
Created `tests/sample_hr.csv` with sample HR data for testing.

## Data Access

After loading, the data is available in the following places (a usage sketch follows the list):
- `AGENT_CONFIG["data"]` - Contains the loaded DataFrame (CSV) or dict/list (JSON)
- Can be accessed in agent logic functions via the config parameter
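
For illustration, an agent logic function could read the attached data from the config roughly like this (hypothetical; the actual function names and signatures in `nanda_agent.py` may differ):

```python
# Hypothetical sketch of reading the attached data inside agent logic.
# Assumes AGENT_CONFIG, PANDAS_AVAILABLE, and the `pd` alias defined in
# examples/nanda_agent.py; the real agent logic signature may differ.
def describe_attached_data(config=None):
    config = config or AGENT_CONFIG
    data = config.get("data")
    if data is None:
        return "No dataset is attached to this agent."
    if PANDAS_AVAILABLE and isinstance(data, pd.DataFrame):
        rows, cols = data.shape
        return f"The attached dataset has {rows} rows and {cols} columns."
    return f"The attached JSON data has {len(data)} top-level entries."
```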

## Testing

### Test Without Data
```bash
python examples/nanda_agent.py
```
Expected output:
```
ℹ️ No DATA_PATH provided; starting agent without attached data
```

### Test With CSV Data
```bash
# Windows PowerShell
$env:DATA_PATH="tests/sample_hr.csv"; python examples/nanda_agent.py

# Linux/Mac
DATA_PATH=tests/sample_hr.csv python examples/nanda_agent.py
```
Expected output:
```
📂 Loading data from: tests/sample_hr.csv
✅ Loaded CSV data with shape: (10, 5)
📊 Loaded data with shape: (10, 5)
```

### Test With JSON Data
```bash
# Create a sample JSON file first
echo '{"key": "value", "numbers": [1, 2, 3]}' > tests/sample.json

# Then run
DATA_PATH=tests/sample.json python examples/nanda_agent.py
```
Expected output:
```
📂 Loading data from: tests/sample.json
✅ Loaded JSON data
Keys: ['key', 'numbers']
```

### Test With Invalid Path
```bash
DATA_PATH=/nonexistent/file.csv python examples/nanda_agent.py
```
Expected output:
```
📂 Loading data from: /nonexistent/file.csv
❌ Failed to load data: File not found at /nonexistent/file.csv
⚠️ Data loading failed, continuing without attached data
```

## Next Steps (Step 4 - Tomorrow)

After data loads correctly, expose it via MCP tools (a hypothetical sketch follows this list):
- Create MCP tool: `get_row_count`
- Description: returns number of rows in attached dataset
- Input: none
- Output: integer
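
A rough sketch of what that tool function could look like (the MCP registration mechanism itself is part of Step 4 and is not shown here):

```python
# Hypothetical sketch of the planned get_row_count tool. Assumes AGENT_CONFIG,
# PANDAS_AVAILABLE, and `pd` from examples/nanda_agent.py; the actual MCP
# wiring will be added in Step 4.
def get_row_count() -> int:
    """Return the number of rows in the attached dataset (0 if none)."""
    data = AGENT_CONFIG.get("data")
    if data is None:
        return 0
    if PANDAS_AVAILABLE and isinstance(data, pd.DataFrame):
        return len(data)  # number of DataFrame rows
    return len(data)      # JSON: number of top-level items
```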

## Files Modified

1. `NEST/examples/nanda_agent.py` - Added data loading functionality
2. `NEST/setup.py` - Added pandas to requirements
3. `NEST/tests/sample_hr.csv` - Created sample test data

## Verification Checklist

- ✅ DATA_PATH read from environment
- ✅ CSV loading with pandas support
- ✅ JSON loading with built-in json
- ✅ Error handling for missing files
- ✅ Error handling for unsupported formats
- ✅ Logging for data loading status
- ✅ Data added to AGENT_CONFIG["data"]
- ✅ Agent starts normally without data
- ✅ Agent starts normally with data
- ✅ Sample test data created

86 changes: 86 additions & 0 deletions DATA_PATH_HOOK_CHANGES.md
@@ -0,0 +1,86 @@
# DATA_PATH Hook Implementation - Step 2

## Summary
Added DATA_PATH parameter support to the NEST AWS deployment script. This allows Maria to optionally pass a data path that will be available to the agent as the `DATA_PATH` environment variable.

## Changes Made

### 1. Added DATA_PATH Parameter (Line 21)
```bash
DATA_PATH="${12:-}" # Optional data path
```

### 2. Updated Usage/Help Text (Lines 25, 28, 38-42)
- Added `[DATA_PATH]` to usage statement
- Added DATA_PATH example in example command
- Added DATA_PATH parameter description

### 3. Added DATA_PATH to Deployment Output (Line 54)
```bash
echo "Data Path: ${DATA_PATH:-"None (no data attached)"}"
```

### 4. Added DATA_PATH Logging in User-Data Script (Lines 169-174)
```bash
# Log data path status
if [ -n "$DATA_PATH" ]; then
    echo "DATA_PATH is set to: $DATA_PATH"
else
    echo "No DATA_PATH provided; starting agent without attached data."
fi
```

### 5. Exported DATA_PATH Environment Variable (Line 191)
```bash
export DATA_PATH='$DATA_PATH'
```

## Updated Defaults
- `REGISTRY_URL` default changed to: `http://registry.chat39.com:6900`
- `INSTANCE_TYPE` default changed to: `t3.small`

## Testing

### Quick Syntax Check
```bash
bash -n scripts/aws-single-agent-deployment.sh
```

### Simulate Script Call (No AWS)
```bash
# Test with DATA_PATH
bash scripts/aws-single-agent-deployment.sh \
"test-agent" \
"sk-ant-test" \
"Test Agent" \
"testing" \
"test specialist" \
"Test description" \
"testing" \
"" \
"6000" \
"us-east-1" \
"t3.small" \
"/data/hr.csv"

# Test without DATA_PATH (12th argument omitted; defaults to empty)
bash scripts/aws-single-agent-deployment.sh \
"test-agent" \
"sk-ant-test" \
"Test Agent" \
"testing" \
"test specialist" \
"Test description" \
"testing"
```

### Verify User-Data Script Generation
After running the script (even with invalid AWS credentials), check the generated `user_data_${AGENT_ID}.sh` file:
- Should contain `export DATA_PATH='...'` line
- Should contain the logging section for DATA_PATH

## Next Steps
- Agent code (`examples/nanda_agent.py`) can now read `DATA_PATH` from environment
- Data loading logic can be added to agent initialization
- MCP tools can be registered to access the data

111 changes: 111 additions & 0 deletions DATA_SETUP_INSTRUCTIONS.md
@@ -0,0 +1,111 @@
# Creating HR Dataset for NEST Ecosystem

## Quick Setup (Already Done ✅)

The HR dataset has been created at: `NEST/data/hr.csv`

## Manual Setup Instructions

If you need to recreate or modify the dataset:

### Steps:

1. **Navigate to NEST directory:**
```bash
cd NEST
```

2. **Create data directory (if it doesn't exist):**
```bash
# Windows PowerShell
if (-not (Test-Path "data")) { New-Item -ItemType Directory -Path "data" }

# Linux/Mac
mkdir -p data
```

3. **Create the HR dataset:**
```bash
# Windows PowerShell (opens an editor to create the file)
notepad data\hr.csv

# Linux/Mac
nano data/hr.csv
```

4. **Paste this EXACT content:**
```csv
employee_id,department,salary,years_at_company
1001,Engineering,145000,5
1002,Sales,92000,2
1003,HR,86000,4
1004,Marketing,78000,1
```

5. **Save the file:**
- **Windows**: Save in your editor
- **Linux/Mac**: CTRL+O, ENTER, CTRL+X

## Using the Dataset

### Local Testing

```bash
# From NEST directory
DATA_PATH=data/hr.csv python examples/nanda_agent.py
```

### AWS Deployment

```bash
bash scripts/aws-single-agent-deployment.sh \
"hr-agent" \
"sk-ant-xxxxx" \
"HR Assistant" \
"human resources" \
"HR specialist" \
"I help with HR questions and employee data" \
"HR,employee data,payroll,benefits" \
"http://registry.chat39.com:6900" \
"6000" \
"us-east-1" \
"t3.small" \
"data/hr.csv" # <-- 12th parameter: DATA_PATH
```

## Dataset Structure

The HR dataset contains the following (a quick inspection snippet follows the list):
- **4 employees** across different departments
- **Columns:**
  - `employee_id`: Unique identifier (1001-1004)
  - `department`: Engineering, Sales, HR, Marketing
  - `salary`: Annual salary (78k-145k)
  - `years_at_company`: Years of service (1-5)
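
To sanity-check the file locally (assuming pandas is installed and the working directory is `NEST/`), something like the following can be used:

```python
# Quick local check of data/hr.csv; assumes pandas is installed and the
# script is run from the NEST directory.
import pandas as pd

df = pd.read_csv("data/hr.csv")
print(df.shape)                    # expected: (4, 4)
print(df["department"].tolist())   # ['Engineering', 'Sales', 'HR', 'Marketing']
print(df["salary"].mean())         # average salary across the four employees
```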

## Purpose

This dataset allows the HR agent to:
- Demonstrate "having data"
- Respond to questions based on actual employee data
- Show how agents can use attached datasets

## File Location in NEST

```
NEST/
├── data/
│   ├── hr.csv                            ← HR dataset (ready to use)
│   └── README.md                         ← Data directory documentation
├── examples/
│   └── nanda_agent.py                    ← Agent that loads data
└── scripts/
    └── aws-single-agent-deployment.sh    ← Deployment with DATA_PATH
```

## Next Steps

1. ✅ Dataset created at `data/hr.csv`
2. ✅ Agent can load it via `DATA_PATH=data/hr.csv`
3. 🔜 Step 4: Expose data via MCP tools (get_row_count, query_data, etc.); a hypothetical `query_data` sketch follows below
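
For reference, here is one possible shape of such a `query_data` tool over the attached DataFrame (hypothetical; the actual tool interface will be defined in Step 4):

```python
# Hypothetical sketch of a query_data tool over the attached HR DataFrame.
# Assumes AGENT_CONFIG, PANDAS_AVAILABLE, and `pd` from examples/nanda_agent.py.
def query_data(department: str) -> list:
    """Return rows matching a department from the attached dataset."""
    data = AGENT_CONFIG.get("data")
    if data is None or not (PANDAS_AVAILABLE and isinstance(data, pd.DataFrame)):
        return []
    matches = data[data["department"] == department]
    return matches.to_dict(orient="records")
```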
