HumanSignal · niklub · Mar 19, 2025
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,27 @@
+# Adala Development Guidelines
+
+## Build & Test Commands
+- Install dependencies: `poetry install --with dev`
+- Enter environment: `poetry shell`
+- Run all tests: `pytest`
+- Run specific test: `pytest tests/test_file.py::test_function_name`
+- Run tests with recording: `pytest --record_mode=once --block-network`
+- Run tests with network: `pytest -m "use_openai or use_azure"`
+- Build docs: `mkdocs serve -f ./docs/mkdocs.yml`
+
+## Code Style
+- Use Python type hints throughout the codebase
+- Follow PEP 8 naming conventions: snake_case for variables/functions, CamelCase for classes
+- Prefer composition over inheritance when extending framework components
+- When defining new skills, inherit from appropriate base classes
+- Use f-strings for string formatting
+- Docstrings should follow Google style format
+- Exception handling should use specific exception types from utils.exceptions
+- Test coverage required for all new features
+
+## Architecture
+- Agent: Main entry point - connects Skills, Environments and Runtimes
+- Skills: Core capabilities (classification, extraction, etc.)
+- Environments: Data sources and interaction points
+- Runtimes: LLM backends (OpenAI, etc.)
+- Utils: Shared functionality across components
diff --git a/adala/README.md b/adala/README.md
@@ -0,0 +1,41 @@
+# Adala - Simplified Data Processing with LLMs
+
+This is a refactored version of the Adala framework that provides a streamlined interface for processing tabular data through LLMs.
+
+## Key Components
+
+- **DataTable**: A lightweight wrapper around pandas DataFrame
+- **BatchLLMRuntime**: Efficient batch processing of data through LLMs
+- **DataProcessor**: High-level interface for data processing tasks
+- **Classifier**: Specialized processor for classification tasks
+
+## Getting Started
+
+```python
+import pandas as pd
+from adala import Classifier
+
+# Create sample data
+df = pd.DataFrame([
+    "Not loud enough and doesn't turn on like it should.",
+    "The product works perfectly fine.",
+    "I absolutely love this device!"
+], columns=["text"])
+
+# Create a classifier
+classifier = Classifier(
+    instructions="Classify product reviews as positive, negative, or neutral.",
+    labels=["Positive", "Negative", "Neutral"],
+    model="gpt-3.5-turbo"
+)
+
+# Process the data
+results = classifier.process(df)
+print(results[["text", "label"]])
+```
+
+## Legacy Components
+
+The original Adala components (Agent, Skills, Environments, Memories) are still available but deprecated. They have been moved to the `adala.legacy` module and will be removed in a future version.
+
+For migration guidance, see [MIGRATION.md](./core/MIGRATION.md)
diff --git a/adala/__init__.py b/adala/__init__.py
@@ -0,0 +1,31 @@
+"""
+Adala: Data Processing with LLMs
+
+Adala provides tools for efficient data processing using Large Language Models.
+"""
+
+__version__ = "0.0.4dev"
+
+# Import core components
+from adala.core import DataProcessor, Classifier, LabelStudioProcessor, DataTable, BatchLLMRuntime
+
+# Legacy imports (with deprecation warnings)
+from adala.agents import Agent
+from adala.environments import StaticEnvironment
+from adala.skills import ClassificationSkill, TransformSkill, LabelStudioSkill
+
+__all__ = [
+    # Core components
+    'DataProcessor',
+    'Classifier',
+    'LabelStudioProcessor',
+    'DataTable',
+    'BatchLLMRuntime',
+
+    # Legacy components 
+    'Agent',
+    'StaticEnvironment',
+    'ClassificationSkill',
+    'TransformSkill',
+    'LabelStudioSkill',
+]
diff --git a/adala/agents/__init__.py b/adala/agents/__init__.py
@@ -1 +1,2 @@
-from .base import Agent, create_agent_from_file, create_agent_from_dict
+"""Legacy module, use adala.core instead."""
+from adala.legacy.agents import *
diff --git a/adala/core/MIGRATION.md b/adala/core/MIGRATION.md
@@ -0,0 +1,256 @@
+# Migration Guide: Transitioning to the Simplified API
+
+This guide will help you migrate from the original Adala API to the simplified core API. The new API offers a more direct way to process tabular data through LLMs in batch mode, with less boilerplate and abstraction.
+
+## Mapping Between APIs
+
+| Original API | Simplified API | Notes |
+|--------------|----------------|-------|
+| `Agent` | `DataProcessor` | The main entry point for processing data |
+| `ClassificationSkill` | `Classifier` | Specialized processor for classification tasks |
+| `LabelStudioSkill` | `LabelStudioProcessor` | Processor for Label Studio annotations |
+| `StaticEnvironment` | Direct pandas/DataTable | Work directly with your data |
+| `OpenAIChatRuntime` | `BatchLLMRuntime` | Streamlined runtime for batch processing |
+| `InternalDataFrame` | `DataTable` | Enhanced pandas DataFrame for batch operations |
+
+## Code Examples
+
+### Classification Example
+
+#### Original API:
+
+```python
+from adala.agents import Agent
+from adala.environments import StaticEnvironment
+from adala.skills import ClassificationSkill
+from adala.runtimes import OpenAIChatRuntime
+import pandas as pd
+
+df = pd.DataFrame([
+    ["The product works great", "Positive"],
+    ["Terrible quality, avoid", "Negative"]
+], columns=["text", "sentiment"])
+
+agent = Agent(
+    skills=ClassificationSkill(
+        name='sentiment',
+        instructions="Classify text as positive, negative or neutral.",
+        labels={'sentiment': ["Positive", "Negative", "Neutral"]},
+        input_template="Text: {text}",
+        output_template="Sentiment: {sentiment}"
+    ),
+    environment=StaticEnvironment(
+        df=df,
+        ground_truth_columns={'sentiment': 'sentiment'}
+    ),
+    runtimes = {
+        'openai': OpenAIChatRuntime(model='gpt-3.5-turbo'),
+    },
+    default_runtime='openai'
+)
+
+# Train the agent
+agent.learn(learning_iterations=3)
+
+# Run prediction on new data
+test_df = pd.DataFrame(["This is a new product"], columns=["text"])
+predictions = agent.run(test_df)
+```
+
+#### Simplified API:
+
+```python
+from adala.core import Classifier
+import pandas as pd
+
+# Training data
+df = pd.DataFrame([
+    ["The product works great", "Positive"],
+    ["Terrible quality, avoid", "Negative"]
+], columns=["text", "sentiment"])
+
+# Create classifier
+classifier = Classifier(
+    instructions="Classify text as positive, negative or neutral.",
+    labels=["Positive", "Negative", "Neutral"],
+    model="gpt-3.5-turbo"
+)
+
+# No separate training step required; examples can be added as context
+classifier.add_context(
+    examples=[
+        {"text": "The product works great", "label": "Positive"},
+        {"text": "Terrible quality, avoid", "label": "Negative"}
+    ]
+)
+
+# Run prediction on new data
+test_df = pd.DataFrame(["This is a new product"], columns=["text"])
+predictions = classifier.process(test_df)
+```
+
+### Custom Processing Example
+
+#### Original API:
+
+```python
+from adala.agents import Agent
+from adala.skills import TransformSkill
+from adala.runtimes import OpenAIChatRuntime
+from pydantic import BaseModel, Field
+
+class EntityExtraction(BaseModel):
+    person: str = Field(..., description="Person mentioned in text")
+    location: str = Field(..., description="Location mentioned in text")
+
+agent = Agent(
+    skills=TransformSkill(
+        name='entity_extraction',
+        instructions="Extract person and location entities from text.",
+        input_template="Text: {text}",
+        output_template="Entities: {entities}",
+        response_model=EntityExtraction
+    ),
+    runtimes = {
+        'openai': OpenAIChatRuntime(model='gpt-4'),
+    },
+    default_runtime='openai'
+)
+
+# Run on data
+import pandas as pd
+df = pd.DataFrame(["John visited Paris last summer"], columns=["text"])
+results = agent.run(df)
+```
+
+#### Simplified API:
+
+```python
+from adala.core import DataProcessor
+from pydantic import BaseModel, Field
+import pandas as pd
+
+class EntityExtraction(BaseModel):
+    person: str = Field(..., description="Person mentioned in text")
+    location: str = Field(..., description="Location mentioned in text")
+
+processor = DataProcessor(
+    prompt_template="Extract person and location entities from this text: {text}",
+    response_model=EntityExtraction,
+    model="gpt-4"
+)
+
+# Run on data
+df = pd.DataFrame(["John visited Paris last summer"], columns=["text"])
+results = processor.process(df)
+```
+
+### Label Studio Example
+
+#### Original API:
+
+```python
+from adala.agents import Agent
+from adala.skills import LabelStudioSkill
+from adala.runtimes import OpenAIChatRuntime
+import pandas as pd
+
+# Define the Label Studio configuration
+label_config = """
+<View>
+  <Text name="text" value="$text"/>
+  <Labels name="ner_tags" toName="text">
+    <Label value="Person"/>
+    <Label value="Organization"/>
+    <Label value="Location"/>
+  </Labels>
+</View>
+"""
+
+# Create the agent with Label Studio skill
+agent = Agent(
+    skills=LabelStudioSkill(
+        name='ner_tagger',
+        instructions="Annotate the text with named entities.",
+        label_config=label_config
+    ),
+    runtimes = {
+        'openai': OpenAIChatRuntime(model='gpt-3.5-turbo'),
+    },
+    default_runtime='openai'
+)
+
+# Run on data
+df = pd.DataFrame(["John works at Apple in San Francisco"], columns=["text"])
+results = agent.run(df)
+```
+
+#### Simplified API:
+
+```python
+from adala import LabelStudioProcessor
+import pandas as pd
+
+# Define the Label Studio configuration
+label_config = """
+<View>
+  <Text name="text" value="$text"/>
+  <Labels name="ner_tags" toName="text">
+    <Label value="Person"/>
+    <Label value="Organization"/>
+    <Label value="Location"/>
+  </Labels>
+</View>
+"""
+
+# Create the processor
+processor = LabelStudioProcessor(
+    label_config=label_config,
+    instructions="Annotate the text with named entities.",
+    model="gpt-3.5-turbo"
+)
+
+# Run on data
+df = pd.DataFrame(["John works at Apple in San Francisco"], columns=["text"])
+results = processor.process(df)
+```
+
+## Async Processing
+
+The simplified API supports asynchronous processing out of the box:
+
+```python
+import asyncio
+from adala.core import Classifier
+import pandas as pd
+
+classifier = Classifier(
+    instructions="Classify the sentiment of the text.",
+    labels=["Positive", "Negative", "Neutral"],
+    model="gpt-3.5-turbo"
+)
+
+async def process_data():
+    df = pd.DataFrame(["I love this product", "This is terrible"], columns=["text"])
+    results = await classifier.aprocess(df)
+    return results
+
+# Run async function
+results = asyncio.run(process_data())
+```
+
+## Benefits of Migration
+
+1. **Less Boilerplate**: Write less code to accomplish the same tasks
+2. **Better Performance**: Direct batch processing without unnecessary wrapper code
+3. **Simpler Mental Model**: Work directly with your data instead of through multiple abstraction layers
+4. **Async Support**: First-class support for asynchronous processing
+5. **Maintainability**: Less code means fewer bugs and easier maintenance
+
+## Common Migration Patterns
+
+1. **Replace Agent with DataProcessor or Classifier**: Choose the appropriate processor for your task
+2. **Eliminate Environment**: Work directly with your data in pandas or DataTable format
+3. **Convert Skills to Prompt Templates**: Move your skill logic into prompt templates and response models
+4. **Replace Runtime Configuration**: Use the simplified BatchLLMRuntime with concurrency settings
+5. **Use add_context() for Examples**: Instead of StaticEnvironment, add examples via context