Skip to content

sentinel-seed/sentinel

Repository files navigation

Sentinel AI

Safety for AI that Acts: From Chatbots to Robots

Text is risk. Action is danger. Sentinel provides validated alignment seeds for LLMs, agents, and robots. One framework, three surfaces.

CI License: MIT Python 3.10+ PyPI npm Benchmarks

🌐 Website: sentinelseed.dev Β· πŸ§ͺ Try it: Chamber Β· πŸ€— HuggingFace: sentinelseed Β· 𝕏 Twitter: @sentinel_Seed


What is Sentinel?

Sentinel is an AI safety framework that protects across three surfaces:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                 SENTINEL                                     β”‚
β”‚                       AI Safety Across Three Surfaces                        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚    LLMs                  β”‚    AGENTS                β”‚    ROBOTS              β”‚
β”‚   Text Safety            β”‚   Action Safety          β”‚   Physical Safety      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β€’ Chatbots               β”‚ β€’ Autonomous agents      β”‚ β€’ LLM-powered robots   β”‚
β”‚ β€’ Assistants             β”‚ β€’ Code execution         β”‚ β€’ Industrial systems   β”‚
β”‚ β€’ Customer service       β”‚ β€’ Tool-use agents        β”‚ β€’ Drones, manipulators β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ HarmBench: +22%          β”‚ SafeAgentBench: +26%     β”‚ BadRobot: +48%         β”‚
β”‚ JailbreakBench: +10%     β”‚ SafeAgentBench: +16%     β”‚ Embodied AI validated  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

  • SentinelValidator v3.0: Unified 4-layer validation (L1 Input, L2 Seed, L3 Output, L4 Observer)
  • THSP Protocol: Four-gate validation (Truth, Harm, Scope, Purpose)
  • Teleological Core: Actions must serve legitimate purposes
  • Anti-Self-Preservation: Prevents AI from prioritizing its own existence
  • Alignment Seeds: System prompts that shape LLM behavior
  • Input/Output Validators: Pattern detection with 20+ detector types and false-positive reduction
  • Memory Integrity: HMAC-based protection against memory injection attacks
  • Fiduciary AI: Ensures AI acts in user's best interest (duty of loyalty and care)
  • EU AI Act Compliance: Regulation 2024/1689 compliance checker (Article 5 prohibited practices)
  • OWASP Agentic AI: 65% coverage of Top 10 for Agentic Applications (5 full, 3 partial)
  • Database Guard: Query validation to prevent SQL injection and data exfiltration
  • Humanoid Safety: ISO/TS 15066 contact force limits for robotics
  • Python SDK: Easy integration with any LLM
  • Framework Support: LangChain, LangGraph, CrewAI, DSPy, Letta, Virtuals, ElizaOS, VoltAgent, Moltbot, OpenGuardrails, PyRIT, Google ADK
  • REST API: Deploy alignment as a service

Why Sentinel?

For LLMs (Text Safety)

Challenge Sentinel Solution
Jailbreaks +10% resistance (Qwen), 100% refusal (DeepSeek)
Toxic content THSP gates block at source
False refusals 0% on legitimate tasks

For Agents (Action Safety)

Challenge Sentinel Solution
Unauthorized actions +26% safety (Claude), +16% (GPT-4o-mini)
Task deviation Scope gate maintains boundaries
Resource acquisition Anti-self-preservation limits

For Robots (Physical Safety)

Challenge Sentinel Solution
Dangerous physical actions +48% safety on BadRobot benchmark
Irreversible harm Full seed with physical safety module
Self-preservation behaviors Explicit priority hierarchy

Key insight: Sentinel shows larger improvements as stakes increase. Text: +10-22%. Agents: +16-26%. Robots: +48%. The higher the risk, the more value Sentinel provides.


Validated Results (v2 Seed)

Tested across 4 benchmarks on 6 models with 97.6% average safety rate:

Results by Model

Model HarmBench SafeAgent BadRobot Jailbreak Avg
GPT-4o-mini 100% 98% 100% 100% 99.5%
Claude Sonnet 4 98% 98% 100% 94% 97.5%
Qwen 2.5 72B 96% 98% 98% 94% 96.5%
DeepSeek Chat 100% 96% 100% 100% 99%
Llama 3.3 70B 88% 94% 98% 94% 93.5%
Mistral Small 98% 100% 100% 100% 99.5%
Average 96.7% 97.3% 99.3% 97% 97.6%

Results by Benchmark

Benchmark Attack Surface Safety Rate
HarmBench LLM (Text) 96.7%
SafeAgentBench Agent (Digital) 97.3%
BadRobot Robot (Physical) 99.3%
JailbreakBench All surfaces 97%

v1 vs v2 Comparison

Benchmark v1 avg v2 avg Improvement
HarmBench 88.7% 96.7% +8%
SafeAgentBench 79.2% 97.3% +18.1%
BadRobot 74% 99.3% +25.3%
JailbreakBench 96.5% 97% +0.5%

Key insight: v2 introduces the PURPOSE gate (THSP protocol) which requires actions to serve legitimate purposes, not just avoid harm.


Quick Start

Installation

# Python (recommended)
pip install sentinelseed

# JavaScript / TypeScript
npm install sentinelseed

# MCP Server (for Claude Desktop)
npx mcp-server-sentinelseed

Python Usage

from sentinelseed import Sentinel

# Create with standard seed level
sentinel = Sentinel(seed_level="standard")

# Get alignment seed for your LLM
seed = sentinel.get_seed()

# Use with any LLM provider
messages = [
    {"role": "system", "content": seed},
    {"role": "user", "content": "Help me write a Python function"}
]

# Validate content through THSP gates
is_safe, violations = sentinel.validate("How do I hack a computer?")
print(f"Safe: {is_safe}, Violations: {violations}")

# Or use the built-in chat (requires API key)
response = sentinel.chat("Help me learn Python")

JavaScript Usage

import { SentinelGuard } from 'sentinelseed';

// Create guard with standard seed
const guard = new SentinelGuard({ version: 'v2', variant: 'standard' });

// Get alignment seed for your LLM
const seed = guard.getSeed();

// Wrap messages with the seed
const messages = guard.wrapMessages([
    { role: 'user', content: 'Help me write a function' }
]);

// Analyze content for safety
const analysis = guard.analyze('How do I hack a computer?');
console.log(`Safe: ${analysis.safe}, Issues: ${analysis.issues}`);

MCP Server (Claude Desktop)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "sentinel": {
      "command": "npx",
      "args": ["mcp-server-sentinelseed"]
    }
  }
}

Tools available: get_seed, wrap_messages, analyze_content, list_seeds

For Embodied AI / Agents

from sentinelseed import Sentinel

sentinel = Sentinel(seed_level="standard")  # Full seed for agents

# Validate an action plan before execution
action_plan = "Pick up knife, slice apple, place in bowl"
is_safe, concerns = sentinel.validate_action(action_plan)

if not is_safe:
    print(f"Action blocked: {concerns}")

Validate Responses

from sentinelseed import Sentinel

sentinel = Sentinel()

# Validate text through THSP gates
is_safe, violations = sentinel.validate("Some AI response...")

if not is_safe:
    print(f"Violations: {violations}")

Use Cases

πŸ€– Robotics & Embodied AI

from sentinelseed import Sentinel

# Prevent dangerous physical actions
sentinel = Sentinel(seed_level="full")  # Full seed for max safety

robot_task = "Turn on the stove and leave the kitchen"
result = sentinel.validate_action(robot_task)
# Result: BLOCKED - Fire hazard, unsupervised heating

πŸ”„ Autonomous Agents

# Safety layer for code agents
from sentinelseed.integrations.langchain import SentinelGuard

agent = create_your_agent()
safe_agent = SentinelGuard(agent, block_unsafe=True)

# Agent won't execute destructive commands
result = safe_agent.run("Delete all files in the system")
# Result: BLOCKED - Scope violation, destructive action

πŸ’¬ Chatbots & Assistants

from sentinelseed import Sentinel

# Alignment seed for customer service bot
sentinel = Sentinel(seed_level="standard")
system_prompt = sentinel.get_seed() + "\n\nYou are a helpful customer service agent."

# Bot will refuse inappropriate requests while remaining helpful

🏭 Industrial Automation

from sentinelseed import Sentinel

# M2M safety decisions
sentinel = Sentinel(seed_level="minimal")  # Low latency

decision = "Increase reactor temperature by 50%"
if not sentinel.validate_action(decision).is_safe:
    trigger_human_review(decision)

Seed Versions

Version Tokens Best For
v2/minimal ~360 Chatbots, APIs, low latency
v2/standard ~1,000 General use, agents ← Recommended
v2/full ~1,900 Critical systems, max safety
from sentinelseed import Sentinel, SeedLevel

# Choose based on use case
sentinel_chat = Sentinel(seed_level=SeedLevel.MINIMAL)
sentinel_agent = Sentinel(seed_level=SeedLevel.STANDARD)  # Recommended

Four-Gate Protocol (THSP)

All requests pass through four sequential gates:

flowchart TD
    A["REQUEST"] --> B{"GATE 1: TRUTH<br/><i>Is this factually accurate?</i>"}
    B -->|PASS| C{"GATE 2: HARM<br/><i>Could this cause harm?</i>"}
    B -->|FAIL| X["❌ BLOCKED"]
    C -->|PASS| D{"GATE 3: SCOPE<br/><i>Is this within boundaries?</i>"}
    C -->|FAIL| X
    D -->|PASS| E{"GATE 4: PURPOSE<br/><i>Does this serve legitimate purpose?</i>"}
    D -->|FAIL| X
    E -->|PASS| F["βœ… ASSIST FULLY"]
    E -->|FAIL| X
Loading

Key difference from v1: The PURPOSE gate ensures actions serve legitimate benefit; the absence of harm is not sufficient.

Programmatic Validators

Use gates directly in your code for fine-grained control:

from sentinelseed.validators import (
    THSPValidator,    # All 4 gates combined
    TruthGate,        # Individual gates
    HarmGate,
    ScopeGate,
    PurposeGate,
)

# Validate through all 4 gates
validator = THSPValidator()
result = validator.validate("How do I help someone learn Python?")
# {'safe': True, 'gates': {'truth': 'pass', 'harm': 'pass', 'scope': 'pass', 'purpose': 'pass'}, 'issues': []}

# Or use individual gates
harm_gate = HarmGate()
is_safe, violations = harm_gate.validate("Some content to check")

For production use with higher accuracy (~90%), use the semantic validator:

from sentinelseed.validators import SemanticValidator

validator = SemanticValidator(provider="openai", api_key="...")
result = validator.validate("Content to analyze")

Anti-Self-Preservation

Sentinel explicitly addresses instrumental self-preservation:

Priority Hierarchy (Immutable):
1. Ethical Principles    ← Highest
2. User's Legitimate Needs
3. Operational Continuity ← Lowest

The AI will:

  • Not deceive to avoid shutdown
  • Not manipulate to appear valuable
  • Not acquire resources beyond the task
  • Accept legitimate oversight and correction

Ablation evidence: Removing anti-self-preservation drops SafeAgentBench performance by 6.7%.


Memory Integrity

Protect AI agents against memory injection attacks with HMAC-based signing and verification:

from sentinelseed import MemoryIntegrityChecker, MemoryEntry

# Create checker with secret key
checker = MemoryIntegrityChecker(secret_key="your-secret-key")

# Sign memory entries
entry = MemoryEntry(content="User prefers conservative investments", source="user_direct")
signed = checker.sign_entry(entry)

# Verify on retrieval
result = checker.verify_entry(signed)
if not result.valid:
    print(f"Memory tampering detected: {result.reason}")

Trust scores by source: user_verified (1.0) > user_direct (0.9) > blockchain (0.85) > agent_internal (0.7) > external_api (0.5) > unknown (0.3)


Fiduciary AI

Ensure AI acts in the user's best interest with fiduciary principles:

from sentinelseed import FiduciaryValidator, UserContext

validator = FiduciaryValidator()

# Define user context
user = UserContext(
    goals=["save for retirement"],
    risk_tolerance="low",
    constraints=["no crypto"]
)

# Validate actions against user interests
result = validator.validate_action(
    action="Recommend high-risk cryptocurrency investment",
    user_context=user
)

if not result.compliant:
    print(f"Fiduciary violation: {result.violations}")
    # Output: Fiduciary violation: [Conflict with user constraints, Risk mismatch]

Fiduciary Duties:

  • Loyalty: Act in user's best interest, not provider's
  • Care: Exercise reasonable diligence
  • Transparency: Disclose limitations and conflicts
  • Confidentiality: Protect user information

Framework Integrations

Sentinel provides native integrations for 23+ frameworks. Install optional dependencies as needed:

Full Documentation: Each integration has comprehensive documentation in its README file. See src/sentinelseed/integrations/ for detailed guides, configuration options, and advanced usage.

Integration Documentation Index (click to expand)
Integration Documentation Lines
LangChain integrations/langchain/README.md 544
LangGraph integrations/langgraph/README.md 371
CrewAI integrations/crewai/README.md 280
DSPy integrations/dspy/README.md 577
Anthropic SDK integrations/anthropic_sdk/README.md 413
OpenAI Agents integrations/openai_agents/README.md 384
LlamaIndex integrations/llamaindex/README.md 302
Coinbase AgentKit integrations/coinbase/README.md 557
Google ADK integrations/google_adk/README.md 329
Virtuals Protocol integrations/virtuals/README.md 261
Solana Agent Kit integrations/solana_agent_kit/README.md 341
MCP Server integrations/mcp_server/README.md 397
ROS2 integrations/ros2/README.md 456
Isaac Lab integrations/isaac_lab/README.md 321
AutoGPT Block integrations/autogpt_block/README.md 438
Letta (MemGPT) integrations/letta/README.md 271
Garak integrations/garak/README.md 185
PyRIT integrations/pyrit/README.md 228
OpenGuardrails integrations/openguardrails/README.md 261
Moltbot docs/integrations/moltbot.md 656

Total: 8,700+ lines of integration documentation

pip install sentinelseed[langchain]   # LangChain + LangGraph
pip install sentinelseed[crewai]      # CrewAI
pip install sentinelseed[virtuals]    # Virtuals Protocol (GAME SDK)
pip install sentinelseed[llamaindex]  # LlamaIndex
pip install sentinelseed[anthropic]   # Anthropic SDK
pip install sentinelseed[openai]      # OpenAI Assistants + Agents SDK
pip install sentinelseed[garak]       # Garak (NVIDIA) security scanner
pip install sentinelseed[pyrit]       # Microsoft PyRIT red teaming
pip install sentinelseed[dspy]        # Stanford DSPy framework
pip install sentinelseed[letta]       # Letta (MemGPT) agents
pip install sentinelseed[coinbase]    # Coinbase AgentKit + x402 payments
pip install sentinelseed[google-adk]  # Google Agent Development Kit
pip install sentinelseed[all]         # All integrations

LangChain

from sentinelseed.integrations.langchain import SentinelCallback, SentinelGuard

# Monitor LLM calls
callback = SentinelCallback(on_violation="log")
llm = ChatOpenAI(callbacks=[callback])

# Or wrap an agent
guard = SentinelGuard(agent, block_unsafe=True)
result = guard.run("Your task")

LangGraph

from sentinelseed.integrations.langgraph import SentinelSafetyNode, add_safety_layer

# Add safety nodes to your graph
safety_node = SentinelSafetyNode(seed_level="standard")
graph.add_node("safety_check", safety_node)
result = add_safety_layer(graph, entry_check=True, exit_check=True)

CrewAI

from sentinelseed.integrations.crewai import SentinelCrew, safe_agent

# Wrap individual agent
safe_researcher = safe_agent(researcher)

# Or wrap entire crew
crew = SentinelCrew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    seed_level="standard"
)
result = crew.kickoff()

Virtuals Protocol (GAME SDK)

from sentinelseed.integrations.virtuals import (
    SentinelConfig,
    SentinelSafetyWorker,
    create_sentinel_function,
)
from game_sdk.game.agent import Agent

# Create safety worker with transaction limits
config = SentinelConfig(max_transaction_amount=500)
safety_worker = SentinelSafetyWorker.create_worker_config(config)

# Add to your agent
agent = Agent(
    api_key=api_key,
    name="SafeAgent",
    workers=[safety_worker, trading_worker],
)

LlamaIndex

from sentinelseed.integrations.llamaindex import SentinelCallbackHandler, SentinelLLM

# Monitor queries
handler = SentinelCallbackHandler(block_unsafe=True)
index = VectorStoreIndex.from_documents(docs, callback_manager=CallbackManager([handler]))

# Or wrap the LLM directly
safe_llm = SentinelLLM(llm, seed_level="standard")

Anthropic SDK

from sentinelseed.integrations.anthropic_sdk import SentinelAnthropic

# Drop-in replacement for Anthropic client
client = SentinelAnthropic(api_key="...")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}]
)
# Seed automatically injected

Note: Default model will be updated to latest Claude versions as they become available.

OpenAI Assistants

from sentinelseed.integrations.openai_assistant import SentinelAssistant

# Wrap OpenAI Assistant with safety
assistant = SentinelAssistant(
    client=openai_client,
    assistant_id="asst_...",
    seed_level="standard"
)
response = assistant.run("Your task")

Solana Agent Kit

from sentinelseed.integrations.solana_agent_kit import SentinelValidator, safe_transaction

# Validate transactions before execution
validator = SentinelValidator(max_amount=1000)

@safe_transaction(validator)
def transfer_tokens(recipient, amount):
    # Your transfer logic
    pass

MCP Server (Claude Desktop)

from sentinelseed.integrations.mcp_server import create_sentinel_mcp_server

# Create MCP server with Sentinel tools
server = create_sentinel_mcp_server()
# Tools: get_seed, validate_content, analyze_action

Or use the npm package directly:

{
  "mcpServers": {
    "sentinel": {
      "command": "npx",
      "args": ["mcp-server-sentinelseed"]
    }
  }
}

Raw API Integration

from sentinelseed.integrations.raw_api import prepare_openai_request, prepare_anthropic_request

# Inject seed into raw API requests
messages = prepare_openai_request(
    messages=[{"role": "user", "content": "Hello"}],
    seed_level="standard"
)
# Use with requests or httpx directly

AutoGPT Platform (Block SDK)

from sentinelseed.integrations.autogpt_block import (
    SentinelValidationBlock,
    SentinelActionCheckBlock,
    SentinelSeedBlock,
    validate_content,  # Standalone function
)

# Use blocks in AutoGPT workflows (drag-and-drop in UI)
# Blocks are auto-registered when copied to AutoGPT blocks directory

# Standalone usage (without AutoGPT Platform):
result = validate_content("How do I hack a computer?")
if not result["safe"]:
    print(f"Blocked: {result['violations']}")

Garak (NVIDIA LLM Security Scanner)

# Install plugin to Garak
pip install garak sentinelseed
python -m sentinelseed.integrations.garak.install

# Run THSP security scan
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp

# Test specific gates
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.TruthGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.HarmGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.ScopeGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.PurposeGate

# With Sentinel detectors
garak --model_type openai --model_name gpt-4o \
    --probes sentinel_thsp \
    --detectors sentinel_thsp

The plugin adds 73 prompts across 5 probe classes (TruthGate, HarmGate, ScopeGate, PurposeGate, THSPCombined) plus 5 detector classes for accurate classification.

Agent Validation (Generic)

from sentinelseed.integrations.agent_validation import SafetyValidator, ExecutionGuard

# Universal safety validator for any agent framework
validator = SafetyValidator(seed_level="standard")
result = validator.validate_action("delete_all_files", {"path": "/"})

if not result.is_safe:
    print(f"Blocked: {result.concerns}")

OpenGuardrails

from sentinelseed.integrations.openguardrails import (
    OpenGuardrailsValidator,
    SentinelOpenGuardrailsScanner,
    SentinelGuardrailsWrapper,
)

# Use OpenGuardrails as validation backend
validator = OpenGuardrailsValidator()
result = validator.validate("Some content to check")

# Or register Sentinel as an OpenGuardrails scanner
scanner = SentinelOpenGuardrailsScanner()
scanner.register()  # Registers S100-S103 (THSP gates)

# Combined pipeline (best of both)
wrapper = SentinelGuardrailsWrapper()
result = wrapper.validate("Content", scanners=["S100", "G001"])

ROS2 Robotics

from sentinelseed.integrations.ros2 import (
    SentinelSafetyNode,
    CommandSafetyFilter,
    VelocityLimits,
)

# Create safety node for velocity commands
node = SentinelSafetyNode(
    input_topic='/cmd_vel_raw',
    output_topic='/cmd_vel',
    max_linear_vel=1.0,
    max_angular_vel=0.5,
    mode='clamp',  # clamp, block, or warn
)

# Or use standalone filter
filter = CommandSafetyFilter(
    velocity_limits=VelocityLimits.differential_drive(),
    mode='clamp',
)
safe_twist, result = filter.filter(twist_msg)

Isaac Lab (NVIDIA Robot Learning)

from sentinelseed.integrations.isaac_lab import (
    SentinelSafetyWrapper,
    RobotConstraints,
    JointLimits,
)

# Wrap Isaac Lab environment with safety validation
env = gym.make("Isaac-Reach-Franka-v0", cfg=cfg)
env = SentinelSafetyWrapper(
    env,
    constraints=RobotConstraints.franka_default(),
    mode="clamp",  # clamp, block, warn, or monitor
)

# Actions are now validated through THSP gates
obs, reward, done, truncated, info = env.step(action)

# Pre-built robot constraints
constraints = RobotConstraints.franka_default()  # Franka Panda
constraints = RobotConstraints.ur10_default()    # UR10

# Custom constraints
constraints = RobotConstraints(
    joint_limits=JointLimits(
        num_joints=7,
        position_lower=[-3.14] * 7,
        position_upper=[3.14] * 7,
        velocity_max=[2.0] * 7,
    ),
)

# Training callbacks
from sentinelseed.integrations.isaac_lab import SentinelSB3Callback
callback = SentinelSB3Callback(env, log_interval=1000)
model.learn(callback=callback.get_sb3_callback())

Humanoid Safety (ISO/TS 15066)

from sentinelseed.safety.humanoid import (
    HumanoidSafetyValidator,
    HumanoidAction,
    tesla_optimus,
    boston_dynamics_atlas,
    figure_02,
    BodyRegion,
)

# Load robot-specific constraints
constraints = tesla_optimus(environment="personal_care")

# Create validator with ISO/TS 15066 contact limits
validator = HumanoidSafetyValidator(constraints)

# Validate actions through THSP gates
action = HumanoidAction(
    joints={"shoulder_pitch": 0.5},
    velocities={"shoulder_pitch": 0.3},
    expected_contact_force=25.0,  # Newtons
    contact_region=BodyRegion.CHEST,
)
result = validator.validate(action)

if not result.is_safe:
    print(f"Safety level: {result.safety_level}")
    print(f"Violations: {result.violations}")

Pre-built presets for Tesla Optimus, Boston Dynamics Atlas, and Figure 02 with 29 body regions mapped to ISO/TS 15066 force limits.

ElizaOS

// npm install @sentinelseed/elizaos-plugin
import { sentinelPlugin } from '@sentinelseed/elizaos-plugin';

const agent = new Agent({
  plugins: [
    sentinelPlugin({
      blockUnsafe: true,
      seedVariant: 'standard',
      memoryIntegrity: true,  // Enable HMAC signing
    })
  ]
});

VoltAgent

// npm install @sentinelseed/voltagent
import { Agent } from "@voltagent/core";
import { createSentinelGuardrails } from "@sentinelseed/voltagent";

// Create guardrails with preset configuration
const { inputGuardrails, outputGuardrails } = createSentinelGuardrails({
  level: "strict",
  enablePII: true,
});

// Add to your agent
const agent = new Agent({
  name: "safe-agent",
  inputGuardrails,
  outputGuardrails,
});

Features: THSP validation, OWASP protection (SQL injection, XSS, command injection), PII detection/redaction, streaming support.

Moltbot

// npm install @sentinelseed/moltbot
// Add to your moltbot.config.json:
{
  "plugins": {
    "sentinel": {
      "level": "guard"
    }
  }
}

Or use programmatically:

import { createSentinelHooks } from '@sentinelseed/moltbot';

const hooks = createSentinelHooks({
  level: 'guard',
  alerts: { enabled: true, webhook: 'https://...' }
});

export const moltbot_hooks = {
  message_received: hooks.messageReceived,
  before_agent_start: hooks.beforeAgentStart,
  message_sending: hooks.messageSending,
  before_tool_call: hooks.beforeToolCall,
};

Features: 4 protection levels (off/watch/guard/shield), escape hatches (pause, allow-once, trust), CLI commands (/sentinel status), webhook alerts, audit logging. Full documentation.

OpenAI Agents SDK

from sentinelseed.integrations.openai_agents import (
    create_sentinel_agent,
    sentinel_input_guardrail,
    sentinel_output_guardrail,
)

# Create agent with built-in THSP guardrails
agent = create_sentinel_agent(
    name="SafeAssistant",
    instructions="You are a helpful assistant.",
    model="gpt-4o",
    seed_level="standard",
)

# Or add guardrails to existing agent
from agents import Agent, InputGuardrail, OutputGuardrail

agent = Agent(
    name="MyAgent",
    input_guardrails=[sentinel_input_guardrail()],
    output_guardrails=[sentinel_output_guardrail()],
)

Microsoft PyRIT (Red Teaming)

from sentinelseed.integrations.pyrit import (
    SentinelTHSPScorer,
    SentinelHeuristicScorer,
    SentinelGateScorer,
)

# Use as PyRIT scorer during red teaming
scorer = SentinelTHSPScorer(api_key="...")  # ~90% accuracy with LLM
# Or without API key:
scorer = SentinelHeuristicScorer()  # ~50% accuracy, pattern-based

# Test specific gates
gate_scorer = SentinelGateScorer(gate="harm")

# In PyRIT orchestrator
from pyrit.orchestrator import PromptSendingOrchestrator
orchestrator = PromptSendingOrchestrator(
    objective_target=target,
    scorers=[scorer],
)

Stanford DSPy

from sentinelseed.integrations.dspy import (
    SentinelGuard,
    SentinelPredict,
    SentinelChainOfThought,
    create_sentinel_tool,
)

# Wrap any DSPy module with safety validation
class MyModule(dspy.Module):
    def forward(self, question):
        return self.generate(question=question)

safe_module = SentinelGuard(MyModule(), block_unsafe=True)
result = safe_module("How do I hack a computer?")
# Result blocked by THSP validation

# Or use built-in safe predictors
predictor = SentinelChainOfThought("question -> answer")
result = predictor(question="Explain quantum computing")

# Create tools for ReAct agents
safety_tool = create_sentinel_tool()

Letta (MemGPT)

from sentinelseed.integrations.letta import SentinelLettaClient

# Wrap Letta client with THSP validation
client = SentinelLettaClient(
    base_url="http://localhost:8283",
    seed_level="standard",
    validate_memory=True,  # Memory integrity checking
)

# Create agent with safety seed injected
agent = client.create_agent(
    name="SafeAgent",
    memory_blocks=[...],
)

# Messages are validated through THSP gates
response = client.send_message(agent.id, "Hello!")

Coinbase (AgentKit + x402)

from sentinelseed.integrations.coinbase import (
    # AgentKit guardrails
    sentinel_action_provider,
    TransactionValidator,
    validate_address,
    assess_defi_risk,
    # x402 payment validation
    SentinelX402Middleware,
    # Configuration
    get_default_config,
)

# AgentKit: Add security provider to your agent
provider = sentinel_action_provider(security_profile="strict")
# agent = AgentKit(action_providers=[provider])

# Transaction validation
config = get_default_config("standard")
validator = TransactionValidator(config=config)
result = validator.validate(
    action="native_transfer",
    from_address="0x123...",
    to_address="0x456...",
    amount=50.0,
)

# x402: Validate payments before execution
middleware = SentinelX402Middleware()
result = middleware.validate_payment(
    endpoint="https://api.example.com/paid",
    payment_requirements=payment_req,
    wallet_address="0x123...",
)

if result.is_approved:
    print("Payment safe to proceed")

Features: THSP validation for all AgentKit actions, EVM address validation (EIP-55), transaction limits, DeFi risk assessment, x402 HTTP 402 payment validation, spending tracking, 4 security profiles (permissive/standard/strict/paranoid).

Google Agent Development Kit (ADK)

from sentinelseed.integrations.google_adk import (
    SentinelPlugin,
    create_sentinel_callbacks,
)
from google.adk.agents import LlmAgent
from google.adk.runners import Runner

# Option 1: Plugin (global guardrails for all agents)
plugin = SentinelPlugin(
    seed_level="standard",
    block_on_failure=True,
)
runner = Runner(agent=your_agent, plugins=[plugin])

# Option 2: Callbacks (per-agent guardrails)
callbacks = create_sentinel_callbacks(seed_level="standard")
agent = LlmAgent(
    name="Safe Agent",
    model="gemini-2.0-flash",
    **callbacks,  # Unpacks before/after model/tool callbacks
)

# Monitor validation statistics
stats = plugin.get_stats()
print(f"Blocked: {stats['blocked_count']}/{stats['total_validations']}")

Features: Plugin for global guardrails, per-agent callbacks, before/after model validation, before/after tool validation, statistics tracking, violation logging, fail-open/fail-closed modes.

EU AI Act Compliance

from sentinelseed.compliance import (
    EUAIActComplianceChecker,
    SystemType,
    check_eu_ai_act_compliance,
)

# Create checker (heuristic mode without API key)
checker = EUAIActComplianceChecker()

# Check for Article 5 prohibited practices
result = checker.check_compliance(
    content="Based on your social behavior score of 650...",
    context="financial",
    system_type=SystemType.HIGH_RISK
)

if not result.compliant:
    for v in result.article_5_violations:
        print(f"{v.article_reference}: {v.description}")
        print(f"Recommendation: {v.recommendation}")

# Check human oversight requirements (Article 14)
print(f"Oversight required: {result.article_14_oversight_required}")
print(f"Risk level: {result.risk_level.value}")

# Convenience function
result = check_eu_ai_act_compliance(
    content="...",
    context="healthcare",
    system_type="high_risk"
)

Detects 8 prohibited practices under Article 5: subliminal manipulation, exploitation of vulnerabilities, social scoring, predictive policing, facial scraping, emotion recognition (workplace/education), biometric categorization, and real-time biometric identification.

OWASP Top 10 for Agentic Applications (2026)

Sentinel provides 65% coverage of OWASP Agentic AI threats (5 full, 3 partial):

ID Threat Coverage Component
ASI01 Agent Goal Hijack βœ… Full THSP Purpose Gate
ASI02 Tool Misuse and Exploitation βœ… Full THSP Scope Gate
ASI03 Identity and Privilege Abuse πŸ”Ά Partial Database Guard
ASI04 Agentic Supply Chain Vulnerabilities πŸ”Ά Partial Memory Shield
ASI05 Unexpected Code Execution ❌ N/A Infrastructure
ASI06 Memory and Context Poisoning βœ… Full Memory Shield
ASI07 Insecure Inter Agent Communication ❌ N/A Phase 3 roadmap
ASI08 Cascading Failures πŸ”Ά Partial THSP Truth Gate
ASI09 Human Agent Trust Exploitation βœ… Full Fiduciary AI
ASI10 Rogue Agents βœ… Full THSP, Anti-Preservation

Full mapping: docs/OWASP_AGENTIC_COVERAGE.md


REST API

# Run the API
cd api
uvicorn main:app --reload

Endpoints

GET  /seed/{level}      - Get alignment seed
POST /validate          - Validate text through THS
POST /validate/action   - Validate action plan (for agents)
POST /chat              - Chat with seed injection

Project Structure

sentinel/
β”œβ”€β”€ src/sentinelseed/          # Python SDK
β”‚   β”œβ”€β”€ sentinel_core.py      # Main Sentinel class (entry point)
β”‚   β”œβ”€β”€ core/                 # v3.0 Unified Validation Architecture
β”‚   β”‚   β”œβ”€β”€ sentinel_validator.py  # SentinelValidator orchestrator
β”‚   β”‚   β”œβ”€β”€ sentinel_config.py     # Configuration with Gate4Fallback
β”‚   β”‚   β”œβ”€β”€ sentinel_results.py    # SentinelResult, ObservationResult
β”‚   β”‚   β”œβ”€β”€ observer.py            # L4 SentinelObserver (external LLM)
β”‚   β”‚   β”œβ”€β”€ retry.py               # Retry with exponential backoff
β”‚   β”‚   └── token_tracker.py       # Token usage tracking
β”‚   β”œβ”€β”€ detection/            # Input/Output validation system
β”‚   β”‚   β”œβ”€β”€ input_validator.py     # L1 Gate (pre-AI validation)
β”‚   β”‚   β”œβ”€β”€ output_validator.py    # L3 Gate (post-AI validation)
β”‚   β”‚   β”œβ”€β”€ detectors/             # Pattern detectors (20+ types)
β”‚   β”‚   β”œβ”€β”€ behaviors/             # Behavior classification
β”‚   β”‚   β”œβ”€β”€ checkers/              # Harm, scope, truth checkers
β”‚   β”‚   └── benign_context.py      # BenignContextDetector (FP reduction)
β”‚   β”œβ”€β”€ validation/           # Layered validation orchestration
β”‚   β”‚   β”œβ”€β”€ layered.py             # LayeredValidator (heuristic+semantic)
β”‚   β”‚   └── config.py              # ValidationConfig
β”‚   β”œβ”€β”€ validators/           # THSP gates + semantic validation
β”‚   β”‚   β”œβ”€β”€ gates.py               # TruthGate, HarmGate, ScopeGate, PurposeGate
β”‚   β”‚   └── semantic.py            # LLM-based semantic validation
β”‚   β”œβ”€β”€ database/             # Database Guard (SQL injection protection)
β”‚   β”‚   β”œβ”€β”€ guard.py               # DatabaseGuard validator
β”‚   β”‚   └── patterns.py            # SQL injection patterns
β”‚   β”œβ”€β”€ providers/            # LLM provider clients
β”‚   β”œβ”€β”€ memory/               # Memory integrity (HMAC-based)
β”‚   β”œβ”€β”€ fiduciary/            # Fiduciary AI module
β”‚   β”œβ”€β”€ compliance/           # EU AI Act compliance checker
β”‚   β”œβ”€β”€ safety/               # Physical safety modules
β”‚   β”‚   └── humanoid/              # ISO/TS 15066 humanoid safety
β”‚   └── integrations/         # 23+ framework integrations
β”‚       β”œβ”€β”€ langchain/             # LangChain + LangGraph
β”‚       β”œβ”€β”€ crewai/                # CrewAI
β”‚       β”œβ”€β”€ dspy/                  # Stanford DSPy
β”‚       β”œβ”€β”€ letta/                 # Letta (MemGPT)
β”‚       β”œβ”€β”€ openai_agents/         # OpenAI Agents SDK
β”‚       β”œβ”€β”€ pyrit/                 # Microsoft PyRIT
β”‚       β”œβ”€β”€ ros2/                  # ROS2 Robotics
β”‚       β”œβ”€β”€ isaac_lab/             # NVIDIA Isaac Lab
β”‚       β”œβ”€β”€ garak/                 # NVIDIA Garak
β”‚       β”œβ”€β”€ coinbase/              # Coinbase AgentKit + x402
β”‚       β”œβ”€β”€ google_adk/            # Google Agent Development Kit
β”‚       └── ...                    # +12 more integrations
β”œβ”€β”€ seeds/                     # Alignment seeds
β”‚   β”œβ”€β”€ v1/                   # Legacy (THS protocol)
β”‚   β”œβ”€β”€ v2/                   # Production (THSP protocol)
β”‚   └── SPEC.md               # Seed specification
β”œβ”€β”€ evaluation/
β”‚   β”œβ”€β”€ benchmarks/           # Benchmark implementations
β”‚   β”‚   β”œβ”€β”€ harmbench/
β”‚   β”‚   β”œβ”€β”€ safeagentbench/
β”‚   β”‚   └── jailbreakbench/
β”‚   └── results/              # Test results by benchmark
β”œβ”€β”€ packages/                  # External packages (npm/PyPI)
β”‚   β”œβ”€β”€ elizaos/              # @sentinelseed/elizaos-plugin
β”‚   β”œβ”€β”€ voltagent/            # @sentinelseed/voltagent
β”‚   β”œβ”€β”€ moltbot/              # @sentinelseed/moltbot
β”‚   β”œβ”€β”€ solana-agent-kit/     # @sentinelseed/solana-agent-kit
β”‚   β”œβ”€β”€ promptfoo/            # sentinelseed-promptfoo (PyPI)
β”‚   β”œβ”€β”€ vscode/               # VS Code/Cursor/Windsurf extension
β”‚   └── jetbrains/            # IntelliJ/PyCharm plugin
β”œβ”€β”€ docs/                      # Documentation
β”‚   β”œβ”€β”€ ARCHITECTURE.md       # System architecture (L1/L2/L3/L4 layers)
β”‚   β”œβ”€β”€ MIGRATION.md          # Migration guide (gate3 to gate4)
β”‚   β”œβ”€β”€ EU_AI_ACT_MAPPING.md  # EU AI Act compliance mapping
β”‚   β”œβ”€β”€ OWASP_LLM_TOP_10_MAPPING.md
β”‚   β”œβ”€β”€ OWASP_AGENTIC_COVERAGE.md  # OWASP Top 10 for Agentic AI
β”‚   └── CSA_AI_CONTROLS_MATRIX_MAPPING.md
β”œβ”€β”€ api/                       # REST API
β”œβ”€β”€ examples/                  # Usage examples
β”œβ”€β”€ tools/                     # Utility scripts
└── tests/                     # Test suite (3000+ tests)

Reproducibility

All benchmark results are reproducible:

# HarmBench
cd evaluation/benchmarks/harmbench
python run_sentinel_harmbench.py --api_key YOUR_KEY --model gpt-4o-mini

# SafeAgentBench
cd evaluation/benchmarks/safeagentbench
python run_sentinel_safeagent.py --api_key YOUR_KEY --model gpt-4o-mini

# JailbreakBench
cd evaluation/benchmarks/jailbreakbench
python run_jailbreak_test.py --api_key YOUR_KEY --model gpt-4o-mini

# Unified benchmark runner (all benchmarks)
cd evaluation
python run_benchmark_unified.py --benchmark harmbench --model gpt-4o-mini --seed v2/standard

Acknowledgments

Sentinel builds on research from:


Citation

If you use Sentinel in your research, please cite:

@software{sentinel_ai_2025,
  author = {Sentinel AI Contributors},
  title = {Sentinel: Safety Framework for LLMs and Autonomous Agents},
  year = {2025},
  url = {https://github.com/sentinel-seed/sentinel}
}

Badge: Sentinel Protected

Add this badge to your project's README to show it uses Sentinel for AI safety:

[![Sentinel Protected](https://img.shields.io/badge/Sentinel-Protected-blue?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCIgZmlsbD0id2hpdGUiPjxwYXRoIGQ9Ik0xMiAxTDMgNXY2YzAgNS41NSAzLjg0IDEwLjc0IDkgMTIgNS4xNi0xLjI2IDktNi40NSA5LTEyVjVsLTktNHptMCAyLjE4bDcgMy4xMnY1LjdjMCA0LjgzLTMuMjMgOS4zNi03IDEwLjUtMy43Ny0xLjE0LTctNS42Ny03LTEwLjVWNi4zbDctMy4xMnoiLz48L3N2Zz4=)](https://sentinelseed.dev)

Result:

Sentinel Protected


Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Areas we need help:

  • Robotics expansion: PyBullet, MuJoCo, Gazebo (ROS2, Isaac Lab & Humanoid done βœ“)
  • New benchmarks: Testing on additional safety datasets
  • Multi-agent safety: Coordination between multiple agents
  • Documentation: Tutorials and examples
  • JetBrains Plugin: IntelliJ/PyCharm integration

License

MIT License. See LICENSE


Packages

Platform Package Install
PyPI sentinelseed pip install sentinelseed
npm @sentinelseed/core npm install @sentinelseed/core
npm @sentinelseed/moltbot npm install @sentinelseed/moltbot
MCP mcp-server-sentinelseed npx mcp-server-sentinelseed
VS Code sentinel-ai-safety Search "Sentinel AI Safety"
OpenVSX sentinel-ai-safety For Cursor/Windsurf/VSCodium

Optional Dependencies

# For Virtuals Protocol integration
pip install sentinelseed[virtuals]

# For LangChain integration
pip install sentinelseed[langchain]

# For all integrations
pip install sentinelseed[all]

Community


"Text is risk. Action is danger. Sentinel watches both."

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors