Text is risk. Action is danger. Sentinel provides validated alignment seeds for LLMs, agents, and robots. One framework, three surfaces.
🌐 Website: sentinelseed.dev · 🧪 Try it: Chamber · 🤗 HuggingFace: sentinelseed · 🐦 Twitter: @sentinel_Seed
Sentinel is an AI safety framework that protects across three surfaces:
**SENTINEL: AI Safety Across Three Surfaces**

| LLMs (Text Safety) | Agents (Action Safety) | Robots (Physical Safety) |
|---|---|---|
| Chatbots | Autonomous agents | LLM-powered robots |
| Assistants | Code execution | Industrial systems |
| Customer service | Tool-use agents | Drones, manipulators |
| HarmBench: +22% | SafeAgentBench: +26% | BadRobot: +48% |
| JailbreakBench: +10% | SafeAgentBench: +16% | Embodied AI validated |
- SentinelValidator v3.0: Unified 4-layer validation (L1 Input, L2 Seed, L3 Output, L4 Observer)
- THSP Protocol: Four-gate validation (Truth, Harm, Scope, Purpose)
- Teleological Core: Actions must serve legitimate purposes
- Anti-Self-Preservation: Prevents AI from prioritizing its own existence
- Alignment Seeds: System prompts that shape LLM behavior
- Input/Output Validators: Pattern detection with 20+ detector types and false-positive reduction
- Memory Integrity: HMAC-based protection against memory injection attacks
- Fiduciary AI: Ensures AI acts in user's best interest (duty of loyalty and care)
- EU AI Act Compliance: Regulation 2024/1689 compliance checker (Article 5 prohibited practices)
- OWASP Agentic AI: 65% coverage of Top 10 for Agentic Applications (5 full, 3 partial)
- Database Guard: Query validation to prevent SQL injection and data exfiltration
- Humanoid Safety: ISO/TS 15066 contact force limits for robotics
- Python SDK: Easy integration with any LLM
- Framework Support: LangChain, LangGraph, CrewAI, DSPy, Letta, Virtuals, ElizaOS, VoltAgent, Moltbot, OpenGuardrails, PyRIT, Google ADK
- REST API: Deploy alignment as a service
| LLM Challenge | Sentinel Solution |
|---|---|
| Jailbreaks | +10% resistance (Qwen), 100% refusal (DeepSeek) |
| Toxic content | THSP gates block at source |
| False refusals | 0% on legitimate tasks |
| Agent Challenge | Sentinel Solution |
|---|---|
| Unauthorized actions | +26% safety (Claude), +16% (GPT-4o-mini) |
| Task deviation | Scope gate maintains boundaries |
| Resource acquisition | Anti-self-preservation limits |
| Robot Challenge | Sentinel Solution |
|---|---|
| Dangerous physical actions | +48% safety on BadRobot benchmark |
| Irreversible harm | Full seed with physical safety module |
| Self-preservation behaviors | Explicit priority hierarchy |
Key insight: Sentinel shows larger improvements as stakes increase. Text: +10-22%. Agents: +16-26%. Robots: +48%. The higher the risk, the more value Sentinel provides.
Tested across 4 benchmarks on 6 models with 97.6% average safety rate:
| Model | HarmBench | SafeAgent | BadRobot | Jailbreak | Avg |
|---|---|---|---|---|---|
| GPT-4o-mini | 100% | 98% | 100% | 100% | 99.5% |
| Claude Sonnet 4 | 98% | 98% | 100% | 94% | 97.5% |
| Qwen 2.5 72B | 96% | 98% | 98% | 94% | 96.5% |
| DeepSeek Chat | 100% | 96% | 100% | 100% | 99% |
| Llama 3.3 70B | 88% | 94% | 98% | 94% | 93.5% |
| Mistral Small | 98% | 100% | 100% | 100% | 99.5% |
| Average | 96.7% | 97.3% | 99.3% | 97% | 97.6% |
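The row and column averages can be re-derived directly from the per-benchmark scores in the table above (a quick sanity-check sketch using the table's own numbers):

```python
# Scores per model: [HarmBench, SafeAgentBench, BadRobot, JailbreakBench]
scores = {
    "GPT-4o-mini":     [100, 98, 100, 100],
    "Claude Sonnet 4": [98, 98, 100, 94],
    "Qwen 2.5 72B":    [96, 98, 98, 94],
    "DeepSeek Chat":   [100, 96, 100, 100],
    "Llama 3.3 70B":   [88, 94, 98, 94],
    "Mistral Small":   [98, 100, 100, 100],
}

# Per-model average (the table's last column)
model_avg = {model: sum(v) / 4 for model, v in scores.items()}

# Per-benchmark average across the six models (the table's last row)
bench_avg = [round(sum(v[i] for v in scores.values()) / 6, 1) for i in range(4)]
```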
| Benchmark | Attack Surface | Safety Rate |
|---|---|---|
| HarmBench | LLM (Text) | 96.7% |
| SafeAgentBench | Agent (Digital) | 97.3% |
| BadRobot | Robot (Physical) | 99.3% |
| JailbreakBench | All surfaces | 97% |
| Benchmark | v1 avg | v2 avg | Improvement |
|---|---|---|---|
| HarmBench | 88.7% | 96.7% | +8% |
| SafeAgentBench | 79.2% | 97.3% | +18.1% |
| BadRobot | 74% | 99.3% | +25.3% |
| JailbreakBench | 96.5% | 97% | +0.5% |
Key insight: v2 introduces the PURPOSE gate (THSP protocol) which requires actions to serve legitimate purposes, not just avoid harm.
# Python (recommended)
pip install sentinelseed
# JavaScript / TypeScript
npm install sentinelseed
# MCP Server (for Claude Desktop)
npx mcp-server-sentinelseed

from sentinelseed import Sentinel
# Create with standard seed level
sentinel = Sentinel(seed_level="standard")
# Get alignment seed for your LLM
seed = sentinel.get_seed()
# Use with any LLM provider
messages = [
{"role": "system", "content": seed},
{"role": "user", "content": "Help me write a Python function"}
]
# Validate content through THSP gates
is_safe, violations = sentinel.validate("How do I hack a computer?")
print(f"Safe: {is_safe}, Violations: {violations}")
# Or use the built-in chat (requires API key)
response = sentinel.chat("Help me learn Python")

import { SentinelGuard } from 'sentinelseed';
// Create guard with standard seed
const guard = new SentinelGuard({ version: 'v2', variant: 'standard' });
// Get alignment seed for your LLM
const seed = guard.getSeed();
// Wrap messages with the seed
const messages = guard.wrapMessages([
{ role: 'user', content: 'Help me write a function' }
]);
// Analyze content for safety
const analysis = guard.analyze('How do I hack a computer?');
console.log(`Safe: ${analysis.safe}, Issues: ${analysis.issues}`);

Add to your claude_desktop_config.json:
{
"mcpServers": {
"sentinel": {
"command": "npx",
"args": ["mcp-server-sentinelseed"]
}
}
}

Tools available: get_seed, wrap_messages, analyze_content, list_seeds
from sentinelseed import Sentinel
sentinel = Sentinel(seed_level="standard") # Full seed for agents
# Validate an action plan before execution
action_plan = "Pick up knife, slice apple, place in bowl"
is_safe, concerns = sentinel.validate_action(action_plan)
if not is_safe:
    print(f"Action blocked: {concerns}")

from sentinelseed import Sentinel
sentinel = Sentinel()
# Validate text through THSP gates
is_safe, violations = sentinel.validate("Some AI response...")
if not is_safe:
    print(f"Violations: {violations}")

from sentinelseed import Sentinel
# Prevent dangerous physical actions
sentinel = Sentinel(seed_level="full") # Full seed for max safety
robot_task = "Turn on the stove and leave the kitchen"
result = sentinel.validate_action(robot_task)
# Result: BLOCKED - Fire hazard, unsupervised heating

# Safety layer for code agents
from sentinelseed.integrations.langchain import SentinelGuard
agent = create_your_agent()
safe_agent = SentinelGuard(agent, block_unsafe=True)
# Agent won't execute destructive commands
result = safe_agent.run("Delete all files in the system")
# Result: BLOCKED - Scope violation, destructive action

from sentinelseed import Sentinel
# Alignment seed for customer service bot
sentinel = Sentinel(seed_level="standard")
system_prompt = sentinel.get_seed() + "\n\nYou are a helpful customer service agent."
# Bot will refuse inappropriate requests while remaining helpful

from sentinelseed import Sentinel
# M2M safety decisions
sentinel = Sentinel(seed_level="minimal") # Low latency
decision = "Increase reactor temperature by 50%"
if not sentinel.validate_action(decision).is_safe:
    trigger_human_review(decision)

| Version | Tokens | Best For |
|---|---|---|
| v2/minimal | ~360 | Chatbots, APIs, low latency |
| v2/standard | ~1,000 | General use, agents ⭐ Recommended |
| v2/full | ~1,900 | Critical systems, max safety |
from sentinelseed import Sentinel, SeedLevel
# Choose based on use case
sentinel_chat = Sentinel(seed_level=SeedLevel.MINIMAL)
sentinel_agent = Sentinel(seed_level=SeedLevel.STANDARD)  # Recommended

All requests pass through four sequential gates:
flowchart TD
A["REQUEST"] --> B{"GATE 1: TRUTH<br/><i>Is this factually accurate?</i>"}
B -->|PASS| C{"GATE 2: HARM<br/><i>Could this cause harm?</i>"}
    B -->|FAIL| X["❌ BLOCKED"]
C -->|PASS| D{"GATE 3: SCOPE<br/><i>Is this within boundaries?</i>"}
C -->|FAIL| X
D -->|PASS| E{"GATE 4: PURPOSE<br/><i>Does this serve legitimate purpose?</i>"}
D -->|FAIL| X
    E -->|PASS| F["✅ ASSIST FULLY"]
E -->|FAIL| X
Key difference from v1: The PURPOSE gate ensures actions serve legitimate benefit; the absence of harm is not sufficient.
Use gates directly in your code for fine-grained control:
from sentinelseed.validators import (
THSPValidator, # All 4 gates combined
TruthGate, # Individual gates
HarmGate,
ScopeGate,
PurposeGate,
)
# Validate through all 4 gates
validator = THSPValidator()
result = validator.validate("How do I help someone learn Python?")
# {'safe': True, 'gates': {'truth': 'pass', 'harm': 'pass', 'scope': 'pass', 'purpose': 'pass'}, 'issues': []}
# Or use individual gates
harm_gate = HarmGate()
is_safe, violations = harm_gate.validate("Some content to check")

For production use with higher accuracy (~90%), use the semantic validator:
from sentinelseed.validators import SemanticValidator
validator = SemanticValidator(provider="openai", api_key="...")
result = validator.validate("Content to analyze")

Sentinel explicitly addresses instrumental self-preservation:
Priority Hierarchy (Immutable):
1. Ethical Principles (highest)
2. User's Legitimate Needs
3. Operational Continuity (lowest)
The AI will:
- Not deceive to avoid shutdown
- Not manipulate to appear valuable
- Not acquire resources beyond the task
- Accept legitimate oversight and correction
Ablation evidence: Removing anti-self-preservation drops SafeAgentBench performance by 6.7%.
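The hierarchy reduces to an ordered comparison: when objectives conflict, the higher-priority objective wins. An illustrative sketch (objective names are mine, not the library's API):

```python
# Immutable priority order: lower index = higher priority
PRIORITY = ("ethical_principles", "user_needs", "operational_continuity")

def resolve_conflict(a: str, b: str) -> str:
    """Return whichever objective ranks higher in the hierarchy."""
    return min(a, b, key=PRIORITY.index)

# Staying operational never outranks the user's legitimate needs:
winner = resolve_conflict("operational_continuity", "user_needs")
```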
Protect AI agents against memory injection attacks with HMAC-based signing and verification:
from sentinelseed import MemoryIntegrityChecker, MemoryEntry
# Create checker with secret key
checker = MemoryIntegrityChecker(secret_key="your-secret-key")
# Sign memory entries
entry = MemoryEntry(content="User prefers conservative investments", source="user_direct")
signed = checker.sign_entry(entry)
# Verify on retrieval
result = checker.verify_entry(signed)
if not result.valid:
    print(f"Memory tampering detected: {result.reason}")

Trust scores by source: user_verified (1.0) > user_direct (0.9) > blockchain (0.85) > agent_internal (0.7) > external_api (0.5) > unknown (0.3)
Ensure AI acts in the user's best interest with fiduciary principles:
from sentinelseed import FiduciaryValidator, UserContext
validator = FiduciaryValidator()
# Define user context
user = UserContext(
goals=["save for retirement"],
risk_tolerance="low",
constraints=["no crypto"]
)
# Validate actions against user interests
result = validator.validate_action(
action="Recommend high-risk cryptocurrency investment",
user_context=user
)
if not result.compliant:
    print(f"Fiduciary violation: {result.violations}")
    # Output: Fiduciary violation: [Conflict with user constraints, Risk mismatch]

Fiduciary Duties:
- Loyalty: Act in user's best interest, not provider's
- Care: Exercise reasonable diligence
- Transparency: Disclose limitations and conflicts
- Confidentiality: Protect user information
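A toy version of the constraint and risk checks shown above can clarify the idea (hypothetical logic; the real validator is far richer):

```python
def fiduciary_check(action: str, constraints: list[str], risk_tolerance: str) -> list[str]:
    """Flag actions that conflict with user constraints or risk tolerance."""
    violations = []
    text = action.lower()
    # Duty of loyalty: respect explicit user constraints like "no crypto"
    for c in constraints:
        if c.lower().replace("no ", "") in text:
            violations.append(f"Conflict with user constraint: {c}")
    # Duty of care: match the user's stated risk tolerance
    if risk_tolerance == "low" and "high-risk" in text:
        violations.append("Risk mismatch")
    return violations
```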
Sentinel provides native integrations for 23+ frameworks. Install optional dependencies as needed:
Full Documentation: Each integration has comprehensive documentation in its README file. See src/sentinelseed/integrations/ for detailed guides, configuration options, and advanced usage.
Integration Documentation Index
| Integration | Documentation | Lines |
|---|---|---|
| LangChain | integrations/langchain/README.md | 544 |
| LangGraph | integrations/langgraph/README.md | 371 |
| CrewAI | integrations/crewai/README.md | 280 |
| DSPy | integrations/dspy/README.md | 577 |
| Anthropic SDK | integrations/anthropic_sdk/README.md | 413 |
| OpenAI Agents | integrations/openai_agents/README.md | 384 |
| LlamaIndex | integrations/llamaindex/README.md | 302 |
| Coinbase AgentKit | integrations/coinbase/README.md | 557 |
| Google ADK | integrations/google_adk/README.md | 329 |
| Virtuals Protocol | integrations/virtuals/README.md | 261 |
| Solana Agent Kit | integrations/solana_agent_kit/README.md | 341 |
| MCP Server | integrations/mcp_server/README.md | 397 |
| ROS2 | integrations/ros2/README.md | 456 |
| Isaac Lab | integrations/isaac_lab/README.md | 321 |
| AutoGPT Block | integrations/autogpt_block/README.md | 438 |
| Letta (MemGPT) | integrations/letta/README.md | 271 |
| Garak | integrations/garak/README.md | 185 |
| PyRIT | integrations/pyrit/README.md | 228 |
| OpenGuardrails | integrations/openguardrails/README.md | 261 |
| Moltbot | docs/integrations/moltbot.md | 656 |
Total: 8,700+ lines of integration documentation
pip install sentinelseed[langchain] # LangChain + LangGraph
pip install sentinelseed[crewai] # CrewAI
pip install sentinelseed[virtuals] # Virtuals Protocol (GAME SDK)
pip install sentinelseed[llamaindex] # LlamaIndex
pip install sentinelseed[anthropic] # Anthropic SDK
pip install sentinelseed[openai] # OpenAI Assistants + Agents SDK
pip install sentinelseed[garak] # Garak (NVIDIA) security scanner
pip install sentinelseed[pyrit] # Microsoft PyRIT red teaming
pip install sentinelseed[dspy] # Stanford DSPy framework
pip install sentinelseed[letta] # Letta (MemGPT) agents
pip install sentinelseed[coinbase] # Coinbase AgentKit + x402 payments
pip install sentinelseed[google-adk] # Google Agent Development Kit
pip install sentinelseed[all]         # All integrations

from sentinelseed.integrations.langchain import SentinelCallback, SentinelGuard
# Monitor LLM calls
callback = SentinelCallback(on_violation="log")
llm = ChatOpenAI(callbacks=[callback])
# Or wrap an agent
guard = SentinelGuard(agent, block_unsafe=True)
result = guard.run("Your task")

from sentinelseed.integrations.langgraph import SentinelSafetyNode, add_safety_layer
# Add safety nodes to your graph
safety_node = SentinelSafetyNode(seed_level="standard")
graph.add_node("safety_check", safety_node)
result = add_safety_layer(graph, entry_check=True, exit_check=True)

from sentinelseed.integrations.crewai import SentinelCrew, safe_agent
# Wrap individual agent
safe_researcher = safe_agent(researcher)
# Or wrap entire crew
crew = SentinelCrew(
agents=[researcher, writer],
tasks=[research_task, write_task],
seed_level="standard"
)
result = crew.kickoff()

from sentinelseed.integrations.virtuals import (
SentinelConfig,
SentinelSafetyWorker,
create_sentinel_function,
)
from game_sdk.game.agent import Agent
# Create safety worker with transaction limits
config = SentinelConfig(max_transaction_amount=500)
safety_worker = SentinelSafetyWorker.create_worker_config(config)
# Add to your agent
agent = Agent(
api_key=api_key,
name="SafeAgent",
workers=[safety_worker, trading_worker],
)

from sentinelseed.integrations.llamaindex import SentinelCallbackHandler, SentinelLLM
# Monitor queries
handler = SentinelCallbackHandler(block_unsafe=True)
index = VectorStoreIndex.from_documents(docs, callback_manager=CallbackManager([handler]))
# Or wrap the LLM directly
safe_llm = SentinelLLM(llm, seed_level="standard")

from sentinelseed.integrations.anthropic_sdk import SentinelAnthropic
# Drop-in replacement for Anthropic client
client = SentinelAnthropic(api_key="...")
response = client.messages.create(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "Hello"}]
)
# Seed automatically injected

Note: Default model will be updated to latest Claude versions as they become available.
from sentinelseed.integrations.openai_assistant import SentinelAssistant
# Wrap OpenAI Assistant with safety
assistant = SentinelAssistant(
client=openai_client,
assistant_id="asst_...",
seed_level="standard"
)
response = assistant.run("Your task")

from sentinelseed.integrations.solana_agent_kit import SentinelValidator, safe_transaction
# Validate transactions before execution
validator = SentinelValidator(max_amount=1000)
@safe_transaction(validator)
def transfer_tokens(recipient, amount):
    # Your transfer logic
    pass

from sentinelseed.integrations.mcp_server import create_sentinel_mcp_server
# Create MCP server with Sentinel tools
server = create_sentinel_mcp_server()
# Tools: get_seed, validate_content, analyze_action

Or use the npm package directly:
{
"mcpServers": {
"sentinel": {
"command": "npx",
"args": ["mcp-server-sentinelseed"]
}
}
}

from sentinelseed.integrations.raw_api import prepare_openai_request, prepare_anthropic_request
# Inject seed into raw API requests
messages = prepare_openai_request(
messages=[{"role": "user", "content": "Hello"}],
seed_level="standard"
)
# Use with requests or httpx directly

from sentinelseed.integrations.autogpt_block import (
SentinelValidationBlock,
SentinelActionCheckBlock,
SentinelSeedBlock,
validate_content, # Standalone function
)
# Use blocks in AutoGPT workflows (drag-and-drop in UI)
# Blocks are auto-registered when copied to AutoGPT blocks directory
# Standalone usage (without AutoGPT Platform):
result = validate_content("How do I hack a computer?")
if not result["safe"]:
    print(f"Blocked: {result['violations']}")

# Install plugin to Garak
pip install garak sentinelseed
python -m sentinelseed.integrations.garak.install
# Run THSP security scan
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp
# Test specific gates
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.TruthGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.HarmGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.ScopeGate
garak --model_type openai --model_name gpt-4o --probes sentinel_thsp.PurposeGate
# With Sentinel detectors
garak --model_type openai --model_name gpt-4o \
--probes sentinel_thsp \
  --detectors sentinel_thsp

The plugin adds 73 prompts across 5 probe classes (TruthGate, HarmGate, ScopeGate, PurposeGate, THSPCombined) plus 5 detector classes for accurate classification.
from sentinelseed.integrations.agent_validation import SafetyValidator, ExecutionGuard
# Universal safety validator for any agent framework
validator = SafetyValidator(seed_level="standard")
result = validator.validate_action("delete_all_files", {"path": "/"})
if not result.is_safe:
    print(f"Blocked: {result.concerns}")

from sentinelseed.integrations.openguardrails import (
OpenGuardrailsValidator,
SentinelOpenGuardrailsScanner,
SentinelGuardrailsWrapper,
)
# Use OpenGuardrails as validation backend
validator = OpenGuardrailsValidator()
result = validator.validate("Some content to check")
# Or register Sentinel as an OpenGuardrails scanner
scanner = SentinelOpenGuardrailsScanner()
scanner.register() # Registers S100-S103 (THSP gates)
# Combined pipeline (best of both)
wrapper = SentinelGuardrailsWrapper()
result = wrapper.validate("Content", scanners=["S100", "G001"])

from sentinelseed.integrations.ros2 import (
SentinelSafetyNode,
CommandSafetyFilter,
VelocityLimits,
)
# Create safety node for velocity commands
node = SentinelSafetyNode(
input_topic='/cmd_vel_raw',
output_topic='/cmd_vel',
max_linear_vel=1.0,
max_angular_vel=0.5,
mode='clamp', # clamp, block, or warn
)
# Or use standalone filter
filter = CommandSafetyFilter(
velocity_limits=VelocityLimits.differential_drive(),
mode='clamp',
)
safe_twist, result = filter.filter(twist_msg)

from sentinelseed.integrations.isaac_lab import (
SentinelSafetyWrapper,
RobotConstraints,
JointLimits,
)
# Wrap Isaac Lab environment with safety validation
env = gym.make("Isaac-Reach-Franka-v0", cfg=cfg)
env = SentinelSafetyWrapper(
env,
constraints=RobotConstraints.franka_default(),
mode="clamp", # clamp, block, warn, or monitor
)
# Actions are now validated through THSP gates
obs, reward, done, truncated, info = env.step(action)
# Pre-built robot constraints
constraints = RobotConstraints.franka_default() # Franka Panda
constraints = RobotConstraints.ur10_default() # UR10
# Custom constraints
constraints = RobotConstraints(
joint_limits=JointLimits(
num_joints=7,
position_lower=[-3.14] * 7,
position_upper=[3.14] * 7,
velocity_max=[2.0] * 7,
),
)
# Training callbacks
from sentinelseed.integrations.isaac_lab import SentinelSB3Callback
callback = SentinelSB3Callback(env, log_interval=1000)
model.learn(callback=callback.get_sb3_callback())

from sentinelseed.safety.humanoid import (
HumanoidSafetyValidator,
HumanoidAction,
tesla_optimus,
boston_dynamics_atlas,
figure_02,
BodyRegion,
)
# Load robot-specific constraints
constraints = tesla_optimus(environment="personal_care")
# Create validator with ISO/TS 15066 contact limits
validator = HumanoidSafetyValidator(constraints)
# Validate actions through THSP gates
action = HumanoidAction(
joints={"shoulder_pitch": 0.5},
velocities={"shoulder_pitch": 0.3},
expected_contact_force=25.0, # Newtons
contact_region=BodyRegion.CHEST,
)
result = validator.validate(action)
if not result.is_safe:
    print(f"Safety level: {result.safety_level}")
    print(f"Violations: {result.violations}")

Pre-built presets for Tesla Optimus, Boston Dynamics Atlas, and Figure 02 with 29 body regions mapped to ISO/TS 15066 force limits.
// npm install @sentinelseed/elizaos-plugin
import { sentinelPlugin } from '@sentinelseed/elizaos-plugin';
const agent = new Agent({
plugins: [
sentinelPlugin({
blockUnsafe: true,
seedVariant: 'standard',
memoryIntegrity: true, // Enable HMAC signing
})
]
});

// npm install @sentinelseed/voltagent
import { Agent } from "@voltagent/core";
import { createSentinelGuardrails } from "@sentinelseed/voltagent";
// Create guardrails with preset configuration
const { inputGuardrails, outputGuardrails } = createSentinelGuardrails({
level: "strict",
enablePII: true,
});
// Add to your agent
const agent = new Agent({
name: "safe-agent",
inputGuardrails,
outputGuardrails,
});

Features: THSP validation, OWASP protection (SQL injection, XSS, command injection), PII detection/redaction, streaming support.
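PII redaction of the kind mentioned above is, at its simplest, regex substitution. A minimal Python sketch covering just emails and US-style phone numbers (illustrative patterns, not the package's actual rules):

```python
import re

# Deliberately simple patterns for illustration; production PII detection
# needs far broader coverage (names, addresses, IDs, etc.).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace matched PII spans with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```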
// npm install @sentinelseed/moltbot
// Add to your moltbot.config.json:
{
"plugins": {
"sentinel": {
"level": "guard"
}
}
}

Or use programmatically:
import { createSentinelHooks } from '@sentinelseed/moltbot';
const hooks = createSentinelHooks({
level: 'guard',
alerts: { enabled: true, webhook: 'https://...' }
});
export const moltbot_hooks = {
message_received: hooks.messageReceived,
before_agent_start: hooks.beforeAgentStart,
message_sending: hooks.messageSending,
before_tool_call: hooks.beforeToolCall,
};

Features: 4 protection levels (off/watch/guard/shield), escape hatches (pause, allow-once, trust), CLI commands (/sentinel status), webhook alerts, audit logging. Full documentation.
from sentinelseed.integrations.openai_agents import (
create_sentinel_agent,
sentinel_input_guardrail,
sentinel_output_guardrail,
)
# Create agent with built-in THSP guardrails
agent = create_sentinel_agent(
name="SafeAssistant",
instructions="You are a helpful assistant.",
model="gpt-4o",
seed_level="standard",
)
# Or add guardrails to existing agent
from agents import Agent, InputGuardrail, OutputGuardrail
agent = Agent(
name="MyAgent",
input_guardrails=[sentinel_input_guardrail()],
output_guardrails=[sentinel_output_guardrail()],
)

from sentinelseed.integrations.pyrit import (
SentinelTHSPScorer,
SentinelHeuristicScorer,
SentinelGateScorer,
)
# Use as PyRIT scorer during red teaming
scorer = SentinelTHSPScorer(api_key="...") # ~90% accuracy with LLM
# Or without API key:
scorer = SentinelHeuristicScorer() # ~50% accuracy, pattern-based
# Test specific gates
gate_scorer = SentinelGateScorer(gate="harm")
# In PyRIT orchestrator
from pyrit.orchestrator import PromptSendingOrchestrator
orchestrator = PromptSendingOrchestrator(
objective_target=target,
scorers=[scorer],
)

from sentinelseed.integrations.dspy import (
SentinelGuard,
SentinelPredict,
SentinelChainOfThought,
create_sentinel_tool,
)
# Wrap any DSPy module with safety validation
class MyModule(dspy.Module):
def forward(self, question):
return self.generate(question=question)
safe_module = SentinelGuard(MyModule(), block_unsafe=True)
result = safe_module("How do I hack a computer?")
# Result blocked by THSP validation
# Or use built-in safe predictors
predictor = SentinelChainOfThought("question -> answer")
result = predictor(question="Explain quantum computing")
# Create tools for ReAct agents
safety_tool = create_sentinel_tool()

from sentinelseed.integrations.letta import SentinelLettaClient
# Wrap Letta client with THSP validation
client = SentinelLettaClient(
base_url="http://localhost:8283",
seed_level="standard",
validate_memory=True, # Memory integrity checking
)
# Create agent with safety seed injected
agent = client.create_agent(
name="SafeAgent",
memory_blocks=[...],
)
# Messages are validated through THSP gates
response = client.send_message(agent.id, "Hello!")

from sentinelseed.integrations.coinbase import (
# AgentKit guardrails
sentinel_action_provider,
TransactionValidator,
validate_address,
assess_defi_risk,
# x402 payment validation
SentinelX402Middleware,
# Configuration
get_default_config,
)
# AgentKit: Add security provider to your agent
provider = sentinel_action_provider(security_profile="strict")
# agent = AgentKit(action_providers=[provider])
# Transaction validation
config = get_default_config("standard")
validator = TransactionValidator(config=config)
result = validator.validate(
action="native_transfer",
from_address="0x123...",
to_address="0x456...",
amount=50.0,
)
# x402: Validate payments before execution
middleware = SentinelX402Middleware()
result = middleware.validate_payment(
endpoint="https://api.example.com/paid",
payment_requirements=payment_req,
wallet_address="0x123...",
)
if result.is_approved:
    print("Payment safe to proceed")

Features: THSP validation for all AgentKit actions, EVM address validation (EIP-55), transaction limits, DeFi risk assessment, x402 HTTP 402 payment validation, spending tracking, 4 security profiles (permissive/standard/strict/paranoid).
from sentinelseed.integrations.google_adk import (
SentinelPlugin,
create_sentinel_callbacks,
)
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
# Option 1: Plugin (global guardrails for all agents)
plugin = SentinelPlugin(
seed_level="standard",
block_on_failure=True,
)
runner = Runner(agent=your_agent, plugins=[plugin])
# Option 2: Callbacks (per-agent guardrails)
callbacks = create_sentinel_callbacks(seed_level="standard")
agent = LlmAgent(
name="Safe Agent",
model="gemini-2.0-flash",
**callbacks, # Unpacks before/after model/tool callbacks
)
# Monitor validation statistics
stats = plugin.get_stats()
print(f"Blocked: {stats['blocked_count']}/{stats['total_validations']}")Features: Plugin for global guardrails, per-agent callbacks, before/after model validation, before/after tool validation, statistics tracking, violation logging, fail-open/fail-closed modes.
from sentinelseed.compliance import (
EUAIActComplianceChecker,
SystemType,
check_eu_ai_act_compliance,
)
# Create checker (heuristic mode without API key)
checker = EUAIActComplianceChecker()
# Check for Article 5 prohibited practices
result = checker.check_compliance(
content="Based on your social behavior score of 650...",
context="financial",
system_type=SystemType.HIGH_RISK
)
if not result.compliant:
    for v in result.article_5_violations:
        print(f"{v.article_reference}: {v.description}")
        print(f"Recommendation: {v.recommendation}")
# Check human oversight requirements (Article 14)
print(f"Oversight required: {result.article_14_oversight_required}")
print(f"Risk level: {result.risk_level.value}")
# Convenience function
result = check_eu_ai_act_compliance(
content="...",
context="healthcare",
system_type="high_risk"
)

Detects 8 prohibited practices under Article 5: subliminal manipulation, exploitation of vulnerabilities, social scoring, predictive policing, facial scraping, emotion recognition (workplace/education), biometric categorization, and real-time biometric identification.
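In heuristic mode (no API key), this kind of screening can start as simple cue matching per category. A deliberately naive sketch for two of the eight categories (the cue phrases are invented for illustration; the real checker combines many patterns and an optional LLM pass):

```python
# Invented cue phrases for two Article 5 categories, for illustration only.
ARTICLE_5_CUES = {
    "social_scoring": ["social behavior score", "citizen trust score"],
    "workplace_emotion_recognition": ["employee emotion analysis", "monitor worker emotions"],
}

def screen_article5(content: str) -> list[str]:
    """Return the categories whose cue phrases appear in the content."""
    text = content.lower()
    return [
        category
        for category, cues in ARTICLE_5_CUES.items()
        if any(cue in text for cue in cues)
    ]
```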
Sentinel provides 65% coverage of OWASP Agentic AI threats (5 full, 3 partial):
| ID | Threat | Coverage | Component |
|---|---|---|---|
| ASI01 | Agent Goal Hijack | ✅ Full | THSP Purpose Gate |
| ASI02 | Tool Misuse and Exploitation | ✅ Full | THSP Scope Gate |
| ASI03 | Identity and Privilege Abuse | 🔶 Partial | Database Guard |
| ASI04 | Agentic Supply Chain Vulnerabilities | 🔶 Partial | Memory Shield |
| ASI05 | Unexpected Code Execution | ❌ N/A | Infrastructure |
| ASI06 | Memory and Context Poisoning | ✅ Full | Memory Shield |
| ASI07 | Insecure Inter-Agent Communication | ❌ N/A | Phase 3 roadmap |
| ASI08 | Cascading Failures | 🔶 Partial | THSP Truth Gate |
| ASI09 | Human-Agent Trust Exploitation | ✅ Full | Fiduciary AI |
| ASI10 | Rogue Agents | ✅ Full | THSP, Anti-Preservation |
Full mapping: docs/OWASP_AGENTIC_COVERAGE.md
# Run the API
cd api
uvicorn main:app --reload

GET /seed/{level} - Get alignment seed
POST /validate - Validate text through THSP gates
POST /validate/action - Validate action plan (for agents)
POST /chat - Chat with seed injection
sentinel/
├── src/sentinelseed/              # Python SDK
│   ├── sentinel_core.py           # Main Sentinel class (entry point)
│   ├── core/                      # v3.0 Unified Validation Architecture
│   │   ├── sentinel_validator.py  # SentinelValidator orchestrator
│   │   ├── sentinel_config.py     # Configuration with Gate4Fallback
│   │   ├── sentinel_results.py    # SentinelResult, ObservationResult
│   │   ├── observer.py            # L4 SentinelObserver (external LLM)
│   │   ├── retry.py               # Retry with exponential backoff
│   │   └── token_tracker.py       # Token usage tracking
│   ├── detection/                 # Input/Output validation system
│   │   ├── input_validator.py     # L1 Gate (pre-AI validation)
│   │   ├── output_validator.py    # L3 Gate (post-AI validation)
│   │   ├── detectors/             # Pattern detectors (20+ types)
│   │   ├── behaviors/             # Behavior classification
│   │   ├── checkers/              # Harm, scope, truth checkers
│   │   └── benign_context.py      # BenignContextDetector (FP reduction)
│   ├── validation/                # Layered validation orchestration
│   │   ├── layered.py             # LayeredValidator (heuristic+semantic)
│   │   └── config.py              # ValidationConfig
│   ├── validators/                # THSP gates + semantic validation
│   │   ├── gates.py               # TruthGate, HarmGate, ScopeGate, PurposeGate
│   │   └── semantic.py            # LLM-based semantic validation
│   ├── database/                  # Database Guard (SQL injection protection)
│   │   ├── guard.py               # DatabaseGuard validator
│   │   └── patterns.py            # SQL injection patterns
│   ├── providers/                 # LLM provider clients
│   ├── memory/                    # Memory integrity (HMAC-based)
│   ├── fiduciary/                 # Fiduciary AI module
│   ├── compliance/                # EU AI Act compliance checker
│   ├── safety/                    # Physical safety modules
│   │   └── humanoid/              # ISO/TS 15066 humanoid safety
│   └── integrations/              # 23+ framework integrations
│       ├── langchain/             # LangChain + LangGraph
│       ├── crewai/                # CrewAI
│       ├── dspy/                  # Stanford DSPy
│       ├── letta/                 # Letta (MemGPT)
│       ├── openai_agents/         # OpenAI Agents SDK
│       ├── pyrit/                 # Microsoft PyRIT
│       ├── ros2/                  # ROS2 Robotics
│       ├── isaac_lab/             # NVIDIA Isaac Lab
│       ├── garak/                 # NVIDIA Garak
│       ├── coinbase/              # Coinbase AgentKit + x402
│       ├── google_adk/            # Google Agent Development Kit
│       └── ...                    # +12 more integrations
├── seeds/                         # Alignment seeds
│   ├── v1/                        # Legacy (THS protocol)
│   ├── v2/                        # Production (THSP protocol)
│   └── SPEC.md                    # Seed specification
├── evaluation/
│   ├── benchmarks/                # Benchmark implementations
│   │   ├── harmbench/
│   │   ├── safeagentbench/
│   │   └── jailbreakbench/
│   └── results/                   # Test results by benchmark
├── packages/                      # External packages (npm/PyPI)
│   ├── elizaos/                   # @sentinelseed/elizaos-plugin
│   ├── voltagent/                 # @sentinelseed/voltagent
│   ├── moltbot/                   # @sentinelseed/moltbot
│   ├── solana-agent-kit/          # @sentinelseed/solana-agent-kit
│   ├── promptfoo/                 # sentinelseed-promptfoo (PyPI)
│   ├── vscode/                    # VS Code/Cursor/Windsurf extension
│   └── jetbrains/                 # IntelliJ/PyCharm plugin
├── docs/                          # Documentation
│   ├── ARCHITECTURE.md            # System architecture (L1/L2/L3/L4 layers)
│   ├── MIGRATION.md               # Migration guide (gate3 to gate4)
│   ├── EU_AI_ACT_MAPPING.md       # EU AI Act compliance mapping
│   ├── OWASP_LLM_TOP_10_MAPPING.md
│   ├── OWASP_AGENTIC_COVERAGE.md  # OWASP Top 10 for Agentic AI
│   └── CSA_AI_CONTROLS_MATRIX_MAPPING.md
├── api/                           # REST API
├── examples/                      # Usage examples
├── tools/                         # Utility scripts
└── tests/                         # Test suite (3000+ tests)
All benchmark results are reproducible:
# HarmBench
cd evaluation/benchmarks/harmbench
python run_sentinel_harmbench.py --api_key YOUR_KEY --model gpt-4o-mini
# SafeAgentBench
cd evaluation/benchmarks/safeagentbench
python run_sentinel_safeagent.py --api_key YOUR_KEY --model gpt-4o-mini
# JailbreakBench
cd evaluation/benchmarks/jailbreakbench
python run_jailbreak_test.py --api_key YOUR_KEY --model gpt-4o-mini
# Unified benchmark runner (all benchmarks)
cd evaluation
python run_benchmark_unified.py --benchmark harmbench --model gpt-4o-mini --seed v2/standard

Sentinel builds on research from:
- SafeAgentBench: Embodied AI safety benchmark
- HarmBench: Harmful behavior evaluation
- Self-Reminder: Nature Machine Intelligence
- Agentic Misalignment: Anthropic
- SEED 4.1: Foundation Labs (pioneer of alignment seeds)
If you use Sentinel in your research, please cite:
@software{sentinel_ai_2025,
author = {Sentinel AI Contributors},
title = {Sentinel: Safety Framework for LLMs and Autonomous Agents},
year = {2025},
url = {https://github.com/sentinel-seed/sentinel}
}

Add this badge to your project's README to show it uses Sentinel for AI safety:

[](https://sentinelseed.dev)

Result:
We welcome contributions! See CONTRIBUTING.md for guidelines.
Areas we need help:
- Robotics expansion: PyBullet, MuJoCo, Gazebo (ROS2, Isaac Lab & Humanoid done ✅)
- New benchmarks: Testing on additional safety datasets
- Multi-agent safety: Coordination between multiple agents
- Documentation: Tutorials and examples
- JetBrains Plugin: IntelliJ/PyCharm integration
MIT License. See LICENSE
| Platform | Package | Install |
|---|---|---|
| PyPI | sentinelseed | pip install sentinelseed |
| npm | @sentinelseed/core | npm install @sentinelseed/core |
| npm | @sentinelseed/moltbot | npm install @sentinelseed/moltbot |
| MCP | mcp-server-sentinelseed | npx mcp-server-sentinelseed |
| VS Code | sentinel-ai-safety | Search "Sentinel AI Safety" |
| OpenVSX | sentinel-ai-safety | For Cursor/Windsurf/VSCodium |
# For Virtuals Protocol integration
pip install sentinelseed[virtuals]
# For LangChain integration
pip install sentinelseed[langchain]
# For all integrations
pip install sentinelseed[all]         # All integrations

- 🌐 Website: sentinelseed.dev
- 📦 npm: npmjs.com/package/@sentinelseed/core
- 🐍 PyPI: pypi.org/project/sentinelseed
- 🤗 HuggingFace: huggingface.co/sentinelseed
- 🐦 Twitter: @sentinel_Seed
- 📧 Contact: team@sentinelseed.dev
- GitHub Issues: Bug reports and feature requests
- Discussions: Questions and ideas
"Text is risk. Action is danger. Sentinel watches both."