Skip to content

Conversation

@Pouyanpi
Copy link
Collaborator

@Pouyanpi Pouyanpi commented Jan 9, 2026

Description

Implements lazy initialization of embedding indexes (user_message_index, bot_message_index, flows_index, instruction_flows_index) so FastEmbed is only loaded when semantic search is actually needed.

Previously, embedding indexes were eagerly initialized at LLMRails construction time, causing FastEmbed models to be downloaded even for simple configurations that only use input/output rails or passthrough mode.

Behavior by Configuration

Configuration Before After
Input rails only (e.g., self check input) Loads FastEmbed at init No FastEmbed loaded
Output rails only (e.g., self check output) Loads FastEmbed at init No FastEmbed loaded
Input + Output rails Loads FastEmbed at init No FastEmbed loaded
Passthrough mode Loads FastEmbed at init No FastEmbed loaded
Dialog rails with user messages Loads FastEmbed at init Loads FastEmbed on first generate()
RAG with knowledge base Loads FastEmbed at init Unchanged (RAG has separate init)

Implementation

  • Removed eager init() call from LLMGenerationActions.__init__()
  • Added _ensure_*_index() helper methods for lazy initialization
  • Actions call ensure methods before using indexes
  • Same pattern applied to V2.x in LLMGenerationActionsV2dotx

Test Plan

  • Added comprehensive tests for V1.0 (tests/test_actions_llm_embedding_lazy_init.py)
  • Added comprehensive tests for V2.x (tests/v2_x/test_llm_embedding_lazy_init.py)
  • Tests verify indexes are None at initialization
  • Tests verify FastEmbed cache stays empty for simple configs
  • Tests verify FastEmbed IS downloaded when dialog rails are used

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Summary

This PR implements lazy initialization of embedding indexes in the LLM Guardrails framework to improve startup performance. Previously, embedding indexes for user/bot messages and flows were eagerly initialized at LLMGenerationActions construction time, causing FastEmbed models to be downloaded even for simple configurations (input/output rails, passthrough mode) that never use semantic search.

Key Changes

nemoguardrails/actions/llm/generation.py:

  • Removed threading.Thread wrapper and synchronous call to self.init() in __init__()
  • Moved _process_flows() call from async init() to synchronous __init__()
  • Added three helper methods: _ensure_user_message_index(), _ensure_bot_message_index(), _ensure_flows_index() for lazy initialization
  • Updated all actions that use indexes to call the appropriate _ensure_* method before accessing the index
  • Indexes are now initialized only when first needed during action execution

nemoguardrails/actions/v2_x/generation.py:

  • Added V2.x-specific implementations of _ensure_flows_index() and _ensure_instruction_flows_index()
  • Added lazy initialization calls to several actions: _collect_user_intent_and_examples(), generate_flow_from_instructions(), generate_flow_from_name(), generate_flow_continuation(), generate_value()

Tests: Comprehensive test coverage added for both V1.0 and V2.x versions, verifying:

  • Indexes are None at initialization time
  • FastEmbed cache is empty for simple configs (input/output rails, passthrough)
  • FastEmbed IS loaded on first generate() call when dialog rails are used

Behavior Changes

The PR successfully achieves its stated goal: FastEmbed is now only loaded when semantic search is actually needed, avoiding unnecessary downloads for simple configurations.

Issues Found

Critical Race Condition Issue in V2.x (Confidence impact: -2 points):
The V2.x implementation has concurrent initialization problems:

  • Both _ensure_flows_index() and _ensure_instruction_flows_index() can trigger concurrent calls to _init_flows_index() if called from multiple async tasks
  • Since _init_flows_index() performs expensive operations (embedding builds), concurrent execution could waste resources and cause unpredictable behavior
  • The ensure methods don't use locks or initialization guards to prevent redundant work

Base Class Inheritance Concern (Confidence impact: -1 point):

  • Similar concurrent access issue exists in base class _ensure_flows_index() method
  • Not immediately problematic in current single-threaded execution model but creates technical debt

Confidence Score: 2/5

  • Critical race condition vulnerability in V2.x implementation; concurrent initialization of expensive index building operations not prevented, could cause incorrect state or resource leaks
  • The PR successfully implements lazy loading as intended and includes comprehensive tests. However, the V2.x implementation introduces a critical concurrency issue where both _ensure_flows_index() and _ensure_instruction_flows_index() can trigger concurrent calls to _init_flows_index() without any synchronization. Since _init_flows_index() performs expensive async operations (embedding builds), this could cause: (1) duplicate initialization work, (2) resource exhaustion, (3) race conditions if state is checked between await points. The tests don't catch this because they're single-threaded. Base class has similar issue. This needs to be fixed before merge.
  • nemoguardrails/actions/v2_x/generation.py - requires synchronization mechanisms to prevent concurrent index initialization; nemoguardrails/actions/llm/generation.py - base class also needs concurrent access protection

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/actions/llm/generation.py 3/5 Lazy initialization of embedding indexes removed eager loading at construction time. Mostly correct but has a subtle issue: the async init() method was removed from constructor, but this method was previously being called in a thread to initialize indexes synchronously during construction. The new approach defers initialization but doesn't handle all edge cases properly.
nemoguardrails/actions/v2_x/generation.py 2/5 V2.x override of lazy initialization has a critical issue: the _ensure_instruction_flows_index() method calls _init_flows_index() which initializes BOTH flows_index and instruction_flows_index, but the ensure method only checks if instruction_flows_index is None. This could cause race conditions or incorrect initialization states in concurrent scenarios.
tests/test_actions_llm_embedding_lazy_init.py 4/5 Test coverage for V1.0 is comprehensive and well-structured. Tests verify that indexes are not initialized at construction time and that FastEmbed is not downloaded for simple configurations. Tests correctly verify that FastEmbed IS downloaded when dialog rails are used.
tests/v2_x/test_llm_embedding_lazy_init.py 4/5 Test coverage for V2.x appears sound, testing initialization states and FastEmbed cache behavior. However, tests may not catch the race condition issue in the ensure_instruction_flows_index implementation due to single-threaded nature of tests.

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant LLMGenerationActions
    participant IndexInit
    participant EmbeddingProvider

    User->>LLMRails: __init__(config)
    LLMRails->>LLMGenerationActions: __init__(config)
    Note over LLMGenerationActions: Before: async init() called in thread<br/>After: No init() call - indexes are None

    User->>LLMRails: generate(messages)
    LLMRails->>LLMGenerationActions: generate_user_intent(events)
    
    alt Indexes needed
        LLMGenerationActions->>LLMGenerationActions: _ensure_user_message_index()
        alt Index not initialized
            LLMGenerationActions->>IndexInit: _init_user_message_index()
            IndexInit->>EmbeddingProvider: get_embedding_search_provider_instance()
            EmbeddingProvider-->>IndexInit: EmbeddingsIndex instance
            IndexInit->>EmbeddingProvider: add_items(items)
            IndexInit->>EmbeddingProvider: build()
            Note over EmbeddingProvider: FastEmbed loaded here
            IndexInit-->>LLMGenerationActions: index initialized
        end
        LLMGenerationActions->>LLMGenerationActions: Use user_message_index
    else Indexes not needed
        Note over LLMGenerationActions: For simple input/output rails:<br/>Indexes remain None,<br/>FastEmbed not loaded
    end

Loading

@Pouyanpi Pouyanpi marked this pull request as draft January 9, 2026 14:39
@Pouyanpi Pouyanpi marked this pull request as ready for review January 9, 2026 15:30
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1572

Uses asyncio.Lock with double-checked locking pattern to prevent
race conditions when multiple async tasks call _ensure_*_index()
methods concurrently.

Also fixes tests to use TestChat (with FakeLLM) instead of creating
LLMRails directly, and adds skip conditions for fastembed tests.
@Pouyanpi Pouyanpi force-pushed the perf/lazy-init-indices branch 2 times, most recently from 3e64365 to aa30386 Compare January 9, 2026 15:59
@Pouyanpi Pouyanpi force-pushed the perf/lazy-init-indices branch from aa30386 to 46e4df9 Compare January 9, 2026 16:34
@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 82.22222% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
nemoguardrails/actions/v2_x/generation.py 61.11% 7 Missing ⚠️
nemoguardrails/actions/llm/generation.py 96.29% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@Pouyanpi
Copy link
Collaborator Author

Pouyanpi commented Jan 9, 2026

@greptileai re-review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR successfully implements lazy initialization of embedding indexes to improve startup performance and reduce unnecessary FastEmbed downloads.

Key Changes

V1.0 (nemoguardrails/actions/llm/generation.py)

  • Removed eager init() call from __init__ that was running in a separate thread
  • Added _init_lock (asyncio.Lock) for thread-safe lazy initialization
  • Implemented three _ensure_*_index() methods using double-checked locking pattern
  • Updated all action methods (generate_user_intent, generate_next_step, generate_bot_message, generate_value, generate_intent_steps_message) to call ensure methods before index usage
  • Removed unnecessary imports: threading, check_sync_call_from_async_loop, get_or_create_event_loop
  • Added early return condition in _init_bot_message_index() when no user_messages exist

V2.x (nemoguardrails/actions/v2_x/generation.py)

  • Implemented _ensure_flows_index() and _ensure_instruction_flows_index() methods
  • Uses hasattr checks for instruction_flows_index since it's created dynamically
  • Updated all methods that use flow indexes to call ensure methods first

Behavior Changes

  • Input/output rails only: FastEmbed no longer loaded at initialization
  • Passthrough mode: No embedding index initialization
  • Dialog rails with user messages: FastEmbed loaded on first generate() call
  • All existing functionality preserved, just deferred until actually needed

Test Coverage

  • Comprehensive tests verify indexes are None after initialization
  • Tests verify FastEmbed cache stays empty for simple configurations
  • Tests verify proper lazy loading when dialog rails are used
  • Concurrent initialization tests ensure thread-safety of double-checked locking

Confidence Score: 5/5

  • Safe to merge - well-implemented lazy initialization with proper thread-safety and comprehensive test coverage.
  • The implementation uses the standard double-checked locking pattern with asyncio.Lock correctly. All action methods that access embedding indexes have been updated with appropriate ensure*_index() calls. The code properly handles edge cases where indexes may remain None after initialization (e.g., empty message lists). Comprehensive test coverage validates both the lazy behavior and thread-safety. No breaking changes to existing functionality.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
nemoguardrails/actions/llm/generation.py 5/5 Implemented lazy initialization for embedding indexes with proper double-checked locking pattern. Removed eager init() call and unnecessary imports (threading, patch_asyncio). All action methods properly call ensure methods before index usage.
nemoguardrails/actions/v2_x/generation.py 5/5 Added lazy initialization for V2.x with _ensure_flows_index() and _ensure_instruction_flows_index() methods. Properly handles instruction_flows_index attribute that may not exist initially using hasattr checks.
tests/test_actions_llm_embedding_lazy_init.py 5/5 Comprehensive test coverage for V1.0 lazy initialization including verification that indexes are None at init, FastEmbed cache remains empty for simple configs, and indexes are initialized on first use.
tests/v2_x/test_llm_embedding_lazy_init.py 5/5 Test coverage for V2.x lazy initialization including passthrough mode and dialog configurations. Verifies indexes and instruction_flows_index remain uninitialized for simple configs.

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMRails
    participant LLMGenerationActions
    participant EmbeddingIndex

    Note over LLMRails,LLMGenerationActions: Before PR: Eager Initialization
    User->>LLMRails: __init__(config)
    LLMRails->>LLMGenerationActions: __init__()
    LLMGenerationActions->>LLMGenerationActions: init() in thread
    LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
    EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded
    LLMGenerationActions->>EmbeddingIndex: _init_bot_message_index()
    LLMGenerationActions->>EmbeddingIndex: _init_flows_index()
    LLMGenerationActions-->>LLMRails: Ready (all indexes loaded)

    Note over LLMRails,LLMGenerationActions: After PR: Lazy Initialization
    User->>LLMRails: __init__(config)
    LLMRails->>LLMGenerationActions: __init__()
    Note over LLMGenerationActions: No init() call!
    LLMGenerationActions-->>LLMRails: Ready (no indexes loaded)
    
    User->>LLMRails: generate(messages)
    LLMRails->>LLMGenerationActions: generate_user_intent()
    LLMGenerationActions->>LLMGenerationActions: _ensure_user_message_index()
    alt user_message_index is None and user_messages exist
        LLMGenerationActions->>LLMGenerationActions: acquire _init_lock
        LLMGenerationActions->>LLMGenerationActions: check again if None
        LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
        EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded (lazy)
    end
    LLMGenerationActions->>EmbeddingIndex: search(text)
    EmbeddingIndex-->>LLMGenerationActions: results
    LLMGenerationActions-->>LLMRails: intent

    Note over User,EmbeddingIndex: Input/Output Rails Only: No Indexes Loaded
    User->>LLMRails: __init__(config with input rails)
    LLMRails->>LLMGenerationActions: __init__()
    LLMGenerationActions-->>LLMRails: Ready (no indexes)
    User->>LLMRails: generate(messages)
    Note over LLMGenerationActions: No embedding indexes needed
    LLMRails-->>User: response (FastEmbed never loaded)
Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants