Iris: Global Search intent routing with SetFit classifier and async webhook #523

Nayer-kotry wants to merge 19 commits into main

Conversation
- SetFit model trained on 2,506 examples (1,253 per class) using paraphrase-multilingual-MiniLM-L12-v2 as backbone
- Exported to dynamic INT8-quantized ONNX for fast CPU inference (~2 ms)
- Singleton classifier loaded via lru_cache with TRIGGER_AI fallback when model files are absent
- .gitattributes tracks *.onnx via Git LFS
- .gitignore excludes intermediate training artifacts
- .secrets.baseline updated with tokenizer.json false positive
…time

Tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json) are identical to the public paraphrase-multilingual-MiniLM-L12-v2 tokenizer on HuggingFace Hub. Committing them caused a detect-secrets false positive that required modifying .secrets.baseline — a security risk. Instead, the classifier now loads the tokenizer from the Hub using the base_model name stored in meta.json. HuggingFace caches the download locally after the first use.
…t routing and refusal suppression

- Pipeline skips HyDE and LLM entirely when intent is SKIP_AI, returning sources only with answer=null
- LLM refusal safety net: regex catches "not covered/mentioned/provided" strings and converts to null
- Updated all prompts from "lecture content" to "course content"
- implementation_id changed from lecture_search_answer_pipeline to global_search_pipeline
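The refusal safety net above might look something like this minimal sketch. The actual regex in the pipeline is not shown in this summary, so the pattern here is only an approximation of the quoted phrases.

```python
import re
from typing import Optional

# Approximation of the quoted refusal phrases ("not covered/mentioned/provided").
_REFUSAL = re.compile(r"\bnot\s+(covered|mentioned|provided)\b", re.IGNORECASE)


def suppress_refusal(answer: Optional[str]) -> Optional[str]:
    """Convert an LLM refusal to null so the API returns sources only."""
    if answer and _REFUSAL.search(answer):
        return None
    return answer
```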
- Callback URL: {base_url}/api/iris/internal/pipelines/global-search/runs/{run_id}/status
- thinking() logs intent decision and sends in_progress to Artemis
- done() sends answer (str or null) and sources list
- GlobalSearchStatusUpdateDTO with answer, sources, and result fields
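A hypothetical sketch of the webhook contract above: the URL shape is taken from the commit message, while the payload helper and its field values are assumptions (the DTO also carries a `result` field whose semantics aren't shown here).

```python
from typing import List, Optional


def build_status_url(base_url: str, run_id: str) -> str:
    # Callback URL shape from the commit message above.
    return f"{base_url}/api/iris/internal/pipelines/global-search/runs/{run_id}/status"


def done_payload(answer: Optional[str], sources: Optional[List[dict]]) -> dict:
    # done() sends the answer (str or null) and the sources list.
    return {"answer": answer, "sources": sources or []}
```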
- POST /api/v1/pipelines/global-search/run replaces the old synchronous ask endpoint
- search.py stripped back to only the synchronous lecture search endpoint
- lecture_global_search_retrieval.py updated to use global_search_pipeline ID
- application.example.yml renamed lecture_search_answer_pipeline to global_search_pipeline
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info: Configuration used: defaults, Review profile: CHILL, Plan: Pro

📒 Files selected for processing (1)
📝 Walkthrough

This PR introduces intent-based global search by adding a trained ONNX intent classifier, restructuring DTOs from lecture-specific to global search, updating the pipeline to conditionally skip AI processing based on intent, and adding a new asynchronous global-search run endpoint with webhook status updates.

Changes: Global Search Intent Classification & Restructuring

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.5): iris/src/iris/web/routers/pipelines.py
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
iris/src/iris/domain/search/lecture_search_dto.py (1)
49-60: 🛠️ Refactor suggestion | 🟠 Major
`initial_stages` is declared in the request DTO but never consumed by the pipeline.

`GlobalSearchRequestDTO.initial_stages` is defined here, but `GlobalSearchCallback.__init__` (lines 538–551 in iris/src/iris/web/status/status_update.py) accepts only `run_id` and `base_url` — not `initial_stages`. It builds a fixed [Thinking, Responding] stages list with no support for prepending caller-provided stages, unlike every other callback (CourseChatStatusCallback, ExerciseChatStatusCallback, LectureChatCallback, etc.), which all accept and use `initial_stages`.

In `run_global_search_pipeline_worker` (pipelines.py:476–486), the callback is instantiated without passing `initial_stages`, whereas all other pipeline workers pass `initial_stages=dto.initial_stages`. Either:

- Add an `initial_stages` parameter to `GlobalSearchCallback.__init__` and pass it when instantiating the callback, or
- Remove the field from `GlobalSearchRequestDTO` to avoid a misleading API contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/domain/search/lecture_search_dto.py` around lines 49 - 60, GlobalSearchRequestDTO.initial_stages is never used; update GlobalSearchCallback.__init__ to accept an initial_stages: List[StageDTO] (or Optional[List[StageDTO]]) parameter, store it on the instance, and use it when constructing the stages list instead of always using a fixed [Thinking, Responding] sequence; then update run_global_search_pipeline_worker to pass dto.initial_stages into GlobalSearchCallback(...) (mirror how CourseChatStatusCallback/ExerciseChatStatusCallback/LectureChatCallback handle initial_stages) so caller-provided stages are prepended/used.
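The first option in the suggestion above could be sketched like this; the class and stage representations are simplified stand-ins for the real `GlobalSearchCallback`/`StageDTO` types.

```python
from typing import List, Optional


class GlobalSearchCallbackSketch:
    """Simplified model of the suggested fix: prepend caller-provided stages."""

    def __init__(self, run_id: str, base_url: str,
                 initial_stages: Optional[List[str]] = None):
        self.run_id = run_id
        self.base_url = base_url
        # Caller stages first, then the fixed Thinking/Responding pair,
        # mirroring how the other callbacks handle initial_stages.
        self.stages = list(initial_stages or []) + ["Thinking", "Responding"]
```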
🧹 Nitpick comments (9)
iris/src/iris/web/status/status_update.py (1)
565-579: `done()` override silently swallows parent kwargs and ignores delivery success.

Two small concerns worth tightening:

- The `**_kwargs` catch-all means a future caller invoking the parent's contract (`done(message=..., final_result=..., tokens=...)` etc.) on a `GlobalSearchCallback` instance will have their args silently dropped — there's no signal that those are unsupported here. Either declare the parameters you actually accept (`answer`, `sources`, `tokens`) without `**_kwargs`, or assert/log when unexpected kwargs arrive.
- `self.on_status_update()` returns a bool indicating delivery success, but it's discarded. The base class uses that signal to decide whether to clear transient fields and to drive retry semantics; here a failed POST silently leaves the stages marked DONE without any retry path.

Also consider documenting in the docstring that this collapses both Thinking + Responding to DONE in a single call (intentional for the skip_ai path where thinking() was never invoked, so a NOT_STARTED → DONE transition is expected).

♻️ Suggested signature/return tightening

```diff
-    def done(self, answer=None, sources=None, tokens=None, **_kwargs):
-        """Advance to responding stage, attach result, and notify Artemis."""
+    def done(
+        self,
+        answer: Optional[str] = None,
+        sources: Optional[List["LectureSearchResultDTO"]] = None,
+        tokens: Optional[List[TokenUsageDTO]] = None,
+    ) -> bool:
+        """Advance to responding stage, attach result, and notify Artemis.
+
+        Collapses both Thinking and Responding stages to DONE in a single
+        webhook (intentional for the skip_ai path that bypasses thinking()).
+        """
         logger.info(
             "[ask] → callback: done answer=%s sources=%d",
             "present" if answer else "null",
             len(sources) if sources else 0,
         )
         self.stage.state = StageStateEnum.DONE
         self.stage = self.get_next_stage()
         if self.stage:
             self.stage.state = StageStateEnum.DONE
         self.status.answer = answer
         self.status.sources = sources or []
         self.status.tokens = tokens or []
-        self.on_status_update()
+        return self.on_status_update()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/web/status/status_update.py` around lines 565 - 579, The done() override on GlobalSearchCallback currently swallows unexpected kwargs and ignores delivery success; change the signature to accept only the explicit params (answer=None, sources=None, tokens=None) instead of **_kwargs or, if you must accept extras, explicitly assert/log when unexpected kwargs are passed so callers are not silently dropped; capture the boolean return from self.on_status_update() and propagate or act on it (e.g., if delivery failed, do not permanently mark StageStateEnum.DONE or schedule a retry/raise so the base-class retry/cleanup behavior can run) — update the docstring to state that this method collapses Thinking + Responding into DONE for the skip_ai path and reference the methods done() and on_status_update() and the StageStateEnum transitions so reviewers can find the change.

iris/src/iris/pipeline/shared/global_search_intent_classifier.py (2)
100-101: Defensive `_LABELS` lookup.

`_LABELS[pred_class]` will raise `KeyError` if the joblib head ever returns a class id outside `{0, 1}` (e.g. if a future multi-class head is loaded by mistake). Since the surrounding `classify()` doesn't catch — and the upstream pipeline only wraps the call in a broad `try/except` — a single bad model artifact would surface as a noisy 500 in the pipeline worker. `.get()` with the safe default keeps behavior consistent with the rest of the module (which already biases toward `TRIGGER_AI` whenever the classifier is unsure).

♻️ Proposed change

```diff
-        pred_class = int(self._head.predict(embedding)[0])
-        return _LABELS[pred_class]
+        pred_class = int(self._head.predict(embedding)[0])
+        return _LABELS.get(pred_class, SearchIntent.TRIGGER_AI)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/shared/global_search_intent_classifier.py` around lines 100 - 101, The lookup of _LABELS[pred_class] is unsafe: change the return to use _LABELS.get(pred_class, _LABELS[TRIGGER_AI]) after obtaining pred_class (ensure pred_class is an int and handle empty/zero-length predict outputs before indexing), so replace the direct index in the classify flow (where self._head.predict(...) -> pred_class) with a .get(...) that falls back to the TRIGGER_AI label to maintain existing bias when the head returns an out-of-range or unexpected class id.
70-75: Hardcoded `intra_op_num_threads = 4` may starve or oversubscribe pods.

The setting is fixed at 4 regardless of how many cores the pod actually has. Per ONNX Runtime's threading docs, the default (0 / unset) sizes the intra-op pool to the number of physical CPU cores, which is usually the right choice on a server. Hardcoding 4 will:

- Oversubscribe small pods (e.g. 1–2 vCPU containers running multiple Iris pipelines in parallel), increasing context-switching and latency variance.
- Underutilize larger hosts where >4 cores are available for a single inference.

Consider either leaving the default (`0`) or making it configurable via env var (e.g. `INTENT_INTRA_OP_THREADS`).

♻️ Proposed change

```diff
-        sess_opts.intra_op_num_threads = 4
+        # 0 = let ORT pick (= number of physical cores). Override via env if needed.
+        sess_opts.intra_op_num_threads = int(os.environ.get("INTENT_INTRA_OP_THREADS", "0"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/shared/global_search_intent_classifier.py` around lines 70 - 75, Replace the hardcoded sess_opts.intra_op_num_threads = 4 with a configurable or default behavior: read an INTENT_INTRA_OP_THREADS env var, parse it to an int (fall back to 0 when unset/invalid) and assign that value to sess_opts.intra_op_num_threads, or simply leave it at 0 to let ONNX Runtime pick physical cores; update the SessionOptions setup where sess_opts is created (the block that calls ort.SessionOptions(), sets graph_optimization_level, intra_op_num_threads, and constructs self._session via ort.InferenceSession) so the code uses the parsed env var or 0 instead of the hardcoded 4.iris/src/iris/domain/status/global_search_status_update_dto.py (1)
17-17: Use `default_factory=list` for `sources` to silence RUF012 and match sibling DTOs.

Pydantic v2 will safely deep-copy the literal `[]` default, so this is not a real bug — but Ruff flags it (RUF012), and the sibling `GlobalSearchRequestDTO.initial_stages` already uses `Field(default_factory=list)`. Aligning here keeps the static-analysis output clean and the codebase consistent.

♻️ Proposed change

```diff
-from typing import List, Optional
+from typing import List, Optional
+
+from pydantic import Field
@@
-    sources: List[LectureSearchResultDTO] = []
+    sources: List[LectureSearchResultDTO] = Field(default_factory=list)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/domain/status/global_search_status_update_dto.py` at line 17, The field default for sources uses a mutable literal ([]), causing RUF012; change the declaration in GlobalSearchStatusUpdateDTO from `sources: List[LectureSearchResultDTO] = []` to use a factory: `sources: List[LectureSearchResultDTO] = Field(default_factory=list)`, and add an import for Field from pydantic if missing so it matches the pattern used by GlobalSearchRequestDTO.initial_stages and silences the linter.iris/src/iris/pipeline/global_search_pipeline.py (2)
172-174: Inconsistent log tag.

The router worker logs `[global-search] ...`; this line uses `[ask]`, which is a leftover from the previous naming. Align for greppability.

Proposed fix

```diff
-        logger.info(
-            "[ask] LLM refusal detected in answer text — suppressing to null"
-        )
+        logger.info(
+            "[global-search] LLM refusal detected in answer text — suppressing to null"
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/global_search_pipeline.py` around lines 172 - 174, The log line using logger.info currently tags messages with "[ask]" which is inconsistent; update the logger.info call in global_search_pipeline.py (the logger.info that logs "LLM refusal detected in answer text — suppressing to null") to use the "[global-search]" tag so it matches the router worker's convention and is greppable across the codebase.
109-111: Rename enum to `IRIS_GLOBAL_SEARCH_ANSWER_PIPELINE` for consistency with component refactoring.

The class, DTO, and prompts module were renamed from `LectureSearch*` to `GlobalSearch*`, but token accounting still uses `PipelineEnum.IRIS_LECTURE_SEARCH_ANSWER_PIPELINE` (lines 110, 179). Create a corresponding `IRIS_GLOBAL_SEARCH_ANSWER_PIPELINE` enum member to ensure telemetry is properly bucketed under the new pipeline name.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/src/iris/pipeline/global_search_pipeline.py` around lines 109 - 111, Replace the old enum member PipelineEnum.IRIS_LECTURE_SEARCH_ANSWER_PIPELINE with a new member PipelineEnum.IRIS_GLOBAL_SEARCH_ANSWER_PIPELINE and update all callsites that pass the enum to _append_tokens (e.g., the call using self.hyde_llm.tokens in GlobalSearchPipeline / global_search_pipeline.py and the other occurrence near the answer DTO usage) so telemetry/bucketing uses IRIS_GLOBAL_SEARCH_ANSWER_PIPELINE; add the new enum entry to the PipelineEnum definition, run a project-wide find/replace for the old member name, and ensure imports/types compile after the rename.iris/models/global_search_intent/train.py (3)
200-204: Drop the no-op "sanity check" read.

`(BACKBONE_DIR / "pooling_config.json").read_text()` discards its result and only proves the file is readable — but you wrote it ~80 lines above with `pathlib.Path.write_text`, so this reads as dead code. The next line (204) authoritatively re-writes the file into `FINAL_DIR`, which is what actually matters for the inference dir.

Proposed cleanup

```diff
-joblib.dump(model.model_head, str(FINAL_DIR / "model_head.joblib"))
-(BACKBONE_DIR / "pooling_config.json").read_text()  # sanity check readable
-
-# Copy pooling config so the inference dir is self-contained
-(FINAL_DIR / "pooling_config.json").write_text(json.dumps(pooling_config, indent=2))
+joblib.dump(model.model_head, str(FINAL_DIR / "model_head.joblib"))
+
+# Copy pooling config so the inference dir is self-contained
+(FINAL_DIR / "pooling_config.json").write_text(json.dumps(pooling_config, indent=2))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/models/global_search_intent/train.py` around lines 200 - 204, Remove the no-op read of the pooling config that discards its result: delete the standalone call `(BACKBONE_DIR / "pooling_config.json").read_text()` and rely on the existing write to FINAL_DIR via `(FINAL_DIR / "pooling_config.json").write_text(json.dumps(pooling_config, indent=2))`; ensure you keep the joblib.dump(model.model_head, str(FINAL_DIR / "model_head.joblib")) line and that the variable names BACKBONE_DIR, FINAL_DIR and pooling_config remain used so the inference directory is correctly self-contained.
18-34: Avoid `pip install --upgrade` as a side-effect at script load time.

Running `subprocess.check_call([... "pip", "install", ..., "--upgrade"])` at the top of the file mutates the active environment every time the script is imported or executed, can silently bump versions of unrelated callers in the same venv, and is a build-/CI-hostile pattern. Pin the training extras in pyproject.toml (e.g. an `[project.optional-dependencies]` training group) and document `pip install -e .[training]` in the README, then remove this auto-installer block. This also lets you drop the `nosec B404/B603` markers and the `# noqa: E402` late imports below.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/models/global_search_intent/train.py` around lines 18 - 34, The top-level dependency installation block (the pkgs list and the loop invoking subprocess.check_call for pip installs) should be removed to avoid mutating the runtime environment at import/execute; instead add these packages to pyproject.toml as an optional dependency group (e.g., [project.optional-dependencies] training) and update README to instruct users to run pip install -e .[training] (or similar) for training setup; also remove the associated nosec B603/B404 and any noqa: E402 late-import comments that were only needed because of the installer block.
105-106: Use `parents=True` on `mkdir(exist_ok=True)`.

`BACKBONE_DIR.mkdir(exist_ok=True)` (and the same call on line 165 for `FINAL_DIR`) will raise `FileNotFoundError` if any intermediate directory (e.g. the script is invoked from a different cwd) is missing. Cheap to make robust:

Proposed fix

```diff
-BACKBONE_DIR.mkdir(exist_ok=True)
+BACKBONE_DIR.mkdir(parents=True, exist_ok=True)

-FINAL_DIR.mkdir(exist_ok=True)
+FINAL_DIR.mkdir(parents=True, exist_ok=True)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/models/global_search_intent/train.py` around lines 105 - 106, The mkdir calls on BACKBONE_DIR and FINAL_DIR can raise FileNotFoundError if intermediate directories don't exist; update their creation to call mkdir with parents=True and keep exist_ok=True (i.e., change the BACKBONE_DIR.mkdir(...) and the FINAL_DIR.mkdir(...) calls to include parents=True) so the entire directory path is created atomically using the Path objects BACKBONE_DIR and FINAL_DIR.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/models/global_search_intent/onnx/meta.json`:
- Line 10: The meta.json entry "onnx_file" is incorrect—update the value of the
onnx_file key in iris/models/global_search_intent/onnx/meta.json to match the
actual committed ONNX filename ("model_quantized.onnx") so metadata aligns with
the repository artifact; this keeps external tooling and maintainers consistent
with the model file that classifier._find_onnx_file() will pick up.
In `@iris/models/global_search_intent/train.py`:
- Around line 207-216: The meta dictionary creation writes eval metrics that may
be numpy scalars which json.dumps can't serialize; before serializing meta (the
variable meta) convert any numpy-typed values in metrics (returned from
trainer.evaluate()) to native Python types (e.g., via a helper that walks the
metrics dict and calls .item() for numpy scalars or uses json-safe casting) and
then write FINAL_DIR / "meta.json" using json.dumps on the cleaned meta so
serialization no longer raises; update the code around where metrics is set and
where meta is created/written to ensure all values are primitive Python types.
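One way to implement the coercion this comment asks for, assuming `metrics` is a (possibly nested) dict whose leaves may be numpy scalars; the helper name is illustrative.

```python
import json


def json_safe(obj):
    """Recursively convert numpy scalars (anything exposing .item()) to
    native Python values so json.dumps can serialize the metrics dict."""
    if isinstance(obj, dict):
        return {k: json_safe(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [json_safe(v) for v in obj]
    if hasattr(obj, "item") and not isinstance(obj, (str, bytes)):
        return obj.item()
    return obj


# meta = {..., "metrics": json_safe(metrics)}
# (FINAL_DIR / "meta.json").write_text(json.dumps(meta, indent=2))
```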
In `@iris/src/iris/pipeline/global_search_pipeline.py`:
- Around line 161-176: The refusal-suppression currently uses a broad re.search
on answer and clears used_sources; change it to only suppress when the entire
answer matches the refusal pattern and the answer is short/minimal (e.g., use
re.fullmatch on the existing regex and require a word-count or character-length
threshold like len(answer.split()) < 8 or len(answer) < 100), and do not clear
used_sources when suppressing — only set answer = None (keep used_sources
as-is); update the code that references answer, used_sources, and logger.info to
reflect these checks so legitimate grounded answers with embedded denial phrases
and citations aren’t discarded.
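The narrower check this comment describes can be sketched as follows; the thresholds come from the prompt, while the regex itself is an illustrative stand-in for the real pattern.

```python
import re

# Illustrative refusal pattern; the real one in the pipeline isn't shown here.
_REFUSAL = re.compile(r".*not\s+(covered|mentioned|provided).*",
                      re.IGNORECASE | re.DOTALL)


def suppress_if_pure_refusal(answer, used_sources):
    """Suppress only short answers that are entirely a refusal; keep sources."""
    if answer and len(answer.split()) < 8 and _REFUSAL.fullmatch(answer):
        return None, used_sources   # answer nulled, sources preserved
    return answer, used_sources
```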
In `@iris/src/iris/pipeline/shared/global_search_intent_classifier.py`:
- Around line 107-134: The _get_classifier() function is currently cached with
`@lru_cache`(maxsize=1) and returns None on any Exception which gets permanently
memoized; update it so only successful _IntentClassifier instances are cached
and transient HF Hub failures are retried: remove or change the decorator usage
on _get_classifier(), create a small non-cached loader that attempts to
construct _IntentClassifier(model_dir) and on success stores/returns the
instance (e.g., by setting a module-level variable or a separate
`@lru_cache-wrapped` factory that only returns the instance), while preserving the
existing behavior that missing model_dir (from _model_dir()) remains a one-time
warning and returns None without being cached permanently; ensure classify()
still calls the new loader and that exceptions during _IntentClassifier(...) are
logged (logger.exception) but do not cause None to be cached for the lifetime of
the process.
In `@iris/src/iris/web/routers/pipelines.py`:
- Around line 490-499: The INFO log in the router currently emits raw user input
via dto.query[:120]; change it to avoid logging user queries at INFO by either
(a) removing dto.query from the logger.info call and only logging the decision
(intent) and the LLM branch, or (b) lowering the log level to DEBUG for the line
that includes dto.query so queries are only recorded when DEBUG is enabled
(matching the in-pipeline debug at global_search_pipeline.py:100). Update the
logger invocation in the router (the logger.info that references dto.query and
SearchIntent.TRIGGER_AI) accordingly so INFO logs contain only structural
information (intent and branch) and user input is not sent to standard log
shipping.
ℹ️ Review info: Configuration used: defaults, Review profile: CHILL, Plan: Pro, Run ID: c197bd37-b536-4dbf-84ca-38a84e436b69
⛔ Files ignored due to path filters (1)
iris/models/global_search_intent/training_data.csv is excluded by !**/*.csv
📒 Files selected for processing (20)
- iris/.gitattributes
- iris/application.example.yml
- iris/models/global_search_intent/.gitignore
- iris/models/global_search_intent/onnx/config.json
- iris/models/global_search_intent/onnx/meta.json
- iris/models/global_search_intent/onnx/model_head.joblib
- iris/models/global_search_intent/onnx/model_quantized.onnx
- iris/models/global_search_intent/onnx/ort_config.json
- iris/models/global_search_intent/onnx/pooling_config.json
- iris/models/global_search_intent/train.py
- iris/src/iris/domain/search/lecture_search_dto.py
- iris/src/iris/domain/search/search_intent_dto.py
- iris/src/iris/domain/status/global_search_status_update_dto.py
- iris/src/iris/pipeline/global_search_pipeline.py
- iris/src/iris/pipeline/prompts/global_search_prompts.py
- iris/src/iris/pipeline/shared/global_search_intent_classifier.py
- iris/src/iris/retrieval/lecture/lecture_global_search_retrieval.py
- iris/src/iris/web/routers/pipelines.py
- iris/src/iris/web/routers/search.py
- iris/src/iris/web/status/status_update.py
- meta.json: correct onnx_file value to match committed filename
- train.py: coerce numpy eval metrics to float before json.dumps
- global_search_pipeline: narrow refusal-suppression to short answers only, keep used_sources on suppression
- global_search_intent_classifier: don't cache None on transient HF Hub failure, retry on next call
- pipelines router: lower query-logging to DEBUG to avoid PII in INFO logs
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/models/global_search_intent/train.py`:
- Around line 207-216: The meta["onnx_file"] selection may pick a different
.onnx than the service's _find_onnx_file logic; update the code that builds meta
in train.py to use the same selection logic as
iris.pipeline.shared.global_search_intent_classifier._find_onnx_file (either
import and call _find_onnx_file or replicate its sorting/prefer-"quantized"
candidate rules) when choosing the ONNX filename from FINAL_DIR so the recorded
meta["onnx_file"] matches the artifact the runtime will actually load.
- Around line 18-34: The script currently auto-installs packages via the pkgs
list and the subprocess.check_call loop (install block using sys.executable and
pip), which undermines reproducibility; instead, add all training dependencies
(setfit, optimum[onnxruntime], onnxruntime, datasets, scikit-learn, joblib,
pandas) with pinned versions to your Poetry pyproject.toml (preferably a
dedicated "train" group), remove the dynamic installation loop from train.py,
and replace it with a fail-fast check that imports or verifies those packages
are present and raises a clear error if missing so CI/consumers must install via
Poetry rather than the script auto-upgrading them.
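A fail-fast check of the kind described above could be sketched as follows; the module list and error wording are illustrative assumptions (note that scikit-learn imports as `sklearn`):

```python
# Sketch: verify training dependencies are importable and fail with a clear
# message instead of auto-installing them. Module names are illustrative.
import importlib.util

REQUIRED = ["setfit", "optimum", "onnxruntime", "datasets", "sklearn", "joblib", "pandas"]


def assert_training_deps() -> None:
    missing = [m for m in REQUIRED if importlib.util.find_spec(m) is None]
    if missing:
        raise RuntimeError(
            f"Missing training dependencies: {', '.join(missing)}. "
            "Install them with: poetry install --with train"
        )
```

Calling `assert_training_deps()` at the top of train.py keeps the environment under Poetry's control while still giving a readable error when something is absent.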
In `@iris/src/iris/web/routers/pipelines.py`:
- Around line 500-517: The current branch only calls callback.thinking() when
intent == SearchIntent.TRIGGER_AI, but the endpoint contract promises two
callbacks for all searches; update the handler so callback.thinking() is emitted
unconditionally before invoking GlobalSearchPipeline (i.e., call
callback.thinking() before or regardless of the SearchIntent check), then
proceed with intent-specific logic and finally callback.done(answer=...,
sources=...); alternatively, if you intend SKIP_AI to only send the final
webhook, update the endpoint docstring to remove the promise of an initial
status—modify the code paths around SearchIntent.TRIGGER_AI /
SearchIntent.SKIP_AI and the docstring consistently so callback.thinking() and
callback.done() behavior matches the documented contract.
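The suggested control flow (emit `thinking()` unconditionally, then branch on intent) can be sketched like this; the callback and pipeline objects are stand-ins, not the real Iris classes:

```python
# Sketch of the two-callback contract: thinking() is always sent, then the
# intent decides whether the LLM pipeline runs. Stand-in names throughout.
from enum import Enum


class SearchIntent(Enum):
    TRIGGER_AI = "trigger_ai"
    SKIP_AI = "skip_ai"


def run_global_search(intent, callback, run_pipeline, retrieve_sources):
    callback.thinking()  # always emitted, matching the documented contract
    if intent is SearchIntent.TRIGGER_AI:
        answer, sources = run_pipeline()  # full HyDE + LLM path
    else:  # SKIP_AI: sources only, no LLM call
        answer, sources = None, retrieve_sources()
    callback.done(answer=answer, sources=sources)
```

The alternative the review mentions (keeping the single-callback SKIP_AI path and fixing the docstring instead) would drop the unconditional `thinking()` line.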
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54be69be-2616-4c37-b1ac-8330d3171345
⛔ Files ignored due to path filters (1)
iris/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
- iris/models/global_search_intent/onnx/meta.json
- iris/models/global_search_intent/train.py
- iris/pyproject.toml
- iris/src/iris/pipeline/global_search_pipeline.py
- iris/src/iris/pipeline/shared/global_search_intent_classifier.py
- iris/src/iris/web/routers/pipelines.py
- iris/src/iris/web/status/status_update.py
✅ Files skipped from review due to trivial changes (1)
- iris/models/global_search_intent/onnx/meta.json
🚧 Files skipped from review as they are similar to previous changes (2)
- iris/src/iris/web/status/status_update.py
- iris/src/iris/pipeline/global_search_pipeline.py
- train.py: replace dynamic pip install with fail-fast import check; add poetry train group (setfit, optimum[onnxruntime] ^2.0, datasets, scikit-learn, pandas) — install with: poetry install --with train
- train.py: mirror _find_onnx_file selection logic (sort + prefer quantized) when recording onnx_file in meta.json
- pipelines.py: update docstring to document that thinking callback is only sent for TRIGGER_AI (LLM) path, not SKIP_AI
Without this, every query would log a warning after the directory-missing path returned None without setting the instance flag.
vLLM 0.20.0 switched PyPI default to CUDA 13, which required our TORCH_INDEX_URL fix. Pin to this known-good version to prevent future surprise breakages from unpinned >=0.19.1 resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Athena Test Results Summary
Failing Tests Summary
📊 Detailed Coverage Table
Combining 3 coverage files...
📊 Combined Coverage Summary:
Total packages: 31
Note: Coverage thresholds: ✅ (≥70%), ❌ (<70%)
Actionable comments posted: 1
🧹 Nitpick comments (2)
athena/athena/athena/storage/submission_storage.py (1)
62-63: ⚡ Quick win
Consider adding a debug log for the disabled-DB write no-op.
store_submissions silently returns when DB is off; a debug-level breadcrumb here would make misconfiguration diagnosis easier without adding warning noise.
Proposed change
 from athena.contextvars import get_lms_url
 from athena.database import get_db, is_database_enabled
+from athena.logger import logger
 from athena.schemas import Submission
@@
 def store_submissions(submissions: List[Submission], lms_url: Optional[str] = None):
     """Stores the given submissions, all at once."""
     if not is_database_enabled():
+        logger.debug("Database disabled: skipping store_submissions for %d submissions", len(submissions))
         return
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@athena/athena/athena/storage/submission_storage.py` around lines 62 - 63, In store_submissions, add a debug log right before the early return when is_database_enabled() is false (i.e., log that DB writes are disabled and the function is a no-op); use the module logger (or existing logger variable) and include contextual info such as the number of submissions (e.g., len(submissions)) or an identifier to make the breadcrumb useful for debugging without raising warnings.
iris/models/global_search_intent/train.py (1)
39-40: ⚡ Quick win
Anchor data/artifact paths to the script directory.
Current relative paths depend on the caller's working directory. Resolving from __file__ makes the script robust in CI and local execution.
💡 Proposed refactor
+SCRIPT_DIR = Path(__file__).resolve().parent

-df = pd.read_csv("models/global_search_intent/training_data.csv")
+df = pd.read_csv(SCRIPT_DIR / "training_data.csv")
...
-BACKBONE_DIR = Path("models/global_search_intent/backbone")
+BACKBONE_DIR = SCRIPT_DIR / "backbone"
...
-ONNX_BASE_DIR = Path("models/global_search_intent/onnx_base")
+ONNX_BASE_DIR = SCRIPT_DIR / "onnx_base"
...
-ONNX_OPT_DIR = Path("models/global_search_intent/onnx_optimized")
+ONNX_OPT_DIR = SCRIPT_DIR / "onnx_optimized"
...
-FINAL_DIR = Path("models/global_search_intent/onnx")
+FINAL_DIR = SCRIPT_DIR / "onnx"

Also applies to: 100-101, 135-136, 143-144, 159-160
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@iris/models/global_search_intent/train.py` around lines 39 - 40, The CSV and other artifact paths in train.py are relative to the caller's cwd; change them to be anchored to the script directory by computing a base path from __file__ (e.g., base_dir = Path(__file__).resolve().parent) and replace direct string literals passed to pd.read_csv (and any writes/loads later) with base_dir / "models/global_search_intent/training_data.csv" (and the corresponding artifact filenames referenced at the other occurrences). Update the uses of pd.read_csv and any file write/load calls so they build paths with Path.joinpath or the / operator using that base_dir to make df loading and artifact operations robust in CI and local runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@iris/models/global_search_intent/train.py`:
- Around line 42-43: The f-strings use double quotes both for the string and for
the df["label"] key which raises a syntax error; update the print statements
(the two occurrences around the top prints and the later prints at lines 61–62)
to use a different quoting style for the dictionary access (e.g., df['label'])
or escape the quotes so the f-string delimiter and the key quotes don't
conflict, leaving the rest of the print formatting unchanged.
---
Nitpick comments:
In `@athena/athena/athena/storage/submission_storage.py`:
- Around line 62-63: In store_submissions, add a debug log right before the
early return when is_database_enabled() is false (i.e., log that DB writes are
disabled and the function is a no-op); use the module logger (or existing logger
variable) and include contextual info such as the number of submissions (e.g.,
len(submissions)) or an identifier to make the breadcrumb useful for debugging
without raising warnings.
In `@iris/models/global_search_intent/train.py`:
- Around line 39-40: The CSV and other artifact paths in train.py are relative
to the caller's cwd; change them to be anchored to the script directory by
computing a base path from __file__ (e.g., base_dir =
Path(__file__).resolve().parent) and replace direct string literals passed to
pd.read_csv (and any writes/loads later) with base_dir /
"models/global_search_intent/training_data.csv" (and the corresponding artifact
filenames referenced at the other occurrences). Update the uses of pd.read_csv
and any file write/load calls so they build paths with Path.joinpath or the /
operator using that base_dir to make df loading and artifact operations robust
in CI and local runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea633a56-c8a0-4162-bd5d-3ec395bba250
⛔ Files ignored due to path filters (1)
iris/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
- athena/athena/athena/app.py
- athena/athena/athena/database.py
- athena/athena/athena/endpoints.py
- athena/athena/athena/storage/exercise_storage.py
- athena/athena/athena/storage/feedback_storage.py
- athena/athena/athena/storage/structured_grading_criterion_storage.py
- athena/athena/athena/storage/submission_storage.py
- athena/env_example/module_programming_llm.env
- athena/env_example/module_text_llm.env
- athena/modules/modeling/module_modeling_llm/module_modeling_llm/__main__.py
- athena/modules/programming/module_programming_llm/module_programming_llm/__main__.py
- athena/modules/programming/module_programming_quality_llm/module_programming_quality_llm/__main__.py
- athena/modules/text/module_text_llm/module_text_llm/__main__.py
- iris/models/global_search_intent/train.py
- iris/pyproject.toml
- iris/src/iris/pipeline/shared/global_search_intent_classifier.py
- iris/src/iris/web/routers/pipelines.py
- logos/logos-workernode/Dockerfile
✅ Files skipped from review due to trivial changes (3)
- athena/env_example/module_programming_llm.env
- logos/logos-workernode/Dockerfile
- athena/env_example/module_text_llm.env
🚧 Files skipped from review as they are similar to previous changes (2)
- iris/pyproject.toml
- iris/src/iris/pipeline/shared/global_search_intent_classifier.py
print(f" trigger_ai : {df["label"].sum()}")
print(f" skip_ai : {(df["label"] == 0).sum()}")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify parser-level correctness after the fix.
python -m py_compile iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 254
🏁 Script executed:
cat -n iris/models/global_search_intent/train.py | sed -n '35,70p'
Repository: ls1intum/edutelligence
Length of output: 1763
🏁 Script executed:
rg 'print\(f"[^"]*{[^}]*\["[^"]*"\]' iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 195
🏁 Script executed:
rg 'print.*f".*\["' iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 280
🏁 Script executed:
# Create a test version with the proposed fix applied
sed -i 's/df\["label"\]/df["'"'"'label'"'"']/g; s/df_eval\["label"\]/df_eval["'"'"'label'"'"']/g' iris/models/global_search_intent/train.py
# Verify the fix resolves the syntax error
python -m py_compile iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 300
🏁 Script executed:
# Restore the original file from git
git checkout iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 96
🏁 Script executed:
python <<'EOF'
# Read the file
with open('iris/models/global_search_intent/train.py', 'r') as f:
content = f.read()
# Apply the fix: replace df["label"] with df['label'] and df_eval["label"] with df_eval['label']
# Only within f-strings (print statements)
fixed_content = content.replace('df["label"]', "df['label']").replace('df_eval["label"]', "df_eval['label']")
# Write the fixed content
with open('iris/models/global_search_intent/train.py', 'w') as f:
f.write(fixed_content)
print("File updated")
EOF
# Verify the fix resolves the syntax error
python -m py_compile iris/models/global_search_intent/train.py
Repository: ls1intum/edutelligence
Length of output: 79
🏁 Script executed:
cat -n iris/models/global_search_intent/train.py | sed -n '40,62p'
Repository: ls1intum/edutelligence
Length of output: 1212
Fix invalid f-string quoting (syntax error prevents execution).
Lines 42–43 and 61–62 use double quotes for both the f-string delimiter and df["label"] dictionary access, which is invalid Python syntax and causes an f-string: unmatched '[' error at parse time.
Proposed fix
-print(f" trigger_ai : {df["label"].sum()}")
-print(f" skip_ai : {(df["label"] == 0).sum()}")
+print(f" trigger_ai : {df['label'].sum()}")
+print(f" skip_ai : {(df['label'] == 0).sum()}")
...
-print(f" trigger_ai : {df_eval["label"].sum()}")
-print(f" skip_ai : {(df_eval["label"] == 0).sum()}\n")
+print(f" trigger_ai : {df_eval['label'].sum()}")
+print(f" skip_ai : {(df_eval['label'] == 0).sum()}\n")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
print(f" trigger_ai : {df["label"].sum()}")
print(f" skip_ai : {(df["label"] == 0).sum()}")
After:
print(f" trigger_ai : {df['label'].sum()}")
print(f" skip_ai : {(df['label'] == 0).sum()}")
🧰 Tools
🪛 Flake8 (7.3.0)
[error] 42-42: SyntaxError: f-string: unmatched '['
(E999)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@iris/models/global_search_intent/train.py` around lines 42 - 43, The
f-strings use double quotes both for the string and for the df["label"] key
which raises a syntax error; update the print statements (the two occurrences
around the top prints and the later prints at lines 61–62) to use a different
quoting style for the dictionary access (e.g., df['label']) or escape the quotes
so the f-string delimiter and the key quotes don't conflict, leaving the rest of
the print formatting unchanged.
…wer-intent-routing
GlobalSearchPipeline was always instantiated with the default local=False, causing it to use cloud models regardless of the user's Artemis LLM setting.
Summary
- Queries are classified as trigger_ai (student wants an explanation) or skip_ai (bare keyword/navigation). Classification runs in ~2ms on CPU.
- skip_ai queries skip HyDE and LLM entirely — sources are returned immediately with answer: null. trigger_ai queries go through the full HyDE + LLM pipeline.
- POST /api/v1/pipelines/global-search/run returns 202 immediately. Two callbacks are sent to Artemis at pipelines/global-search/runs/{runId}/status — one when thinking starts, one when done.
- LectureSearch* pipeline classes, DTOs, and prompts renamed to GlobalSearch* to reflect that this feature serves the Global Search modal, not just lecture search.

Artemis counterpart: ls1intum/Artemis#12612
Local setup
1. Install Git LFS (required — the ONNX model is stored in LFS)
If Git LFS is already installed on your machine, pulling the branch is enough — it downloads the model automatically. If it is not installed, you will get a 134-byte pointer file instead of the model and Iris will crash on the first request.
2. Install dependencies
3. Update application.local.yml
Rename the pipeline key:
4. Run Iris normally
On the first request, the tokenizer (~17MB) downloads from HuggingFace Hub and is cached locally. All subsequent startups load from cache instantly.
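The meta-driven tokenizer load described above can be sketched as follows. The meta.json schema (a base_model key) follows this PR's description; the helper itself and its name are illustrative, and the actual classifier would pass the resolved name to `AutoTokenizer.from_pretrained`:

```python
# Sketch: resolve the tokenizer's Hub repo from the base_model name stored
# in meta.json instead of committing tokenizer files to the repo.
import json
from pathlib import Path


def load_base_model_name(model_dir: Path) -> str:
    """Read the backbone model name recorded at training time."""
    meta = json.loads((model_dir / "meta.json").read_text())
    return meta["base_model"]


# At runtime the classifier would then do, roughly:
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(load_base_model_name(model_dir))
# HuggingFace caches the download locally after the first call.
```

This keeps only small JSON metadata in the repo and avoids the detect-secrets false positive on tokenizer.json.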
Test plan
- trigger_ai: LLM answer + sources
- skip_ai: sources only, answer: null
- skip_ai: no LLM call
- Callbacks received at pipelines/global-search/runs/{runId}/status on Artemis side

Closes IRIS-264
Summary by CodeRabbit
New Features
/global-search/run endpoint with asynchronous operations and real-time status updates.
Bug Fixes
Chores
/search/ask endpoint; updated configuration and dependencies.