ARAX-Shepherd is failing TestCase_4 en masse

In the latest test run of the Refactor:
https://arax.ncats.io/?systest=281

<img width="559" height="181" alt="Image" src="https://github.com/user-attachments/assets/23d82751-c0db-4f24-9b1e-cfdb249d1bf4" />

We are failing *a lot* of TestCase_4 tests that BTE and Aragorn apparently pass:

<img width="1063" height="685" alt="Image" src="https://github.com/user-attachments/assets/845fed61-9604-4b13-80a5-2f34020131d7" />

And apparently we are causing the ARS results to fail as well.
Looking at the results:
https://arax.ci.transltr.io/?r=56b0f4a1-ba29-4c96-8f7f-93a20c151942

<img width="1050" height="236" alt="Image" src="https://github.com/user-attachments/assets/0a6ec7c6-6568-4e24-ba1d-f8408dca72c8" />

It would appear that while Aragorn and BTE are only returning 4 answers (and thus it is easy for them to duck this huge list of "NeverShow"s), ARAX is returning 500 answers and is thus more vulnerable to NeverShows.

https://arax.ci.transltr.io/?r=b6469a57-3f6d-43ef-94a8-75b17282a2f1

<img width="676" height="46" alt="Image" src="https://github.com/user-attachments/assets/53fb1510-ebbc-41a6-ba6f-835fd132e583" />
<img width="676" height="27" alt="Image" src="https://github.com/user-attachments/assets/2a2133b5-0eb9-41e6-9695-d75ed443c982" />

For example, MMP3 is a NeverShow.
It is result number 330.

<img width="1329" height="816" alt="Image" src="https://github.com/user-attachments/assets/7252fcc9-f9d1-43dd-8d02-cfef64952d8c" />
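
To check a response for other NeverShow hits without scrolling through the UI, something like the following might help. It is only a rough sketch, assuming the response has been downloaded as TRAPI JSON with the usual `message.results[].node_bindings` layout. The function name `find_never_show_ranks` and the `EXAMPLE:` CURIEs are made up for illustration and are not real identifiers from the test case.

```python
from typing import Dict, Iterable, List


def find_never_show_ranks(response: dict, never_show: Iterable[str]) -> Dict[str, List[int]]:
    """Map each NeverShow CURIE to the 1-based ranks of the results that bind it."""
    targets = set(never_show)
    hits: Dict[str, List[int]] = {}
    results = response.get("message", {}).get("results") or []
    for rank, result in enumerate(results, start=1):
        # node_bindings maps each query node key to a list of bound nodes
        for bindings in (result.get("node_bindings") or {}).values():
            for binding in bindings:
                curie = binding.get("id")
                if curie in targets:
                    ranks = hits.setdefault(curie, [])
                    if rank not in ranks:  # avoid double-counting one result
                        ranks.append(rank)
    return hits


# Toy response with placeholder CURIEs (hypothetical, for illustration only).
toy_response = {
    "message": {
        "results": [
            {"node_bindings": {"n0": [{"id": "EXAMPLE:disease"}],
                               "n1": [{"id": "EXAMPLE:ok_answer"}]}},
            {"node_bindings": {"n0": [{"id": "EXAMPLE:disease"}],
                               "n1": [{"id": "EXAMPLE:MMP3"}]}},
        ]
    }
}

print(find_never_show_ranks(toy_response, ["EXAMPLE:MMP3"]))
# {'EXAMPLE:MMP3': [2]}
```

Running this against the ARAX-Shepherd response and the legacy CI response with the same NeverShow list should show which NeverShows appear only in the former.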

It would appear that Retriever is providing this edge.
Since Retriever is the primary knowledge source, this is presumably a subclass reasoning edge. But due to bug #2662 we cannot see the support graph.
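
Until the support graph is viewable, the directly bound edges of a result can at least be listed with their primary knowledge source from the raw JSON. This is a hedged sketch, assuming a TRAPI 1.4+ layout (`analyses[].edge_bindings` on results and a `sources` list on knowledge graph edges). It does not traverse support graphs, and the helper names and `infores:example-*` values are hypothetical.

```python
from typing import List, Optional, Tuple


def primary_source(edge: dict) -> Optional[str]:
    """Return the resource_id tagged as primary_knowledge_source, if any."""
    for source in edge.get("sources") or []:
        if source.get("resource_role") == "primary_knowledge_source":
            return source.get("resource_id")
    return None


def result_edge_sources(response: dict, result_index: int) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List (edge_id, predicate, primary source) for edges bound to one result."""
    message = response.get("message", {})
    edges = (message.get("knowledge_graph") or {}).get("edges") or {}
    result = message["results"][result_index]  # 0-based index into results
    rows = []
    for analysis in result.get("analyses") or []:
        for bindings in (analysis.get("edge_bindings") or {}).values():
            for binding in bindings:
                edge = edges.get(binding["id"], {})
                rows.append((binding["id"], edge.get("predicate"), primary_source(edge)))
    return rows


# Toy response; every identifier here is a placeholder, not real data.
toy = {
    "message": {
        "knowledge_graph": {
            "edges": {
                "e1": {
                    "predicate": "biolink:related_to",
                    "sources": [
                        {"resource_id": "infores:example-kp",
                         "resource_role": "primary_knowledge_source"},
                        {"resource_id": "infores:example-aggregator",
                         "resource_role": "aggregator_knowledge_source"},
                    ],
                }
            }
        },
        "results": [
            {"analyses": [{"edge_bindings": {"t_edge": [{"id": "e1"}]}}]}
        ],
    }
}

print(result_edge_sources(toy, 0))
# [('e1', 'biolink:related_to', 'infores:example-kp')]
```

For result number 330 the index would be 329, since the list is 0-based.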

I suspect resolving this problem would result in much better Refactor test scores for ARAX-Shepherd as well as ARS (since many FAILs would be turned into PASSes).

We pass these in CI with legacy KPs:

<img width="1336" height="243" alt="Image" src="https://github.com/user-attachments/assets/bfda7e00-1b17-4239-8cfc-7d5d9054474e" />

There are only 14 results in legacy CI:
https://arax.ci.transltr.io/?r=5ffc9544-4f84-4d6f-aafc-73cd59beae98

and they do not include MMP3 et al.

It would seem that ARAX-Shepherd is failing many tests because it is getting NeverShow information from Retriever.

This would seem like an impactful mystery to solve....


ARAX-Shepherd is failing TestCase_4 en masse #2676