Request for Clarification on Sorting Mechanism in API Documentation #1099

Closed
massimobrivio opened this issue Feb 25, 2025 · 4 comments

massimobrivio commented Feb 25, 2025

Problem

When querying publications by category with the arXiv API and sorting them by relevance, it is unclear how the results are ordered. For example, the results of the following query are not sorted by publication date or update date, and every result should be equally relevant to the topic of AI. Which parameters determine the order?

http://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=relevance&sortOrder=descending&start=0&max_results=10
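
For anyone who wants to reproduce the observation, here is a minimal Python sketch that rebuilds the query above and checks whether a returned Atom feed is ordered by date. The helper names (`build_query_url`, `parse_entries`, `is_date_sorted`, `fetch_entries`) are made up for illustration and are not part of the arXiv API; the sketch assumes the response is a standard Atom feed with `id`, `published`, and `updated` elements per entry.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(category, sort_by="relevance", sort_order="descending",
                    start=0, max_results=10):
    """Build a category query URL with the same parameters as the example above."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "start": start,
        "max_results": max_results,
    }
    # safe=":" keeps the "cat:cs.AI" form unescaped, as in the original URL.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


def parse_entries(atom_xml):
    """Return (id, published, updated) tuples in the order the feed lists them."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        rows.append((
            entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            entry.findtext("atom:published", default="", namespaces=ATOM_NS),
            entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        ))
    return rows


def is_date_sorted(rows, column, descending=True):
    """Check whether rows are ordered by the given column (1=published, 2=updated).

    ISO 8601 timestamps in the same format compare correctly as plain strings.
    """
    values = [row[column] for row in rows]
    return values == sorted(values, reverse=descending)


def fetch_entries(url):
    """Download the feed and parse it (needs network access)."""
    with urlopen(url) as response:
        return parse_entries(response.read())
```

Calling `fetch_entries(build_query_url("cs.AI"))` and passing the result to `is_date_sorted(rows, 1)` and `is_date_sorted(rows, 2)` shows whether the relevance-sorted page happens to follow either date order.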

Desired solution

A more detailed explanation in the API documentation of how sorting by relevance works. In particular, it should state which parameters determine the order of the results, and how results are ordered when all publications are equally relevant to the search query.

massimobrivio added the enhancement label Feb 25, 2025
jweiskoff self-assigned this Feb 25, 2025
jweiskoff (Contributor) commented

Fixed in #1100

jweiskoff (Contributor) commented

Unfortunately this is just standard Lucene's RELEVANCE sort from an older version of the documentation. It's probably using some internal optimization that's not clearly explained on that page.
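
To make the "equally relevant" case concrete, here is a small Python sketch of a score-ordered sort with a tie-breaker. It is an illustration under assumptions, not arXiv's or Lucene's actual code: the `Hit` type, the `doc_id` field, and the sample values are hypothetical, and the tie-break rule (lower internal document ID first) reflects how Lucene's default score-based ordering is generally understood to work. Whether the arXiv backend behaves this way is not confirmed in this thread.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: int    # hypothetical internal index position (roughly, indexing order)
    label: str     # placeholder name for a paper
    score: float   # relevance score assigned by the search engine


def rank_by_relevance(hits):
    """Order hits by score, highest first; equal scores fall back to lower doc_id."""
    return sorted(hits, key=lambda h: (-h.score, h.doc_id))


hits = [
    Hit(doc_id=42, label="paper-C", score=1.0),
    Hit(doc_id=7, label="paper-A", score=1.0),
    Hit(doc_id=19, label="paper-B", score=1.0),
]

# All scores are tied, so the order is decided purely by doc_id.
ranked = [h.label for h in rank_by_relevance(hits)]
```

Under that assumption, a result page with identical scores would reflect index order rather than publication or update date, which would be consistent with the behaviour described in the issue.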

massimobrivio (Author) commented Feb 26, 2025

> Unfortunately this is just standard Lucene's RELEVANCE sort from an older version of the documentation. It's probably using some internal optimization that's not clearly explained on that page.

@jweiskoff

I opened a clarification request with Apache Lucene; they might be able to explain the internal workings of their sorting algorithm.
Link to issue: apache/lucene#14295

jweiskoff (Contributor) commented

I suspect that you won't get much feedback from them, since this could be considered a trade secret, but maybe. 🤷🏼
