filtering terms for alphanumeric filter #79

Open
gsfk opened this issue May 2, 2023 · 9 comments

Comments

@gsfk
Collaborator

gsfk commented May 2, 2023

I use "alphanumeric" filters for key/operator/value queries... motivated partly by using phenopackets, where many fields (eg sex) expect values from an enum, not ontology terms.

From what I understand, the goal of the /filtering_terms endpoint is to make the data in a particular beacon discoverable, but it's not clear to me what to return in /filtering_terms for these, if in fact I can return anything at all, since the only fields in a filtering terms result are type / id / label. It seems odd to pack values into the "label" field.

For a simplified example, phenopacket sex is an enum of UNKNOWN_SEX, FEMALE, MALE, OTHER_SEX. So do I want four filtering terms? eg:

    [
        {
            "type": "alphanumeric",
            "id": "subject.sex",
            "label": "UNKNOWN_SEX"
        },
        {
            "type": "alphanumeric",
            "id": "subject.sex",
            "label": "FEMALE"
        }
    ]

... and so on for MALE and OTHER_SEX. "label" doesn't really make sense in this context. I would prefer something along the lines of:

    {
        "type": "alphanumeric",
        "id": "subject.sex",
        "options": ["UNKNOWN_SEX", "FEMALE", "MALE", "OTHER_SEX"]
    }

... but this seems far from the spec. Possibly the issue is that these kinds of metadata queries are not considered "terms" so weren't really expected here. But I'm puzzled why the spec for filters and the spec for filtering term results are so far apart.
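To make the two shapes concrete, here is a minimal Python sketch that builds both from the same enum. The helper names are made up for illustration, and the "options" field is only the proposal from this comment, not something the current spec defines.

```python
# Phenopackets sex enum values, as listed in the comment above.
SEX_VALUES = ["UNKNOWN_SEX", "FEMALE", "MALE", "OTHER_SEX"]


def terms_per_value(field_id, values):
    """One filtering term per enum value, with the value packed into 'label'."""
    return [
        {"type": "alphanumeric", "id": field_id, "label": value}
        for value in values
    ]


def term_with_options(field_id, values):
    """A single term listing the allowed values.

    'options' is a hypothetical field (the proposal in this issue),
    not part of the current filtering terms response.
    """
    return {"type": "alphanumeric", "id": field_id, "options": list(values)}


flat = terms_per_value("subject.sex", SEX_VALUES)
grouped = term_with_options("subject.sex", SEX_VALUES)
```

The first form fits the existing type / id / label fields but repeats the same id four times; the second keeps one entry per field at the cost of a new key.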

@mbaudis
Member

mbaudis commented May 2, 2023

@gsfk Your first option looks correct per spec (field id in the case of numeric or alphanumeric fields) but IMO this description is problematic since one usually would match on the id. Also, the field id does not make much sense only for alphanumeric terms since the same "where does this map" applies to ontology terms, too.

IMO the correct way (but in conflict w/ the current description) would be

    - type: alphanumeric
      id: UNKNOWN_SEX
      label: unknown genotypic sex
    - type: alphanumeric
      id: FEMALE
      label: female genotypic sex

However: I suggest you use an ontology term for I/O and map.
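A rough sketch of what "use an ontology term for I/O and map" could look like on the server side, in Python. The CURIEs below are placeholders, not real ontology identifiers, and the function names are hypothetical; a real deployment would substitute terms from whichever ontology it standardizes on.

```python
# Placeholder CURIEs only: replace with real ontology identifiers before use.
SEX_TO_CURIE = {
    "UNKNOWN_SEX": "EXAMPLE:0000000",
    "FEMALE": "EXAMPLE:0000001",
    "MALE": "EXAMPLE:0000002",
    "OTHER_SEX": "EXAMPLE:0000003",
}

# Reverse lookup, used when a query arrives with an ontology term.
CURIE_TO_SEX = {curie: value for value, curie in SEX_TO_CURIE.items()}


def to_internal(filter_id):
    """Translate an incoming ontology-term filter id into the stored enum value."""
    try:
        return CURIE_TO_SEX[filter_id]
    except KeyError:
        raise ValueError(f"unsupported filter id: {filter_id!r}")


def to_external(enum_value):
    """Translate a stored enum value into the ontology term exposed to clients."""
    return SEX_TO_CURIE[enum_value]
```

This only works when every stored value has a term to map to, which is the limitation raised in the next comment.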

@gsfk
Collaborator Author

gsfk commented May 2, 2023

It would be great if I could map every key-value-operator tuple to an ontology term... that's possible for this simple case but prohibitive or outright impossible elsewhere.

I would want /filtering_terms to return something along the lines of "here are the filters you can use" but it seems more like "here are ontology terms you can use" which is not as helpful for my own hacky use case.
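For readers unfamiliar with the key-value-operator filters discussed here, the following Python sketch shows one way such a filter could be evaluated against a phenopacket-like record. The operator set, the dotted-path lookup, and the function names are assumptions for illustration, not taken from any particular Beacon implementation.

```python
import operator

# Assumed operator set for alphanumeric filters.
OPERATORS = {
    "=": operator.eq,
    "!": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def get_path(record, dotted_path):
    """Follow a dotted path such as 'subject.sex' through nested dicts."""
    value = record
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(record, flt):
    """Return True if the record satisfies an id/operator/value filter."""
    value = get_path(record, flt["id"])
    if value is None:
        return False
    return OPERATORS[flt["operator"]](value, flt["value"])


record = {"subject": {"sex": "FEMALE"}}
sex_filter = {"id": "subject.sex", "operator": "=", "value": "FEMALE"}
```

A client can only build `sex_filter` if it already knows both the path and the allowed values, which is the discoverability gap this issue is about.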

@mbaudis
Member

mbaudis commented May 2, 2023

@gsfk You're right w/ the numerical etc. values; I don't think we have an expression for a response element here. It would actually make sense to define this key-value-operator concept in a filtering terms response, or to provide the values as a range.

But for "mappable" enums etc. -> again, mostly doable though I strongly suggest using CURIEs & resolving locally.

Our example:

  "responseSummary": {
    "exists": true,
    "numTotalResults": 7980
  }

... filtering terms which we use e.g. for front end autocompletes in http://progenetix.org/search/:

Sure, this could become unmanageable... But your example of having the options listed is just slightly more compact (and we have e.g. 800 active NCIT diagnostic terms - long list).

@mbaudis
Member

mbaudis commented May 2, 2023

@gsfk ... for numerical values you may want to define sensible classes, e.g.

    - ageAtCollection:
        id: ">=18"
    - ageAtCollection:
        id: ">=65"

... as reported filtering options. Those also exist as ontology classes, somewhere, so for the sake of cross-Beacon/aggregator compatibility they should be provided as terms. See also the Phenopackets timeElement: obviously, for storage/transfer you may want to keep exact values, but for querying purposes the classes are more compatible (in principle, if implemented).
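The idea of keeping exact values in storage while advertising a few classes for querying could be sketched in Python as below. The class strings come from the example above; the parsing helper and function names are assumptions, and ages are taken to be whole years for simplicity.

```python
import operator

# Classes advertised as filtering options, from the example above.
AGE_CLASSES = [">=18", ">=65"]

# Two-character operators are listed first so ">=" is not read as ">".
_OPS = [
    (">=", operator.ge),
    ("<=", operator.le),
    (">", operator.gt),
    ("<", operator.lt),
    ("=", operator.eq),
]


def parse_class(class_id):
    """Split a class id such as '>=18' into a comparison function and a threshold."""
    for symbol, func in _OPS:
        if class_id.startswith(symbol):
            return func, int(class_id[len(symbol):])
    raise ValueError(f"unrecognised class id: {class_id!r}")


def classes_for_age(age_years, classes=AGE_CLASSES):
    """Return every advertised class that an exact stored age falls into."""
    matched = []
    for class_id in classes:
        compare, threshold = parse_class(class_id)
        if compare(age_years, threshold):
            matched.append(class_id)
    return matched
```

With this, a stored age of 70 falls into both classes, while the exact value never has to appear in the filtering terms response.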

@mbaudis
Member

mbaudis commented May 31, 2023

@gsfk I have tried to make the documentation a bit clearer for age etc. ... http://docs.genomebeacons.org/filters/#pseudo-numerical-value-queries
This is based on our own way to handle this now. Still, have to think about the best representation in the filtering_terms endpoint (not doing this right now).

@jrambla
Contributor

jrambla commented Jun 13, 2023

I would suggest another approach:

    - type: custom
      id: UNKNOWN_SEX
      label: unknown genotypic sex
      scope: subject.sex

as you are not actually doing any comparison here, just using a non-ontologized dictionary.

@gsfk could you elaborate on why this construct doesn't fit your use case?
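One way to see how this construct relates to the "options" shape requested at the top of the thread: terms carrying a scope can be grouped client-side into a per-field list of allowed values. The Python sketch below assumes "scope" is a single path string, as in the example above; the second term and the function name are illustrative only.

```python
from collections import defaultdict

# Terms in the shape suggested above; the second entry is added for illustration.
terms = [
    {"type": "custom", "id": "UNKNOWN_SEX",
     "label": "unknown genotypic sex", "scope": "subject.sex"},
    {"type": "custom", "id": "FEMALE",
     "label": "female genotypic sex", "scope": "subject.sex"},
]


def options_by_scope(filtering_terms):
    """Group term ids by scope, giving the allowed values for each field path."""
    grouped = defaultdict(list)
    for term in filtering_terms:
        grouped[term["scope"]].append(term["id"])
    return dict(grouped)
```

Here `options_by_scope(terms)` yields one entry for `subject.sex` with both ids, so the two representations carry the same information.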

@gsfk
Collaborator Author

gsfk commented Jun 13, 2023

Is the suggestion to use scope to pick out a phenopackets path? That's interesting, although it doesn't fit the current spec.

Fundamentally all I really wanted was better alignment between filteringTerms and beaconFilteringTermsResults, because I don't understand why they're so far apart.

@mbaudis
Member

mbaudis commented Jun 13, 2023

@gsfk

> Fundamentally all I really wanted was better alignment between filteringTerms and beaconFilteringTermsResults, because I don't understand why they're so far apart.

Tru dat (and on the radar).

@mbaudis
Member

mbaudis commented Jun 13, 2023

@jrambla @gsfk Maybe we should think about having the scope specifically documented to refer to the default model?! Would be a great way to solve some misunderstandings (and actually get back to something we had discussed before settling on the simple filters approach).
