Question: AND/OR combined filter logic #120
Replies: 28 comments
-
@reisingerf This is IMO one of the areas where we need an addition to the current behaviour, which does not necessarily mean that we have to change the protocol itself. I can easily come up with 4 scenarios:
For me, nos. 2 and 3 look more attractive than 1. In fact, we already follow 3 in
This could be documented as the default if we could agree on it in the wider Beacon community, and it should be rather easy to support server side. Option 3 (e.g. doing something like
Summary: I would love to see a call for / discussion of these (and other) scenarios, with my personal support going to 3 at the start.
-
Hi all, the current specification (e.g. the individuals endpoint) uses the notation
The reference implementation supports the notation
Note that notation
[edit] Regarding the POST queries, we can extend FilteringTerm with an `either` flag: `filters: [{"id": "A"}, {"id": "B", "either": true}, {"id": "C", "either": true}, {"id": "D"}]`. This way we won't break the spec too much... Best, Dmitry
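One possible reading of the `either` extension, as a sketch: consecutive terms flagged `"either": true` form one OR group, and the groups are AND-ed, so the example list reads `A AND (B OR C) AND D`. The grouping rule and the names `FilterTerm`, `groupEitherTerms` and `evaluate` are assumptions for illustration, not part of the Beacon spec.

```typescript
// Hypothetical shape of a POST filtering term carrying the proposed "either" flag.
interface FilterTerm {
  id: string;
  either?: boolean;
}

// Assumed rule: consecutive terms flagged `either: true` are OR-ed together;
// every resulting group is AND-ed with the others.
function groupEitherTerms(filters: FilterTerm[]): string[][] {
  const groups: string[][] = [];
  let previousWasEither = false;
  for (const term of filters) {
    const isEither = term.either === true;
    if (isEither && previousWasEither) {
      // Extend the current OR group.
      groups[groups.length - 1].push(term.id);
    } else {
      // Start a new group (a plain term is a one-element group).
      groups.push([term.id]);
    }
    previousWasEither = isEither;
  }
  return groups;
}

// A record matches if every group contains at least one term the record carries.
function evaluate(groups: string[][], recordTerms: Set<string>): boolean {
  return groups.every((group) => group.some((id) => recordTerms.has(id)));
}

const filters: FilterTerm[] = [
  { id: "A" },
  { id: "B", either: true },
  { id: "C", either: true },
  { id: "D" },
];
console.log(JSON.stringify(groupEitherTerms(filters)));
// prints [["A"],["B","C"],["D"]]
```

Under this reading two OR groups cannot sit back to back, because adjacent `either` terms merge into a single group.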
-
We have documented comma-concatenation for parameters (see the "Attention" note "List Parameters in GET Requests").
There is a reason for this: some implementations do not resolve multiple instances of the same parameter into a list but keep only one value, and comma-concatenation (or
would be interpreted by
... which is IMO correct (though, as mentioned above, we would internally use OR for terms hitting the same parameter, which is not yet per spec). But I see something similar as an option to define OR using a character other than
or
... which would correspond to my option 2 above. Still, before introducing new syntax, I'd be very much in favour of solving the most common OR example by documenting the expected behaviour (i.e.
... since AND would lead to failing queries if the values are not in the same expansion tree).
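To illustrate why comma-concatenation is the safer convention: with the standard WHATWG `URLSearchParams` API, `get()` returns only the first value of a repeated parameter, while `getAll()` returns all of them. A tolerant server could accept both forms, as in this sketch; the helper name `parseFilterList` is hypothetical.

```typescript
// Hypothetical helper: collect every value of `filters`, whether it was sent as
// repeated parameters (?filters=A&filters=B) or comma-concatenated (?filters=A,B).
function parseFilterList(query: string): string[] {
  const params = new URLSearchParams(query);
  return params
    .getAll("filters")
    .flatMap((value) => value.split(","))
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
}

// URLSearchParams.get() keeps only the first of several repeated values.
console.log(new URLSearchParams("filters=A&filters=B").get("filters")); // prints A
console.log(parseFilterList("filters=A&filters=B").join(" ")); // prints A B
console.log(parseFilterList("filters=A,B").join(" ")); // prints A B
```

The naive split assumes that no filter value itself contains a comma.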
-
The same holds for the current BSC implementation (it just unrolls A,B,C).
I am OK with this convention; unfortunately there is no way to express it in the OpenAPI :-( In the OpenAPI it is either
-
@redmitry and @mbaudis I have been thinking about a solution with filter groups. For example
This would be easier to parse and validate. Filter groups are
This, unfortunately, is not spec compliant but can be supported alongside the specification. Detection of filter groups should not be difficult. This may also be translated into GETs as follows:
-
My concern is how many compatible implementations will fail to parse this.
-
Compatibility: Yes, and therefore I go back to my suggestion to document a practical and logical use scenario as:
"While the Beacon v2 framework defines the general chaining of multiple filter values with an AND logic, filters applied against the same parameter are expected to be treated as OR¹."
This clarification then serves the discussed scenarios and could, by agreement, directly find its way into the documentation. Does anybody want to set up a poll?
Footnotes
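The proposed rule (AND across parameters, OR for several terms hitting the same parameter) can be sketched as grouping the terms by the field they constrain. How a server maps an ontology term to a field is implementation-specific; the `fieldOf` lookup table and the term IDs below are hypothetical.

```typescript
// Hypothetical mapping from filter term to the record field it constrains.
const fieldOf: Record<string, string> = {
  "DIS:1": "disease",
  "DIS:2": "disease",
  "DIS:3": "disease",
  "SEX:M": "sex",
};

// Group terms by target field: terms in one group are OR-ed, groups are AND-ed.
function groupByField(terms: string[], lookup: Record<string, string>): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const term of terms) {
    const field = lookup[term];
    if (field === undefined) throw new Error(`unknown filter term: ${term}`);
    if (!groups[field]) groups[field] = [];
    groups[field].push(term);
  }
  return groups;
}

// A record lists the terms it carries per field; it matches if every group has a hit.
function matches(groups: Record<string, string[]>, record: Record<string, string[]>): boolean {
  return Object.keys(groups).every((field) =>
    groups[field].some((term) => (record[field] ?? []).includes(term)),
  );
}

// Reads as: sex is SEX:M AND disease is one of DIS:1, DIS:3
const groups = groupByField(["SEX:M", "DIS:1", "DIS:3"], fieldOf);
console.log(JSON.stringify(groups)); // prints {"sex":["SEX:M"],"disease":["DIS:1","DIS:3"]}
```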
-
Hi everybody, I hope I'm not too late to the conversation.
-
I'm just not clear how the POST would look - like this?
... with the root boolean being the AND (which could be modified through a global directive; separate issue). The MongoDB syntax here (used for modelling; YMMV) could also be rewritten (though with a slightly different meaning, i.e. you have one filter with an alternative
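A sketch of what such a nested POST body could look like in neutral (non-MongoDB) terms: the top-level `filters` array is implicitly AND-ed, and a node may hold a nested group with its own `logic`. The `FilterNode` shape and its `logic`/`filters` keys are hypothetical, not taken from the spec.

```typescript
// Hypothetical recursive filter node: either a plain term or a group with its own logic.
type FilterNode =
  | { id: string }
  | { logic: "AND" | "OR"; filters: FilterNode[] };

// Evaluate a node against the set of terms a record carries.
function evalNode(node: FilterNode, terms: Set<string>): boolean {
  if ("id" in node) return terms.has(node.id);
  const results = node.filters.map((child) => evalNode(child, terms));
  return node.logic === "AND" ? results.every(Boolean) : results.some(Boolean);
}

// The root list is AND-ed, matching the current default behaviour.
function evalRoot(filters: FilterNode[], terms: Set<string>): boolean {
  return evalNode({ logic: "AND", filters }, terms);
}

// A AND (B OR C)
const body: FilterNode[] = [
  { id: "A" },
  { logic: "OR", filters: [{ id: "B" }, { id: "C" }] },
];
console.log(evalRoot(body, new Set(["A", "C"]))); // prints true
console.log(evalRoot(body, new Set(["B", "C"]))); // prints false
```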
-
The approach with
I hope we'll steer well away from the Mongo syntax or flavour, because the Beacon spec could allow custom operators in this place, which could be confused with Mongo syntax.
-
Like
-
Answering the original question from @reisingerf: the spec only supports AND operators. My 2 cents
-
As for which syntax to use, I would advocate following what the HTTP spec says (if anything).
-
But ... isn't that a given? The moment you use filters, you need to be able to apply them to different entry types, which basically implies aggregation of queries / map-reduce ...
-
Sorry to chime in; I haven't been working on this, so please take it as an outsider's view: I'd personally stay away from complex query/search options. As soon as you expand into complex search, you'd end up with tons of issues/questions:
I don't think you could address the needs of all clients anyway; you'd always be playing catch-up with new requirements/use cases. To me it's more important that a client can get to the data consistently across many implementations/beacons than that it can run complex queries. If I wanted the complexity, I'd probably get all the data and do it locally anyway. Mind you, there may be a middle way to cater for most (common) use cases, but... In the end, would more complex queries really help with Beacon adoption?
-
Totally agree
-
After yesterday's Beacon meeting and the general agreement about this, I've updated the docs to emphasize "currently always AND (but logically process OR for terms against the same parameter, without special syntax)". Updated documentation:
-
We can implement "plain" logic following the simple rule:
The Java implementation now has this scheme implemented:

```json
{
  "filters": [
    {
      "id": "NCIT:C20197"
    },
    {
      "id": "Weight",
      "operator": ">",
      "value": "80"
    },
    {
      "id": "BMI",
      "operator": ">",
      "value": "25",
      "logic": "or"
    }
  ]
}
```

`"NCIT:C20197" "Weight>100" | "BMI>30" => "NCIT:C20197" & ("Weight>100" | "BMI>30")`

https://beacons.bsc.es/beacon/v2.0.0/individuals?filters=NCIT:C20197&filters=Weight>100,BMI>30

Best, Dmitry
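Read this way, the flat rule turns the list into a conjunction of OR groups (conjunctive normal form): a term with `"logic": "or"` joins the group of the term before it, and every other term opens a new AND-ed group. A minimal sketch, assuming numeric comparison for `operator`/`value` terms and plain presence for ontology terms; the record shape, the operator set and the names `toCnf`, `evalTerm` and `matchesCnf` are hypothetical.

```typescript
// Filter term as in the JSON above; "logic": "or" is the proposed flat extension.
interface FlatFilter {
  id: string;
  operator?: "=" | "<" | ">" | "<=" | ">=";
  value?: string;
  logic?: "or";
}

// Hypothetical record: ontology terms it carries plus numeric measurements by id.
interface RecordData {
  terms: Set<string>;
  values: Record<string, number>;
}

// A term flagged "logic": "or" joins the previous group; any other term opens a new group.
function toCnf(filters: FlatFilter[]): FlatFilter[][] {
  const groups: FlatFilter[][] = [];
  for (const f of filters) {
    if (f.logic === "or" && groups.length > 0) groups[groups.length - 1].push(f);
    else groups.push([f]);
  }
  return groups;
}

// Ontology-style terms test presence; operator terms compare numerically.
function evalTerm(f: FlatFilter, rec: RecordData): boolean {
  if (f.operator === undefined) return rec.terms.has(f.id);
  const actual = rec.values[f.id];
  const expected = Number(f.value);
  if (actual === undefined || Number.isNaN(expected)) return false;
  switch (f.operator) {
    case "=": return actual === expected;
    case "<": return actual < expected;
    case ">": return actual > expected;
    case "<=": return actual <= expected;
    default: return actual >= expected; // ">="
  }
}

// Every group (AND) must contain at least one satisfied term (OR).
function matchesCnf(groups: FlatFilter[][], rec: RecordData): boolean {
  return groups.every((group) => group.some((f) => evalTerm(f, rec)));
}

const filters: FlatFilter[] = [
  { id: "NCIT:C20197" },
  { id: "Weight", operator: ">", value: "80" },
  { id: "BMI", operator: ">", value: "25", logic: "or" },
];
const cnf = toCnf(filters);
console.log(cnf.map((g) => g.map((f) => f.id).join(" | ")).join(" & "));
// prints NCIT:C20197 & Weight | BMI
```

A first term flagged `"or"` has no predecessor, so this sketch simply opens a new group for it. The GET form in the URL above has the same shape: separate `filters` parameters are AND-ed, comma-separated values inside one parameter are OR-ed.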
-
@redmitry The simplification offered does not address the need for grouping logical conditions. If a flat form is required for the logical expression, Polish notation should be used to encode the logic, which is then decoded on the backend.
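To make the Polish-notation suggestion concrete: a prefix encoding needs no brackets, because each operator's arity fixes where its operands end. A minimal sketch, assuming binary `&` and `|`, unary `!`, space-separated tokens and filter IDs without spaces; the function names are hypothetical.

```typescript
// Boolean expression tree over filter IDs.
type Expr =
  | { kind: "term"; id: string }
  | { kind: "not"; arg: Expr }
  | { kind: "and" | "or"; left: Expr; right: Expr };

// Encode in prefix (Polish) notation: operator first, then its operands.
function toPolish(e: Expr): string {
  switch (e.kind) {
    case "term":
      return e.id;
    case "not":
      return `! ${toPolish(e.arg)}`;
    default:
      return `${e.kind === "and" ? "&" : "|"} ${toPolish(e.left)} ${toPolish(e.right)}`;
  }
}

// Decode by consuming tokens left to right; each operator reads its operands recursively.
function fromPolish(input: string): Expr {
  const tokens = input.trim().split(/\s+/);
  let pos = 0;
  function parse(): Expr {
    const tok = tokens[pos++];
    if (tok === undefined) throw new Error("unexpected end of expression");
    if (tok === "!") return { kind: "not", arg: parse() };
    if (tok === "&" || tok === "|") {
      const left = parse();
      const right = parse();
      return { kind: tok === "&" ? "and" : "or", left, right };
    }
    return { kind: "term", id: tok };
  }
  const expr = parse();
  if (pos !== tokens.length) throw new Error("trailing tokens");
  return expr;
}

// Evaluate against the set of terms a record carries.
function evalExpr(e: Expr, terms: Set<string>): boolean {
  switch (e.kind) {
    case "term":
      return terms.has(e.id);
    case "not":
      return !evalExpr(e.arg, terms);
    default:
      return e.kind === "and"
        ? evalExpr(e.left, terms) && evalExpr(e.right, terms)
        : evalExpr(e.left, terms) || evalExpr(e.right, terms);
  }
}

const example = fromPolish("| & A | B C D"); // (A & (B | C)) | D
console.log(evalExpr(example, new Set(["A", "C"]))); // prints true
console.log(evalExpr(example, new Set(["A"]))); // prints false
```

In a GET request the spaces, `&` and `|` in such a string would still need URL encoding.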
-
equals '(A & (B | C)) | D' and is unresolvable under the proposed 'flat' rules.
That's true: the proposed "flat" logic may not provide complex grouping. Kind regards, D.
-
@redmitry and @mbaudis we implemented something along the lines you are discussing here, for searching variants and associated datasets, and I have a strong feeling the approach could contribute here as well. We created a mutation format that is as follows. Example query:
Our original implementation could handle single queries like

```typescript
// Parse an expression of mutations joined by logic operators into a combination
// string over mutation indices plus the list of parsed mutations.
function parseMutationExpression(
  expression: string,
): [string, Record<string, string | undefined>[]] {
  const _expression = expression.replace(/\s/g, "");
  // Optional "<refName>-" prefix, then reference bases, position, alternate bases.
  const varRegex =
    /^((?<refName>[0-9a-zA-Z]+)-)?(?<refBases>[a-z]+)(?<start>\d+)(?<altBases>[a-z]+)/i;
  // Grouping and logic characters; ":" is accepted as an alias for "&".
  const logicRegex = /[!&:()|]/;
  const queries: Record<string, string | undefined>[] = [];
  let combination = "";
  let index = 0;
  for (let pos = 0; pos < _expression.length; pos++) {
    const char = _expression[pos];
    if (logicRegex.test(char)) {
      // consume the groupings (parentheses) and logic (!, &, |)
      combination += char === ":" ? "&" : char;
    } else {
      // consume the mutation
      const remainder = _expression.slice(pos);
      const matches = remainder.match(varRegex);
      // must have a match starting here, else skip this character
      if (matches?.groups) {
        queries.push({ ...matches.groups });
        combination += index.toString();
        // jump over the match and increment index
        pos += matches[0].length - 1;
        index += 1;
      }
    }
  }
  return [combination, queries];
}
```

Our approach is backwards compatible and does not break the existing workflow. Only shift gears with the extra
For the metadata scenario, we can use a similar approach. We do not change the filters in
We just have to add an extra query param as follows
Note that we can URL-encode to include these combination characters, like
Similarly for
This is a slight change in protocol but should add immense flexibility for the community. My colleague and lead dev on this implementation, @bhosking, can chime in with more details if needed. Very happy to hear your thoughts. Cheers,
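The `[combination, queries]` pair returned by `parseMutationExpression` still has to be evaluated once each sub-query has run. A minimal sketch of that step, assuming each sub-query yields a set of matching IDs, that `&` means intersection, `|` union and `!` the complement against all known IDs, and the usual precedence `!` over `&` over `|`; `evaluateCombination` and its arguments are hypothetical, not the authors' code.

```typescript
// Evaluate a combination string such as "(0&1)|!2", where each number is the
// index of a sub-query and results[i] holds the IDs matched by sub-query i.
function evaluateCombination(
  combination: string,
  results: Set<string>[],
  universe: Set<string>,
): Set<string> {
  let pos = 0;
  const peek = (): string => combination.charAt(pos); // "" past the end

  // or := and ("|" and)*
  function parseOr(): Set<string> {
    let acc = parseAnd();
    while (peek() === "|") {
      pos++;
      const rhs = parseAnd();
      acc = new Set(Array.from(acc).concat(Array.from(rhs)));
    }
    return acc;
  }

  // and := not ("&" not)*
  function parseAnd(): Set<string> {
    let acc = parseNot();
    while (peek() === "&") {
      pos++;
      const rhs = parseNot();
      acc = new Set(Array.from(acc).filter((id) => rhs.has(id)));
    }
    return acc;
  }

  // not := "!" not | "(" or ")" | index
  function parseNot(): Set<string> {
    if (peek() === "!") {
      pos++;
      const inner = parseNot();
      return new Set(Array.from(universe).filter((id) => !inner.has(id)));
    }
    if (peek() === "(") {
      pos++;
      const inner = parseOr();
      if (peek() !== ")") throw new Error(`expected ")" at ${pos}`);
      pos++;
      return inner;
    }
    const match = /^\d+/.exec(combination.slice(pos));
    if (!match) throw new Error(`expected a sub-query index at ${pos}`);
    pos += match[0].length;
    const result = results[Number(match[0])];
    if (result === undefined) throw new Error(`no sub-query ${match[0]}`);
    return result;
  }

  const out = parseOr();
  if (pos !== combination.length) throw new Error(`unexpected "${peek()}" at ${pos}`);
  return out;
}

const universe = new Set(["s1", "s2", "s3", "s4"]);
const results = [new Set(["s1", "s2"]), new Set(["s2", "s3"]), new Set(["s3"])];
console.log(Array.from(evaluateCombination("(0&1)|!2", results, universe)).sort());
```

Reading multi-digit indices greedily keeps sub-query 10 distinct from sub-queries 1 and 0.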
-
Hi all,
Introducing negation would allow standard logic transformation rules and the removal of nested brackets, at the expense of less readable logic.
@anuradhawick As I mentioned in the presentation:
My point is that the "flat" logic is an additional step that could be hard to use, and we can always opt for the additional property
Cheers, Dmitry
-
@anuradhawick, all (this is a bit OT for the specific filters topic, but since variants were used in the example and the general principle applies): I just want to point out that for variant queries IMO we will move to "typed" queries (
@redmitry Per-filter negation is IMO a given, and there is an issue somewhere (also aligning this with Phenopackets). Also, since all variant query parameters can be provided as GET or POST, in no defined order, they should be evaluated separately for correct form and then checked against the (implicit, or better, future explicit) query type.
... without checking character compatibility :-) It is easier to write in JSON, with a slightly more complex example (here matching cases that have either a given protein sequence variant in CDKN2A OR a combination of a CNV DEL overlapping the gene together with a specific sequence alteration in the remaining allele; I made this up, partially...):

```json
{
  "g_variants": {
    "variantQueryGroups": {
      "logic": "OR",
      "groups": [
        {
          "logic": "AND",
          "variantQueries": [
            {
              "genomicBracketQuery": {
                "referenceName": "refseq:NC_000009.12",
                "start": [21000001, 21975098],
                "end": [21967753, 23000000],
                "variantMinLength": 1000,
                "variantType": "EFO:0030067"
              }
            },
            {
              "genomicSequenceQuery": {
                "referenceName": "refseq:NC_000009.12",
                "start": 21675032,
                "alternateBases": "A"
              }
            }
          ]
        },
        {
          "variantQueries": [
            {
              "aminoacidAlterationQuery": {
                "geneId": "CDKN2A",
                "aminoacidAlteration": "D153Y"
              }
            }
          ]
        }
      ]
    }
  }
}
```

So, using this, I just want to caution against very specific implementations ... The above example is also designed to allow the legacy parameters directly in the
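The `variantQueryGroups` structure is effectively a disjunction of conjunctions. A minimal sketch of how a server might evaluate it for one case, assuming a group without `logic` defaults to AND and that a backend-specific `matchQuery` callback decides whether a single typed query matches; the TypeScript types and function names are hypothetical, and a real implementation would translate the groups into database queries rather than test cases one by one.

```typescript
// One typed query, e.g. { genomicSequenceQuery: {...} }; the payload is left opaque here.
type VariantQuery = Record<string, unknown>;

interface VariantQueryGroup {
  logic?: "AND" | "OR";
  variantQueries: VariantQuery[];
}

interface VariantQueryGroups {
  logic?: "AND" | "OR";
  groups: VariantQueryGroup[];
}

// Combine boolean results according to a logic keyword; AND is assumed as the default.
function combine(logic: "AND" | "OR" | undefined, results: boolean[]): boolean {
  return logic === "OR" ? results.some(Boolean) : results.every(Boolean);
}

function evaluateGroups(
  spec: VariantQueryGroups,
  matchQuery: (q: VariantQuery) => boolean,
): boolean {
  const groupResults = spec.groups.map((g) =>
    combine(g.logic, g.variantQueries.map((q) => matchQuery(q))),
  );
  return combine(spec.logic, groupResults);
}

// Same structure as the example above, with the query payloads elided.
const spec: VariantQueryGroups = {
  logic: "OR",
  groups: [
    { logic: "AND", variantQueries: [{ genomicBracketQuery: {} }, { genomicSequenceQuery: {} }] },
    { variantQueries: [{ aminoacidAlterationQuery: {} }] },
  ],
};
// Stand-in matcher: pretend only the amino-acid query matches this case.
const onlyProtein = (q: VariantQuery) => "aminoacidAlterationQuery" in q;
console.log(evaluateGroups(spec, onlyProtein)); // prints true
```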
-
@redmitry I have some concerns about the following from your slides.
The conversion - this could be complex if we had
Maybe what @mbaudis has proposed with query groups would solve this problem?
-
The current procedure for filters is "AND". The main idea was to introduce implicit grouping:
Because
The problem is that I want to be able to provide grouping like:
Following the "rules"
As I already said, these "rules" of grouping may not provide complex grouping. Since I got many questions about the proposed "logic" even from technical people, I am afraid it is hardly usable by "normal" people... I am leaning towards adding a common "filters_logic" field like
Cheers, D.
-
I agree that the need to "translate" the logic statement into a specific form can confuse API users and lead to unexpected results. Plus, I'm concerned about its limited applicability. @anuradhawick Your solution/proposal is clearer for users, though I would call the new parameter queryExpression (vs queryCombination). I have a question: how do you get around the use of non-standard URL encoding for GET operations? The proposal from @mbaudis resembles DNF: (group of expressions joined by AND) OR (group of expressions joined by AND), which is one of the two forms (DNF or CNF) in our proposal.
-
@zykonda for GET requests, we must use URL encoding (usually handled by web frameworks to ensure safety, etc.). This should not be a huge problem, as Postman etc. support this out of the box. At the moment, though, we use POST requests from our GUI. For a GET request, the query expression
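For illustration, the standard `encodeURIComponent` already escapes the combination characters that are unsafe in a query string (`&` would otherwise be read as a parameter separator), and the standard `URL` parser restores them on the server. The parameter name `queryCombination` is taken from this discussion; the base URL is a placeholder, and a real request would also carry the sub-query parameters.

```typescript
// Build a GET URL carrying a combination expression (hypothetical helper).
function buildUrl(base: string, combination: string): string {
  return `${base}?queryCombination=${encodeURIComponent(combination)}`;
}

const url = buildUrl("https://beacon.example.org/api/g_variants", "(0&1)|!2");
console.log(url);
// prints https://beacon.example.org/api/g_variants?queryCombination=(0%261)%7C!2

// On the server, the standard URL parser percent-decodes the value again.
const decoded = new URL(url).searchParams.get("queryCombination");
console.log(decoded); // prints (0&1)|!2
```

Note that `encodeURIComponent` leaves `(`, `)` and `!` unescaped, which is legal in a query string.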
-
Here is the proposal for supporting an OR logic operator in the GA4GH Beacon API standard, presented by Microsoft Research at GA4GH Connect in April 2024 (the detailed deck is available here: https://docs.google.com/presentation/d/19gXn516tHPFQTRdONkbCNUHqBXnE6CC0/edit?usp=sharing&ouid=107275616466533633561&rtpof=true&sd=true).
Current standard
Technical problem
FilteringNormalForm (working name) proposal
FilteringNormalForm is supposed to join FilteringTerms elements with OR.
Theoretical proof
Note: it should be up to the requestor whether to use FilteringNormalForm to union several filtering-terms structures or to skip directly to FilteringTerms only.
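As we read the proposal, a FilteringNormalForm is a list of FilteringTerms lists: terms inside each inner list are AND-ed (today's behaviour) and the inner lists are OR-ed, i.e. disjunctive normal form. A sketch of how a server could accept either shape, so that requesters who skip the new structure are unaffected; the type names, the `isNormalForm` check and the reduction of a term to its `id` are hypothetical simplifications.

```typescript
// Simplified filtering term: only ontology-style IDs are handled in this sketch.
interface FilteringTerm {
  id: string;
}

// Hypothetical request shapes: plain FilteringTerms (AND) or a normal form (OR of ANDs).
type FilteringTerms = FilteringTerm[];
type FilteringNormalForm = FilteringTerms[];

function isNormalForm(filters: FilteringTerms | FilteringNormalForm): filters is FilteringNormalForm {
  return filters.length > 0 && Array.isArray(filters[0]);
}

// Plain FilteringTerms are treated as a normal form with a single AND group.
function normalize(filters: FilteringTerms | FilteringNormalForm): FilteringNormalForm {
  return isNormalForm(filters) ? filters : [filters];
}

// A record matches if all terms of at least one group are present.
function matches(filters: FilteringTerms | FilteringNormalForm, recordTerms: Set<string>): boolean {
  return normalize(filters).some((group) => group.every((t) => recordTerms.has(t.id)));
}

const legacy: FilteringTerms = [{ id: "A" }, { id: "B" }]; // A AND B
const dnf: FilteringNormalForm = [[{ id: "A" }, { id: "B" }], [{ id: "C" }]]; // (A AND B) OR C
console.log(matches(legacy, new Set(["A", "B"]))); // prints true
console.log(matches(dnf, new Set(["C"]))); // prints true
```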
-
Hi all,
a quick question about supported/expected filtering logic, especially for ontology terms:
Reading the documentation I understand that the default filtering logic for multiple filters is AND (and that makes total sense), but there are more complex use cases.
For example: I am interested in data from male individuals that have one of 3 diseases (which are not hierarchically related).
Currently, I'd query multiple times and do the intersection/combination of results on the client side to get to my final answer.
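That client-side workaround amounts to plain set operations: one request per disease term, the union of the returned IDs, then the intersection with the result of the sex filter. A minimal sketch, assuming the per-query ID lists have already been fetched; the function names and IDs are hypothetical.

```typescript
// Union: IDs returned by at least one of the disease queries.
function union(lists: string[][]): Set<string> {
  const out = new Set<string>();
  for (const list of lists) for (const id of list) out.add(id);
  return out;
}

// Intersection: keep only IDs that also appear in the other result.
function intersect(a: Set<string>, b: string[]): Set<string> {
  const other = new Set(b);
  return new Set(Array.from(a).filter((id) => other.has(id)));
}

// male AND (disease1 OR disease2 OR disease3), computed from separate query results.
function combineResults(maleIds: string[], diseaseResults: string[][]): string[] {
  return Array.from(intersect(union(diseaseResults), maleIds)).sort();
}

const maleIds = ["ind1", "ind2", "ind4"];
const diseaseResults = [["ind1"], ["ind3", "ind4"], []];
console.log(combineResults(maleIds, diseaseResults).join(", ")); // prints ind1, ind4
```

Besides the extra round trips, this only works when the beacon returns record-level identifiers; Boolean or count-only responses cannot be combined this way.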
I was wondering if there is a defined/supported way to express a logic that combines AND with OR logic to perform more complex queries on the server side. Perhaps I just overlooked it in the docs or there's a semi-official solution?
My particular viewpoint is with regard to a beacon network and UI, with the following two opposing issues:
I have seen implementations using multiple ontology terms (space separated) per filter allowing an OR logic for terms within a filter and AND logic across filters. But I don't know if that is to be supported.
As I am fairly new to the beacon world I was hoping this has come up elsewhere and I could get some guidance/advice from more experienced people.
Any and all feedback is much appreciated! Thanks!
FYI: @victorskl @anuradhawick