Conversation
@egillax this shows a ~55% improvement, but it should be noted that this is measured after the lazy evaluation has completed. A better approach might be to render the SQL and just compare that. From a user perspective, ibis adds processing overhead that may make cohort generation slower.
@azimov did you run both duckdb and databricks on eunomia? From the numbers at least it looks like small data. Process overhead is probably constant as you scale, but we should definitely take note of it and explore further at some point. I can also test this on postgres and duckdb on real data locally, after this week though, since there's some maintenance going on in our server room. We should also note cohorts where there are big differences as places to investigate further. I think with the new engine we can have a custom materialization/caching strategy tuned to each backend; some backends may be really good at figuring out the best plan themselves while others are not (looking at you, postgres).
@egillax - the databricks testing is actually on healthverity, so this is a meaningful performance increase. I will make the script available separately from the duckdb one; however, I had to tweak the ibis code to build and execute the relation separately, which is kind of messy. I think performance tuning would be good, but this is something we may not have the ability to set for the user. For example, adaptive query execution and partition coalescing are configurable on spark, but most users probably won't be able to adjust these settings.
@egillax here is a basic benchmark. I've done this in R as that's where people currently interface with it. This is also pointing at your branch as I didn't want to step on any toes.
Core takeaway: when timing the process, running the cohorts is generally faster on ibis in duckdb, but the overall process is slower because of the overhead of generating the SQL. I'm not sure how much we care about that though. There are also probably significant savings we can make from re-using concept sets (which is in your branch) and other planning optimizations when generating multiple cohorts simultaneously.
I will add a report for databricks shortly