ENH: improve performance of read_dataframe if a filter is used #577

theroggy · 2025-09-13T01:10:50Z

In read_dataframe without arrow, the number of rows of the result was counted first, and then the full data was read.

Especially when using a filter, counting the rows can take significant time.

This PR avoids counting the rows before reading to improve performance. Results with the New Zealand building outlines GeoPackage (3.3 million rows) as test file:

- If the filter limits the rows a lot, counting the rows can take as long as the subsequent reading of the data, so in this case the time taken roughly halves.
  - E.g. reading the test file with where="ST_NPOINTS(st_buffer(geom, 10)) > 2000" (returning 9 rows) took 82 s, now 45 s.
- When reading the entire file without a filter, both implementations take 55-60 seconds on my Windows laptop (plugged in), with the new implementation giving the same average timings.
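To illustrate why skipping the row count helps most with an expensive filter, here is a minimal pure-Python sketch. It is not the actual Cython code in _io.pyx: `CountingSource`, `read_count_first` and `read_single_pass` are hypothetical names, and the toy source simply assumes that every pass over the data has to re-evaluate the filter for every row.

```python
from typing import Any, Callable, Iterable, Iterator, List


class CountingSource:
    """Toy stand-in for a filtered layer (hypothetical, not a pyogrio class).

    Every iteration re-applies the filter to every row and records how many
    filter evaluations were needed in total.
    """

    def __init__(self, rows: Iterable[Any], predicate: Callable[[Any], bool]):
        self.rows = list(rows)
        self.predicate = predicate
        self.evaluations = 0

    def __iter__(self) -> Iterator[Any]:
        for row in self.rows:
            self.evaluations += 1
            if self.predicate(row):
                yield row


def read_count_first(source: CountingSource) -> List[Any]:
    """Old approach: count the matching rows, preallocate, then read."""
    n_rows = sum(1 for _ in source)  # pass 1: only counts
    result: List[Any] = [None] * n_rows
    for i, row in enumerate(source):  # pass 2: reads the data
        result[i] = row
    return result


def read_single_pass(source: CountingSource) -> List[Any]:
    """New approach: read once and let the result grow as rows arrive."""
    result: List[Any] = []
    for row in source:
        result.append(row)
    return result


if __name__ == "__main__":
    def is_match(x: int) -> bool:
        return x % 100 == 0

    old = CountingSource(range(1000), is_match)
    new = CountingSource(range(1000), is_match)
    assert read_count_first(old) == read_single_pass(new)
    print("filter evaluations, count first:", old.evaluations)
    print("filter evaluations, single pass:", new.evaluations)
```

In this toy model the count-first variant evaluates the filter twice per row, which matches the observation above that a selective, expensive filter makes counting cost about as much as the read itself.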


theroggy added 4 commits September 8, 2025 09:02

ENH: improve performance of read_dataframe

f73eae3

ENH: improve performance of read_dataframe if a filter is used

ec645b5

Update CHANGES.md

1fad77a

Update _io.pyx

af294f2

theroggy marked this pull request as ready for review September 13, 2025 15:49

theroggy marked this pull request as draft September 13, 2025 15:50

theroggy added 3 commits September 13, 2025 20:17

Try to fix for pandas 3

d3fcf15

Update test_geopandas_io.py

8e1328a

Update test_geopandas_io.py

6b816c3

theroggy marked this pull request as ready for review September 13, 2025 20:30

theroggy modified the milestones: 0.11.0, 0.12.0 Sep 14, 2025

theroggy added 3 commits September 25, 2025 21:25

Merge remote-tracking branch 'upstream/main' into ENH-improve-perform…

b6a90b4

…ance-of-read_dataframe

Merge remote-tracking branch 'upstream/main' into ENH-improve-perform…

fdc55be

…ance-of-read_dataframe

Merge remote-tracking branch 'upstream/main' into ENH-improve-perform…

9376dec

…ance-of-read_dataframe

theroggy modified the milestones: 0.12.0, 0.12.1, 0.13.0 Nov 21, 2025
