Optimize group by for single partition topics #836

ovv · 2025-04-14T15:09:29Z

Group by operations on single-partition topics are now optimized to avoid creating a repartition topic. Because every message already lands in the same partition, the messages are instead re-keyed in place to use the new key.
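To illustrate the reasoning, here is a minimal sketch in plain Python. It is not the quixstreams implementation: `partition_for`, `needs_repartition`, and `rekey` are hypothetical helpers, and `zlib.crc32` merely stands in for whatever hash a real Kafka partitioner would use. The point is only that with one partition every key maps to partition 0, so re-keying cannot move a message.

```python
import zlib
from typing import Any, Callable, Iterable, List, Tuple


def partition_for(key: bytes, num_partitions: int) -> int:
    # Hypothetical partitioner: hash the key and take it modulo the
    # partition count. Real clients use a different hash function.
    return zlib.crc32(key) % num_partitions


def needs_repartition(num_partitions: int) -> bool:
    # Assumption for this sketch: a repartition topic is only needed
    # when a new key could land on a different partition.
    return num_partitions > 1


def rekey(
    messages: Iterable[Tuple[Any, dict]], key_fn: Callable[[dict], Any]
) -> List[Tuple[Any, dict]]:
    # Replace each message key with one derived from the value,
    # leaving the value untouched.
    return [(key_fn(value), value) for _old_key, value in messages]


messages = [("a", {"user": "u1", "n": 1}), ("b", {"user": "u2", "n": 2})]
if not needs_repartition(1):
    rekeyed = rekey(messages, lambda value: value["user"])
```

With more than one partition, `partition_for` can return different partitions for the old and new key, which is the case where a repartition topic would still be needed.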

quixstreams/dataframe/dataframe.py

quixstreams/dataframe/registry.py

tests/test_quixstreams/test_dataframe/test_dataframe.py

daniil-quix · 2025-04-30T18:21:39Z

quixstreams/dataframe/dataframe.py

+        groupby_sdf = self.__dataframe_clone__(
+            stream=stream, stream_id=f"{self.stream_id}--groupby--{operation}"
+        )
+        self._registry.register_groupby(
+            source_sdf=self, new_sdf=groupby_sdf, register_new_root=False
+        )


I'm not sure we even need .register_groupby() here since the operation doesn't create a new topic.
We can treat it as a stateless operation, and keep the same stream_id too.
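A rough sketch of what this suggestion could look like, using a toy class rather than the real quixstreams internals. `ToyStream`, `set_key`, and `run` are hypothetical names invented for illustration; the only idea taken from the comment is that re-keying is added as an ordinary stateless step and the `stream_id` stays unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ToyStream:
    # Hypothetical stand-in for a streaming dataframe.
    stream_id: str
    steps: List[Callable[[dict], Any]] = field(default_factory=list)

    def set_key(self, key_fn: Callable[[dict], Any]) -> "ToyStream":
        # Stateless re-key: append a step and keep the same stream_id.
        # Nothing is registered elsewhere and no new topic is involved.
        return ToyStream(self.stream_id, self.steps + [key_fn])

    def run(self, key: Any, value: dict) -> Tuple[Any, dict]:
        # Apply each re-key step in order; the value passes through as-is.
        for key_fn in self.steps:
            key = key_fn(value)
        return key, value


source = ToyStream("input-topic")
grouped = source.set_key(lambda value: value["user"])
result = grouped.run("old-key", {"user": "u1"})
```

In this toy model the grouped stream is indistinguishable from the source except for the extra step, which is the property the comment is asking for.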

daniil-quix

See the comments

ovv changed the title ~~Optimize groupb y for single partition topics~~ Optimize group by for single partition topics Apr 15, 2025

ovv force-pushed the quent/single-partition-groupby branch from 85cf8af to 87f2af7 Compare April 15, 2025 12:46

Optimize groupb y for single partition topics

5999c3a

Group by operations on topics with a single partition are now optimized to avoid creating a repartition topic. Instead, the messages are directly transformed to use the new key, as all messages go to the same partition.

ovv force-pushed the quent/single-partition-groupby branch from 87f2af7 to 5999c3a Compare April 15, 2025 12:53

ovv marked this pull request as ready for review April 16, 2025 11:58

gwaramadze reviewed Apr 17, 2025

quixstreams/dataframe/dataframe.py (resolved)

quixstreams/dataframe/registry.py (resolved)

quixstreams/dataframe/registry.py (outdated, resolved)

tests/test_quixstreams/test_dataframe/test_dataframe.py (resolved)

requested by review

cbbcea8

gwaramadze approved these changes Apr 21, 2025

daniil-quix reviewed Apr 30, 2025

daniil-quix requested changes Apr 30, 2025
