ENH: String functions for df.aggregate() #62050

@JustusKnnck

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use string functions like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.

Feature Description

The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.

Implementation idea:

Currently, Series.agg() checks if the passed function name is a valid aggregation from NumPy or Pandas’ reduction methods. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.

Pseudocode:

# Inside Series.agg() (simplified)
if isinstance(func, str):
    if func == "first":
        return self.iloc[0]
    if func == "last":
        return self.iloc[-1]
    # existing code follows...
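
Until something like this exists inside pandas, the same mapping can be approximated from user code. The following is only a sketch of the idea, not the proposed pandas implementation: `agg_with_first_last` and `_POSITIONAL` are hypothetical names made up for this illustration, and the helper only handles the dict form of `df.agg()`.

```python
import pandas as pd

# Hypothetical mapping from the proposed strings to positional lookups.
# Note: iloc raises IndexError on an empty Series; a real implementation
# would have to decide what to return in that case.
_POSITIONAL = {
    "first": lambda s: s.iloc[0],
    "last": lambda s: s.iloc[-1],
}


def agg_with_first_last(df, spec):
    """Hypothetical helper: call df.agg(spec), translating "first"/"last"."""
    translated = {}
    for col, func in spec.items():
        # Swap the two proposed strings for callables; leave everything
        # else (e.g. "sum") for pandas to resolve as it does today.
        if isinstance(func, str) and func in _POSITIONAL:
            func = _POSITIONAL[func]
        translated[col] = func
    return df.agg(translated)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = agg_with_first_last(df, {"a": "sum", "b": "first", "c": "last"})
print(result)
```

The translation happens before pandas sees the spec, so existing string aggregations are untouched and only the two new names get special treatment.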

Expected behavior after change:
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

aggregations = {"a": "sum", "b": "first", "c": "last"}
df.agg(aggregations)

Returns:

a 6
b 4
c 9

This would align the behavior with groupby().agg(), which already supports "first" and "last".
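
For comparison, here is a small sketch of what groupby().agg() accepts today. To mimic a whole-frame aggregation it puts every row into a single group; the constant key Series is just a device for this illustration, not something the proposal depends on.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# Group all rows together by using a constant key aligned with df's index.
key = pd.Series(0, index=df.index)

# "first" and "last" are already valid aggregation strings on a groupby.
grouped = df.groupby(key).agg({"a": "sum", "b": "first", "c": "last"})

# The result is a one-row DataFrame; take that row to get a Series
# shaped like the output requested from df.agg().
row = grouped.iloc[0]
print(row)
```

One difference worth keeping in mind: as far as I know, groupby's "first" and "last" skip missing values by default, while `iloc[0]` and `iloc[-1]` return whatever sits at that position, so the two would only agree on columns without NaN at the ends.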

Alternative Solutions

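# sumcols is the collection of column names to sum;
# every other column takes its last value via a lambda.
aggregations = {
    col: ("sum" if col in sumcols else (lambda x: x.iloc[-1]))
    for col in df.columns
}
df.agg(aggregations)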

Additional Context

No response
