In the tidyverse, tidyr's `unite` combines several columns into a new column. This is useful, for example, when you want to create an ID from multiple columns. We probably also want a `separate` function, tidyr's inverse operation that splits one column into several.
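For the `separate` counterpart mentioned above, a minimal sketch could build on pandas' `Series.str.split(..., expand=True)`. The name `separate` and its `column`/`into`/`sep` parameters are assumptions loosely modeled on tidyr's `separate`, not an existing pyjanitor API. Like the proposed `unite`, it keeps the original column.

```python
import pandas as pd


def separate(df, column, into, sep="_"):
    """Hypothetical sketch: split `column` on `sep` into new columns named by `into`."""
    df2 = df.copy()
    # n=len(into) - 1 caps the number of splits, so each row yields at most len(into) parts;
    # regex=False treats `sep` as a literal string even if it has several characters
    parts = df2[column].astype(str).str.split(sep, n=len(into) - 1, expand=True, regex=False)
    if parts.shape[1] != len(into):
        raise ValueError(f"expected {len(into)} parts, got {parts.shape[1]}")
    parts.columns = into
    # The new columns hold strings; numeric parts are not converted back
    return df2.join(parts)


df = pd.DataFrame({"id": ["1.235_B1_100.6", "2.346_B2_200.7"]})
print(separate(df, "id", into=["config_a", "config_b", "config_c"]))
```

Note that splitting gives back strings, so `separate` is only an approximate inverse of `unite` once numeric values have been rounded.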
This is an initial proposal for `unite`. I also added an option to round numeric values to a given number of significant digits; otherwise, floating-point precision can lead to different IDs for values that are effectively equal.
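A quick plain-Python illustration of the precision problem. The `signif` below is a standalone copy of the idea behind the proposed helper, and the `_B1` suffix only mimics a second, string-valued column.

```python
def signif(x, digits=4):
    """Round a number to `digits` significant digits via the 'g' format."""
    return float(f"{x:.{digits}g}")


a = 0.1 + 0.2  # 0.30000000000000004 in binary floating point
b = 0.3

# Without rounding, the string forms differ, so two "equal" configs get different IDs
print(f"{a}_B1", f"{b}_B1")  # 0.30000000000000004_B1 0.3_B1

# After rounding to 4 significant digits, both produce the same ID
print(f"{signif(a)}_B1", f"{signif(b)}_B1")  # 0.3_B1 0.3_B1
```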
```python
import janitor  # noqa: F401  (registers pyjanitor methods such as DataFrame.select)
from pandas_flavor import register_dataframe_method


def signif(x, digits=2):
    """Round a numeric value to `digits` significant digits; return other values unchanged."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return float(f"{x:.{digits}g}")
    return x


@register_dataframe_method
def unite(df, prefix, new_column_name, sep="_", digits=4):
    """
    Combine all columns with a given prefix into a single column without removing the originals.

    Parameters:
        df (pd.DataFrame): The input DataFrame.
        prefix (str): The prefix used to select columns.
        new_column_name (str): The name of the new combined column.
        sep (str): Separator placed between the concatenated values.
        digits (int): Number of significant digits kept for numeric values.

    Returns:
        pd.DataFrame: A copy of the input with the new combined column added.
    """
    df2 = df.copy()
    # Select columns with the given prefix using pyjanitor's select method (glob pattern)
    config_cols = df2.select(columns=[f"{prefix}*"])
    # Round numeric values element-wise; signif leaves non-numeric values untouched.
    # DataFrame.map needs pandas >= 2.1 (older versions call it applymap).
    config_cols = config_cols.map(lambda x: signif(x, digits))
    # Convert every value to str and join each row with `sep`
    df2[new_column_name] = config_cols.astype(str).agg(sep.join, axis=1)
    return df2
```
Example run:

```python
import pandas as pd

df = pd.DataFrame({
    "config_a": [1.234567, 2.345678, 3.456789],
    "config_b": ["B1", "B2", "B3"],
    "config_c": [100.567, 200.678, 300.789],
    "other_col": [1, 2, 3],
})

# Use the custom pandas method registered via pandas_flavor
df = df.unite(prefix="config", new_column_name="id")
print(df)
```
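If pyjanitor's `select` or pandas_flavor is not available, the same idea can be sketched with plain pandas. `unite_prefix` is a hypothetical name used only for this illustration; it selects columns with `Index.str.startswith` (assuming string column labels) and uses `Series.map`, which also works on pandas versions older than 2.1.

```python
import pandas as pd


def unite_prefix(df, prefix, new_column_name, sep="_", digits=4):
    """Hypothetical plain-pandas sketch of the proposed unite (originals are kept)."""
    df2 = df.copy()
    # Keep only the columns whose (string) names start with `prefix`
    cols = df2.columns[df2.columns.str.startswith(prefix)]

    def fmt(x):
        # Round real numbers (not bools) to `digits` significant digits, then stringify
        if isinstance(x, (int, float)) and not isinstance(x, bool):
            x = float(f"{x:.{digits}g}")
        return str(x)

    # Format every selected value, then join each row with `sep`
    parts = df2[cols].apply(lambda col: col.map(fmt))
    df2[new_column_name] = parts.agg(sep.join, axis=1)
    return df2


df = pd.DataFrame({
    "config_a": [1.234567, 2.345678, 3.456789],
    "config_b": ["B1", "B2", "B3"],
    "config_c": [100.567, 200.678, 300.789],
    "other_col": [1, 2, 3],
})
print(unite_prefix(df, prefix="config", new_column_name="id")["id"].tolist())
# ['1.235_B1_100.6', '2.346_B2_200.7', '3.457_B3_300.8']
```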