Skip to content

ENH: pd.to_sql(upsert=True, upsert_on_columns=['game_id']) #60434

@vile319

Description

@vile319

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I think it would be fantastic if upserting was built in as part of the pandas .to_sql() function. This is because I have to make my tables have a unique index separately before using the to_sql function, which is annoying.

Feature Description

Just go ahead and make the upsert add a unique constraint to a specific table based on the specific columns that the user wants. Of course you should warn them that this will remove duplicates (but it they want to upsert, they probably don't want duplicates in the first place). Just have something similar to this when doing pd.to_sql(upsert=True, upsert_on_columns=[game_id, player_id])

def create_table_with_unique_constraint(table_name, engine, unique_columns):
    """Create a new table with unique constraints and copy data from old table."""
    # Get column names and types from existing table
    inspector = sqlalchemy.inspect(engine)
    columns = inspector.get_columns(table_name)
    
    # Create column definitions preserving data types
    cols_sql = ', '.join([f'{col["name"]} {col["type"]}' for col in columns])
    unique_cols = ', '.join(unique_columns)
    
    # Create new table with unique constraint
    temp_table = f"{table_name}_temp"
    create_sql = f'CREATE TABLE {temp_table} ({cols_sql}, UNIQUE({unique_cols}) ON CONFLICT REPLACE)'
    
    with engine.begin() as conn:
        # Create new table
        conn.execute(text(create_sql))
        
        # Copy data from old to new table
        conn.execute(text(f"INSERT INTO {temp_table} SELECT * FROM {table_name}"))
        
        # Drop old table
        conn.execute(text(f"DROP TABLE {table_name}"))
        
        # Rename new table to original name
        conn.execute(text(f"ALTER TABLE {temp_table} RENAME TO {table_name}"))
def to_sql_upsert(df, table_name, engine, unique_columns):
    """
    Write DataFrame to SQL with upsert functionality.
    If table exists with unique constraint, appends directly.
    If not, creates table with constraint after appending.
    
    Args:
        df: DataFrame to write
        table_name: Name of target SQL table
        engine: SQLAlchemy engine
        unique_columns: List of columns for unique constraint
    """
    inspector = sqlalchemy.inspect(engine)
    
    # Check if table exists and has unique constraint
    has_constraint = False
    if inspector.has_table(table_name):
        unique_constraints = inspector.get_unique_constraints(table_name)
        for constraint in unique_constraints:
            if set(constraint['column_names']) == set(unique_columns):
                has_constraint = True
                break
    
    # Write data
    df.to_sql(table_name, engine, if_exists='append', index=False)
    
    # Add constraint if needed
    if not has_constraint:
        create_table_with_unique_constraint(table_name, engine, unique_columns)

Alternative Solutions

I don't know of any existing functionality, and a solution as simple as this does not seem to exist in 3rd party packages

Additional Context

This SHOULD be able to handle the case where A. you have no table and it is making it for the first time (it will automatically make it into a unique index table) B. you have a table but it does not have unique on conflict replace (it will copy over the old data into the new unique index table) and C. you have a table and it does have unique on conflict replace (it will just to_sql append. From there, sql will automatically handle the upserting.
Please, let me know if you have any questions, and thank you!

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementIO SQLto_sql, read_sql, read_sql_queryNeeds TriageIssue that has not been reviewed by a pandas team member

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions