[Bug]: Aug_msf_v2 fails on re-run without fresh WRDS download #56

@FernandoRDLL

Description

Which script were you running?

main.py (data generation)

Error Category

Other

Bug Description

aug_msf_v2() is not idempotent. It crashes with a duplicate-column error if crsp_msf_v2.parquet already contains the mthaskhi and mthbidlo columns that it tries to add.
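
One possible guard, sketched here in plain Python, is to check which of the two columns are still missing before adding them. The helper name `missing_columns` and the sample column names other than mthaskhi and mthbidlo are illustrative assumptions, not code from the repository.

```python
# Hypothetical helper: not part of aux_functions.py.
ADDED_COLUMNS = ("mthaskhi", "mthbidlo")


def missing_columns(existing, wanted=ADDED_COLUMNS):
    """Return the wanted columns that are not yet present in the table."""
    present = set(existing)
    return [name for name in wanted if name not in present]


# First run: neither column exists yet, so both still have to be added.
print(missing_columns(["permno", "mthcaldt", "mthret"]))
# -> ['mthaskhi', 'mthbidlo']

# Re-run on an already augmented file: nothing is left to add.
print(missing_columns(["permno", "mthcaldt", "mthret", "mthaskhi", "mthbidlo"]))
# -> []
```

Inside aug_msf_v2() the same check could be applied to the column names of the msf table, either skipping the join when the list is empty or dropping the stale columns first so they are recomputed. Which of the two is appropriate depends on whether the augmentation is meant to refresh existing values.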

Steps to Reproduce

  1. Run the full pipeline, including download_raw_data_tables(). This succeeds.
  2. Comment out download_raw_data_tables() and re-run. aug_msf_v2() now fails with:
    ibis.common.exceptions.IbisInputError: Duplicate column name 'mthaskhi' in result set

Expected Behavior

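The pipeline can be run a second time without re-downloading the raw data: aug_msf_v2() should succeed when crsp_msf_v2.parquet already contains mthaskhi and mthbidlo.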

Error Output / Stack Trace

Traceback (most recent call last):
  File "/gpfs/home/ffr7/jkp-data/code/main.py", line 50, in <module>
    aug_msf_v2()
  File "/gpfs/home/ffr7/jkp-data/code/aux_functions.py", line 774, in aug_msf_v2
    ).select([msf] + [m.mthaskhi, m.mthbidlo])
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/home/ffr7/jkp-data/.venv/lib/python3.11/site-packages/ibis/expr/types/joins.py", line 385, in select
    values = unwrap_aliases(values)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/home/ffr7/jkp-data/.venv/lib/python3.11/site-packages/ibis/expr/types/relations.py", line 509, in un>
    raise com.IbisInputError(
ibis.common.exceptions.IbisInputError: Duplicate column name 'mthaskhi' in result set
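
The failure is easier to see with plain lists of column names. The select in the traceback takes every column of msf and then appends mthaskhi and mthbidlo again, so on a re-run both names appear twice. The sketch below only mimics that name check in plain Python; it is not the ibis implementation, and `duplicate_names` and the sample columns permno and mthcaldt are illustrative assumptions.

```python
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]


# Columns of msf after a previous run has already augmented the parquet file.
msf_columns = ["permno", "mthcaldt", "mthaskhi", "mthbidlo"]

# Mirrors the shape of select([msf] + [m.mthaskhi, m.mthbidlo]).
selected = msf_columns + ["mthaskhi", "mthbidlo"]

print(duplicate_names(selected))
# -> ['mthaskhi', 'mthbidlo']
```

The first duplicate found is mthaskhi, which matches the column named in the error message.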

Operating System

Linux

Python Version

3.13.11

Polars Version

Default uv

uv Version

0.10.9

Available RAM

No response

WRDS Authentication Method

Stored credentials (keyring)

WRDS Two-Factor Authentication

  • Yes, I approved the 2FA request
  • No 2FA prompt appeared
  • 2FA failed or timed out
  • Not applicable

Additional Context

No response

Pre-submission Checklist

  • I have searched existing issues to ensure this bug has not already been reported
  • I am using the latest version of the code from the main branch
  • I have read the README and followed the setup instructions

Metadata

Labels

bug: Something isn't working
