This repo contains Python code to generate the global dataset of factor returns, stock returns, and firm characteristics from “Is there a Replication Crisis in Finance?” by Jensen, Kelly, and Pedersen (Journal of Finance, 2023).
- Obtain your WRDS credentials.
- Ensure you have uv installed on your system.
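To confirm the command-line tools are on your PATH before continuing, a short Python check like the one below may help. This is only a sketch: the helper name missing_tools is hypothetical and not part of this repo, and including git in the check is an assumption based on the cloning step.

```python
import shutil


def missing_tools(tools=("uv", "git")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("uv and git are both available.")
```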
-
Clone the repo
- Clone the repository to your local machine by running the following command from your terminal:
git clone https://github.com/bkelly-lab/jkp-data.git
-
Input WRDS credentials
-
To save your WRDS credentials, navigate to the jkp-data/ folder and run:
uv run python code/wrds_credentials.py
Kindly follow the prompts.
Note: If you need to change your password or credentials, run
uv run python code/wrds_credentials.py --reset
and then
uv run python code/wrds_credentials.py
-
Run the script
-
We run the code via a Slurm scheduler, but we also show how to run it in an interactive Python session.
-
Before running the following commands, make sure you are in the jkp-data/ folder.
-
On a cluster with a Slurm scheduler, run:
sbatch slurm/submit_job_som_hpc.slurm
to create the factor returns, stock returns, and firm characteristics.
In an interactive session, run:
uv run python code/main.py
to create the stock returns and firm characteristics, and
uv run python code/portfolio.py
to create the factor returns.
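If you prefer to launch both interactive steps with one call, the two commands above can be wrapped in a small Python driver. This is a minimal sketch, not part of the repo: run_steps and STEPS are hypothetical names, and it assumes code/portfolio.py should only start after code/main.py has finished successfully, which the ordering above suggests but does not state.

```python
import subprocess

# The two commands from the instructions above, in the order they are listed.
STEPS = [
    ["uv", "run", "python", "code/main.py"],       # stock returns and firm characteristics
    ["uv", "run", "python", "code/portfolio.py"],  # factor returns
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in steps:
        print("Running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so the second step never starts after a failure.
        runner(cmd, check=True)
```

Calling run_steps() from the jkp-data/ folder would start both steps back to back. The runner argument is there only so the function can be exercised without launching the real pipeline.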
IMPORTANT: When starting the code, you may be prompted to grant access to WRDS using two-factor authentication, for example via a Duo notification. You need to approve this request, as the program will otherwise fail. After a few seconds or minutes, you should see data being created in data/raw. If that is not the case, please check your internet connection or credentials.
-
When the code is finished, you can find the output in:
data/processed/
Please see the release notes (documentation/release_notes.html) for a description of the output files and a comparison between the output of the SAS/R codebase and the new Python codebase.
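To check on progress in data/raw or to confirm that data/processed/ was populated, you can count the files in each folder. The helper below is a hypothetical sketch using only the standard library. It makes no assumption about which files the pipeline writes; the release notes remain the reference for that.

```python
from pathlib import Path


def summarize_dir(path):
    """Return (file_count, total_bytes) for all files under `path`, or (0, 0) if it is missing."""
    root = Path(path)
    if not root.is_dir():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Run from the jkp-data/ folder so the relative paths resolve.
    for folder in ("data/raw", "data/processed"):
        count, size = summarize_dir(folder)
        print(f"{folder}: {count} files, {size / 1e9:.2f} GB")
```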
-
By default, the end date for the data in the code is 2024-12-31, which you can change by editing line 4 of the code/main.py file. For example, for May 6, 1992, use:
end_date = pl.datetime(1992, 5, 6)
-
To run the code, we use a high-performance computing cluster, where we request 450 GB of RAM and 128 CPU cores. Running the routine takes about 6 hours.
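To see how your machine compares with the resources quoted above, a rough preflight check along the following lines may be useful. It is a sketch under stated assumptions: the figures are what we request on our cluster, not a documented minimum; the function names are hypothetical; and the RAM lookup relies on os.sysconf, which is available on Linux and macOS but not on Windows (in which case it returns None).

```python
import os

# Figures quoted above: what is requested on the authors' cluster,
# not a documented minimum requirement.
REQUESTED_CORES = 128
REQUESTED_RAM_GB = 450


def total_ram_gb():
    """Return physical RAM in GB (10**9 bytes), or None where os.sysconf cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1e9


def compare_to_requested(cores, ram_gb):
    """Return notes describing where the machine falls short of the requested figures."""
    notes = []
    if cores is not None and cores < REQUESTED_CORES:
        notes.append(f"{cores} CPU cores available vs. {REQUESTED_CORES} requested")
    if ram_gb is not None and ram_gb < REQUESTED_RAM_GB:
        notes.append(f"{ram_gb:.0f} GB RAM available vs. {REQUESTED_RAM_GB} GB requested")
    return notes


if __name__ == "__main__":
    for note in compare_to_requested(os.cpu_count(), total_ram_gb()):
        print("Note:", note)
```

A shortfall does not necessarily mean the code cannot run; it only indicates that runtime and memory behavior may differ from the roughly 6 hours reported above.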
-
To understand the data, please refer to our documentation.
-
We distribute the global factor returns generated from this codebase at jkpfactors.com and the stock returns and firm characteristics at wrds-www.wharton.upenn.edu/pages/get-data/contributed-data-forms/global-factor-data/.
-
The original SAS/R codebase is still available at github.com/bkelly-lab/ReplicationCrisis, but we recommend using this new Python codebase for future work.