Global Factor, Stock, and Firm data

This repo contains Python code to generate the global dataset of factor returns, stock returns, and firm characteristics from “Is there a Replication Crisis in Finance?” by Jensen, Kelly, and Pedersen (Journal of Finance, 2023).

Instructions

Prerequisites

  • Obtain your WRDS credentials.
  • Ensure you have uv installed on your system.

Steps

  1. Clone the repo

    • Clone the repository to your local machine by running the following command in your terminal:
      git clone https://github.com/bkelly-lab/jkp-data.git
  2. Input WRDS credentials

    • To save your WRDS credentials, navigate to the jkp-data/ folder and run:

      uv run python code/wrds_credentials.py

      Follow the prompts.

      Note: To change your password or other credentials, first run uv run python code/wrds_credentials.py --reset, then run uv run python code/wrds_credentials.py again.

  3. Run the script

    • We run the code via a Slurm scheduler, but we also show how to run it in an interactive Python session.

    • Before running the following commands, make sure you are in jkp-data/

    • On a cluster with a Slurm scheduler, run:

      sbatch slurm/submit_job_som_hpc.slurm

      to create the factor returns, stock returns, and firm characteristics.

      In an interactive session, run:

      uv run python code/main.py

      to create the stock returns and firm characteristics, and

      uv run python code/portfolio.py

      to create the factor returns.

    IMPORTANT: When the code starts, you may be prompted to grant access to WRDS through two-factor authentication, for example via a Duo notification. Approve this request; otherwise the program will fail. After a few seconds or minutes, you should see data being created in data/raw. If no data appears, check your internet connection and your credentials.
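
For an interactive session, the two commands above could be chained from a small Python wrapper. This is only a sketch: the function name run_pipeline and its dry_run flag are hypothetical and not part of this repo, and it assumes that code/portfolio.py should run only after code/main.py has finished successfully.

```python
import subprocess
from pathlib import Path

# Commands copied from the interactive-session instructions above.
PIPELINE_STEPS = [
    ["uv", "run", "python", "code/main.py"],       # stock returns and firm characteristics
    ["uv", "run", "python", "code/portfolio.py"],  # factor returns
]


def run_pipeline(repo_dir=".", dry_run=False):
    """Run each step in order from repo_dir and stop at the first failure.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    repo = Path(repo_dir).resolve()
    executed = []
    for cmd in PIPELINE_STEPS:
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the second step never starts if the first one fails.
            subprocess.run(cmd, cwd=repo, check=True)
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    # Print the commands without running them.
    for line in run_pipeline(dry_run=True):
        print(line)
```

Call run_pipeline() from jkp-data/ to execute both steps. The two-factor prompt described above still has to be approved manually.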

When the code is finished, you can find the output in:

data/processed/

Please see the release notes (documentation/release_notes.html) for a description of the output files and a comparison between the output of the SAS/R codebase and the new Python codebase.
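
To confirm that a run produced output, the contents of data/processed/ can be listed with the standard library. The helper below is a hypothetical sketch, not part of this repo. It makes no assumption about file names or formats, which are described in the release notes.

```python
from pathlib import Path


def list_outputs(processed_dir="data/processed"):
    """Return (relative path, size in bytes) for every file under processed_dir.

    Returns an empty list if the directory does not exist yet.
    """
    root = Path(processed_dir)
    if not root.is_dir():
        return []
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for name, size in list_outputs():
        print(f"{size:>15,d}  {name}")
```

An empty listing after a completed run suggests the code was started from a directory other than jkp-data/.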

Notes

  • By default, the code uses an end date of 2024-12-31. To change it, edit line 4 of code/main.py. For example, for May 6, 1992, use: end_date = pl.datetime(1992, 5, 6).

  • To run the code, we use a high-performance computing cluster, requesting 450 GB of RAM and 128 CPU cores. A full run takes about 6 hours.

  • To understand the data, please refer to our documentation.

  • We distribute the global factor returns generated from this codebase at jkpfactors.com and the stock returns and firm characteristics at wrds-www.wharton.upenn.edu/pages/get-data/contributed-data-forms/global-factor-data/.

  • The original SAS/R codebase is still available at github.com/bkelly-lab/ReplicationCrisis, but we recommend using this new Python codebase for future work.
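
Before starting a run, it may help to compare the local machine with the cluster allocation mentioned in the notes (450 GB RAM, 128 CPU cores). The following is a rough sketch using only the standard library. The function names are hypothetical, the memory lookup relies on os.sysconf and therefore only works on Unix-like systems, and treating 1 GB as 10^9 bytes is an assumption. The notes do not state minimum requirements, so the ratios are informational only.

```python
import os

# Allocation quoted in the notes above; not a stated minimum.
REFERENCE_CORES = 128
REFERENCE_RAM_GB = 450


def total_ram_gb():
    """Return physical memory in GB (10**9 bytes), or None if it cannot be read."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows, and some names are platform-specific.
        return None
    return pages * page_size / 1e9


def resource_report():
    """Compare local CPU and memory with the reference allocation."""
    cores = os.cpu_count() or 0
    ram = total_ram_gb()
    return {
        "cores": cores,
        "ram_gb": ram,
        "cores_vs_reference": cores / REFERENCE_CORES,
        "ram_vs_reference": None if ram is None else ram / REFERENCE_RAM_GB,
    }


if __name__ == "__main__":
    for key, value in resource_report().items():
        print(f"{key}: {value}")
```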
