
Expected Returns: Factor Formation

venom1204 edited this page Apr 2, 2025 · 4 revisions

Introduction

In this project, you will reproduce work related to concepts from Investing Amid Low Expected Returns by Antti Ilmanen.

Related work

You will use some functions found in popular R finance packages such as FactorAnalytics and PerformanceAnalytics. While extremely helpful, these packages do not fully construct factors, variables, or features from the underlying financial data of a security master or universe, which is the aim of this project. Making this work open source should improve the efficiency of academic researchers and practitioners who want to gain a deeper understanding of these methods.

Details of the Expected Returns Factor Formation project

This will be a series of functions that form a factor analysis framework, which researchers can use to input data and construct any factor they wish.

Mentors will guide your understanding of the topic, support the learning of good practices in software development for quantitative finance using R, and provide quality market data for testing & validating these approaches.

Students engaged in this project will obtain a deeper understanding of:

    1. Writing high-quality R functions for forming factors
    2. Testing, validating, and replicating work
    3. Developing R packages, including documentation
    4. Factor Analysis & Active Portfolio Management

Repositories containing existing work useful for porting functions to R

Data Sources

Details of Your Coding Project

The 8-week short coding period will prioritize development in the following areas:

Core Functions

  • Data Management: Creation of utilities for efficient data processing, normalization, and alignment of financial time series data.
  • Factor Construction: Development of functions to build equity factors from financial data sources such as Form 10-K filings.
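
As a rough illustration of how the two core areas above might fit together, the sketch below aligns two toy data sources and builds a standardized book-to-market signal using base R only. The function names `zscore` and `build_value_factor`, the column names, and the data are all assumptions made for this example; they are not part of ExpectedReturns or FactorAnalytics.

```r
# Hypothetical helper: cross-sectional z-score, ignoring missing values.
zscore <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical constructor: book-to-market signal, standardized within
# each date. Column names are assumptions for this sketch.
build_value_factor <- function(fundamentals, prices) {
  # Align the two sources on security id and date (inner join).
  panel <- merge(fundamentals, prices, by = c("id", "date"))
  # Raw signal: book equity divided by market equity.
  panel$btm <- panel$book_equity / panel$market_equity
  # Normalize within each date so scores are comparable across time.
  panel$value_z <- ave(panel$btm, format(panel$date), FUN = zscore)
  out <- panel[order(panel$date, panel$id), c("id", "date", "btm", "value_z")]
  rownames(out) <- NULL
  out
}

# Toy data, invented for illustration only.
dates <- as.Date(rep(c("2024-12-31", "2025-03-31"), each = 3))
fundamentals <- data.frame(
  id = rep(c("A", "B", "C"), times = 2),
  date = dates,
  book_equity = c(50, 120, 30, 55, 118, 33)
)
prices <- data.frame(
  id = rep(c("A", "B", "C"), times = 2),
  date = dates,
  market_equity = c(100, 100, 100, 110, 90, 120)
)

value_factor <- build_value_factor(fundamentals, prices)
print(value_factor)
```

Standardizing within each date, rather than over the whole panel, is one common design choice for cross-sectional factors; other normalizations (ranks, winsorized scores) would slot into the same place.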

Documentation

  • Documentation for Chen and Zimmermann's naming convention
  • Documentation mapping the field names of multiple data providers to Bryan Kelly's naming convention
  • Vignettes with test cases using the FactorAnalytics R package
  • Vignettes with test cases using the ExpectedReturns R package
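
One way the field-name documentation could be backed by code is a lookup table that renames provider-specific columns to a single shared scheme. The sketch below is only an assumption about how that might look: the provider names, the source field names, and the target names are placeholders invented for this example and are not taken from Chen and Zimmermann's or Bryan Kelly's conventions.

```r
# Placeholder mapping from provider-specific field names to one shared
# naming scheme. Every name here is invented for illustration; none is
# taken from a published convention or a real vendor schema.
field_map <- list(
  provider_a = c(BookEq = "book_equity", MktCap = "market_equity"),
  provider_b = c(bk_eq = "book_equity", mkt_val = "market_equity")
)

# Hypothetical helper: rename the columns of `data` that appear in the
# mapping for `provider`, leaving all other columns untouched.
standardize_fields <- function(data, provider, map = field_map) {
  lookup <- map[[provider]]
  if (is.null(lookup)) stop("Unknown provider: ", provider)
  old <- names(data)
  hits <- old %in% names(lookup)
  names(data)[hits] <- unname(lookup[old[hits]])
  data
}

raw_a <- data.frame(id = "A", BookEq = 50, MktCap = 100)
raw_b <- data.frame(id = "A", bk_eq = 50, mkt_val = 100)

std_a <- standardize_fields(raw_a, "provider_a")
std_b <- standardize_fields(raw_b, "provider_b")
print(names(std_a))
print(names(std_b))
```

Keeping the mapping as data rather than as hard-coded renames means the same table can be printed into the documentation and reused by the factor constructors.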

Testing

  • Comprehensive unit tests for every major function and every constructed factor, to ensure code quality and reliability.

Expected Impact

We aim to enhance the toolkit available for financial factor research. This project will democratize access to sophisticated factor replication methodologies, empowering a broader segment of the academic and professional finance community to conduct rigorous, reproducible research.

Steps for this project:

  • Read the texts referenced above
  • Get familiar with the ExpectedReturns project.
  • Create factor constructor functions for all papers listed above, i.e., feature engineering in ML parlance.
  • Add unit tests using the tinytest R package throughout the course of creating and testing your functions.
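
For the unit-testing step, a minimal tinytest sketch might look like the following. The function `earnings_yield` is a hypothetical stand-in for a factor constructor and is not part of ExpectedReturns; only `expect_equal`, `expect_true`, and `expect_error` are tinytest functions.

```r
library(tinytest)

# Hypothetical function under test: earnings yield (earnings / price)
# that refuses inputs of mismatched length.
earnings_yield <- function(earnings, price) {
  if (length(earnings) != length(price)) {
    stop("earnings and price must have the same length")
  }
  earnings / price
}

# Each expectation returns a tinytest object that prints as pass or fail.
expect_equal(earnings_yield(c(2, 3), c(20, 30)), c(0.1, 0.1))
expect_true(is.na(earnings_yield(NA_real_, 20)))
expect_error(earnings_yield(1:2, 1), pattern = "same length")
```

In a package, tinytest's convention is to place such files under `inst/tinytest/` and run them with `tinytest::test_package()` or `tinytest::run_test_dir()`.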

Mentors

Student-developer

Applying and tests

Firstly, please reach out to mentors directly with questions. We would love to chat with you and gauge your interest in the project.

Next, please do one or more of the following tests before contacting the mentors above. We encourage work on Debian-based Linux distributions.

  1. Pre-req: Please show us a GitHub link, .R, .Rmd, or similar files that demonstrate an R project you've completed.

  2. Easy: Begin by downloading and building the ExpectedReturns and FactorAnalytics packages locally. List any build errors or issues you encounter on install, and see if you can work through those and get the package to build.

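# If remotes is not installed yet, run install.packages("remotes") first.
library(remotes)
# Install both packages from their GitHub repositories.
install_github("JustinMShea/ExpectedReturns")
install_github("braverock/FactorAnalytics")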
  3. Intermediate: Check the files in the vignettes directory and find one that doesn't build and identify bugs. Message the authors privately with issues you would open (don't post this publicly).

  4. Harder: Reflect on the steps above. Of the vignettes that rendered for you, how do you interpret the statistical estimates of the models? In addition, is there any repetitious code that may be written as a function for future use? If so, please include it as an example.
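
As one possible shape for an answer to the last test, the sketch below wraps a repeated `lm()` / `summary()` / `coef()` pattern into a single function. The helper `fit_factor_model`, the column names `MKT`, `HML`, and `fund`, and the simulated data are assumptions made for this example; they are not taken from any vignette in the packages.

```r
# Hypothetical helper: regress one return series on a set of factor
# columns and return a tidy coefficient table, instead of repeating the
# same lm() / summary() / coef() calls for every model.
fit_factor_model <- function(data, response, factors) {
  model <- lm(reformulate(factors, response = response), data = data)
  est <- coef(summary(model))
  data.frame(
    term = rownames(est),
    estimate = est[, "Estimate"],
    t_value = est[, "t value"],
    row.names = NULL
  )
}

# Simulated monthly returns, for illustration only.
set.seed(1)
n <- 120
sim <- data.frame(MKT = rnorm(n, 0.005, 0.04), HML = rnorm(n, 0.002, 0.02))
sim$fund <- 0.001 + 1.1 * sim$MKT + 0.3 * sim$HML + rnorm(n, 0, 0.01)

one_factor <- fit_factor_model(sim, "fund", "MKT")
two_factor <- fit_factor_model(sim, "fund", c("MKT", "HML"))
print(one_factor)
print(two_factor)
```

In a table like this, the estimate on each factor column reads as the sensitivity (beta) of the return series to that factor, and the intercept as the average return left unexplained by the factors, under the usual linear-regression assumptions.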

Solutions of tests

Students, please post a link to your GitHub here. Email us your test results and ping us on LinkedIn!

Al Pakrosnis - https://github.com/apakr

  1. venom1204 - github link