Dynamical Factor Models (DFM) Implementation (GSOC 2025) #446
Conversation
Looks interesting! Just say when you think it's ready for review.
Thanks for the feedback! I'm still exploring the best approach for implementing Dynamic Factor Models.
Some tests are failing due to missing constants. You might have lost some changes during the reset/rebase process.
Left some comments. I didn't look over the tests because they still seem like a WIP, but you seem to be on the right track!
I did a deeper pass on everything except the build_symbolic_graph method. I need to spend more time on that because it's gotten quite complex. I'll finish ASAP.
The notebook adds a comparison between the custom DFM and the implemented DFM (which uses a hardcoded version of make_symbolic_graph that works only in this case).
…pymc_extras/statespace/models/structural/components/regression.py
pymc_extras/statespace/models/DFM.py (outdated)

        pt.zeros((self.k_endog, self.k_endog * (self.error_order - 1)), dtype=floatX)
    )
    if len(matrix_parts) == 1:
        design_matrix = factor_loadings * 1.0
What does this line do?
Yes, this line is quite messy. I was running into a parameter-naming issue in the tests. When error_order=0 and factor_order=0, the design matrix consists only of the first matrix block, "factor_loadings". For some reason, during model construction, a parameter named "design" is expected instead of "factor_loadings", which of course leads to an error.
Do you have any suggestions? I forgot to mention this issue when I first encountered it.
    self.ssm["state_cov", :, :] = factor_cov

    # Observation covariance matrix (H)
    if self.error_order > 0:
This seems backwards? The first case looks like it assumes error_order == 0. (This comment is based on error_sigma appearing in the else branch.)
    sigma_obs = self.make_and_register_variable(
        "sigma_obs", shape=(self.k_endog,), dtype=floatX
    )
    total_obs_var = error_sigma**2 + sigma_obs**2
Are you sure error_sigma should appear here? It's already appearing in Q, so is this double-counting?
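A quick simulation illustrates the double-counting concern (the variance values are hypothetical, not taken from the model):

```python
import numpy as np

rng = np.random.default_rng(0)
error_sigma, sigma_obs = 0.8, 0.5

# If the idiosyncratic error already enters as a state innovation (via Q),
# the observed series variance already includes error_sigma**2
state_err = rng.normal(0.0, error_sigma, size=200_000)
obs_err = rng.normal(0.0, sigma_obs, size=200_000)
y = state_err + obs_err

# Adding error_sigma**2 to H on top of this would count it twice
print(round(y.var(), 2))  # ≈ error_sigma**2 + sigma_obs**2 = 0.89
```

If error_sigma also appeared in H, the implied observation variance would be roughly 2 * error_sigma**2 + sigma_obs**2 instead.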
I think your last two comments relate to the same doubt: how should we handle error_sigma/error_cov when error_order = 0?
In our implementation, which replicates what statsmodels does, when error_order = 0 there is no error term included in the state vector. This means I cannot add the error_sigma term to state_cov, since its shape does not account for error terms. My first thought was to add it to the observation equation instead, but I don't think that's the right approach.
Conceptually, when error_order = 0, we should interpret it as having only the standard innovations (i.e., normal errors) on each endogenous variable, without introducing an explicit term in the state.
I have seen that in statsmodels, when error_order = 0, no additional states or innovations are introduced for the error term.
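A small sketch of how the matrix shapes would work out in that case (hypothetical dimensions, and the idiosyncratic variance value is illustrative):

```python
import numpy as np

k_endog, k_factors = 3, 1   # hypothetical: 3 observed series, 1 factor
error_order = 0

# With error_order = 0, no error states are appended:
k_states = k_factors        # factor states only (assuming factor_order = 1)
k_posdef = k_factors        # innovations only for the factor

Q = np.eye(k_posdef)                   # state covariance: factor innovation only
H = np.diag(np.full(k_endog, 0.5**2))  # idiosyncratic variance lives in H instead

print(Q.shape, H.shape)  # (1, 1) (3, 3)
```

This matches the interpretation above: the per-series normal errors appear as measurement noise in H, never as extra states or entries in Q.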
# Calculate the number of states
k_states = self._factor_order
k_posdef = self.k_factors
if self.error_order > 0:
k_states += self._error_order
k_posdef += k_endog
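As a worked example of the counting logic above (hypothetical dimensions, and assuming the statsmodels-style conventions _factor_order = k_factors * max(factor_order, 1) and _error_order = k_endog * error_order):

```python
k_factors, factor_order = 2, 2   # hypothetical: 2 factors with VAR(2) dynamics
k_endog, error_order = 4, 3      # 4 observed series, AR(3) idiosyncratic errors

_factor_order = k_factors * max(factor_order, 1)  # 4 factor states (companion form)
_error_order = k_endog * error_order              # 12 error states

k_states = _factor_order
k_posdef = k_factors
if error_order > 0:
    k_states += _error_order
    k_posdef += k_endog

print(k_states, k_posdef)  # 16 6
```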
* First pass on exogenous variables in VARMA
* Adjust state names for API consistency
* Allow exogenous variables in BayesianVARMAX
* Eagerly simplify model where possible
* Typo fix
Final tiny comments. This looks great!
    design_matrix = pt.concatenate([design_matrix_time, Z_exog], axis=2)

    self.ssm["design"] = design_matrix

    # Transition matrix
    # auxiliary function to build transition matrix block
    # Transition matrix (T)
Can you make a little ASCII diagram of how A, B, and C fit together, or just write T = BlockDiag(A, B, C) before you introduce the names "block A" and "block B"? Reading it, I felt like "hey, wait, did I miss something?"
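For instance, the suggested structure could be sketched like this (the block values are hypothetical, and scipy's block_diag stands in for however the model actually assembles T):

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical blocks:
A = np.array([[0.5, 0.2],
              [1.0, 0.0]])   # companion form for one factor with VAR(2) dynamics
B = np.array([[0.3]])        # AR(1) block for one idiosyncratic error
C = np.array([[0.1]])        # AR(1) block for another idiosyncratic error

T = block_diag(A, B, C)      # T = BlockDiag(A, B, C)
print(T.shape)  # (4, 4)
```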
jessegrabowski commented on 2025-08-30T09:48:40Z: Ip Man was also a pretty good movie, but the sequels sucked.
jessegrabowski commented on 2025-08-30T09:48:41Z: Is this why you chose factor_order = 2 later? It would be nice to make a more obvious bridge between this section and the final statistical model. You don't actually model the data that this analysis is based on (you ultimately use the log differences), so it's a bit of a loose connection.

andreacate commented on 2025-08-30T09:55:06Z: Yes, sure, you are right. No, I have not made decisions about parameters, since I just wanted to replicate what was done in statsmodels. Cointegration was not present in statsmodels; I was just curious at the beginning. I can just delete the cells about cointegration.
jessegrabowski commented on 2025-08-30T09:48:42Z: You can consider cutting this section, IMO, and just say "Looking at the graphs, these time series are obviously non-stationary."
jessegrabowski commented on 2025-08-30T09:48:42Z: Here's a fancier ADF test function that I use, if you want. It matches the output of STATA and gives all 3 variants of the ADF test:
import pandas as pd
import statsmodels.api as sm

def ADF_test_summary(df, maxlag=None, autolag='BIC', missing='error'):
    # NOTE: relies on a make_var_names helper (defined elsewhere) that builds
    # coefficient labels for the lag and trend terms of each specification.
    if missing == 'error':
        if df.isna().any().any():
            raise ValueError("df has missing data; handle it or pass missing='drop' to automatically drop it.")
    if isinstance(df, pd.Series):
        df = df.to_frame()
    for series in df.columns:
        data = df[series].copy()
        if missing == 'drop':
            data.dropna(inplace=True)
        print(series.center(110))
        print('=' * 110)
        line = 'Specification' + ' ' * 15 + 'Coeff' + ' ' * 10 + 'Statistic' + ' ' * 5 + 'P-value' + ' ' * 6 + 'Lags' + ' ' * 6 + '1%'
        line += ' ' * 10 + '5%' + ' ' * 8 + '10%'
        print(line)
        print('-' * 110)
        for name, reg in zip(['Constant and Trend', 'Constant Only', 'No Constant'], ['ct', 'c', 'n']):
            stat, p, crit, regresult = sm.tsa.adfuller(data, regression=reg, regresults=True, maxlag=maxlag,
                                                       autolag=autolag)
            n_lag = regresult.usedlag
            gamma = regresult.resols.params[0]
            names = make_var_names(series, n_lag, reg)
            reg_coefs = pd.Series(regresult.resols.params, index=names)
            reg_tstat = pd.Series(regresult.resols.tvalues, index=names)
            reg_pvals = pd.Series(regresult.resols.pvalues, index=names)
            line = f'{name:<21}{gamma:13.3f}{stat:15.3f}{p:13.3f}{n_lag:11}{crit["1%"]:10.3f}{crit["5%"]:12.3f}{crit["10%"]:11.3f}'
            print(line)
            # Print only the deterministic terms present in this specification
            for coef in reg_coefs.index:
                if coef in name:
                    line = f"\t{coef:<13}{reg_coefs[coef]:13.3f}{reg_tstat[coef]:15.3f}{reg_pvals[coef]:13.3f}"
                    print(line)
jessegrabowski commented on 2025-08-30T09:48:43Z: Plot the transformed data and comment before you run the stationarity test.

jessegrabowski commented on 2025-08-30T09:48:43Z: They're not that simple, because you constrained the sign. You should comment on the prior choices here, in addition to in the comments.

jessegrabowski commented on 2025-08-30T09:48:45Z: Add some commentary on what is being shown here.

jessegrabowski commented on 2025-08-30T09:48:45Z: Show this before sampling.

jessegrabowski commented on 2025-08-30T09:48:46Z: Use

jessegrabowski commented on 2025-08-30T09:48:46Z: The legend is wrong -- the gray are recessions, not HDI. Is the HDI plotted here, but it's just really tight? If so, comment on this.

jessegrabowski commented on 2025-08-30T09:48:47Z: Commentary? What is state 0 (consider renaming the title to be clear, like "Estimated Latent Factor 1")? Add recessions?

jessegrabowski commented on 2025-08-30T09:48:48Z: Typo: Statsmodels.
The notebook also looks great! Could you add some more headings and motivate all the analysis you're doing with some commentary? Make sure the pieces connect together clearly. I'd move the Bayesian latent factor graph above the coincident index. I would also suggest you comment on the fact that the optimizer doesn't converge in the statsmodels model. It ends up being "close enough", but make some hay out of the fact that MCMC doesn't "converge": it explores all equiprobable solutions, which in this case is important because the model is only weakly identified.
Dynamical Factor Models (DFM) Implementation

This PR provides a first draft implementation of Dynamic Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview
- DFM.py with initial functionality

Current Status
This implementation is a work in progress, and I welcome any feedback.

Next Steps