
Conversation

andreacate
Contributor

@andreacate andreacate commented Mar 31, 2025

Dynamic Factor Models (DFM) Implementation

This PR provides a first-draft implementation of Dynamic Factor Models as part of my application proposal for the PyMC GSoC 2025 project. A draft of my application report can be found at this link.

Overview

  • Added DFM.py with initial functionality

Current Status

This implementation is a work in progress, and I welcome any feedback.

Next Steps

  • Vectorize the construction of the transition and selection matrices (possibly by reordering state variables).
  • Add support for measurement error.
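For readers new to the model class, the state-space form being implemented can be sketched as follows. This is a minimal numpy simulation with one factor, one lag, and illustrative parameter values — not the PR's API:

```python
import numpy as np

rng = np.random.default_rng(0)
k_endog, k_factors, T = 4, 1, 200  # illustrative dimensions

Lambda = rng.normal(size=(k_endog, k_factors))  # factor loadings (design matrix)
A = np.array([[0.8]])                           # factor transition (AR(1) coefficient)

f = np.zeros((T, k_factors))
y = np.zeros((T, k_endog))
for t in range(1, T):
    f[t] = A @ f[t - 1] + rng.normal(size=k_factors)       # state equation
    y[t] = Lambda @ f[t] + 0.1 * rng.normal(size=k_endog)  # observation equation

print(y.shape)  # (200, 4)
```

Each observed series is a noisy linear combination of a small number of latent factors that evolve as a VAR; the implementation work in this PR is about building these matrices symbolically for the Kalman filter.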

@zaxtax
Contributor

zaxtax commented Apr 1, 2025

Looks interesting! Just say when you think it's ready for review.

@fonnesbeck
Member

cc @jessegrabowski


@andreacate
Contributor Author

Thanks for the feedback!

I'm still exploring the best approach for implementing Dynamic Factor Models.
I've added a simple custom DFM model in a Jupyter notebook, which I plan to use as a prototype and testing tool while developing the main BayesianDynamicFactor class.

@andreacate andreacate force-pushed the DFM_draft_implementation branch 2 times, most recently from 21560db to a459a1a Compare July 25, 2025 10:44
@jessegrabowski
Member

Some tests are failing due to missing constants. You might have lost some changes in the reset/rebasing process.

@andreacate andreacate force-pushed the DFM_draft_implementation branch from 1c04f65 to bc3fcf2 Compare July 25, 2025 13:51
Member

@jessegrabowski jessegrabowski left a comment


Left some comments. I didn't look over the tests because they still seem like WIP, but they seem to be on the right track!

@andreacate andreacate force-pushed the DFM_draft_implementation branch 3 times, most recently from 7846f15 to e15cdd3 Compare July 29, 2025 07:59
@andreacate andreacate force-pushed the DFM_draft_implementation branch from e15cdd3 to 3b8bfe4 Compare August 8, 2025 12:36
@andreacate andreacate force-pushed the DFM_draft_implementation branch from 6496f38 to 615960b Compare August 15, 2025 21:20
Member

@jessegrabowski jessegrabowski left a comment


I did a deeper pass on everything except the build_symbolic_graph method. I need to spend more time on that because it's gotten quite complex.

I'll finish ASAP.

@jessegrabowski jessegrabowski force-pushed the DFM_draft_implementation branch from 8a03197 to ca1a86e Compare August 26, 2025 05:20
    pt.zeros((self.k_endog, self.k_endog * (self.error_order - 1)), dtype=floatX)
)
if len(matrix_parts) == 1:
    design_matrix = factor_loadings * 1.0
Member


What does this line do?

Contributor Author


Yes, this line is quite messy. I was running into a parameter-naming issue in the tests. When error_order=0 and factor_order=0, the design matrix consists only of the first matrix block, "factor_loadings". For some reason, during model construction a parameter named "design" is expected instead of "factor_loadings", which of course leads to an error.
Do you have any suggestions? I forgot to mention this issue when I first encountered it.

self.ssm["state_cov", :, :] = factor_cov

# Observation covariance matrix (H)
if self.error_order > 0:
Member


This seems backwards? The first case looks like it assumes error_order == 0.

This comment is based on error_sigma appearing in the else branch.

sigma_obs = self.make_and_register_variable(
"sigma_obs", shape=(self.k_endog,), dtype=floatX
)
total_obs_var = error_sigma**2 + sigma_obs**2
Member


Are you sure error_sigma should appear here? It's already appearing in Q, so is this double-counting?

Contributor Author


Your last two comments are, I think, related to the same question: how should we handle error_sigma/error_cov when error_order = 0?
In our implementation — which replicates what is done in statsmodels — when error_order = 0 no error term is included in the state vector. This means I cannot add the error_sigma term to state_cov, since its shape does not account for error terms. My first thought was to add it to the observation equation instead, but I don't think that's the right approach.

Conceptually, when error_order = 0, we should interpret it as having only the standard innovations (i.e., normal errors) on each endogenous variable, without introducing an explicit term in the state.

I have seen that in statsmodels, when error_order = 0, no additional states or innovations are introduced for the error term.
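To illustrate the distinction being discussed, here is a numpy sketch with made-up dimensions (not the PR's actual code): when error_order > 0 the idiosyncratic errors live in the state vector, so their variances enter the state covariance Q; when error_order = 0 they are plain observation noise and would instead belong in the observation covariance H.

```python
import numpy as np

k_factors, k_endog = 1, 3
factor_cov = np.eye(k_factors)
error_sigma = np.array([0.5, 0.7, 0.9])  # illustrative idiosyncratic std devs

def covariances(error_order):
    """Place error_sigma in Q (state cov) or H (obs cov) depending on error_order."""
    if error_order > 0:
        # errors are states: Q is block-diagonal over factor and error innovations
        Q = np.zeros((k_factors + k_endog, k_factors + k_endog))
        Q[:k_factors, :k_factors] = factor_cov
        Q[k_factors:, k_factors:] = np.diag(error_sigma**2)
        H = np.zeros((k_endog, k_endog))  # no separate observation noise
    else:
        # no error states: Q holds only the factor innovations,
        # and error_sigma becomes ordinary observation noise in H
        Q = factor_cov.copy()
        H = np.diag(error_sigma**2)
    return Q, H

Q1, H1 = covariances(error_order=1)
Q0, H0 = covariances(error_order=0)
print(Q1.shape, H1.shape)  # (4, 4) (3, 3)
print(Q0.shape, H0.shape)  # (1, 1) (3, 3)
```

Either placement keeps the total noise on each series, but never both at once — which is why having error_sigma in Q and in the observation variance would double-count.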

# Calculate the number of states
k_states = self._factor_order
k_posdef = self.k_factors
if self.error_order > 0:
    k_states += self._error_order
    k_posdef += k_endog
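Assuming the statsmodels conventions _factor_order = max(1, factor_order) * k_factors and _error_order = error_order * k_endog (an assumption based on the snippet above, not verified against the PR), the state count can be sanity-checked numerically:

```python
def count_states(k_factors, factor_order, k_endog, error_order):
    # assumed conventions, mirroring statsmodels' DynamicFactor
    _factor_order = max(1, factor_order) * k_factors
    _error_order = error_order * k_endog
    k_states = _factor_order      # lagged factors in the state vector
    k_posdef = k_factors          # factor innovations
    if error_order > 0:
        k_states += _error_order  # lagged idiosyncratic errors
        k_posdef += k_endog       # error innovations
    return k_states, k_posdef

print(count_states(k_factors=2, factor_order=2, k_endog=4, error_order=1))  # (8, 6)
print(count_states(k_factors=2, factor_order=0, k_endog=4, error_order=0))  # (2, 2)
```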

andreacate and others added 5 commits August 26, 2025 15:50
@andreacate andreacate marked this pull request as ready for review August 27, 2025 09:21
Member

@jessegrabowski jessegrabowski left a comment


Final tiny comments. This looks great!

design_matrix = pt.concatenate([design_matrix_time, Z_exog], axis=2)

self.ssm["design"] = design_matrix

# Transition matrix (T)
# auxiliary function to build transition matrix block
Member


Can you make a little ASCII diagram of how A, B, and C fit together, or just write T = BlockDiag(A, B, C) before you introduce the names block A and block B? Reading it I felt like "hey, wait, did I miss something?"
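To make the suggestion concrete: the layout being asked for is T = BlockDiag(A, B, C). A small numpy sketch of assembling such a matrix — the block contents here are purely illustrative, not the PR's actual blocks:

```python
import numpy as np

def block_diag(*blocks):
    """Assemble a block-diagonal matrix from square blocks."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    r = 0
    for b in blocks:
        k = b.shape[0]
        out[r:r + k, r:r + k] = b  # place each block on the diagonal
        r += k
    return out

A = np.array([[0.9]])                    # e.g. factor VAR companion block
B = np.array([[0.5, 0.0], [1.0, 0.0]])   # e.g. error AR companion block
C = np.eye(3)                            # e.g. a third block
T = block_diag(A, B, C)
print(T.shape)  # (6, 6)
```

Stating T = BlockDiag(A, B, C) up front, before describing each block, answers the "did I miss something?" reaction.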


review-notebook-app bot commented Aug 30, 2025


jessegrabowski commented on 2025-08-30T09:48:40Z
----------------------------------------------------------------

Ip Man was also a pretty good movie, but the sequels sucked.



jessegrabowski commented on 2025-08-30T09:48:41Z
----------------------------------------------------------------

Is this why you choose factor_order = 2 later? It would be nice to make a more obvious bridge between this section and the final statistical model. You don't actually model the data that this analysis is based on (you do the log differences ultimately), so it's a bit of a loose connection.


andreacate commented on 2025-08-30T09:55:06Z
----------------------------------------------------------------

Yes, sure, you are right. No, I have not made any decisions about parameters, since I just wanted to replicate what was done in statsmodels. Cointegration was not present in statsmodels; I was just curious at the beginning. I can just delete the cells about cointegration.


jessegrabowski commented on 2025-08-30T09:48:42Z
----------------------------------------------------------------

You can consider cutting this section IMO, and just say "Looking at the graphs, these time series are obviously non-stationary"



jessegrabowski commented on 2025-08-30T09:48:42Z
----------------------------------------------------------------

Here's a fancier ADF test function that I use if you want. It matches the output of STATA, and gives all 3 variants of the ADF test:

```python
import pandas as pd
import statsmodels.api as sm

def ADF_test_summary(df, maxlag=None, autolag='BIC', missing='error'):
    if missing == 'error':
        if df.isna().any().any():
            raise ValueError("df has missing data; handle it or pass missing='drop' to automatically drop it.")

    if isinstance(df, pd.Series):
        df = df.to_frame()

    for series in df.columns:
        data = df[series].copy()
        if missing == 'drop':
            data.dropna(inplace=True)

        print(series.center(110))
        print(('=' * 110))
        line = 'Specification' + ' ' * 15 + 'Coeff' + ' ' * 10 + 'Statistic' + ' ' * 5 + 'P-value' + ' ' * 6 + 'Lags' + ' ' * 6 + '1%'
        line += ' ' * 10 + '5%' + ' ' * 8 + '10%'
        print(line)
        print(('-' * 110))
        for name, reg in zip(['Constant and Trend', 'Constant Only', 'No Constant'], ['ct', 'c', 'n']):
            stat, p, crit, regresult = sm.tsa.adfuller(data, regression=reg, regresults=True, maxlag=maxlag,
                                                       autolag=autolag)
            n_lag = regresult.usedlag
            gamma = regresult.resols.params[0]
            # make_var_names is a small helper (defined elsewhere) that labels the
            # regression coefficients for the chosen specification
            names = make_var_names(series, n_lag, reg)
            reg_coefs = pd.Series(regresult.resols.params, index=names)
            reg_tstat = pd.Series(regresult.resols.tvalues, index=names)
            reg_pvals = pd.Series(regresult.resols.pvalues, index=names)

            line = f'{name:<21}{gamma:13.3f}{stat:15.3f}{p:13.3f}{n_lag:11}{crit["1%"]:10.3f}{crit["5%"]:12.3f}{crit["10%"]:11.3f}'
            print(line)

            for coef in reg_coefs.index:
                if coef in name:
                    line = f"\t{coef:<13}{reg_coefs[coef]:13.3f}{reg_tstat[coef]:15.3f}{reg_pvals[coef]:13.3f}"
                    print(line)
```



jessegrabowski commented on 2025-08-30T09:48:43Z
----------------------------------------------------------------

Plot the transformed data and comment before you run the stationarity test



jessegrabowski commented on 2025-08-30T09:48:43Z
----------------------------------------------------------------

They're not that simple, because you constrained the sign. You should comment on the prior choices here, in addition to in the comments.



jessegrabowski commented on 2025-08-30T09:48:45Z
----------------------------------------------------------------

Add some commentary on what is being shown here



jessegrabowski commented on 2025-08-30T09:48:45Z
----------------------------------------------------------------

Show this before sampling



jessegrabowski commented on 2025-08-30T09:48:46Z
----------------------------------------------------------------

Use constrained_layout so the labels and titles aren't crushed together. Also consider sharex='col' ? Currently it looks like all the estimates are the same.



jessegrabowski commented on 2025-08-30T09:48:46Z
----------------------------------------------------------------

The legend is wrong: the gray bands are recessions, not the HDI. Is the HDI plotted here, but just really tight? If so, comment on this.



jessegrabowski commented on 2025-08-30T09:48:47Z
----------------------------------------------------------------

Commentary? What is state 0 (consider renaming the title to be clear, like Estimated Latent Factor 1)? Add recessions?



jessegrabowski commented on 2025-08-30T09:48:48Z
----------------------------------------------------------------

typo: Statsmodels


@jessegrabowski
Member

The notebook also looks great!

Could you add some more headings, and motivate all the analysis that you're doing with some commentary? Make sure the pieces connect together clearly. I'd move the Bayesian latent factor graph above the Coincident index.

I would also suggest you comment on the fact that the optimizer doesn't converge in the statsmodels model. It ends up being "close enough", but make some hay out of the fact that MCMC doesn't "converge"; it explores all equiprobable solutions, which in this case is important because the model is only weakly identified.

