This project analyzes neural activity data from the Allen Brain Observatory visual behavior dataset to understand how neurons respond to visual stimuli and behavioral state.
The workflow combines:
- Tableau dashboards for exploratory and comparative visual analysis
- Python notebooks for machine learning and dimensionality reduction
- Public online data loading so the core analyses can run without a local SQL database
Main questions:
- How do neural responses differ across cell types (SST vs VIP)?
- How do responses differ for change versus omission events?
- Does running speed modulate neural activity?
- Is there interpretable low-dimensional structure in neural-response features?
- Source: Allen Institute for Brain Science (Visual Behavior 2P dataset)
- Access pattern used here: a preprocessed public parquet file downloaded on first run by the notebooks
- Scale: ~150,000 trial-cell response observations
- Entities: 223 cells, 25 sessions, 13 mice
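The download-on-first-run access pattern can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: `DATA_URL`, `DATA_PATH`, `http_fetch`, and `ensure_dataset` are hypothetical names, and the URL is a placeholder rather than the real dataset location.

```python
from pathlib import Path
from typing import Callable

# Hypothetical placeholders: the real URL and file name are defined in the notebooks.
DATA_URL = "https://example.com/visual_behavior_responses.parquet"
DATA_PATH = Path("data") / "visual_behavior_responses.parquet"


def http_fetch(url: str) -> bytes:
    """Download a URL and return its raw bytes (requires `requests`)."""
    import requests  # imported lazily so the caching logic has no hard dependency

    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail loudly instead of caching an error page
    return response.content


def ensure_dataset(
    path: Path = DATA_PATH,
    url: str = DATA_URL,
    fetch: Callable[[str], bytes] = http_fetch,
) -> Path:
    """Return the local dataset path, downloading the file only if it is missing."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(url))
    return path
```

With a helper like this, a notebook could load the table via `pd.read_parquet(ensure_dataset())`; later runs reuse the cached file in `data/` instead of downloading again.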
Key variables:
- mean_response: neural response magnitude
- image_name: stimulus identity
- cre_line / cell_type: cell type (SST, VIP)
- exposure_level: familiar vs. novel stimuli
- is_change, omitted, rewarded: event indicators
- mean_running_speed: locomotion / behavioral state
- response_latency, baseline_response, peak_response: response summary features
The Tableau dashboard focuses on interpretable visual analytics:
- KPI summary
- Response by exposure level and cell type for change events
- Response by exposure level and cell type for omission events
- Running speed vs. neural response scatter plot with trend lines
- Image × cell-type heatmap
ml_analysis_online.ipynb downloads the public preprocessed dataset and trains a Random Forest classifier on a balanced high-vs-low neural response task.
pca_analysis_online.ipynb downloads the same dataset and applies PCA to examine latent structure in neural-response features.
- VIP neurons show consistently stronger responses than SST neurons
- Omission events produce weaker, but still measurable, neural responses
- These comparisons suggest that neural activity reflects both stimulus detection and expectation / surprise
- The global relationship between running speed and neural response is weak
- After stratifying by event type, exposure level, and cell type:
  - VIP neurons show a slight positive association with running speed
  - SST neurons show weak or negative dependence
  - Differences are more pronounced under novel conditions
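The stratified comparison was done in Tableau; a rough pandas equivalent might look like the sketch below. It assumes the column names listed under the key variables and uses Pearson correlation as the association measure, which is an assumption rather than the dashboard's documented method. `stratified_speed_correlation` is a hypothetical helper name.

```python
import pandas as pd


def stratified_speed_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson r between running speed and response within each stratum."""
    keys = ["is_change", "omitted", "exposure_level", "cell_type"]
    rows = []
    for stratum, group in df.groupby(keys):
        rows.append({
            **dict(zip(keys, stratum)),  # stratum is a tuple aligned with keys
            "n": len(group),
            "pearson_r": group["mean_running_speed"].corr(group["mean_response"]),
        })
    return pd.DataFrame(rows)
```

Reporting `n` next to each coefficient helps flag strata that are too small for the correlation to be trusted.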
- Heatmaps show strong variability across image stimuli
- VIP neurons generally exhibit higher response levels than SST neurons
- Neural responses are stimulus-dependent, not uniform across images
PCA showed that neural-response variability is distributed across multiple latent dimensions.
- PC1 ≈ 23% of variance
- PC2 ≈ 18% of variance
- Top 2 PCs ≈ 41% cumulative variance
- Top 5 PCs ≈ 78% cumulative variance
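As a consistency check, the first two components add up as reported: 23% + 18% = 41%. The sketch below shows one way such cumulative figures can be computed with scikit-learn. It runs on synthetic data, and standardizing the features before PCA is an assumption here, not something the notebook is confirmed to do; `cumulative_explained_variance` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cumulative_explained_variance(X: np.ndarray) -> np.ndarray:
    """Standardize features, fit PCA, and return the cumulative variance ratio."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)  # keep all components
    return np.cumsum(pca.explained_variance_ratio_)


# Synthetic stand-in for the feature matrix (NOT the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 8))
cumvar = cumulative_explained_variance(X_demo)
print(cumvar[:5])  # cumulative share of variance for the first five PCs
```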
Interpretation:
- The first two PCs are useful for visualization, but they do not fully capture the data structure
- Exposure level (familiar vs. novel) overlaps heavily in PCA space, so it is not the dominant source of variation in the first two PCs
- Cell type shows a broader distributional shift, with VIP neurons occupying a wider region of PCA space than SST neurons
Loadings suggest:
- PC1 behaves like an event / response-strength axis, contrasting omission-related and change/reward-related trials
- PC2 behaves like a broader activity-state axis, with strong contributions from baseline response, mean response, and running speed
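Loading-based readings like these usually come from ranking features by the absolute size of their PCA component weights. A minimal sketch, assuming a numeric feature table and standardized inputs (`top_loadings` is a hypothetical helper, not the notebook's function):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def top_loadings(features: pd.DataFrame, n_components: int = 2, top_k: int = 3) -> dict:
    """For each PC, return the top_k features ranked by absolute loading (signed values)."""
    scaled = StandardScaler().fit_transform(features.to_numpy(dtype=float))
    pca = PCA(n_components=n_components).fit(scaled)
    loadings = pd.DataFrame(
        pca.components_.T,  # rows: features, columns: components
        index=features.columns,
        columns=[f"PC{i + 1}" for i in range(n_components)],
    )
    result = {}
    for pc in loadings.columns:
        order = loadings[pc].abs().sort_values(ascending=False).index[:top_k]
        result[pc] = loadings.loc[order, pc]
    return result
```

The sign of a component is arbitrary, so only the contrast between positively and negatively loaded features is interpretable, not which side is positive.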
Example outputs:
- figures/pca_explained_variance.png
- figures/pca_by_cell_type.png
- figures/pca_by_exposure.png
- figures/pca_top_loadings.png
The Random Forest notebook formulates a deliberately conservative task that is easy to defend:
- Target: high vs. low neural response, defined by whether mean_response is above or below its median
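A median split gives two classes of roughly equal size, so chance accuracy is about 50%. The sketch below illustrates the setup on synthetic data. The feature choice, hyperparameters, and 75/25 split are assumptions, not the notebook's settings, and `median_split_accuracy` is a hypothetical helper. Features derived from the response itself, such as peak_response, are left out because they would likely make the task trivial.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def median_split_accuracy(df, feature_cols, target_col="mean_response", seed=0):
    """Label rows as high (1) or low (0) relative to the median, then score a Random Forest."""
    # The median is taken over the full table for simplicity.
    y = (df[target_col] > df[target_col].median()).astype(int)
    X = df[feature_cols]  # numeric features only; categoricals would need encoding
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Synthetic stand-in (NOT the real dataset): response driven mostly by running speed.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "mean_running_speed": rng.normal(size=400),
    "baseline_response": rng.normal(size=400),
})
demo["mean_response"] = 2.0 * demo["mean_running_speed"] + 0.1 * rng.normal(size=400)
acc = median_split_accuracy(demo, ["mean_running_speed", "baseline_response"])
print(f"held-out accuracy: {acc:.2f}")
```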
Result:
- Accuracy ≈ 64% on a balanced binary classification task, where chance level is 50%
Interpretation:
- Trial-level state and condition features contain moderate predictive signal
- Neural response magnitude is partially predictable, but substantial residual variability remains
- This should be interpreted as an exploratory predictive analysis, not a full neural decoding system
Example outputs:
- figures/ml_confusion_matrix.png
- figures/ml_feature_importance.png
Include exported figures such as:
- figures/dashboard.png
- figures/scatter_plot.png
- figures/heatmap.png
- figures/pca_explained_variance.png
- figures/pca_by_cell_type.png
- figures/pca_by_exposure.png
- figures/pca_top_loadings.png
- figures/ml_confusion_matrix.png
- figures/ml_feature_importance.png
A simple root-level layout is assumed:
.
├── README.md
├── requirements.txt
├── ml_analysis_online.ipynb
├── pca_analysis_online.ipynb
├── figures/
│   ├── dashboard.png
│   ├── scatter_plot.png
│   ├── heatmap.png
│   ├── pca_explained_variance.png
│   ├── pca_by_cell_type.png
│   ├── pca_by_exposure.png
│   ├── pca_top_loadings.png
│   ├── ml_confusion_matrix.png
│   └── ml_feature_importance.png
├── data/
│   └── sample_data.csv
└── sql/
    └── schema.sql
Notes:
- The notebooks download the parquet file automatically into data/ if it is missing
- sql/schema.sql is optional and is mainly useful for documenting the Tableau / SQL modeling work
- Install dependencies: pip install -r requirements.txt
- Run the notebooks from the project root:
  - ml_analysis_online.ipynb
  - pca_analysis_online.ipynb
- On first run, the notebooks will download the public preprocessed parquet file into data/
- Generated figures will be saved into figures/
- Python: Pandas, Scikit-learn, Matplotlib
- Tableau: dashboard design and exported visuals
- SQL / PostgreSQL: used for the relational dashboard-side data model
- Public online dataset access: requests + parquet loading
- Neural responses depend strongly on stimulus identity and cell type
- Behavioral variables have weaker but structured influence
- Stratified analysis is necessary to uncover meaningful patterns
- The feature space is structured but moderately high-dimensional
- Moderate Random Forest accuracy suggests partial predictability with substantial unexplained variance remaining
