Forked from a replication of Anthropic's "Towards Monosemanticity" paper. The goal was to systematically determine which feature, out of a group of features given similar labels by the auto-interpreter, changed an LLM's emotional response the most.
forked from neelnanda-io/1L-Sparse-Autoencoder
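The comparison described above can be sketched in a few lines. This is a toy illustration only, not the repository's code: the "emotion score" is a stand-in (a sigmoid over a hypothetical linear readout), the feature directions are made-up vectors standing in for autoencoder decoder directions, and the names `emotion_score`, `steer`, and `rank_features` are assumptions introduced for this example. Each candidate feature's direction is added to a fixed activation, and the features are ranked by how far the score moves from the unsteered baseline.

```python
import math


def emotion_score(activation, readout):
    """Toy stand-in for an emotion measurement: sigmoid of a linear readout."""
    logit = sum(a * r for a, r in zip(activation, readout))
    return 1.0 / (1.0 + math.exp(-logit))


def steer(activation, direction, scale):
    """Add a scaled feature direction to the activation vector."""
    return [a + scale * d for a, d in zip(activation, direction)]


def rank_features(activation, directions, readout, scale=4.0):
    """Rank candidate features by the absolute change in emotion score."""
    baseline = emotion_score(activation, readout)
    effects = {}
    for name, direction in directions.items():
        steered = steer(activation, direction, scale)
        effects[name] = emotion_score(steered, readout) - baseline
    # Largest absolute shift first; the sign of each effect is preserved.
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: three features that share a similar auto-interp label.
activation = [0.1, -0.2, 0.05]
readout = [1.0, 0.5, -0.25]
directions = {
    "joy_a": [1.0, 0.0, 0.0],
    "joy_b": [0.0, 1.0, 0.0],
    "joy_c": [0.0, 0.0, 1.0],
}

ranking = rank_features(activation, directions, readout)
for name, effect in ranking:
    print(f"{name}: {effect:+.3f}")
```

In a real experiment the activation would come from a model's residual stream or MLP output, the directions from the trained autoencoder's decoder, and the score from some measurement of the generated text; all three are replaced by toy values here.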
matt-suncy/modifying-llm-emotions
Languages
- Jupyter Notebook 99.4%
- Python 0.6%