Project Info

Forked from a replication of Anthropic's "Toward Monosemanticity" paper. The goal was to systematically identify which feature, among a group of features that the auto-interpreter had given similar labels, changed an LLM's emotional response the most.
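The comparison described above can be sketched roughly as follows. This is a minimal toy illustration and not the code in this repository: the function names (`steer`, `rank_features_by_effect`, `toy_emotion_score`), the feature IDs, the direction vectors, and the linear "emotion score" are all hypothetical, and the default scale of 20 is only a guess suggested by file names such as `steer_20_effect.pkl`. The idea is to add each candidate feature's direction to a baseline activation, measure how much a scalar emotion score moves, and rank the candidates by the size of that change.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def steer(activation: Vector, direction: Vector, scale: float) -> Vector:
    """Add a scaled feature direction to an activation vector."""
    return [a + scale * d for a, d in zip(activation, direction)]


def rank_features_by_effect(
    baseline: Vector,
    directions: Dict[int, Vector],
    emotion_score: Callable[[Vector], float],
    scale: float = 20.0,  # assumed value, not confirmed by the source
) -> List[Tuple[int, float]]:
    """Return (feature_id, score_change) pairs, largest absolute change first."""
    base = emotion_score(baseline)
    effects = {
        fid: emotion_score(steer(baseline, direction, scale)) - base
        for fid, direction in directions.items()
    }
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical stand-in for a real emotion measurement: a fixed linear readout.
READOUT: Vector = [1.0, -0.5, 0.0]


def toy_emotion_score(activation: Vector) -> float:
    return sum(w * a for w, a in zip(READOUT, activation))


# Hypothetical candidate features that share a similar auto-interpreter label.
candidates: Dict[int, Vector] = {
    101: [0.1, 0.0, 0.9],
    202: [0.8, 0.1, 0.0],
    303: [0.0, 0.6, 0.2],
}

ranking = rank_features_by_effect([0.0, 0.0, 0.0], candidates, toy_emotion_score)
for feature_id, change in ranking:
    print(f"feature {feature_id}: score change {change:+.2f}")
```

In a real experiment the toy score would be replaced by a measurement taken from the model's generated responses, and the directions would come from the trained sparse autoencoder's decoder.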

Folders and files (81 commits):

.ipynb_checkpoints
__pycache__
checkpoint_storage_repo
.gitignore
LICENSE
README.md
SAE_playground.ipynb
SAE_playground.py
Sparse_Autoencoder.ipynb
ablation_effect.pkl
ablation_effect_all_layers.pkl
ablation_effect_layer_4_test.pkl
analysis.py
debug.txt
dependencies.txt
feature_steering copy.ipynb
feature_steering.ipynb
fewshot.txt
fewshot1.txt
layer_0_effects.png
layer_10_effects.png
layer_11_effects.png
layer_1_effects.png
layer_2_effects.png
layer_3_effects.png
layer_4_effects.png
layer_5_effects.png
layer_6_effects.png
layer_7_effects.png
layer_8_effects.png
layer_9_effects.png
parallel_inference.ipynb
parallel_inference_script.py
parallel_inference_test.txt
question_prompts.txt
rare_freq_dir.pt
responses.csv
responses_rank_0.csv
responses_rank_1.csv
saved_dictionary.pkl
scrappy_feature_steering.ipynb
scratch.py
steer_10_effect_all_layers.pkl
steer_20_effect.pkl
steer_20_effect_all_layers.pkl
steer_20_effect_layer_4_test.pkl
steer_20_logit_effect_all_layers.pkl
steer_20_norm_effect_all_layers.pkl
test_ground.py
train.py
uploading_sae.py
using_an_sae_as_a_steering_vector.ipynb
utils.py

matt-suncy/modifying-llm-emotions
