
Commit

update high-lvl readme
Golovneva committed Sep 17, 2024
1 parent 877a870 commit dc4da5c
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions projects/README.md
@@ -5,6 +5,9 @@ Here we list projects undertaken in the RAM framework that are shared publicly,
### [Following Length Constraints in Instructions](https://arxiv.org/abs/2406.17744)
Aligned instruction-following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in the evaluation of such models, and that training algorithms tend to exploit this bias by learning to produce longer responses. In this work, we show how to train models that can be controlled at inference time with instructions containing desired length constraints. Such models are superior in length-instructed evaluations, outperforming standard instruction-following models such as GPT-4, Llama 3, and Mixtral.
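
A minimal sketch of the inference-time idea (helper names here are illustrative, not from the paper): the length constraint is stated directly in the prompt, and a response can then be judged against the stated budget.

```python
def length_instructed_prompt(request: str, max_words: int) -> str:
    """Hypothetical helper: append an explicit length instruction
    to a user request, as in length-instructed evaluation."""
    return f"{request}\n\nAnswer in at most {max_words} words."

def violates_length(response: str, max_words: int) -> bool:
    # The constraint is violated if the response exceeds the word budget.
    return len(response.split()) > max_words

prompt = length_instructed_prompt("Explain the attention mechanism.", 50)
```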

### [Contextual Position Encoding: Learning to Count What's Important](https://arxiv.org/pdf/2405.18719)
The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but it is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing, such as attending to the i-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting, and Flip-Flop tasks where popular position embeddings fail, and that it improves perplexity on language modeling and coding tasks.
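
Below is a minimal single-head PyTorch sketch of the mechanism (a reading of the abstract, not the official implementation): sigmoid gates decide which previous tokens to count, a reversed cumulative sum turns the gates into fractional positions, and position embeddings at fractional positions are linearly interpolated.

```python
import torch
import torch.nn.functional as F

def cope_attention(q, k, v, pos_emb):
    """Single-head causal attention with Contextual Position Encoding.
    Assumed shapes: q, k, v are [T, d]; pos_emb is a learned [p_max, d] table."""
    T, d = q.shape
    logits = q @ k.t() / d ** 0.5                  # content attention logits [T, T]
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    gates = torch.sigmoid(logits) * causal         # g_ij: should token j be counted?
    # Contextual position p_ij = sum of gates over tokens j..i, so position
    # advances only on tokens the model decides to count.
    pos = gates.flip(-1).cumsum(-1).flip(-1).clamp(max=pos_emb.size(0) - 1)
    # Positions are fractional: interpolate between the two nearest integer
    # position embeddings, computed efficiently as z[p] = q . e[p].
    z = q @ pos_emb.t()                            # [T, p_max]
    lo, hi = pos.floor().long(), pos.ceil().long()
    w = pos - lo.float()
    pos_logits = (1 - w) * z.gather(-1, lo) + w * z.gather(-1, hi)
    logits = (logits + pos_logits).masked_fill(~causal, float("-inf"))
    return F.softmax(logits, -1) @ v               # attended values [T, d]
```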

<!---
### [System 2 Attention (is something you might need too)](https://arxiv.org/pdf/2311.11829.pdf)
Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next-token generation. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to include only the relevant portions before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information: QA, math word problems, and longform generation, where S2A increases factuality and objectivity and decreases sycophancy.
