Hello everyone!
I recently discovered that Ollama has an `embed` function. In this repo there is an example of using it with `llama3.2` (the 3B variant by default). The output of this function is a single vector whose length equals the model's `hidden_size` parameter (= 3072).
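For reference, this is roughly how I am calling it. This is only a minimal sketch against the `/api/embed` REST endpoint of a locally running Ollama server; the example in the repo may use a different client or wrapper:

```python
import requests

# Minimal sketch: ask a local Ollama server for an embedding.
# Assumes Ollama is running on the default port and that the
# llama3.2 model has already been pulled.
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "llama3.2", "input": "Hello, world!"},
)
resp.raise_for_status()

embedding = resp.json()["embeddings"][0]
print(len(embedding))  # 3072 for the 3B variant, i.e. hidden_size
```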
Does this mean the model is doing more than just passing the input through the network (excluding the output layer), and is instead applying some form of pooling?
Specifically, how is the final hidden representation transformed from shape `[input_tokenized_len, hidden_size]` to `[1, hidden_size]`?
I understand how specialized sentence transformers achieve this, but in a typical decoder-only LLM like LLaMA, the hidden states keep the `[input_tokenized_len, hidden_size]` shape right up to the output layer. Could someone clarify what transformation is applied here?
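To make the question concrete, here is the kind of reduction I have in mind. The two strategies below (mean pooling over tokens, or taking the last token's hidden state) are just my guesses at what might be happening; I have not confirmed either in the Ollama source:

```python
import numpy as np

# Toy stand-in for the final hidden states of a decoder-only model:
# one row per input token, hidden_size columns (3072 for llama3.2 3B).
input_tokenized_len, hidden_size = 7, 3072
hidden_states = np.random.randn(input_tokenized_len, hidden_size)

# Guess 1: mean pooling over the token dimension.
mean_pooled = hidden_states.mean(axis=0, keepdims=True)  # -> (1, 3072)

# Guess 2: take only the last token's hidden state (plausible for a
# causal LM, since that position has attended to the whole input).
last_token = hidden_states[-1:, :]                        # -> (1, 3072)

print(mean_pooled.shape, last_token.shape)
```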