
Unintuitive Logic in masked_fill Function of DistilBERT Model Implementation #2721

Open
sondalex opened this issue Jan 16, 2025 · 1 comment

sondalex commented Jan 16, 2025

The masked_fill function in the DistilBERT model implementation currently has unintuitive logic:

fn masked_fill(on_false: &Tensor, mask: &Tensor, on_true: f32) -> Result<Tensor> {
    let shape = mask.shape();
    // Broadcast the scalar fill value to the mask's shape.
    let on_true = Tensor::new(on_true, on_false.device())?.broadcast_as(shape.dims())?;
    // Positions where mask is nonzero are OVERWRITTEN with the fill value;
    // only positions where mask is zero keep their original score.
    let m = mask.where_cond(&on_true, on_false)?;
    Ok(m)
}
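
For illustration, here is a minimal sketch (not from the repo) of what these semantics do to a raw tokenizer mask, assuming the masked_fill above is in scope:

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // Toy attention scores and a tokenizer-style mask (1 = real token, 0 = padding).
    let scores = Tensor::new(&[1f32, 2., 3., 4.], &device)?;
    let mask = Tensor::new(&[1u8, 1, 0, 0], &device)?;
    // With the current masked_fill, nonzero mask positions are overwritten,
    // so the two REAL tokens end up at -inf while the padding survives.
    let filled = masked_fill(&scores, &mask, f32::NEG_INFINITY)?;
    println!("{filled}"); // [-inf, -inf, 3, 4]
    Ok(())
}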

In the current setup, the user must invert the attention mask obtained from the tokenizer before passing it to model.forward. This requirement can be confusing, as it differs from the transformers implementation.

...
let text: Vec<&str> = vec![...];
let encoded = tokenizer.encode_batch(text.clone(), true)?;
let input_ids = encoded.iter().map(|v| v.get_ids().to_vec()).collect::<Vec<_>>();
let input_ids = Tensor::new(input_ids, &device)?;
let attention_mask = encoded
    .iter()
    .map(|encoding| encoding.get_attention_mask().to_vec())
    .collect::<Vec<_>>();
let attention_mask = Tensor::new(attention_mask, &device)?;

let (batch_size, feature_size) = input_ids.dims2()?;

// Invert the attention mask for correct behavior --> counterintuitive:
// the tokenizer emits 1 for real tokens, but the model expects 1 for padding.
let attention_mask = attention_mask.eq(0u32)?.reshape((batch_size, 1, 1, feature_size))?;

let output = model.forward(&input_ids, &attention_mask)?;
...

Proposal:

Replace the masked_fill function with:

fn masked_fill(on_true: &Tensor, mask: &Tensor, on_false: f32) -> Result<Tensor> {
    let shape = mask.shape();
    // Broadcast the scalar fill value to the mask's shape.
    let on_false = Tensor::new(on_false, on_true.device())?.broadcast_as(shape.dims())?;
    // Positions where mask is nonzero KEEP their original score;
    // positions where mask is zero are filled with on_false.
    let m = mask.where_cond(on_true, &on_false)?;
    Ok(m)
}
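
With that signature, the tokenizer's attention mask (1 = real token, 0 = padding) could be passed through without inversion. A minimal sketch of the caller side, assuming the variable names from the snippet above and that the model's internal call sites are updated to pass f32::NEG_INFINITY as on_false:

// No inversion needed: mask == 1 keeps the score, mask == 0 gets filled.
let attention_mask = attention_mask.reshape((batch_size, 1, 1, feature_size))?;
let output = model.forward(&input_ids, &attention_mask)?;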

fbilhaut commented Jan 30, 2025

I second this. The masked_fill function is indeed counterintuitive (though its name isn’t that telling either ^^). In any case, the fact that the forward function actually expects an inverted mask for that reason can be quite troublesome.

I imagine this example isn’t the most critical one, as it’s not the most up-to-date, but I think many people (like me) might try it for initial tests with Candle, since it’s well known. In that sense, the example doesn’t really serve its purpose well because it’s misleading.
