Trouble with resize_to #1

@nricciardi

I'm trying to reproduce the COCO results from the paper, so far without success, and I'm confused about the resize_to parameter. Its default is [320, 320], and during evaluation you reshape the masks with a view that divides 320 by ps, where ps = 14 for dinov2-vitb-14, i.e. 320 // 14 = 22. I think this is wrong (or I'm missing something), because DINO's output is (B, 16*16, dino_dim), not (B, 22*22, dino_dim).

For now I'm using H = W = 224, so that the view operation yields exactly 16x16 patches.
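To make the arithmetic above concrete, here is a minimal sketch in plain Python. `patch_grid` is a hypothetical helper (it is not part of the repository), and it assumes the encoder emits exactly one token per full patch, with no CLS or register tokens in the sequence.

```python
def patch_grid(resize_to, patch_size):
    """Return (rows, cols, tokens) of the patch grid implied by an input size."""
    h, w = resize_to
    # Integer division drops any partial patch at the border.
    rows, cols = h // patch_size, w // patch_size
    return rows, cols, rows * cols


print(patch_grid((320, 320), 14))  # (22, 22, 484)
print(patch_grid((224, 224), 14))  # (16, 16, 256)
```

The view into (H // ps, W // ps) is only a faithful reshape when the token dimension of masks equals rows * cols, so the question is whether the encoder really returns 484 tokens for a 320x320 input or 256 as I expect.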

# == train.py 75-82 ==

H, W = args.resize_to

features = vis_encoder(model_input)                      # (B, token, 768)
reconstruction, slots, masks = model(features)           # (B, token, 768), (B, S, D_slot), (B, S, token)
    
# == Here ==
masks = masks.view(-1, slot_num, H // ps, W // ps)                      # (B,  S, H // ps, W // ps)

predictions = F.interpolate(masks, size=(H_t, W_t), mode="bilinear")    # (B, S, H_t, W_t)
predictions = torch.argmax(predictions, dim=1)                          # (B, H_t, W_t)

Anyway, this is my implementation, (strongly) inspired by yours. I may have introduced some bugs here:

# patch_size = 14
# resize_to = 224
# During training I resized the images to 224

for i, (model_input, instance_gt, semantic_gt) in enumerate(loader):
    model_input = model_input.cuda(non_blocking=True)  # (B, 3, H, W)
    instance_gt = instance_gt.cuda(non_blocking=True)  # (B, *, H_t, W_t)
    semantic_gt = semantic_gt.cuda(non_blocking=True)  # (B, *, H_t, W_t)

    H_t, W_t = instance_gt.shape[-2:]

    H = resize_to
    W = resize_to

    _, masks, _ = model.forward(model_input) # (batch_size, n_slots, n_patches)

    masks = masks.view(-1, n_slots, H // patch_size, W // patch_size)  # (B, S, H // ps, W // ps)
    predictions = F.interpolate(masks, size=(H_t, W_t), mode="bilinear")  # (B, S, H_t, W_t)
    predictions = torch.argmax(predictions, dim=1)  # (B, H_t, W_t)
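One way to sidestep the dependence on resize_to might be to derive the grid from the number of tokens the model actually returns. The sketch below is only an illustration: `grid_from_tokens` is a hypothetical helper, and it assumes a square input and a token sequence that contains patch tokens only.

```python
import math


def grid_from_tokens(n_tokens):
    """Infer a square (rows, cols) patch grid from a token count."""
    side = math.isqrt(n_tokens)
    # Fail loudly instead of silently reshaping into the wrong layout.
    if side * side != n_tokens:
        raise ValueError(f"{n_tokens} tokens do not form a square grid")
    return side, side


print(grid_from_tokens(256))  # (16, 16)
print(grid_from_tokens(484))  # (22, 22)
```

In the loop above this would be used as `h, w = grid_from_tokens(masks.shape[-1])` followed by `masks.view(-1, n_slots, h, w)`, so a mismatch between resize_to and the encoder output raises an error rather than producing scrambled masks.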
