This project aims to evaluate the performance of the Segment Anything Model (SAM) in anomaly segmentation without additional training. The goal is to determine whether a foundation model like SAM can effectively segment anomalies.
Recent advances in anomaly detection have largely followed an unsupervised learning paradigm, primarily due to the scarcity of labeled anomalous data. These methods typically train models using only normal samples, detecting anomalies as deviations from the learned distribution. While effective in controlled settings, this approach suffers from several limitations:
- It generally follows a One-Class One-Model paradigm, requiring a separate model for each category or domain.
- Performance is often highly sensitive to the statistical distribution of the validation data, making generalization difficult.
- Most models struggle with pixel-wise anomaly segmentation, especially in complex or noisy backgrounds.
To overcome these limitations, the field has recently turned toward leveraging foundation models—large-scale models pretrained on diverse datasets—to build unified, generalizable anomaly detection and segmentation frameworks. These models, such as CLIP, DINOv2, and SAM (Segment Anything Model), exhibit strong zero-shot or few-shot capabilities, making them ideal for domains with limited supervision.
In line with this trend, this project aims to explore the feasibility of using SAM for anomaly segmentation tasks. While SAM has demonstrated impressive generalization in natural image segmentation, its ability to localize anomalous regions without supervision remains largely unexplored.
This work investigates whether SAM can effectively segment anomalies when provided with appropriate prompt inputs. Furthermore, we propose a method to automatically generate these prompts from feature-based anomaly maps, simulating real-world scenarios where ground-truth masks are not available.
We hope this exploration can contribute to future research on building foundation model-based anomaly segmentation pipelines that require minimal task-specific tuning.
- The evaluation is conducted on the MVTec-AD dataset, a standard benchmark for industrial anomaly detection.
To explore the potential of the Segment Anything Model (SAM) in anomaly segmentation tasks, we first aimed to answer a fundamental question:
“If we provide well-designed prompts, can SAM accurately segment anomalous regions without additional training?”
To verify this, we conducted controlled experiments using ground-truth (GT) masks from benchmark datasets.
We developed a method to generate different types of SAM prompts based on GT masks:
- Box only: Bounding boxes that tightly cover the anomalous regions.
- Box + 1 point: Bounding boxes with a single point located inside the anomalies.
- Box + multiple points: Bounding boxes with 20 points sampled within the anomalous regions.
This setup allowed us to simulate varying levels of prior information and test SAM’s robustness under different prompt configurations.
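For reference, the sketch below shows one way such GT-based prompts can be built and passed to SAM's predictor. The helper name, point-sampling strategy, and checkpoint path are illustrative assumptions and may differ from what `test_mvtec.py` actually does.

```python
# Minimal sketch: derive SAM prompts from a GT anomaly mask and run SAM's predictor.
# The helper name, point-sampling strategy, and checkpoint path are illustrative
# assumptions; the project's test_mvtec.py may differ in detail.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def prompts_from_gt(mask, n_points=20, seed=0):
    """mask: (H, W) binary GT mask -> XYXY box, foreground points, point labels."""
    ys, xs = np.nonzero(mask)
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=min(n_points, len(xs)), replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1)    # (N, 2) in (x, y) order, as SAM expects
    labels = np.ones(len(idx), dtype=np.int64)       # 1 = foreground point
    return box, points, labels

# image: (H, W, 3) uint8 RGB test image; gt_mask: its binary GT mask (assumed already loaded).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # assumed checkpoint file
predictor = SamPredictor(sam)
predictor.set_image(image)
box, points, labels = prompts_from_gt(gt_mask, n_points=20)
masks, scores, _ = predictor.predict(point_coords=points, point_labels=labels,
                                     box=box, multimask_output=False)
pred_mask = masks[0]    # (H, W) boolean mask predicted for the prompted region
```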
In real-world applications, GT masks are not available during inference. Therefore, it is necessary to develop a practical method for approximating the location of anomalies without relying on manual annotations.
To address this, we propose using anomaly maps as a proxy. These maps are generated by computing feature similarity between a query image and a few normal reference samples, following a strategy inspired by few-shot anomaly detection. The resulting similarity map highlights regions that deviate from normal patterns and provides a coarse localization of potential anomalies.
In this context, the anomaly map plays a crucial role by serving as a spatial guide that informs SAM where potential anomalies are likely to exist, enabling it to focus its segmentation on those regions.
The anomaly map is then binarized via thresholding, and the resulting binary mask is used to automatically generate prompt inputs for SAM (e.g., points or bounding boxes). This enables anomaly segmentation without GT masks, simulating real-world scenarios where manual annotations are unavailable. Importantly, the rough localization provided by the anomaly map can be refined by SAM, enabling precise segmentation of anomalous regions with minimal supervision.
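As a rough illustration of this pipeline, the sketch below computes an anomaly map from patch-feature cosine similarity against a few normal references and converts it into a box prompt. The feature shapes, interpolation size, and threshold are assumptions; the actual implementation (which uses a Swin Transformer encoder, see below) may differ.

```python
# Minimal sketch: build an anomaly map from patch-feature similarity against a few
# normal reference images, then binarize it into a SAM box prompt. The feature
# shapes, interpolation size, and threshold are assumptions, not the project's code.
import numpy as np
import torch
import torch.nn.functional as F

def anomaly_map_from_features(feat_q, feat_refs, out_size=256):
    """feat_q and each feat_refs[i]: (C, H, W) patch features from any backbone."""
    C, H, W = feat_q.shape
    q = F.normalize(feat_q.reshape(C, -1), dim=0)              # (C, H*W)
    dists = []
    for feat_r in feat_refs:
        r = F.normalize(feat_r.reshape(C, -1), dim=0)          # (C, H*W)
        sim = q.T @ r                                          # cosine similarity, (H*W, H*W)
        dists.append(1.0 - sim.max(dim=1).values)              # distance to closest normal patch
    amap = torch.stack(dists).min(dim=0).values.reshape(1, 1, H, W)
    amap = F.interpolate(amap, size=(out_size, out_size), mode="bilinear", align_corners=False)
    return amap[0, 0].numpy()

def box_prompt_from_map(amap, thresh=0.5):
    """Normalize the map to [0, 1], threshold it, and return an XYXY box for SAM."""
    amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)
    ys, xs = np.where(amap > thresh)
    if len(xs) == 0:                                           # nothing above threshold: whole image
        return np.array([0, 0, amap.shape[1] - 1, amap.shape[0] - 1])
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

# Toy usage with random tensors standing in for Swin Transformer feature maps.
amap = anomaly_map_from_features(torch.randn(256, 16, 16),
                                 [torch.randn(256, 16, 16) for _ in range(4)])
box = box_prompt_from_map(amap, thresh=0.7)
```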
- Prepare the MVTec-AD dataset.
- Run `test_mvtec.py` to evaluate SAM’s anomaly segmentation.
- Use `anomaly_map.py` to generate anomaly maps for SAM prompts.
`test_mvtec.py` creates SAM's prompt input using GT masks and saves the anomaly segmentation results. It consists of four different prompt settings:

- Creates a box prompt covering the entire image: `python test_mvtec.py --data_dir "DATASET_DIR"`
- Creates a bounding box around the anomalous region: `python test_mvtec.py --data_dir "DATASET_DIR" --mode b`
- Creates a bounding box along with one point inside the anomalous region: `python test_mvtec.py --data_dir "DATASET_DIR" --mode bp`
- Creates a bounding box along with multiple points inside the anomalous region: `python test_mvtec.py --data_dir "DATASET_DIR" --mode bps`

`anomaly_map.py` generates a heatmap-like anomaly localization map based on feature similarity between a query image and normal images.
- Uses Swin Transformer as the image encoder to extract features.
- The generated anomaly map is used to create SAM's prompt input.
`python test_mvtec.py --data_dir "DATASET_DIR" --save_dir "SAVE_DIR"`

The table below compares the anomaly segmentation performance of SAM under different prompt conditions.
- The first three columns show results when prompts are generated using ground-truth (GT) masks.
- The last column shows performance when using prompts generated by our few-shot anomaly map method, which does not rely on GT masks.
Each cell shows results in the format (IoU / P-AUROC), capturing both localization and segmentation quality.
| Type | SAM (Box only) | SAM (Box with 1 point) | SAM (Box with 20 points) | SAM (Few-shot) (Box with 1 point) |
|---|---|---|---|---|
| Bottle | 76.8 / 97.0 | 74.8 / 95.9 | 78.8 / 99.1 | 51.5 / 81.4 |
| Cable | 69.2 / 96.3 | 68.9 / 96.2 | 72.6 / 98.4 | 55.2 / 82.4 |
| Capsule | 57.9 / 97.3 | 59.1 / 97.5 | 58.4 / 99.3 | 54.7 / 87.6 |
| Carpet | 59.3 / 97.7 | 59.3 / 97.8 | 52.7 / 97.1 | 45.9 / 83.0 |
| Grid | 51.9 / 84.0 | 56.4 / 86.1 | 47.4 / 95.2 | 33.4 / 71.5 |
| Hazelnut | 74.9 / 96.6 | 74.6 / 96.4 | 75.5 / 98.3 | 49.0 / 83.8 |
| Leather | 58.1 / 98.4 | 60.0 / 99.0 | 57.6 / 99.5 | 43.5 / 84.5 |
| Metal Nut | 78.4 / 98.0 | 78.9 / 98.1 | 77.4 / 98.9 | 50.2 / 83.3 |
| Pill | 73.4 / 98.1 | 72.6 / 98.7 | 68.7 / 99.4 | 49.8 / 85.8 |
| Screw | 62.0 / 91.2 | 64.9 / 94.6 | 66.8 / 99.3 | 69.9 / 88.4 |
| Tile | 72.8 / 92.1 | 70.0 / 88.0 | 67.3 / 95.6 | 45.5 / 80.7 |
| Toothbrush | 71.7 / 96.0 | 69.8 / 96.6 | 67.6 / 98.1 | 56.2 / 89.9 |
| Transistor | 50.3 / 88.5 | 54.0 / 91.3 | 57.1 / 94.1 | 43.8 / 77.9 |
| Wood | 71.9 / 89.8 | 71.5 / 89.5 | 70.0 / 94.2 | 39.4 / 75.8 |
| Zipper | 60.0 / 92.0 | 64.6 / 93.0 | 69.1 / 97.3 | 46.8 / 80.8 |
| Unified | 65.9 / 94.2 | 66.6 / 94.6 | 65.8 / 97.6 | 49.0 / 82.3 |
- Box only: Bounding boxes that tightly cover the anomalous regions.
- Box with 1 point: Bounding boxes with a single point located inside the anomalies.
- Box with 20 points: Bounding boxes with 20 points sampled within the anomalous regions.
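For clarity, the sketch below shows how the two metrics can be computed for a single image, assuming IoU is taken between the binarized SAM mask and the GT mask, and P-AUROC is the pixel-level AUROC of per-pixel anomaly scores; how these are aggregated into the table (per image vs. pooled per category) is an assumption.

```python
# Sketch of the two metrics, computed per image: IoU between the binarized SAM mask
# and the GT mask, and pixel-level AUROC over per-pixel anomaly scores. How these
# are aggregated into the table (per image vs. pooled per category) is an assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

def iou(pred_mask, gt_mask):
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def pixel_auroc(scores, gt_mask):
    # scores: per-pixel anomaly scores (e.g., SAM mask logits), gt_mask: binary GT mask
    return roc_auc_score(gt_mask.reshape(-1).astype(int), scores.reshape(-1))
```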
- Performance Trend: 20 points > 1 point > Box only
  - In most object types, providing 20 points leads to the highest performance.
  - Significant improvements are observed for complex shapes or fine-grained textures such as `Cable`, `Screw`, `Zipper`, and `Pill`.
  - Average performance (`Unified` row):
    - Box only: 65.9 / 94.2
    - 1 point: 66.6 / 94.6
    - 20 points: 65.8 / 97.6 → notable boost in P-AUROC score
- Single-point supervision already improves results
  - Even adding just one point shows measurable improvements over bounding box alone in categories like `Capsule`, `Carpet`, `Grid`, and `Metal Nut`.
  - This suggests minimal user interaction can meaningfully enhance segmentation accuracy.
- Per-category insights
  - Categories like `Metal Nut`, `Bottle`, and `Pill` perform consistently well under all settings.
  - On the other hand, classes like `Grid`, `Transistor`, and `Carpet` struggle under Box-only supervision and benefit substantially from additional points.
- P-AUROC is more sensitive to refinement
  - P-AUROC tends to show larger improvements than IoU when additional points are provided.
  - For example, in `Leather`, `Capsule`, and `Screw`, P-AUROC reaches 99+ with 20 points.
Through our experiments, we observed that SAM’s performance significantly degrades when inaccurate or suboptimal prompts are provided. The method we used to generate these prompts—based on feature similarity and anomaly maps—has several limitations that must be considered:
- Sensitivity to structural mismatch between query and reference images:
  The anomaly map relies on feature similarity between the query image and a small set of normal reference images. If the reference samples have significantly different shapes or structures compared to the query, the resulting similarity map may fail to accurately highlight the anomalous regions.
- Difficulty in threshold selection during binarization:
  While the goal is to provide a rough localization of potential anomalies to SAM, the process of converting the anomaly map into a binary mask is highly sensitive to the threshold value. An inappropriate threshold can introduce substantial noise or cause important regions to be missed, affecting the quality of the prompts fed into SAM.
These limitations suggest that refining the anomaly map generation pipeline—especially in terms of reference selection and adaptive thresholding—could lead to more robust and generalizable anomaly segmentation using SAM.
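As one concrete example of what such adaptive thresholding could look like, the sketch below swaps the fixed threshold for Otsu's method; this is an illustrative alternative, not part of the current pipeline.

```python
# Illustrative sketch only: replace the fixed threshold with Otsu's method when
# binarizing the anomaly map. This is one possible form of adaptive thresholding,
# not part of the current pipeline.
import numpy as np
from skimage.filters import threshold_otsu

def binarize_adaptive(amap):
    amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)
    return (amap > threshold_otsu(amap)).astype(np.uint8)
```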

