Improve GraphQL fingerprinting with specificity scoring#5

Open
Copilot wants to merge 1 commit into main from
copilot/improve-fingerprint-detection

Copilot AI commented Feb 6, 2026

Overview

Current GraphQL fingerprinting uses sequential pattern matching with a first-match-wins approach. Because checks run in registration order, a generic pattern registered early can shadow a more specific one, leading to false positives.

This PR implements specificity-based scoring and descending order iteration for GraphQL framework detection to improve accuracy and reduce false positives.

Key Changes

1. Specificity Scoring System

Each framework fingerprint is assigned a specificity score (0-100) based on pattern uniqueness:

  • 90-95: Highly specific (unique error patterns) - tartiflette, hasura, dgraph, directus, inigo
  • 80-89: Very specific (distinctive patterns) - graphql-by-pop, wpgraphql, absinthe, lacinia, sangria
  • 70-79: Moderately specific (language/framework patterns) - caliban, strawberry, ariadne, graphql-java, graphql-dotnet, graphql-ruby, graphql-php, gqlgen, graphql-go, juniper, hotchocolate, pg_graphql, tailcall
  • 60-69: Generic (common GraphQL errors) - graphene, graphql-yoga, aws-appsync, hypergraphql, graphql-flutter, dianajl, morpheus, apollo, mercurius, jaal, agoo
  • 50-59: Very generic (prone to false positives) - lighthouse
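
The bands above can be read as a simple lookup. The following is a minimal sketch, assuming a hypothetical helper named `SpecificityTier.describe` that is not part of this PR. The band boundaries come from the list; treating scores above 95 as "highly specific" and scores below 50 as "very generic" is an assumption made only so the helper covers the full validated range.

```java
public final class SpecificityTier {
    private SpecificityTier() {}

    /** Maps a specificity score to the band names used in the list above. */
    public static String describe(int score) {
        if (score < 0 || score > 100) {
            throw new IllegalArgumentException("score must be in [0, 100]: " + score);
        }
        // Assumption: 96-100 is folded into the top band.
        if (score >= 90) return "highly specific";
        if (score >= 80) return "very specific";
        if (score >= 70) return "moderately specific";
        if (score >= 60) return "generic";
        // Assumption: anything below 60 is treated as the lowest band.
        return "very generic";
    }

    public static void main(String[] args) {
        System.out.println("tartiflette (95): " + describe(95));
        System.out.println("lighthouse (50): " + describe(50));
    }
}
```

The band only documents how much a match should be trusted relative to others; it does not gate whether a match is reported.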

2. Descending Order Iteration

Fingerprint checks are executed in descending specificity order:

  • Frameworks with higher scores are checked first
  • Returns immediately upon first match (early-exit optimization)
  • Prevents generic patterns from shadowing specific ones
  • Specificity scores are used only for ordering, not as a threshold

3. Record-Based Type Safety

Introduced a FingerprintCheck record to encapsulate:

  • Fingerprint check function (BooleanSupplier)
  • Specificity score with validation (0-100 range)
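
As a rough illustration of the validation behaviour, the sketch below rebuilds a record of the same shape and probes it with in-range and out-of-range scores. The wrapper class `FingerprintCheckDemo`, the helper `isRejected`, and the exception message text are all hypothetical; the PR description elides the real message.

```java
import java.util.function.BooleanSupplier;

public final class FingerprintCheckDemo {
    /** Same shape as the record described above: a check plus a validated score. */
    record FingerprintCheck(BooleanSupplier check, int specificityScore) {
        FingerprintCheck {
            if (specificityScore < 0 || specificityScore > 100) {
                // Message wording is an assumption for this sketch.
                throw new IllegalArgumentException(
                        "specificityScore must be in [0, 100], got " + specificityScore);
            }
        }
    }

    /** Returns true if constructing a check with the given score is rejected. */
    static boolean isRejected(int score) {
        try {
            new FingerprintCheck(() -> false, score);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("95 rejected? " + isRejected(95));
        System.out.println("101 rejected? " + isRejected(101));
    }
}
```

Because the check lives in the compact constructor, an invalid score fails at registration time rather than surfacing later during a scan.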

Architecture

private record FingerprintCheck(BooleanSupplier check, int specificityScore) {
    // Validates score is in range [0, 100]
    public FingerprintCheck {
        if (specificityScore < 0 || specificityScore > 100) {
            throw new IllegalArgumentException(...);
        }
    }
}

public void fingerprint() {
    String detectedFramework = performPatternBasedDetection();
    if (detectedFramework != null) {
        raiseAlertForFramework(detectedFramework);
    }
    matchedString = null;
    queryCache.clear();
}

private String performPatternBasedDetection() {
    // Register all fingerprints with scores
    Map<String, FingerprintCheck> fingerprinters = new LinkedHashMap<>();
    fingerprinters.put("tartiflette", new FingerprintCheck(this::checkTartifletteEngine, 95));
    // ... more frameworks
    
    // Sort by score (descending) and check in order
    var sortedFingerprinters = fingerprinters.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(
            Comparator.comparingInt(FingerprintCheck::specificityScore).reversed()))
        .toList();
    
    // Return first match - any successful check is treated as a match
    for (var fingerprinter : sortedFingerprinters) {
        if (fingerprinter.getValue().check().getAsBoolean()) {
            return fingerprinter.getKey();
        }
    }
    return null;
}
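
To make the shadowing problem concrete, here is a self-contained sketch that runs the same two checks in registration order and in descending-score order. The engine names `generic-engine` and `specific-engine`, the marker strings, and the helper methods are hypothetical and chosen only for illustration; they are not real fingerprints from the add-on.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

public final class OrderingDemo {
    record FingerprintCheck(BooleanSupplier check, int specificityScore) {}

    /** Builds two hypothetical checks, registered generic-first to provoke shadowing. */
    static Map<String, FingerprintCheck> sampleChecks(String response) {
        Map<String, FingerprintCheck> checks = new LinkedHashMap<>();
        checks.put("generic-engine",
                new FingerprintCheck(() -> response.contains("Syntax error"), 60));
        checks.put("specific-engine",
                new FingerprintCheck(() -> response.contains("specific-marker"), 95));
        return checks;
    }

    /** First match wins, in registration order. */
    static String firstMatchInInsertionOrder(Map<String, FingerprintCheck> checks) {
        for (var entry : checks.entrySet()) {
            if (entry.getValue().check().getAsBoolean()) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** First match wins, after sorting by descending score; no minimum score applies. */
    static String firstMatchByDescendingScore(Map<String, FingerprintCheck> checks) {
        return checks.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(
                        Comparator.comparingInt(FingerprintCheck::specificityScore).reversed()))
                .filter(entry -> entry.getValue().check().getAsBoolean())
                .map(Map.Entry::getKey)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        var checks = sampleChecks("Syntax error near specific-marker");
        System.out.println("insertion order:  " + firstMatchInInsertionOrder(checks));
        System.out.println("descending score: " + firstMatchByDescendingScore(checks));
    }
}
```

When both checks match, registration order reports the generic engine while score order reports the specific one. When only the generic check matches, score order still reports it, since the score is never used as a threshold.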

Benefits

  1. Improved Accuracy: Specific patterns checked before generic ones, reducing false positives
  2. Performance: Early-exit on first match (no need to check all patterns)
  3. Maintainability: Clear scoring system makes it easy to understand pattern reliability
  4. Type Safety: Record validation ensures scores are always in valid range
  5. Clear Semantics: Updated JavaDoc and logging to accurately reflect that specificity scores determine execution order only

Backward Compatibility

  • All existing check*Engine() methods remain unchanged
  • Scoring system only affects execution order, not detection logic
  • No changes to alert generation or reporting
  • Any successful check is reported (no minimum score threshold)

Testing

  • Existing unit tests continue to pass
  • Lighthouse (score 50) detection preserved despite low score
  • All 36 framework detections remain functional

Related Issues

N/A

Original prompt

Improve GraphQL Fingerprinting with Header-Based Detection and Specificity Scoring

Overview

Enhance the GraphQL add-on's fingerprinting mechanism to use a tiered detection strategy that leverages HTTP response headers and implements specificity scoring for pattern-based detection. This improves both speed and accuracy of framework identification.

Current Issues

  1. No header-based detection: The current implementation only uses error message patterns, missing fast and reliable header-based indicators
  2. First-match-wins approach: Uses LinkedHashMap with break on first match, which can cause generic patterns to shadow specific ones
  3. No confidence scoring: All detections are treated equally regardless of pattern specificity
  4. Inefficient: Sends multiple queries even when headers provide definitive answers

Changes Required

1. Add Detection Result Records

Create two new record types in GraphQlFingerprinter.java:

/**
 * Result of framework detection with confidence score.
 * 
 * @param framework The detected GraphQL framework name, or null if none detected
 * @param confidence Confidence score (0-100):
 *                   <ul>
 *                   <li>100 = Definitive (proprietary headers/signatures)</li>
 *                   <li>70-90 = Strong hint (framework branding in standard headers)</li>
 *                   <li>40-69 = Weak hint (generic tech stack indicators)</li>
 *                   <li>0-39 = No useful headers detected</li>
 *                   </ul>
 */
private record DetectionResult(String framework, int confidence) {
    
    private static final int MIN_CONFIDENCE = 0;
    private static final int MAX_CONFIDENCE = 100;
    private static final int DEFINITIVE_THRESHOLD = 100;
    private static final int STRONG_HINT_MIN = 70;
    private static final int WEAK_HINT_MIN = 40;
    
    /**
     * Creates a detection result with validation.
     * 
     * @throws IndexOutOfBoundsException if confidence is not in range [0, 100]
     */
    public DetectionResult {
        // checkIndex enforces 0 <= confidence <= MAX_CONFIDENCE
        Objects.checkIndex(confidence, MAX_CONFIDENCE + 1);
    }
    
    /** 
     * Returns true if this is a definitive detection requiring no verification.
     * Definitive detections have confidence = 100.
     */
    public boolean isDefinitive() {
        return confidence == DEFINITIVE_THRESHOLD;
    }
    
    /** 
     * Returns true if this is a strong hint requiring quick verification.
     * Strong hints have confidence in range [70, 100).
     */
    public boolean isStrongHint() {
        return confidence >= STRONG_HINT_MIN && confidence < DEFINITIVE_THRESHOLD;
    }
    
    /** 
     * Returns true if this is a weak hint requiring full verification.
     * Weak hints have confidence in range [40, 70).
     */
    public boolean isWeakHint() {
        return confidence >= WEAK_HINT_MIN && confidence < STRONG_HINT_MIN;
    }
    
    /** 
     * Returns true if confidence is below threshold for actionable detection.
     * No useful detection has confidence in range [0, 40).
     */
    public boolean isNoUsefulDetection() {
        return confidence >= MIN_CONFIDENCE && confidence < WEAK_HINT_MIN;
    }
    
    /** Returns true if any framework was detected with non-zero confidence. */
    public boolean hasDetection() {
        return framework != null && confidence > MIN_CONFIDENCE;
    }
}

/**
 * Encapsulates a fingerprint check function with its reliability score.
 * 
 * @param check The fingerprinting function that returns true if framework matches
 * @param specificityScore Reliability score (50-95):
 *                        <ul>
 *                        <li>90-95: Highly specific (unique error patterns)</li>
 *                        <li>80-89: Very specific (distinctive patterns)</li>
 *                        <li>70-79: Moderately specific (language/framework family patterns)</li>
 *                        <li>60-69: Generic (common GraphQL error patterns)</li>
 *                        <li>50-59: Very generic (prone to false positives)</li>
 *                        </ul>
 */
private record FingerprintCheck(BooleanSupplier check, int specificityScore) {
    
    private static final int MIN_SCORE = 0;
    private static final int MAX_SCORE = 100;
    
    /**
     * Creates a fingerprint check with score validation.
     * 
     * @throws IndexOutOfBoundsException if specificityScore is not in range [0, 100]
     */
    public FingerprintCheck {
        // checkIndex enforces 0 <= specificityScore <= MAX_SCORE
        Objects.checkIndex(specificityScore, MAX_SCORE + 1);
    }
}
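
The confidence tiers map onto follow-up actions, which can be sketched as a small dispatcher. This is a simplified illustration only: the class `ConfidenceTierDemo` and the method `nextStep` are hypothetical, the record is a trimmed copy that uses literal thresholds, and validation here uses `Objects.checkIndex(value, 101)`, which accepts 0 through 100 and throws `IndexOutOfBoundsException` otherwise.

```java
import java.util.Objects;

public final class ConfidenceTierDemo {
    /** Trimmed copy of the detection result: framework name plus confidence in [0, 100]. */
    record DetectionResult(String framework, int confidence) {
        DetectionResult {
            Objects.checkIndex(confidence, 101); // accepts 0..100 inclusive
        }

        boolean isDefinitive() { return confidence == 100; }
        boolean isStrongHint() { return confidence >= 70 && confidence < 100; }
        boolean isWeakHint() { return confidence >= 40 && confidence < 70; }
    }

    /** Hypothetical dispatcher: names the follow-up step each tier would trigger. */
    static String nextStep(DetectionResult result) {
        if (result.framework() == null) return "pattern-based detection";
        if (result.isDefinitive()) return "report without verification";
        if (result.isStrongHint()) return "quick verification";
        if (result.isWeakHint()) return "full verification";
        return "pattern-based detection"; // below 40: no useful header detection
    }

    public static void main(String[] args) {
        System.out.println(nextStep(new DetectionResult("hasura", 100)));
        System.out.println(nextStep(new DetectionResult("hasura", 85)));
        System.out.println(nextStep(new DetectionResult(null, 0)));
    }
}
```

The three predicates cover disjoint ranges, so exactly one branch applies for any valid confidence, and anything below 40 falls through to pattern-based detection.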

2. Refactor fingerprint() Method

Replace the current implementation with a tiered detection strategy:

/**
 * Performs GraphQL framework fingerprinting using a tiered detection strategy.
 * 
 * <h3>Detection Strategy (Decision Tree)</h3>
 * <pre>
 * 1. Send initial probe query: { __typename }
 * 2. Check response headers for framework indicators
 * 3. Apply tiered detection based on confidence...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*

Copilot AI changed the title from "[WIP] Enhance GraphQL fingerprinting with header-based detection" to "Improve GraphQL fingerprinting with header-based detection and specificity scoring" Feb 6, 2026
Copilot AI requested a review from kingthorin February 6, 2026 14:15
@kingthorin kingthorin requested a review from Copilot February 6, 2026 19:17

Copilot AI left a comment

Pull request overview

This PR refactors the GraphQL add-on’s engine fingerprinting to use a tiered strategy that prefers high-confidence header indicators (with optional quick verification) and otherwise falls back to pattern-based detection using specificity scoring to reduce false positives.

Changes:

  • Refactored fingerprint() to perform an initial probe, header-based detection/verification, then scored pattern matching.
  • Added new unit tests covering definitive header detection, header-hint verification, and fallback behavior.
  • Added *.hprof to .gitignore.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java | Adds tiered header detection, quick verification, and specificity-scored pattern detection. |
| addOns/graphql/src/test/java/org/zaproxy/addon/graphql/GraphQlFingerprinterUnitTest.java | Updates an existing test expectation and adds new tests for header-based detection/verification. |
| .gitignore | Ignores Java heap dump files (*.hprof). |

Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
@kingthorin
Owner

@copilot review again

Author

Copilot AI commented Feb 6, 2026

@copilot review again

All previous review feedback has been addressed and resolved:

  • Locale-independent header matching (Locale.ROOT)
  • Null guard for headerHint parameter
  • matchedString evidence handling
  • Logger output validation in tests
  • Lighthouse detection preserved (removed score threshold)
  • Descending score iteration implemented
  • Inigo header detection restricted to x-inigo- prefix
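
Two of the items above, locale-independent matching and the `x-inigo-` prefix restriction, can be sketched together. This is an illustration under assumptions rather than the add-on's actual code: the class `HeaderMatchDemo`, the method `hasHeaderWithPrefix`, and the example header names are hypothetical. The reason for `Locale.ROOT` is that `String.toLowerCase()` with a Turkish default locale maps `I` to a dotless `ı`, so an upper-case header such as `X-INIGO-ID` would no longer start with `x-inigo-`.

```java
import java.util.List;
import java.util.Locale;

public final class HeaderMatchDemo {
    private HeaderMatchDemo() {}

    /**
     * Returns true if any header name starts with the given prefix,
     * compared case-insensitively and independently of the default locale.
     */
    static boolean hasHeaderWithPrefix(List<String> headerNames, String prefix) {
        if (headerNames == null || prefix == null) {
            return false; // null guard: nothing to match against
        }
        String wanted = prefix.toLowerCase(Locale.ROOT);
        for (String name : headerNames) {
            if (name != null && name.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Header names below are made up for the example.
        System.out.println(hasHeaderWithPrefix(List.of("Content-Type", "X-INIGO-ID"), "x-inigo-"));
        System.out.println(hasHeaderWithPrefix(List.of("X-Inigonia"), "x-inigo-"));
    }
}
```

Including the trailing hyphen in the prefix keeps unrelated headers that merely begin with the same letters from matching.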

The code compiles successfully and is ready for re-review.

@kingthorin kingthorin force-pushed the copilot/improve-fingerprint-detection branch 4 times, most recently from c4a3142 to 3a0d709 Compare February 6, 2026 22:55
Copilot stopped work on behalf of kingthorin due to an error February 6, 2026 23:04
Copilot stopped work on behalf of kingthorin due to an error February 6, 2026 23:08
@kingthorin kingthorin force-pushed the copilot/improve-fingerprint-detection branch 4 times, most recently from f0affeb to c8eed25 Compare February 7, 2026 01:07
@kingthorin
Owner

@copilot update the PR description based on the current state of the branch

@kingthorin kingthorin requested a review from Copilot February 7, 2026 02:34
@kingthorin kingthorin changed the title from "Improve GraphQL fingerprinting with header-based detection and specificity scoring" to "Improve GraphQL fingerprinting with specificity scoring" Feb 7, 2026
Author

Copilot AI commented Feb 7, 2026

@copilot update the PR description based on the current state of the branch

Updated PR description to reflect current implementation after the "Second opinion" commit (c8eed25). The tiered header-based detection has been simplified to focus on pattern-based detection with specificity scoring and descending order iteration only.

@kingthorin kingthorin marked this pull request as ready for review February 7, 2026 02:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
@kingthorin
Owner

@copilot are there literal strings used multiple times which should be converted to constants? It’s okay if no change is warranted; simply reply with a comment if that’s the case.

Author

Copilot AI commented Feb 7, 2026

@copilot are there literal strings used multiple times which should be converted to constants? It’s okay if no change is warranted; simply reply with a comment if that’s the case.

No changes warranted. The literal strings in the code are either:

  1. Log messages (used once each, don't need extraction)
  2. Framework-specific test queries (intentionally different variations like "queryy {__typename}", specific to each detection method)

Extracting these wouldn't improve maintainability since they serve different purposes in different contexts.

@kingthorin
Owner

@copilot what about repeated literal occurrences of framework/engine names?

Author

Copilot AI commented Feb 7, 2026

@copilot what about repeated literal occurrences of framework/engine names?

No changes warranted for framework/engine names. Analysis shows:

Framework names: Each of the 36 framework names (like "tartiflette", "hasura", "apollo") is used only once as a map key in the fingerprinters registration. One instance ("inigo") appears 3 times total, but extracting constants for single-use identifiers would add boilerplate without improving maintainability.

JSON field names: Standard GraphQL response fields like "data" (6 uses), "extensions" (3 uses), and "errors" (2 uses) are used in different contexts throughout detection methods. These are well-known GraphQL standard field names, and their literal form aids readability in fingerprinting logic.

The current approach keeps the code clear and maintainable without unnecessary abstraction.

@kingthorin kingthorin force-pushed the copilot/improve-fingerprint-detection branch 5 times, most recently from 1ca02f1 to 2a39087 Compare February 8, 2026 03:19
@kingthorin kingthorin requested a review from Copilot February 8, 2026 03:33

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Comment thread addOns/graphql/src/main/java/org/zaproxy/addon/graphql/GraphQlFingerprinter.java Outdated
Comment thread addOns/graphql/CHANGELOG.md Outdated
@kingthorin kingthorin force-pushed the copilot/improve-fingerprint-detection branch 3 times, most recently from 69803d7 to 3d103a6 Compare February 8, 2026 12:03
Signed-off-by: kingthorin <kingthorin@users.noreply.github.com>
@kingthorin kingthorin force-pushed the copilot/improve-fingerprint-detection branch from 3d103a6 to b9803bb Compare February 8, 2026 16:35

3 participants