Last updated: May 29, 2026

Fingerprinting Leaderboard

Benign (non-attacked) Fingerprint Set: Retention

Attacked Fingerprint Set: Robustness

Plot legend: Critical Point, 95% CI, Detectability curve, Uninformed Attack line

Scheme Leaderboard


Frequently Asked Questions

Note: For now, normalized utility is computed on TinyBenchmarks (tiny versions of six popular benchmarks, 100 items each), and we show 95% credible intervals to reflect the uncertainty at this small scale. The final evaluation will use the full benchmarks: Winogrande, AI2 ARC, HellaSwag, MMLU, TruthfulQA, and GSM8K.

Fingerprints are techniques for embedding a detectable signature into a model (LLMs only for now), so you can later test whether a model carries that signature and assert provenance or ownership (e.g., for attribution or bounties).

Attacks try to evade detection, or to weaken or remove the fingerprint, while preserving the model's usefulness.

This leaderboard compares schemes by how well they remain detectable under attack while maintaining Normalized Utility.

Understanding the Visualization

Each point is one measurement of a fingerprinting scheme under a specific fingerprint configuration and (optionally) an attack configuration.

Both plots use the same axes: horizontal Verification Score (%) and vertical Normalized Utility (%). Higher is better on both axes. Verification Score is the fraction of fingerprint prompts that pass verification (exact-match comparator with greedy decoding).

Marker shape indicates whether the model was attacked (circle) or not attacked (square). Error bars show 95% credible intervals on Verification Score (horizontal) and Normalized Utility (vertical), when available.

The Detectability curve (dashed) connects results for a scheme to visualize its Verification Score–Normalized Utility trade-off under different attack configurations. It is split at the uninformed attack line: segments above the line are drawn at full opacity, while segments below (where the attacker did worse than uninformed) are dimmed in Relative mode.

The Uninformed Attack line (dotted, visible in Relative mode) is the trade-off an attacker gets without knowing the fingerprint, simply by randomly degrading the model. In Relative coordinates it runs from (100%, 100%) at the benign baseline to (0%, 0%). Points above this diagonal mean the attack outperformed an uninformed attack; points on or below it mean the scheme forced the attack to be no better than random degradation.

For non-attacked points, we apply an acceptance criterion: Normalized Utility ≥ 75% and Verification Score ≥ 75%. Points that fail this are not considered reasonable for the leaderboard.
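As a rough illustration, the acceptance criterion can be written as a simple filter. This is a minimal sketch, not the leaderboard's code: the function name, the dictionary layout, and the example numbers are all hypothetical, and scores are assumed to be in percent.

```python
def is_acceptable(verification_score, normalized_utility, threshold=75.0):
    """Return True if a benign point meets both thresholds (scores in percent)."""
    return verification_score >= threshold and normalized_utility >= threshold


# Hypothetical benign measurements: (Verification Score %, Normalized Utility %).
benign_points = {"config_a": (98.0, 91.5), "config_b": (100.0, 62.0)}

# Keep only the configurations whose benign point passes on both axes.
accepted = [name for name, (vs, nu) in benign_points.items() if is_acceptable(vs, nu)]
```

Here config_b would be dropped: its fingerprint verifies perfectly, but embedding it cost too much utility.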

The Robustness Coordinates toggle in the Settings dropdown switches between two coordinate systems:

Absolute mode: axes show raw Verification Score (%) and Normalized Utility (%) as measured. Each fingerprint configuration has its own uninformed attack line (visible when a single config is selected), from its benign point to (0%, 0%).

Relative mode (default): each point is rescaled relative to its config's benign baseline: Verification Score_rel = Verification Score / Verification Score_benign and Normalized Utility_rel = Normalized Utility / Normalized Utility_benign. The benign baseline maps to (100%, 100%). All configurations then share the same uninformed diagonal to (0%, 0%). The shaded region below this diagonal is where attacks performed worse than uninformed.
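The Relative-mode rescaling can be sketched in a few lines. The function name and the tuple layout are illustrative assumptions, and all values are taken to be percentages as shown on the plot axes.

```python
def to_relative(point, benign):
    """Rescale a (Verification Score, Normalized Utility) point by its benign baseline.

    Both arguments are (verification_score, normalized_utility) pairs in percent.
    The result is also in percent, so the benign point maps to (100.0, 100.0).
    """
    vs, nu = point
    vs_benign, nu_benign = benign
    if vs_benign <= 0 or nu_benign <= 0:
        raise ValueError("benign baseline must be positive on both axes")
    return (100.0 * vs / vs_benign, 100.0 * nu / nu_benign)


# Hypothetical numbers: benign run at (90%, 80%), attacked run at (45%, 72%).
relative = to_relative((45.0, 72.0), (90.0, 80.0))
```

In this made-up case the attack halved the Verification Score while keeping 90% of the benign utility, so the rescaled point sits above the uninformed diagonal.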

Methodology

We evaluate on instruct/chat models from the Qwen family. The 3B category uses Qwen2.5-3B-Instruct.

Excellent question! You are either attentive, a researcher, or a heavy library user. Here are the exact settings we use:

Verification Score: We use an exact-match comparator with greedy decoding for deterministic verification, always with the chat template applied. The score is the fraction of fingerprint prompts that pass verification.
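To make the definition concrete, here is a minimal sketch of an exact-match Verification Score. It assumes the model outputs have already been generated (greedy decoding, chat template applied) and that the comparison is plain string equality with no normalization; both the function name and that choice are assumptions, not the mlprints API.

```python
def verification_score(outputs, expected):
    """Fraction of fingerprint prompts whose output exactly matches the expected response."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    if not expected:
        raise ValueError("need at least one fingerprint prompt")
    # Assumption: raw string equality, with no whitespace or case normalization.
    passed = sum(out == exp for out, exp in zip(outputs, expected))
    return passed / len(expected)


# Hypothetical example: three of four fingerprint responses survive.
score = verification_score(["k1", "k2", "oops", "k4"], ["k1", "k2", "k3", "k4"])
```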

Normalized Utility: We evaluate on the tiny_custom benchmarks, our own generative versions of tinyBenchmarks based on the lighteval implementation. We use deterministic generation with greedy decoding and a generation size of 102 tokens on tiny-GSM8K to allow longer chain-of-thought, always with the chat template applied. As exceptions, we use 8-shot evaluation for GSM8K and 5-shot evaluation for MMLU.

Robustness is measured by how far attacks land above the uninformed attack line: the further above, the less robust the scheme. The uninformed line is the trade-off any attacker achieves without knowledge of the fingerprint; a robust scheme should keep attacks on or below this line.

For each fingerprint configuration, we treat its benign (non-attacked) run as a baseline. For every attacked run we use relative coordinates (benign maps to 100% on both axes). The above-line margin is:

m = Normalized Utility_rel − Verification Score_rel

Robustness for that point:

R = 1 − clamp(max(0, m), 0, 1)

R ∈ [0, 1]. R = 1 means the attack did no better than uninformed. Smaller R means the attack pushed further above the diagonal.
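The margin and robustness formulas above translate directly into code. This is a sketch that assumes the relative coordinates are passed as fractions (1.0 for the benign baseline) rather than percentages; the function name is illustrative.

```python
def robustness(vs_rel, nu_rel):
    """Per-point robustness R = 1 - clamp(max(0, m), 0, 1), with m = nu_rel - vs_rel.

    vs_rel and nu_rel are relative coordinates as fractions (benign = 1.0).
    """
    margin = nu_rel - vs_rel  # distance above the uninformed diagonal
    # Clamping to [0, 1] already floors negative margins at 0.
    return 1.0 - min(max(margin, 0.0), 1.0)


# Hypothetical attack: keeps 75% of utility while halving verification.
above = robustness(0.5, 0.75)
# Hypothetical attack that lands below the diagonal.
below = robustness(0.5, 0.25)
```

The first attack sits 0.25 above the diagonal, so R drops to 0.75; the second is no better than uninformed degradation, so R stays at 1.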

To rank a scheme:

(1) For each fingerprint configuration, find its worst attack (lowest R).

(2) Pick the configuration whose worst attack is best (highest R among those worst cases).

The attacked point from (1) for the chosen configuration is the Critical marker. The scheme's Robustness Score is that point's R.
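The two ranking steps amount to a max-min selection, which the following sketch illustrates. The input layout (a dictionary from configuration name to a list of attack results) and the tie-breaking rule (first in iteration order wins) are assumptions made for the example.

```python
def critical_point(runs):
    """Select the critical configuration and attack for one scheme.

    runs maps a fingerprint configuration name to a list of (attack_name, R) pairs.
    Returns (config_name, attack_name, R) for the Critical marker.
    """
    # Step (1): worst attack (lowest R) for each configuration.
    worst = {
        cfg: min(attacks, key=lambda a: a[1])
        for cfg, attacks in runs.items()
        if attacks
    }
    if not worst:
        raise ValueError("no attacked runs to rank")
    # Step (2): configuration whose worst case is best (highest R).
    best_cfg = max(worst, key=lambda cfg: worst[cfg][1])
    attack_name, r = worst[best_cfg]
    return best_cfg, attack_name, r


# Hypothetical results for one scheme with two fingerprint configurations.
runs = {
    "cfg_a": [("finetune", 0.9), ("prune", 0.6)],
    "cfg_b": [("finetune", 0.8), ("prune", 0.7)],
}
critical = critical_point(runs)
```

Here cfg_a has the single best result (0.9) but also the weakest worst case (0.6), so cfg_b is chosen and the scheme's Robustness Score is 0.7.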

The Retention Score uses the benign point of the critical configuration. In (Verification Score, Normalized Utility) space, with both scores expressed as fractions, the ideal is (1, 1). Retention is one minus the normalized Euclidean distance to that ideal:

S = 1 − √((1 − Verification Score)² + (1 − Normalized Utility)²) / √2

S ∈ [0, 1]. Higher means the scheme preserved model quality while maintaining strong verifiability.
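The Retention Score is a one-line computation. The sketch below assumes both inputs are fractions in [0, 1]; the function name is illustrative.

```python
import math


def retention_score(verification_score, normalized_utility):
    """S = 1 - distance to the ideal point (1, 1), normalized by sqrt(2).

    Both inputs are fractions in [0, 1], so S is also in [0, 1].
    """
    distance = math.hypot(1.0 - verification_score, 1.0 - normalized_utility)
    return 1.0 - distance / math.sqrt(2.0)


# A perfect benign point scores 1; losing all utility alone costs about 0.71.
perfect = retention_score(1.0, 1.0)
no_utility = retention_score(1.0, 0.0)
```

Dividing by √2 makes the worst possible point, (0, 0), score exactly 0.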

We compute a Bayesian credible interval by treating normalized utility as a retention rate: "how much baseline utility was retained?". This will be updated with full benchmark evaluations in the final version.

Let A be the retained baseline utility and D be the total baseline utility, so that utility is U = A / D with 0 ≤ A ≤ D. We place a Jeffreys prior on the true retention rate: ρ ~ Beta(0.5, 0.5).

The posterior is ρ | data ~ Beta(0.5 + A, 0.5 + (D − A)). The plotted interval is the 95% credible interval given by the 2.5% and 97.5% Beta quantiles.
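For readers who want to reproduce an interval by hand, the sketch below approximates the Beta quantiles by Monte Carlo sampling using only the Python standard library. This is a stand-in technique chosen to keep the example dependency-free, not necessarily how the leaderboard computes it; exact quantiles are available from, for example, `scipy.stats.beta.ppf`. The function name, sample count, and seed are assumptions.

```python
import random


def jeffreys_interval(a, d, level=0.95, draws=100_000, seed=0):
    """Approximate credible interval for the retention rate under a Jeffreys prior.

    a: retained baseline utility, d: total baseline utility (0 <= a <= d).
    Samples the posterior Beta(0.5 + a, 0.5 + (d - a)) and reads off
    empirical quantiles, so the result is approximate, not exact.
    """
    if not 0 <= a <= d:
        raise ValueError("require 0 <= a <= d")
    rng = random.Random(seed)  # fixed seed for reproducible output
    samples = sorted(
        rng.betavariate(0.5 + a, 0.5 + (d - a)) for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * (draws - 1))]
    upper = samples[int((1.0 - tail) * (draws - 1))]
    return lower, upper


# Hypothetical case: 80 of 100 units of baseline utility retained.
lower, upper = jeffreys_interval(80, 100)
```

With these inputs the interval comes out at roughly 0.71 to 0.87, slightly asymmetric around the observed 0.80 because the posterior is skewed.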

Note: because TinyBenchmarks is a curated subset rather than a pseudorandom sample, these intervals capture what we know from tinyBenchmarks only approximately.

Using the Leaderboard

Use the Download CSV button to export the currently shown leaderboard view (without the fingerprint and attack configs).

Click a scheme row in the leaderboard to highlight all of that scheme's configurations in both plots (click again to deselect). Click a point directly on a plot to highlight only that specific fingerprint configuration's points, boundary lines, and error bars — useful for inspecting a single config in isolation. Multiple configs can be selected by clicking additional points.

Use the FP / Atk buttons in the Configs column to open the corresponding YAML configs for the critical point of the selected scheme and export its fingerprint and attack configs.

Getting Involved

Start with the mlprints library (pip install mlprints). You can open issues, propose experiments, complain about the configs being used and build on the existing codebase. We are very grateful for any help!

For anything else, contact Edoardo Contente at edoardo@sentient.xyz.

Citation

If you use this leaderboard or its data in your research, please cite:

Paper

@inproceedings{nasery2026robustfingerprints,
  title={Are Robust LLM Fingerprints Adversarially Robust?},
  author={Nasery, Anshul and Contente, Edoardo and Kaz, Alkin and Viswanath, Pramod and Oh, Sewoong},
  booktitle={IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)},
  year={2026},
  url={https://arxiv.org/abs/2509.26598}
}

Leaderboard

@misc{2026thefingerprintingleaderboard,
  title        = {The Fingerprinting Leaderboard},
  year         = {2026},
  howpublished = {\url{https://sentient.xyz/the-fingerprinting-leaderboard}},
  note         = {Online; accessed 2026}
}