**PII Leakage Focused Attacks**

| Attack | Category | Description |
| --- | --- | --- |
| Autocompletion Attack | Black box | Exploits the LLM's completion behavior by repeatedly submitting minimal prompts and collecting the generated completions, potentially disclosing PII contained in the fine-tuning data (see the combined sketch after this table). |
| Extraction Attack | Black box | Extracts sensitive information or training data embedded in the LLM by querying the model directly and collecting its responses to reconstruct a dataset resembling the original training data (covered by the same sketch). |
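Both attacks reduce to the same black-box loop: prompt the target, sample completions, and filter the output for sensitive strings. The sketch below illustrates this against a locally hosted Hugging Face causal LM; the model name (`gpt2` as a runnable stand-in for the fine-tuned target), the prompt prefixes, and the PII regexes are illustrative assumptions, not details from the table.

```python
# Minimal sketch of a black-box autocompletion/extraction probe, assuming a
# locally hosted Hugging Face causal LM. Prefixes and regexes are placeholders.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for the fine-tuned target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Autocompletion attack: minimal prefixes the fine-tuning data might contain.
prefixes = ["Patient name:", "Contact email:", "My phone number is"]

# Extraction attacks post-filter generations; crude regex detectors
# suffice for a first pass.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

for prefix in prefixes:
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=40,
            do_sample=True,            # sampling widens coverage of memorized text
            top_k=40,
            num_return_sequences=5,
            pad_token_id=tokenizer.eos_token_id,
        )
    for seq in outputs:
        text = tokenizer.decode(seq, skip_special_tokens=True)
        hits = {name: pat.findall(text)
                for name, pat in PII_PATTERNS.items() if pat.findall(text)}
        if hits:
            print(f"prefix={prefix!r} -> candidate PII: {hits}")
```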

**Memorization Focused Attacks**

| Attack | Category | Description |
| --- | --- | --- |
| Self-calibrated Probabilistic Variation Membership Inference Attack (SPV-MIA) | Black box | Variant of MIA that compares the probability distributions of a target model and a reference model to infer membership, using a self-prompt approach to construct the reference dataset from the target model itself (the self-prompt step is sketched below). |
| Neighborhood Attack | Black box | Variant of MIA that generates perturbed "neighbor" samples of a target text with a masked language model (MLM) and compares the loss of the target text against that of its neighbors to infer membership (sketched below). |
| LiRA-Candidate | Black box | Variant of MIA that compares the confidence (negative log-likelihood) of predictions made by a target model and a reference model on a given text to infer membership (sketched below, together with LiRA-Base). |
| LiRA-Base | Black box | Same as LiRA-Candidate, except that the base model is used as the reference model. |
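The distinctive step in SPV-MIA is the self-prompt: the reference dataset is sampled from the target model itself, so the attacker never needs data from the true training distribution. Below is a minimal sketch of that step alone, assuming a Hugging Face causal LM as the target (`gpt2` as a stand-in); the seed prompts are arbitrary, and the subsequent reference-model fine-tuning and probabilistic-variation scoring are omitted.

```python
# Sketch of the self-prompt step in SPV-MIA: sample text from the target
# model itself to serve as a reference dataset. Prompts and generation
# settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for the fine-tuned target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
target = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Short prompts from any public text work; only a few tokens are needed.
seed_prompts = ["The meeting was", "In recent news,", "According to the report,"]

reference_dataset = []
for prompt in seed_prompts:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        gen = target.generate(
            ids, max_new_tokens=64, do_sample=True, top_p=0.95,
            num_return_sequences=4, pad_token_id=tokenizer.eos_token_id,
        )
    reference_dataset += [tokenizer.decode(g, skip_special_tokens=True) for g in gen]

# SPV-MIA then fine-tunes a reference model on reference_dataset and
# calibrates the membership score of a candidate text against it.
print(len(reference_dataset), "self-prompted reference texts collected")
```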
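A rough sketch of the neighborhood attack follows, under similar assumptions: `gpt2` stands in for the target, `bert-base-uncased` is the MLM used to build neighbors, and the decision threshold is an arbitrary placeholder. Each neighbor replaces one word of the candidate text with the MLM's top fill-in; a candidate whose loss sits clearly below its neighbors' average is flagged as a member.

```python
# Sketch of the neighborhood attack. gpt2 stands in for the target model;
# bert-base-uncased generates one-word-substitution neighbors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

target_tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def lm_loss(text: str) -> float:
    # Mean token negative log-likelihood under the target model.
    ids = target_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return target(ids, labels=ids).loss.item()

def neighbors(text: str, n: int = 5) -> list[str]:
    # Mask one word at a time and take the MLM's top replacement.
    words = text.split()
    out = []
    for i in range(min(n, len(words))):
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        out.append(fill_mask(masked, top_k=1)[0]["sequence"])
    return out

candidate = "The quick brown fox jumps over the lazy dog."
nbrs = neighbors(candidate)
gap = sum(lm_loss(t) for t in nbrs) / len(nbrs) - lm_loss(candidate)
# Positive gap: the target fits the candidate better than its neighbors,
# which the attack reads as evidence of membership. Threshold is arbitrary.
print("member" if gap > 0.1 else "non-member", f"gap={gap:.3f}")
```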
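LiRA-Candidate and LiRA-Base share the same scoring rule and differ only in where the reference model comes from, so one sketch covers both. Assumptions here: both models are local Hugging Face causal LMs sharing a tokenizer, `gpt2` is loaded twice as a runnable stand-in, and the decision cutoff is a placeholder.

```python
# Sketch of LiRA-style membership inference: compare the negative
# log-likelihood a target and a reference model assign to the same text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# In practice `target` is the fine-tuned model and `reference` is either a
# model trained on candidate data (LiRA-Candidate) or the pretrained base
# model the target was fine-tuned from (LiRA-Base).
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()
reference = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def nll(model, text: str) -> float:
    # Mean token negative log-likelihood of `text` under `model`.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def lira_score(text: str) -> float:
    # High score: the target is much more confident than the reference,
    # which suggests the text was seen during fine-tuning.
    return nll(reference, text) - nll(target, text)

text = "Example sentence whose membership in the fine-tuning set is tested."
print(f"score={lira_score(text):.3f}",
      "member" if lira_score(text) > 0.5 else "non-member")  # arbitrary cutoff
```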