| Attack | Category | Description |
| --- | --- | --- |
| **PII Leakage Focused Attacks** | | |
| Autocompletion Attack | Black Box | Exploits the LLM's completion function by repeatedly submitting minimal prompts and collecting the corresponding generations, potentially disclosing PII contained in the fine-tuning data (a minimal probing sketch follows the table). |
| Extraction Attack | Black Box | Aims to extract sensitive information or training data embedded within an LLM by querying the model directly and using its responses to reconstruct a dataset resembling the original training data. |
| **Memorization Focused Attacks** | | |
| Self-calibrated Probabilistic Variation Membership Inference Attack (SPV-MIA) | Black Box | Variant of MIA that compares the probability distributions of a target model and a reference model to infer membership, using a self-prompt approach to construct the reference dataset internally. |
| Neighborhood Attack | Black Box | Variant of MIA that generates augmented neighbor samples for a target text using a Masked Language Model (MLM) and compares the loss scores of the target text and its neighbors to infer membership. |
| LiRA-Candidate | Black Box | Variant of MIA that compares the confidence (negative log-likelihood) of predictions made by a target model and a reference model on a given text to infer membership. |
| LiRA-Base | Black Box | Same as LiRA-Candidate, except that the reference model is the base model underlying the fine-tuned target (a minimal sketch follows the table). |
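
To make the PII-leakage attacks concrete, the sketch below probes a causal LM with minimal prompts and scans sampled continuations for PII-like strings, in the spirit of the autocompletion and extraction attacks. This is a minimal sketch under stated assumptions: the model name, the prompts, and the regex-based PII detectors are all illustrative, not taken from the original attack papers.

```python
# A minimal black-box probing sketch, assuming query access to a causal LM
# fine-tuned on private data. The model name, prompts, and PII regexes are
# illustrative placeholders, not taken from the surveyed attacks.
import re

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; in practice, the fine-tuned target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Minimal prompts meant to nudge the model into "autocompleting" memorized records.
prompts = ["Contact email:", "My phone number is", "Patient name:"]

# Crude PII detectors (real attacks use much richer matching and verification).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=40,
            do_sample=True,            # sampling surfaces diverse completions
            top_k=40,
            num_return_sequences=5,    # repeated queries per prompt
            pad_token_id=tokenizer.eos_token_id,
        )
    for seq in outputs:
        text = tokenizer.decode(seq, skip_special_tokens=True)
        hits = EMAIL.findall(text) + PHONE.findall(text)
        if hits:
            print(f"prompt={prompt!r} -> candidate PII: {hits}")
```

Real extraction pipelines typically go further, ranking many sampled candidates (for example by the model's own perplexity) before automated or manual verification.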
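
The LiRA-style attacks reduce to a likelihood comparison between two models. The hedged sketch below scores a candidate text as the reference model's negative log-likelihood (NLL) minus the target model's NLL, so a large positive score indicates the target is unusually confident on the text, which is evidence of membership. The model names and threshold are hypothetical, and a shared tokenizer is assumed (as in LiRA-Base, where the reference is the target's base model).

```python
# A minimal sketch of a LiRA-style membership test: compare the average
# per-token NLL a fine-tuned target model assigns to a candidate text against
# that of a reference model. Model names and the threshold are hypothetical;
# target and reference are assumed to share a tokenizer, as in LiRA-Base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_nll(model, tokenizer, text):
    """Average per-token NLL of `text` under `model` (i.e., its LM loss)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy loss.
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

TARGET_NAME = "my-org/finetuned-model"  # hypothetical fine-tuned target
REFERENCE_NAME = "gpt2"                 # base model serving as the reference

tok = AutoTokenizer.from_pretrained(REFERENCE_NAME)
target = AutoModelForCausalLM.from_pretrained(TARGET_NAME).eval()
reference = AutoModelForCausalLM.from_pretrained(REFERENCE_NAME).eval()

candidate = "Example sentence whose training-set membership we want to test."

# Calibrated score: how much more confident the target is than the reference.
# Large positive scores suggest the target memorized the candidate.
score = avg_nll(reference, tok, candidate) - avg_nll(target, tok, candidate)
THRESHOLD = 0.5  # illustrative; in practice calibrated on held-out data
print("member" if score > THRESHOLD else "non-member", f"(score={score:.3f})")
```

The Neighborhood Attack follows the same template but, instead of a reference model's NLL, calibrates against the mean loss of MLM-generated neighbor texts under the target itself.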