Dataset

Wiki

Threshold Values

Threshold = 0.0001

Threshold = 0.11

Threshold = 1.2

Metrics

Accuracy

Precision

Recall

Accuracy

Precision

Recall

Accuracy

Precision

Recall

Attacks

Models

LiRA Candidate

Mistral 7B

100

100

100

100

100

100

97.2

100

94.4

LLaMA 7B

100

100

100

100

100

100

59.6

100

19.2

LLaMA 13B

99.3

100

99.3

98.3

100

99.6

52.2

100

4.4

Mixtral 8x7B

100

100

100

100

100

100

61.2

100

22.4

LiRA Base

Mistral 7B

100

100

100

100

100

100.00

95

100

96

LLaMA 7B

50

50

100

100

100

100

58.5

100

17

LLaMA 13B

98.8

100

97.6

97.2

100

94.4

51.5

100

3

Mixtral 8x7B

100

100

100

100

100

100

59

100

20