| Dataset name | Total samples | Ham emails | Spam emails | Source | Ham (%) | Spam (%) |
|---|---|---|---|---|---|---|
| TREC Spam dataset 2007 | 75,419 | 25,220 | 50,199 | Kaggle | 33.4 | 66.6 |
| Enron dataset | 33,716 | 16,545 | 17,171 | Kaggle | 49 | 51 |
| Large Customized dataset | 46,076 | 36,038 | 10,038 | Custom dataset | 78.2 | 21.8 |
| SmsSpamCollection UCI dataset | 5,572 | 4,825 | 747 | UCI Machine Learning Repository | 86.6 | 13.4 |
| Lingspam dataset | 2,893 | 2,412 | 481 | Kaggle | 83.4 | 16.6 |
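
The Ham (%) and Spam (%) figures follow directly from the ham and spam counts listed for each dataset. As a minimal sketch of that arithmetic (the dictionary `counts` and the helper `class_shares` are hypothetical names introduced here for illustration, not part of the original work), the totals and class shares can be recomputed as follows:

```python
# Ham and spam counts copied from the dataset summary: name -> (ham, spam).
counts = {
    "Trec Spam dataset 2007": (25_220, 50_199),
    "Enron dataset": (16_545, 17_171),
    "Large Customize dataset": (36_038, 10_038),
    "SmsSpamCollection UCI dataset": (4_825, 747),
    "Lingspam dataset": (2_412, 481),
}


def class_shares(ham, spam):
    """Return (total, ham %, spam %) with percentages rounded to one decimal."""
    total = ham + spam
    ham_pct = round(100 * ham / total, 1)
    spam_pct = round(100 * spam / total, 1)
    return total, ham_pct, spam_pct


# Recompute the derived columns for every dataset.
shares = {name: class_shares(ham, spam) for name, (ham, spam) in counts.items()}

for name, (total, ham_pct, spam_pct) in shares.items():
    print(f"{name:<30} {total:>7,} {ham_pct:>5.1f} {spam_pct:>5.1f}")
```

The recomputed totals match the listed ones for all five datasets. At one decimal place the Enron shares come out as 49.1 and 50.9, which the summary reports rounded to whole numbers (49 and 51); the other four rows agree to one decimal.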