Dataset name | Total samples | Ham emails | Spam emails | Source | Ham (%) | Spam (%) |
TREC 2007 Spam dataset | 75,419 | 25,220 | 50,199 | Kaggle | 33.4 | 66.6 |
Enron dataset | 33,716 | 16,545 | 17,171 | Kaggle | 49.1 | 50.9 |
Large customized dataset | 46,076 | 36,038 | 10,038 | Custom dataset | 78.2 | 21.8 |
SMS Spam Collection (UCI) dataset | 5,572 | 4,825 | 747 | UCI Machine Learning Repository | 86.6 | 13.4 |
Ling-Spam dataset | 2,893 | 2,412 | 481 | Kaggle | 83.4 | 16.6 |
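
The totals and percentage columns in the table can be rechecked from the ham and spam counts alone. The following is a minimal sketch of that arithmetic, not code from the original work; the helper name `class_shares` and the short dictionary labels are hypothetical and used only for illustration.

```python
# Ham and spam counts copied from the table above.
# The keys are shorthand labels chosen for this sketch.
counts = {
    "TREC 2007": (25220, 50199),
    "Enron": (16545, 17171),
    "Large custom": (36038, 10038),
    "SMS Spam Collection": (4825, 747),
    "Ling-Spam": (2412, 481),
}


def class_shares(ham, spam):
    """Return (total, ham %, spam %), with percentages rounded to one decimal."""
    total = ham + spam
    ham_pct = round(100 * ham / total, 1)
    spam_pct = round(100 * spam / total, 1)
    return total, ham_pct, spam_pct


for name, (ham, spam) in counts.items():
    total, ham_pct, spam_pct = class_shares(ham, spam)
    print(f"{name}: total={total}, ham={ham_pct}%, spam={spam_pct}%")
```

Running the loop gives totals and class shares that agree with the table up to rounding. It also makes the imbalance easy to see: only the Enron corpus is close to an even split, while the TREC 2007 corpus is spam-heavy and the remaining three are ham-heavy.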