| Metric | Definition | Interpretation |
|---|---|---|
| Accuracy | (TP + TN)/(TP + TN + FP + FN) | Overall proportion of correct predictions; may be misleading with imbalanced classes |
| AUC (Area under ROC Curve) | The probability that a randomly chosen obese child is ranked higher than a randomly chosen non-obese child | 0.5 = random; 0.7–0.8 = acceptable; >0.8 = good discrimination |
| Precision | TP/(TP + FP) | Proportion of predicted positive cases that are truly obese |
| Recall (Sensitivity) | TP/(TP + FN) | Proportion of actual obese cases correctly identified |
| F1-Score | 2 × (Precision × Recall)/(Precision + Recall) | Harmonic mean of precision and recall |
| Calibration | Agreement between predicted probabilities and observed frequencies (e.g., Hosmer-Lemeshow test, calibration plots) | Well-calibrated models have predicted risks matching actual event rates |
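
The threshold-based definitions in the table, together with the pairwise-ranking reading of AUC, can be illustrated with a minimal sketch in plain Python. The labels, predicted probabilities, 0.5 decision threshold, and function names below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = obese, 0 = non-obese)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 as defined in the table."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Return 0.0 when a denominator is empty (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 obese and 5 non-obese children.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_pairwise(y_true, y_prob)
print(metrics)
print(auc)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC depends only on how the predicted probabilities rank the cases. The pairwise loop is quadratic in the sample size, so it suits illustration rather than large datasets.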