Benchmark Datasets

| Dataset | Target Type | Attribute Type | Dataset Characteristics | # Features | # Instances |
|---|---|---|---|---|---|
| Wine Quality | Binary | Numeric | Imbalanced | 10 | 4898 |
| Breast Cancer Wisconsin | Binary | Numeric | Features have very dissimilar ranges; about half of the features take values close to 0 | 30 | 569 |
| Congressional Voting Records | Binary | Categorical | Missing data | 16 | 435 |
| Abalone | Binary | Mixed | Imbalanced | 8 | 4177 |
| Arrhythmia | Binary | Mixed | Imbalanced; small dataset; # features exceeds half the # of instances | 279 | 452 |
| Forest Fires | Continuous | Numeric | No missing values | 13 | 517 |
| Solar Flare | Continuous | Categorical | Target is the # of common solar flares within 24 h; target distribution is highly skewed towards 1 | 10 | 1066 |
| Auto MPG | Continuous | Mixed | No missing values | 8 | 398 |
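
The feature and instance counts listed above can be checked against the stated characteristics with a few lines of Python. The following is a minimal sketch, not part of the original benchmark: the `Dataset` record, the `feature_ratio` and `high_dimensional` helpers, and the 0.5 threshold are illustrative names and assumptions chosen to mirror the "# features more than 1/2 # of instances" remark for Arrhythmia. The numbers are copied from the table as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """One row of the benchmark table (hypothetical helper type)."""
    name: str
    target: str       # "Binary" or "Continuous"
    n_features: int
    n_instances: int


# Counts copied verbatim from the table above.
DATASETS = [
    Dataset("Wine Quality", "Binary", 10, 4898),
    Dataset("Breast Cancer Wisconsin", "Binary", 30, 569),
    Dataset("Congressional Voting Records", "Binary", 16, 435),
    Dataset("Abalone", "Binary", 8, 4177),
    Dataset("Arrhythmia", "Binary", 279, 452),
    Dataset("Forest Fires", "Continuous", 13, 517),
    Dataset("Solar Flare", "Continuous", 10, 1066),
    Dataset("Auto MPG", "Continuous", 8, 398),
]


def feature_ratio(d: Dataset) -> float:
    """Number of features per instance."""
    return d.n_features / d.n_instances


def high_dimensional(datasets, threshold=0.5):
    """Names of datasets whose feature count exceeds `threshold` times
    the instance count (0.5 is an assumed cut-off for illustration)."""
    return [d.name for d in datasets if feature_ratio(d) > threshold]


if __name__ == "__main__":
    for d in DATASETS:
        print(f"{d.name:30s} {d.target:11s} ratio = {feature_ratio(d):.4f}")
    print("High-dimensional:", high_dimensional(DATASETS))
```

Under these assumptions, only Arrhythmia crosses the threshold (279 features for 452 instances), which matches the characteristic noted in its row.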