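| Arm | Bandit one: P(reward = 1) | Bandit one: P(reward = −1) | Bandit two: P(reward = 1) | Bandit two: P(reward = −1) |
|-----|---------------------------|----------------------------|---------------------------|----------------------------|
| 1   | 0.418                     | 0.582                      | 0.4547                    | 0.5453                     |
| 2   | 0.5039                    | 0.4961                     | 1.0                       | 0.0                        |
| 3   | 0.5278                    | 0.4722                     | 0.154                     | 0.846                      |
| 4   | 1.0                       | 0.0                        | 0.3972                    | 0.6028                     |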

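| Arm | Bandit three: P(reward = 1) | Bandit three: P(reward = −1) | Bandit four: P(reward = 1) | Bandit four: P(reward = −1) |
|-----|-----------------------------|------------------------------|----------------------------|-----------------------------|
| 1   | 1.0                         | 0.0                          | 0.3846                     | 0.6154                      |
| 2   | 0.0708                      | 0.9292                       | 0.0                        | 1.0                         |
| 3   | 0.5804                      | 0.4196                       | 1.0                        | 0.0                         |
| 4   | 0.4601                      | 0.5399                       | 0.4987                     | 0.5013                      |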
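
Each arm pays 1 with the listed probability and −1 otherwise, so its mean reward is p − (1 − p) = 2p − 1. As a minimal sketch of how the four bandits could be encoded and sampled, the snippet below copies the probabilities of reward 1 from the tables. The names `P_WIN`, `expected_reward`, `pull`, and `best_arm` are illustrative assumptions and do not come from the source.

```python
import random

# P(reward = 1) for arms 1-4 of each bandit, copied from the tables;
# the remaining probability mass is on reward = -1.
# The dictionary layout and all names here are illustrative assumptions.
P_WIN = {
    "one":   [0.418, 0.5039, 0.5278, 1.0],
    "two":   [0.4547, 1.0, 0.154, 0.3972],
    "three": [1.0, 0.0708, 0.5804, 0.4601],
    "four":  [0.3846, 0.0, 1.0, 0.4987],
}


def expected_reward(p_win):
    """Mean of a reward that is +1 with probability p_win and -1 otherwise."""
    return p_win * 1 + (1 - p_win) * (-1)


def pull(bandit, arm, rng=random):
    """Sample one reward; `arm` is 1-based, matching the tables."""
    return 1 if rng.random() < P_WIN[bandit][arm - 1] else -1


def best_arm(bandit):
    """Return the 1-based index of the arm with the highest mean reward."""
    ps = P_WIN[bandit]
    return max(range(len(ps)), key=lambda i: ps[i]) + 1


if __name__ == "__main__":
    for name, ps in P_WIN.items():
        means = [round(expected_reward(p), 4) for p in ps]
        print(name, means, "best arm:", best_arm(name))
```

Under this encoding, every bandit has exactly one arm that pays 1 with certainty (arm 4, 2, 1, and 3 for bandits one through four, respectively), which is the arm `best_arm` returns.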