| Parameter | Value |
| --- | --- |
| Activation function | ReLU, GELU |
| Gradient descent optimizer | Adam |
| Initial learning rate | 0.001 |
| Batch size | 200 |
| Epochs | 50 |
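
For concreteness, the sketch below shows one way these settings could be wired into a training loop. It is a minimal sketch assuming a PyTorch setup; the layer sizes, loss function, and data loader are placeholders and are not specified in the table above.

```python
import torch
import torch.nn as nn

# Hyperparameters from the table above.
LEARNING_RATE = 0.001  # initial learning rate
BATCH_SIZE = 200
EPOCHS = 50

# Placeholder network: the actual architecture is not given here.
# ReLU and GELU are used as hidden-layer activations per the table.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.GELU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Adam optimizer with the initial learning rate from the table.
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()  # assumed loss; not stated in the table


def train(loader: torch.utils.data.DataLoader) -> None:
    """Run training for the configured number of epochs.

    `loader` is assumed to yield (inputs, targets) batches of size BATCH_SIZE.
    """
    model.train()
    for epoch in range(EPOCHS):
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
```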