Parameter Name            Value
Activation function       ReLU, GELU
Gradient descent optimizer  Adam
Initial learning rate     0.001
Batch size                200
Epochs                    50
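The settings above can be collected into a small configuration sketch. The dataset size used below to derive optimizer step counts is a hypothetical assumption for illustration, not a value from the table.

```python
import math

# Hyperparameters from the table above.
config = {
    "activation": ("relu", "gelu"),  # activation functions used
    "optimizer": "adam",             # gradient descent optimizer
    "learning_rate": 1e-3,           # initial learning rate
    "batch_size": 200,
    "epochs": 50,
}

# Assumed dataset size (hypothetical, for illustration only).
dataset_size = 10_000

# Number of mini-batch updates per epoch, and over the whole run.
steps_per_epoch = math.ceil(dataset_size / config["batch_size"])
total_steps = config["epochs"] * steps_per_epoch
print(steps_per_epoch, total_steps)  # 50 2500
```

With 10,000 samples and a batch size of 200, each epoch performs 50 optimizer updates, so 50 epochs give 2,500 updates in total.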