| Parameter     | Value         |
|---------------|---------------|
| Alpha         | 0.7           |
| Epochs        | 60            |
| Batch size    | 128           |
| Dropout       | 0.5           |
| Learning rate | 1e-3          |
| Hidden size   | 768           |
| Max length    | 128           |
| Optimizer     | AdamW         |
| Loss function | Cross-entropy |
| N-gram        | 4             |
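The settings above can be captured in a single typed configuration object. The sketch below uses only the Python standard library. The class name `TrainingConfig` and all field names are hypothetical choices made for illustration. The table does not say what `alpha` weights or where the n-gram size is used, so the sketch stores those two values without interpreting them. The range checks are generic sanity checks and are not taken from the source.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    alpha: float = 0.7            # role of alpha is not specified in the table
    epochs: int = 60
    batch_size: int = 128
    dropout: float = 0.5
    learning_rate: float = 1e-3
    hidden_size: int = 768
    max_length: int = 128
    optimizer: str = "AdamW"
    loss_function: str = "cross_entropy"
    n_gram: int = 4               # use of the n-gram size is not specified in the table

    def __post_init__(self) -> None:
        # Generic range checks (assumed, not from the source).
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        for name in ("epochs", "batch_size", "hidden_size", "max_length", "n_gram"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")


config = TrainingConfig()
print(asdict(config))
```

If the model is built in PyTorch, which is an assumption since the table names no framework, these fields would map to `torch.optim.AdamW(model.parameters(), lr=config.learning_rate)`, `torch.nn.CrossEntropyLoss()`, and `torch.nn.Dropout(p=config.dropout)`. The dataclass is frozen so that a run's settings cannot be mutated after they are logged.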