Algorithm ViT-Base |
Input: an image, the number of epochs J, the batch size b, the number of layers L |
Output: Predicted class |
Initialize the model parameters Θ |
for j ← 1, …, J do |
for each batch B do |
Use token_embed to obtain a global representation of the entire image |
for l ← 1, …, L do |
Use transformer_embed to process the input tensor through a single transformer encoder layer |
end for |
end for |
end for |
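
The control flow of the listing can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: only the names token_embed and transformer_embed come from the listing, while init_params, forward, run, the patch size, the single-head attention, the ReLU activation, and the class-token read-out are hypothetical choices made to keep the example short. Position embeddings and the parameter update are omitted because the listing does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(patch_dim, d_model, num_layers, num_classes):
    """Hypothetical parameter set Θ: small random weights, for illustration only."""
    def w(*shape):
        return rng.normal(0.0, 0.02, size=shape)
    layers = [{"wq": w(d_model, d_model), "wk": w(d_model, d_model),
               "wv": w(d_model, d_model), "wo": w(d_model, d_model),
               "w1": w(d_model, 4 * d_model), "w2": w(4 * d_model, d_model)}
              for _ in range(num_layers)]
    return {"proj": w(patch_dim, d_model), "cls": w(1, 1, d_model),
            "layers": layers, "head": w(d_model, num_classes)}

def layer_norm(x, eps=1e-6):
    # Normalize over the feature axis (no learned scale/shift in this sketch).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def token_embed(images, params, patch=4):
    """Split images (B, H, W, C) into patches, project them, prepend a class token."""
    b, h, w, c = images.shape
    x = images.reshape(b, h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5).reshape(b, -1, patch * patch * c)
    tokens = x @ params["proj"]
    cls = np.broadcast_to(params["cls"], (b, 1, tokens.shape[-1]))
    return np.concatenate([cls, tokens], axis=1)

def transformer_embed(x, layer):
    """One pre-norm encoder layer: single-head self-attention, then an MLP."""
    h = layer_norm(x)
    q, k, v = h @ layer["wq"], h @ layer["wk"], h @ layer["wv"]
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
    x = x + (attn @ v) @ layer["wo"]
    h = layer_norm(x)
    # ReLU is used here for brevity; it is an assumption of this sketch.
    return x + np.maximum(h @ layer["w1"], 0.0) @ layer["w2"]

def forward(images, params):
    x = token_embed(images, params)
    for layer in params["layers"]:          # for l ← 1, …, L
        x = transformer_embed(x, layer)
    return layer_norm(x[:, 0]) @ params["head"]  # logits read from the class token

def run(dataset, num_epochs, batch_size, params):
    preds = None
    for _ in range(num_epochs):             # for j ← 1, …, J
        batch_preds = []
        for start in range(0, len(dataset), batch_size):  # for each batch B
            logits = forward(dataset[start:start + batch_size], params)
            batch_preds.append(logits.argmax(-1))
            # A loss and parameter update would go here; the listing omits them.
        preds = np.concatenate(batch_preds)
    return preds
```

For example, with five 8×8 RGB images, `run(images, num_epochs=2, batch_size=2, params=init_params(48, 16, 2, 10))` returns one predicted class index per image; the value 48 is the flattened patch size 4·4·3 implied by the assumed patch size.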