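| Layer | Channels | Heads | Features size (H, W) | Super-tokens size (h, w) | Iterations |
|-------|----------|-------|----------------------|--------------------------|------------|
| 5     | 512      | 8     | (16, 16)             | (1, 1)                   | 3          |
| 4     | 256      | 4     | (32, 32)             | (2, 2)                   | 3          |
| 3     | 256      | 4     | (64, 64)             | (4, 4)                   | 3          |
| 2     | 128      | 2     | (128, 128)           | (16, 16)                 | 3          |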
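The per-layer settings listed above can be written down as a small configuration structure, which makes them easier to check and reuse. The following is a minimal sketch, not the authors' code: the names `LayerConfig`, `LAYERS`, and `super_token_grid` are hypothetical, and the grid computation assumes that the super-token size (h, w) is the extent, in feature cells, covered by each super-token, which the source does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LayerConfig:
    """One row of the per-layer settings (hypothetical container)."""
    layer: int
    channels: int
    heads: int
    feature_size: Tuple[int, int]      # (H, W)
    super_token_size: Tuple[int, int]  # (h, w)
    iterations: int

    def super_token_grid(self) -> Tuple[int, int]:
        # ASSUMPTION: each super-token covers an (h, w) patch of the
        # (H, W) feature map, so the grid is (H // h, W // w).
        (H, W), (h, w) = self.feature_size, self.super_token_size
        if H % h or W % w:
            raise ValueError("feature size must be divisible by super-token size")
        return (H // h, W // w)


# Values copied from the settings listed above.
LAYERS = [
    LayerConfig(5, 512, 8, (16, 16), (1, 1), 3),
    LayerConfig(4, 256, 4, (32, 32), (2, 2), 3),
    LayerConfig(3, 256, 4, (64, 64), (4, 4), 3),
    LayerConfig(2, 128, 2, (128, 128), (16, 16), 3),
]

for cfg in LAYERS:
    # Channels per head follows directly from the two listed columns.
    print(cfg.layer, cfg.channels // cfg.heads, cfg.super_token_grid())
```

Under this assumption, every listed feature size divides evenly by its super-token size, and every channel count divides evenly by its head count.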