| Method | Architecture | Modality | # Sampled frames | # Params (M) | Accuracy (UCF101) | Accuracy (HMDB51) |
|---|---|---|---|---|---|---|
| I3D-LSTM [24] (IOP’19) | 3D CNN | RGB | Whole video | - | 95.1% | - |
| STH [20] (VCIP’19) | 3D and 2D CNN | RGB, MV | 16 | 88 | 94.3% | 68.6% |
| T-C3D [26] (TCSVT’20) | 3D CNN | RGB | 24 | 31.7 | 92.5% | 62.4% |
| IP-LSTM [37] (Access’20) | LSTM | RGB, OF | 25 | 27.6 | 91.4% | 68.2% |
| STDDCN [38] (PR’19) | 2D CNN | RGB, OF | 25 | 59 | 94.8% | 69.49% |
| Heterogeneous Two-Stream [12] | 2D CNN | RGB, OF | 25 | 45.5 | 94.4% | 67.2% |
| LVR [39] (ICMLA’19) | 2D CNN | RGB, OF | 25 | 92.8 | 94.4% | 71.0% |
| Multi-teacher KD [21] (JSA’20) | 2D CNN | RGB, MV, Residual | (1 + 11) | 33.6 | 88.5% | 56.16% |
| TSM [16] (ICCV’19) | 2D CNN | RGB | 8 | 23.7 | 94.9% | 70.91% |
| TSN [11] (TPAMI’19) | 2D CNN | RGB, OF | 25 | 22.6 | 94.9% | 71.0% |
| MSTSM-TFDEM (ours) | 2D CNN | RGB | 8 | 24.5 | 96.25% | 72.83% |
| MSTSM-TFDEM-p (ours) | 2D CNN | RGB | 8 | 22.4 | 95.57% | 72.19% |
| MSTSM-TFDEM (ours, EfficientNet) | 2D CNN | RGB | 8 | 4.5 | 96.08% | 72.48% |
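The margins implied by the table can be checked with a short script. The sketch below simply transcribes the accuracy columns above and computes, for each proposed variant, the percentage-point gain over the best prior result on each dataset. It is a plain arithmetic check, not part of the authors' method: the names `ROWS`, `OURS`, `best_prior` and `margins` are hypothetical, and the comparison ignores differences in modality, frame count and backbone across rows (entries reported as "-" are skipped).

```python
# Accuracy columns transcribed from the comparison table:
# (method, UCF101 top-1 %, HMDB51 top-1 % or None where the table reports "-").
ROWS = [
    ("I3D-LSTM [24]", 95.1, None),
    ("STH [20]", 94.3, 68.6),
    ("T-C3D [26]", 92.5, 62.4),
    ("IP-LSTM [37]", 91.4, 68.2),
    ("STDDCN [38]", 94.8, 69.49),
    ("Heterogeneous Two-Stream [12]", 94.4, 67.2),
    ("LVR [39]", 94.4, 71.0),
    ("Multi-teacher KD [21]", 88.5, 56.16),
    ("TSM [16]", 94.9, 70.91),
    ("TSN [11]", 94.9, 71.0),
]
OURS = [
    ("MSTSM-TFDEM", 96.25, 72.83),
    ("MSTSM-TFDEM-p", 95.57, 72.19),
    ("MSTSM-TFDEM (EfficientNet)", 96.08, 72.48),
]


def best_prior(column):
    """Highest accuracy among prior methods in the given column (1 = UCF101, 2 = HMDB51)."""
    return max(row[column] for row in ROWS if row[column] is not None)


def margins():
    """Gain in percentage points of each proposed variant over the best prior result."""
    best_ucf, best_hmdb = best_prior(1), best_prior(2)
    return {
        name: (round(ucf - best_ucf, 2), round(hmdb - best_hmdb, 2))
        for name, ucf, hmdb in OURS
    }


for name, (d_ucf, d_hmdb) in margins().items():
    print(f"{name}: +{d_ucf} pp on UCF101, +{d_hmdb} pp on HMDB51")
```

Note that the strongest prior UCF101 figure in the table is I3D-LSTM (95.1%, whole-video sampling), while the strongest prior HMDB51 figure is 71.0% (LVR and TSN).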