Works | Architecture | Modality | Sampled frames | # Params (M) | Accuracy (UCF101) | Accuracy (HMDB51) |
I3D-LSTM [24] (IOP’19) | 3D CNN | RGB | whole video | - | 95.1% | - |
STH [20] (VCIP’19) | 3D and 2D CNN | RGB, MV | 16 | 88 | 94.3% | 68.6% |
T-C3D [26] (TCSVT’20) | 3D CNN | RGB | 24 | 31.7 | 92.5% | 62.4% |
IP-LSTM [37] (Access’20) | LSTM | RGB, OF | 25 | 27.6 | 91.4% | 68.2% |
STDDCN [38] (PR’19) | 2D CNN | RGB, OF | 25 | 59 | 94.8% | 69.49% |
Heterogeneous Two-Stream [12] | 2D CNN | RGB, OF | 25 | 45.5 | 94.4% | 67.2% |
LVR [39] (ICMLA’19) | 2D CNN | RGB, OF | 25 | 92.8 | 94.4% | 71.0% |
Multi-teacher KD [21] (JSA’20) | 2D CNN | RGB, MV, Residual | (1 + 11) | 33.6 | 88.5% | 56.16% |
TSM [16] (ICCV’19) | 2D CNN | RGB | 8 | 23.7 | 94.9% | 70.91% |
TSN [11] (TPAMI’19) | 2D CNN | RGB, OF | 25 | 22.6 | 94.9% | 71.0% |
MSTSM-TFDEM (ours) | 2D CNN | RGB | 8 | 24.5 | 96.25% | 72.83% |
MSTSM-TFDEM-p (ours) | 2D CNN | RGB | 8 | 22.4 | 95.57% | 72.19% |
MSTSM-TFDEM (ours, EfficientNet) | 2D CNN | RGB | 8 | 4.5 | 96.08% | 72.48% |