Data: layer $f_i$, input activation $z_{i-1}$,
      CPU pinned memory buffer $P_{i-1}$,
      CPU thread $T_{\text{data}}$,
      CUDA events $E_{\text{data}}^{i}$, $E_{\text{comp}}^{i}$,
      CUDA streams $S_{\text{data}}$, $S_{\text{comp}}$
Result: $z_i$
Allocate($z_i$);
$S_{\text{comp}} \Leftarrow z_i \leftarrow f_i(z_{i-1})$;
$S_{\text{comp}} \Leftarrow E_{\text{comp}}^{i}$;
In thread $T_{\text{data}}$:
    $S_{\text{data}} \Leftarrow P_{i-1} \leftarrow z_{i-1}$;
    $S_{\text{data}} \Leftarrow E_{\text{data}}^{i}$;
    Wait($E_{\text{data}}^{i}$, $E_{\text{comp}}^{i}$);
    Free($z_{i-1}$).
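The ordering this algorithm relies on can be checked without a GPU. The sketch below is a CPU-only model, not CUDA code, built on these assumptions. Each hypothetical Stream is a worker thread that runs enqueued operations in FIFO order, standing in for "S ⇐ op". A threading.Event stands in for a CUDA event. Plain Python lists stand in for the device activation and the pinned buffer. The names Stream and offload_forward and the toy doubling layer are illustrative only.

```python
import threading
import queue


class Stream:
    """Hypothetical CPU stand-in for a CUDA stream (not a CUDA API):
    one worker thread executes enqueued operations in FIFO order,
    asynchronously with respect to the thread that enqueued them."""

    def __init__(self):
        self._ops = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            op = self._ops.get()
            op()

    def enqueue(self, op):
        """Models 'S <= op': op runs after everything already queued on S."""
        self._ops.put(op)

    def record(self, event):
        """Models 'S <= E': event fires once all earlier ops on S are done."""
        self._ops.put(event.set)


def offload_forward(f, z_prev, pinned, s_comp, s_data):
    """Hypothetical helper mirroring the algorithm for one layer f_i."""
    out = {}                       # Allocate(z_i)
    e_comp = threading.Event()     # E_comp^i
    e_data = threading.Event()     # E_data^i

    def compute():                 # z_i <- f_i(z_{i-1}), queued on S_comp
        out["z"] = f(z_prev)

    s_comp.enqueue(compute)
    s_comp.record(e_comp)

    def copy_to_pinned():          # P_{i-1} <- z_{i-1}, queued on S_data
        pinned[:] = z_prev

    def data_thread():             # body of thread T_data
        s_data.enqueue(copy_to_pinned)
        s_data.record(e_data)
        e_data.wait()              # the copy into the pinned buffer is done
        e_comp.wait()              # f_i no longer reads z_{i-1}
        z_prev.clear()             # Free(z_{i-1})

    t_data = threading.Thread(target=data_thread)
    t_data.start()
    return out, e_comp, t_data


s_comp, s_data = Stream(), Stream()
z0 = [1.0, 2.0, 3.0]
pinned = [0.0] * len(z0)
out, e_comp, t_data = offload_forward(
    lambda x: [2 * v for v in x], z0, pinned, s_comp, s_data
)
e_comp.wait()                      # z_1 is ready for the next layer
t_data.join()                      # offload finished, z_0 released
print(out["z"], pinned, z0)        # [2.0, 4.0, 6.0] [1.0, 2.0, 3.0] []
```

The two waits are what make the final free safe. Both the copy into $P_{i-1}$ and the layer computation read $z_{i-1}$ asynchronously, so it can be released only after both events have fired. Doing that wait in $T_{\text{data}}$ keeps the thread that issues compute work from blocking. On a real GPU the same pattern would use cudaMemcpyAsync into memory from cudaHostAlloc, cudaEventRecord, and cudaEventSynchronize.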