Processing math: 100%

Data: Layer fi , output gradients δLδzi

CPU pinned memory buffer Pi1

CPU thread Tcomp

CUDA events Eidata , Ei+1data , Eicomp

CUDA Streams Sdata , Scomp

Result: δLδzi1 , δLδθi

Allocate ( zi1 );

Sdatazi1Pi1 ;

SdataEidata ;

Wait ( Ei+1data );

Allocate ( δLδzi1 , δLδθi );

ScompδLδzi1δLδzi×δziδzi1 ;

ScompδLδθiδLδzi×δziδθi .