1 Each thread takes a separate super-node;

2 If all super-nodes on which the current super-node depends are processed then

3 │ factor this super-node via MKL BLAS & LAPACK functionality;

4 else

5 │ wait until all needed super-nodes are factored;

6 Take the next super-node and go to 2.
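
The steps above can be sketched in Python as a small dependency-driven scheduler. This is only an illustration under stated assumptions, not the solver's actual implementation: the function name `factor_supernodes`, the `deps` list, and the `factor` callback (which stands in for the MKL BLAS and LAPACK calls) are hypothetical. The sketch also assumes that super-nodes are numbered so that each one depends only on lower-numbered super-nodes, which guarantees that the waiting in step 5 cannot deadlock.

```python
import threading


def factor_supernodes(deps, num_threads, factor):
    """Factor super-nodes in parallel while respecting dependencies.

    deps[i] is the set of super-node indices that super-node i depends on
    (assumed to contain only indices smaller than i).
    factor(i) is a hypothetical callback that performs the numerical
    factorization of super-node i.
    Returns the order in which super-nodes finished.
    """
    n = len(deps)
    done = [False] * n            # done[i] is True once super-node i is factored
    finished_order = []           # completion order, recorded for inspection
    cond = threading.Condition()  # guards next_node, done, finished_order
    next_node = 0                 # next super-node not yet taken by any thread

    def worker():
        nonlocal next_node
        while True:
            with cond:
                # Steps 1 and 6: take the next unclaimed super-node.
                if next_node >= n:
                    return
                node = next_node
                next_node += 1
                # Steps 2, 4, 5: wait until every needed super-node is factored.
                cond.wait_for(lambda: all(done[d] for d in deps[node]))
            # Step 3: factor outside the lock so threads work concurrently.
            factor(node)
            with cond:
                done[node] = True
                finished_order.append(node)
                cond.notify_all()  # wake threads waiting on this super-node

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished_order
```

A call such as `factor_supernodes([set(), set(), {0, 1}], 2, my_factor)` would factor super-nodes 0 and 1 concurrently and start super-node 2 only after both have finished. Claiming super-nodes in increasing index order is a design choice of this sketch: the lowest-numbered super-node in flight always has its dependencies satisfied, so at least one thread can make progress.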