In the unit-stride case, the complex routine calls its real equivalent
with N*2, since N complex elements occupy 2*N contiguous reals.
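
A minimal sketch of that delegation, assuming single precision and
interleaved (real, imaginary) storage. The names real_swap and
complex_swap_unit are hypothetical and used only for illustration; they
are not the library's actual entry points.

```c
#include <stddef.h>

/* Hypothetical real swap: exchanges n contiguous floats of x and y. */
void real_swap(size_t n, float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        float t = x[i];
        x[i] = y[i];
        y[i] = t;
    }
}

/* Hypothetical complex swap, unit stride only. n complex elements
   stored as interleaved (re, im) pairs are 2*n contiguous floats,
   so the real routine can do all of the work. */
void complex_swap_unit(size_t n, float *x, float *y)
{
    real_swap(2 * n, x, y);
}
```

This shortcut only holds for unit stride: with a larger increment the
real and imaginary parts of one element stay adjacent while consecutive
elements do not, so the vector is no longer one contiguous run of reals.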
The main use of this routine that I am aware of is row-swapping in
factorizations, so this is one routine where it would make sense to
optimize the arbitrary-increment cases, if that were possible.
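
To illustrate why row swaps hit the arbitrary-increment path, here is a
rough sketch that assumes a column-major matrix, where consecutive
elements of a row sit one leading dimension (lda) apart. The names
swap_strided and swap_rows are hypothetical, and the sketch handles
positive increments only.

```c
#include <stddef.h>

/* Hypothetical general-increment swap: exchanges n elements of x and y,
   stepping incx and incy floats between consecutive elements.
   Assumes positive increments. */
void swap_strided(size_t n, float *x, size_t incx, float *y, size_t incy)
{
    for (size_t i = 0; i < n; i++) {
        float t = x[i * incx];
        x[i * incx] = y[i * incy];
        y[i * incy] = t;
    }
}

/* Swap rows r1 and r2 of a column-major matrix a that has ncols columns
   and leading dimension lda. Both rows are walked with stride lda,
   never with unit stride. */
void swap_rows(float *a, size_t ncols, size_t lda, size_t r1, size_t r2)
{
    swap_strided(ncols, a + r1, lda, a + r2, lda);
}
```

Each access in the loop touches a different column, which is why this
case is much harder to speed up than the contiguous one.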