Optimize BLAS interfaces for VectorTypes
Created by: youngmit
The current implementations of the BLAS interfaces for the vector types are quite inefficient and make heavy use of SELECT TYPE where it is unnecessary. The main concern is that almost all of the native implementations copy the vector data before operating on it instead of operating on it in place. Some, but not all, of the TPL-backed implementations do the same.
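To make the concern concrete, here is a minimal sketch of the two patterns for an axpy-style operation (y <- a*x + y). Everything in it is an assumption for illustration: `VecBase`, `RealVec`, `axpy_copy`, and `axpy_inplace` are hypothetical names and are not taken from the actual VectorTypes source, which may be structured differently.

```fortran
! Hypothetical sketch: the types and routines below are illustrative only
! and are NOT the actual VectorTypes API.
MODULE vec_sketch
  IMPLICIT NONE
  INTEGER,PARAMETER :: dp=KIND(1.0d0)

  TYPE,ABSTRACT :: VecBase            ! stand-in for an abstract vector type
    INTEGER :: n=0
  ENDTYPE VecBase

  TYPE,EXTENDS(VecBase) :: RealVec    ! stand-in for a native real vector
    REAL(dp),ALLOCATABLE :: b(:)
  ENDTYPE RealVec

CONTAINS

  ! Copy-based pattern: resolve the dynamic type, copy both vectors into
  ! temporaries, operate on the temporaries, then copy the result back.
  SUBROUTINE axpy_copy(a,x,y)
    REAL(dp),INTENT(IN) :: a
    CLASS(VecBase),INTENT(IN) :: x
    CLASS(VecBase),INTENT(INOUT) :: y
    REAL(dp),ALLOCATABLE :: tmpx(:),tmpy(:)
    SELECT TYPE(x); TYPE IS(RealVec)
      SELECT TYPE(y); TYPE IS(RealVec)
        ALLOCATE(tmpx(x%n),tmpy(y%n))
        tmpx=x%b                      ! extra copy of x
        tmpy=y%b                      ! extra copy of y
        tmpy=a*tmpx+tmpy
        y%b=tmpy                      ! copy the result back
      ENDSELECT
    ENDSELECT
  ENDSUBROUTINE axpy_copy

  ! In-place pattern: the interface takes the concrete type, so no
  ! SELECT TYPE is needed and the stored data is updated directly.
  SUBROUTINE axpy_inplace(a,x,y)
    REAL(dp),INTENT(IN) :: a
    TYPE(RealVec),INTENT(IN) :: x
    TYPE(RealVec),INTENT(INOUT) :: y
    y%b=a*x%b+y%b
  ENDSUBROUTINE axpy_inplace

ENDMODULE vec_sketch

PROGRAM demo
  USE vec_sketch
  IMPLICIT NONE
  TYPE(RealVec) :: x,y1,y2

  x%n=3;  x%b=[1.0_dp,2.0_dp,3.0_dp]
  y1%n=3; y1%b=[10.0_dp,20.0_dp,30.0_dp]
  y2=y1

  CALL axpy_copy(2.0_dp,x,y1)
  CALL axpy_inplace(2.0_dp,x,y2)

  ! Both variants should agree; only the number of copies differs.
  IF(ANY(ABS(y1%b-y2%b) > 1.0e-12_dp)) ERROR STOP 'results differ'
  PRINT *,'copy-based and in-place results agree'
ENDPROGRAM demo
```

In this sketch the copy-based variant allocates two temporaries and moves the data three times per call, while the in-place variant touches each element once. A real implementation would presumably pass the stored array straight to the BLAS routine rather than use array syntax.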