Incorporates substantial new performance optimisations from @drjuls: constructing the CI matrix is now much faster for large, open-shell systems when running on many CPU cores. This release also adds support for MKL's ScaLAPACK interface, along with some bug fixes and additional unit tests (via @emilyviolet).