
Sigmoid activation in mixed activation for linear outputs #45


Description

@tsrobinson

As discovered by @antndlcrx, we can get better performance by using sigmoid rather than identity activation functions for continuous variables, since these are scaled to [0, 1] during data preprocessing.

@tsrobinson to update the codebase to make this the default behavior. A sketch of the change is below.
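
For reference, a minimal sketch of the proposed change, assuming a mixed-activation output layer that applies a different activation per column block. The function and index names here are hypothetical, not the package's actual API:

```python
import numpy as np


def sigmoid(x):
    """Logistic function, bounding outputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def mixed_activation(logits, cont_idx):
    """Apply per-column activations to the output layer's logits.

    Continuous columns now pass through a sigmoid instead of the
    identity, so predictions are bounded to [0, 1] and match the
    min-max scaled targets; other column types are left unchanged
    here for brevity.
    """
    out = logits.copy()
    out[:, cont_idx] = sigmoid(logits[:, cont_idx])  # was: identity
    return out


# Example: batch of 4 samples, columns 0-1 continuous
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
preds = mixed_activation(logits, cont_idx=[0, 1])
assert (preds[:, :2] >= 0).all() and (preds[:, :2] <= 1).all()
```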
