Given a dataset of n features, we want to learn a function ℝⁿ ⟶ ℝ that fits the data without overfitting it. Fixing the method of training the model, the free parameter that decides what our function will be is the architecture of the network. The model space has networks with n inputs and one output.