Right now the fitting routines return control points as a n_ctrl_pts x n_data_dims sized matrix, but the evaluation routines consume them in the transpose of that storage convention. This makes it really easy to just pass in geom.transpose() to the evaluation routines, which are usually in a tight loop, and that is disastrous to performance.
To avoid that mistake, let's unify all control point matrices to be stored in a n_data_dims x n_ctrl_pts matrix across the codebase.