- Problems of Feature Engineering
```python
import numpy as np
from numpy.linalg import solve

def fit(X, Y):
    # Solve the normal equations: (X^T X) theta = X^T Y
    return solve(X.T @ X, X.T @ Y)

def add_ones_column(data):
    n, _ = data.shape
    return np.hstack([np.ones((n, 1)), data])

X = data[['X']].to_numpy()
Y = data[['Y']].to_numpy()

...

class LinearModel:
    def __init__(self, phi):
        self.phi = phi

    def fit(self, X, Y):
        Phi = self.phi(X)
        self.theta_hat = solve(Phi.T @ Phi, Phi.T @ Y)
        return self.theta_hat

    def predict(self, X):
        Phi = self.phi(X)
        return Phi @ self.theta_hat

    def loss(self, X, Y):
        # mean squared error
        return np.mean((Y - self.predict(X)) ** 2)

# phi_line: the linear feature map x -> [1, x]
phi_line = add_ones_column

model_line = LinearModel(phi_line)
model_line.fit(X, Y)
model_line.loss(X, Y)
```
If you copy this and run it with redundant features, `solve` raises a singular matrix error.
`Phi.T @ Phi` is not full rank: the columns of `Phi` are redundant, i.e. not linearly independent.
With too many features, the optimal solution is underdetermined: infinitely many parameter vectors achieve the same training loss.
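A minimal sketch of the failure mode (variable names are my own): duplicating a feature column makes the columns of `Phi` linearly dependent, so `Phi.T @ Phi` is singular and `solve` raises `LinAlgError`.

```python
import numpy as np
from numpy.linalg import solve, matrix_rank

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 1))
# Duplicate the feature column: the two columns of Phi are identical.
Phi = np.hstack([x, x])
print(matrix_rank(Phi))  # 1, not 2, so Phi.T @ Phi is singular

try:
    solve(Phi.T @ Phi, Phi.T @ x)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print(raised)
```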
- Using RBF (radial basis function) features
- Add RBF features on top of the linear features
- Fit 20 bumps to only 9 data points?
- The feature matrix has rank at most 9, so the 20 x 20 matrix `Phi.T @ Phi` cannot be full rank
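A sketch of the rank argument (the Gaussian feature map and its `width` parameter are my own illustration): with 9 inputs and 20 RBF centers, `Phi` is 9 x 20, so its rank is at most 9 and the normal equations are singular.

```python
import numpy as np
from numpy.linalg import matrix_rank

def rbf_features(X, centers, width=0.2):
    # One Gaussian bump per center: exp(-(x - c)^2 / width^2)
    return np.exp(-((X - centers.reshape(1, -1)) ** 2) / width ** 2)

X = np.linspace(0, 1, 9).reshape(-1, 1)   # 9 training inputs
centers = np.linspace(0, 1, 20)           # 20 bump centers
Phi = rbf_features(X, centers)
print(Phi.shape)   # (9, 20)
# rank(Phi) <= min(9, 20) = 9, so Phi.T @ Phi (20 x 20) is rank-deficient
print(matrix_rank(Phi))
```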
- Evaluate on held-out test data points
- Training error decreases, but test error is terrible!
- The bias-variance tradeoff: there is a best-fit point between underfitting and overfitting
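The tradeoff can be sketched on synthetic data (the sine target, noise level, and widths here are my own assumptions, not from the notes); `lstsq` stands in for `solve` because the 20-feature design matrix is rank-deficient. With few bumps the model underfits; with 20 bumps it drives training error toward zero while test error blows up.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_features(X, centers, width=0.15):
    return np.exp(-((X - centers.reshape(1, -1)) ** 2) / width ** 2)

def fit_ls(Phi, Y):
    # lstsq handles the rank-deficient case that plain solve() cannot
    return np.linalg.lstsq(Phi, Y, rcond=None)[0]

# Hypothetical 1-D regression problem: 9 noisy training points
X_train = np.sort(rng.uniform(0, 1, (9, 1)), axis=0)
Y_train = np.sin(2 * np.pi * X_train) + 0.3 * rng.normal(size=X_train.shape)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
Y_test = np.sin(2 * np.pi * X_test)

train_mse, test_mse = {}, {}
for k in [2, 5, 9, 20]:
    centers = np.linspace(0, 1, k)
    theta = fit_ls(rbf_features(X_train, centers), Y_train)
    train_mse[k] = float(np.mean((Y_train - rbf_features(X_train, centers) @ theta) ** 2))
    test_mse[k] = float(np.mean((Y_test - rbf_features(X_test, centers) @ theta) ** 2))
    print(k, train_mse[k], test_mse[k])
```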