Linear Regression with Multiple Variables
Let’s recall the cost function $J(\theta)$ and its gradient $\frac{\partial J(\theta)}{\partial \theta}$:
$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta_0 + \theta_1 x^{(i)} - y^{(i)}\right)^2$$
If we define the parameter vector θ:

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

a single example x:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$
the input X:

$$X = \left[ x^{(1)}, x^{(2)}, \ldots, x^{(m)} \right]$$

the output Y:

$$Y = \left[ y^{(1)}, y^{(2)}, \ldots, y^{(m)} \right]$$
Therefore, we can write the cost and its gradient in vectorized form:
$$J(\theta) = \frac{1}{2m}\left(\theta^T X - Y\right)\left(\theta^T X - Y\right)^T$$

$$\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m}\left(\left(\theta^T X - Y\right)\cdot X^T\right)^T$$
the iteration will be:
$$\theta = \theta - \frac{\alpha}{m}\left(\left(\theta^T X - Y\right)\cdot X^T\right)^T$$
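The update rule above can be sketched directly in NumPy. This is a minimal illustration, not the document's ex1data2 script: it assumes X stores one example per column (with a leading row of ones for the intercept term) and Y is a row vector of targets, matching the matrix shapes in the derivation. The toy data fitting y = 2 + 3x is made up for demonstration.

```python
import numpy as np

def gradient_descent(X, Y, alpha=0.01, iterations=1500):
    """Batch gradient descent for theta = theta - (alpha/m) * ((theta^T X - Y) X^T)^T."""
    n, m = X.shape                      # n parameters (incl. bias), m examples
    theta = np.zeros((n, 1))            # column vector of parameters
    for _ in range(iterations):
        error = theta.T @ X - Y         # shape (1, m): h_theta(x^(i)) - y^(i)
        theta = theta - (alpha / m) * (error @ X.T).T
    return theta

# Toy usage: fit y = 2 + 3x on a few points.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.vstack([np.ones_like(x), x])     # prepend the bias row x_0 = 1
Y = (2 + 3 * x).reshape(1, -1)          # row vector of targets
theta = gradient_descent(X, Y, alpha=0.1, iterations=2000)
print(theta.ravel())                    # converges to approximately [2, 3]
```

With a well-chosen learning rate, the parameters converge to the exact line because the toy data is noiseless.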
Let’s work on ex1data2.txt with scikit-learn:
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("ex1data2.txt", delimiter=",")
X = data[:, :2]   # features: house size (sq ft) and number of bedrooms
Y = data[:, 2]    # target: house price

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

normalizer = StandardScaler().fit(X)   # standardize features to zero mean, unit variance
clf = LinearRegression()
clf.fit(normalizer.transform(X), Y)

print('intercept:', clf.intercept_)
print('Coefficients:', clf.coef_)
intercept: 340412.659574
Coefficients: [ 109447.79646964 -6578.35485416]
house = np.array([[1650, 3]])   # 1650 sq ft, 3 bedrooms; transform() and predict() expect a 2D array
clf.predict(normalizer.transform(house))
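Under the hood, `predict()` simply computes `intercept_ + coef_ · x` on the standardized features. The following self-contained sketch checks that; the five (size, bedrooms) → price rows are made-up toy data in the same format as ex1data2.txt, not values from the real file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Toy rows in the same (size, bedrooms) -> price format as ex1data2.txt.
X = np.array([[2100.0, 3], [1600, 3], [2400, 3], [1400, 2], [3000, 4]])
Y = np.array([400000.0, 330000, 369000, 232000, 540000])

scaler = StandardScaler().fit(X)
model = LinearRegression().fit(scaler.transform(X), Y)

x_new = scaler.transform(np.array([[1650.0, 3]]))
pred = model.predict(x_new)                       # scikit-learn's prediction
manual = model.intercept_ + x_new @ model.coef_   # the same linear combination by hand
print(pred, manual)                               # identical up to floating-point error
```

This makes explicit that the fitted model is nothing more than the θ vector from the derivation above, applied to standardized inputs.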