Linear Regression with Multiple Variables

Let’s recall the cost function $J(\theta)$ and its gradient $\frac{\partial}{\partial\theta} J(\theta)$. In the single-variable case, the cost function is:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$
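To make the formula concrete, here is a tiny numeric check (a sketch with made-up toy data, not the exercise data):

import numpy as np

# Toy data lying exactly on y = 1 + 2x, so J(1, 2) should be 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

def J(theta0, theta1):
    m = len(x)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

print(J(1.0, 2.0))  # 0.0
print(J(0.0, 0.0))  # 83/6 ≈ 13.83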

If we define:

$$\theta = \left( \begin{array}{c} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{array} \right)$$

a single training example $x$, with $x_0 = 1$ so that $\theta_0$ acts as the intercept term:

$$x = \left( \begin{array}{c} x_0 \\ x_1 \\ \vdots \\ x_n \end{array} \right)$$

the input matrix $X$, whose columns are the $m$ examples:

$$X = \left[ x^{(1)}, x^{(2)}, \dots, x^{(m)} \right]$$

and the output row vector $Y$:

$$Y = \left[ y^{(1)}, y^{(2)}, \dots, y^{(m)} \right]$$
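Concretely, if the raw data stores one example per row, $X$ is built by prepending the bias entry $x_0 = 1$ and transposing so that each column is one example (a minimal sketch with illustrative numbers):

import numpy as np

# Three examples with two features each, one example per row.
data = np.array([[2104.0, 3.0],
                 [1600.0, 3.0],
                 [2400.0, 3.0]])
m = data.shape[0]

# Prepend the bias feature x0 = 1, then transpose: X has shape (n+1, m).
X = np.hstack([np.ones((m, 1)), data]).T
Y = np.array([[399900.0, 329900.0, 369000.0]])  # shape (1, m)
print(X.shape, Y.shape)  # (3, 3) (1, 3)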

Therefore, the cost function and its gradient can be written in matrix form:

$$J(\theta) = \frac{1}{2m} (\theta^T X - Y)(\theta^T X - Y)^T$$

$$\frac{\partial}{\partial\theta} J(\theta) = \frac{1}{m} \left( (\theta^T X - Y) X^T \right)^T$$
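As a sanity check on the matrix formulas, here is a minimal NumPy sketch, assuming the convention above (columns of $X$ are examples and the bias row of ones is included):

import numpy as np

def cost(theta, X, Y):
    # J(theta) = 1/(2m) * (theta^T X - Y)(theta^T X - Y)^T
    m = Y.shape[1]
    err = theta.T @ X - Y            # row vector of residuals, shape (1, m)
    return (err @ err.T)[0, 0] / (2 * m)

def gradient(theta, X, Y):
    # dJ/dtheta = 1/m * ((theta^T X - Y) X^T)^T, shape (n+1, 1)
    m = Y.shape[1]
    return ((theta.T @ X - Y) @ X.T).T / m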

the gradient-descent iteration is then:

$$\theta := \theta - \frac{\alpha}{m} \left( (\theta^T X - Y) X^T \right)^T$$
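Putting the update rule into a loop gives plain batch gradient descent (a sketch; the learning rate and iteration count are illustrative, not tuned):

import numpy as np

def gradient_descent(X, Y, alpha=0.01, iters=1500):
    # Repeatedly apply theta := theta - (alpha/m) * ((theta^T X - Y) X^T)^T.
    n_plus_1, m = X.shape
    theta = np.zeros((n_plus_1, 1))
    for _ in range(iters):
        theta -= (alpha / m) * ((theta.T @ X - Y) @ X.T).T
    return theta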

Let’s work on ex1data2.txt with scikit-learn:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Load the data: columns are house size (sq ft), number of bedrooms, and price.
data = np.loadtxt("ex1data2.txt", delimiter=",")
X = data[:, :2]
Y = data[:, 2]
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Standardize the features, then fit ordinary least squares on the scaled data.
normalizer = StandardScaler().fit(X)
clf = LinearRegression()
clf.fit(normalizer.transform(X), Y)

print('intercept: ', clf.intercept_)
print('Coefficients: ', clf.coef_)

intercept:  340412.659574
Coefficients:  [ 109447.79646964   -6578.35485416]
# Predict the price of a 1650 sq-ft, 3-bedroom house.
# transform() and predict() expect a 2-D array: one row per sample.
house = [[1650, 3]]
clf.predict(normalizer.transform(house))
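As a cross-check (not part of the exercise code), the same intercept and coefficients can be recovered in closed form with the normal equation, $\theta = (X X^T)^{-1} X Y^T$ under the column-per-example convention used above, applied to the standardized features:

import numpy as np

data = np.loadtxt("ex1data2.txt", delimiter=",")
feats = data[:, :2]
y = data[:, 2]

# Standardize like StandardScaler (zero mean, unit population std).
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)

# Bias row of ones on top; columns are examples.
m = feats.shape[0]
X = np.vstack([np.ones(m), feats.T])   # shape (3, m)
Y = y.reshape(1, m)                    # shape (1, m)

theta = np.linalg.solve(X @ X.T, X @ Y.T)  # normal equation
print(theta.ravel())  # should match clf.intercept_ and clf.coef_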