# Linear Regression

The idea behind linear regression is least squares: minimize the sum of the squared errors between the hypothesis and the observed results, also known as the cost function $J$.

$$\text{Hypothesis: } h_{\theta}(x) = \theta_{0} + \theta_{1}x = \theta^{T}x$$

$$\text{Cost function: } J(\theta_{0}, \theta_{1}) = \frac{1}{2m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2$$
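To make the definitions concrete, here is a minimal sketch that evaluates the hypothesis and the cost on a tiny hand-made dataset (hypothetical values, not the exercise data):

```python
import numpy as np

# Toy data chosen so that y = 1 + 2x exactly (hypothetical, for illustration)
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

def h(theta, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta[0] + theta[1] * x

def J(theta, x, y):
    """Cost J(theta) = (1/2m) * sum of squared errors."""
    m = len(x)
    return np.sum((h(theta, x) - y) ** 2) / (2 * m)

print(J(np.array([1.0, 2.0]), x, y))  # perfect fit, so the cost is 0
print(J(np.array([0.0, 0.0]), x, y))
```

With $\theta = (1, 2)$ the errors vanish and $J = 0$; any other $\theta$ gives a strictly positive cost.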
In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


Let's use the sample data from exercise #1.

In [47]:
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt(open("ex1data1.txt"), delimiter=",")
X = data[:, 0]
Y = data[:, 1]

plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

Here is the scalar cost function, computeCost.

In [3]:
def computeCost(theta_0, theta_1):
    return np.sum((theta_0 + theta_1 * X - Y) ** 2) / (2 * len(X))

print(computeCost(0, 0))

32.0727338775


Through trial and error, we find that to minimize the cost function, $\theta_{0}$ should lie in $[-5, 5]$ and $\theta_{1}$ in $[0, 2]$:

In [4]:
theta_0_space = np.concatenate([
    np.linspace(-10, -5, 400),
    np.linspace(-5, 5, 1000),
    np.linspace(5, 10, 400)])
theta_1_space = np.concatenate([
    np.linspace(-1, 0, 400),
    np.linspace(0, 2, 1200),
    np.linspace(2, 4, 400)])

Theta_0, Theta_1 = np.meshgrid(theta_0_space, theta_1_space)

rows, cols = np.shape(Theta_0)
Z = np.zeros(shape=(rows, cols))
for i in range(rows):
    for j in range(cols):
        Z[i, j] = computeCost(Theta_0[i, j], Theta_1[i, j])

Here is the contour of $J(\theta)$:

In [5]:
plt.figure()
CS = plt.contour(Theta_0, Theta_1, Z, levels=[5, 8, 16, 32, 64, 128, 256])
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.title('The contour of the J(theta)')
plt.clabel(CS, inline=1, fontsize=10)
plt.show()

The minimum lies at the $(\theta_{0}, \theta_{1})$ where $\frac{\partial}{\partial\theta}J(\theta) = 0$; since $J$ is a convex quadratic, this stationary point is the global minimum. The numerical approach is to use gradient descent.
$$\theta_{0} := \theta_{0} - \alpha \frac{\partial}{\partial\theta_{0}}J(\theta)$$

$$\theta_{1} := \theta_{1} - \alpha \frac{\partial}{\partial\theta_{1}}J(\theta)$$

More concretely:

$$J(\theta_{0}, \theta_{1}) = \frac{1}{2m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^2 = \frac{1}{2m} \sum_{i=1}^m (\theta_{0} + \theta_{1}x^{(i)} - y^{(i)})^2$$

$$\frac{\partial}{\partial\theta_{0}}J(\theta) = \frac{1}{2m} \sum_{i=1}^m 2 (\theta_{0} + \theta_{1}x^{(i)} - y^{(i)}) = \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})$$

$$\frac{\partial}{\partial\theta_{1}}J(\theta) = \frac{1}{2m} \sum_{i=1}^m 2 (\theta_{0} + \theta_{1}x^{(i)} - y^{(i)})\cdot x^{(i)} = \frac{1}{m} \sum_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})\cdot x^{(i)}$$

In [6]:
theta = np.array([0, 0])
alpha = 0.01
m = len(X)
iters = 1000

V = np.array([np.ones(np.shape(X)), X])
thetas = np.zeros(shape=(iters, 2))
for i in range(iters):
    thetas[i] = theta
    theta = theta - (alpha / m) * np.dot((np.dot(theta, V) - Y), V.transpose())

We can also render the gradient descent trace in the contour:

In [7]:
plt.figure()
CS = plt.contour(Theta_0, Theta_1, Z, levels=[5, 8, 16, 32, 64, 128, 256])
plt.xlabel('theta0')
plt.ylabel('theta1')
plt.title('The trace of gradient descent in J(theta)')
plt.clabel(CS, inline=1, fontsize=10)
plt.plot(thetas[:, 0], thetas[:, 1], c='r')
plt.show()

Let's check how well the hypothesis fits the observations:

In [43]:
B = np.linspace(0, 25, 100)
Xh = np.array([np.ones(np.shape(B)), B])
Yh = np.dot(theta, Xh)

plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(B, Yh, c='red')
plt.xlim([0, 25])
plt.show()

print('theta:\n', theta)

theta:
[-3.24140214  1.1272942 ]
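As a sanity check, the least-squares minimum can also be computed in closed form with `np.linalg.lstsq`; with enough iterations, gradient descent converges toward that solution. Since `ex1data1.txt` is not reproduced here, this sketch uses synthetic stand-in data (hypothetical values of roughly the same shape):

```python
import numpy as np

# Synthetic stand-in for ex1data1.txt (hypothetical values)
rng = np.random.default_rng(0)
X = rng.uniform(5, 25, size=97)                    # population in 10,000s
Y = -3.9 + 1.19 * X + rng.normal(0, 3, size=97)    # noisy linear profit

# Design matrix with a column of ones for the intercept term
A = np.column_stack([np.ones_like(X), X])

# Closed-form least-squares solution: no learning rate, no iterations
theta_exact, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(theta_exact)
```

The closed form is convenient here, but gradient descent scales better once the number of features grows large.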


There is no reason to reinvent the wheel; we can leverage the Generalized Linear Models from scikit-learn:

In [46]:
from sklearn import linear_model
X = data[:, 0].reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
Y = data[:, 1]

clf = linear_model.LinearRegression()
clf.fit(X, Y)

print('intercept: ', clf.intercept_)
print('Coefficients: ', clf.coef_)

plt.figure()
plt.scatter(X, Y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in \$10,000s')
plt.plot(X, clf.predict(X), color='red')
plt.xlim([0, 25])
plt.show()

intercept:  -3.89578087831
Coefficients:  [ 1.19303364]
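Once fitted, the model can make predictions for unseen populations via `clf.predict`. A self-contained sketch (again with synthetic stand-in data, since `ex1data1.txt` is not bundled here; the populations 3.5 and 7.0 are illustrative):

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for ex1data1.txt (hypothetical values)
rng = np.random.default_rng(0)
X = rng.uniform(5, 25, size=(97, 1))                    # population in 10,000s
Y = -3.9 + 1.19 * X[:, 0] + rng.normal(0, 3, size=97)   # profit in $10,000s

clf = linear_model.LinearRegression()
clf.fit(X, Y)

# Predicted profit for cities of 35,000 and 70,000 people
predictions = clf.predict(np.array([[3.5], [7.0]]))
print(predictions)
```

Note that `predict` takes the same 2-D shape as `fit`: one row per city, one column per feature.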