Saturday, October 11, 2014

3.7 Exercises

1) The p-values correspond to the null hypotheses that each coefficient (intercept, TV, radio, newspaper) is zero while the other predictors stay in the model. They indicate that all the predictors except newspaper are significant.
Since the intercept is non-zero, there are positive sales even when there is no advertising on any channel.
Since the radio coefficient is larger than the TV coefficient, an extra unit of radio advertising increases sales more than an extra unit of TV advertising.

2) The KNN classifier assigns a point to the class held by the majority of its K nearest neighbors, while KNN regression estimates the response at that point as the average of the responses of its K nearest neighbors. A sketch of both is below.
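
A minimal base-R sketch of the difference, on made-up 1-D data with a hand-rolled neighbor search (all names here are just for this illustration):

set.seed(1)
x.train <- runif(20)
y.num   <- 3*x.train + rnorm(20, sd=0.2)            # numeric response, for regression
y.class <- factor(ifelse(x.train > 0.5, "A", "B"))  # class labels, for classification
x0 <- 0.42                                          # test point
k  <- 5
nn <- order(abs(x.train - x0))[1:k]                 # indices of the k nearest neighbors of x0
mean(y.num[nn])                                     # KNN regression: average of the neighbors' responses
names(which.max(table(y.class[nn])))                # KNN classification: majority class among the neighbors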

3) a)

so the model is salary = 50 + 20*GPA + 0.07*IQ + 35*Gender + 0.01*GPA*IQ - 10*GPA*Gender

if IQ and GPA are fixed, a female earns 35 - 10*GPA more than a male,
so females earn more only when GPA < 3.5, interesting!
since GPA is on a 4.0 scale, males earn more on average once GPA is high enough (above 3.5),
so iii) is true and i), ii), iv) are false

b) salary = 50 + 4*20 + 110*0.07 + 1*35 + 4*110*0.01 + 4*1*(-10) = 137.1 thousand dollars, i.e. $137,100
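
A one-line sanity check of this arithmetic in R (coefficients taken from the model above):

sum(50, 4*20, 110*0.07, 1*35, 4*110*0.01, 4*1*(-10))   # 137.1, i.e. $137,100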

c) False. The size of the interaction coefficient says nothing about the evidence for an interaction effect; that depends on the p-value (or t-statistic) of the GPA:IQ term, not on how small the coefficient is.

4)
a) training RSS should be lower for the cubic: the extra flexibility lets it fit the training data at least as well as the linear model
b) testing RSS should be lower for the linear model, since the true relationship is linear and the cubic overfits
c) training RSS is again lower for the cubic, for the same reason as in a): a more flexible model always achieves lower (or equal) training RSS
d) for testing RSS there is not enough information to tell: the cubic wins if the non-linearity is strong, the linear model may still win if it is mild
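
A small simulation sketch (entirely made-up data with a truly linear signal, arbitrary 100/100 train/test split) illustrating a) and b): the cubic always wins on training RSS, while the linear fit tends to win on test RSS.

set.seed(1)
x <- rnorm(200)
y <- 1 + 2*x + rnorm(200)                # the true relationship is linear
train <- 1:100                           # first half for training, second half for testing
lin <- lm(y ~ x, subset=train)
cub <- lm(y ~ poly(x,3), subset=train)
rss <- function(fit, idx) sum((y[idx] - predict(fit, data.frame(x=x[idx])))^2)
c(train.linear=rss(lin, train),   train.cubic=rss(cub, train))     # cubic is lower
c(test.linear=rss(lin, 101:200),  test.cubic=rss(cub, 101:200))    # linear is usually lower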

5)
$$ \hat{y}_i = x_i\hat{\beta} = x_i\,\frac{\sum_{i'=1}^{n} x_{i'}y_{i'}}{\sum_{i''=1}^{n} x_{i''}^2} = \sum_{i'=1}^{n} \frac{x_i x_{i'}}{\sum_{i''=1}^{n} x_{i''}^2}\, y_{i'} = \sum_{i'=1}^{n} a_{i'}y_{i'} $$

where $$ a_{i'} = \frac{x_i x_{i'}}{\sum_{i''=1}^{n} x_{i''}^2} $$
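A quick numerical check of this identity on simulated data (regression through the origin, slope sum(x*y)/sum(x^2); the names are just for this sketch):

set.seed(1)
x <- rnorm(50)
y <- 2*x + rnorm(50)
beta.hat <- sum(x*y)/sum(x^2)        # least squares slope for regression through the origin
i <- 7                               # check the identity for the 7th fitted value
a <- x[i]*x/sum(x^2)                 # the weights a_i'
c(x[i]*beta.hat, sum(a*y))           # the two expressions give the same number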

6) By (3.4) we have $$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \quad\Rightarrow\quad \hat{\beta}_0 + \hat{\beta}_1\bar{x} = \bar{y} $$
Therefore the least squares line passes through the point $$ (\bar{x},\bar{y}) $$
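A quick numerical check on simulated data (values here are arbitrary): predicting at x = mean(x) returns mean(y).

set.seed(1)
x <- rnorm(50)
y <- 1 + 2*x + rnorm(50)
fit <- lm(y ~ x)
c(unname(predict(fit, data.frame(x=mean(x)))), mean(y))   # the two values coincide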

7) Here is the proof
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Pearson.27s_correlation_and_least_squares_regression_analysis

and see this
http://en.wikipedia.org/wiki/Coefficient_of_determination#As_explained_variance
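
A quick numerical illustration of the claim that R^2 equals the squared correlation in simple linear regression (simulated data, names are just for this sketch):

set.seed(1)
x <- rnorm(100)
y <- 3 + 0.5*x + rnorm(100)
fit <- lm(y ~ x)
c(summary(fit)$r.squared, cor(x, y)^2)   # identical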

8)

auto<-read.csv("Auto.csv",na.strings="?")   # "?" marks missing horsepower values
attach(auto)
fit<-lm(mpg~horsepower)
summary(fit)

a)

i) yes, there is a relationship between horsepower and mpg: the F-statistic's p-value is essentially zero
ii) the relationship is fairly strong: the p-value is tiny and R^2 is around 0.6
iii) it is negative: mpg decreases as horsepower increases
iv) predict(fit,data.frame(horsepower=98)) gives 24.46708;
predict(fit,data.frame(horsepower=98), interval="confidence")
predict(fit,data.frame(horsepower=98), interval="prediction")
give the confidence and prediction intervals around that value

b)

plot(horsepower,mpg)
abline(fit)

c)

plot(fit)
the residuals vs fitted plot shows a clear U-shape, which suggests the relationship is non-linear


9)
a) pairs(auto)
b) cor(auto[,-9], use="complete.obs")   # exclude name and drop rows with missing horsepower
c)

fit<-lm(mpg~.-name,data=auto)
summary(fit)
i) yes, there is a relationship between the predictors and mpg: the F-statistic is large with an essentially zero p-value
ii) weight, year and origin are strongly significant, displacement less so
iii) the coefficient on year is 0.750773: holding the other predictors fixed, each additional model year is associated with an increase of about 0.75 mpg

d)
plot(fit)
observation 14 stands out as a high leverage point,
and there are a few candidate outliers such as 323, 327 and 394

e)
all pairwise interactions except those involving name:
fit<-lm(mpg~(.-name)*(.-name),data=auto)
the significant interactions include:
acceleration:year          5.562e-02  2.558e-02   2.174  0.03033 *
acceleration:origin        4.583e-01  1.567e-01   2.926  0.00365 **
displacement:year          5.934e-03  2.391e-03   2.482  0.01352 *


f)
plot(log(auto$horsepower),auto$mpg)
plot(auto$horsepower,auto$mpg)
the log-transformed plot looks more linear than the raw one; a rough comparison is below
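
One way to make the comparison more concrete is to compare R^2 across a few candidate transformations (a sketch, using the auto data frame loaded above; the set of transformations is just an arbitrary choice):

sapply(list(linear = mpg ~ horsepower,
            log    = mpg ~ log(horsepower),
            sqrt   = mpg ~ sqrt(horsepower),
            square = mpg ~ I(horsepower^2)),
       function(f) summary(lm(f, data=auto))$r.squared)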

10)
a)
install.packages("ISLR")
library(ISLR)
data(Carseats)
fit<-lm(Sales~Price+Urban+US,data=Carseats)
summary(fit)

b)
(Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
  Price       -0.054459   0.005242 -10.389  < 2e-16 ***
  UrbanYes    -0.021916   0.271650  -0.081    0.936  
USYes        1.200573   0.259042   4.635 4.86e-06 ***

A $1 increase in Price decreases Sales by about 0.0545 thousand units (~54 units).
If the store is in the US, Sales increase by about 1.2 thousand units (~1200 units).
If the store is in an urban area, Sales decrease by about 22 units, but this coefficient is not significant.

c)
Sales = 13.043469 - 0.054459*Price - 0.021916*UrbanYes + 1.200573*USYes, where UrbanYes = 1 if the store is urban and USYes = 1 if it is in the US (0 otherwise)

d)
For Price and US we can reject the null hypothesis that their coefficients are zero; for Urban we cannot

e)
fit2<-lm(Sales~Price+US,data=Carseats)

f)
Both models have essentially the same R^2 (about 0.24), so neither fits the data especially well; the RSE of the larger model is slightly higher, which is expected after adding a predictor (Urban) that carries no information.

g)
confint(fit2)
                2.5 %      97.5 %
(Intercept) 11.79032020 14.27126531
Price       -0.06475984 -0.04419543
USYes        0.69151957  1.70776632

plot(fit2)
the leverage plot indicates that observation 368 may be a high leverage point,
and the studentized residuals stay roughly within (-3, 3), so there are no clear outliers (checked numerically below)
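
To back this up with numbers rather than just eyeballing the plots, the hat values and studentized residuals can be inspected directly (a small sketch using base R diagnostics; the 2x-mean cutoff is just a common rule of thumb):

lev <- hatvalues(fit2)
which(lev > 2*mean(lev))          # candidate high-leverage points (mean leverage is (p+1)/n)
which(abs(rstudent(fit2)) > 3)    # candidate outliers; likely none here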

11)
set.seed(1)
x=rnorm(100)
y=2*x+rnorm(100)

a)
fit<-lm(y~x+0)
Coefficients:
    Estimate Std. Error t value Pr(>|t|)  
x   1.9939     0.1065   18.73   <2e-16 ***

So the estimated slope (1.9939) is close to the true value of 2, as expected

b)
fit<-lm(x~y+0)
summary(fit)

Coefficients:
  Estimate Std. Error t value Pr(>|t|)  
y  0.39111    0.02089   18.73   <2e-16 ***

0.39111 looks interesting at first, shouldn't it be closer to 0.5 (the inverse of 2)? It shouldn't: the least squares estimate here is $$ \sum x_iy_i / \sum y_i^2 $$, and with y = 2x + eps this is roughly 2 Var(x) / (4 Var(x) + Var(eps)) = 2/5 = 0.4, so 0.39 is exactly what we should expect. A quick check is below.
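
Checking the closed-form estimate against the lm output, using the x and y simulated above:

sum(x*y)/sum(y^2)   # ~0.39, matching the coefficient reported by lm(x~y+0)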



12)
a)
the coefficient of the regression of Y onto X equals that of X onto Y exactly when
$$ \sum x_i^2= \sum y_i^2 $$

b)
x<- 1:100
y<-2*x+rnorm(100,sd=0.0001)    # sum(y^2) is ~4x sum(x^2), so the two slopes differ (~2 vs ~0.5)
lm.Y<- lm(y~x+0)
summary(lm.Y)
lm.X<- lm(x~y+0)
summary(lm.X)

c)
x<- 1:100
y<-x+rnorm(100,sd=0.0001)      # here sum(x^2) ~ sum(y^2), so both slopes are ~1
lm.Y<- lm(y~x+0)
summary(lm.Y)
lm.X<- lm(x~y+0)
summary(lm.X)

13)

set.seed(1)

a)
x=rnorm(100)

b)
eps=rnorm(100,0,0.25)

c)
y=-1+0.5*x+eps
length(y) is 100
beta0=-1
beta1=0.5

d)
plot(x,y)
nicely linear

e)
fit<-lm(y~x)
summary(fit)

Coefficients:
  Estimate Std. Error t value Pr(>|t|)  
(Intercept) -1.00942    0.02425  -41.63   <2e-16 ***
  x            0.49973    0.02693   18.56   <2e-16 ***

the estimated coefficients are very close to the true values (-1 and 0.5)

f)
abline(fit, col="black")

abline(-1, 0.5, col = "red")
legend("bottomright",c("least sq", "pop"), title="legend",col=c("black","red"),lty = c(1,1))

g)
fit2<-lm(y~x+I(x^2))
or
fit3<-lm(y~poly(x,2))
the coefficient on the squared term is not significant, so the quadratic term does not improve the model

h)
set.seed(1)
x=rnorm(100)
eps=rnorm(100,0,0.1)
y=-1+0.5*x+eps
fit<-lm(y~x)
summary(fit)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -1.003769   0.009699  -103.5   <2e-16 ***
  x            0.499894   0.010773    46.4   <2e-16 ***


plot(x,y)
abline(fit, col="black")

abline(-1, 0.5, col = "red")
legend("bottomright",c("least sq", "pop"), title="legend",col=c("black","red"),lty = c(1,1))

i)

set.seed(1)
x=rnorm(100)
eps=rnorm(100,0,0.01)
y=-1+0.5*x+eps
fit<-lm(y~x)
summary(fit)
plot(x,y)
abline(fit)
abline(-1, 0.5, col = "red")

still the lines match closely

confint(fit)
the confidence intervals get tighter with less noisy data as expected
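
A compact way to see this is to refit the model at the three noise levels and collect the slope intervals (a sketch that re-simulates with the same seed as above):

sapply(c(0.25, 0.1, 0.01), function(s) {
  set.seed(1)
  x   <- rnorm(100)
  eps <- rnorm(100, 0, s)
  y   <- -1 + 0.5*x + eps
  confint(lm(y ~ x))["x", ]   # 95% CI for the slope; it narrows as the noise shrinks
})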

14)
a)

set.seed(1)
x1=runif(100)
x2=0.5*x1+rnorm(100)/10
y=2+2*x1+0.3*x2+rnorm(100)


b)
plot(x1,x2)
cor(x1,x2)

c)
fit<-lm(y~x1+x2)
summary(fit)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
  x1            1.4396     0.7212   1.996   0.0487 *
  x2            1.0097     1.1337   0.891   0.3754  

at the 5% level we can (just barely) reject beta1=0 and we fail to reject beta2=0

d)
fit<-lm(y~x1)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
  x1            1.9759     0.3963   4.986 2.66e-06 ***

we do reject beta1=0

e)
fit<-lm(y~x2)
summary(fit)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
  x2            2.8996     0.6330    4.58 1.37e-05 ***

we do reject beta2=0

f) The results do not contradict each other: x1 and x2 are strongly correlated (collinear), so in the multiple regression each one adds little once the other is included, the standard errors are inflated and the individual effects are masked. On its own, each is a useful predictor of y.

g)
x1=c(x1, 0.1)
x2=c(x2, 0.8)
y=c(y,6)

mlr<-lm(y~x1+x2)
l1<-lm(y~x1)
l2<-lm(y~x2)

for the multiple regression (mlr), observation 101 is a high-leverage point but not an outlier
for the regression on x1 alone (l1), it is an outlier but not a particularly high-leverage point
for the regression on x2 alone (l2), it is a high-leverage point but not an outlier
(checked numerically below)
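
The same conclusions can be checked numerically for all three fits (a sketch; observation 101 is the added point, and the list names are just labels for this example):

sapply(list(mlr=mlr, x1.only=l1, x2.only=l2), function(f)
  c(leverage.101  = unname(hatvalues(f)[101]),
    mean.leverage = mean(hatvalues(f)),
    rstudent.101  = unname(rstudent(f)[101])))
# leverage well above the mean flags a high-leverage point; |rstudent| > 3 flags an outlier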


15)

library(MASS)
data(Boston)
a)

The following fits a simple regression of crim on each predictor in turn and returns the ones whose slope is not significant at the 5% level:

unlist(lapply(names(Boston)[-1], FUN=function(x) {
  # p-value of the slope in crim ~ x: keep x only when it is NOT significant at 5%
  if (summary(lm(as.formula(paste0("crim~",x)),data=Boston))$coefficients[2,4]>0.05) {
    return (x)
  }
}))

[1] "chas"



b)
fit<-lm(crim~.,data=Boston)
we can reject the null hypothesis that the coefficient is zero for zn, dis, rad, black and medv
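
This can also be read off programmatically from the coefficient table of the full fit above (a small sketch):

coefs <- summary(fit)$coefficients[-1, ]        # drop the intercept row
rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]     # predictors significant at the 5% level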

c)

univ<-unlist(lapply(names(Boston)[2:length(names(Boston))], FUN=function(x) {
  return (summary(lm(as.formula(paste0("crim~",x)),data=Boston))$coefficients[2,1])
}))

coef<-unname(fit$coefficients)
coef<-coef[2:length(coef)]

plot(univ,coef)
the plot shows most predictors clustered together, with nox far away from the rest: its coefficient changes dramatically (from strongly positive in the univariate fit to negative in the multiple regression), interesting

d)

lapply(names(Boston)[-1], FUN=function(x) {
  try({
    # compare R^2 of a cubic fit against the simple linear fit for each predictor
    nlrsq<-summary(lm(as.formula(paste0("crim~","poly(",x,",3)")),data=Boston))$r.squared
    lrsq<-summary(lm(as.formula(paste0("crim~",x)),data=Boston))$r.squared
    c(x,(nlrsq-lrsq)/lrsq)   # relative improvement from the non-linear terms
  }, silent=T)   # try() skips chas, where poly() fails (too few unique values)
})

we see the biggest improvement in medv, dis, nox, check the output!



Monday, October 6, 2014

Proof of the bias-variance tradeoff decomposition (Eq. 2.7):
http://en.wikipedia.org/wiki/Bias–variance_tradeoff#Derivation

Nice explanation of it
http://scott.fortmann-roe.com/docs/BiasVariance.html

2.4 Exercises

1.

a) if n is large and p is small, a flexible method should generally perform better: with plenty of data it can capture the signal without overfitting.

b) if p is large and n is small (the reverse of a), a flexible method should perform worse, since it will overfit the few observations available.

c) if the relationship is highly non-linear, flexible is better: an inflexible method cannot capture the non-linearity.

d) if the variance of the error terms is extremely high, flexible is worse: it will fit the noise.


2.

a) regression, inference, n=500 firms, p=3 (profit, number of employees, industry); CEO salary is the response

b) classification, prediction, n=20, p=13

c) regression, prediction, n=52 (weeks in 2012), p=3 (% change in the US, British and German markets). The % change in the USD/Euro rate is the response, not one of the predictors, so the two are not the same thing.

3.

will draw later with R


4,5,6 skipped

7. a)

1=>3
2=>2
3=>sqrt(10)~3.16
4=>sqrt(5)~2.24
5=>sqrt(2)~1.41
6=>sqrt(3)~1.73
(distances checked in R below, after part d)


b) when K=1 the nearest point is observation 5, which is Green, so the prediction is Green
c) when K=3 the three nearest points are observations 5, 2 and 6: one Green and two Reds, so the prediction is Red
d) if the Bayes decision boundary is highly non-linear we would expect the best K to be small: a small K gives a flexible, highly non-linear decision boundary, while a large K averages over many points and pushes the boundary toward something closer to linear
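
A quick R check of the distances in a) (the six observations typed in by hand from the exercise table; the test point is the origin):

X <- rbind(c(0,3,0), c(2,0,0), c(0,1,3), c(0,1,2), c(-1,0,1), c(1,1,1))   # the six observations
sqrt(rowSums(X^2))   # Euclidean distance of each observation from the test point (0,0,0)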

8.

setwd("~/Downloads/islr")

college<-read.csv("College.csv")
rownames(college)=college[,1]
college=college[,-1]

#i)
summary(college)

#ii)
pairs(college[,1:10])

#iii)
plot(college$Private,college$Outstate)

#iv)
Elite=rep("No",nrow(college))
Elite[college$Top10perc >50]="Yes"
Elite=as.factor(Elite)
college=data.frame(college ,Elite)
summary(Elite)
plot(college$Elite,college$Outstate)

#v)
par(mfrow=c(2,2))
hist(college$Outstate,breaks=10)
hist(college$Outstate,breaks=20)
hist(college$Outstate,breaks=30)
hist(college$Outstate,breaks=40)

hist(college$Apps,breaks=10)
hist(college$Apps,breaks=20)
hist(college$Apps,breaks=30)
hist(college$Apps,breaks=40)

#vi) Skipped


#9

auto<-read.csv("Auto.csv",na.strings="?")   # "?" marks missing values
auto<-na.omit(auto)                         # drop the 5 rows with missing horsepower

#a) all are quantitative except name and origin (origin is a categorical code for the car's region); year could be treated either way

#b)
# range of each quantitative (non-factor) column
sapply(names(auto),FUN=function(x) {
  if (class(auto[,x])!="factor") {
    range(auto[,x])
    }
  })

#c)
# mean and standard deviation of each quantitative column
sapply(names(auto),FUN=function(x) {
  if (class(auto[,x])!="factor") {
    c(mean(auto[,x]),sd(auto[,x]))
  }
})

#d)
auto<-auto[-(10:85),]   # drop observations 10 through 85, then recompute mean and sd
sapply(names(auto),FUN=function(x) {
  if (class(auto[,x])!="factor") {
    c(mean(auto[,x]),sd(auto[,x]))
  }
})

#e)
auto<-read.csv("Auto.csv",na.strings="?")   # re-read to restore the full data set
pairs(auto)

#f)
displacement and weight seem to have a clear (negative, roughly linear) relationship with mpg, so they could be useful for predicting it