Monday, October 6, 2014

Proof of Eq. 2.7, the bias-variance tradeoff
http://en.wikipedia.org/wiki/Bias–variance_tradeoff#Derivation

Nice explanation of it
http://scott.fortmann-roe.com/docs/BiasVariance.html
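For reference, the identity being derived (Eq. 2.7 in ISLR) decomposes the expected test MSE at a point x_0 into variance, squared bias and irreducible error:

E[(y_0 - \hat{f}(x_0))^2] = Var(\hat{f}(x_0)) + [Bias(\hat{f}(x_0))]^2 + Var(\epsilon)

The derivation just expands the square and uses E[\epsilon] = 0 together with the independence of \epsilon from \hat{f}(x_0).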

2.4 Exercises

1.

a) if n is large and p is small, a flexible method should generally be better: with many observations and few predictors it can fit the signal closely without much risk of overfitting.

b) if p is large and n is small, the reverse of a): flexible should be worse, because with so few observations a flexible method will overfit.

c) if the relationship is highly non-linear, flexible is better; an inflexible method cannot capture the non-linearity.

d) if the variance of the error terms is extremely high, flexible is worse: it will fit the noise (see the simulation sketch after this list).
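A minimal simulation sketch of d) (my own illustration, not from the book): the truth is linear, the noise is large, and a very flexible smoothing spline (df = 25 standing in for "flexible") chases the noise and typically loses to a plain linear fit on test data.

set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 8)               # noisy linear truth
x_test <- runif(1000, 0, 10)
y_test <- 2 + 3 * x_test + rnorm(1000, sd = 8)

lin  <- lm(y ~ x)                                # inflexible fit
flex <- smooth.spline(x, y, df = 25)             # highly flexible fit

mse <- function(pred) mean((y_test - pred)^2)
mse(predict(lin, data.frame(x = x_test)))        # test MSE of the linear fit
mse(predict(flex, x_test)$y)                     # test MSE of the flexible fit, usually larger here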


2.

a) regression (CEO salary is quantitative) and an inference problem; n=500 firms, p=3 (profit, number of employees, industry).

b) classification (success/failure) and a prediction problem; n=20 products, p=13.

c) regression and prediction; n=52 (number of weeks in 2012), p=3 (% change in the US, British and German markets). The % change in the dollar (USD/Euro rate) is the response, not a predictor, so it is not the same thing as the % change in the US market.

3.

Will draw this with R; a stylized sketch of the five curves is below.
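The curve shapes here are made up purely to show the usual qualitative behaviour (they are not derived from any data): squared bias and training error fall with flexibility, variance rises, the Bayes error is flat, and the test error is U-shaped.

flex <- seq(0, 10, length.out = 200)
bias2     <- 8 * exp(-0.6 * flex)          # squared bias: falls as flexibility grows
variance  <- 0.05 * flex^2                 # variance: rises with flexibility
bayes     <- rep(1, length(flex))          # irreducible (Bayes) error: constant
test_err  <- bias2 + variance + bayes      # test error: U-shaped
train_err <- 9 * exp(-0.5 * flex)          # training error: decreases monotonically

matplot(flex, cbind(bias2, variance, bayes, test_err, train_err),
        type = "l", lty = 1, lwd = 2, col = 1:5,
        xlab = "Flexibility", ylab = "Error")
legend("topright", lty = 1, lwd = 2, col = 1:5,
       legend = c("squared bias", "variance", "Bayes error",
                  "test error", "training error"))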


4,5,6 skipped

7. a)

1=>3
2=>2
3=>sqrt(10)~3.16
4=>sqrt(5)~2.236
5=>sqrt(2)~1.41
6=>sqrt(3)~1.73


b) when K=1 the nearest point is obs. 5 (distance 1.41), which is Green, so the prediction is Green
c) when K=3, the 3 nearest points are obs. 5, 6 and 2 (distances 1.41, 1.73, 2): 1 Green and 2 Reds, so the prediction is Red
d) if the Bayes decision boundary is highly non-linear we would expect the best value of K to be small, since a small K gives a more flexible (non-linear) decision boundary while a large K smooths it toward a nearly linear one
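A quick R check of the distances and the K=1 / K=3 votes; the six observations are typed in from the exercise's table, so verify them against the book.

X <- rbind(c(0, 3, 0),
           c(2, 0, 0),
           c(0, 1, 3),
           c(0, 1, 2),
           c(-1, 0, 1),
           c(1, 1, 1))
Y <- c("Red", "Red", "Red", "Green", "Green", "Red")
d <- sqrt(rowSums(X^2))        # Euclidean distance of each observation to (0,0,0)
round(d, 3)                    # 3.000 2.000 3.162 2.236 1.414 1.732

Y[order(d)[1]]                 # K=1: obs. 5 -> "Green"
table(Y[order(d)[1:3]])        # K=3: obs. 5, 6, 2 -> 2 Red, 1 Green -> "Red"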

8.

setwd("~/Downloads/islr")

college<-read.csv("College.csv")
rownames(college)=college[,1]   # use the college names as row names
college=college[,-1]            # then drop the redundant name column

#i) numerical summary of every variable
summary(college)

#ii) scatterplot matrix of the first ten columns
pairs(college[,1:10])

#iii) boxplots of Outstate by Private (as.factor keeps this working even when strings are not read in as factors)
plot(as.factor(college$Private),college$Outstate)

#iv) create a qualitative Elite variable: more than 50% of new students from the top 10% of their high school class
Elite=rep("No",nrow(college))
Elite[college$Top10perc >50]="Yes"
Elite=as.factor(Elite)
college=data.frame(college ,Elite)
summary(Elite)
plot(college$Elite,college$Outstate)

#v) histograms with differing numbers of bins for a couple of quantitative variables
par(mfrow=c(2,2))
hist(college$Outstate,breaks=10)
hist(college$Outstate,breaks=20)
hist(college$Outstate,breaks=30)
hist(college$Outstate,breaks=40)

hist(college$Apps,breaks=10)
hist(college$Apps,breaks=20)
hist(college$Apps,breaks=30)
hist(college$Apps,breaks=40)

#vi) Skipped


#9

auto<-read.csv("Auto.csv",na.strings="?")  # "?" marks missing horsepower values
auto<-na.omit(auto)                        # drop the few incomplete rows

#a) name and origin are qualitative (origin is a 1/2/3 region code); the rest, including year, are quantitative
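# a quick way to see how R actually typed each column (after the na.strings fix
# above, horsepower should come out numeric rather than a factor):
sapply(auto, class)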

#b) range of each quantitative column
sapply(names(auto),FUN=function(x) {
  if (is.numeric(auto[,x])) {   # is.numeric() also handles character columns like name
    range(auto[,x])
    }
  })

#c) mean and standard deviation of each quantitative column
sapply(names(auto),FUN=function(x) {
  if (is.numeric(auto[,x])) {
    c(mean(auto[,x]),sd(auto[,x]))
  }
})

#d) drop observations 10 through 85, then recompute the summaries
auto<-auto[-(10:85),]
sapply(names(auto),FUN=function(x) {
  if (is.numeric(auto[,x])) {
    c(mean(auto[,x]),sd(auto[,x]))
  }
})

#e) reload the full data set and plot a scatterplot matrix
auto<-read.csv("Auto.csv",na.strings="?")
auto<-na.omit(auto)
pairs(auto[,names(auto)!="name"])   # the name column isn't useful in a scatterplot matrix

#f) displacement and weight seem to have a clear (negative, roughly linear) relationship with mpg, so they look useful for predicting it
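# a quick numeric check of f): correlation of every numeric column with mpg
# (assumes auto was read with na.strings="?" and na.omit as in e) above)
round(cor(auto[,sapply(auto,is.numeric)])["mpg",],2)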
