http://en.wikipedia.org/wiki/Bias–variance_tradeoff#Derivation
A nice explanation of the bias-variance tradeoff:
http://scott.fortmann-roe.com/docs/BiasVariance.html
2.4 Exercises
1.
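a) if n is very large and p is small, a flexible method is generally better: with that many observations it can follow the true relationship closely without overfitting.
b) if p is large and n is small, the reverse of a): a flexible method is generally worse, because it will overfit the few observations available.
c) if the relationship is highly nonlinear, flexible is better, since it has the freedom to capture the nonlinearity.
d) if the variance of the error terms is high, flexible is worse, since it will fit the noise.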
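A small simulation can make parts a) and c) concrete. This is only a sketch under assumed settings: the true function sin(x), the noise level, the sample size and the choice of a degree-5 polynomial as the "flexible" method are all made up for illustration and do not come from the exercise.

```r
set.seed(1)

# Hypothetical nonlinear truth with moderate noise; large n, one predictor.
f <- function(x) sin(x)
simulate <- function(n) {
  x <- runif(n, 0, 2 * pi)
  data.frame(x = x, y = f(x) + rnorm(n, sd = 0.3))
}
train <- simulate(2000)
test  <- simulate(2000)

# Inflexible fit: straight line. Flexible fit: degree-5 polynomial.
fit_lin  <- lm(y ~ x, data = train)
fit_flex <- lm(y ~ poly(x, 5), data = train)

# Mean squared error on the held-out test set.
test_mse <- function(fit) mean((test$y - predict(fit, newdata = test))^2)
mse_lin  <- test_mse(fit_lin)
mse_flex <- test_mse(fit_flex)
c(linear = mse_lin, flexible = mse_flex)
```

With a large n and a nonlinear truth, the flexible fit should come out with the lower test error. Shrinking n or raising the noise sd is a quick way to explore parts b) and d).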
2.
a) n=500, p=3 (profit, number of employees, industry; CEO salary is the response), a regression problem
b) classification, n=20, p=13
c) regression, n=number of weeks in 2012 (52), p=3. The % change in the dollar is the response; the predictors are the % changes in the US, British and German markets.
3.
will draw later with R
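Until the real drawing is done, a rough placeholder could look like the following. The curve formulas are assumptions picked only to give the typical shapes (bias falling, variance rising, U-shaped test error, constant Bayes error); they are not derived from any data.

```r
flexibility <- seq(1, 10, length.out = 100)

# Stylized shapes only; every formula here is an illustrative assumption.
bias_sq     <- 4 * exp(-0.6 * flexibility)       # falls as flexibility grows
variance    <- 0.02 * flexibility^2              # rises as flexibility grows
bayes_error <- rep(1, length(flexibility))       # irreducible error, constant
test_error  <- bias_sq + variance + bayes_error  # U-shaped sum of the three
train_error <- 3.5 * exp(-0.4 * flexibility)     # keeps falling with flexibility

# Draw all five curves on one set of axes.
curves <- cbind(bias_sq, variance, train_error, test_error, bayes_error)
matplot(flexibility, curves, type = "l", lty = 1, col = 1:5,
        xlab = "Flexibility", ylab = "Error")
legend("top", legend = colnames(curves), lty = 1, col = 1:5)
```

The test error is built as the sum of squared bias, variance and irreducible error, which is why it sits above the Bayes error line everywhere.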
4,5,6 skipped
7. a)
Euclidean distance from each observation to the test point (0,0,0):
1=>3
2=>2
3=>sqrt(10)~3.16
4=>sqrt(5)~2.24
5=>sqrt(2)~1.41
6=>sqrt(3)~1.73
b) when K=1 the nearest point is the 5th, so the prediction is Green
c) when K=3 the three nearest points are 5, 6 and 2 (1 Green, 2 Reds), so the prediction is Red
d) if the Bayes decision boundary is highly nonlinear we would expect the best K to be small, because a small K gives a flexible boundary that can follow the nonlinearity
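The distances and votes in exercise 7 can be checked in a few lines of R. The training table below is typed in from the exercise as I understand it, so treat the values as an assumption and compare them with the book; `knn_vote` is a hypothetical helper written for this sketch, not a library function.

```r
# Training data assumed from the exercise's table (verify against the book).
train <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1, 0, 1),
  X3 = c(0, 0, 3, 2, 1, 1),
  Y  = c("Red", "Red", "Red", "Green", "Green", "Red")
)
test_point <- c(0, 0, 0)

# Euclidean distance from every training row to the test point.
d <- sqrt(rowSums(sweep(as.matrix(train[, 1:3]), 2, test_point)^2))

# Majority vote among the k nearest neighbours (hypothetical helper).
knn_vote <- function(k) {
  nearest <- order(d)[1:k]
  votes <- table(train$Y[nearest])
  names(votes)[which.max(votes)]
}

round(d, 2)
knn_vote(1)
knn_vote(3)
```

For real data sets the `knn()` function in the `class` package does the same job; the manual version is used here only to keep each step visible.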
8.
setwd("~/Downloads/islr")
# stringsAsFactors=TRUE so that Private is read as a factor (no longer the default since R 4.0)
college<-read.csv("College.csv",stringsAsFactors=TRUE)
# use the first column (college names) as row names, then drop it
rownames(college)=college[,1]
college=college[,-1]
#i)
summary(college)
#ii)
pairs(college[,1:10])
#iii) plot() with a factor on the x axis draws side-by-side boxplots
plot(college$Private,college$Outstate)
#iv)
Elite=rep("No",nrow(college))
Elite[college$Top10perc >50]="Yes"
Elite=as.factor(Elite)
college=data.frame(college ,Elite)
summary(Elite)
plot(college$Elite,college$Outstate)
#v) histograms with different numbers of bins, four per page
par(mfrow=c(2,2))
for (b in c(10,20,30,40)) hist(college$Outstate,breaks=b)
for (b in c(10,20,30,40)) hist(college$Apps,breaks=b)
#vi) Skipped
#9
# "?" marks missing values in Auto.csv; read them as NA and drop those rows
auto<-na.omit(read.csv("Auto.csv",na.strings="?",stringsAsFactors=TRUE))
#a) all are quantitative except name, origin, year
#   (origin and year are stored as numbers, so the code below still summarises them)
#b) range of each numeric column
num_cols<-names(auto)[sapply(auto,is.numeric)]
sapply(auto[,num_cols],range)
#c) mean and standard deviation of each numeric column
sapply(auto[,num_cols],function(x) c(mean=mean(x),sd=sd(x)))
#d) same statistics after removing the 10th through 85th observations
auto_sub<-auto[-(10:85),]
sapply(auto_sub[,num_cols],function(x) c(mean=mean(x),sd=sd(x)))
#e)
pairs(auto)
#f) displacement and weight appear to have a roughly linear, negative relationship with mpg
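One way to back up the visual impression in f) is to look at correlations with mpg. Since Auto.csv is not bundled with R, this sketch runs on the built-in `mtcars` data purely as a stand-in; `cor_with_mpg` is a hypothetical helper, and for the actual exercise the call would presumably use `auto` with `"displacement"` and `"weight"`.

```r
# Correlation between mpg and each named column of a data frame.
# Hypothetical helper; assumes df has a numeric column called mpg.
cor_with_mpg <- function(df, vars) {
  sapply(vars, function(v) cor(df$mpg, df[[v]], use = "complete.obs"))
}

# mtcars ships with R and is used here only as a stand-in for Auto.csv.
r <- cor_with_mpg(mtcars, c("disp", "wt"))
r
```

A correlation close to -1 supports a linear, negative relationship, although a scatterplot is still needed to spot curvature that a single number hides.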