
STATS 330: Statistical Modelling

Department of Statistics
Take Home Test
Semester 1, 2020
Total: 65 marks
Due: 5pm NZDT, Thursday 7 May 2020
Notes:
(i) Enter your answers to each question in the R Markdown file provided, in a Word
document, or on A4 paper. If you are using R Markdown, knit your report
to either a Word or PDF document. If you are using A4 paper, follow the instructions
provided on Canvas to scan your work as a single PDF document.
(ii) This is an open book test. You may consult your notes when doing the test.
(iii) Please remember to submit your Word or PDF document to Canvas before the
deadline. No extensions are possible without prior arrangement.
(iv) Presentation is important. Make sure you check your spelling and layout.
(v) An Appendix contains R–output for use in Questions 3 and 4.
(vi) Useful equations can be found on page 10 of the Appendix.
(1) Statement.
Please type or write the text below into your R Markdown, Word or paper document,
and type or write your name beneath it:
For the 24 hour duration of this test, I confirm that I will not discuss
the content of the test with anyone else. I will not give any assistance to
another student taking this test. I will not receive any assistance from any
person or tutoring service. (0 marks)
If you do not include this statement, your test will not be marked.
(2) Doctors. There is no R–output for this question, and none is required to
answer it.
The following data come from a famous study carried out by Sir Richard Doll and
colleagues. In 1951, all doctors in the UK were sent a questionnaire about whether
or not they smoked. In the years that followed, information about their deaths was
collected. The dataset doctors.df has the following variables:
deaths  Number of deaths in a given age group
age     Mid-point of a given age group (e.g. age=50 for all doctors in the
        45-54 age group)
years   Total number of person-years of observation for a given age group
        (see footnote 1)
smoke   Smoking status. Either "yes" or "no"
Here are the first three rows of the data:
head(doctors.df,3)
##   deaths years age smoke
## 1     32 52407  40   yes
## 2    104 43248  50   yes
## 3    206 28612  60   yes
The following two models were fitted to these data:
poisson.fit<-glm(deaths~age+I(age^2)+smoke+offset(log(years)),
                 family="poisson", data=doctors.df)
binomial.fit<-glm(cbind(deaths,years-deaths)~age+I(age^2)+smoke,
                  family="binomial", data=doctors.df)
(a) Write equations to fully describe the fitted model poisson.fit. (5 marks)
(b) Write equations to fully describe the fitted model binomial.fit. (5 marks)
(c) Explain how the variable years is being used in the model poisson.fit. (3 marks)
Footnote 1: A doctor who was still alive at the time of analysis contributes years=10.
A doctor who died three and a half years after completing the questionnaire
contributes years=3.5.
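The way individual follow-up times add up to the person-years total for a group can be sketched as follows. The five follow-up times are hypothetical values invented for illustration, and the 10-year follow-up period is taken from the footnote. This is only a sketch of the bookkeeping, not the study's actual data preparation.

```r
# Hypothetical follow-up times (in years) for five doctors in one age group.
# A doctor still alive at the time of analysis is recorded as 10 years;
# a doctor who died earlier is recorded as the time until death.
follow_up <- c(10, 10, 3.5, 10, 7.25)

# Group-level quantities of the kind stored in doctors.df
group_years  <- sum(follow_up)        # total person-years of observation
group_deaths <- sum(follow_up < 10)   # doctors who died during follow-up

# Crude death rate per 1000 person-years for this hypothetical group
crude_rate <- 1000 * group_deaths / group_years

group_years
group_deaths
round(crude_rate, 2)
```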
(3) Insects. R–output for this question can be found in the Appendix: Insects,
pages 2 to 6.
An entomologist collected some data to investigate factors that may affect the infection
rates of insects. She was interested in exploring the effects of age, weight and sex
on infection status. The dataset insects.df has the following variables:
infected  The infection status of the insect. Either 1 (for infected) or 0 (for not infected)
age       The age of the insect (in days)
weight    The weight of the insect (in grams)
sex       The sex of the insect. Either "female" or "male"
Here are the first three rows of the data:
head(insects.df,3)
##   infected age weight    sex
## 1        0   2      1 female
## 2        0   9     13 female
## 3        1  15      2 female
(a) Write equations to fully describe the fitted model insects1.fit. (5 marks)
(b) Interpret the effect of sex based on the model insects1.fit. (5 marks)
The entomologist’s prior experience suggested that it would be worth exploring a
non-linear effect of both age and weight. She therefore performed the following analysis.
library(mgcv)
insects.quadcheck.fit<-gam(infected~s(age)+s(weight)+sex,
                           family="binomial", data=insects.df)
plot(insects.quadcheck.fit)
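Since insects.df is not reproduced here, the same kind of check can be tried on simulated data. The sketch below is illustrative only: the data frame sim.df, the seed, and the coefficients used to generate it are all made up, and its results say nothing about the real insects data. It fits the same form of model and extracts the estimated degrees of freedom (edf) of each smooth term. An edf near 1 indicates an approximately linear fitted effect, while a larger edf indicates curvature.

```r
library(mgcv)

# Simulated (hypothetical) data with the same variable names as insects.df
set.seed(330)
n <- 200
sim.df <- data.frame(
  age    = runif(n, 1, 20),
  weight = runif(n, 1, 15),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Made-up truth: curved effect of age, linear effect of weight
eta <- -1 + 0.6 * sim.df$age - 0.03 * sim.df$age^2 + 0.1 * sim.df$weight
sim.df$infected <- rbinom(n, 1, plogis(eta))

# Same model form as insects.quadcheck.fit, fitted to the simulated data
sim.quadcheck.fit <- gam(infected ~ s(age) + s(weight) + sex,
                         family = binomial, data = sim.df)

# Estimated degrees of freedom for s(age) and s(weight)
sim.edf <- summary(sim.quadcheck.fit)$edf
sim.edf

# Fitted smooths with confidence bands, both panels on one page
plot(sim.quadcheck.fit, pages = 1)
```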
(c) On the basis of the relevant output in the Appendix: Insects, she decides to
include a quadratic term for age in the model, but not to include a quadratic
term for weight. Explain why you either agree or disagree with the entomologist.
Make sure you justify your answer. (3 marks)
The entomologist then fitted a new model, insects.quad, incorporating a quadratic
effect for age.
(d) If you were advising the entomologist, which model, out of insects1.fit and
insects.quad, would you recommend? Make sure you refer to relevant output
from each model to justify your recommendation. (5 marks)
Consider the following code:
anova(insects1.fit,insects.interactions.fit,test="Chisq")
(e) What is the null hypothesis being tested by the anova() function? (3 marks)
(f) What do you conclude from this hypothesis test? (3 marks)
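For reference, the mechanics of comparing two nested logistic regression models with anova() can be sketched on simulated data. Everything here is an assumption made for illustration: the data frame sim.df, the model names main.fit and inter.fit, and the choice of interaction terms are hypothetical, because the formula of insects.interactions.fit is not shown in this document. The sketch also recomputes the p-value by hand from the change in deviance.

```r
# Simulated (hypothetical) data with the same variable names as insects.df
set.seed(330)
n <- 300
sim.df <- data.frame(
  age    = runif(n, 1, 20),
  weight = runif(n, 1, 15),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE))
)
eta <- -1 + 0.1 * sim.df$age + 0.05 * sim.df$weight +
  0.5 * (sim.df$sex == "male")
sim.df$infected <- rbinom(n, 1, plogis(eta))

# Smaller model: main effects only
main.fit <- glm(infected ~ age + weight + sex,
                family = "binomial", data = sim.df)

# Larger model: adds age:sex and weight:sex (an assumed structure)
inter.fit <- glm(infected ~ age * sex + weight * sex,
                 family = "binomial", data = sim.df)

# Analysis of deviance table comparing the two nested models
lrt <- anova(main.fit, inter.fit, test = "Chisq")
lrt

# The same test by hand: drop in deviance against a chi-squared distribution
dev_diff <- deviance(main.fit) - deviance(inter.fit)
df_diff  <- df.residual(main.fit) - df.residual(inter.fit)
p_manual <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
p_manual
```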
(4) Rats. R–output for this question can be found in the Appendix: Rats, pages
7 to 9.
Recall the lirat.df dataset from Handout 7. Female rats were put on iron-deficient
diets and divided into four groups. Once pregnant, rats from each group were
subjected to one of four iron-supplementation regimes. The dataset contains the
following variables:
N    The size of the litter
R    The number of dead fetuses
hb   The mother’s haemoglobin level
grp  A group corresponding to the treatment given to the mother. Group A was
     given no supplementation, while groups B–D were given increasing dosages of
     iron supplementation.
Here are the first three rows of the data:
head(lirat.df,3)
##    N R  hb grp
## 1 10 1 4.1   A
## 2 11 4 3.2   A
## 3 12 9 4.7   A
Three models were fitted to the data:
• A logistic regression model binom.fit
binom.fit<-glm(cbind(R,N-R) ~ grp + hb, family="binomial",
               data=lirat.df)
• A quasi-binomial model qbinom.fit
qbinom.fit<-glm(cbind(R,N-R) ~ grp + hb, family="quasibinomial",
                data=lirat.df)
• A beta-binomial model bbinom.fit (vglm() comes from the VGAM package)
library(VGAM)
bbinom.fit<-vglm(cbind(R,N-R) ~ grp + hb, family="betabinomial",
                 data=lirat.df)
The tenth observation in the dataset has the following observed values:
lirat.df[10,]
##     N R  hb grp
## 10 10 7 4.8   A
(a) Calculate the estimated variance of the response for this observation under the
following models. Calculate your answers to two decimal places and show all of
your working.
(i) binom.fit (5 marks)
(ii) qbinom.fit (5 marks)
(iii) bbinom.fit (5 marks)
(b) Calculate the raw residuals and the Pearson residuals for this observation under
the following models. Calculate your answers to two decimal places and show
your working.
(i) binom.fit (5 marks)
(ii) bbinom.fit (5 marks)
(c) How would you determine whether the model bbinom.fit could be replaced by
the model binom.fit? Justify your answer.
Note: you do not need to perform any analysis here. Just describe what you would do. (3 marks)
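The arithmetic behind parts (a) and (b) can be sketched with placeholder numbers. Only N and R below come from the data shown above. The fitted probabilities, the dispersion estimate and the correlation estimate are hypothetical stand-ins for values that would be read from the Appendix output. The sketch also assumes the response is treated on the count scale (number of dead fetuses) and uses the standard variance forms N p (1 - p), phi N p (1 - p) and N p (1 - p)(1 + (N - 1) rho).

```r
# Observed values for the tenth litter (from the data)
N <- 10
R <- 7

# HYPOTHETICAL estimates: placeholders, not taken from the Appendix
p_binom  <- 0.80   # fitted probability under binom.fit (qbinom.fit shares it)
p_bbinom <- 0.75   # fitted probability under bbinom.fit
phi_hat  <- 3      # quasi-binomial dispersion estimate
rho_hat  <- 0.2    # beta-binomial correlation estimate

# Estimated variance of the count R under each model
var_binom  <- N * p_binom * (1 - p_binom)
var_qbinom <- phi_hat * N * p_binom * (1 - p_binom)
var_bbinom <- N * p_bbinom * (1 - p_bbinom) * (1 + (N - 1) * rho_hat)

# Raw residual: observed count minus fitted count
raw_binom  <- R - N * p_binom
raw_bbinom <- R - N * p_bbinom

# Pearson residual: raw residual divided by the estimated standard deviation
pearson_binom  <- raw_binom / sqrt(var_binom)
pearson_bbinom <- raw_bbinom / sqrt(var_bbinom)

round(c(var_binom = var_binom, var_qbinom = var_qbinom,
        var_bbinom = var_bbinom), 2)
round(c(raw_binom = raw_binom, pearson_binom = pearson_binom,
        raw_bbinom = raw_bbinom, pearson_bbinom = pearson_bbinom), 2)
```

If the answers are wanted on the proportion scale (R/N) instead, each variance is divided by N^2 and each raw residual by N, which leaves the Pearson residuals unchanged.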