The Prostate Dataset

The prostate dataset comes from a study on 97 men with prostate cancer who were due to receive radical prostatectomy.

The data contain the following variables:

lcavol: log(cancer volume in cm3)

lweight: log(prostate weight in gm)

age: age in years

lbph: log(benign prostatic hyperplasia amount)

svi: seminal vesicle invasion

lcp: log(capsular penetration)

gleason: Gleason score

pgg45: percentage Gleason scores 4 or 5

lpsa: log(prostate specific antigen in ng/mL)

Question 1

Validate that the prostate data frame contains 97 observations.

Hint: First install the faraway package (if you haven’t already) as instructed in Lesson 1, Slide 49. The following R statement loads the prostate data frame: data("prostate", package = "faraway").

Use the nrow() function to see how many observations (rows) the data frame has. For example, the following statement prints the number of observations in the cars data frame: nrow(cars).
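If you want R to check a row count for you instead of reading it off the console, one possible approach is sketched below. It uses the built-in cars data frame (50 rows), not the prostate data, and the variable name nObs is only illustrative.

```r
# cars is a built-in data frame, used here purely for illustration.
nObs <- nrow(cars)
nObs

# stopifnot() raises an error if the condition is FALSE and stays silent otherwise.
stopifnot(nObs == 50)
```

The same pattern should work for any data frame once it is loaded.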

Question 2

Calculate descriptive statistics of each of the variables.

Hint: Use the summary() function. For example: summary(cars).

Question 3

Create a new data frame that includes the following variables: lcavol, lweight, age and lpsa.

Use this new data frame for all questions below.

Hint: In the following example, we select two variables (agegp and alcgp) from the esoph data frame and name the new data frame esophSubDf:

esophSubDf <- esoph[c("agegp", "alcgp")]

Question 4

Calculate descriptive statistics of each of the variables using the new data frame.

Question 5

Create a scatter plot matrix for all the variables using the new data frame.

Hint: Use the pairs() function (see Lesson 2, Slide 50).
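As a rough illustration of pairs(), the sketch below draws a scatter plot matrix for four columns of the built-in mtcars data frame. The data set, the column choice and the name mtSubDf are assumptions made for this example and are not part of the assignment.

```r
# mtcars is a built-in data frame, used here purely for illustration.
mtSubDf <- mtcars[c("mpg", "wt", "hp", "qsec")]

# One panel per pair of variables; variable names appear on the diagonal.
pairs(mtSubDf, main = "Scatter plot matrix of selected mtcars variables")
```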

Question 6

Create a (Pearson) correlation matrix for all the variables.

Hint: Use the cor() function (see Lesson 2, Slide 48).
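A minimal sketch of cor() on a data frame follows, again using built-in data (three mtcars columns) and not the prostate data. The names mtCorDf and corMat are hypothetical. Pearson is the default method of cor(); it is written out here only to make the choice explicit.

```r
# mtcars is a built-in data frame, used here purely for illustration.
mtCorDf <- mtcars[c("mpg", "wt", "hp")]

# Each cell holds the Pearson correlation between the row and column variables.
corMat <- cor(mtCorDf, method = "pearson")
corMat
```

The result is a symmetric matrix with 1s on the diagonal, since every variable correlates perfectly with itself.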

Question 7

Show the same matrix again, but round the correlations (use two decimal places).

Hint: Use the round() function. The following example calculates the correlation matrix for the cars data frame and rounds the numbers:

round(cor(cars), 2)

Question 8

Create a regression model:

The predictor variable (X) should be lpsa.

The outcome variable (Y) should be lcavol.

Show the summary of the model.

Hint: Use the lm() and summary() functions (see Lesson 2, Slide 51).
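To show how the lm() formula is laid out, here is a small sketch on the built-in cars data frame, where dist is treated as the outcome and speed as the predictor. The model name carsModel is made up for this example. The formula reads outcome ~ predictor.

```r
# cars is a built-in data frame with columns speed and dist.
# Formula layout: outcome (Y) on the left of ~, predictor (X) on the right.
carsModel <- lm(dist ~ speed, data = cars)

# Coefficients, standard errors, R-squared and related statistics.
summary(carsModel)
```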

Question 9

Visualize the two variables and the model you just created by doing the following:

Create a scatter plot. Put lcavol on the y-axis and lpsa on the x-axis. Include the regression line and label the axes.

Hint: See Lesson 2, Slide 52.
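One possible way to draw a scatter plot with a fitted line in base R is sketched below, using the built-in cars data frame as a stand-in. The model name carsFit, the axis labels and the line color are choices made for this example only.

```r
# cars is a built-in data frame, used here purely for illustration.
carsFit <- lm(dist ~ speed, data = cars)

# Predictor on the x-axis, outcome on the y-axis, with labeled axes.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")

# abline() accepts a fitted lm object and draws its regression line.
abline(carsFit, col = "red")
```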

Question 10

Update the regression model by adding a second predictor: age.

Show the regression model summary.

Hint: See Lesson 2, Slide 53.
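Adding a predictor means extending the right-hand side of the formula with +. The sketch below does this on the built-in mtcars data frame, not the prostate data, and the model names are hypothetical. It uses update() to extend an existing model, which is only one option; writing the full formula in a new lm() call gives the same fit.

```r
# mtcars is a built-in data frame, used here purely for illustration.
oneVarModel <- lm(mpg ~ wt, data = mtcars)

# ". ~ . + hp" keeps the existing outcome and predictors and adds hp.
twoVarModel <- update(oneVarModel, . ~ . + hp)

# Equivalent direct call: lm(mpg ~ wt + hp, data = mtcars)
summary(twoVarModel)
```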