The Prostate Dataset
The prostate dataset comes from a study on 97 men with prostate cancer who were due to receive radical prostatectomy.
The data contain the following variables:
lcavol: log(cancer volume in cm^3)
lweight: log(prostate weight in gm)
age: age in years
lbph: log(benign prostatic hyperplasia amount)
svi: seminal vesicle invasion
lcp: log(capsular penetration)
gleason: Gleason score
pgg45: percentage Gleason scores 4 or 5
lpsa: log(prostate specific antigen in ng/mL)
Question 1
Validate that the prostate data frame contains 97 observations.
Hint: First install the faraway package (if you haven’t already) as instructed in Lesson 1, Slide 49. The following R statement will load the prostate data frame: data("prostate", package = "faraway").
Use the nrow() function to see how many observations (rows) the data frame has. For example, the following statement prints the number of observations in the cars data frame: nrow(cars).
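One possible way to turn "validate" into an explicit check is to compare the row count against the expected number with stopifnot(). The sketch below does this on the built-in cars data frame, which has 50 rows; the variable name n_obs is chosen here for illustration. The same pattern should carry over to the prostate data frame with the expected count from the question.

```r
# cars ships with base R (datasets package), so nothing needs to be installed.
data("cars")

# nrow() returns the number of rows, i.e. the number of observations.
n_obs <- nrow(cars)
print(n_obs)

# stopifnot() stops with an error if the condition is FALSE,
# and does nothing if it is TRUE.
stopifnot(n_obs == 50)
```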
Question 2
Calculate descriptive statistics of each of the variables.
Hint: Use the summary() function. For example: summary(cars).
Question 3
Create a new data frame that includes the following variables: lcavol, lweight, age and lpsa.
Use this new data frame for all questions below.
Hint: In the following example, we select two variables (agegp and alcgp) from the esoph data frame and name the new data frame esophSubDf:
esophSubDf <- esoph[c("agegp", "alcgp")]
Question 4
Calculate descriptive statistics of each of the variables using the new data frame.
Question 5
Create a scatter plot matrix for all the variables using the new data frame.
Hint: Use the pairs() function (see Lesson 2, Slide 50).
Question 6
Create a (Pearson) correlation matrix for all the variables.
Hint: Use the cor() function (see Lesson 2, Slide 48).
Question 7
Show the same matrix again, but round the correlations (use two decimal places).
Hint: Use the round() function. The following example calculates the correlation matrix for the cars data frame and rounds the numbers: round(cor(cars), 2)
Question 8
Create a regression model:
The predictor variable (X) should be lpsa.
The outcome variable (Y) should be lcavol.
Show the summary of the model.
Hint: Use the lm() and summary() functions (see Lesson 2, Slide 51).
Question 9
Visualize the two variables and the model you just created by doing the following:
Create a scatter plot.
Put lcavol on the y-axis and lpsa on the x-axis.
Include the regression line and label the axes.
Hint: See Lesson 2, Slide 52.
Question 10
Update the regression model by adding a second predictor: age.
Show the regression model summary.
Hint: See Lesson 2, Slide 53.
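The hints for Questions 3 to 10 name subsetting, summary(), pairs(), cor(), round(), lm(), and a scatter plot with a regression line. As a rough sketch of how these pieces might fit together, the block below runs the same sequence on the built-in mtcars data frame rather than on prostate. The choice of mtcars, the columns mpg, wt, and hp, and the object names (mtcarsSubDf, corMat, fit1, fit2) are assumptions made for illustration and are not taken from the lesson slides.

```r
# Sketch on the built-in mtcars data frame (an illustrative stand-in).
data("mtcars")

# Keep only the variables of interest in a new data frame.
mtcarsSubDf <- mtcars[c("mpg", "wt", "hp")]

# Descriptive statistics for each variable.
print(summary(mtcarsSubDf))

# Scatter plot matrix for all variables.
pairs(mtcarsSubDf)

# Pearson correlation matrix (cor() uses Pearson by default),
# rounded to two decimal places.
corMat <- round(cor(mtcarsSubDf), 2)
print(corMat)

# Simple linear regression: outcome mpg, predictor wt.
fit1 <- lm(mpg ~ wt, data = mtcarsSubDf)
print(summary(fit1))

# Scatter plot with the predictor on the x-axis, the outcome on the y-axis,
# labeled axes, and the fitted regression line.
plot(mtcarsSubDf$wt, mtcarsSubDf$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit1)

# Add a second predictor by extending the formula with +.
fit2 <- lm(mpg ~ wt + hp, data = mtcarsSubDf)
print(summary(fit2))
```

In a model formula the outcome goes to the left of ~ and the predictors to the right, so swapping in other column names is enough to adapt the sketch to a different data frame.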