Open the HomePrices.xlsx Excel file in Tableau.
Drag the Home Prices Spreadsheet into the data connection canvas.
Run Data Interpreter (check Use Data Interpreter in the left pane of the Data Source page).
Review the Excel file that Tableau generated. (Note that this dataset is already formatted correctly.)
Create a bar chart showing the distribution of building types:
Click on Sheet 1
Double-click on Number of Records (or drag it onto the Rows shelf). Note that the aggregation SUM(Number of Records) is placed on the Rows shelf.
Drag Bldg Type to the Columns Shelf
Question 1: What have you learned about the distribution of building types? Which Building Type has the most homes for sale? The least?
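As a cross-check outside Tableau, the bar chart above can be sketched in a few lines of Python. The building-type labels and rows below are made-up placeholders, not values read from HomePrices.xlsx; the sketch only illustrates that SUM(Number of Records) split by Bldg Type amounts to counting rows per category.

```python
from collections import Counter

# Hypothetical Bldg Type column; labels are illustrative, not read from HomePrices.xlsx.
bldg_type = ["1Fam", "1Fam", "TwnhsE", "Duplex", "1Fam", "TwnhsE", "1Fam"]

# One count per category, equivalent to SUM(Number of Records) by Bldg Type.
counts = Counter(bldg_type)

# most_common() sorts from the largest count to the smallest.
ranked = counts.most_common()
most, least = ranked[0], ranked[-1]

# Text version of the bar chart.
for label, n in ranked:
    print(f"{label:8s} {'#' * n} ({n})")
```

The first and last entries of ranked correspond to the tallest and shortest bars in the Tableau view.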
Create a histogram of Sale Price:
Click on Sheet 2
Double-click on Sale Price (or drag it onto the Rows shelf)
Select the histogram in Show Me
Question 2: What field did Tableau create to make the histogram? Why?
Question 3: What is the shape of the histogram? Will the mean or median have a higher value? Which value (mean or median) would be a better measure for the center of the distribution?
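A histogram groups a continuous measure into equal-width bins and counts the records in each bin. The sketch below shows that idea, plus a mean and median comparison, on a small made-up sample. The prices and the 50,000 bin width are assumptions for illustration; Tableau chooses its own bin size and the real data will differ.

```python
import statistics
from collections import Counter

# Hypothetical sale prices in dollars (placeholder data with a long right tail).
sale_price = [95_000, 110_000, 120_000, 125_000, 130_000, 140_000,
              155_000, 160_000, 210_000, 480_000]

BIN_WIDTH = 50_000  # assumed bin size; Tableau picks its own default


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains value."""
    return (value // width) * width


# Count how many prices fall into each bin, like the bars of a histogram.
histogram = Counter(bin_start(p) for p in sale_price)

mean_price = statistics.mean(sale_price)
median_price = statistics.median(sale_price)

for start in sorted(histogram):
    print(f"{start:>7,} to {start + BIN_WIDTH:>7,}: {'#' * histogram[start]}")
print("mean:", mean_price, "median:", median_price)
```

In this made-up sample the single very expensive home pulls the mean well above the median; compare that with what you see in your own histogram.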
Create a box plot of the Sale Price:
Create new sheet
Double-click on Sale Price (or drag it onto the Rows shelf)
Disaggregate measures (de-select "Aggregate Measures" in the Analysis menu)
Select the box plot from the Show Me menu
Drag Bldg Type and House Style to Tooltip on the Marks card
Question 4: Are there outliers present? If yes, are outliers above or below the center of the distribution?
Question 5: What house style has the highest sale price?
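One common rule for flagging outliers in a box plot is the 1.5 × IQR fence. The sketch below applies it to made-up prices. The data and the "inclusive" quartile method are assumptions; Tableau's quartile calculation may differ slightly, so treat the fences as approximate.

```python
import statistics

# Hypothetical sale prices in dollars (placeholder data, not from the workbook).
sale_price = [95_000, 110_000, 120_000, 125_000, 130_000, 140_000,
              155_000, 160_000, 210_000, 480_000]

# Quartiles; method="inclusive" is one common convention and may not
# match the quartile method Tableau uses exactly.
q1, q2, q3 = statistics.quantiles(sale_price, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

high_outliers = [p for p in sale_price if p > upper_fence]
low_outliers = [p for p in sale_price if p < lower_fence]

print("fences:", lower_fence, upper_fence)
print("above:", high_outliers, "below:", low_outliers)
```

Points listed in high_outliers would sit above the upper whisker, and points in low_outliers below the lower whisker.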
Create a set of box plots of Sale Price for Building Type:
Duplicate the Sale Price box plot sheet (right-click the tab and select Duplicate from the dropdown menu)
Drag Bldg Type to the Columns shelf
Question 6: Which building type has the lowest median Sale Price?
Question 7: Which building type has no outliers in Sale Price?
Question 8: Which distribution has the largest spread? The smallest spread?
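The side-by-side comparison can also be summarized numerically: one median and one measure of spread per building type. The rows below are hypothetical placeholders, and using the IQR as the measure of spread is an assumption; the range or standard deviation would be other reasonable choices.

```python
import statistics
from collections import defaultdict

# Hypothetical (Bldg Type, Sale Price) rows; labels and prices are placeholders.
rows = [("1Fam", 120_000), ("1Fam", 150_000), ("1Fam", 180_000), ("1Fam", 400_000),
        ("Duplex", 100_000), ("Duplex", 110_000), ("Duplex", 125_000),
        ("TwnhsE", 160_000), ("TwnhsE", 170_000), ("TwnhsE", 175_000)]

# Group prices by building type, like placing Bldg Type on the Columns shelf.
prices_by_type = defaultdict(list)
for bldg_type, price in rows:
    prices_by_type[bldg_type].append(price)

# Median and IQR for each group (IQR chosen here as the spread measure).
summary = {}
for bldg_type, prices in prices_by_type.items():
    q1, _, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    summary[bldg_type] = {"median": statistics.median(prices), "iqr": q3 - q1}

lowest_median = min(summary, key=lambda t: summary[t]["median"])
widest_spread = max(summary, key=lambda t: summary[t]["iqr"])

print(summary)
print("lowest median:", lowest_median, "| widest spread:", widest_spread)
```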
Part II: Scatter Plots and Regression
You are interested in determining what factors influence the house sale price (SalePrice). To investigate this question, do the following:
Construct three scatter plots. For each plot, place the Explanatory variable on the x-axis and the Response variable on the y-axis.
Variable combinations:
YearBuilt and SalePrice
1stFlrSF and SalePrice
LotArea and SalePrice
Question 9: Evaluate the regression conditions for each plot. Explain why it is or is not appropriate to run a regression analysis on each plot. Please address all conditions covered in this module (Hint: Slide 25 in the lecture notes).
For plots that meet the regression conditions, add a linear trend line.
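A linear trend line is a least-squares fit of the response on the explanatory variable. The sketch below computes the slope, intercept, correlation, and residuals by hand for a handful of made-up (1stFlrSF, SalePrice) pairs. The numbers are placeholders, and this is only an illustration of the calculation, not Tableau's internal implementation.

```python
import math

# Hypothetical (1stFlrSF, SalePrice) pairs; placeholder numbers only.
first_flr_sf = [800, 1000, 1200, 1400, 1600]                 # explanatory (x)
sale_price = [110_000, 135_000, 150_000, 185_000, 200_000]   # response (y)

n = len(first_flr_sf)
mean_x = sum(first_flr_sf) / n
mean_y = sum(sale_price) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in first_flr_sf)
syy = sum((y - mean_y) ** 2 for y in sale_price)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(first_flr_sf, sale_price))

# Least-squares line y = intercept + slope * x, and correlation r.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

# Residuals: observed minus fitted. A curved or fanning pattern in these
# would be a warning sign for a linear fit.
residuals = [y - (intercept + slope * x) for x, y in zip(first_flr_sf, sale_price)]

print(f"SalePrice = {intercept:.0f} + {slope:.1f} * 1stFlrSF, r = {r:.3f}")
print("residuals:", residuals)
```

Plotting the residuals against the explanatory variable is one way to look for curvature or unequal spread before trusting the trend line.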
Please submit your .twbx file and a screenshot of your notebook here, and respond to the questions.