# Category: R

## requirement and data: https://www.dropbox.com/scl/fi/scbap7jvls2cay2vppkcs/ARE10

requirement and data: https://www.dropbox.com/scl/fi/scbap7jvls2cay2vppkcs/ARE106_HW1_F2024.zip?rlkey=um5t20bt4kfj0jm9ui4txfqja&st=c1tbj908&dl=0

## 1. Suppose there is a population of 1000 people and 500 of them have already ado

## Instructions
Provide the code that parallelizes the following:
library(M

## Review the attached file. Suzie has an issue. She can either move to NY or FL an

## d) Data Mining
i. Use the chosen data mining methods for exploring, analyzing, a

## Download the dataset here for this question.
The data s

## Assignment Instructions:
Dataset Selection: Select a suitable dataset for perfor

1. Suppose there is a population of 1000 people and 500 of them have already adopted a new behavior. In the next time period, how many will begin the behavior if there is a constant hazard of .5? What about if the hazard is .5 * current adoption base? Show your work.
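The arithmetic for this question can be checked with a few lines of R. This is only a sketch under two assumptions that the question does not state outright: the hazard applies to the people who have not yet adopted, and "current adoption base" means the adopted share of the population (500/1000).

```r
population <- 1000
adopted    <- 500
at_risk    <- population - adopted   # people who have not adopted yet

# Constant hazard: each non-adopter adopts with probability 0.5.
new_constant <- 0.5 * at_risk

# Hazard scaled by the adoption base (assumed here to be the adopted share).
hazard_internal <- 0.5 * (adopted / population)
new_internal    <- hazard_internal * at_risk

new_constant   # 250
new_internal   # 125
```

If the adoption base were read as a raw count instead of a share, the second hazard would exceed 1, which is why the share interpretation is assumed here.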

2. Draw a CDF (cumulative distribution over time) graph for internal influence and another for external influence. (No numbers, just the general shape). Label which of the graphs reflects a constant hazard.
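The two general shapes can be reproduced with a small simulation. The sketch below is illustrative only: the function name `simulate_adoption`, the hazard values 0.2 and 0.8, and the starting point of one adopter are all assumptions chosen to make the curves visible.

```r
# Cumulative adoption share over time for a given hazard rule.
# hazard_fun receives the current adopted share and returns the hazard.
simulate_adoption <- function(periods, hazard_fun, population = 1000, start = 1) {
  adopted <- numeric(periods)
  adopted[1] <- start
  for (t in 2:periods) {
    at_risk <- population - adopted[t - 1]
    hazard  <- hazard_fun(adopted[t - 1] / population)
    adopted[t] <- adopted[t - 1] + hazard * at_risk
  }
  adopted / population
}

# External influence: the hazard is constant regardless of who has adopted.
external <- simulate_adoption(30, function(share) 0.2)

# Internal influence: the hazard grows with the share that has already adopted.
internal <- simulate_adoption(30, function(share) 0.8 * share)
```

Calling `matplot(cbind(external, internal), type = "l")` draws both curves. Under these assumptions the constant-hazard curve rises quickly and flattens, while the internal-influence curve starts slowly and takes an S shape.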

3. (Two points) Open the NetLogo model “epiDEM Basic,” which simulates a S-I-R diffusion model.

https://www.netlogoweb.org/launch#https://www.netl…

Set it to 400 people, 20% infection-chance, 30% recovery-chance, and average recovery time of 100. Let it run for about 100 simulated hours (this will only take a few seconds of real time). Now examine the “Cumulative Infected and Recovered” and the “Infection and Recovery Rates” data. Note that NetLogo draws these graphs too flat to read, so you will probably want to click the three horizontal lines and either “download CSV” or “view full screen.” Include a copy of the graph. (Note the “download PNG” button just gives you the smushed graph so you might want to screenshot). What does the shape of the “% infected” line on the “cumulative infected and recovered” graph suggest about internal influence vs external influence?

4. What is the R0? Read the “model info” tab to learn what “R0” means and then explain it in your own words. (If you use a source besides the “model info” tab please cite it). Given the parameters in question #3, after 20 hours R0 is probably around 5.5. Play around with the “infection-chance,” “recovery-chance,” and “average-recovery-time” sliders. Include a screenshot if you can find a combination of parameters that gets R0 below 5 after 50 hours.

5. Consider Granovetter’s threshold model of collective behavior and whether each of the following assumptions about a population of 500 would be consistent with frequent riots.

* a uniform distribution of rioting thresholds from 0 to 100

* a normal distribution of rioting thresholds with a mean of 10 and a standard deviation of 2.

* a normal distribution of rioting thresholds with a mean of 12 and a standard deviation of 4.

* a Poisson distribution with a mean of 10

Explain the model and use it to justify your answer.

(if your stats knowledge is too rusty to visualize what these distributions look like, see pdf attached)
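A threshold cascade can be sketched in a few lines of R. The sketch assumes each threshold is the number of people who must already be rioting before that person joins; the question does not say whether thresholds are counts or percentages, so that reading is an assumption, and `riot_size` is a hypothetical helper name.

```r
# Final number of rioters for a vector of individual thresholds.
riot_size <- function(thresholds) {
  rioting <- 0
  repeat {
    # Everyone whose threshold is already met is rioting in the next round.
    joined <- sum(thresholds <= rioting)
    if (joined == rioting) break   # nobody new joined, so the cascade stops
    rioting <- joined
  }
  rioting
}

# Granovetter's classic illustration with 100 people:
riot_size(0:99)            # 100: each person tips the next one
riot_size(c(0, 2, 2:99))   # 1: with no threshold-1 person the cascade stalls
```

The same function accepts simulated thresholds, for example `riot_size(rnorm(500, mean = 10, sd = 2))`, to explore how the shape of the distribution affects whether a cascade ever starts.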

6. According to Rossman and Fisher’s simulation, under what conditions does it matter if an innovation starts with the most central person in a network?

7. In Centola’s model would a “simple contagion” spread faster in a pure ring lattice or a Watts-Strogatz with 2% rewiring? Why? How about a “complex contagion”? Why?

Instructions

Provide the code that parallelizes the following:

library(MKinfer) # Load package used for permutation t-test

# Create a function for running the simulation:

simulate_type_I <- function(n1, n2, distr, level = 0.05, B = 999,
                            alternative = "two.sided", ...)
{
  # Create a data frame to store the results in:
  p_values <- data.frame(p_t_test = rep(NA, B),
                         p_perm_t_test = rep(NA, B),
                         p_wilcoxon = rep(NA, B))
  for(i in 1:B)
  {
    # Generate data:
    x <- distr(n1, ...)
    y <- distr(n2, ...)
    # Compute p-values:
    p_values[i, 1] <- t.test(x, y, alternative = alternative)$p.value
    p_values[i, 2] <- perm.t.test(x, y, alternative = alternative,
                                  R = 999)$perm.p.value
    p_values[i, 3] <- wilcox.test(x, y, alternative = alternative)$p.value
  }
  # Return the type I error rates:
  return(colMeans(p_values < level))
}
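One possible way to parallelize this simulation is to run each of the B replicates as a separate task with `mclapply` from the `parallel` package. This is a sketch, not the expected solution: the name `simulate_type_I_par` and the `cores` argument are assumptions, and `mclapply` relies on forking, so the sketch falls back to one core on Windows.

```r
library(parallel)

simulate_type_I_par <- function(n1, n2, distr, level = 0.05, B = 999,
                                alternative = "two.sided",
                                cores = if (.Platform$OS.type == "windows") 1L else 4L,
                                ...) {
  distr_args <- list(...)   # extra arguments passed on to distr

  # One replicate: draw both samples and return the three p-values.
  one_replicate <- function(i) {
    x <- do.call(distr, c(list(n1), distr_args))
    y <- do.call(distr, c(list(n2), distr_args))
    c(p_t_test      = t.test(x, y, alternative = alternative)$p.value,
      p_perm_t_test = MKinfer::perm.t.test(x, y, alternative = alternative,
                                           R = 999)$perm.p.value,
      p_wilcoxon    = wilcox.test(x, y, alternative = alternative)$p.value)
  }

  # Replicates are independent, so they can be spread across workers.
  p_list   <- mclapply(seq_len(B), one_replicate, mc.cores = cores)
  p_values <- do.call(rbind, p_list)

  # Type I error rates: share of replicates with p-value below the level.
  colMeans(p_values < level)
}
```

A call such as `simulate_type_I_par(20, 20, rnorm, B = 999)` mirrors the sequential version. For reproducible parallel random numbers, `RNGkind("L'Ecuyer-CMRG")` followed by `set.seed()` can be called before running it.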
2. Provide the code that runs the following code in parallel with 4 workers (with mclapply):
lapply(airquality, function(x) { (x-mean(x))/sd(x) })
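A minimal sketch of the `mclapply` version follows. The only assumption added is the Windows fallback: `mclapply` uses forking, which Windows does not support, so `mc.cores` must be 1 there.

```r
library(parallel)

# Use 4 workers where forking is available; fall back to 1 on Windows.
workers <- if (.Platform$OS.type == "windows") 1L else 4L

# Standardize each column of airquality, one column per task.
standardized <- mclapply(airquality,
                         function(x) { (x - mean(x)) / sd(x) },
                         mc.cores = workers)
```

The result is a list with one element per column, the same structure that `lapply` returns. Columns that contain missing values (such as `Ozone`) come back as all `NA`, exactly as in the sequential call, because `mean()` and `sd()` are used without `na.rm = TRUE`.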

Review the attached file. Suzie has an issue. She can either move to NY or FL and needs to review some data that her agent gave her. The agent reviewed house prices and crime ratings for houses that Suzie would be interested in based on her selection criteria. She wants to live in an area with lower crime but wants to know a few things:

Is it more expensive or less expensive to live in FL or NY?

Is the crime rate higher in FL or NY (Note a low score in crime means lower crime)?

Is the crime rate higher in lower or higher house price areas?

Using the R tool, show the data in the tool to answer each of the questions. Also, show the data visualization to go along with the summary.
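The three comparisons can be sketched as one helper function. The attached file is not shown here, so the column names `state`, `price`, and `crime` are assumptions, and `compare_locations` is a hypothetical name; they would need to be adapted to the real data.

```r
# Assumed columns (not confirmed by the assignment):
#   state: "FL" or "NY", price: house price, crime: crime rating (low = less crime)
compare_locations <- function(homes) {
  list(
    # Question 1: average house price in each state
    mean_price_by_state = tapply(homes$price, homes$state, mean),
    # Question 2: average crime rating in each state
    mean_crime_by_state = tapply(homes$crime, homes$state, mean),
    # Question 3: a negative correlation would mean pricier areas have less crime
    price_crime_cor     = cor(homes$price, homes$crime)
  )
}
```

With the data loaded into a data frame called `homes`, `boxplot(price ~ state, data = homes)` and `plot(homes$price, homes$crime)` would give matching visuals for the summaries.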

If you were Suzie, where would you move based on the questions above?

After you gave Suzie the answer above (to #4), she gave you some additional information that you need to consider:

She has $100,000 to put down for the house.

If she moves to NY she will have a job earning $120,000 per year.

If she moves to FL she will have a job earning $75,000 per year.

She wants to know the following: on average, in which location will she be able to pay off her house first, based on average housing prices and the income she will receive?

Where should she move and why? Please show graphics and thoroughly explain your answer here based on the new information provided above.

Note: The screenshots should be copied and pasted and must be legible. Only upload the Word document. Be sure to answer all of the questions above and number the answers. Be sure to also explain the rationale for each answer and ensure that there are visuals for each question above. Use two peer-reviewed articles to support your position.

d) Data Mining

i. Use the chosen data mining methods for exploring, analyzing, and extracting important information from the prepared data set.

ii. Perform the data mining process based on the chosen method by using R software.

iii. You are required to fine-tune the parameter settings of the data mining methods in order to achieve a high-quality model. Show the parameter tuning process and select the best parameter setting as the default setting.

iv. Describe the data mining methods, the resulting data mining models, and any important information obtained from the mining process.

FYI, I have finished the data preparation. Now I need help with the data mining step, using classification methods (Decision Tree, Support Vector Machine, Naïve Bayes) only.

• Neural Network

• K-Nearest Neighbour
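As an illustration of the tuning step for one of the listed methods, the sketch below fits a decision tree with `rpart` and picks the complexity parameter by cross-validation. The prepared data set is not shown, so the built-in `iris` data stands in for it; the split size, the seed, and the control values are assumptions.

```r
library(rpart)

set.seed(42)
train_rows <- sample(nrow(iris), 105)   # 70% training split (assumed)
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]

# Grow a deliberately large tree; rpart cross-validates each cp value.
full_tree <- rpart(Species ~ ., data = train, method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 5, xval = 10))

# Tuning table: one row per candidate cp with its cross-validated error.
cp_table <- as.data.frame(full_tree$cptable)
best_cp  <- cp_table$CP[which.min(cp_table$xerror)]

# Prune back to the best cp and evaluate on the held-out rows.
pruned    <- prune(full_tree, cp = best_cp)
predicted <- predict(pruned, newdata = test, type = "class")
confusion <- table(predicted, actual = test$Species)
accuracy  <- mean(predicted == test$Species)
```

Printing `cp_table` documents the parameter tuning process, and `confusion` and `accuracy` summarize the selected model. The same split could be reused to tune the other classifiers so that the comparisons stay on equal footing.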

Download the dataset here for this question.

The data set contains information on sales of 1oz gold coins on eBay. Further details will be available in the key after the exam ends. The file contains the following variables:

DATE: date of the sale

SALE: final selling price of the coin

GOLDPRICE: price of gold, one ounce, at the end of trading on the date of the sale, or, if the sale is on a weekend or holiday, the end of the previous day of trading.

BIDS: the number of bids submitted for the auction (these eBay sales were in an auction format)

TYPE: E for Eagle or a US coin, KR for Krugerrand or a South African coin and ML for Maple Leaf or a Canadian coin.

SHIPPING: cost of shipping; this is an additional fee the buyer must pay so that SALE+SHIPPING is the total cost to the buyer.

SLABBER: P for PCGS, N for NGC or U for not slabbed; a slabbed coin is a coin inside a tamper proof holder that also indicates the coin’s condition or grade.

GRADE: the grade of slabbed coins. If SLABBER=’U’ then this is 0.

other: additional characteristics of slabbed coins are noted here; for example, FD means the “slab” or coin holder notes that the coin was minted on the first day of minting, and FDIFLAG means that it is labeled as first day of issue and the holder has an image of a flag on it.

a) What is the average for SALE?

[ Select ] [“1915”, “1651”, “1654”, “1930”] .

b) What is the maximum for BIDS?

[ Select ] [“55”, “60”, “57”, “62”] .

c) Create a boxplot of SALE. You should see that there are 3 (three) outliers. Look at those three observations and choose the correct statement. (i) the observations either have only 1 bid or other=”BURNISHED”, (ii) the observations all have SLABBER=”P”, (iii) the observation(s) with low value(s) for SALE has/have only 1 or 2 bids while the observation(s) with high value(s) for SALE has/have numbers of bids near the maximum, say within 5 of the maximum, (iv) the observations either have other=”ME” or “LD”.

[ Select ] [“(ii)”, “(iii)”, “(i)”, “(iv)”] .

d) In R type the following command, table(yourdataset$other), where yourdataset is the name you gave to the dataset with the ebay coin sales. This will produce a table showing the value for “other” and the number of observations which have that value. For example, it will show the value “0” and under that the number 20, meaning that there are 20 observations where other is 0 and then it will show BURNISHED and under that a 2, meaning there are 2 coins where other is BURNISHED. How many observations are there where other is “LD”?

[ Select ] [“4”, “6”, “1”, “2”] .

e) Run a regression where SALE is the dependent variable and GOLDPRICE, BIDS, and SHIPPING are the explanatory variables. Consider the following statements and select which ones are correct (1) although there is little explanatory power the model is basically a good model (2) the model has minimal explanatory power (3) none of the independent variables have statistically significant coefficients at standard levels of significance (4) at least 1 of the estimated coefficients has the wrong sign, (5) some combination of items (2), (3) and (4) suggest this is not a good model.

[ Select ] [“(1), (2) and (3)”, “(1) and (3)”, “(2) and (4)”, “(1) and (2)”, “(2) and (3)”, “(2), (3), (4) and (5)”] .

f) Run a regression where SALE is the dependent variable and GOLDPRICE, BIDS, SHIPPING and a set of dummy variables for the values of other are the explanatory variables. NOTE: remove the observations where other=”ME” since there is only one such observation. Because there is only one observation with “ME” it will have a residual of 0 since the “ME” will perfectly explain why it is different from all other observations. This means that your regression is run with only 46 observations and you should see the df for the F statistic being 10 and 35.

What is the R² value for this regression?

[ Select ] [“0.6268”, “0.6801”, “0.431”, “0.5527”, “0.3496”] .

g) Using this model what is the expected value for SALE for an auction with a gold price of $1650, 5 bids, free shipping (SHIPPING=0), and other= FDIFLAG?

[ Select ] [“$1988”, “$1945”, “$1956”, “$1919”, “$1972”] .
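The regression with dummy variables and the prediction can be sketched as two small helpers. The column names come from the variable list above; the helper names `fit_sale_model` and `predict_sale` are hypothetical, and treating `other` as a factor is one common way to generate the dummy variables.

```r
# Fit SALE on GOLDPRICE, BIDS, SHIPPING and dummies for 'other',
# after dropping the single other == "ME" observation.
fit_sale_model <- function(coins) {
  coins <- coins[coins$other != "ME", ]
  coins$other <- factor(coins$other)   # lm() expands a factor into dummies
  lm(SALE ~ GOLDPRICE + BIDS + SHIPPING + other, data = coins)
}

# Expected SALE for one auction with the given characteristics.
predict_sale <- function(fit, goldprice, bids, shipping, other) {
  new_auction <- data.frame(
    GOLDPRICE = goldprice,
    BIDS      = bids,
    SHIPPING  = shipping,
    other     = factor(other, levels = fit$xlevels[["other"]])
  )
  unname(predict(fit, newdata = new_auction))
}
```

With the data in a data frame called `coins`, `fit <- fit_sale_model(coins)` followed by `summary(fit)$r.squared` gives the R² value, and `predict_sale(fit, 1650, 5, 0, "FDIFLAG")` gives the expected sale price for the auction described.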

h) Is the coefficient on FDIFLAG statistically significant at the 0.05 level?

[ Select ] [“NO”, “YES”] .

i) Examine the residual plots. Find the observation with the largest absolute residual and the observation with the largest Cook’s Distance. Identify the correct statement. (i) the observation with the largest absolute residual is an outlier in the residual space and this is due to an extremely low sale price which might relate to only receiving one bid (ii) the observation with the largest Cook’s Distance is influential and has high leverage which might be because it has an unusual grade, GRADE, for a slabbed coin (iii) the observation with the largest absolute residual is an outlier in the residual space and this is due to an extremely high sale price which might relate to the unusually high price for gold at the time of the sale (iv) the value for the largest absolute residual is not an outlier and the largest value for Cook’s Distance does not qualify as being influential.

[ Select ] [“(i)”, “(iii)”, “(ii)”, “(iv)”] .
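The diagnostics can be sketched on any fitted `lm` object. Because the coin data is not available here, the built-in `mtcars` data stands in for it. The cutoffs used (absolute standardized residual above 2, Cook's Distance above 4/n) are common rules of thumb and an assumption; the lectures may use different ones.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model

std_resid <- rstandard(fit)        # standardized residuals
cooks     <- cooks.distance(fit)   # Cook's Distance per observation

largest_resid <- names(which.max(abs(std_resid)))
largest_cook  <- names(which.max(cooks))

# Rule-of-thumb flags (assumed cutoffs).
is_outlier     <- abs(std_resid[largest_resid]) > 2
is_influential <- cooks[largest_cook] > 4 / nobs(fit)

# Drop the flagged rows; unique() removes only one row if both are the same.
to_drop <- unique(c(largest_resid, largest_cook))
trimmed <- mtcars[!rownames(mtcars) %in% to_drop, ]
```

`plot(fit, which = c(1, 4))` shows the residuals-versus-fitted plot and the Cook's Distance plot for a visual check of the same observations.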

j) Remove the observations or observation from part (i) that had the largest absolute residual and the largest Cook’s Distance. If those are the same observation then remove only one observation. If they are different then remove them both, i.e., two observations. With this smaller dataset (which also has other=”ME” removed from before) regress SALE on GOLDPRICE, BIDS, SHIPPING and a set of dummy variables for the values of other. The estimated coefficient on GOLDPRICE is

[ Select ] [“2.5083”, “1.5076”, “2.1763”, “1.763”, “2.0756”] .

k) Using the most recent model, from part (j), test the hypothesis that the coefficient on GOLDPRICE is 1. The t test statistic for this test is

[ Select ] [“1.232”, “1.733”, “0.833”, “1.497”] .
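The default output of `summary()` tests each coefficient against 0, so a test against 1 has to be computed by hand as (estimate − 1) / standard error. The sketch below wraps that in a hypothetical helper, `t_stat_against`, and demonstrates it on the built-in `cars` data as a stand-in for the coin model.

```r
# t statistic and two-sided p-value for H0: coefficient == null_value.
t_stat_against <- function(fit, term, null_value) {
  coefs   <- coef(summary(fit))
  est     <- coefs[term, "Estimate"]
  se      <- coefs[term, "Std. Error"]
  t_value <- (est - null_value) / se
  p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))
  c(t = t_value, p = p_value)
}

fit <- lm(dist ~ speed, data = cars)   # stand-in model
t_stat_against(fit, "speed", 1)
```

For the coin model, the call would be `t_stat_against(fit, "GOLDPRICE", 1)` on the regression from part (j).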

l) Examine the model results, from part (j). Based on these results, if you were auctioning off a gold coin to maximize your revenue, would you rather offer free shipping or would you rather charge $7.5 for shipping?

[ Select ] [“Offer free shipping.”, “It doesn’t appear to matter.”, “Charge $7.5 for shipping.”] .

m) Again, using the model from part (j), test whether the errors have constant variance using the test covered in the lectures. What is the p-value?
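The question does not name the test covered in the lectures. Assuming it is the Breusch-Pagan test, the studentized (Koenker) version can be computed in base R by regressing the squared residuals on the model's regressors and using n·R² as a chi-squared statistic. The helper name `bp_test` is hypothetical, and the built-in `cars` data stands in for the coin model.

```r
# Studentized Breusch-Pagan test for constant error variance (assumed test).
bp_test <- function(fit) {
  X  <- model.matrix(fit)     # regressors, including the intercept
  u2 <- residuals(fit)^2      # squared residuals

  # Auxiliary regression of squared residuals on the same regressors.
  aux <- lm.fit(X, u2)
  r2  <- 1 - sum(aux$residuals^2) / sum((u2 - mean(u2))^2)

  statistic <- length(u2) * r2
  df        <- aux$rank - 1   # number of regressors excluding the intercept
  c(statistic = statistic, df = df,
    p_value = pchisq(statistic, df = df, lower.tail = FALSE))
}

fit <- lm(dist ~ speed, data = cars)   # stand-in model
bp_test(fit)
```

A small p-value would suggest that the error variance is not constant. The `lmtest` package offers `bptest()` as a packaged alternative.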

Assignment Instructions:

Dataset Selection: Select a suitable dataset for performing a cluster analysis. Explain why you have chosen this specific dataset and what you hope to discover from this analysis.

Cluster Analysis: Perform a cluster analysis on your selected dataset. Document the steps you took and include the code you used for your analysis.

Hierarchical and Non-Hierarchical Agglomeration Schedules: Discuss how you applied hierarchical and non-hierarchical agglomeration schedules in your cluster analysis. Explain the differences between these schedules and their impacts on the results of your analysis.

Results Interpretation: Interpret the results of your cluster analysis. Discuss the insights gained from this analysis and explain how the agglomeration schedules impacted your results.

Real-world Applications: Discuss how the insights from your cluster analysis could be applied in a real-world context. Explain the relevance and potential impact of these insights.

Submission Format: Your submission should be a maximum of 500-600 words (excluding Python/R code). Submit your assignment in APA format as a Word document or a PDF file. Include your written analysis and any tables or visualizations that support your findings. If you used any software for your calculations (like R, Python, Excel), please include your code or formulas as well. Include an APA-formatted reference list for any external resources used.
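A minimal sketch of the two clustering approaches in base R is given here, using the built-in `USArrests` data as a stand-in for whatever dataset is selected. The choice of four clusters, Ward linkage, and the seed are assumptions made for illustration.

```r
scaled <- scale(USArrests)   # put variables on a common scale

# Hierarchical clustering; merge order and heights form the agglomeration schedule.
hc <- hclust(dist(scaled), method = "ward.D2")
schedule <- data.frame(
  step   = seq_len(nrow(hc$merge)),
  item_1 = hc$merge[, 1],    # negative = single observation, positive = earlier step
  item_2 = hc$merge[, 2],
  height = hc$height
)
hc_groups <- cutree(hc, k = 4)

# Non-hierarchical clustering with k-means on the same data.
set.seed(1)
km <- kmeans(scaled, centers = 4, nstart = 25)

# Cross-tabulate the two solutions to see where they agree.
agreement <- table(hierarchical = hc_groups, kmeans = km$cluster)
```

`head(schedule)` shows the first merges, `plot(hc)` draws the dendrogram, and `agreement` makes it easy to discuss how the two methods differ on the same data.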
