Find bigrams in the attached text. Bigrams are word pairs and their counts. To build them, do the following:
1. Tokenize the text into one word per line.
2. Create two almost-duplicate files of words, offset by one line, using tail.
3. Paste them together so that word(i) and word(i+1) appear on the same line.
4. Count the resulting pairs.
5. Find the 10 most common bigrams.
For the submission, provide all the commands that accomplish steps 1 through 5.
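One way to sketch the five steps as shell commands. The filenames (text.txt, words.txt, etc.) are placeholders, not names given in the assignment; the first line just creates a tiny stand-in for the attached text so the pipeline is runnable as-is:

```shell
# 0. (Illustration only) a tiny stand-in for the attached text
printf 'the cat sat on the mat and the cat slept\n' > text.txt
# 1. Tokenize: one lowercase word per line
tr -cs '[:alpha:]' '\n' < text.txt | tr '[:upper:]' '[:lower:]' > words.txt
# 2. Make an almost-duplicate file, offset by one line, using tail
tail -n +2 words.txt > next_words.txt
# 3. Paste them together: word(i) and word(i+1) on the same line
paste words.txt next_words.txt > bigrams.txt
# 4. Count the pairs
sort bigrams.txt | uniq -c | sort -rn > counts.txt
# 5. Show the 10 most common bigrams
head -10 counts.txt
```

On the stand-in sentence, the pair "the cat" occurs twice and tops the count.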
After completing the above, go to the following web page: NLTK :: nltk.lm package. First, work through the tutorial to develop an understanding of the library and its usage for bigrams. Then, replicate all steps for the attached text.
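A minimal sketch of the nltk.lm bigram workflow from the linked tutorial, run on a tiny placeholder corpus rather than the attached text (which is not reproduced here):

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Placeholder corpus: one tokenized sentence standing in for the attached text
corpus = [["the", "cat", "sat", "on", "the", "mat"]]

# Build padded bigram training data and the vocabulary
train_data, vocab = padded_everygram_pipeline(2, corpus)

# Fit a maximum-likelihood bigram model
lm = MLE(2)
lm.fit(train_data, vocab)

# Bigram count and conditional probability
print(lm.counts[["the"]]["cat"])   # how often "cat" follows "the"
print(lm.score("cat", ["the"]))    # P(cat | the)
```

With this one-sentence corpus, "the" is followed once by "cat" and once by "mat", so P(cat | the) = 0.5.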

Posted in R

Project 2: Decision making based on historical data
Attached Files:
I_1.jpeg (49.983 KB)
I_2.jpeg (48.819 KB)
dataG2.csv (149.28 KB)
This project reflects the basics of data distribution. The project topics relate to the definitions of variance and skewness.
Files needed for the project are attached.
Cover the following in the project:
1. Explain variance and skewness. Show a simple example of how to calculate variance and then explain its meaning.
2. Show a simple example of how to calculate skewness and then explain its meaning.
3. After loading dataG2.csv into R or Octave, explain the meaning of each column, i.e., what the attributes describe. The columns are skewness, median, mean, standard deviation, and the last price (each row describes, with these numbers, the distribution of a stock's prices).
4. Draw your own conclusions based on what you learned under 1. and 2.
5. Explain the meaning of the variables 'I_1' and 'I_2' after you execute (with dataG2.csv loaded in R or Octave):
   a. imported_data <- read.csv("dataG2.csv")
      S = imported_data[,5] - imported_data[,3]
      I_1 = which.min(S)  # use figure I_1 (see attached)
      I_2 = which.max(S)  # use figure I_2 (see attached)
   b. Based on the results in a., which row (stock) would you buy and which would you sell, and why (if you believe history repeats)?
6. Explain how you would use the skewness (first-column attribute) to decide about buying or selling a stock.
7. If you want to decide, based on the historical data, which row (stock) to buy or sell, would you base your decision on the skewness attribute (1st column) or on the differences between the last prices and the means (5th attribute minus 3rd attribute)? Explain.

Posted in R

Data scientists conduct continual experiments. This process starts with a hypothesis. An experiment is designed to test the hypothesis. It is designed in such a way that it hopefully will deliver conclusive results. The data from a population is collected and analyzed, and then a conclusion is drawn. From your own experiences and reading:
What are the two major problems with collecting samples? Is it possible to fix the problems you mentioned? If not, explain why. If so, explain how you would do it. To participate in the discussion, respond to the discussion prompt by Thursday at 11:59PM EST. Then, read a selection of your colleagues' postings. Finally, respond to at least two classmates by Sunday at 11:59PM EST in one or more of the following ways: I will post two classmates' work later and you will respond to both of them.

Posted in R


Continuing with the theme of hypothesis testing, this week we turn our attention to conducting tests for one sample, two paired samples, and two independent samples. To further develop our understanding of these tests, this assignment will focus on the application of these statistical techniques. You will select a dataset, conduct the appropriate tests, and share your findings.
Assignment Requirements:
Dataset Selection: Choose a dataset that allows for one-sample, two paired-sample, and two independent-sample tests. Briefly explain why you have chosen this dataset.
Hypothesis Formulation: Formulate hypotheses appropriate for one sample, two paired samples, and two independent sample tests. Describe the hypotheses for each test clearly.
Execution of Tests: Perform the tests using Python or R, and document the steps you have taken. Be sure to include your code in your initial post.
Results Interpretation: Interpret the results of your tests. What do the results tell you about your dataset and the hypotheses you formulated?
Conclusions and Applications: Summarize your findings and discuss potential real-world applications of your conclusions.
Submission Format: Your submission should be a maximum of 500-600 words (excluding Python/R code). Submit your assignment in APA format as a Word document or a PDF file. Include your written analysis and any tables or visualizations that support your findings. If you used any software for your calculations (like R, Python, Excel), please include your code or formulas as well. Include an APA-formatted reference list for any external resources used.
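The three tests named above can each be run in a line or two. A hedged Python sketch using scipy.stats on simulated data; the sample sizes, means, and variable names are illustrative, not part of the assignment:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One-sample t-test: does the sample mean differ from a hypothesized value (here, 50)?
x = rng.normal(loc=52, scale=5, size=30)
t1, p1 = stats.ttest_1samp(x, popmean=50)

# Paired-samples t-test: before/after measurements on the same subjects
before = rng.normal(loc=100, scale=10, size=20)
after = before + rng.normal(loc=3, scale=4, size=20)
t2, p2 = stats.ttest_rel(after, before)

# Independent-samples t-test (Welch's variant, which does not assume equal variances)
g1 = rng.normal(loc=10, scale=2, size=25)
g2 = rng.normal(loc=12, scale=2, size=25)
t3, p3 = stats.ttest_ind(g1, g2, equal_var=False)
```

Each call returns a test statistic and a p-value, which you then interpret against your chosen significance level.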

Posted in R

Introduction: Provide a concise overview of the concepts of parametric tests, univariate tests for normality, and hypothesis testing.
Dataset Selection: Identify and describe a dataset suitable for applying these tests. Explain your reasons for choosing it.
Parametric Test Application: Conduct a parametric test on your selected dataset. Include all steps and any Python or R code you used.
Univariate Test for Normality Application: Perform a univariate test for normality on your dataset. Again, include all steps and any Python or R code used.
Results and Conclusion: Summarize your test results. Were your hypotheses confirmed or rejected? What conclusions can you draw about the population from your sample?
Submission Format: Your submission should be a maximum of 500-600 words (excluding Python/R code). Submit your assignment in APA format as a Word document or a PDF file. Include your written analysis and any tables or visualizations that support your findings. If you used any software for your calculations (like R, Python, Excel), please include your code or formulas as well. Include an APA-formatted reference list for any external resources used.
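A hedged Python sketch of the normality-then-parametric-test sequence using scipy.stats on simulated data (the sample and its parameters are illustrative, not a real dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5, scale=2, size=100)  # illustrative data

# Univariate test for normality: Shapiro-Wilk
w, p_norm = stats.shapiro(sample)
# A large p-value means we fail to reject normality at the chosen alpha

# If normality is plausible, a parametric test such as a one-sample t-test applies
t, p_t = stats.ttest_1samp(sample, popmean=5)
```

The normality check comes first because the validity of the parametric test's p-value rests on the distributional assumption.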

Posted in R

Submission: executive report (2 pages) with appendix (no page limitation), slides (10 mins)
Tentative Grading Rules:
+ Nice coding
+ Nice EDA analysis
+ Well-written executive report
+ Nice presentation
+ Tried ARIMA, regression, and smoothing methods
+ Tried advanced models (for example, combined methods)
+ Performed model selection
+ Good prediction
+ Good recommendation
+ Class discussion
– Insufficient EDA
– The prediction part is not consistent with your conclusion
– Report writing can be improved
– Presentation can be improved
– R coding can be improved
– Need to try advanced models
– Did not consider multi-seasonality
A public transportation company is expecting increasing demand for its services and is planning to acquire new buses and to extend its terminals. These investments require a reliable forecast of future demand, which should be based on historic demand stored in the company's data warehouse. For each 15-minute interval between 6:30 and 22:00, the number of passengers arriving at the terminal has been recorded and stored. As a forecasting consultant, you have been asked to forecast the number of passengers arriving at the terminal.
Available Data
Part of the historic information is available in the file bicup2006.xls. The file contains the worksheet "Historic Information" with known demand for a 3-week period, separated into 15-minute intervals. The second worksheet ("Future") contains dates and times for a future 3-day period, for which forecasts should be generated (as part of the 2006 competition).
Assignment Goal
Your goal is to create a model/method that produces accurate forecasts. To evaluate your accuracy, partition the given historic data into two periods: a training period (the first two weeks) and a validation period (the last week). Models should be fitted only to the training data and evaluated on the validation data. Although the competition's winning criterion was the lowest Mean Absolute Error (MAE) on the future 3-day data, this is not the goal for this assignment. Instead, if we consider a more realistic business context, our goal is to create a model that generates reasonably good forecasts at any time/day of the week. Consider not only predictive metrics such as MAE, MAPE, and RMSE, but also look at actual and forecasted values, overlaid on a time plot.
Assignment
For your final model, present the following summary:
1. Name of the method/combination of methods
2. A brief description of the method/combination
3. All estimated equations associated with constructing forecasts from this method
4. The MAPE and MAE for the training period and the validation period
5. Forecasts for the future period (March 22-24), in 15-min intervals
6. A single chart showing the fit of the final version of the model to the entire period (including training, validation, and future). Note that this model should be fitted using the combined training + validation data.
Tips and Suggested Steps
1. Use exploratory analysis to identify the components of this time series. Is there a trend? Is there seasonality? If so, how many "seasons" are there? Are there any other visible patterns? Are the patterns global (the same throughout the series) or local?
2. Consider the frequency of the data from a practical and technical point of view. What are some options?
3. Compare the weekdays and weekends. How do they differ? Consider how these differences can be captured by different methods.
4. Examine the series for missing values or unusual values. Suggest solutions.
5. Based on the patterns that you found in the data, which models or methods should be considered?
6. Consider how to handle actual counts of zero within the computation of MAPE.
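A short Python sketch of the two accuracy metrics requested above, with one possible way (of several) to handle the zero-count issue from tip 6: skipping intervals whose actual count is zero when computing MAPE. The function names and sample numbers are illustrative:

```python
def mae(actual, forecast):
    """Mean Absolute Error: average absolute deviation between actual and forecast."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, skipping zero actuals to avoid division by zero.

    (Alternatives include adding a small constant or using a symmetric MAPE.)
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100 * sum(abs(a - f) / a for a, f in pairs) / len(pairs)

# Illustrative validation-period counts and forecasts
actual = [10, 0, 20]
forecast = [9, 5, 22]
print(mae(actual, forecast))   # averages all three intervals
print(mape(actual, forecast))  # uses only the two nonzero-actual intervals
```

Whichever zero-handling rule you pick, state it explicitly in the report, since it changes what MAPE measures.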

Posted in R

A store sells two types of toys, A and B. The store owner pays $8 and $14 for each unit of toy A and toy B, respectively. One unit of toy A yields a profit of $2, while a unit of toy B yields a profit of $3. The store owner estimates that no more than 2000 toys will be sold every month, and he does not plan to invest more than $20,000 in inventory of these toys. How many units of each type of toy should be stocked in order to maximize his profit?
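This is a linear program: maximize 2A + 3B subject to A + B ≤ 2000 (sales limit) and 8A + 14B ≤ 20000 (inventory budget), with A, B ≥ 0. A hedged Python sketch using scipy.optimize.linprog (scipy is assumed available; the problem itself asks only for the answer, not any particular solver):

```python
from scipy.optimize import linprog

# Maximize 2A + 3B  <=>  minimize -2A - 3B
res = linprog(
    c=[-2, -3],                      # per-unit profits of A and B, negated
    A_ub=[[1, 1],                    # A + B <= 2000   (monthly sales limit)
          [8, 14]],                  # 8A + 14B <= 20000  (inventory budget)
    b_ub=[2000, 20000],
    bounds=[(0, None), (0, None)],   # nonnegative quantities
)
print(res.x, -res.fun)
# Optimum sits where both constraints bind:
# A = 4000/3 ≈ 1333.3, B = 2000/3 ≈ 666.7, profit = 14000/3 ≈ $4666.67
```

Solving the two binding constraints by hand gives the same vertex: from A + B = 2000 and 8A + 14B = 20000, substitution yields 6A = 8000, so A = 4000/3 and B = 2000/3.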
Ex. 2: transportation problem

Posted in R