Using Jupyter Notebooks, answer each of the following questions. This is an indi
Using Jupyter Notebooks, answer each of the following questions. This is an individual lab. You will do your work in Jupyter notebook and must show the code you used to get the answer if it is a code-based question. Please submit both the Jupyter Notebook report (as an HTML file).
Your report submission should be formatted using Markdown and code in Jupyter notebook as follows: The name of the assignment as an H1, your name as an H2, and the date as an H3 all in the first cell block. Add a horizontal line/rule after the cell. The questions should then be answered in individual cells indicated by Q1, Q2, Q3…etc. in H4 format and separated from other questions by horizontal lines/rulers.
You are designing a database for a startup company and you need to begin by providing an overview of your data and then creating a few tables and performing some initial queries. You will use Jupyter notebook to answer the questions below, create the database, tables, and records, and write your queries.
The incoming data will be loaded in everyday at 8:00 PM EST. What kind of data processing is this considered?
The data will be in tabular format with rows and columns. What kind of database should the company use to store the data? What are some common database systems that could be used to store the data? What programming language will be used to interact with the data and database?
Currently, the incoming data only consists of tabular, related data. However, in the future, the management team is considering collecting data in JSON format. What kind of database should be used to store this type of data? What are the benefits of storing data in this way? What are the drawbacks?
As you build your relational database, what are three important components to consider? Describe each of these three components and why they are important.
Using SQL code in Jupyter notebook, create a SQL database using SQLite in Jupyter notebook and create two tables. The first table should be called “orders” and have the following columns/fields: order ID, item name, price, and customer ID. The second table should be called “customers” and have the following columns/fields: customer ID, signup_year, country, and age.Designate which columns will be primary keys and whether there is a foreign key in one of the tables that links to a primary key in another table. Add SQL constraints to ensure that there are no null values in the signup_year column.
Provide a written description of the SQL code you wrote above in Q6 in a way that a non-technical person with no knowledge of SQL could read and understand. Explain why the columns you chose are primary and foreign keys and the importance of keys in relational databases to a colleague who has no knowledge of databases or SQL.
Insert the following data into the “orders” table100 Blender 45.66 1001
101 Blender 43.23 1098
102 Treadmill 765.76 1009
103 TV 954.33 1045
104 TV 476.60 1003
105 Lamp 26.75 1008
106 Laptop 1300.34 1098
107 TV 675.55 1011
108 Treadmill 886.87 1003
109 Treadmill 1987.54 1055
Insert the following data into the “customers” table1000 2004 USA 63
1001 2008 FR 44
1002 2010 UK 76
1003 2018 FR 28
1004 2001 FR 71
1005 2015 USA 25
1006 2017 USA 58
1007 2009 USA 51
1008 2020 UK 48
1009 2017 USA 42
Write a query to pull all records and columns from the orders table.
Write a query to pull all records and columns from the orders table but limit the output to the first four records.
Write a query to select the sign up year, country, and age columns and only records of customers located in the UK from the customers table.
What is the mean age of customers in the USA? Include the SQL code you used to get the answer.
Using the orders table, determine how many treadmills have been sold. Use an alias to rename the name of the result column to be “treadmills_sold.” Include the SQL code you used to get the answer.
How old is the youngest customer in the USA?
What is the mean age of customers in each country? Include the SQL code you used to get the answer.
On average, what is the most expensive item purchased? Write a query to show the mean sale price of each item in the orders table and sort the results so the most expensive item is first and the least expensive item is last.
Alter the code you used for #17 to filter the results and keep only records that exceed 1000.00. Include the SQL code you used to get the answer.
Write a query to return the item names and price from the orders table that start with the “T” or cost less than 100.00
What is the difference between WHERE and HAVING? Explain when to use one or the other.
We need to know more about the customers who purchased the items in the orders table. Write a join to join the orders table to the customers table. The results should show only customers who have made a purchase that is listed in the orders table and are listed in the customers table and the item_name they purchased. (Note that not all customer ids are listed in the customer table.)
Alter the code you wrote in #21 to group the results by customer id from the customers table and count the number of orders for each customer made. Alias the result column as “num_orders” and sort the results so the customer with the most orders is the first row in the result. Which customer id has the most orders?
What kind of join would we use if we wanted to keep all records in the orders table (left table) and join any matches between the orders table and the customers table (right table)?
What is the average age of customers in the customers table? Include the SQL code you used to get the answer.
Write a subquery (i.e. nested query) to filter the customers table to keep only records that are less than the average age for all customers in the table
Return the country name and mean age for any countries’ customers with a mean age greater than 60.
Please submit both the Jupyter Notebook report (as an HTML file).