In this assignment, you will walk through the basics of feature engineering for two types of data: continuous (numeric) data and categorical data. Some of this material should already be familiar to you. The notebook covers approaches to both in parts 4 and 6 (ignore the other parts for now), with concrete code examples that use the Titanic data set. Your task is to follow along with the explanations and run the code examples yourself, in either a cloud or a local environment (e.g., Google Colab, Kaggle, or a local Anaconda installation).
Here’s the notebook and dataset:
feature-engineering_encoding_transformation.ipynb
Note that you’ll need to run the first THREE (3) code cells in the notebook before skipping to parts 4 and 6. That’s to import everything needed and load the dataset.
titanic_train.csv
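Once the dataset is loaded, a quick look at the column dtypes gives a first, rough split between numeric and non-numeric columns. The following is only a minimal sketch: the helper name `split_columns_by_dtype` is hypothetical (it is not part of the notebook), and the small `toy` frame uses made-up illustrative rows with Titanic-style column names instead of reading titanic_train.csv.

```python
import pandas as pd


def split_columns_by_dtype(df):
    """Return (numeric_cols, other_cols) based purely on pandas dtypes.

    Caveat: a numeric dtype does not guarantee a continuous feature.
    A column such as Pclass is stored as integers but is really a category.
    """
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    other_cols = df.select_dtypes(exclude="number").columns.tolist()
    return numeric_cols, other_cols


# Toy rows for illustration only (assumed values, not the real dataset).
toy = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, 38.0, 26.0],
    "Embarked": ["S", "C", "S"],
})

numeric_cols, other_cols = split_columns_by_dtype(toy)
print("numeric dtype:", numeric_cols)
print("other dtype:", other_cols)
```

With the real data, the same helper could be applied to the DataFrame that the notebook's first three cells create. The dtype split is only a starting point; deciding which columns are truly continuous or categorical still takes a look at what each column means.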
You will submit a short explanation in the text submission area for the following:
What is the difference between continuous (numeric) and categorical data? (min 2 sentences)
Explain at least 3 approaches to handling categorical data. (min 2 sentences per approach, so at least 6 sentences in total)
For continuous data, explain discretization. (min 3 sentences)
Explain the concepts of standardization and normalization (both are means of ‘scaling’ features). Give one example method/algorithm for each. See https://www.kdnuggets.com/2020/04/data-transformation-standardization-normalization.html for more detail and examples.
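To make the terms in the items above concrete before you open parts 4 and 6, here is a small sketch using plain pandas on a toy frame. It is not the notebook's code: the values, bin edges, and bin labels are assumptions chosen for illustration, and the notebook may use different methods or libraries for the same ideas. Your written answers should still be in your own words.

```python
import pandas as pd

# Toy data for illustration only (assumed values, not the real dataset).
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
})

# Categorical handling, approach A: one-hot encoding (one indicator column per category).
one_hot = pd.get_dummies(toy["Sex"], prefix="Sex")

# Categorical handling, approach B: integer (label) encoding.
# Categories are sorted alphabetically, so female -> 0 and male -> 1.
sex_codes = toy["Sex"].astype("category").cat.codes

# Discretization: turn a continuous column into ordered bins.
# The bin edges and labels here are arbitrary choices for the example.
age_bins = pd.cut(toy["Age"], bins=[0, 25, 30, 100], labels=["young", "mid", "older"])

# Standardization (z-score): subtract the mean, divide by the standard deviation.
age_std = (toy["Age"] - toy["Age"].mean()) / toy["Age"].std(ddof=0)

# Normalization (min-max): rescale values into the range [0, 1].
age_minmax = (toy["Age"] - toy["Age"].min()) / (toy["Age"].max() - toy["Age"].min())

print(one_hot)
print(sex_codes.tolist())
print(age_bins.tolist())
print(age_std.round(3).tolist())
print(age_minmax.tolist())
```

After standardization the column has mean 0 and standard deviation 1; after min-max normalization its smallest value is 0 and its largest is 1. Only two categorical approaches are sketched here, so the third approach for your answer needs to come from the notebook or the linked article.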
Submit a single Word document (doc/docx) containing:
Responses for the four (4) items above as numbered explanations
Minimum of four (4) screenshots showing that you tried the examples (make sure each one includes output from the code cells; paste the images into the document)