inf-428-data-analytics-online

Schedule

Introduction
  Goals: Install KNIME and Anaconda; run example workflows/code.
  Reading, Assignments and Discussion: Module Notes; Discussion 1; So you wanna be a data scientist; KNIME Beginner’s Luck Chapter 1; Python Review; Assignment 1

Statistics
  Goals: Summarize data using measures of typical value; open and explore .csv files; use measures of variation; apply significance tests (t-tests, p-values).
  Reading, Assignments and Discussion: Module Notes; Why Standard Deviation; 538 Scientists Explain P-values; Student’s t-test; Discussion 2; Assignment 2

Data Wrangling
  Goals: Use Pandas to open and manipulate tabular data; use Groupby to aggregate data; merge and join data.
  Reading, Assignments and Discussion: Module Notes; Discussion 3; KNIME Beginner’s Luck Chapter 5; 10 minutes to Pandas; Python Data Analysis Chapter 4; Assignment 3

Data Visualization
  Goals: Create several types of visualizations (line plots, bar charts, histograms, scatter plots and more); annotate plots; create web visualizations.
  Reading, Assignments and Discussion: KNIME Beginner’s Luck Chapter 3; Pyplot tutorial; Discussion 4; Assignment 4

Machine Learning
  Goals: Use features to describe data; classify data into categories; use regression to predict values; use algorithms like K-Nearest Neighbors, Decision Trees and Naive Bayes to classify data; understand the difference between sensitivity and specificity.
  Reading, Assignments and Discussion: KNIME Beginner’s Luck Chapter 4; Bayes’ Theorem; K-Nearest Neighbors; Discussion 5; Assignment 5

Detailed Module Descriptions

Module 1 - Introduction

Goals

Install KNIME and Anaconda, and run example workflows and programs.

Read

So you wanna be a data scientist
KNIME Beginner’s Luck Chapter 1
Python Review

Install Software

KNIME
Anaconda

Discuss

(5 points)
Tell us about yourself (initial post).
How have you used data science in your life?
What are some real-life examples of machine learning applications?

Assignment 1

Install the software and run the test programs and workflows (15 points).

Module 2 - Statistics

Goals

Summarize data using measures of typical value, open and explore .csv files, use measures of variation, and apply significance tests (t-tests, p-values).
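As a small illustration of these goals, the sketch below reads a CSV, then computes measures of typical value (mean, median, mode) and measures of variation (sample standard deviation, range) using only Python's standard library. The CSV text and the `minutes` column are hypothetical; with a real file you would pass something like `open("data.csv", newline="")` (a placeholder filename) to `csv.DictReader` instead of the in-memory string.

```python
import csv
import io
import statistics

# Hypothetical CSV contents standing in for a real .csv file.
CSV_TEXT = """name,minutes
Ana,30
Ben,45
Cai,45
Dee,60
Eli,120
"""

# Read the rows and pull out one numeric column.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
minutes = [float(row["minutes"]) for row in reader]

# Measures of typical value (central tendency).
mean_minutes = statistics.mean(minutes)
median_minutes = statistics.median(minutes)
mode_minutes = statistics.mode(minutes)

# Measures of variation (sample standard deviation uses n - 1).
stdev_minutes = statistics.stdev(minutes)
range_minutes = max(minutes) - min(minutes)

print(f"mean={mean_minutes:.1f} median={median_minutes} mode={mode_minutes}")
print(f"stdev={stdev_minutes:.2f} range={range_minutes}")
```

Here the single large value (120) pulls the mean (60) above the median (45), which is one reason to report both.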

Read

Why Standard Deviation
538 Scientists Explain P-values
Student’s t-test

Discuss

Why do we use t-tests and p-values?

Explain what a ‘single sample t-test’ and an ‘independent samples t-test’ are, give an example application for each, and explain the differences.

What is a normal distribution? Why do we need to know and use different types of statistical distributions?
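To make the two t-tests in these discussion questions concrete, the sketch below computes both t statistics by hand with the standard library. The one-sample version compares a sample mean against a hypothesized mean `mu0`; the independent-samples version is Student's pooled-variance form, which assumes both groups have equal variances. The data values are hypothetical. Turning a t statistic into a p-value needs the t distribution; `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind` (which pools variances by default) report both numbers if SciPy is installed.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean == mu0 (df = n - 1)."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def independent_t(a, b):
    """Student's t statistic for two independent samples with pooled variance (df = n1 + n2 - 2)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical data.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
group_a = [20, 22, 19, 21, 23]
group_b = [25, 24, 26, 27, 23]

t_one = one_sample_t(sample, mu0=5.0)
t_ind = independent_t(group_a, group_b)
print(f"one-sample t = {t_one:.3f}, independent-samples t = {t_ind:.3f}")
```

The larger the absolute t value for a given number of degrees of freedom, the smaller the p-value.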

Assignment 2

Repeat the study on the effect of caffeine on muscle metabolism from here

Module 3 - Pandas and Data Wrangling with KNIME

Goals

Use Pandas to open and manipulate tabular data. Use Groupby to aggregate data. Merge and join data.
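A minimal sketch of the groupby and merge goals, assuming two small hypothetical tables (individual runs and runner names) that share a `runner_id` key. In practice the tables would usually come from `pd.read_csv`.

```python
import pandas as pd

# Hypothetical tables: one row per run, and one row per runner.
runs = pd.DataFrame({
    "runner_id": [1, 1, 2, 2, 3],
    "miles": [3.0, 5.0, 6.0, 4.0, 10.0],
})
runners = pd.DataFrame({
    "runner_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cai"],
})

# Aggregate: total and mean miles for each runner.
per_runner = runs.groupby("runner_id")["miles"].agg(["sum", "mean"]).reset_index()

# Join the aggregates to the runner names on the shared key column.
summary = per_runner.merge(runners, on="runner_id", how="left")
print(summary)
```

`how="left"` keeps every aggregated row even if a runner were missing from the name table; `how="inner"` would drop such rows instead.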

Read

KNIME Beginner’s Luck Chapter 5
10 minutes to Pandas

Discuss

When would you use the Python Pandas library instead of a spreadsheet?

What does the term “data wrangling” mean (you can google it)? Describe times you’ve had to do “data wrangling”.

Describe situations where you’ve had to “clean” data before you used it. What did you have to do to the data before doing numerical analysis?
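For reference while discussing cleaning, here is one common sequence in Pandas: trim stray whitespace, normalize capitalization, convert text to numbers (turning unparseable entries such as "n/a" into missing values), and drop incomplete rows. The table is hypothetical and the steps are only one reasonable choice; other datasets may need different rules.

```python
import pandas as pd

# Hypothetical messy data: inconsistent spacing/case, a missing city, a non-numeric temperature.
raw = pd.DataFrame({
    "city": [" Boston", "boston ", "Chicago", None, "Denver"],
    "temp_f": ["71", "n/a", "65.5", "80", " 59 "],
})

clean = raw.copy()
# Trim whitespace and normalize capitalization of city names.
clean["city"] = clean["city"].str.strip().str.title()
# Convert text to numbers; anything unparseable becomes NaN.
clean["temp_f"] = pd.to_numeric(clean["temp_f"].str.strip(), errors="coerce")
# Drop rows that are missing either value, then renumber the index.
clean = clean.dropna(subset=["city", "temp_f"]).reset_index(drop=True)
print(clean)
```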

Assignment

Running Data: make and plot groups from numerical columns.
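One way to "make groups from numerical columns" is to bin a numeric column with `pd.cut` and then count or plot the bins. The mileage values, bin edges and group labels below are hypothetical choices for illustration, not the assignment's dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical run distances in miles.
runs = pd.DataFrame({"miles": [1.5, 2.0, 3.1, 4.8, 6.2, 9.9, 13.1, 26.2]})

# Bin edges are right-inclusive by default: (0, 3], (3, 6], (6, 13.1], (13.1, 26.2].
bins = [0, 3, 6, 13.1, 26.2]
labels = ["short", "medium", "long", "very long"]
runs["distance_group"] = pd.cut(runs["miles"], bins=bins, labels=labels)

# Count runs per group, keeping the category order.
counts = runs["distance_group"].value_counts(sort=False)

ax = counts.plot(kind="bar")
ax.set_xlabel("Distance group")
ax.set_ylabel("Number of runs")
plt.tight_layout()
plt.savefig("distance_groups.png")
```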

Module 4 - Visualization

Goals

Create several types of visualizations, including line plots, bar charts, histograms, scatter plots, pie charts and bubble charts.
Annotate plots.
Create web visualizations, for example with MPLD3 and Bokeh.
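A minimal annotated line plot with matplotlib's pyplot interface, for the "annotate plots" goal. The years and values are hypothetical; the sketch marks the highest point with `Axes.annotate`, which draws a text label and an arrow pointing at the data coordinate given by `xy`.

```python
import matplotlib

matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt

# Hypothetical data.
years = [2015, 2016, 2017, 2018, 2019]
values = [3, 5, 4, 9, 7]

fig, ax = plt.subplots()
ax.plot(years, values, marker="o", label="Series A")

# Point an arrow at the maximum value; the label sits at xytext.
peak_index = values.index(max(values))
ax.annotate(
    "peak",
    xy=(years[peak_index], values[peak_index]),
    xytext=(years[peak_index] - 1.5, values[peak_index] - 1),
    arrowprops=dict(arrowstyle="->"),
)

ax.set_title("Annotated line plot")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.legend()
fig.savefig("annotated_line.png")
```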

Read

KNIME Beginner’s luck Chapter 3
Pyplot tutorial

Discuss

For the visualization assignment, you will plot data that interests you. Choose a topic (for example sports, economics or politics) and start a discussion about the topic and where to find data for it.

What difficulties did you encounter when looking for data? Discuss with other students and help each other find data.

Assignment

Find data that is interesting to you.
Then create 10-20 different visualizations (1 point each) and compile them into a web presentation.
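MPLD3 (for example `mpld3.fig_to_html`) and Bokeh produce interactive web output. If you only need a static page, one simple approach, sketched here as an assumption rather than the required format, is to render each matplotlib figure to PNG in memory and embed it in a single HTML file as a base64 data URI. The helper name `figure_to_img_tag`, the output filename and the chart data are hypothetical.

```python
import base64
import io
from html import escape

import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt


def figure_to_img_tag(fig, caption):
    """Hypothetical helper: render a figure to PNG and wrap it in an HTML <figure>."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # free the figure once it has been rendered
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return (
        f'<figure><img src="data:image/png;base64,{encoded}">'
        f"<figcaption>{escape(caption)}</figcaption></figure>"
    )


figures = []

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 7, 5])  # hypothetical values
figures.append(figure_to_img_tag(fig, "Bar chart"))

fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3, 4], bins=4)  # hypothetical values
figures.append(figure_to_img_tag(fig, "Histogram"))

page_html = "<html><body><h1>My visualizations</h1>" + "".join(figures) + "</body></html>"
with open("presentation.html", "w", encoding="utf-8") as f:
    f.write(page_html)
```

Because the images are embedded, the resulting single file can be opened or shared without any accompanying image files.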

Module 5 - Machine Learning

Goals

Use different types of features to describe data. Classify data into categories. Use regression to predict values. Use algorithms like K-Nearest Neighbors, Decision Trees and Naive Bayes to classify data. Understand the difference between sensitivity and specificity.
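Sensitivity and specificity both come from the confusion matrix. Sensitivity (true positive rate) is TP / (TP + FN): the share of actual positives the classifier catches. Specificity (true negative rate) is TN / (TN + FP): the share of actual negatives it correctly rejects. The labels below are hypothetical predictions from some classifier.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one 'positive' class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Hypothetical ground truth and predictions.
actual = ["sick"] * 4 + ["healthy"] * 6
predicted = ["sick", "sick", "sick", "healthy",
             "healthy", "healthy", "healthy", "healthy", "sick", "healthy"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="sick")
sensitivity = tp / (tp + fn)  # fraction of sick cases detected
specificity = tn / (tn + fp)  # fraction of healthy cases correctly cleared
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

A classifier that labels everything "sick" has perfect sensitivity but zero specificity, which is why the two are reported together.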

Read

KNIME Beginner’s Luck Chapter 4
Bayes’ Theorem
K-Nearest Neighbors

Discuss

How is machine learning used in real life?

Give examples of machine learning applications that use classification, give examples of machine learning applications that use regression.

If you were designing a classification system to detect whether an animal is an elephant or a mouse, what features would you use?

If you were designing a classification system to detect whether an animal is a dog or a cat, what features would you use? Is it more difficult to teach a computer the difference between an elephant and a mouse, compared to the difference between a dog and a cat?
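To connect features to a classifier, here is a minimal k-nearest-neighbors sketch: each animal is a feature vector, and a new animal gets the majority label among its k closest training examples by Euclidean distance. The two features (weight in kg, shoulder height in m) and all numbers are hypothetical, rough values chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical training data: ((weight_kg, shoulder_height_m), label).
training = [
    ((4000.0, 2.5), "elephant"),
    ((5500.0, 3.0), "elephant"),
    ((3000.0, 2.3), "elephant"),
    ((0.02, 0.05), "mouse"),
    ((0.03, 0.06), "mouse"),
    ((0.025, 0.04), "mouse"),
]


def knn_predict(training, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    k_labels = [label for _, label in by_distance[:k]]
    return Counter(k_labels).most_common(1)[0][0]


print(knn_predict(training, (4500.0, 2.7)))
print(knn_predict(training, (0.022, 0.05)))
```

Because raw Euclidean distance lets the feature with the largest numeric range dominate, features are usually rescaled (for example to the 0-1 range) before running k-NN on real data.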

Assignment

K-Nearest Neighbor Assignment
Baseball Prediction Assignment
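For the regression goal, a least-squares line is the simplest predictor of a numeric value: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch assumes, purely for illustration, a baseball-style task of predicting season wins from runs scored; the numbers are hypothetical and this is not the assignment's actual data or method. On Python 3.10+, `statistics.linear_regression` computes the same slope and intercept.

```python
import statistics

# Hypothetical data: runs scored (x) and wins (y) for five teams.
runs_scored = [600, 650, 700, 750, 800]
wins = [68, 77, 79, 86, 90]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(runs_scored, wins)
predicted_wins = intercept + slope * 720
print(f"wins ~ {intercept:.2f} + {slope:.3f} * runs; prediction at 720 runs: {predicted_wins:.1f}")
```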