Module | Goals | Reading, Assignments and Discussion |
---|---|---|
Introduction | Install KNIME and Anaconda. Run example workflows and code. | Module Notes; Discussion 1; So you wanna be a data scientist; KNIME Beginner’s Luck Chapter 1; Python Review; Assignment 1 |
Statistics | Summarize data using measures of typical value. Open and explore .csv files. Use measures of variation. Apply significance tests (t-tests, p-values). | Module Notes; Why Standard Deviation; 538 Scientists Explain P-values; Student’s t-test; Discussion 2; Assignment 2 |
Data Wrangling | Use Pandas to open and manipulate tabular data. Use Groupby to aggregate data. Merge and join data. | Module Notes; KNIME Beginner’s Luck Chapter 5; 10 minutes to Pandas; Python Data Analysis Chapter 4; Discussion 3; Assignment 3 |
Data Visualization | Create several types of visualizations (line plots, bar charts, histograms, scatter plots and more). Annotate plots. Create web visualizations. | KNIME Beginner’s Luck Chapter 3; Pyplot tutorial; Discussion 4; Assignment 4 |
Machine Learning | Use features to describe data. Classify data into categories. Use regression to predict values. Use algorithms like K-Nearest Neighbors, Decision Trees and Naive Bayes to classify data. Understand the difference between sensitivity and specificity. | KNIME Beginner’s Luck Chapter 4; Bayes’ Theorem; K-Nearest Neighbors; Discussion 5; Assignment 5 |
Install KNIME and Anaconda, and run example workflows and programs.
So you wanna be a data scientist
KNIME Beginner’s Luck Chapter 1
Python Review
(5 points)
Tell us about yourself (initial post).
How have you used data science in your life?
What are some real-life examples of machine learning applications?
Software Installation and running tests and workflows (15 points)
Summarize data using measures of typical value, open and explore .csv files, use measures of variation, and apply significance tests (t-tests, p-values).
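As a rough illustration of these goals, the sketch below computes measures of typical value and variation, plus the t statistic for a single-sample and an independent-samples comparison, using only the Python standard library. The sample values and the function names (`summarize`, `one_sample_t`, `independent_t`) are invented for this example and are not part of the course materials. The independent-samples version uses Welch's form, which does not assume equal variances.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6]


def summarize(values):
    """Return measures of typical value and variation for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


def one_sample_t(values, expected_mean):
    """t statistic comparing one sample's mean against a fixed expected value."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - expected_mean) / standard_error


def independent_t(a, b):
    """Welch's t statistic comparing the means of two independent samples."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)


print(summarize(group_a))
print(one_sample_t(group_a, 5.0))
print(independent_t(group_a, group_b))
```

Turning a t statistic into a p-value needs the t distribution, which the standard library does not provide. SciPy's `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind` return both the statistic and the p-value.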
Why Standard Deviation
538 Scientists Explain P-values
Student’s t-test
Why do we use t-tests and p-values?
Explain what a ‘single sample t-test’ and an ‘independent samples t-test’ are, give an example application for each, and explain the differences.
What is a normal distribution? Why do we need to know and use different types of statistical distributions?
Repeat the study on the effect of caffeine on muscle metabolism from here
Use Pandas to open and manipulate tabular data. Use Groupby to aggregate data. Merge and join data.
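A minimal sketch of the aggregate-then-merge pattern named in these goals is shown below. The two small tables and their column names (`runner`, `miles`, `age`) are hypothetical. In practice the data would more likely be loaded from a .csv file with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
runs = pd.DataFrame({
    "runner": ["ann", "bob", "ann", "cat", "bob"],
    "miles": [3.0, 5.0, 4.0, 6.0, 2.0],
})
runners = pd.DataFrame({
    "runner": ["ann", "bob", "cat"],
    "age": [31, 45, 27],
})

# Groupby: total and mean miles per runner.
per_runner = runs.groupby("runner")["miles"].agg(["sum", "mean"]).reset_index()

# Merge: attach each runner's age using the shared "runner" column.
combined = per_runner.merge(runners, on="runner", how="left")
print(combined)
```

A left merge keeps every row of the aggregated table even if a runner has no matching row in the second table. Other `how` values (`inner`, `outer`, `right`) change which rows survive.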
KNIME Beginner’s Luck Chapter 5
10 minutes to Pandas
When would you use the Python Pandas library instead of a spreadsheet?
What does the term “data wrangling” mean? (You can google it.) Describe times you’ve had to do “data wrangling”.
Describe situations where you’ve had to “clean” data before you used it. What did you have to do to the data before doing numerical analysis?
Running Data. Make and plot groups from numerical columns
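One possible reading of "groups from numerical columns" is binning a numeric column into labelled ranges. The sketch below does that with `pd.cut`. The column name `distance_km`, the bin edges and the labels are all assumptions made for illustration, not details of the actual assignment data.

```python
import pandas as pd

# Hypothetical running log; the column name and values are assumptions.
log = pd.DataFrame({"distance_km": [2.5, 5.0, 7.5, 10.0, 12.0, 21.1]})

# Bin the numerical column into labelled groups.
# By default each bin includes its right edge: (0, 5], (5, 10], (10, 25].
log["group"] = pd.cut(
    log["distance_km"],
    bins=[0, 5, 10, 25],
    labels=["short", "medium", "long"],
)

# Count how many runs fall into each group.
counts = log["group"].value_counts(sort=False)
print(counts)

# counts.plot(kind="bar") would draw the groups as a bar chart (needs matplotlib).
```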
Create several types of visualizations, including line plots, bar charts, histograms, scatter plots, pie charts and bubble charts.
Annotate plots.
Create web visualizations, for example with mpld3 and Bokeh.
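As a small example of creating and annotating a plot, the sketch below draws a line plot with Matplotlib and labels its highest point. The data values and the output filename are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration.
years = [2015, 2016, 2017, 2018, 2019]
values = [10, 12, 9, 15, 14]

fig, ax = plt.subplots()
ax.plot(years, values, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_title("Line plot with an annotated peak")

# Annotate the highest point with an arrow and a text label.
peak_index = values.index(max(values))
note = ax.annotate(
    "peak",
    xy=(years[peak_index], values[peak_index]),
    xytext=(years[peak_index] - 2, values[peak_index] + 0.5),
    arrowprops={"arrowstyle": "->"},
)

fig.savefig("annotated_line.png")
```

Swapping `ax.plot` for `ax.bar`, `ax.hist` or `ax.scatter` gives the other basic chart types with the same labelling calls.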
KNIME Beginner’s Luck Chapter 3
Pyplot tutorial
For the visualization assignment you will plot data that interests you. Choose a topic (for example sports, economics or politics) and start a discussion about it and about where to find data for it.
What difficulties did you encounter when looking for data? Discuss with other students and help each other find data.
Find data that is interesting to you.
Then create 10-20 different visualizations (1 point each). Compile your visualizations into a web presentation.
Use different types of features to describe data. Classify data into categories. Use regression to predict values. Use algorithms like K-Nearest Neighbors, Decision Trees and Naive Bayes to classify data. Understand the difference between sensitivity and specificity.
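To make the sensitivity and specificity distinction concrete, here is a minimal sketch that computes both from a list of true labels and a list of predicted labels. The function name and the example labels are invented for illustration. Sensitivity is the fraction of actual positives that were detected, and specificity is the fraction of actual negatives that were correctly rejected.

```python
def sensitivity_specificity(actual, predicted, positive=1):
    """Return (sensitivity, specificity) for two equal-length label lists."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(actual, predicted):
        if truth == positive:
            if guess == positive:
                tp += 1  # true positive: positive case detected
            else:
                fn += 1  # false negative: positive case missed
        else:
            if guess == positive:
                fp += 1  # false positive: negative case flagged
            else:
                tn += 1  # true negative: negative case correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels: 4 actual positives followed by 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```

This sketch assumes both classes appear in `actual`. If either is missing, the corresponding ratio divides by zero.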
KNIME Beginner’s Luck Chapter 4
Bayes’ Theorem
K-Nearest Neighbors
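For orientation before the K-Nearest Neighbors reading, the sketch below shows the core idea in plain Python: find the k training points closest to a query and take a majority vote of their labels. The function name `knn_predict`, the features (weight in kg, height in m) and all the values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical features: (weight in kg, height in m).
points = [
    (5000, 3.0), (4500, 2.8), (6000, 3.2),
    (0.02, 0.03), (0.03, 0.04), (0.025, 0.035),
]
labels = ["elephant"] * 3 + ["mouse"] * 3

print(knn_predict(points, labels, (0.05, 0.05)))
print(knn_predict(points, labels, (5500, 3.1)))
```

Because weight spans a much larger numeric range than height, it dominates the distance here. With less well-separated classes, the features are usually rescaled to comparable ranges first.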
How is machine learning used in real life?
Give examples of machine learning applications that use classification and examples of applications that use regression.
If you were designing a classification system to detect whether an animal is an elephant or a mouse, what features would you use?
If you were designing a classification system to detect whether an animal is a dog or a cat, what features would you use? Is it more difficult to teach a computer the difference between an elephant and a mouse, compared to the difference between a dog and a cat?
K-Nearest Neighbor Assignment
Baseball Prediction Assignment