Module 0: Introduction & Outline
About
What is Data Science?
Module 1: Required Background
Math: Stats, Calculus, Linear Algebra
Programming: Basics, Data Structures, Algorithms
Databases: Relational Algebra, SQL
Important Concepts: Regular Expressions, Information Entropy, Distance Measures, OLAP, ETL, BI vs. BA, and CAP
Tools Introduction: WEKA, Python, R
Tutorials:
Beginner's guide to using servers for data science
Installing Python Data Science Stack
R tutorial
SQL Tutorial
Big Data Tools Introduction: Hadoop, Hive, Pig, and many more!
Note: there is a Big Data section below; this is just an introduction.
Module 2: Data Science Framework
Ask > Acquire > Assimilate > Analyze > Answer > Act
What questions to ask
What is Data Mining? Analysis Overview
Feature Engineering
Module 3: Acquire Data
Downloading Data, Scraping Data, Logging Data, Streaming
Module 4: Assimilate Data
Processing Data: Extract/Transform/Load, Data Cleaning, Outlier Detection, Filtering, Imputation, Dimensionality Reduction, Normalization and Transformation
Aggregation: Exploratory data analysis
Module 5: Analyze Data
Analyze Framework: Describe, Discover, Predict, Advise
Module 6: Describe
Exploratory Data Analysis
Clustering: K-means clustering, X-means clustering, topic modeling
Module 7: Discover
Clustering
Association Rule Mining
Hypothesis Testing
Module 8: Predict
Model Evaluation: Evaluation metrics
Model Selection: Cross validation
Learning Curves: Bias vs Variance Trade-off
Parameter Tuning: Grid search
Ensembling: Combining Models
Boosting: Creating Data
Module 9: Regression
Tutorials: Simple linear, generalized, non-linear, and multiple regression (coming soon!)
Module 10: Classification
Naive Bayes: Bayes' theorem, Naive Bayes classifier
Decision Trees: Entropy-based decision trees, C4.5, boosted trees, ensembled trees, random forests
Rule-Based Learning: OneR, PRISM, trees and rules
Logistic Regression
Support Vector Machines
K-Nearest Neighbors
Hidden Markov Model
Bayesian Network
Artificial Neural Networks
Module 11: Deep Learning
Natural Language Processing: text mining, topic modeling, sentiment analysis
Bioinformatics