Module 0: Introduction & Outline

About

What is Data Science?

Module 1: Required Background

Math: Stats, Calculus, Linear Algebra

Programming: Basics, Data Structures, Algorithms

Databases: Relational Algebra, SQL

Important Concepts: Regular expressions, Information Entropy, Distance measurements, OLAP, ETL, BI vs. BA, and CAP

Tools Introduction: WEKA, Python, R

Tutorials:

Beginner's guide to using servers for data science

Installing Python Data Science Stack

R tutorial

SQL Tutorial

Big Data Tools Introduction: Hadoop, Hive, Pig, and many more!

Note: this is only an introduction; the full Big Data section appears below

Module 2: Data Science Framework

Ask > Acquire > Assimilate > Analyze > Answer > Act

What questions to ask

What is Data Mining, and Analysis Overview

Feature Engineering

Module 3: Acquire Data

Downloading Data, Scraping Data, Logging Data, Streaming

Module 4: Assimilate Data

Processing Data: Extract/Transform/Load, Data Cleaning, Outlier Detection, Filtering, Imputation, Dimensionality Reduction, Normalization and Transformation

Aggregation: Exploratory data analysis

Module 5: Analyze Data

Analyze Framework: Describe, Discover, Predict, Advise

Module 6: Describe

Exploratory Data Analysis

Clustering: K-means clustering, X-means clustering, topic modeling

Module 7: Discover

Clustering

Association Rule Mining

Hypothesis Testing

Module 8: Predict

Model Evaluation: Evaluation metrics

Model Selection: Cross validation

Learning Curves: Bias vs Variance Trade-off

Ensembling: Combining Models

Boosting: Creating Data

Module 9: Regression

Tutorials: Simple linear, generalized, non-linear, and multiple regression (coming soon!)

Module 10: Classification

Naive Bayes: Bayes' theorem, Naive Bayes classifier

Decision Trees: Entropy-based decision tree, C4.5, boosted trees, ensembled trees, random forests

Rule-Based Learning: OneR, PRISM, Trees and Rules

Logistic Regression

Support Vector Machines

K-Nearest Neighbors

Hidden Markov Model

Bayesian Network

Artificial Neural Networks

Module 11: Deep Learning

Natural Language Processing: text mining, topic modeling, sentiment analysis

Computer Vision

Bioinformatics

Module 12: Recommendation

Content-Based Filtering, Collaborative Filtering

Module 13: Advise (Coming soon!)

Simulation

Optimization

Project Economics

Module 14: Big Data

Big Data Tools

MapReduce

Hadoop

Graph Mining

outline - Andrew Andrade