What background knowledge is helpful for a data scientist? Know of any good (and free) resources? The following list is not complete, so please consider contributing via a pull request or sending me an email with suggestions or edits.
In the previous lesson, "What is Data Science?", we learnt that data science is multifaceted and requires strong scientific analysis to produce meaningful results. For this reason, a good data scientist needs a strong background in math, statistics, and computer science, including linear algebra and multivariable calculus. Why? Not understanding the fundamentals leads to incorrect analysis and results in poor decision making.
Most people place more emphasis on statistics (or computer science) than on linear algebra and calculus, since many machine learning algorithms are already implemented (so why learn the math?). While it is true that using existing implementations is recommended, it is important to have a fundamental understanding of how the algorithms work. This way you understand the assumptions behind an algorithm and can apply it both quickly and correctly.
The required knowledge is broken down into the following sections:
Calculus, Statistics, Linear Algebra
Basics, Data Structures, Algorithms
Relational Algebra, SQL
Regular expressions, Entropy, Distance measurements, OLAP, ETL, BI vs. BA, and CAP
WEKA, Python, R Tutorial, SQL Tutorial
Hadoop, Hive, Pig, and many more!