About this guide
This guide was originally written by Andrew Andrade and Professor Lukasz Golab as supplemental notes for MSCI 723, a graduate course on data science, data mining and big data. Since the students come from many different backgrounds (some non-technical), we have tried to provide many optional background readings (for both beginners and advanced practitioners) which we think will be useful for becoming a better data scientist.
The goal now is to provide a comprehensive guide to data science, starting with an approach for small to medium data and later outlining more advanced topics such as big data, deep learning and many more (buzzwords). It is more manageable to first learn the techniques on smaller datasets that have simple features and fit in memory, and then move on to the analysis of very large datasets, which might require a distributed system.
Why we created this guide
Ever since Harvard Business Review claimed that being a Data Scientist is the Sexiest Job of the 21st Century, there have been countless guides, blogs, courses, sites and tutorials on different aspects of data science, machine learning, statistics and many related topics. Unlike other guides, this site strives to act as a (somewhat comprehensive) learning plan or road map, with practical advice for getting started on a journey into the world of data science.
The notes provide a high-level overview of topics, and the tutorials provide hands-on examples (and code).
While we attempt to be comprehensive, this guide is still a work in progress, and we look forward to edits, improvements and suggestions. Please feel free to send us a pull request or to send edits or suggestions to Andrew.