Data Science is one of the hottest career fields of 2016 and beyond, and a budding Data Scientist has a plethora of learning resources to choose from: MOOCs, online video lectures and even our own immersive, practical course on learning Data Science. Below, we’ve listed our favorite books to complement any Data Science learning trajectory that you might embark upon.
1) Probability, Random Variables and Stochastic Processes by Athanasios Papoulis & S. Unnikrishna Pillai
Hands down one of the best textbooks on probability out there, Papoulis’s book takes a thorough, practical approach to the most important principles of Probability, one that will benefit both beginners and seasoned experts. It doesn’t get into purely theoretical fields such as Measure Theory, and each chapter has a great set of exercises that will give any practitioner a thorough grounding in Statistics. A handbook that you will find useful throughout your career.
2) Deep Learning by Ian Goodfellow, Yoshua Bengio & Aaron Courville
Deep learning is one of the most advanced techniques in machine learning, and understanding it is crucial for Data Scientists who are interested in adding one more tool to their arsenal. At the time of writing, this is the only textbook on deep learning we know of that provides a complete walkthrough, from the foundational groundwork required in probability and linear algebra to the mathematics required to understand neural networks. Deep learning is still a largely academic field, and this book is crucial for anyone interested in filling in the gaps.
3) Machine Learning with R by Brett Lantz
This is a great introductory text for machine learning, with special emphasis on practical application in the R language. It is the perfect companion for anyone looking to start their journey in building machine learning models using R. It walks you through the entire process of building a model, from data wrangling and feature selection to estimating the accuracy of your model, all illustrated with great examples that make the text approachable and lucid.
4) Python for Data Analysis – Data Wrangling with Pandas, NumPy and IPython by Wes McKinney
Unlike R, Python is a general-purpose programming language, and its amazing set of well-maintained libraries has helped popularize the language for use in data science. Written by the creator of pandas (perhaps the most fundamental library required for data science in Python), this is a very comprehensive book about cleaning, managing and manipulating data in Python using the NumPy and pandas libraries in IPython. We would suggest waiting for the release of the 2017 edition, as it should cover the revisions and developments that have happened to these crucial Python libraries. A true classic.
5) Seeking Wisdom – From Darwin to Munger by Peter Bevelin
Seeking Wisdom examines how our thoughts can be influenced, describes common human cognitive biases and provides tools and frameworks from a wide range of subjects to improve our thinking. Covering fields from Biology to Evolutionary Psychology and Probability, it is one of the most comprehensive books on learning how to think better. An essential book for Data Scientists who want to improve their analytical abilities, the most crucial skill required for data analysis.
Prefer a more practical approach to learning? This program might do just the trick.