Must-Read Books to Learn Data Science in 2024
These book recommendations are not manuals like a lot of the stuff you find on Medium. They won't tell you 'how I made this sexy ML model in Python' with no context. They require a small investment of time, but the payoff is that they help to embed the thought process of a data scientist (DS). That doesn't mean they lack practical examples or aren't relevant to industry. But set your expectations: the point of reading these books is to build the intuition and thought process of a DS.
If you aren't a DS yet or don't feel comfortable with maths/statistics, this note is for you. Do you ever get frustrated reading the same paragraph or section of a book/paper and still have no idea what's going on? Do you often find yourself spending ages trying to understand some algebraic equations, and a few hours later you have multiple tabs open and feel like crap?
If so, this next paragraph is for you.
STOP READING BOOKS LIKE THAT! Now, I'm being slightly facetious, but my point stands. I've wasted a lot of time reading the wrong way. I'm not going to tell you how to suck eggs. If you have a system that works for you, go for it. But if you find what I said relatable, stop trying to understand every little detail of what's happening. Focus on the examples in these books, and take a second to step back and understand what the author is trying to do. To help with this, we've highlighted the relevant chapters and why they are useful, and in the coming weeks we'll flesh out each book in more detail to help you build intuition.
It is ok to read a chapter and not understand everything. That's normal!
Learning From Data
This book is about the foundations of data science. It explains what data science is really about, how to approach modelling a problem, and the fundamentals of machine learning theory (generalisation error, the bias-variance trade-off).
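To make generalisation error a bit more concrete, here is a minimal sketch (not taken from the book) that fits polynomials of increasing degree to a small noisy sample and compares the error on the training points with the error on fresh data. The sine target, noise level, sample sizes and degrees are all assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical target: a sine wave plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)    # small training sample
x_test, y_test = make_data(1000)    # large sample standing in for unseen data

def mse(coeffs, x, y):
    # Mean squared error of the fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The training error can only go down as the degree grows, because each model contains the simpler ones. The test error is the number to watch: the gap between the two is the kind of thing the chapters below teach you to reason about.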
The key chapters to read are:
- Chapter 1: The Learning Problem
- Chapter 2: Training versus Testing
- Chapter 4: Overfitting
Where to get it:
Practical Statistics for Data Scientists
A hands-on approach to learning foundational statistics for data scientists
This book really walks you through the basics of stats, from descriptive statistics and experimentation to some of the basic algorithms you will be using.
The key chapters to read are:
- Chapter 1: Exploratory Data Analysis
- Chapter 2: Data and Sampling Distributions
- Chapter 3: Statistical Experiments and Significance Testing << This one is really good
- Chapter 4: Regression and Prediction
- Chapter 5: Classification
Where to get it:
Hands-On Machine Learning with Scikit-Learn
Practical examples for applying machine learning models
A GREAT resource for the practical implementation of ML models, with lots of good examples and code snippets for you to steal (… use). If you really covered the basics in the previous books, then you can skip/skim the basics here.
The key chapters to read are:
- Chapter 2: End-to-End Machine Learning Project
- Chapter 4: Training Models
- Chapter 6: Decision Trees
- Chapter 7: Ensemble Learning and Random Forests
- Chapter 8: Dimensionality Reduction
Where to get it:
The Changing World Order
The book is about how the world’s reserve currency, and with it global positioning, changes over time, which features affect it the most and how.
An unconventional recommendation here, but Ray Dalio walks you through his analysis of how world powers and reserve currency status change, using a very data-driven approach. He uses the scientific method in a way I wish most data scientists would approach their work problems: less certainty and more inquiry. Ask questions (aka formulate hypotheses), collect data, synthesise results, present to experts (in your case, stakeholders), collect feedback and iterate.
The key chapters to read are:
- Introduction
Where to get it:
Mathematics for Machine Learning
Learn the maths behind machine learning, especially linear algebra
Machine learning is all about vectors, matrices and probability. Understand how these fundamental topics work so that you can master any machine learning algorithm. Lots of good exercises, and it really does walk you through the maths!
Key Chapters:
- Chapter 1: Introduction and Motivation
- Chapter 2: Linear Algebra
- Chapter 3: Analytic Geometry
- Chapter 4: Matrix Decompositions
- Chapter 5: Vector Calculus
- Chapter 7: Continuous Optimization
Where to get it:
Probabilistic Machine Learning (Kevin P. Murphy, 2022)
The machine learning bible
I don’t think you should read this cover to cover. This is more of a reference book: use as needed. Honestly, EVERYTHING is here. If you have a solid maths & programming background and you just want to learn the theory of ML, then go for this book. It walks you through the maths, intuitions etc. I really can’t emphasise this enough: it contains basically everything about modern machine learning up to its publication. Not for the faint of heart. There is a part 2 containing advanced topics. Good luck with that one if you’re interested lol.
Key Chapters: Anything you choose to read
Where to get it:
Practical Guide to Applied Conformal Prediction in Python
Estimating Uncertainty in your models
Conformal prediction is a relatively new, model-agnostic framework that you can use to quantify the uncertainty in your models. The author is very vocal on Twitter about its benefits, and a lot of big companies, universities and researchers, myself included, have started to use it in practice. Learn and apply it in order to create valid prediction intervals for your stakeholders.
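To give a flavour of what "valid prediction intervals" means, here is a minimal sketch of split conformal prediction for regression, written with NumPy only. It is not code from the book. The synthetic linear data, the straight-line model and the 90% target coverage are assumptions chosen for illustration, and any fitted model could stand in for `predict`.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    # Hypothetical data: a noisy straight line.
    x = rng.uniform(0.0, 10.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=n)
    return x, y

x_fit, y_fit = make_data(200)    # used to fit the model
x_cal, y_cal = make_data(500)    # held out to calibrate the intervals
x_new, y_new = make_data(2000)   # stands in for future data

# Any model works here; a least-squares line keeps the sketch short.
slope, intercept = np.polyfit(x_fit, y_fit, deg=1)

def predict(x):
    return slope * x + intercept

alpha = 0.1  # target miscoverage, i.e. roughly 90% intervals

# Nonconformity score: absolute residual on the calibration set.
scores = np.abs(y_cal - predict(x_cal))
n = len(scores)

# Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
k = int(np.ceil((n + 1) * (1 - alpha)))
q_hat = float(np.sort(scores)[k - 1])

lower = predict(x_new) - q_hat
upper = predict(x_new) + q_hat
coverage = float(np.mean((y_new >= lower) & (y_new <= upper)))
print(f"interval half-width={q_hat:.2f}  empirical coverage={coverage:.3f}")
```

The model is never asked about its own uncertainty. The interval width comes entirely from how wrong the model was on held-out calibration data, which is why the approach is model agnostic.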
Key Chapters:
- Chapter 2: Overview of Conformal Prediction
- Chapter 3: Fundamentals of Conformal Prediction
- Chapter 4: Validity and Efficiency of Conformal Prediction
- Chapter 5: Types of Conformal Predictors
Where to get it:
Causal Inference in Python
Figuring out why something happens, asking what-if questions
Correlation isn’t causation, but what happens when it is? This book goes into the science behind causality and how you can use it. It also has one of the best foundational statistics explanations I've ever come across and really teaches you to be a whizz with regression. I really enjoyed reading it.
Key Chapters:
- Just read it at your leisure
Where to get it:
- O'Reilly
- A more mathy version here
- USA
- UK
- CA
Interpretable Machine Learning
Figure out how your model behaves and estimate what effect your features have on your target
Getting a prediction from a model is easy, but figuring out why that model made that prediction is a lot harder. This book goes through the modern approaches to estimating that! It is more of a text that gives you a bird's-eye perspective. Once you get the gist, go to the model packages to actually understand how to use these techniques, along with their trade-offs.
Key Chapters:
- Chapter 3: Interpretability
- Chapter 5: Interpretable Models
- Chapter 6: Model-Agnostic Methods
- Chapter 8: Global Model-Agnostic Methods
- Chapter 9: Local Model-Agnostic Methods
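As a taste of the model-agnostic methods, here is a minimal sketch of permutation feature importance: shuffle one feature at a time and see how much worse the model's error gets. It is not code from the book. The synthetic data, the linear model and the 20 repeats are assumptions for illustration only, and in practice you would use a package implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: feature 0 matters a lot, feature 1 a little, feature 2 not at all.
n = 1000
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test = X[:700], X[700:]
y_train, y_test = y[:700], y[700:]

def add_bias(features):
    # Append a column of ones so the model can learn an intercept.
    return np.column_stack([features, np.ones(len(features))])

# The "black box" here is ordinary least squares; any model with a predict step would do.
weights, *_ = np.linalg.lstsq(add_bias(X_train), y_train, rcond=None)

def predict(features):
    return add_bias(features) @ weights

def mse(features, target):
    return float(np.mean((predict(features) - target) ** 2))

baseline = mse(X_test, y_test)

importances = []
for j in range(X.shape[1]):
    increases = []
    for _ in range(20):  # repeat the shuffle to average out randomness
        X_perm = X_test.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        increases.append(mse(X_perm, y_test) - baseline)
    importances.append(float(np.mean(increases)))

for j, imp in enumerate(importances):
    print(f"feature {j}: increase in test MSE when shuffled = {imp:.3f}")
```

Nothing in the loop looks inside the model. It only calls `predict`, which is what makes the method model agnostic.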
Where to get it: