Tuesday, August 11, 2020

Five books every data scientist should read that are not about data science

 


1) Incerto: This is a collection of writings by Nassim Taleb, the most famous of which is ‘The Black Swan’ and the best of which, IMO, is ‘Antifragile.’ Taleb is the greatest modern thinker on risk, uncertainty and the problems with quantitative modeling. He is also a Twitter troll known for calling out people he labels ‘Intellectual Yet Idiot’ (IYI). By background, he is an immigrant derivatives trader turned mathematical philosopher. You will either love him or hate him because he consistently challenges your assumptions in all of his writing. If he writes anything, you should put it on your reading list immediately.


2) Fortune’s Formula: The story of the birth of a formula (the Kelly Criterion) during MIT’s early days that is claimed to be behind an enormous amount of financial success. You will learn about the father of information theory (Claude Shannon) and the beginnings of the card counting shenanigans that later became famous in Ed Thorp’s ‘Beat the Dealer.’ Thorp is now considered the godfather of quantitative hedge funds. Most importantly, this book shows how a good model cannot be ignored forever but bad ones can burn you. The story is also one of the first times in history where computer science and mathematics teamed up to solve a real-world problem (it just happens to be gambling). This story foreshadows the data science industry 60 years before its creation.
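For readers who have not seen the formula itself, here is a minimal sketch of the Kelly Criterion for the simplest case: a single bet that pays net odds of b-to-1 and wins with probability p. The function and parameter names are my own for illustration and are not taken from the book.

```python
def kelly_fraction(win_prob: float, net_odds: float) -> float:
    """Fraction of bankroll to stake on a simple win/lose bet.

    Assumptions (illustrative, not from the book): the bet pays
    `net_odds`-to-1 on a win, loses the whole stake otherwise, and
    `win_prob` is known exactly.
    """
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be between 0 and 1")
    if net_odds <= 0.0:
        raise ValueError("net_odds must be positive")

    lose_prob = 1.0 - win_prob
    # Kelly formula: f* = (b*p - q) / b
    fraction = (net_odds * win_prob - lose_prob) / net_odds
    # A negative result means the bet has no edge, so stake nothing.
    return max(0.0, fraction)


if __name__ == "__main__":
    # Even-money bet with a 60% chance of winning.
    print(round(kelly_fraction(0.6, 1.0), 4))
```

With a 60% chance on an even-money bet, the sketch suggests staking about a fifth of the bankroll; with no edge it suggests staking nothing. In practice the win probability is only an estimate, which is exactly where a bad model can burn you.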



3) Chaos - Making a New Science: The detailed history of the youngest of sciences. It is both a history of chaos theory and an accessible review of the topic. This book will give the reader an understanding of the limitations of our ability to model the real world. Many of the deep learning models being developed and deployed today cannot be genuinely understood due to the nature of non-linear processes. This book will help you comprehend these limitations. Also, the comprehensive review of the life and work of Benoit Mandelbrot alone makes this a must-read for any data scientist. James Gleick is a fantastic author and has many other excellent books you can add to your reading list.



4) Dark Pools: The story of a programmer who changed stock market trading forever. Today, prediction models are deployed in the world of high-frequency trading, where decisions are made at nanosecond speeds. This book walks through the creation of this hidden but powerful ecosystem. The fantastic thing about this story is that it illuminates how a great many problems can be solved when you know some code. It also demonstrates that creating real value means doing something truly innovative rather than relying on existing assumptions. Sometimes you have to be a little crazy to solve a hard problem.



5) The Theory That Would Not Die: The history of Bayes’ formula and Bayesian statistics, as well as its rival, the frequentist school. It is both a history of statistics and a plain-language review of critical technical topics, which makes this book vital. You will learn about some of the greatest minds in history, like Pierre-Simon Laplace and R.A. Fisher, along with how their philosophies shaped the world’s approach to data for centuries.
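As a quick reminder of what Bayes’ formula actually does, here is a small sketch that updates a prior belief after a positive test result. The numbers and names are hypothetical and chosen only for illustration; they are not from the book.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' formula.

    All three inputs are illustrative assumptions:
      prior               = P(condition)
      sensitivity         = P(positive | condition)
      false_positive_rate = P(positive | no condition)
    """
    # Total probability of seeing a positive result at all.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("a positive result is impossible under these inputs")
    return sensitivity * prior / evidence


if __name__ == "__main__":
    # Hypothetical: 1% base rate, 90% sensitivity, 5% false positives.
    print(round(posterior(0.01, 0.9, 0.05), 3))
```

Even with a fairly accurate test, the posterior in this made-up example stays well below one half because the base rate is so low. Making the prior explicit in this way is the heart of the Bayesian side of the argument the book describes.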


These five books, while not an exhaustive list, will help build a philosophical foundation for a data scientist working on real-world problems. Do not make the same mistakes the quants did a decade ago. Seek to understand techniques and models philosophically, not just mechanically, and our profession will become invaluable.
