The World’s Top 7 Data Scientists before there was Data Science

I am often a bit late to the party and only recently saw Tim O’Reilly’s “The World’s 7 Most Powerful Data Scientists”. As data science has become a big deal, several top data science lists have been floating around.

So for fun, I thought I would put together my own list of the top data scientists before there was data science. The people listed here helped unearth key principles on how to extract information from data. I didn’t want to include folks whose contribution, while obviously important, was mostly the development of some particular approach, method, or technology.

To a large degree, the people on this list helped lay the foundation for a lot of what currently goes on as data science. By studying what these guys* worked on, I think you can deepen the foundation upon which your data science skills rest. As a disclaimer, there are obviously way more than seven who made major contributions, but I wanted to riff on Tim’s piece, so seven it is.

So without further ado, on to the list:

1 Claude Shannon 
I can’t imagine anyone arguing with putting Claude Shannon on the list. Shannon is often referred to as the father of information theory, which from my vantage point is Data Science, considering that information theory underpins almost all ML algorithms. He came up with his groundbreaking work while at Bell Labs (as an aside, this is also where Vapnik and Guyon worked when they came out with their ’92 paper on using the kernel trick for SVMs, although interestingly, they didn’t use the term support vector machine).
For a quick overview of Claude Shannon take a look here
And for his 1948 paper A Mathematical Theory of Communication go here
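
To make the central quantity of that paper concrete, here is a minimal Python sketch of Shannon entropy, the average number of bits needed per symbol from a source. The function name `shannon_entropy` and the sample strings are my own illustrations, not anything from Shannon’s paper.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Entropy in bits per symbol of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # H = sum over symbols of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Illustrative inputs (hypothetical): a constant message carries no
# information, while a message that uses its symbols evenly carries the most.
print(shannon_entropy("aaaa"))
print(shannon_entropy("aabb"))
print(shannon_entropy("abcd"))
```

The more evenly the symbols are spread, the higher the entropy, which is the sense in which a predictable message tells you less.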

2 John Tukey
Tukey is a hero to all of the data explorers in the field, the folks who are looking for the relationships and stories that might be found in the data. He literally wrote the book on Exploratory Data Analysis. I guess you can see his work as the jumping-off point for the Big Data gang. Oh yeah, he also came up, along with James Cooley, with a little something called the Fast Fourier Transform (FFT).

3 Andrey Kolmogorov
A real Andrey the Giant, maybe not on the order of an Euler, but this guy had breadth for sure. He gets on the list for coming up with algorithmic complexity theory. What’s that? Roughly, it carries the spirit of Shannon’s information theory over to computer science: the complexity of an object, like a string, is the length of the shortest program that produces it. For a CS layman’s read (me), I recommend Gregory Chaitin’s book, Meta Math. For what it’s worth, I’d argue that a life well lived is one that maximizes its Kolmogorov complexity.
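
The Kolmogorov complexity of a string is the length of the shortest program that outputs it, and it can’t be computed in general. A common rough stand-in is the size of the string after compression, which only gives an upper-bound flavor of the idea. The sketch below uses Python’s `zlib` for that purpose; the helper name `compressed_size` and both sample inputs are assumptions of mine for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Bytes needed by zlib at maximum compression: a crude proxy, not the true complexity."""
    return len(zlib.compress(data, 9))

# Hypothetical inputs: 1000 bytes of a repeating pattern versus
# 1000 pseudo-random bytes (seeded so the run is repeatable).
repetitive = b"ab" * 500
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))

print(compressed_size(repetitive), compressed_size(noisy))
```

The repeating string has a short description ("print ab 500 times"), so it compresses to a small fraction of its length, while the noisy one barely compresses at all.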

4 Andrey Markov
Our second Andrey on the list, I had to give Markov the nod since we make heavy use of him here at Conductrics. Sequences of events (language, clicks, purchases, etc.) can be modeled as stochastic processes. Markov described a class of stochastic process that is super useful for simply but effectively modeling things like language or attribution. There are many companies and experts out there going on about attribution analysis, or braying about their simplistic A/B testing tools, but if they aren’t at least thinking Markov, they probably don’t really know how to solve these problems. The reality is, if you want to solve decision problems algorithmically, by optimizing over sequences of events, then you are likely going to invoke the Markov property (conditional independence) via Markov chains or Markov Decision Processes (MDPs). See our post on Data Science for more on this.
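
As a rough illustration of the Markov property at work, here is a minimal Python sketch of a first-order Markov chain fit by counting transitions in click paths. The paths, state names, and the functions `fit_transitions` and `path_probability` are all made up for this example; it is not a description of how Conductrics or any attribution product works.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate P(next | current) by counting observed transitions."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for state, nxts in counts.items()
    }

def path_probability(transitions, seq):
    """Probability of the rest of a path given its first state.

    Markov property: each step depends only on the current state.
    """
    prob = 1.0
    for current, nxt in zip(seq, seq[1:]):
        prob *= transitions.get(current, {}).get(nxt, 0.0)
    return prob

# Hypothetical click paths, purely for illustration.
paths = [
    ["ad", "site", "buy"],
    ["ad", "site", "leave"],
    ["email", "site", "buy"],
    ["ad", "leave"],
]
transitions = fit_transitions(paths)
print(transitions["ad"])
print(path_probability(transitions, ["ad", "site", "buy"]))
```

Because the chain only looks one step back, the whole model is a table of transition probabilities, which is what makes it simple to fit and still useful for sequences.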

5 Thomas Bayes
I think it is fair to say that Data Science tends to favor, or is at least open to, Bayesian methods. While modern Bayesian statistics is much richer than a mere application of Bayes’ theorem, we can attribute at least some of its development back to Bayes. To get the hang of Bayes’ theorem, I suggest playing around with the chain rule of probability to derive it yourself.
For having a major branch of statistics named after him and for being a fellow alum of the University of Edinburgh, Bayes is on the list. By the way, if you want to learn more about assumptions and interpretations of Bayesian methods check out our Data Science post for Michael Jordan’s lectures.
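
For anyone who wants to check their work on that derivation, here is one way it can go. The event names A and B are just placeholders. Write the joint probability with the chain rule in both orders, set the two expressions equal, and divide by P(B):

```latex
\begin{align}
P(A, B) &= P(A \mid B)\, P(B) \\
P(A, B) &= P(B \mid A)\, P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
\end{align}
```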

6 Solomon Kullback and Richard Leibler
Maybe not as big as some of the other folks on the list, so they have to share a place, but come on, the Kullback-Leibler Divergence (KL-D)?! That has got to be worth a place here. Mentioned in our post on Data Science resources, the KL-D is basically a measure of information gain (or loss). This turns out to be an important measure in almost every single machine learning algorithm you are bound to wind up using. Seriously, take a peek at the derivation of your favorite algorithms and you are likely to see the KL-D in there.
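
Here is a minimal Python sketch of the KL-D for two discrete distributions, measured in bits. The function name `kl_divergence` and the example distributions are mine, and the sketch assumes q is nonzero wherever p is nonzero.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for discrete distributions given as aligned lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two outcomes.
uniform = [0.5, 0.5]
certain = [1.0, 0.0]
skewed = [0.25, 0.75]

print(kl_divergence(uniform, uniform))
print(kl_divergence(certain, uniform))
print(kl_divergence(uniform, skewed), kl_divergence(skewed, uniform))
```

Note that the last two values differ: the KL-D is not symmetric, so it matters which distribution you treat as the baseline.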

7 Edward Tufte
I used to work at an advertising agency back in the ’90s, and while normally the ‘creatives’ would ignore us data folks (this was back before data was cool), one could often get a conversation going with some of the more forward-thinking ones by name-checking Tufte. I even went to one of Tufte’s workshops during that time, where he was promoting his second book, Envisioning Information. There was a guest magician who did a little magic show as part of the presentation. A minor irritation is the guru/follower vibe you can get from some people when they talk about him. Anyway, don’t let that put you off, since Tufte spends quality ink to inform you how to optimize the information contained in your ink.

As I mentioned at the beginning, this list is incomplete. I think a strong argument can be made for Alan Turing, Ada Lovelace, and Ronald Fisher. I debated putting Gauss in here, but for some reason, he seems just too big to be labeled a data scientist. Please suggest your favorite data scientist before there was data science in the comments below.

*yeah, it’s all men – please call out the women that I have missed.


  1. Posted July 31, 2013 at 6:12 pm | Permalink

    definitely tukey. but where’s breiman?

    • Matt Gershoff
      Posted July 31, 2013 at 6:32 pm | Permalink

      Thanks Chris. Yeah, I can see that bagging is an important general contribution. I actually modified XCS style classifiers to use a version of bagging, but rather than averaging, I used a majority vote, and used the distribution of votes as a measure of confidence. In hindsight, I guess I should have used KL-D with uniform as the base distribution for that. Also, thanks for providing part of the impetus for this post. I had wrongly argued that Shannon was really the only Data Scientist of note, but after you suggested Tukey, I thought a bit more about it and put this list together.

  2. Posted August 5, 2013 at 4:00 pm | Permalink

    I enjoyed your article. Nice history lesson in a capsule form.

    You asked for some suggestions on female pioneers in statistics to add to the list. Here’s a few that I thought of:

    Ada Lovelace, as you mentioned, certainly. And along the same lines, Alistair Croll suggested on Twitter, Adm. Grace Hopper would be a good addition. Each pioneered computer programming and machine data analysis. If you use a computer to analyze data, you’re benefiting from their work.

    I would, without a doubt, add Florence Nightingale. She’s famous for health reformation, but she did it through data analysis, and using visualization of that data to win over those with the authority to make the reforms. If you’ve ever used a graphic visualization of data, you’re benefiting from her work.

    Named for Florence Nightingale, F. N. David made some wonderful contributions. If you ever use statistical model simulation or correlation coefficient tables, you’re benefiting from her work. And if you are a woman in a statistical field, you’re benefiting from some of the battles she fought.

    Gertrude Cox is another important contributor to the field. If you follow the principles of good experimental design, you’re benefiting from her work.

    There are more, of course, but those are the first ones who come to mind. I’m also thinking historically. There are a lot of folks out there now, making data science history, and women like Hilary Mason, Melinda Thielbar and Meta Brown are leading the way.


    • Matt Gershoff
      Posted August 5, 2013 at 4:15 pm | Permalink

      @Page – Thanks for taking the time to leave this great list of names for everyone to research a bit. I had not even considered Florence Nightingale – interesting! Thanks again for your input! – Matt

      • Posted August 6, 2013 at 5:58 pm | Permalink

        Dana Angluin B.A., Ph.D. University of California at Berkeley, 1969, 1976
        Joined Yale Faculty 1979 – Great lady!

  3. Posted August 7, 2013 at 7:23 pm | Permalink

    Gertrude Cox and Florence Nightingale. How about Dr. R. A. Fisher’s mathematical genius? Big Data healthcare Fraud, a $800 Billion industry, needs all of them!

  4. Posted August 7, 2013 at 8:21 pm | Permalink

    What about Vladimir Vapnik?

    He has been studying the learning problem from data for 52 years and has shown the weakness in parametric statistical assumptions for most data problems.

    • Matt Gershoff
      Posted August 7, 2013 at 8:30 pm | Permalink

      Not unreasonable. I did give him a shout out for his work on max margin classifiers, but I guess for his work on statistical learning theory, that would be a valid argument for placement. I know this is lazy on my part, but here is the wiki for SLT

  5. Alfred
    Posted December 18, 2013 at 6:20 pm | Permalink
    • Matt Gershoff
      Posted December 18, 2013 at 6:24 pm | Permalink

      I like that! Someone from manufacturing / statistical quality control, that is a good one.
