This video is free to watch - hope you enjoy it! If you like what you see, there's a ton more where that came from...

Buy Now

Entropy and Quantifying Information

Now that we know how to use binary to create switches and digitally represent information we need to ask the obvious question: 'is this worthwhile'? Are we improving things and if so, how much?

This is premium content - you can own it in just 30 seconds!

Buy Now

Back in the early 1900s electrical engineers were focused on a problem: how can we send more information, faster. This was, of course, of major interest to the military as it allowed for faster response and better coordination.

Speed and Communication in the 1900s

If you wanted to send information across long distances you did it using a telegraph and Morse code. A message was encoded using a series of beeps and, on the receiving end, the receiver would listen to those beeps and decode them. If you had two very good telegraphers you could trasmit 10 - 15 words a minute.

Not very fast, especially when 5 of those words not essential to the information in the message itself:


The Message

Sending information from one point to another involves a few terms we need to get to know:

  • A sender sends the information over a
  • Channel to a
  • Receiver using a
  • Transmitter
  • The information is contained in a message

In the message above, the receiver is informed that their barn burned down, which is something the receiver didn’t know prior to receiving the message. Here’s another message that might follow:


These two messages contain roughly the same number of characters but one message is a bit more impactful, or surprising than the other. Learning that your barned burned down has much more impact (and therefore more information) than a mundane weather report. But how much more?

All Possible Messages

If you lived in the early 1900s and stood next to your telegraph, staring at it, you would receive a number of messages:

  • Nothing. Believe it or not, something not happening is information too! There are nearly infinite possibilities of a particular thing happening in the future, some more probable than others (which is important to us - we’ll come back to that). When nothing happens, that’s not surprising at all and therefore contains very little information.
  • A message comes in with the weather report. “Yes yes” you think, “I can see outside and I can tell it’ll be a cloudy day”. That message does contain information, but not very much. Of all the possible messages that could come in, that was the most predictable.
  • Later that day you get another message, explaining that it wasn’t your barn that burned down, it was your neighbors! This has a TON of information! You didn’t expect this at all (though maybe you hoped for it because you hate your neighbor).

Messages don’t just happen, they are “pulled from the chaos” of all possible things that could happen. This is the main definition of a message that we’ll be working on from here on out:

  • It marks the intersection of probability to causality. Future to past. The one, finite thing that happened out of all possible things.
  • It defines how much you learned after receiving the message.

It turns out that this is the key to quantifying information.


Entropy is the “degree of randomness” of a system. If we’re talking about a binary system (1s and 0s) there is very little entropy: something can be a 1 or a 0. If we’re talking about a message in Morse code there’s a bit more entropy because we have 26 uppercase alpha-numeric characters, 10 digits and a space.

If the message is in the English language, there are 240,000 possible words. We can pare that down a bit to “common words” which brings it to more like 20 or 30,000. Either way - the degree of randomness tells us a lot.

The idea is a simple one: the greater the entropy, the greater the information a message contains. This only works, however, if we separate the meaning of a message from the information it contains. Yes, it’s sad if your barn burns down! It’s the chance of you reading those words, however, that define the information.

Fire, cows saved, pigs lost.

Those words, in that order, have a very low probability and thus a high amount of information. Ralph Hartley, the first engineer to try to calculate the speed of information, gave us the very first equation to express this idea:

H(M) = log |M|

The entropy, H, of a message M is equal to the log of all possible messages from that system |M|. In the case of our barn burning down, “all possible messages” would be the 240,000 words in the English language.

To calculate the entropy of this message we simply need to multiply the entropy of each word. If I tell you your barn burned down, that’s 4 words, or individual messages that make up the whole:

H("YOUR BARN BURNED DOWN") = log(240000) * 4

Entropy of Your Barn Burning

That gives us “21 Hartleys” of information, a unit of measure you’ve probably not heard before. Me neither. But that’s not really the weird thing here.

We have a bigger problem, which is that I could have said “THE SKY IS BLUE” and, using Hartley’s equation, that would contain the exact same amount of information as “YOUR BARN BURNED DOWN”, which is absolutely not the case.

The Shannon Entropy

This is where we get to meet Claude Shannon again, who expanded on Hartley’s idea, taking into account the probability of each message. To Shannon, the amount of information in a message is inversely related to its probability - something Shannon called the “surprise” of a message.

That, Shannon reasoned, is the true measure of information and he calculated it using a simple probability equation:

s(M) = log2(1/p(M))

The surprise, s of a given message M is equal to the log (base 2) of 1/p(M). Why log 2? Because Shannon was working in the world of digital communication, sending messages using binary meant that the 1 or 0 was the correct way of thinking about (and quantifying) information.

So, the surprise of the message “YOUR BARN BURNED DOWN” is directly tied to the probability of each of those words coming in. “YOUR” is pretty normal, “BARN” isn’t all that common but following “YOUR” it’s not unexpected. “BURNED” is extremely unexpected but, once again, the word “DOWN” is not, because it follows the word “BURNED”.

To Shannon, the words “YOUR” and “DOWN” would be redundant in this message and could likely be stripped out, conveying close to the same idea: “BARN BURNED”. Not exactly the same as there is a bit of ambiguity, but this is a very important breakthrough!

Shannon gave us a way to think about shortening a message without losing too much information. Shorter messages means more efficient transmission! Brilliant!

This led to Shannon’s Entropy equation, which is one of the most important equations ever produced:

Shannon's Entropy

The Seeds of Data Compression

If all of this has seemed academic up until now, this is where it all gets real. We use data compression every day - from GZIP on web pages to image compression in our texts. It all started right here with Claude Shannon’s realization that words could be stripped from a message without altering its information.

We’ll get to that in the next video.


The Code

Code for this video (and for the entire series) can be found in the production GitHub repo.

  • The Basics of Logic

    Let’s jump right in at the only place we can: the very begining, diving into the perfectly obvious and terribly argumentative 'rules of logic'.

  • Boolean Algebra

    You're George Boole, a self-taught mathematician and somewhat of a genius. You want to know what God's thinking so you decide to take Aristotle's ideas of logic and go 'above and beyond' to include mathematical proofs.

  • Binary Mathematics

    This is a famous interview question: 'write a routine that adds two positive integers and do it without using mathematic operators'. Turns out you can do this using binary!

  • Bitwise Operators

    Up until now we've been representing binary values as strings so we could see what's going on. It's time now to change that and get into some real binary operations.

  • Logical Negation

    We've covered how to add binary numbers together, but how do you subtract them? For that, you need a system for recognizing a number as negative and a few extra rules. Those rules are one's and two's complement.

  • Entropy and Quantifying Information

    Now that we know how to use binary to create switches and digitally represent information we need to ask the obvious question: 'is this worthwhile'? Are we improving things and if so, how much?

  • Encoding and Lossless Compression

    Claude Shannon showed us how to change the way we encode things in order to increase efficiency and speed up information trasmission. We see how in this video.

  • Correcting Errors in a Digital Transmission, Part 1

    There are *always* errors during the transmission of information, digital or otherwise. Whether it's written (typos, illegible writing), spoken (mumbling, environment noise) or digital (flipped bits), we have to account for and fix these problems.

  • Correcting Errors in a Digital Transmission, Part 2

    In the previous video we saw how we could correct errors using parity bits. In this video we'll orchestrate those bits using some math along with a divide and conquer algorithm to correct single-bit errors in transmissions of any size.

  • Encryption Basics

    In this video we play around with cryptography and learn how to encrypt things in a very simple, basic way. We then ramp up our efforts quickliy, creating our own one-time pad and Diffie-Hellman secure key transmitter.

  • Hashing and Asymmetric Encryption

    In this video we dive into hashing algorithms, how they're used and what they're good (and not so good) for. We'll also dig into RSA, one of the most important pieces of software ever created.

  • Functional Programming

    Functional programming builds on the concepts developed by Church when he created Lambda Calculus. We'll be using Elixir for this one, which is a wonderful language to use when discovering functional programming for the first time

  • Lambda Calculus

    Before their were computers or programming languages, Alonzo Church came up with a set of rules for working with functions, what he termed lambdas. These rules allow you to compute anything that can be computed.

  • Database Normalization

    How does a spreadsheet become a highly-tuned set of tables in a relational system? There are rules for this - the rules of normalization - which is an essential skill for any developer working with data

  • Big O Notation

    No video with this one - just a post with lots of code for quick review. Understanding Big O has many real world benefits, aside from passing a technical interview. In this post I'll provide a cheat sheet and some real world examples.

  • Arrays and Linked Lists

    The building block data structures from which so many others are built. Arrays are incredibly simple - but how much do you know about them? Can you build a linked list from scratch?

  • Stacks, Queues and Hash Tables

    You can build all kinds of things using the flexibility of a linked list. In this video we'll get to know a few of the more common data structures that you use every day.

  • Trees, Binary Trees and Graphs

    The bread and butter of technical interview questions. If you're going for a job at Google, Microsoft, Amazon or Facebook - you can be almost guaranteed to be asked a question that used a binary tree of some kind.

  • Basic Sorting Algorithms

    You will likely *never* need to implement a sorting algorithm - but understanding how they work could come in handy at some point. Interviews and workarounds for framework problems come to mind.

  • DFS, BFS and Binary Tree Search

    You now know all about trees and graphs - but how do you use them? With search and traversal algorithms of course! This is the next part you'll need to know when you're asked a traversal question in an interview. And you will be.

  • Dynamic Programming and Fibonnaci

    Dynamic programming gives us a way to elegantly create algorithms for various problems and can greatly improve the way you solve problems in your daily work. It can also help you ace an interview.

  • Calculating Prime Numbers

    The use of prime numbers is everywhere in computer science... in fact you're using them right now to connect to this website, read your email and send text messages.

  • Graph Traversal: Bellman Ford

    How can you traverse a graph ensuring you take the route with the lowest cost? The Bellman-Ford algorithm will answer this question.

  • Graph Traversal: Djikstra

    Bellman-Ford works well but it takes too long and your graph can't have cycles. Djikstra solved this problem with an elegant solution.

  • Design Patterns: Creational

    Tried and true design patterns for creating objects in an object-oriented language.

  • Design Patterns: Structural

    As your application grows in size you need to have a plan to handle the increase in complexity. The Gang of Four have some ideas that could work for you.

  • Design Patterns: Behavioral

    Mediators, Decorators and Facades - this is the deep end of object-oriented programming and something you'll come face to face with as your application grows.

  • Principles of Software Design

    You've heard the terms before: YAGNI, SOLID, Tell Don't ASK, DRY... what are they and what do they mean?

  • Testing Your Code: TDD and BDD

    Testing code has moved beyond the realm of QA and into the realm of design, asking you to think about what you do before you do it. Let's have a look at some strategies.

  • Shell Script Basics

    It's a Unix world. You should have a functional knowledge of how to get around a Unix machine using the command line, as well as how to complete basic tasks using shell scripts and Make files.

  • Hands On: Creating a Useful Shell Script

    I use the static site generator Jekyll to write my blog. I store the site at Github, who then translates and hosts it all for me for free. Jekyll is simple to use and I like it a lot. There's only one problem: it's a bit manual.

  • Deciphering a Complex Bash Script

    I use the static site generator Jekyll to write my blog. I store the site at Github, who then translates and hosts it all for me for free. Jekyll is simple to use and I like it a lot. There's only one problem: it's a bit manual.

  • Making Your Life Easier with Make

    Make is a build utility that works with a file called a Makefile and basic shell scripts. It can be used to orchestrate the output of any project that requires a build phase. It's part of Linux and it's easy to use.

  • Using Make to Improve Your Test Suite

    No video with this one - just a post with lots of code. Make has been around forever and is often overlooked in favor of tools that recreate precisely what it does, but in crappier ways. Let's see how you can use Make to help your testing process.

Watch Again



Next Up