Introduction to SQL and Data Science

You're about to embark on what I hope will be a fun and enjoyable trip into the world of SQL, data and databases. We'll be using PostgreSQL exclusively for this production, and in the first video we'll get you all set up with a server and some tools.

You're about to embark on what I hope will be a fun and enjoyable trip into the world of SQL, data and databases. We'll be using PostgreSQL exclusively for this production, and in the first video we'll get you all set up with a server and some tools.

You'll want to start downloading the code and data, however, and you can do that below.

The Code and Data

The final code is available here. It's 600Mb in size and contains all the data you'll need.

All the code you'll need is included on each page as well. I would encourage you to write as much of it by hand as possible, so your fingers get used to writing SQL.

If you want to have a look at the raw data for the mission, I have a copy right here. There are a lot of files in the archives and it's a pretty big download.

Finally, this video production is based on my book A Curious Moon but is designed to be standalone. If you're finding yourself struggling, the book might be worth your while. We do a number of different things in this production, however.

Enter your text here...

2 Lessons

Video lesson

You're about to dive into the glorious world of databases and the language of data - SQL. Lucky you! Our working data set will be the actual data from the Cassini mission, specifically that of Enceladus, an icy moon orbiting Saturn that is confounding scientists.

Video lesson

We'll be using PostgreSQL for our SQL and analysis tasks. You can use other systems and they work just fine - but for this one we'll be using PostgreSQL. We have a bunch of data to load up and poke through, so let's get rolling quickly OK?

7 Lessons

Video lesson

When working with data you often hear the terms "extraction, transformation and loading" or "ETL". This is something that data analysis people think about more than say Application Developers - which I think is a bummer. It's important to know how the data in any application is going to be used so you can make sure you collect the right stuff!

Video lesson

Our first task is to find the exact dates and times when Cassini flew by Enceladus and made its measurements. We need to create a time window so we can narrow down the results from the INMS - Cassini's on board "Ion Neutral Mass Spectrometer" - that's the thing that sniffed space for the chemicals we're looking for. Off we go!

Video lesson

Now that the data is in the database, let's poke around and see what we have, using simple select statements and getting to know the Postgres client tool psql.

Video lesson

Dates and timestamps are core to working with data as you will often find that if you don't know precisely WHEN something happened, it will become meaningless. Dates mark changes over time - those changes will often drive business decisions, so you better be correct!

Video lesson

The data in our mission plan looks straightforward but since it's a plan that is based on dates, we need to jump right into validating those dates. Thankfully for us, Postgres is outstanding at date and time functionality!

Video lesson

We don't like errors when running our imports and, unfortunately since we're human, we're going to have a lot of them. Instead of fixing things piecemeal, it's always better to just rerun everything.

Video lesson

We did a lot in this section! We dipped our toes into the lovely world of SQL and we also learned that we should NEVER trust a spreadsheet! Hopefully you're inspired to keep playing! We did a lot in this section, believe it or not, and we're going to ramp it up and do ever more in the next ones!

Video lesson

Lesson 1:  Introduction

We have our tools and our initial extraction ready to go, now let's get to work finding the flybys of Enceladus! In this section we'll focus on transforming this data into something we can query with some degree of confidence and then, hopefully, we'll find the exact time windows for the flybys.

Video lesson

Structuring a relational database is (typically) all about following the rules of normalization, called "normal forms". Sounds theoretical, but it's pretty straightforward.

Video lesson

This video is from The Imposter's Handbook videos that I did a few years back. It doesn't quite fit in the structure of this course BUT I think digging in to normalization a little more, if you're having trouble, can only help.

Video lesson

Before we can normalize the mission plan data, we need to understand what's in the table and how it's related.

Video lesson

Our inspection showed us how we can isolate the Enceladus data - now let's do it!

Video lesson

It's time to nail down the flybys! We have a plan table full of Enceladus mission plan goodness - now we just need to figure out how to sift the data correctly.

Video lesson

We learned a few wonderful things in this section - one of the biggest, to me, is that our job is more than running queries - it's also digging in to the data we have and ensuring that it's usable. Sometimes invalidating a set of data is just as important as validating it. Unfortunately, it means we'll often lose HOURS of work, which can suck.

Video lesson

The fun begins! We have the data we need isolated and we feel good about its integrity - now we're ready to let it tell us a story. But what story is that? This is where things get interesting.

Video lesson

We've done this all before so let's get rolling! The first thing we're going to do is to load up the INMS and chemical data into Postgres.

Video lesson

The INMS CSV is loaded, now we need to create our analysis table using the full power of PostgreSQL - specifically strong data types with appropriate constraints.

Video lesson

We've been moving along at a pretty fast clip and it's crucial that we don't go too fast, skimming over super important concepts like constraints!

Video lesson

Now that we understand constraints a little bit more, let's carry on and buff this table out completely. We'll add a timestamp with time zone so we know when the import happened and then we'll get into some weird stuff with generated columns!

Video lesson

Wouldn't be any fun if we did everything right the first time, would it? Imposing rules during transformation is always problematic - which is great! We get to learn about our assumptions and also dig deeper into the data.

Video lesson

Premium Summary

We learned some new SQL and expanded on some old friends. We learned how to update a set of rows using values from a different table. We also learned how to delete troublesome data using delete from. We touched briefly on analytical functions that we'll be using more in the next section. We got into trouble using check constraints but also learned they can help us sleep soundly at night. We had some fun using generated columns and saved ourselves some time later on by denormalizing (sort of) our table. And finally we learned about range queries using IN, a very popular way to find intersecting records.

Video lesson

The fun begins! We have the data we need isolated and we feel good about its integrity - now we're ready to let it tell us a story. But what story is that? This is where things get interesting.

Video lesson

Bias can creep in anywhere during the analysis process - even in the way you structure your query and the joins you use.

Video lesson

Excel is EVERYWHERE and for good reason - it's simple to use, you can visualize data and you can even write simple functions and macros. Excel is the powerhouse of the analytical world and we need to prepare our data to work with it.

Video lesson

It's important to take a step back from time to time and consider just what it is you're doing and why. In our case, we're working with one of the most important scientific data sets of the last century - at least with respect to planetary studies.

Video lesson

Premium Ship It!

We feel good about the data and we're ready to ship it off! But how? Well that, friends, is the best part! There are multiple ways to do this but by far the simplest way is with a simple shell command using PSQL and copy. You can drop the file anywhere you like - on your desktop so you can email it, or, what's easiest, is to use a secure file sharing system like Dropbox, Google Drive, OneDrive - whatever. Nice and fast.

Video lesson

Parting... is such sweet sorrow! I hope you were able to follow along with me in this section - because if you did your SQL and data skills just shot up - yay for you! Thanks for watching...

Pen
>