Predicting Privilege with Python. Do Millennials Make the Grade?

Hi everyone! I’ve recently been developing a new talk that introduces data science with Python. It uses interview transcripts to measure sentiment and builds useful visualizations, all within Python!

First, some background on the project:

When I was at Duke University, I participated in Bass Connections, a program launched by a $50 million gift from Anne and Robert Bass. The program gave both graduate and undergraduate students the opportunity to participate in research projects across different disciplines and to partner with professors from various parts of the university. Our team, in the Education and Human Development focus, set out to study the future of Massive Open Online Courses (MOOCs) on platforms such as Coursera, Udemy, and Khan Academy.

We put up two separate courses in the spring of 2014: one on Introduction to Chemistry taught by Professor Dorian Canelas and another on Data Analysis and Statistical Inference taught by Professor Mine Cetinkaya-Rundel.

We interviewed 36 students over Skype.

My initial goal was to see if we could use sentiment analysis of student feedback to determine whether or not obtaining a certificate was the only way to measure success in Coursera courses. We are still in the process of finishing up this paper, so keep an eye out for it! Overall, we found that there are many different motivations and barriers that students have for taking an online course!

Extending this to Millennials specifically

Millennials: overconfident, lazy, and coddled. Or are they? SNL did a hilarious skit about Millennials that paints them as self-centered and entitled. But does this stereotype hold everywhere, or only in traditional settings? I looked at a less traditional environment, Massive Open Online Courses on Coursera, to see how well Millennials adapt to a way of learning that is fundamentally self-driven and self-motivated.

How do we do this in Python?

I’ve created a repository on GitHub with all of my code for this project. The main tools I looked at were:



I used pandas to turn my transcripts into data frames that I could easily manipulate. It’s a great tool for data scientists. Its data frames are similar to what you’d expect in Excel or R, and they let you index specific columns and rows to manipulate them.
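
Here is a minimal sketch of that indexing, not the exact code from my repository. The column names and sample rows are invented placeholders, since the real transcripts aren't reproduced here.

```python
import pandas as pd

# Hypothetical rows, one per interview response (placeholder text, not real data).
rows = [
    {"student": "S01", "age_group": "Millennial",
     "response": "I loved the flexibility of the course."},
    {"student": "S02", "age_group": "Gen X",
     "response": "I never had enough time to finish."},
    {"student": "S03", "age_group": "Millennial",
     "response": "The videos were clear and helpful."},
]

df = pd.DataFrame(rows)

# Select a column by name, much like a spreadsheet column or an R data frame.
responses = df["response"]

# Select rows with a boolean mask.
millennials = df[df["age_group"] == "Millennial"]

# Select a single cell by row label and column name.
first_response = df.loc[0, "response"]

print(millennials[["student", "response"]])
```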



Matplotlib is a great tool for visualizing datasets. Pandas uses Matplotlib in the background to create plots. The code in the repository shows several different ways to create plots.
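
To make the pandas and Matplotlib relationship concrete, here is a small sketch that draws the same bar chart two ways. The group names and values are placeholders for illustration only, not results from the study, and the off-screen "Agg" backend is just an assumption so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for illustration only (not results from the study).
scores = pd.DataFrame(
    {"group": ["A", "B", "C"], "mean_compound": [0.2, 0.5, 0.35]}
)

# Option 1: let pandas call Matplotlib for you.
ax1 = scores.plot.bar(x="group", y="mean_compound", legend=False)
ax1.set_ylabel("Mean compound score")

# Option 2: call Matplotlib directly for finer control over the figure.
fig, ax2 = plt.subplots()
ax2.bar(scores["group"], scores["mean_compound"])
ax2.set_xlabel("group")
ax2.set_ylabel("Mean compound score")

# Write the second figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Option 1 is handy for quick looks at a data frame. Option 2 is more typing, but it gives you direct access to the figure and axes.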

Vader Sentiment Analysis

Vader Sentiment Analysis Joke from Parks and Rec

VADER sentiment analysis was created specifically for social media, so it takes emoticons into consideration as well. The tool compares words against a lexicon of negative and positive terms and reports how much of the text is negative, neutral, and positive, along with an overall sentiment score. The tool also accounts for negation, such as double negatives, and recognizes that a statement like “I don’t like” is negative.
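
To show the general idea of lexicon lookup plus negation, here is a toy sketch. It is NOT VADER's actual algorithm or lexicon: the word scores, the negation list, and the simple sign flip are all made-up assumptions for illustration.

```python
import re

# Toy lexicon with hand-picked valence scores (hypothetical, not VADER's values).
LEXICON = {
    "love": 3.0, "awesome": 3.0, "like": 1.5, "good": 2.0,
    "bad": -2.5, "hate": -3.0, "boring": -2.0,
}
# Words that flip the sentiment of the word right after them (assumed list).
NEGATIONS = {"not", "don't", "never", "no"}


def toy_valence(sentence):
    """Sum word valences, flipping the sign of a word preceded by a negation."""
    # Normalize curly apostrophes so "don’t" matches "don't".
    text = sentence.lower().replace("’", "'")
    words = re.findall(r"[a-z']+", text)
    total = 0.0
    for i, word in enumerate(words):
        score = LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATIONS:
            score = -score
        total += score
    return total


print(toy_valence("I like it"))        # positive total
print(toy_valence("I don't like it"))  # negative total
```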

I created a GUI that will allow users to enter a sentence and get their sentiment score. For example, the sentence “Sentiment Analysis is awesome. I love it so much!” receives the following score:

{'neg': 0.0, 'neu': 0.411, 'pos': 0.589, 'compound': 0.8622}
The compound score combines the positive and negative signals in the text into a single number between -1 and 1. A score above zero means the statement is positive, and a score below zero means it is negative. The closer the score is to 1 or -1, the more positive or negative the statement is, respectively.
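
One way to picture how a score ends up between -1 and 1 is a squashing function applied to a summed valence. The sketch below assumes the normalization x / sqrt(x² + 15) from the VADER reference implementation, so treat the constant as an assumption. The `label` helper is my own illustrative name and just applies the above-zero / below-zero reading described here.

```python
import math


def normalize(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


def label(compound):
    """Read a compound score: above zero is positive, below zero is negative."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    compound = normalize(raw)
    print(raw, round(compound, 4), label(compound))
```

Larger sums in either direction push the result toward 1 or -1 without ever reaching them, which is why very enthusiastic sentences cluster near the extremes.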

Now if I add a 🙂 to the end, the tool knows that the sentence is even MORE positive than before: the compound score becomes 0.9165. If I add a 🙁 to the end instead, the compound score becomes 0.7516. The tool still knows the sentence is positive, just a little less so.
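
If you want to try this comparison yourself without the GUI, here is a hedged sketch. It assumes the `vaderSentiment` package is installed (`pip install vaderSentiment`) and that the emoji above stand for the text emoticons `:)` and `:(`. The helper names are hypothetical, and the exact numbers you get may differ by library version.

```python
def compare_endings(analyzer, sentence, endings):
    """Return the compound score for the sentence with each ending appended."""
    results = {}
    for ending in endings:
        text = f"{sentence} {ending}".strip()
        results[ending] = analyzer.polarity_scores(text)["compound"]
    return results


def make_vader_analyzer():
    """Build the real analyzer; imported lazily so the helper above has no dependency."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


# Example call (uncomment once vaderSentiment is installed):
# sentence = "Sentiment Analysis is awesome. I love it so much!"
# print(compare_endings(make_vader_analyzer(), sentence, ["", ":)", ":("]))
```

`compare_endings` only needs an object with a `polarity_scores` method, so you can swap in a different analyzer to compare tools.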


Stormtrooper word cloud

Word clouds show the most frequent words that occurred in the transcripts. In our data, “Time” and “Think” were the most frequent words to show up. I also put them in cool shapes! If you want to learn more about that, check out my previous post about other shapes you can make, including Minion Word Clouds!!
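
Under the hood, a word cloud starts with a simple frequency count. Here is a rough sketch of that counting step using only the standard library. The stopword list and sample text are invented for illustration, and a word cloud library would then size each word by its count.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"i", "is", "the", "a", "to", "and", "of", "it", "that"}


def top_words(text, n=5):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Placeholder text, not an actual transcript.
sample = "I think time is short. Time matters, I think. Time!"
print(top_words(sample, 2))
```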



All in all, we found that Millennials in the United States were the most positive about the online courses in comparison to other ages and locations. While we can’t tell “WHY” they were more positive, we can make some guesses.

  1. They’re used to the online format
  2. Debt from college makes online courses more appealing
  3. They can learn multiple topics

If we wanted to learn more about WHY students felt the way they did, we would have to do a qualitative analysis rather than a quantitative one. Tools like NVivo allow you to break down the transcripts and run analyses. However, you have to do most of this manually. Our team took 3 full days to sit down and code each and EVERY sentence in these transcripts to a topic or genre. For instance, a sentence might have fallen into the category Motivations, which we then broke down further into subcategories like Certificates, Knowledge, Hobby, and For Fun. Each sentence or paragraph was coded to one or more topics. While this kind of tool is very useful, the process was extremely time consuming.
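
Once the manual coding is done, tallying it is the easy part. This sketch is not how NVivo works internally. It just assumes the coded sentences were exported as pairs of text and (category, subcategory) codes, and the example sentences are invented.

```python
from collections import Counter

# Hypothetical export: each sentence carries one or more (category, subcategory) codes.
coded_sentences = [
    ("I wanted the certificate for my resume.",
     [("Motivations", "Certificates")]),
    ("I was curious about statistics but ran out of time.",
     [("Motivations", "Knowledge"), ("Barriers", "Lack of time")]),
    ("Work got busy and I fell behind.",
     [("Barriers", "Lack of time")]),
]


def tally_codes(sentences):
    """Count how many sentences were assigned each (category, subcategory) code."""
    counts = Counter()
    for _text, codes in sentences:
        counts.update(codes)
    return counts


counts = tally_codes(coded_sentences)
for (category, subcategory), n in counts.most_common():
    print(f"{category} / {subcategory}: {n}")
```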

We found that the most common barriers were lack of time, bad experiences, and the online format. The most common motivations for taking the courses were gaining knowledge, work, and the convenience of having the courses available online.

Barriers by Course

Motivations by Course

While we could determine that Millennials were more positive about online courses than any other age group, we cannot pin down exactly why. It will be interesting to see how MOOCs develop and whether their certificates garner more value. If they do, Millennials, who are often plagued with massive debt from college, might be more willing to switch to the online platform. It is important to keep in mind the different challenges that online students face, and to try to make online courses as similar as possible to brick-and-mortar schools. That way students are not losing anything by learning online. Instead, they are learning about the topics they want, even ones that were never available to them before!