Jeff Vidler: Integrating Big Data with Survey Data to Help Fill Gaps in Podcast Measurement

This guest column by Jeff Vidler, President and Founder of Signal Hill Insights Inc. in Ontario, was first published on the Signal Hill blog. Signal Hill Insights is an audio research consultancy. Jeff Vidler is a regular speaker at RAIN Summits, and is the co-producer of the Canadian Podcast Listener Report.


Data has often been called ‘the new oil.’ And, like oil, it has more value when it’s been refined.

IDC projects that the world will create 163 zettabytes of data by 2025, 10 times the amount generated in 2017. How big is a zettabyte? Put in an audio context, a single zettabyte of storage could store 7.5 trillion MP3 songs.

Just a tiny slice of that data can be transformative. We see it in the impact that things like dynamic ad insertion have had on advertising on podcasts, music streaming and even AM/FM streaming.

Despite all the data available today, we still have some blind spots, particularly when it comes to podcasts. Who exactly is listening to your podcast? It may be the most important question that an advertiser has when they’re looking to make a buy. Yet no single industry-wide measure has been able to answer that question. And it’s holding back the ability of the podcasting industry to monetize the continued growth of podcast listening.

IP address and user agent data can help, but only at the household level. Most important, they are limited when it comes to age and gender.

Survey research can help with these types of personalized demographics, but only when it comes to the most popular podcasts. With more than 2 million podcasts out there, no survey-based study has enough sample size to go beyond the very top tier of podcasts.

It’s with these challenges in mind that we at Signal Hill Insights have become engaged in an interesting adventure with the folks at Triton Digital.

A year and a half ago, we started a conversation about how we could marry the enormous amount of census-level data that Triton collects on podcast listening from its publishing partners with the kind of demographic data that we’ve collected in surveys such as The Canadian Podcast Listener.

As survey researchers, we were intrigued by the opportunity to play in the big data sandbox that Triton could bring to the table. Digital data and survey research have long been relegated to their own silos. Bringing in a little data science, we were able to see a path to show how census level data and survey data can work together, with the strengths of one complementing the weaknesses of the other.

The fruits of those efforts are now in full view, with the US release in beta of Triton Digital’s Demos + Podcast Metrics. We started collecting data this past April.

Here’s how it works:

  • The key building block comes from the census level data in the Triton dataset. In the first 3 months of our study, we were able to analyze data from as many as 11 million listeners per week to the 15,000 podcasts from Triton’s publishing partners.
  • The first step in our process is to identify the other podcasts also listened to by listeners of each podcast.
  • Then, we look at which podcasts have a close connection with the podcast we’re measuring. By using an indexing approach that’s tuned to control for overall popularity, we find those podcasts that most comfortably live side-by-side with that podcast.
  • This gives us a unique ‘neighbourhood’ for that podcast. (Full credit to Pacific Content’s Dan Misener for coining the term ‘neighbourhood’ of podcasts.)
  • As you might expect, these listener neighbourhoods let us drill down into some very specific subgenres. Not just true crime, for example, but investigative true crime, historical true crime or true crime with humour, each of them delivering a unique type of audience.
  • The census level data from Triton brings in the neighbourhood; the survey data brings in the demographic profile.
  • It’s a big sample survey: 3,000 surveys among a nationally representative sample of monthly podcast listeners each quarter, rolling up to 12,000 surveys on an annual basis.
  • Rather than rely on the limited sample of listeners who identify listening to a single podcast, we amplify that by bringing in the survey mentions of like-minded listeners in that podcast’s neighbourhood.
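
For readers who like to see the mechanics, here is a minimal sketch of the neighbourhood-building steps above. It is an illustration only, not the production model: the podcast names, the toy data and the `build_neighbourhoods` function are hypothetical, and the index used here (observed co-listening divided by the co-listening you would expect from popularity alone) is one common way to control for overall popularity, assumed for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbourhoods(listens, top_n=3, min_index=1.0):
    """Return each podcast's closest neighbours, ranked by affinity index.

    listens: dict mapping a listener id to the set of podcasts they heard.
    The index is (listeners of both) * (all listeners) / (listeners of A * listeners of B),
    so a value above 1.0 means more overlap than popularity alone would predict.
    """
    total = len(listens)
    audience = defaultdict(int)  # podcast -> number of listeners
    shared = defaultdict(int)    # (podcast A, podcast B) -> listeners of both

    for shows in listens.values():
        for show in shows:
            audience[show] += 1
        for a, b in combinations(sorted(shows), 2):
            shared[(a, b)] += 1

    neighbours = defaultdict(list)
    for (a, b), both in shared.items():
        index = both * total / (audience[a] * audience[b])
        if index > min_index:  # keep only above-expectation pairs
            neighbours[a].append((b, index))
            neighbours[b].append((a, index))

    return {
        show: sorted(pairs, key=lambda pair: -pair[1])[:top_n]
        for show, pairs in neighbours.items()
    }


# Toy data (hypothetical): six listeners and four made-up podcasts.
listens = {
    "u1": {"Crime A", "Crime B"},
    "u2": {"Crime A", "Crime B", "News"},
    "u3": {"News", "Comedy"},
    "u4": {"News", "Crime B"},
    "u5": {"News", "Comedy"},
    "u6": {"News"},
}

neighbourhoods = build_neighbourhoods(listens)
for show, pairs in sorted(neighbourhoods.items()):
    print(show, pairs)
```

In this toy data, "News" is the most popular show and shares two listeners with "Crime B", yet it does not land in the "Crime B" neighbourhood, because that overlap is smaller than its popularity alone would predict. That is the effect of tuning the index to control for overall popularity.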

By using Triton’s census level data to build these neighbourhoods, we effectively multiply the sample size available and generate reliable demographic profiles well down the long tail of podcast listening.
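
As a rough sketch of how that multiplication of sample size might look, the snippet below pools survey respondents who mention any podcast in a neighbourhood and reports the age mix of the pooled group. The respondent records, the field names (`age_group`, `podcasts`) and the unweighted pooling are all assumptions made for illustration; a real model would likely apply weighting that is not described here.

```python
from collections import Counter


def demographic_profile(podcast, neighbourhood, survey):
    """Pool respondents who mention the podcast or any of its neighbours.

    survey: list of dicts with an 'age_group' string and a 'podcasts' set.
    Returns (sample size, {age group: share of pooled respondents}).
    Each respondent counts once, however many neighbourhood shows they mention.
    """
    shows = {podcast, *neighbourhood}
    pooled = [r for r in survey if r["podcasts"] & shows]
    if not pooled:
        return 0, {}
    counts = Counter(r["age_group"] for r in pooled)
    return len(pooled), {group: n / len(pooled) for group, n in counts.items()}


# Hypothetical survey responses for made-up podcasts.
survey = [
    {"age_group": "18-34", "podcasts": {"Crime A"}},
    {"age_group": "18-34", "podcasts": {"Crime B"}},
    {"age_group": "35-54", "podcasts": {"Crime B", "News"}},
    {"age_group": "18-34", "podcasts": {"Crime B"}},
    {"age_group": "55+", "podcasts": {"News"}},
]

# Direct mentions only: a single respondent, too few to profile.
direct_n, direct_profile = demographic_profile("Crime A", [], survey)

# Adding the neighbourhood (here just "Crime B") gives four respondents.
pooled_n, pooled_profile = demographic_profile("Crime A", ["Crime B"], survey)

print(direct_n, direct_profile)
print(pooled_n, pooled_profile)
```

The point of the example is the jump in sample size: one direct mention becomes four pooled mentions once like-minded listeners in the neighbourhood are included.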

Even though it’s still early days, the results have been encouraging.

We’ve been validating our early findings by taking the podcasts with the most survey mentions and comparing their profile to their neighbourhood profile. More than 80% show an airtight match on key demographics, while the others are helping us refine the model.
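
One simple way to picture this kind of check is to compare the two profiles group by group and look at the largest gap. The sketch below is hypothetical: the profiles are invented, and the `TOLERANCE` threshold is an assumption for illustration, not the definition of a match used in the study.

```python
def largest_gap(direct, neighbourhood):
    """Largest absolute difference in share across all demographic groups."""
    groups = set(direct) | set(neighbourhood)
    return max(abs(direct.get(g, 0.0) - neighbourhood.get(g, 0.0)) for g in groups)


# Invented profiles for one well-sampled podcast (shares sum to 1.0).
direct_profile = {"18-34": 0.5, "35-54": 0.25, "55+": 0.25}
neighbourhood_profile = {"18-34": 0.5, "35-54": 0.375, "55+": 0.125}

TOLERANCE = 0.15  # hypothetical threshold for calling two profiles a match

gap = largest_gap(direct_profile, neighbourhood_profile)
is_match = gap <= TOLERANCE
print(gap, is_match)
```

Run across the most-mentioned podcasts, a check like this yields the share of shows whose neighbourhood profile agrees with their own survey profile.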

These observations are based on our first quarter survey sample of just over 3,000 listeners. As we build towards the full sample of 12,000 that we’ll have at the end of year one, we’ll be in an even better place to show how big data and survey data can work together to help move the podcasting industry forward.

Jeff Vidler