SoylentNews is people

How Big Is YouTube?

Accepted submission by canopic jug at 2023-12-24 09:36:10 from the ain't-got-time-for-all-this-recorded-jibba-jabba dept.
Techonomics

Ethan Zuckerman asks: how big is YouTube? [ethanzuckerman.com] Using a statistical sampling method, he and his collaborators currently estimate that YouTube holds 13.325 billion videos.

Interesting as Reddit and Twitter are, they are much less widely used than YouTube, which is used by virtually all [I]nternet users. Pew reports that 93% of teens use YouTube [pewresearch.org] – the closest services in terms of usage are TikTok with 63% and Snapchat with 60%. While YouTube has a good, well-documented API, there’s no good way to get a random, representative sample of YouTube. Instead, most research on YouTube either studies a collection of videos (all videos on the channels of a selected set of users) or videos discovered via recommendation (start with Never Gonna Give You Up [youtube.com], objectively the center of the internet, and collect recommended videos.) You can do excellent research with either method, but you won’t get a sample of all YouTube videos and you won’t be able to calculate the size of YouTube.

I brought this problem to Jason Baumgartner, creator of PushShift, and prince of the dark arts of data collection. One of Jason’s skills is a deep knowledge of undocumented APIs, ways of collecting data outside of official means. Most platforms have one or more undocumented APIs, widely used by programmers for that platform to build internal tools. In the case of YouTube, that API is called “InnerTube” [gizmodo.com] and its existence is an open secret in programmer communities. Using InnerTube, Jason suggested we do something that’s both really smart and really stupid: guess at random URLs and see if there are videos there.
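A minimal Python sketch of the "guess random IDs and see if there's a video there" idea follows. It is not Baumgartner's code and does not call InnerTube: `exists` is a hypothetical stand-in for whatever lookup reports whether an ID has a video behind it, and the 11-character, 2^64-slot ID format is an assumption about YouTube IDs rather than something stated in the post. If a fraction p of uniformly random IDs hit, the size estimate is p times the size of the ID space. At roughly 13 billion videos spread over 2^64 (about 1.8 × 10^19) slots, only about one guess in 1.4 billion would hit, so the demo uses a tiny toy ID space instead.

```python
import itertools
import math
import random
import string

# Assumption (not stated in the post): YouTube-style video IDs are 11 characters
# from the 64-symbol URL-safe base64 alphabet, and the final character takes only
# 16 of those values, so there are 64**10 * 16 == 2**64 possible IDs.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
LAST_CHARS = ALPHABET[::4]  # "AEIMQUYcgkosw048"
YOUTUBE_ID_SPACE = 64 ** 10 * len(LAST_CHARS)


def random_youtube_id(rng: random.Random) -> str:
    """Draw one ID uniformly at random from the assumed 2**64 ID space."""
    return "".join(rng.choice(ALPHABET) for _ in range(10)) + rng.choice(LAST_CHARS)


def estimate_size(exists, draw_id, space_size, trials, rng):
    """Estimate how many IDs in a space are in use by probing random IDs.

    exists(video_id) -> bool is a hypothetical lookup (for YouTube, some
    metadata request); draw_id(rng) must sample uniformly from space_size IDs.
    Returns (estimate, standard_error).
    """
    hits = sum(1 for _ in range(trials) if exists(draw_id(rng)))
    p = hits / trials  # observed fraction of random IDs that are in use
    estimate = p * space_size
    # Binomial standard error of p, scaled up to the whole ID space.
    std_err = space_size * math.sqrt(p * (1 - p) / trials)
    return estimate, std_err


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy platform: 3-character IDs (64**3 = 262,144 slots), 26,214 of them "taken".
    toy_ids = ["".join(t) for t in itertools.product(ALPHABET, repeat=3)]
    taken = set(rng.sample(toy_ids, 26_214))

    def draw_toy_id(r: random.Random) -> str:
        return "".join(r.choice(ALPHABET) for _ in range(3))

    est, se = estimate_size(taken.__contains__, draw_toy_id, len(toy_ids), 20_000, rng)
    print(f"true size: {len(taken):,}  estimate: {est:,.0f} +/- {se:,.0f}")
```

The standard error shrinks only with the square root of the number of probes, which is why brute-force probing at YouTube's real density would take an enormous number of requests to reach a usable estimate.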

As the charts in Zuckerman's post show, the number of videos on YouTube grows exponentially. One could therefore conclude that storage costs grow roughly in proportion.
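Written out under two simplifying assumptions (an exponential fit to the video count and a constant average storage footprint per video, neither of which is measured in the post), the proportionality argument is:

```latex
N(t) = N_0 e^{kt}, \qquad
S(t) = \bar{b}\, N(t) = \bar{b}\, N_0 e^{kt}, \qquad
t_{\text{double}} = \frac{\ln 2}{k}
```

Here N(t) is the number of videos, \bar{b} the assumed average bytes stored per video, and S(t) total storage, so storage doubles on the same schedule as the video count. If the average size per video also rose over time, storage would outpace the video count.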

