posted by Fnord666 on Tuesday December 26, @05:52AM
from the ain't-got-time-for-all-this-recorded-jibba-jabba dept.

Ethan Zuckerman asks: how big is YouTube? Using a statistical sampling method, their current estimate for the size of YouTube is 13.325 billion videos.

Interesting as Reddit and Twitter are, they are much less widely used than YouTube, which is used by virtually all [I]nternet users. Pew reports that 93% of teens use YouTube – the closest service in terms of usage is Tiktok with 63% and Snapchat with 60%. While YouTube has a good, well-documented API, there’s no good way to get a random, representative sample of YouTube. Instead, most research on YouTube either studies a collection of videos (all videos on the channels of a selected set of users) or videos discovered via recommendation (start with Never Going to Give You Up, objectively the center of the internet, and collect recommended videos.) You can do excellent research with either method, but you won’t get a sample of all YouTube videos and you won’t be able to calculate the size of YouTube.

I brought this problem to Jason Baumgartner, creator of PushShift, and prince of the dark arts of data collection. One of Jason’s skills is a deep knowledge of undocumented APIs, ways of collecting data outside of official means. Most platforms have one or more undocumented APIs, widely used by programmers for that platform to build internal tools. In the case of YouTube, that API is called “Inner Tube” and its existence is an open secret in programmer communities. Using InnerTube, Jason suggested we do something that’s both really smart and really stupid: guess at random URLs and see if there are videos there.
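Statistically, the guess-and-check approach described above is simple random sampling of the ID space: if a fraction p̂ of uniformly random guesses hit a real video, the population is estimated as p̂ times the number of possible IDs. What follows is a minimal sketch under stated assumptions. `exists` is a hypothetical stand-in for whatever lookup the researchers actually ran (this code does not call InnerTube or any YouTube endpoint), and a toy space of one million integer IDs replaces YouTube's real ID format.

```python
import math
import random
from typing import Callable, Tuple


def estimate_population(
    exists: Callable[[int], bool],
    id_space_size: int,
    trials: int,
    rng: random.Random,
) -> Tuple[float, float]:
    """Estimate how many IDs in [0, id_space_size) are occupied.

    Returns (estimate, standard_error) from uniform random guessing.
    """
    hits = sum(1 for _ in range(trials) if exists(rng.randrange(id_space_size)))
    p_hat = hits / trials                 # observed hit rate
    estimate = p_hat * id_space_size      # scale the hit rate up to the whole space
    # Binomial standard error of p_hat, scaled to a count of IDs.
    se = math.sqrt(p_hat * (1 - p_hat) / trials) * id_space_size
    return estimate, se


if __name__ == "__main__":
    rng = random.Random(42)
    space = 1_000_000
    # Hypothetical "platform": 50,000 occupied IDs hidden in a space of one million.
    occupied = set(rng.sample(range(space), 50_000))
    est, se = estimate_population(occupied.__contains__, space, 100_000, rng)
    print(f"estimate ~ {est:,.0f} +/- {1.96 * se:,.0f} (95% CI); true value 50,000")
```

When occupied IDs are only a tiny fraction of the space, p̂ is very small and almost every guess comes back empty, so the method is expensive in queries even though the estimator itself is sound.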

As his charts show, the number of videos grows exponentially. Thus one could also conclude that the storage costs also grow in proportion.


  • (Score: 4, Insightful) by Ox0000 (5111) on Tuesday December 26, @09:27AM (#1337772)

    It's all fun and games to claim that YT is big. It's definitely very lucrative for its owner to claim that and be perceived as 'incontournable [cambridge.org]'. But how much of actual value is there on that site? How many of those billions of videos are actually meaningful, and not yet another wanna-be influencer that starts and ends their attempt with the hollow phrase "please like and subscribe"/"smash that subscribe button".

    Similarly, the claim that 93% of PFYs use the site lacks substance and is somewhat vacuous. To do what? Be background noise while they do other stuff? (Albeit with occasional, yet ever more frequent, interruptions that try to scam them out of their money by getting them to buy things they don't need.)

    YT is there to shove advertising down your throat. That is their sole reason for existing: to manipulate you into doing something you would not do of your own volition.
    The way they have convinced 'content creators' to work for them for almost nothing, and to act as the lure that gets you to see/subject yourself to that advertising, is gobsmacking... It says a lot about ourselves, as well as about the morals of YT and its ilk.

    • (Score: 3, Insightful) by looorg (578) on Tuesday December 26, @01:49PM (#1337800)

      The site, or at least its storage, must be growing more or less exponentially. The quality of the video and audio has improved over the years; the quality of the content, perhaps, is a question of taste and interest after all. But I would guess it's rare that anything ever gets deleted; at best, things get flagged as non-viewable. In that regard, storage_space == size.

      That said, as noted, it's probably filled to the brim with duplicates and near-duplicates that could be trimmed. But there is no tech for that. Perhaps that is what Google-AI should do. Then there are the tumbleweed videos, those that have been around for digital eons but that nobody watched: all those people who thought they were going to be the next YouTube sensation and never amounted to much of anything. Videos with less than a thousand views etc.

      Should they be pruned? After all, to get paid by YT you need subs and views; if you don't reach the bar you get nothing and they keep everything. Lots of trivially small amounts eventually add up to something large. To get paid for your content at all, you need to join their Partner Program: at least 4,000 watched hours of video content and at least a thousand subscribers, and you also need to make viewers watch the ads. No ad eyeballs = no pay. So for YT, most videos are probably free in that respect, as they won't be earning anything. In that regard a lot of it is just an archive, most of it duplicates: if not in actual content, then as things people didn't want to watch or care about.

      So they are pulling data from YT to feed their stats site project, Tubestats. YT probably won't like that one bit, considering they are in a constant fight with alternative front ends such as Invidious. They don't like others having or accessing their data. After all, statistical data would be currency for advertisers; can't share that golden nugget with anyone.

      At least they are honest about that aspect. Also, guessing is now apparently a valid statistical method, which isn't actually wrong.

      Perhaps the main valid question is whether YouTube, as it stands now, is too large to fail. After all, there are not that many alternatives. Other repositories of knowledge we at least try to maintain and preserve, some of them anyway; who will preserve YT for the future? It's the digital preservation dilemma in general.

      • (Score: 0) by Anonymous Coward on Wednesday December 27, @12:27AM (#1337881)

        > Videos with less than a thousand views etc.

        I certainly hope YT doesn't take your advice and dump willy-nilly based on views. In the "less than a thousand views" are some real gems of early rock and roll footage, likewise early jazz performances. They often sit around for years without a lot of interest...then all of a sudden there is interest (maybe one of the band members died?) and presto, many more views.

        When there are dupes, one has fewer views and so doesn't have adverts inserted, which makes it a better link to send to a friend (not all my friends use adblockers).

        • (Score: 2) by looorg (578) on Wednesday December 27, @01:41AM (#1337898)

          I'm not saying it is a great idea to be implemented as is. But is just adding more storage ad infinitum really the answer?
          For you, the user, perhaps; but for them as a company, keeping all this dupe content around for some potential future might not make as much sense, considering how they prune or censor for other reasons.

          Perhaps not just views, but also whether there are many near-identical copies: keep the best-quality audio/video once, with views going to it over the crappier ones.
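          The "keep the best copy once" idea can be sketched as grouping videos by a content key and keeping the highest-quality copy in each group, with views as the tie-breaker. This is a hypothetical illustration: the `fingerprint` field is an assumed stand-in, since real near-duplicate detection would need perceptual audio/video hashing, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Video:
    video_id: str
    fingerprint: str  # hypothetical content key; near-duplicates are assumed to share it
    height: int       # vertical resolution in pixels, e.g. 1080
    views: int


def keep_best_copies(videos: List[Video]) -> List[Video]:
    """Keep one video per fingerprint: highest resolution first, then most views."""
    best: Dict[str, Video] = {}
    for v in videos:
        current = best.get(v.fingerprint)
        # Tuples compare left to right, so height wins and views break ties.
        if current is None or (v.height, v.views) > (current.height, current.views):
            best[v.fingerprint] = v
    return list(best.values())


videos = [
    Video("a", "song1", 480, 900),
    Video("b", "song1", 1080, 40),
    Video("c", "song2", 720, 5),
]
print(sorted(v.video_id for v in keep_best_copies(videos)))  # ['b', 'c']
```

          Ranking by resolution first follows the "best quality" suggestion; swapping the key to (views, height) would instead keep the most-watched copy.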

          You could even go the environmental route. Storage is cheap and so forth, but all that storage still consumes electricity, doing nothing but creating heat. The disks, or arrays of disks, holding content that generates nothing in revenue will eventually need to be replaced. There are also internal backups and redundant copies of all the data. So all these things might add up on some global-warming chart somewhere. When will the eco-people demonstrate outside the Googleplex for the environment?

          When will the "fluff" have to go from a corporate financial perspective?

    • (Score: 4, Insightful) by tbuskey (6127) on Tuesday December 26, @03:54PM (#1337833)

      > It's all fun and games to claim that YT is big. It's definitely very lucrative for its owner to claim that and be perceived as 'incontournable [cambridge.org]'. But how much of actual value is there on that site? How many of those billions of videos are actually meaningful, and not yet another wanna-be influencer that starts and ends their attempt with the hollow phrase "please like and subscribe"/"smash that subscribe button".

      ---

      > YT is there to shove advertising down your throat. That is their sole reason for existing: to manipulate you into doing something you would not do of your own volition.
      > The way they have convinced 'content creators' to work for them for almost nothing, and to act as the lure that gets you to see/subject yourself to that advertising, is gobsmacking... It says a lot about ourselves, as well as about the morals of YT and its ilk.

      It sounds like you're claiming there is no value to YT. I disagree.

      YT exists to make money for Google. Much of the content is created to make money for the creators.

      But not all.
      In the past, consumer goods would often come with a VHS tape or DVD, a video version of the manual, while the manual itself would be printed. In the dotcom era, companies started putting their manuals up on FTP servers, which wasn't very expensive for them.
      Video was very expensive to put up: you needed servers, software and lots of $$ for bandwidth. When youtube came around, video was cheap to put up. Eventually the companies stopped putting out physical media & just had YouTube links. Even the printed manual will go away: I bought a motorcycle & the manual was an online PDF with a link to a printing service if you wanted it.

      There are many educational/research videos that never would be on physical media or TV. Name a craft and you can find videos on YT that will teach you. I know one woodworker creating fantastic videos explaining how he creates historical furniture with historical techniques that you can't learn anywhere else. He makes $0 from it. There is lots of that out there.

      Maybe this wasn't the "purpose" of YT. It's saved $$ for companies explaining their product. It's enabled people to share their knowledge at low cost to them.


      • (Score: 3, Insightful) by Ox0000 (5111) on Wednesday December 27, @01:13AM (#1337891)

        To add nuance: I'm not claiming that there is zero value to YT, but I am claiming the value is significantly smaller than portrayed.

        As for "Maybe this wasn't the "purpose" of YT. It's saved $$ for companies explaining their product. It's enabled people to share their knowledge at low cost to them.", watch YT's parent come down on those it considers 'freeloaders' like the fist of an angry god in the not-too-distant future... It will do that to the video makers as well as to those who have to use the service to get those manuals, and you will be forced to have a YT account to access those convenient videos, which means more tracked cattle in GOOG's dragnet. The transaction between you and the company that makes gizmo G should not involve GOOG coercing you into paying a second, third, fourth, fifth, and sixth time (with your privacy, your security, your attention budget, your electricity, your bandwidth, etc.) to get access to something that is inherently (or should inherently be) part of the transaction in which you purchased gizmo G.

        "When youtube came around, video was cheap to put up."; now it's cheap to do it yourself as well, without ever needing YT. It's not rocket surgery/brain science anymore. So why use a platform run by the likes of GOOG?

        An offline manual (whether that is dead tree, DVD, or downloaded version of a video) is still more valuable than a YT link. I can retrieve it when I need it 10 years from now from where I stuffed it. Good luck doing that with a video-manual that the company has taken down because of "planned obsolescence" and "pay for a new one, peasant".

        That being said, that was not even the main point of my post. I was trying to explain why it's worthwhile for YT to be portrayed as being that big, being "unavoidable if you want to reach PFYs". It's not for the person doing videos on historical techniques, it's to be able to stuff more ads down your throat.

  • (Score: 2) by JoeMerchant (3937) on Tuesday December 26, @01:47PM (#1337798)

    >Thus one could also conclude that the storage costs also grow in proportion.

    If you've been tracking the cost of data storage over the past 20 years, you'll know it has only continued to plummet, probably faster than YouTube grows.
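    Whether total storage spend rises or falls depends on both trends at once: if the stored volume grows by a fraction g per year and the price per byte falls by a fraction d per year, yearly spend is multiplied by (1 + g)(1 - d), and it shrinks only when that product is below 1. The rates in this sketch are made-up illustrations, not measured figures for YouTube or for the storage market.

```python
def spend_factor(volume_growth: float, price_decline: float) -> float:
    """Yearly multiplier on storage spend.

    volume_growth: fractional growth in bytes stored per year (0.3 means +30%).
    price_decline: fractional drop in price per byte per year (0.2 means -20%).
    """
    return (1 + volume_growth) * (1 - price_decline)


def spend_after(years: int, volume_growth: float, price_decline: float) -> float:
    """Spend after `years`, relative to today's spend of 1.0."""
    return spend_factor(volume_growth, price_decline) ** years


# Illustrative, hypothetical rates only.
for growth, decline in [(0.30, 0.20), (0.30, 0.40)]:
    f = spend_factor(growth, decline)
    trend = "grows" if f > 1 else "shrinks"
    print(f"growth {growth:.0%}, price drop {decline:.0%}: x{f:.2f}/year, spend {trend}")
```

    With these made-up numbers, 30% yearly growth against a 20% yearly price drop still raises spend by about 4% a year (1.3 × 0.8 = 1.04), while a 40% price drop makes it shrink.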

    If storage cost were an issue, you'd notice a lot more limits on how long a free account can archive videos that garner less than one view per year.

    --
    🌻🌻 [google.com]
    • (Score: 2) by looorg (578) on Tuesday December 26, @01:53PM (#1337801)

      Considering that Google, or Alphabet, or whoever owns YouTube, prunes stuff for "security reasons", one is left wondering why there is so little pruning of data on YT. Perhaps there is pruning and we just don't notice, because it's of things nobody watched anyway. But things just don't seem to get deleted. Ever.

      Storage is fairly cheap, but consider the amount of storage they consume. By getting rid of all the old crap, let's say videos nobody watched in the last year (not a single view == delete), they would save a lot of storage space. I don't know, but I assume they re-encode content with more efficient video and audio codecs as they become available, saving space that way. Also, active content is probably on faster disks or in cache while the tumbleweed content is not.
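      The "not a single view in the last year == delete" rule is easy to state as a filter; the hard part would be the bookkeeping, not the logic. This minimal sketch assumes each video record carries hypothetical `uploaded` and `last_viewed` timestamps plus a stored size (YouTube's internal metadata is not public). It also requires the upload itself to be older than the idle window, so a brand-new video with no views yet is not swept up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class StoredVideo:
    video_id: str
    size_bytes: int
    uploaded: datetime
    last_viewed: Optional[datetime]  # None = never viewed (hypothetical field)


def prune_candidates(
    videos: List[StoredVideo], now: datetime, idle: timedelta = timedelta(days=365)
) -> Tuple[List[str], int]:
    """Return IDs with no views within `idle` (and older than `idle`), plus their total bytes."""
    cutoff = now - idle
    stale = [
        v for v in videos
        if v.uploaded < cutoff and (v.last_viewed is None or v.last_viewed < cutoff)
    ]
    return [v.video_id for v in stale], sum(v.size_bytes for v in stale)
```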

      • (Score: 2) by JoeMerchant (3937) on Tuesday December 26, @02:39PM (#1337808)

        Tape backups FTW!

        When they worked, VCR-type tape data storage was mind-bogglingly cheap. In my personal experience, even with triple-redundant backups we still had the occasional triple failure, but that was around 1-in-10,000 rare...
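        If the three copies failed independently with per-copy failure probability p, all three would fail together with probability p³. Working backwards from the quoted 1-in-10,000 triple-failure rate gives the per-tape rate that independence would imply, about 4.6%. Independence is an assumption here: tapes from the same batch, drive or storage room tend to fail together, which makes triple failures more common than p³ suggests.

```python
def all_copies_fail(p: float, copies: int = 3) -> float:
    """Probability that every copy fails, assuming independent failures."""
    return p ** copies


def implied_per_copy_rate(p_all: float, copies: int = 3) -> float:
    """Per-copy failure rate that would produce p_all under independence."""
    return p_all ** (1 / copies)


p = implied_per_copy_rate(1 / 10_000)
print(f"implied per-tape failure rate ~ {p:.1%}")
print(f"check: all three fail with probability {all_copies_fail(p):.6f}")
```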

        • (Score: 2) by looorg (578) on Tuesday December 26, @05:22PM (#1337841)

          I remember building a cable for that during the Amiga days. It worked like a charm and held a lot, as I recall. The drawback, as usual, is that tape is slow as hell. But it worked. As noted, with long-term storage you need to check that your backups actually work, or they are nothing but a false sense of security.

          • (Score: 2) by JoeMerchant (3937) on Wednesday December 27, @02:47PM (#1337954)

            I do sincerely hope that HDD and SSD cost per byte has finally gotten competitive with helical-scan tape... It was the most frustrating storage medium.

  • (Score: 0) by Anonymous Coward on Tuesday December 26, @11:13PM (#1337872)

    n/t

  • (Score: 2) by Mojibake Tengu (8598) on Thursday December 28, @09:30AM (#1338065)

    I am not quite sure YouTube uses real storage.

    Maybe it uses a π filesystem (or an irrational generator value other than π) and just computes requested files chunk by chunk on demand, with the video ID being just the coordinate of the first chunk.

    https://news.ycombinator.com/item?id=8018818 [ycombinator.com]
    https://github.com/philipl/pifs [github.com]

    You did notice that youtube-download operates in chunks, didn't you?

    The downside of this method is that all of your secrets (or pr0n) are stored somewhere in π too, and that was already proven.
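    The πfs idea stores nothing but an offset into the digits of π. This toy sketch of the lookup uses Gibbons' unbounded spigot algorithm to generate decimal digits of π and searches for a short digit string. It is a simplified illustration, not pifs's actual encoding, and it works on decimal digit strings only. It also shows why the idea is a joke. Assuming π's digits behave like those of a normal number (conjectured, not proven), a k-digit string typically first appears somewhere around position 10^k, so writing down the offset takes about as many digits as the data it replaces.

```python
from itertools import islice
from typing import Iterator


def pi_digits() -> Iterator[int]:
    """Yield decimal digits of pi (3, 1, 4, 1, 5, ...) via Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # digit n is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def find_offset(target: str, max_digits: int = 5_000) -> int:
    """0-based position of `target` in '31415926...' (the leading 3 is position 0), or -1."""
    window = ""
    for i, d in enumerate(islice(pi_digits(), max_digits)):
        window = (window + str(d))[-len(target):]  # keep only the last len(target) digits
        if window == target:
            return i - len(target) + 1
    return -1


print(find_offset("2653"))  # 6
```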

    --
    Respect Authorities. Know your social status. Woke responsibly.