SoylentNews is people

posted by Fnord666 on Friday July 19 2019, @05:08PM
from the interesting dept.

Over the years I have viewed many a video on YouTube. I quickly noticed an "ID" string that appeared in each video URL. Here's an example: https://www.youtube.com/watch?v=ShvnDSgjfXw -- see that string "ShvnDSgjfXw"? What characters are permitted? How long is it?

Along the way, I came upon an amazingly useful utility: youtube-dl. I accidentally discovered that it will happily download a YouTube video given just the Video ID. (Don't let the name of the utility mislead you; it seems to work fine with Instagram, Twitter, SoundCloud... it's amazing!)

Now with my curiosity suitably piqued, I started a genuine search for what the parameters were that defined a valid YouTube Video ID. This question on "Web Applications Stack Exchange" was most helpful. Especially this response.

It appears that the Video ID (and the Channel ID) are modified base64 encodings of 64-bit (and 128-bit) integers. The modification is needed because standard base64 can produce two characters that are verboten in URLs: a generated "+" is replaced with "-" and a generated "/" is replaced with "_".
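
To make the encoding concrete, here is a minimal Python sketch of how a 64-bit integer could be turned into an 11-character ID. The function name `int_to_video_id` and the big-endian byte order are my assumptions; nothing official documents how YouTube actually packs the integer. It uses only the standard library's `base64` module, whose URL-safe alphabet swaps "+" for "-" and "/" for "_".

```python
import base64


def int_to_video_id(n: int) -> str:
    """Encode an unsigned 64-bit integer as 11 URL-safe base64 characters.

    Assumption: the integer is packed big-endian. The real byte order
    used by YouTube is not documented.
    """
    raw = n.to_bytes(8, "big")                      # 8 bytes = 64 bits
    std = base64.b64encode(raw).decode("ascii")     # may contain "+" and "/"
    url = base64.urlsafe_b64encode(raw).decode("ascii")
    # The URL-safe form is the standard form with two characters swapped.
    assert url == std.replace("+", "-").replace("/", "_")
    return url.rstrip("=")                          # drop the padding character


if __name__ == "__main__":
    for value in (0, 1, 2**64 - 1):
        print(value, int_to_video_id(value))
```

Eight bytes always encode to 11 characters plus one "=" of padding, which is why stripping the padding leaves exactly 11.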

There is no official documentation guaranteeing that the IDs will always be 11 or 22 characters long, but empirical evidence suggests that is the current de facto standard.

There is even mention of "the maximally-constrained regular expression (RegEx) for the videoId" being:

[0-9A-Za-z_-]{10}[048AEIMQUYcgkosw]
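
The odd-looking final character class falls out of the arithmetic: 11 characters carry 66 bits, but only 64 are used, so the two low bits of the last character are always zero, which leaves 16 possibilities. A short Python sketch (the names are my own, chosen for illustration) that derives that set and applies the RegEx:

```python
import re
import string

# URL-safe base64 alphabet in index order: A-Z, a-z, 0-9, then "-" and "_".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

# Characters whose index is divisible by 4 have their two lowest bits clear.
LAST_CHARS = "".join(sorted(c for i, c in enumerate(ALPHABET) if i % 4 == 0))

VIDEO_ID_RE = re.compile(r"[0-9A-Za-z_-]{10}[048AEIMQUYcgkosw]")


def is_plausible_video_id(candidate: str) -> bool:
    """True if the string has the shape of a video ID.

    This says nothing about whether such a video actually exists.
    """
    return VIDEO_ID_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(LAST_CHARS)
    print(is_plausible_video_id("ShvnDSgjfXw"))
```

Using `fullmatch` rather than `search` matters here, since the pattern carries no anchors of its own.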

Things get even more interesting if you are using Windows. Under NTFS, file names default to being case-preserving but case-insensitive. Say I create a file called "Foo.txt" and then get a directory listing. Sure enough, I see "Foo.txt" displayed. The fun comes if I do "DIR foo.txt" or "DIR FOO.TXT" or any other variation... they all find the same file, "Foo.txt". This is counter to Unix, where filenames are case-sensitive and each of those variations would be treated as a separate and distinct file. Video IDs are case-sensitive, so two videos saved under IDs that differ only in case would collide on such a volume. Though it is possible to make an NTFS volume case-sensitive, it is not for the faint of heart!
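
A rough way to see the problem without touching a real disk: the Python sketch below groups file names by their lower-cased form, which approximates case-insensitive matching for plain ASCII names. The function name and the second, all-lowercase ID are made up for illustration.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in letter case.

    Lower-casing is an approximation of how a case-insensitive volume
    compares names; it is adequate for the ASCII characters in video IDs.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    files = ["ShvnDSgjfXw.mp4", "shvndsgjfxw.mp4", "AAAAAAAAAAA.mp4"]
    for group in case_collisions(files):
        print("would collide:", group)
```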

One could, therefore, reverse-engineer the integer that produced the Video ID and use that in addition to (or, for the adventuresome, instead of) the Video ID.
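
Recovering the integer takes only a few lines of Python. This is a sketch under one stated assumption: that the eight decoded bytes are read big-endian. The helper names `video_id_to_int` and `case_safe_name` are mine, not part of any API.

```python
import base64


def video_id_to_int(video_id: str) -> int:
    """Decode an 11-character ID back to an integer (big-endian assumed)."""
    padded = video_id + "=" * (-len(video_id) % 4)   # restore stripped padding
    raw = base64.urlsafe_b64decode(padded)
    return int.from_bytes(raw, "big")


def case_safe_name(video_id: str) -> str:
    """Render the integer as 16 lower-case hex digits.

    Hex uses a single letter case, so the result is safe as a file name
    on a case-insensitive file system.
    """
    return format(video_id_to_int(video_id), "016x")


if __name__ == "__main__":
    vid = "ShvnDSgjfXw"
    print(vid, video_id_to_int(vid), case_safe_name(vid))
```

The hex form is longer than the ID (16 characters instead of 11), which is the price of giving up the mixed-case alphabet.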

The whole discussion was well-worth the read and highly recommended for anyone who would like more information on where it came from and how it came about.


  • (Score: 4, Informative) by ikanreed on Friday July 19 2019, @05:19PM (2 children)

    by ikanreed (3164) Subscriber Badge on Friday July 19 2019, @05:19PM (#869063) Journal

    Ironically, someone already did a much more thorough youtube video [youtube.com] on youtube video urls.

    • (Score: 1) by nitehawk214 on Friday July 19 2019, @07:09PM

      by nitehawk214 (1304) on Friday July 19 2019, @07:09PM (#869123)

      I knew this was going to be Tom Scott :)

      --
      "Don't you ever miss the days when you used to be nostalgic?" -Loiosh
    • (Score: 2) by legont on Sunday July 21 2019, @01:08AM

      by legont (4179) on Sunday July 21 2019, @01:08AM (#869490)

      This one has ads while the other one does not.

      --
      "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
  • (Score: 4, Insightful) by SomeGuy on Friday July 19 2019, @05:38PM (29 children)

    by SomeGuy (5632) on Friday July 19 2019, @05:38PM (#869072)

    Things get even more interesting if you are using Windows. Under NTFS, file names default to be case-preserving, but case-insensitive. Say I create a file called "Foo.txt" and then get a directory listing. Sure enough, I see: "Foo.txt" displayed. The fun comes if I do "DIR foo.txt" or "DIR FOO.TXT" or any other variation... they all find the same file: "Foo.txt"; this is counter to Unix where filenames are case-sensitive and each of those variations would be treated as separate and distinct files. Though it is possible to make an NTFS volume case-sensitive, it is not for the faint of heart!

    To the rest of the world, it is Unix/Linux's behavior that is unusual and needs explaining.

    Why should one be allowed to have a bajillion files with names like ambiguous, Ambiguous, AmBiguous, AMbiguous, AMBIGUOUS, ambiguouS, and so on. Look at any real world filing cabinet. Ok, you will have to get a shovel and dig it out from underneath the pile of dead smartphones first, but no one would ever place "a" after "Z". Case is just formatting that can hint at the use of a word, much like italics or bold print. Outside of some pedantic twats who only see hexadecimal when they look at words, nobody does it like that. Well, at least in the part of the world that can still read and write.

    • (Score: 5, Insightful) by Anonymous Coward on Friday July 19 2019, @05:57PM (13 children)

      by Anonymous Coward on Friday July 19 2019, @05:57PM (#869076)

      Fuck you. I want my filesystem to know the difference between differently cased characters.

      • (Score: 4, Informative) by DannyB on Friday July 19 2019, @06:26PM

        by DannyB (5839) Subscriber Badge on Friday July 19 2019, @06:26PM (#869097) Journal

        The filename behavior described first appeared in Classic Mac OS (1984). It was sane and made sense to normal people -- not autism touched people such as myself.

        Even as a geek I don't want a Jillion (which is less than a Bazillion) filenames that only differ in the case of their characters.

        But I understand the frustration if you cannot then use, say, a Base64 encoded value as a filename.

        But you could use a Base32 [crockford.com] encoded filename. Base32 is intended to eliminate confusion between upper/lower case and the difference between the letter "Oh" and numeral "Zero", as well as letter lowercase "Ell" and digit "Won". I could text a Base32 value to your phone so you could type it back onto a web page, and even if you made those wrong substitutions, there is no possibility of mis-decoding it due to silly errors by puny humans.

        --
        What doesn't kill me makes me weaker for next time.
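
For anyone wanting to try the Base32 idea above, here is a minimal Python sketch of Crockford-style Base32 for non-negative integers. The alphabet and the O/I/L substitutions follow Crockford's published description; the function names are my own, and check symbols and hyphen handling are left out.

```python
# Crockford's alphabet omits I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def crockford_encode(n: int) -> str:
    """Encode a non-negative integer, most significant digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, remainder = divmod(n, 32)
        digits.append(CROCKFORD[remainder])
    return "".join(reversed(digits))


def crockford_decode(text: str) -> int:
    """Decode, forgiving case and the usual look-alike mistakes."""
    cleaned = text.upper().replace("O", "0").replace("I", "1").replace("L", "1")
    value = 0
    for ch in cleaned:
        value = value * 32 + CROCKFORD.index(ch)  # ValueError on bad input
    return value


if __name__ == "__main__":
    token = crockford_encode(2**64 - 1)
    print(token, crockford_decode(token.lower()))
```

Because decoding folds case before looking anything up, the encoded value survives a case-insensitive file system as well as a human typist.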
      • (Score: 2, Informative) by Anonymous Coward on Friday July 19 2019, @06:37PM

        by Anonymous Coward on Friday July 19 2019, @06:37PM (#869103)

        Well, to be technical, the file system does know the difference between cased characters. It is the default API that doesn't care (which is a backwards compatibility thing back to the FAT16 days). If you use a different OS, different or lower-level API, or set the case-sensitivity attribute, you get case sensitive interactions with the file system.

      • (Score: 2) by digitalaudiorock on Saturday July 20 2019, @02:30AM (7 children)

        by digitalaudiorock (688) on Saturday July 20 2019, @02:30AM (#869258) Journal

        Amen to that...I'd mod your post if it wasn't already +5. The strange thing to me is the notion that case insensitive file systems are somehow more simple. I'd bet the original decision to make Unix file systems case sensitive...like most decisions made in Unix...was because it is the more simple approach. Maybe it's just my technical look on things but it sure seems so to me. In a case insensitive file system, I have a file MyFile.txt. So what exactly uniquely defines that file? Is it MyFile.txt, myfile.txt, MyFiLe.TXT...the answer is really all of the above. Fuck that. In 'nix the unique identifier is the characters of the file name...period.

        You gotta love how Windows used to limit file names to the 8.3 format when Unix could have almost any file/directory names at all. This included names with spaces, yet they were smart enough to avoid them because they're a PIA. Yet as soon as MS supported spaces in names, they decided to put executables (under their own directories) under "C:\My Pretty Little Program Files" because "we can". I'll take ALL the Unix choices any day of the week and this is one of those.

        • (Score: 1, Insightful) by Anonymous Coward on Saturday July 20 2019, @03:38AM (1 child)

          by Anonymous Coward on Saturday July 20 2019, @03:38AM (#869272)

          Sure, more simple from a *programming* perspective. That would be fine if only computers used computers, but they don't.

          • (Score: 2) by digitalaudiorock on Saturday July 20 2019, @03:04PM

            by digitalaudiorock (688) on Saturday July 20 2019, @03:04PM (#869386) Journal

            I suppose it's a matter of preference. I'm certainly not a computer and I prefer case sensitivity in a big way, mostly because it just makes more sense to me.

        • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @03:50AM

          by Anonymous Coward on Saturday July 20 2019, @03:50AM (#869275)

          Oh, forgot to add, those arbitrary spaces placed in Windows 95/NT 4 path names helped bang in to applications that they HAD to support spaces in file names. Back then there were a LOT of freshly ported 3.1/NT 3.51, poorly written, 32-bit applications that would die if you tried to open/save a file or path that had a space.

          (Oh, and since you probably can't read it, by HAD I mean had, and LOT I mean lot, but with more emphasis.)

        • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @07:28AM (2 children)

          by Anonymous Coward on Saturday July 20 2019, @07:28AM (#869321)

          UNIX/Linux/BSD had a very different lineage than Windows.

          Windows had to work with DOS/CMS style disks. Not 'oh it would be ok if', *had to* as in hard requirement. It had to be usable by people who were spending money and wanted the software they already paid for to continue to work. Backwards compatibility was a huge deal to windows. They spent a lot of effort and money making it happen.

          UNIX on the other hand takes the approach of re-write it and damned backwards compatibility. API compat was where it was at. That worked because you would buy your computer and the OS that came along with it would be custom for your computer. In many cases you would compile it then install it. Some software you would buy would be the same way.

          DOS had none of that. It had to fit in 640k (usually less). It had to fit on a 360k disk. Upper lower case not happening. To do that compromises were made. The UNIX way only works with a much more rich environment where wasting hundreds of K is no big deal. If you look at the old systems they would bitpack things to save space. These days inventing something new you would have a json descriptor and not worry too much about the wasted space.

          We live today with choices made in the mid 80s to fit things into a borderline embedded computer that did not cost 20,000 just to get in the door with. Remember a computer with 2MB of RAM and a 5MB drive was wildly expensive. Computers would come with 2 floppy drives just so you did not have to buy more hard drive or memory, because floppies were comparatively cheaper than both. Simpler is not always best. Cost matters a lot too. I am sure my 15 year old self would have loved to have a 'technically simple system' and would have lorded that over anyone who did not have it. But my 15 year old self's parents would have looked at the sticker price and said 'yeah that's not happening ever'.

          Case insensitivity is annoying sometimes. But then again so is case sensitivity. You can look at a file Myfile.txt and MyfIle.txt and miss the case very easily depending on font choice. Insensitivity makes it more usable. But it does come at a cost in both directions. UNIX works the way you think it would work. Windows works the way people actually use a computer.

          All systems with a bit of age on them have some warts. ls instead of catalog or dir. cp instead of copy springs to mind right away. mv instead of move. That was because of the UNIX lineage of being on a teletype, where fewer characters meant you got more done. But it came with a cognitive load of having to basically learn random 2 char strings to do anything. Remember these were timeshare style systems. Every second counts and the bill is running.

          Understand your systems, where they come from, and you can work those limitations and even know how to bend them to your will.

          I expect soon that windows will have native built in ext4 mounting options and at that point the whole thing is a non issue. You will pick which way you want it.

          • (Score: 2) by digitalaudiorock on Saturday July 20 2019, @03:07PM (1 child)

            by digitalaudiorock (688) on Saturday July 20 2019, @03:07PM (#869388) Journal

            I expect soon that windows will have native built in ext4 mounting options and at that point the whole thing is a non issue. You will pick which way you want it.

            Wow...any citation as to these even being on any road map? Personally, I'll believe that one when I see it. NTFS is a complete piece of shit that should have been replaced by just about anything years ago, but like so much of Windows, I'd say the OS is married to that cluster fuck forever, just like drive letters.

            • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @06:01PM

              by Anonymous Coward on Saturday July 20 2019, @06:01PM (#869432)

              Why would they *not* put it in? They are shipping the linux kernel in the next version https://www.theverge.com/2019/5/6/18534687/microsoft-windows-10-linux-kernel-feature [theverge.com]

              It is not that big of leap of logic that windows users can native mount ext4 and all of the other linux style file systems.

              NTFS should have had many major fixes (and it has). But it is showing its age. It was designed in a time where HD space cost a bit more than it does now. The central MFT is one of the stumbling blocks. Where as the place anywhere inode style OS's seem to be working better. Both come at a cost. For the time NTFS came out it was the best there was for the price.

              The NT driver system is not some special magic sauce that only MS has access to. https://www.paragon-drivers.com/en/lfswin/ [paragon-drivers.com] https://sourceforge.net/projects/ext2fsd/ [sourceforge.net] They have an option here buy/build.

              They will probably want a way to do it such that they do not have to open source windows (yet). Which I suspect they will eventually do anyway. The problem they have is the amount of 3rd party software they bought and plugged into windows. For example their defragment library is a 3rd party program. They will have to either re-negotiate those bits or strip them out if they want that. For example the last line of this https://devblogs.microsoft.com/oldnewthing/20121218-00/?p=5803 [microsoft.com] https://devblogs.microsoft.com/oldnewthing/20181221-00/?p=100535 [microsoft.com] and that is just an add-on game! They are doing some bits where they can https://github.com/microsoft/calculator [github.com]

        • (Score: 2) by toddestan on Sunday July 21 2019, @12:44AM

          by toddestan (4982) on Sunday July 21 2019, @12:44AM (#869485)

          Linux is actually kind of crazy as you can use pretty much any byte you want in the name of a file, except NUL and '/'. Yes, that includes control characters like carriage return and new line.

          I wish I could find it, but I ran across a webpage once where some guy was trying to describe everything you would have to do to properly handle all possible file paths in Linux. I seem to remember he concluded it would be just about impossible, but he reasoned it would probably be okay to just not handle a lot of the edge cases because no one is legitimately going to use file names with new lines in them anyway.
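
The "any byte except NUL and '/'" rule is simple enough to write down as a check. The Python sketch below works on `bytes` rather than `str`, since Linux file names are byte strings with no required encoding. The 255-byte limit is an assumption that holds for ext4 and many other Linux file systems but not all, and the function name is invented for illustration.

```python
NAME_MAX = 255  # assumption: typical per-component limit, e.g. on ext4


def is_valid_linux_filename(name: bytes) -> bool:
    """Check a single path component (not a whole path)."""
    if not 0 < len(name) <= NAME_MAX:
        return False
    if name in (b".", b".."):          # reserved directory entries
        return False
    return b"\x00" not in name and b"/" not in name


if __name__ == "__main__":
    for candidate in (b"notes.txt", b"new\nline.txt", b"\xff\xfe", b"a/b", b""):
        print(candidate, is_valid_linux_filename(candidate))
```

Note that the name with an embedded newline and the one that is not valid UTF-8 both pass, which is exactly what makes robust path handling in scripts so awkward.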

      • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @04:04AM (2 children)

        by Anonymous Coward on Saturday July 20 2019, @04:04AM (#869278)

        Fuck you. I want my filesystem to know the difference between differently cased characters.

        Fuck you. I want my file system to not give me bullshit telling me it can't find my file when I type "shopping list" instead of "Shopping List", which means exactly the same thing.

        • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @01:31PM

          by Anonymous Coward on Saturday July 20 2019, @01:31PM (#869365)

          LOL. STFU you use a GUI or else you would just ls and find out what the real name is first.

        • (Score: 2) by legont on Sunday July 21 2019, @01:15AM

          by legont (4179) on Sunday July 21 2019, @01:15AM (#869493)

          Xmm... how about шопинг лист?

          --
          "Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
    • (Score: 0) by Anonymous Coward on Friday July 19 2019, @06:12PM (6 children)

      by Anonymous Coward on Friday July 19 2019, @06:12PM (#869086)

      Case is just formatting

      Not in ASCII (or Unicode).

      • (Score: 2) by DannyB on Friday July 19 2019, @06:30PM (3 children)

        by DannyB (5839) Subscriber Badge on Friday July 19 2019, @06:30PM (#869099) Journal

        ˙dn ǝpıs ʇɥƃıɹ ɹo uʍop ǝpısdn sı ʇı ɹǝɥʇǝɥʍ oslɐ sı ƃuıpoɔuǝ ʇxǝʇ

        text encoding is also whether it is upside down or right side up.

        --
        What doesn't kill me makes me weaker for next time.
        • (Score: 0) by Anonymous Coward on Friday July 19 2019, @06:37PM (2 children)

          by Anonymous Coward on Friday July 19 2019, @06:37PM (#869102)

          I like to fuck with people by saying "uppercase 2, capital 3, lowercase 4" and so on. They pause trying to figure out how to write what I'm saying.

          • (Score: 2) by DannyB on Friday July 19 2019, @06:40PM

            by DannyB (5839) Subscriber Badge on Friday July 19 2019, @06:40PM (#869104) Journal

            If they are somewhat competent, I can say, all letters are lowercase unless I specify otherwise.

            X, Z, uppercase W, 2, 5, Y, uppercase C

            --
            What doesn't kill me makes me weaker for next time.
          • (Score: 1, Touché) by Anonymous Coward on Saturday July 20 2019, @02:29AM

            by Anonymous Coward on Saturday July 20 2019, @02:29AM (#869257)

            "uppercase 2, capital 3, lowercase 4"

            234
            See, formatting. :P

      • (Score: 3, Funny) by epitaxial on Friday July 19 2019, @06:57PM

        by epitaxial (3165) on Friday July 19 2019, @06:57PM (#869113)

        I use EBCDIC you insensitive clod!

      • (Score: 0) by Anonymous Coward on Saturday July 20 2019, @02:48AM

        by Anonymous Coward on Saturday July 20 2019, @02:48AM (#869263)

        Not in ASCII (or Unicode).

        Mainly because the characters needed their own unique appearance.

        Some early terminals, microcomputers, and such, only supported uppercase. Things like inverse, high-intensity, double-strike, underline, blink, could be done with attributes. Munging upper case graphic characters in to lower case had been done, but never worked well.

    • (Score: 2) by ikanreed on Friday July 19 2019, @06:20PM (3 children)

      by ikanreed (3164) Subscriber Badge on Friday July 19 2019, @06:20PM (#869093) Journal

      Whatever, at least in linux you can name a file "aux.dat" and not have the filesystem melt down.

      • (Score: 3, Interesting) by DannyB on Friday July 19 2019, @06:34PM (2 children)

        by DannyB (5839) Subscriber Badge on Friday July 19 2019, @06:34PM (#869100) Journal

        Windows has screwball file names.

        10. Create a new folder.
        20. Rename the folder to "God Mode.{ED7BA470-8E54-465E-825C-99712043E01C}".
        30. Folder becomes a new control panel shortcut. One that you didn't have before.
        40. Profit! (or cause havoc)

        --
        What doesn't kill me makes me weaker for next time.
        • (Score: 0) by Anonymous Coward on Friday July 19 2019, @07:02PM (1 child)

          by Anonymous Coward on Friday July 19 2019, @07:02PM (#869118)

          God Mode.{ED7BA470-8E54-465E-825C-99712043E01C}

          Well obviously... I think I prefer case sensitive file systems though. I've also been encoding integers as base62 for use in web URL's for about 15 years now, extended to the file names for a flat html caching system that would be impossible to manipulate using a DOS shell.

          • (Score: 2) by DannyB on Monday July 22 2019, @02:05PM

            by DannyB (5839) Subscriber Badge on Monday July 22 2019, @02:05PM (#869948) Journal

            Why base62 instead of base64? Or did you mean base32?

            --
            What doesn't kill me makes me weaker for next time.
    • (Score: 3, Interesting) by shortscreen on Friday July 19 2019, @06:46PM

      by shortscreen (2252) on Friday July 19 2019, @06:46PM (#869107) Journal

      I have a website on a low-cost shared hosting service. I noticed something rather dumb about it. If a link/image/whatever in the page (which I most likely typed in myself in all lower case) doesn't match the case of the filename to which it refers (which I uploaded over FTP and could be anything) then the server will respond with an HTTP 301 with a corrected URL. Of course the web browser handles this silently, requesting the file again, with nobody being the wiser. Just a little meaningless bandwidth wastage every single time the page is loaded.

    • (Score: 3, Insightful) by jrbrtsn on Friday July 19 2019, @08:07PM (2 children)

      by jrbrtsn (6338) Subscriber Badge on Friday July 19 2019, @08:07PM (#869149)

      "To the rest of the world, it is Unix/Linux's behavior that is unusual and needs explaining."

      The rest of the world thinks that under *nix you must type entire file paths perfectly; this is because they don't know about the find command, or how to use the tab key to display choices or auto-complete what you are typing. I've watched it, and it causes me to cringe. Also, the idea of using a terminal multiplexer program causes their heads to explode.
      Finding a file by name and actually opening a file are two entirely different propositions. If you need to find files without case sensitivity, behold the -iname flag for the find command. Opening a file by supplying a crude approximation of the file name is risky business at best.

      • (Score: 2) by DavePolaschek on Saturday July 20 2019, @03:06AM (1 child)

        by DavePolaschek (6129) on Saturday July 20 2019, @03:06AM (#869267) Homepage Journal

        If only I could figure out how to properly quote 💩 when passing it into iname...

        • (Score: 1) by jrbrtsn on Tuesday July 23 2019, @11:32AM

          by jrbrtsn (6338) Subscriber Badge on Tuesday July 23 2019, @11:32AM (#870286)
          Magic bullet is this: any character preceded by \ (backslash), including a space character, is not considered to be special by the shell, and therefore passes to the invoked command unmodified. Supposing I have a file named: siLly's File nAme
          I could find it by issuing the command: find ~ -iname silly\'s\ file\ name
          To make life easier for the GUI crowd, you could write a GUI program that takes the contents of a text box, escapes potentially special characters, then passes the resulting string to the find command.
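
The escaping step such a program would need can be sketched in Python with the standard library. `shlex.quote` handles shell quoting; the helper names and the glob-escaping function are my own additions, included because `find -iname` treats its argument as a wildcard pattern. Passing the argument list straight to `subprocess.run` would avoid the shell, and the quoting, altogether.

```python
import os
import re
import shlex


def escape_glob(name: str) -> str:
    """Backslash-escape characters that find's -iname treats as wildcards."""
    return re.sub(r"([\\*?\[\]])", r"\\\1", name)


def find_argv(root: str, name: str) -> list:
    """Build the argument list for a case-insensitive search by exact name."""
    return ["find", root, "-iname", escape_glob(name)]


def as_shell_line(argv: list) -> str:
    """Quote each argument so the line can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Expand "~" here: once quoted, the shell would no longer expand it.
    home = os.path.expanduser("~")
    print(as_shell_line(find_argv(home, "siLly's File nAme")))
```

`shlex.quote` wraps the argument in single quotes rather than using backslashes, but the effect is the same: the shell passes the string through to find unmodified.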
  • (Score: 2) by DannyB on Friday July 19 2019, @06:16PM (4 children)

    by DannyB (5839) Subscriber Badge on Friday July 19 2019, @06:16PM (#869090) Journal

    Hear [youtube.com] Har is a YouTube video with an image of the characters of the video's actual URL.

    --
    What doesn't kill me makes me weaker for next time.
    • (Score: 2) by edIII on Friday July 19 2019, @09:38PM (3 children)

      by edIII (791) on Friday July 19 2019, @09:38PM (#869172)

      How? From what I remember about uploading YouTube videos (API or not), was that you received the URL after, not before.

      --
      Technically, lunchtime is at any moment. It's just a wave function.
      • (Score: 1, Informative) by Anonymous Coward on Saturday July 20 2019, @07:30AM

        by Anonymous Coward on Saturday July 20 2019, @07:30AM (#869322)

        In some cases if you are doing multi part videos you can get it before.

        So you mark it like that. Do the upload slow enough that you are encoding the new video with the id in it.

      • (Score: 1, Funny) by Anonymous Coward on Saturday July 20 2019, @12:48PM (1 child)

        by Anonymous Coward on Saturday July 20 2019, @12:48PM (#869357)

        That's easy: keep uploading until the content matches the URL

        • (Score: 2) by DannyB on Monday July 22 2019, @01:23PM

          by DannyB (5839) Subscriber Badge on Monday July 22 2019, @01:23PM (#869923) Journal

          Easier, and can be done in the lifetime of the universe:
          10 Receive URL from the future
          20 Incorporate URL into video
          30 Upload video
          40 Send URL to the past
          50 GOSUB 10
          60 Profit

          --
          What doesn't kill me makes me weaker for next time.