
SoylentNews is people

posted by janrinok on Tuesday January 30, @06:50PM
from the the-net-never-forgets-ha dept.

Web developer Trevor Morris has a short post on the attrition of web sites over the years.

I have run the Laravel Artisan command I built to get statistics on my outgoing links section. Exactly one year later it doesn't make good reading.

[...] The percentage of total broken links has increased from 32.8% last year to 35.7% this year. Links from over a decade ago have a fifty per cent chance of no longer working. Thankfully, only three out of over 550 have gone missing in the last few years of links, but only time will tell how long they'll stick around.

As pointed out in the early and mid 1990s, the inherent centralization of sites, later web sites, is the basis for this weakness. That is to say one single copy exists which resides under the control of the publisher / maintainer. When that one copy goes, it is gone.


Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Insightful) by JoeMerchant on Tuesday January 30, @07:05PM (3 children)

    by JoeMerchant (3937) on Tuesday January 30, @07:05PM (#1342426)

    As I work, I research things. As I find things valuable to what I'm doing, I'll incorporate them in my code / writings / whatever. Especially with code, I'll often drop a comment with a link - or in a document a footnote. If the thing I'm linking to can be summarized nicely in a paragraph or less, I'll also put that beside the link because... it's just a link, not terribly helpful when the destination decides to reorganize their content and either delete what I've linked to or more likely just spitefully move it somewhere to break all the links to it.

    There are all kinds of writing / coding / commenting standards out there... I tend to think of those more as guidelines than actual rules, particularly when they may jeopardize future me's understanding of what I'm writing, not to mention the poor sod who may try to pick up the pieces when I'm unavailable.

    --
    🌻🌻 [google.com]
    • (Score: 1) by khallow on Wednesday January 31, @12:41AM (2 children)

      by khallow (3766) Subscriber Badge on Wednesday January 31, @12:41AM (#1342464) Journal
      Depends. If I'm in an activity where honoring copyright is a thing, it is - writing code or songs, for example. For example, I shouldn't get a pass for verbatim copying of your code or artistic works without legally following whatever contract that code or art is released to me under. If I'm in a career where citation is really important - like academic research, then I need to cite. If I'm building a shed for personal use using some blueprints I got out of a book or website, then no it isn't. If I quote from a modern movie or book for internet points, it isn't.

      There's a hubbub in academia right now about shifty right-wingers searching for plagiarism (and similar misdeeds) in order to shame/embarrass/remove academic targets. It already caused a Harvard president to resign. It'll be interesting to see if this effort catches a bunch of targets (I gather the wife of the rich guy funding one of these efforts got caught as well). My take is that there are some fields where this sort of skullduggery is probably incredibly widespread. OTOH, while I think a house-cleaning of plagiarizing academics would be useful, I doubt it'll have the effect that said right-wingers want.
      • (Score: 4, Interesting) by bzipitidoo on Wednesday January 31, @03:31AM (1 child)

        by bzipitidoo (4388) on Wednesday January 31, @03:31AM (#1342471) Journal

        You might find interesting a facet of chess problem composition. Evidently, the space is small enough that inadvertent recreation of chess problems happens quite often. So often, there's a term for it: anticipation. First decent chess problem I made, I learned had been "anticipated". By 150 years.

  • (Score: 2, Interesting) by Anonymous Coward on Tuesday January 30, @08:24PM (9 children)

    by Anonymous Coward on Tuesday January 30, @08:24PM (#1342434)

    For my small company website, we include a "News" page that links to items related to our niche business, usually a few News items/year. Once or twice a year my webmaster does a link check and there are usually a few broken links to be either updated (that site was reorganized) or deleted/replaced with something similar.

    It's been this way for many years, and perhaps about five years ago I started adding these original links to archive.org. Now when we add an item to the News page, it includes both the original link and also the archive.org "permanent link". This has cut down on the broken links considerably...and when one breaks all we have to do is delete it and let the archive.org link be primary.
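A minimal sketch of the "original link plus archive link" habit described above. It assumes the public Wayback Machine URL pattern `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`; the function name `news_item_links` and the example URL are hypothetical. It only builds the link text: it does not ask the archive to capture anything, so the archived link resolves only if a capture actually exists.

```python
from datetime import datetime, timezone
from typing import Dict

# Public prefix used by Wayback Machine capture URLs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def news_item_links(original_url: str, captured_at: datetime) -> Dict[str, str]:
    """Return the original link and a Wayback Machine link for a News item.

    captured_at must be timezone-aware; it is converted to UTC and rendered
    as the 14-digit timestamp that Wayback capture URLs use.
    """
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return {
        "original": original_url,
        "archived": f"{WAYBACK_PREFIX}/{stamp}/{original_url}",
    }
```

When the original link later breaks, the page can drop the "original" entry and keep the "archived" one as the primary link.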

    • (Score: 0) by Anonymous Coward on Tuesday January 30, @08:39PM (1 child)

      by Anonymous Coward on Tuesday January 30, @08:39PM (#1342436)

      and when IA is beaten to a pulp by the publishing industry

      • (Score: 0) by Anonymous Coward on Tuesday January 30, @08:54PM

        by Anonymous Coward on Tuesday January 30, @08:54PM (#1342441)

        > ... when IA is beaten to a pulp by the publishing industry

        Hasn't happened yet and the IA is well funded in terms of legal support.

        My take -- If IA is eventually forced to remove (C) material from their servers, the Wayback Machine (which is what I use for a "permanent link") will survive in some form. So I'm not worried, but you can be as doom-ey and gloomy as you like (grin).

    • (Score: 2) by JoeMerchant on Tuesday January 30, @08:49PM (6 children)

      by JoeMerchant (3937) on Tuesday January 30, @08:49PM (#1342439)

      The wayback machine doesn't work nearly as well as it used to.

      --
      🌻🌻 [google.com]
      • (Score: 1, Informative) by Anonymous Coward on Tuesday January 30, @08:59PM (4 children)

        by Anonymous Coward on Tuesday January 30, @08:59PM (#1342442)

        > The wayback machine doesn't work nearly as well as it used to.

        ISTR seeing something about this; it might be due to all the variable content that is served out of a database (page created on the fly) and/or all the code that is behind "modern" (aka bloated) web pages?

        When I send the Wayback Machine a link to a fairly simple page it seems to work same as ever. Often those are the pages with the most archival value anyway...but ymmv.
         

        • (Score: 2) by JoeMerchant on Tuesday January 30, @10:30PM (3 children)

          by JoeMerchant (3937) on Tuesday January 30, @10:30PM (#1342448)

          Wayback works great on the website I made in 1997 (and continue to maintain in the same style through today).

          I threw up a Wordpress blog somewhere around 2005, I don't think Wayback ever picked it up.

          --
          🌻🌻 [google.com]
          • (Score: 4, Funny) by Anonymous Coward on Wednesday January 31, @03:52AM (2 children)

            by Anonymous Coward on Wednesday January 31, @03:52AM (#1342472)

            > I threw up a Wordpress blog ...

            I can't imagine how you ever swallowed it in the first place, but it's good it came back up rather than causing an intestinal perforation.

            • (Score: 3, Funny) by The Vocal Minority on Wednesday January 31, @05:22PM (1 child)

              by The Vocal Minority (2765) on Wednesday January 31, @05:22PM (#1342527) Journal

              Contrary to popular belief, a swallowed Wordpress blog, although providing no nutritional value, is unlikely to cause intestinal perforation. A particularly large blog may become lodged, however, and require extensive archiving to remove.

              • (Score: 1, Informative) by Anonymous Coward on Thursday February 01, @05:44AM

                by Anonymous Coward on Thursday February 01, @05:44AM (#1342591)

                Ah, so it becomes a bezoar [webpathology.com]. That sounds about right.

      • (Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @07:29AM

        by Anonymous Coward on Wednesday January 31, @07:29AM (#1342480)

        Because it respects robots.txt, etc. So when a domain squatter takes over, the previously archived site can vanish if the new robots.txt or similar tells IA not to archive:

        https://help.archive.org/help/using-the-wayback-machine/ [archive.org]

        Some sites are not available because of robots.txt or other exclusions. What does that mean?

        Such sites may have been excluded from the Wayback Machine due to a robots.txt file on the site or at a site owner’s direct request.
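To make the robots.txt mechanism concrete, here is a small sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies and the `ia_archiver` user-agent string are illustrative assumptions, not taken from the help page quoted above. The code only shows how a crawler reads the rules; how the archive applies them to captures it already holds is its own policy.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt from the original site owner: nothing is disallowed.
ORIGINAL_OWNER = "User-agent: *\nDisallow:\n"

# Hypothetical robots.txt put up after a squatter takes the domain over:
# every path is disallowed for every crawler.
SQUATTER = "User-agent: *\nDisallow: /\n"

# "ia_archiver" is assumed here as the crawler's user-agent string.
before = may_crawl(ORIGINAL_OWNER, "ia_archiver", "https://example.com/old-article")
after = may_crawl(SQUATTER, "ia_archiver", "https://example.com/old-article")
```

Here `before` is True and `after` is False: the same URL goes from crawlable to excluded purely because the file at the domain root changed hands.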

  • (Score: 5, Insightful) by MostCynical on Tuesday January 30, @08:28PM (3 children)

    by MostCynical (2589) on Tuesday January 30, @08:28PM (#1342435) Journal

    the internet is not for keeping things.
    the internet is for sharing things.

    anything older than the person you are talking to is irrelevant.

    (anyone 'keeping' things on the internet is very bravefoolish)

    corporations own most of the internet, so this sharing is monetized.

    storage is far cheaper than in the early days, but the amount being shared has increased dramatically.

    link rot has been a problem since forever. Sites like photobucket died early, but many, many more are now inaccessible, or worse, behind paywalls.

    Store your own copies of anything important.

    Keep off-line backups.

    Assume the worst - reality will top whatever you imagined, but it will be a wild ride.

    --
    "I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
    • (Score: 2) by JoeMerchant on Tuesday January 30, @08:53PM (2 children)

      by JoeMerchant (3937) on Tuesday January 30, @08:53PM (#1342440)

      >anyone 'keeping' things on the internet is very bravefoolish

      I keep my auto maintenance records, incomplete as they are, on a page in my website. It has proven to be a much more durable storage location than anything else I have used for similar things over the past 25 years, and the convenience of "available anytime on the smart phone" feature of the past 10-15 years cannot be overstated.

      --
      🌻🌻 [google.com]
      • (Score: 2) by JoeMerchant on Tuesday January 30, @10:39PM (1 child)

        by JoeMerchant (3937) on Tuesday January 30, @10:39PM (#1342450)

        Counterpoint: to update my auto maintenance records requires me to think of doing so while I'm at my desktop, fire up FileZilla, FTP in the existing page, text edit in the new list item (trivial), save, upload, and generally I'll load the page to see my handiwork. Major problem with that is taking the 5 minutes, at my desk with my computer all fired up, to write the line into the page. I probably only log 1/3 to 1/2 of the auto maintenance items I should, which is infinitely better than 0.

        On the sailboat, I frequently don't have a computer handy, so there's a traditional little waterproof log book sitting on the nav table, begging me to take the 30 to 90 seconds required to log whatever it is that wants logging. It's not accessible anywhere but in the boat, but that's where 90% of its value lies anyway. Being 5x quicker to make a log entry, I probably log about 80% of what I should (these days, anytime I forget to log something I wish I had later I generally do log it the next time it happens.) Main thing that doesn't get logged with that system are "party cruises" where it's a chaos circus getting the guests/kids/etc. off the boat and everything else that NEEDS securing secured before we go blundering back to the car(s) for whatever comes next. I suppose I could be a more disciplined captain and just ignore the peons while I write in the logs, but when one of the kids goes sprinting down the dock heading for god knows where it's all hands + captain to deal with that.

        --
        🌻🌻 [google.com]
        • (Score: 0) by Anonymous Coward on Wednesday January 31, @12:03AM

          by Anonymous Coward on Wednesday January 31, @12:03AM (#1342460)

          > to update my auto maintenance records...

          ...requires opening the file drawer, stuffing the itemized receipt from the shop in the front of the file folder for that car (all paper) and closing the file drawer. Takes a few seconds, since my memory for which drawer that folder lives in is (still) pretty good.

          When I'm ready to sell that car (or really bored), I total up the receipts, write it on a scrap of paper and staple that batch together so I don't have to run that total again.

  • (Score: 4, Interesting) by Mojibake Tengu on Tuesday January 30, @09:18PM (2 children)

    by Mojibake Tengu (8598) on Tuesday January 30, @09:18PM (#1342444) Journal

    Copyright on particular data is inadequate legal paradigm for any integrated dataspace, Internet included.
    This will predictably break things because of technical irresponsibility of data owners.

    For this reason, a concept of copyright (or any kind of "intellectual property") has no future in advanced digital civilization.
    For transitional period, what about "backupright"?

    --
    Rust programming language offends both my Intelligence and my Spirit.
    • (Score: 1) by pTamok on Wednesday January 31, @11:30AM (1 child)

      by pTamok (3042) on Wednesday January 31, @11:30AM (#1342489)

      There's no copyright on data. There is copyright on creative works. The hurdle for creativity is low, but not zero - see 'sweat of the brow' works https://en.wikipedia.org/wiki/Sweat_of_the_brow [wikipedia.org]

      • (Score: 2) by Mojibake Tengu on Friday February 02, @05:07AM

        by Mojibake Tengu (8598) on Friday February 02, @05:07AM (#1342744) Journal

        In digital domain of existence, any creative work must be represented as data. Otherwise it is not digitally existent. Thus, a copyright on work is attempted to be transitively extended to its representation data.

        'Sweat of brow' actually means labour, euphemism to a dirty word which is considered a vulgarism in noble castes themselves who invented concept of copyright.
        Imagine any road pavement worker had a copyright for his stonework or asphalt craft or plumber for his pipe assembly and you may see the absurdity of the concept.

        --
        Rust programming language offends both my Intelligence and my Spirit.
  • (Score: 4, Interesting) by darkfeline on Tuesday January 30, @09:46PM

    by darkfeline (1030) on Tuesday January 30, @09:46PM (#1342446) Homepage

    Copyright plays a role methinks.

    > the inherent centralization of sites, later web sites, is the basis for this weakness

    That should not be a problem, because with federation, anyone can copy and re-host what they deem is worth preserving. With digital technology, it has never been easier to copy and preserve information.

    Except as a society we decided that that was a bad idea and imposed artificial legal restrictions on copying, in the hopes that the pros outweigh the cons, and we may do a similar thing again with AI.

    --
    Join the SDF Public Access UNIX System today!
  • (Score: 4, Interesting) by Snotnose on Wednesday January 31, @12:41AM

    by Snotnose (1623) on Wednesday January 31, @12:41AM (#1342463)

    I don't do web stuff so I don't really know the issues, just the results. I've often wondered why websites don't periodically run a spider to check all the links on their page to ensure they're still valid.
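For what such a periodic spider might look like, here is a rough sketch using only the Python standard library. The names `LinkCollector`, `http_status` and `find_broken_links` are made up for illustration, and the 599 value is an arbitrary sentinel for "no HTTP answer at all", not a real status code. Some servers reject HEAD requests, so a production checker would likely need a GET fallback.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def http_status(url: str, timeout: float = 10.0) -> int:
    """HEAD-request a URL and return its status code (599 if unreachable)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions
    except OSError:
        return 599  # DNS failure, refused connection, timeout, ...


def find_broken_links(
    page_html: str, base_url: str, get_status: Callable[[str], int] = http_status
) -> List[str]:
    """Return the links on the page whose status code is 400 or higher."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return [url for url in collector.links if get_status(url) >= 400]
```

Passing the status lookup in as a parameter keeps the checker testable offline; a scheduled job would simply call `find_broken_links(html, page_url)` for each page and report the result.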

    I've also wondered why, when a large website decides to move page foo to bar, they don't make foo a link to bar. It's really annoying when not only your docs, but the first 3 pages of google results send me to foo/bigAssCompany.com when that page got moved to bar/bigAssCompany.com years ago and now foo gives a 404 error.
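The "make foo a link to bar" idea is normally done with a permanent (HTTP 301) redirect. Below is a minimal, framework-free sketch of the lookup behind it; the `MOVED_PAGES` table and the `redirect_for` name are hypothetical, and a real site would more likely express this in its web server or CMS configuration.

```python
from typing import Dict, Optional

# Hypothetical table of old path -> new path, extended at every reorganization.
MOVED_PAGES: Dict[str, str] = {"/foo": "/bar"}


def redirect_for(
    path: str, moved: Dict[str, str] = MOVED_PAGES, max_hops: int = 10
) -> Optional[str]:
    """Return the final new location for a moved path, or None if it never moved.

    Chains (old -> newer -> newest) are collapsed so the visitor gets a single
    redirect; max_hops guards against accidental loops in the table.
    """
    target = path
    hops = 0
    while target in moved:
        target = moved[target]
        hops += 1
        if hops > max_hops:
            raise ValueError(f"redirect loop or overlong chain starting at {path}")
    return None if target == path else target
```

A request handler would answer with status 301 and a `Location` header whenever `redirect_for` returns a path, and serve the page normally when it returns None.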

    An example would be my local library. I used to login and end up on the home page. A couple years ago they revamped their web site (a good thing), now when I log in I end up on a 404 page. The menus and stuff are still there and it all works, but...... Yeah, I could fix the bookmark. But what html address should I set it to? And how many "average" users know that's even an option?

    --
    God gives his toughest battles to his strongest soldiers. I am not one of them, please make it stop.
  • (Score: 0) by Anonymous Coward on Wednesday January 31, @10:58AM (2 children)

    by Anonymous Coward on Wednesday January 31, @10:58AM (#1342487)

    That is to say one single copy exists which resides under the control of the publisher / maintainer.

    Should I invite others to maintain my personal site or what?

    • (Score: 4, Touché) by canopic jug on Wednesday January 31, @12:08PM (1 child)

      by canopic jug (3949) Subscriber Badge on Wednesday January 31, @12:08PM (#1342493) Journal

      You miss the point, perhaps on purpose. With paper-based publishing, the recipients received a copy over which they could decide on many factors including how long to keep it around for. You don't get that choice with browsers. And there is nothing approaching an archive (very long term caching) or library (medium term caching) except for the Internet Archive, and even that is centralized. Furthermore, it is under constant legal attack to drag it down and remove old material forever from the public.

      --
      Money is not free speech. Elections should not be auctions.
      • (Score: 1, Informative) by Anonymous Coward on Wednesday January 31, @06:22PM

        by Anonymous Coward on Wednesday January 31, @06:22PM (#1342535)

        In case you are still here...

        > or library (medium term caching) except for the Internet Archive

        There are many other archive sites now, archive.is is one I've used. Seems to have good coverage for news articles about current events (which might be behind a paywall).
