SoylentNews Comments | Attrition of Web Sites Over the Recent Years

Attrition of Web Sites Over the Recent Years

posted by janrinok on Tuesday January 30 2024, @06:50PM

from the the-net-never-forgets-ha dept.

Web developer Trevor Morris has a short post on the attrition of web sites over the years.

I have run the Laravel Artisan command I built to get statistics on my outgoing links section. Exactly one year later it doesn't make good reading.
[...] The percentage of total broken links has increased from 32.8% last year to 35.7% this year. Links from over a decade ago have a fifty per cent chance of no longer working. Thankfully, only three out of over 550 have gone missing in the last few years of links, but only time will tell how long they'll stick around.
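Morris's tool is a Laravel Artisan command, and the post does not show its code. As a rough illustration of the statistics he quotes (the share of all links that are broken, and the share broken by age), here is a minimal Python sketch. It assumes each link has already been checked and recorded with the year it was added. The `LinkRecord` type, the function names, and the five-year age buckets are hypothetical, not taken from his post.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """One outgoing link (hypothetical record shape, not Morris's schema)."""
    url: str
    year_added: int
    broken: bool  # result of an earlier link check


def broken_percentage(records: list[LinkRecord]) -> float:
    """Share of records marked broken, as a percentage (0.0 when there are none)."""
    if not records:
        return 0.0
    return 100.0 * sum(r.broken for r in records) / len(records)


def broken_by_age(records: list[LinkRecord], current_year: int,
                  bucket_years: int = 5) -> dict[int, float]:
    """Broken percentage per age bucket, keyed by the bucket's starting age in years."""
    buckets: dict[int, list[LinkRecord]] = defaultdict(list)
    for r in records:
        age = current_year - r.year_added
        # Links aged 0-4 years land in bucket 0, 5-9 in bucket 5, and so on.
        buckets[(age // bucket_years) * bucket_years].append(r)
    return {start: broken_percentage(group) for start, group in sorted(buckets.items())}
```

Comparing the output of successive yearly runs over the same link list gives the kind of year-over-year trend the post reports.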

As was pointed out in the early and mid-1990s, the inherent centralization of sites (and, later, web sites) is the root of this weakness: a single copy exists, under the control of the publisher or maintainer. When that one copy goes, it is gone.


This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.


The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Change (in weblinks) is the only Constant (Score: 2, Interesting) by Anonymous Coward on Tuesday January 30 2024, @08:24PM (9 children)

by Anonymous Coward on Tuesday January 30 2024, @08:24PM (#1342434)

For my small company website, we include a "News" page that links to items related to our niche business, usually a few News items/year. Once or twice a year my webmaster does a link check and there are usually a few broken links to be either updated (that site was reorganized) or deleted/replaced with something similar.

It's been this way for many years. About five years ago I started adding these original links to archive.org. Now when we add an item to the News page, it includes both the original link and the archive.org "permanent link". This has cut down on the broken links considerably... and when one breaks, all we have to do is delete it and let the archive.org link be primary.
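The scheme described above (store an archive link next to each original link, then promote the archive link once the original breaks) can be sketched with Python's standard library. This is a hypothetical illustration, not the commenter's actual setup: the `NewsItem` type and the function names are made up, and the `https://web.archive.org/web/<timestamp>/<url>` pattern is assumed to match the Wayback Machine's public URL format.

```python
import http.client
import urllib.request
from dataclasses import dataclass
from typing import Callable

# Assumed Wayback Machine prefix; captures are addressed as <prefix><timestamp>/<original URL>.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_link(url: str, timestamp: str) -> str:
    """Build an archive link for `url` at a YYYYMMDDhhmmss-style (possibly truncated) timestamp."""
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


def is_link_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` (redirects followed) ends in a 2xx/3xx status."""
    try:
        # Building the Request inside the try also catches malformed URLs (ValueError).
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "link-check-sketch"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), timeouts and socket errors are OSError subclasses;
        # HTTPException covers malformed responses from the server.
        return False


@dataclass
class NewsItem:
    """A hypothetical News-page entry holding both links."""
    title: str
    original_url: str
    archive_url: str


def primary_link(item: NewsItem, check: Callable[[str], bool] = is_link_alive) -> str:
    """Prefer the original link while it works; otherwise fall back to the archive link."""
    return item.original_url if check(item.original_url) else item.archive_url
```

Some servers answer HEAD with an error even when the page is fine, so a real checker might retry with GET before declaring a link dead. The injectable `check` parameter lets the fallback logic be tested without network access.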


let the archive.org link be primary (Score: 0) by Anonymous Coward on Tuesday January 30 2024, @08:39PM (1 child)

by Anonymous Coward on Tuesday January 30 2024, @08:39PM (#1342436)

and when IA is beaten to a pulp by the publishing industry?

- Re:let the archive.org link be primary (Score: 0) by Anonymous Coward on Tuesday January 30 2024, @08:54PM
  
  by Anonymous Coward on Tuesday January 30 2024, @08:54PM (#1342441)
  
  > ... when IA is beaten to a pulp by the publishing industry
  Hasn't happened yet, and the IA is well funded in terms of legal support.
  My take: if IA is eventually forced to remove copyrighted material from its servers, the Wayback Machine (which is what I use for a "permanent link") will survive in some form. So I'm not worried, but you can be as doom-ey and gloomy as you like (grin).
  
Re:Change (in weblinks) is the only Constant (Score: 2) by JoeMerchant on Tuesday January 30 2024, @08:49PM (6 children)

by JoeMerchant (3937) on Tuesday January 30 2024, @08:49PM (#1342439)

The wayback machine doesn't work nearly as well as it used to.

--
🌻🌻🌻 [google.com]

- Re:Change (in weblinks) is the only Constant (Score: 1, Informative) by Anonymous Coward on Tuesday January 30 2024, @08:59PM (4 children)
  
  by Anonymous Coward on Tuesday January 30 2024, @08:59PM (#1342442)
  
  > The wayback machine doesn't work nearly as well as it used to.
  ISTR (I seem to recall) seeing something about this. Might it be due to all the variable content served out of a database (pages created on the fly) and/or all the code behind "modern" (aka bloated) web pages?
  When I send the Wayback Machine a link to a fairly simple page, it seems to work the same as ever. Often those are the pages with the most archival value anyway... but YMMV.
  
  - Re:Change (in weblinks) is the only Constant (Score: 2) by JoeMerchant on Tuesday January 30 2024, @10:30PM (3 children)
    
    by JoeMerchant (3937) on Tuesday January 30 2024, @10:30PM (#1342448)
    
    Wayback works great on the website I made in 1997 (and continue to maintain in the same style through today).
    I threw up a Wordpress blog somewhere around 2005; I don't think Wayback ever picked it up.
    
    --
    🌻🌻🌻 [google.com]
    
    - Re:Change (in weblinks) is the only Constant (Score: 4, Funny) by Anonymous Coward on Wednesday January 31 2024, @03:52AM (2 children)
      
      by Anonymous Coward on Wednesday January 31 2024, @03:52AM (#1342472)
      
      > I threw up a Wordpress blog ...
      I can't imagine how you ever swallowed it in the first place, but it's good it came back up rather than causing an intestinal perforation.
      
      - Re:Change (in weblinks) is the only Constant (Score: 3, Funny) by The Vocal Minority on Wednesday January 31 2024, @05:22PM (1 child)
        
        by The Vocal Minority (2765) on Wednesday January 31 2024, @05:22PM (#1342527)
        
        Contrary to popular belief, a swallowed Wordpress blog, although providing no nutritional value, is unlikely to cause intestinal perforation. A particularly large blog may become lodged, however, and require extensive archiving to remove.
        
        
        Re:Change (in weblinks) is the only Constant (Score: 1, Informative) by Anonymous Coward on Thursday February 01 2024, @05:44AM
        
        by Anonymous Coward on Thursday February 01 2024, @05:44AM (#1342591)
        
        Ah, so it becomes a bezoar [webpathology.com]. That sounds about right.
        
- Re:Change (in weblinks) is the only Constant (Score: 1, Informative) by Anonymous Coward on Wednesday January 31 2024, @07:29AM
  
  by Anonymous Coward on Wednesday January 31 2024, @07:29AM (#1342480)
  
  Because it respects robots.txt, etc. So when a domain squatter takes over a domain, the previously archived site can vanish if the new robots.txt (or similar) tells IA not to archive it:
  https://help.archive.org/help/using-the-wayback-machine/ [archive.org]
  Some sites are not available because of robots.txt or other exclusions. What does that mean?
  Such sites may have been excluded from the Wayback Machine due to a robots.txt file on the site or at a site owner’s direct request.
  
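To see how a new owner's robots.txt can shut a crawler out, Python's standard `urllib.robotparser` can evaluate the rules offline. This is only a sketch: `ia_archiver` is assumed here as a user-agent string the Internet Archive has honored for exclusions, and IA decides for itself how it applies robots.txt, so this shows only how the rules themselves evaluate. The function name and the squatter's robots.txt are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, url: str, user_agent: str = "ia_archiver") -> bool:
    """Evaluate robots.txt text offline: may `user_agent` fetch `url`?

    The default user agent is an assumption, not a confirmed IA policy.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt a domain squatter might serve after taking over a domain.
SQUATTER_ROBOTS = """\
User-agent: ia_archiver
Disallow: /
"""
```

Parsing from a string keeps the example offline; `RobotFileParser.set_url()` followed by `read()` would fetch a live file instead.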
