SoylentNews is people

posted by Fnord666 on Wednesday February 15 2017, @04:04AM
from the it's-magic dept.

Wired reports that data sets from NASA - mainly those related to climate - that used to be publicly available have started to disappear, and that a group of "diehard coders" at UC Berkeley and other places worked over the weekend to "tag and bag" this data with the Internet Archive:

[...] 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.

Like similar groups across the country—in more than 20 cities—they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save outside government servers.

But now they're going even further. Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA's earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they're keeping track of what's already been removed—because yes, the pruning has already begun.

[...] Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.

The non-coders hit dead-ends too. Throughout the morning they racked up "404 Page not found" errors across NASA's Earth Observing System website. And they more than once ran across databases that had already been emptied out, like the Global Change Data Center's reports archive and one of NASA's atmospheric CO2 datasets.

And this is where the real problem lies. They can't be sure when this data disappeared (or if anyone backed it up first).

[Ed. - emphasis added by submitter]

This is on the heels of a December 2016 article in The Washington Post, titled "Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump", that details several additional initiatives along the same lines.

Alarmed that decades of crucial climate measurements could vanish under a hostile Trump administration, scientists have begun a feverish attempt to copy reams of government data onto independent servers in hopes of safeguarding it from any political interference.

The efforts include a "guerrilla archiving" event in Toronto, where experts will copy irreplaceable public data, meetings at the University of Pennsylvania focused on how to download as much federal data as possible in the coming weeks, and a collaboration of scientists and database experts who are compiling an online site to harbor scientific information.

"Something that seemed a little paranoid to me before all of a sudden seems potentially realistic, or at least something you'd want to hedge against," said Nick Santos, an environmental researcher at the University of California at Davis, who over the weekend began copying government climate data onto a nongovernment server, where it will remain available to the public. "Doing this can only be a good thing. Hopefully they leave everything in place. But if not, we're planning for that."

[...] At the University of Toronto this weekend, researchers are holding what they call a "guerrilla archiving" event to catalogue key federal environmental data ahead of Trump's inauguration. The event "is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted," the organizers said. "This includes climate change, water, air, toxics programs."

So Soylentils, are there any US .gov public databases that you don't want to see disappear?


This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: -1, Flamebait) by jmorris on Wednesday February 15 2017, @04:55AM

    by jmorris (4844) on Wednesday February 15 2017, @04:55AM (#467232)

    I don't have a problem with duplicating information. I wonder why they think this is a Trump thing. The Tired article is fuzzy on timing but notes the first of the login portals appeared in August. If they were actually worried about Trump they would have sucked down everything in Nov 16. Of course they wouldn't have had to do anything of the sort because insiders would hand them direct copies.

    It is the "sceptics" who traditionally run into problems getting access to original and complete datasets, source code for the models, etc. in efforts to reproduce the published results and then examine them in detail looking for errors. I'd bet this is a case of the guilty trying to vanish any contradictory evidence. Will be poetic if some Berkeley "information wants to be free" diehards save it.

    Starting Score:    1  point
    Moderation   -2  
       Offtopic=1, Flamebait=2, Troll=1, Insightful=1, Interesting=1, Total=6
    Extra 'Flamebait' Modifier   0  

    Total Score:   -1  
  • (Score: 2) by MostCynical on Wednesday February 15 2017, @06:14AM

    by MostCynical (2589) on Wednesday February 15 2017, @06:14AM (#467262) Journal

    What I don't understand is the lack of duplication in the first place.
    Some of this data has major bearing on government policy, yet the data sets are often, apparently, sitting with one, probably under-funded, government (or worse, government-funded-by-time-limited-grant) department or research group.

    One data store. "We can rely on our department's back up processes".
    Sad. Delusional.

    *every* (wholly or partially) government-funded research project should be required to release (and store) all primary collected data, as well as copy the complete data set onto a government site. Both should be publicly available, for free, for ever.

    --
    "I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
    • (Score: 2) by q.kontinuum on Wednesday February 15 2017, @08:14AM

      by q.kontinuum (532) on Wednesday February 15 2017, @08:14AM (#467286) Journal

      One data store. "We can rely on our department's back up processes".
      Sad. Delusional.

      I'm not sure if I should call these cut-up sentences "twitteresque" or mention how they remind me of a haiku :-)

      --
      Registered IRC nick on chat.soylentnews.org: qkontinuum
    • (Score: 1, Insightful) by Anonymous Coward on Wednesday February 15 2017, @09:34AM

      by Anonymous Coward on Wednesday February 15 2017, @09:34AM (#467305)

      With one MRI scanner you can generate gigabytes of raw data in seconds. This is normally processed down to a few KB of images and the raw data tossed. You do not want to force everyone to archive the raw data for all federally funded studies, however useful that may seem.

    • (Score: 2) by VLM on Wednesday February 15 2017, @03:12PM

      by VLM (445) on Wednesday February 15 2017, @03:12PM (#467404)

      for ever

      Like most people I went thru a phase of "Wouldn't it be fun to play around with GNU R and NASA data?" Well, maybe like most people here.

      Ignoring the whole climate thing, let's say you want Voyager space probe magnetometer data to see if a FFT of geomagnetic data shows a peak at the sun's rotation rate or how far away from Jupiter/Io can I detect them in the data or WTF. Or I'm gonna do photogrammetry of pix from the surface of the moon to do some trig and calculus to determine the diameter of pebbles in the pictures as the rovers drive around so I could determine "something" about weathering on mars.

      It turns out that over the last quarter century that stuff has been on QUITE a few systems, maintained by many different people and groups in different formats etc etc. It's more of a PITA than you'd think.

      Something kinda mind blowing is that it's a PITA to obtain government data, but I've had no problem at all obtaining private organization data from AAVSO. The government people are like "F you I applied to work customer service for the DMV but now I'm stuck helping you weirdos, so did I mention F you, well if not, just to be sure, F you" but the AAVSO people are like "I luvs you if it were biologically possible we will have many babys together". Well maybe slightly exaggerated but there is truth that the AAVSO private citizen people were very professional and helpful and their policies were sensible and the government people I worked with... not as much. Now that I think about it, it's kinda the same with genealogical research, I swear the public library librarians toke up at break time, but the folks at ancestry.com seem to not be high as a kite, maybe because they're all Mormons I donno. Anyway the point of this rant is if you want to preserve data, climate or otherwise, don't leave it in the hands of .gov or space hippies, give it all to a sane and stable private service company like perhaps archive.org. Don't even think about torrenting it or hosting it yourself or keeping tapes in a safe deposit box or whatever dumb ideas, give it all to a professional service company.

    • (Score: 1) by Magneto on Wednesday February 15 2017, @04:27PM

      by Magneto (6410) on Wednesday February 15 2017, @04:27PM (#467438)

      A lot of it is duplicated but the duplicating is done on an ad hoc basis. I have colleagues in atmospheric chemistry in the UK who have copies of a lot of the data being removed, but those copies were made purely for their research. While I'm sure they would happily share the data if requested, no one outside their research group and collaborators will know what they have.

      I think the idea here is not only to duplicate the data but to deliberately make it accessible to anyone who looks.

      • (Score: 2) by bob_super on Wednesday February 15 2017, @07:39PM

        by bob_super (1357) on Wednesday February 15 2017, @07:39PM (#467557)

        Well, one could start a central place offshore to backup all of that data.
        But the problem is checking that raw data "saved" by a backup from a random researcher has not been tampered with.

        Chain of custody is critical when powerful interests are attacking the credibility of any entity analyzing data that doesn't greenlight their eternal growth.
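
        One common way to approach the tamper-checking problem is a checksum manifest: hash every file when it is first archived, publish the list of hashes somewhere independent, and re-hash any later copy against that list. What follows is only a minimal sketch under that assumption, not anything the archiving groups are known to use; the function names (sha256_of_file, build_manifest, verify_manifest) are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of_file(candidate) != expected:
            problems.append(rel_path)
    return problems
```

        A manifest like this only shows that a copy matches whatever was hashed at the time. It says nothing about whether the manifest itself is trustworthy, so it would still need to be signed or held by a party people have reason to believe.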

    • (Score: 2) by hendrikboom on Friday February 17 2017, @03:38AM

      by hendrikboom (1125) Subscriber Badge on Friday February 17 2017, @03:38AM (#468071) Homepage Journal

      Even if it's adequately backed up, a serious attempt to destroy the data would very likely be competent enough to destroy the backups too. So it's important to have a copy under independent custody.