
posted by Fnord666 on Wednesday February 15 2017, @04:04AM
from the it's-magic dept.

Wired reports that data sets from NASA - mainly those related to climate - that used to be publicly available have started to disappear, and that a group of "diehard coders" at UC Berkeley and other places worked over the weekend to "tag and bag" this data with the Internet Archive:

[...] 200 adults had willingly sardined themselves into a fluorescent-lit room in the bowels of Doe Library to rescue federal climate data.

Like similar groups across the country—in more than 20 cities—they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save it outside government servers.

But now they're going even further. Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA's earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they're keeping track of what's already been removed—because yes, the pruning has already begun.
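
[Ed. note: as a rough illustration of the monitoring mentioned above - and emphatically not the actual DataRefuge or EDGI tooling - watching a page for changes can be as simple as hashing it and comparing against the last saved snapshot. The URL and file name below are placeholders.]

import hashlib
import urllib.request

WATCH_URL = "https://example.gov/dataset-index.html"  # placeholder URL, not a real dataset page
STATE_FILE = "last_hash.txt"                          # where the previous snapshot's hash is kept

def fetch_hash(url):
    """Download the page and return a SHA-256 digest of its raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def check_for_change():
    """Return True if the page differs from the last recorded snapshot."""
    current = fetch_hash(WATCH_URL)
    try:
        with open(STATE_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = None                               # first run: nothing to compare against
    with open(STATE_FILE, "w") as f:
        f.write(current)
    return previous is not None and previous != current

if __name__ == "__main__":
    print("changed" if check_for_change() else "unchanged")

Run from cron, something like this flags pages worth re-archiving; the real monitoring systems are presumably far more robust than this sketch.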

[...] Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.
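
[Ed. note: for readers curious what "found the old FTP server and started downloading" amounts to, here is a minimal sketch of mirroring one public FTP directory with Python's standard ftplib. The host and paths are placeholders, not the actual Goddard server.]

import os
from ftplib import FTP

HOST = "ftp.example.gov"           # placeholder host
REMOTE_DIR = "/pub/archive/2016"   # placeholder directory
LOCAL_DIR = "mirror/2016"          # where files land locally

def mirror_directory(host, remote_dir, local_dir):
    """Download every entry in one remote directory (no recursion)."""
    os.makedirs(local_dir, exist_ok=True)
    ftp = FTP(host)
    ftp.login()                    # anonymous login
    ftp.cwd(remote_dir)
    for name in ftp.nlst():        # note: nlst() can also list subdirectories
        local_path = os.path.join(local_dir, name)
        with open(local_path, "wb") as f:
            ftp.retrbinary("RETR " + name, f.write)
        print("saved", local_path)
    ftp.quit()

if __name__ == "__main__":
    mirror_directory(HOST, REMOTE_DIR, LOCAL_DIR)

A bulk tool such as wget can do the same job; the point is only that a public FTP tree can be copied with very little code once someone finds the link.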

The non-coders hit dead-ends too. Throughout the morning they racked up "404 Page not found" errors across NASA's Earth Observing System website. And they more than once ran across databases that had already been emptied out, like the Global Change Data Center's reports archive and one of NASA's atmospheric CO2 datasets.

And this is where the real problem lies. They can't be sure when this data disappeared (or if anyone backed it up first).

[Ed. - emphasis added by submitter]


This comes on the heels of a December 2016 article in The Washington Post, titled "Scientists are frantically copying U.S. climate data, fearing it might vanish under Trump", which details several additional initiatives along the same lines.

Alarmed that decades of crucial climate measurements could vanish under a hostile Trump administration, scientists have begun a feverish attempt to copy reams of government data onto independent servers in hopes of safeguarding it from any political interference.

The efforts include a "guerrilla archiving" event in Toronto, where experts will copy irreplaceable public data, meetings at the University of Pennsylvania focused on how to download as much federal data as possible in the coming weeks, and a collaboration of scientists and database experts who are compiling an online site to harbor scientific information.

"Something that seemed a little paranoid to me before all of a sudden seems potentially realistic, or at least something you'd want to hedge against," said Nick Santos, an environmental researcher at the University of California at Davis, who over the weekend began copying government climate data onto a nongovernment server, where it will remain available to the public. "Doing this can only be a good thing. Hopefully they leave everything in place. But if not, we're planning for that."

[...] At the University of Toronto this weekend, researchers are holding what they call a "guerrilla archiving" event to catalogue key federal environmental data ahead of Trump's inauguration. The event "is focused on preserving information and data from the Environmental Protection Agency, which has programs and data at high risk of being removed from online public access or even deleted," the organizers said. "This includes climate change, water, air, toxics programs."

So Soylentils, are there any US .gov public databases that you don't want to see disappear?


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 0) by Anonymous Coward on Wednesday February 15 2017, @05:36AM (#467243)

    Keep the raw sensor data, along with sensor specifications and calibration data. Junk anything derived from that, since keeping it around would imply endorsement of the picking and choosing of data points and worse.

  • (Score: 0) by Anonymous Coward on Wednesday February 15 2017, @05:47AM (#467246)

    I'd say keep both in an ideal world, but make the raw data + code a priority.

    • (Score: 3, Insightful) by Demena (5637) on Wednesday February 15 2017, @12:03PM (#467341)

      Not possible in many cases. Raw datasets can be huge; they are simply prohibitively expensive to store. This will always be the case.

      If I have a thermocouple I can sample it at many different rates. But I will pick _a_ rate. At that rate I will send signals back. Your first level of filtration is the device itself.
      Depending on what I need the data for, I may only be interested in the maxima and the minima for the day, and only send on or store those figures. Filtration by use.
      I may have no interest in the data at all unless it exceeds 451 Fahrenheit. Until that occurs, nothing is passed on.

      'Raw' data is in the eye of the beholder; true raw data is inevitably voluminous.
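
[Ed. note: a toy sketch of the reduction the parent comment describes - raw samples in, daily extremes or threshold events out. All readings and thresholds are made up.]

import random

def sample_thermocouple(n):
    """Stand-in for raw thermocouple readings (degrees Fahrenheit)."""
    return [random.uniform(60.0, 500.0) for _ in range(n)]

def daily_extremes(samples):
    """Filtration by use: keep only the day's minimum and maximum."""
    return min(samples), max(samples)

def threshold_events(samples, limit=451.0):
    """Filtration by threshold: nothing is passed on until the limit is exceeded."""
    return [s for s in samples if s > limit]

if __name__ == "__main__":
    raw = sample_thermocouple(86_400)            # one reading per second for a day
    print("raw samples:", len(raw))
    print("daily min/max:", daily_extremes(raw))
    print("readings above 451 F:", len(threshold_events(raw)))

Storing only the extremes or the threshold events cuts a day of per-second readings down to a handful of values, which is the commenter's point about why archived "raw" data is rarely truly raw.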