
SoylentNews is people

posted by janrinok on Thursday October 22 2015, @07:42PM
from the does-not-bode-well dept.

Most of Edward Snowden's previous job as an analyst just got automated along with a lot of the work done by related jobs.

From the paper "Deep Feature Synthesis: Towards Automating Data Science Endeavors" (PDF 4.3 MB):

With these components in place, we present the Data Science Machine — an automated system for generating predictive models from raw data. It starts with a relational database and automatically generates features to be used for predictive modeling. Most parameters of the system are optimized automatically, in pursuit of good general purpose performance.

Developed by James Max Kanter and Kalyan Veeramachaneni at the "Computer Science and Artificial Intelligence Laboratory" (CSAIL) of the Massachusetts Institute of Technology (MIT), the "Data Science Machine" (DSM) enables much faster and more efficient automated analysis of data related to human behavior, decisions, and activities. Previously, analysis of such data relied heavily on the intuition of the human data scientists analyzing it. The DSM and its Deep Feature Synthesis (DFS) are generalized approaches that do not require modification to run with new data sets.
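To make the idea of generating features from a relational database more concrete, here is a minimal sketch of the general technique: aggregating rows of a child table into per-entity features on a parent table. This is not the authors' implementation. The tables (`customers`, `orders`), the column names, the helper `synthesize_features`, and the choice of aggregations are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a parent table (customers) and a child table (orders)
# linked by the customer_id column.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]

# Assumed set of aggregation functions applied across the relationship.
AGGREGATIONS = {"count": len, "sum": sum, "mean": mean, "max": max}


def synthesize_features(parents, children, key, column):
    """Build one feature row per parent by aggregating a child column."""
    # Group the child values by the foreign key.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[column])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        feats = {key: parent[key]}
        for name, fn in AGGREGATIONS.items():
            if values:
                feats[f"{name}({column})"] = fn(values)
            else:
                # No child rows: the count is 0, other aggregates are undefined.
                feats[f"{name}({column})"] = 0 if name == "count" else None
        features.append(feats)
    return features


features = synthesize_features(customers, orders, "customer_id", "amount")
for row in features:
    print(row)
```

Each resulting row could then be fed to an ordinary predictive model. Stacking such aggregations across several levels of relationships is what makes the synthesized features "deep".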

The DSM did better than the majority of trained human data scientists (it beat 615 out of 904 teams) in two large competitions, and in a third competition it beat 85.6% of the teams and achieved 95.7% of the top score (i.e. the score of the best team). The paper goes into much more detail about the tests and everything else.

This is very big news considering the amount of human behavioural data being collected and stored by companies, governments, and agencies. Most of that data is analyzed, if at all, only for a few key features like <sarcasm>"sales & terrorism"</sarcasm>, because of the time involved and the shortage of data scientists, or of the analysts, data engineers, and machine learning researchers who fill that role (see Fig. 1 in the paper for an example of the typical workflow). The IEEE Spectrum article "Artificial Intelligence Outperforms Human Data Scientists" touches upon some of this and on how the job of data scientist is seen as guaranteed employment for good pay.

With the DSM, things will change. Since it automates huge amounts of hard work (and out-competes nearly all human intuition), there's little or no reason not to automatically analyze much more, or all, of the data and get the results much faster: the proverbial haystacks are going to disappear and be replaced with needles sorted according to any chosen metric.

The DSM uses MySQL with InnoDB, with its logic written in Python. The DFS uses a Gaussian copula for tuning its machine learning process.



 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Thursday October 22 2015, @08:47PM (#253367)

    You mean all my k in student debt to get a research phd is wasted by some shit research phd?