Most of Edward Snowden's former job as an analyst has just been automated, along with much of the work done in related roles.
From the paper "Deep Feature Synthesis: Towards Automating Data Science Endeavors" (PDF 4.3 MB):
With these components in place, we present the Data Science Machine — an automated system for generating predictive models from raw data. It starts with a relational database and automatically generates features to be used for predictive modeling. Most parameters of the system are optimized automatically, in pursuit of good general purpose performance.
Developed by James Max Kanter and Kalyan Veeramachaneni at the "Computer Science and Artificial Intelligence Laboratory" (CSAIL) of the Massachusetts Institute of Technology (MIT), the "Data Science Machine (DSM)" enables much faster and more efficient automated analysis of data about human behavior, decisions, and activities. Previously, analysis of such data relied heavily on the intuition of the human data scientists analyzing it. The DSM and its Deep Feature Synthesis (DFS) are general approaches that do not need to be modified to run on new data sets.
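To make "generating features from a relational database" a little more concrete, here is a minimal sketch of the stacking idea behind Deep Feature Synthesis, under loudly stated assumptions: the toy customers/orders/order_items tables, the helper name `relational_features`, and the small set of aggregates are all hypothetical illustrations, not the paper's code (which works against MySQL). Each call aggregates a column of a child table into its parent table; calling it a second time on the result stacks the aggregates, producing a "deeper" feature such as the mean order total per customer.

```python
from statistics import mean

# Hypothetical toy schema (not from the paper): customers -> orders -> order_items.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 1},
    {"id": 12, "customer_id": 2},
]
order_items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 30.0},
    {"order_id": 12, "price": 8.0},
]

# An assumed, deliberately small set of aggregation primitives.
AGGREGATES = {"SUM": sum, "MEAN": mean, "MAX": max}


def relational_features(parents, children, fk, column, funcs, child_name):
    """Hypothetical helper: for each parent row, aggregate `column` over the
    child rows whose foreign key `fk` equals the parent's id, and store each
    result on the parent under a name like "SUM(order_items.price)"."""
    for parent in parents:
        values = [row[column] for row in children if row[fk] == parent["id"]]
        for name in funcs:
            feature = f"{name}({child_name}.{column})"
            if name == "COUNT":
                parent[feature] = len(values)
            else:
                # MEAN, MAX etc. are undefined for an empty child set.
                parent[feature] = AGGREGATES[name](values) if values else None


# Depth 1: order-level features computed from the order items.
relational_features(orders, order_items, "order_id", "price", ["SUM", "COUNT"], "order_items")

# Depth 2: customer-level features stacked on top of the depth-1 order totals.
relational_features(customers, orders, "customer_id", "SUM(order_items.price)", ["MEAN", "MAX"], "orders")

for customer in customers:
    print(customer)
```

For customer 1, whose two orders total 20.0 and 30.0, the stacked feature `MEAN(orders.SUM(order_items.price))` comes out as 25.0. The paper also describes other feature types and automatic feature selection that this sketch leaves out; it only illustrates how aggregates along one-to-many relationships can be composed without hand-written, data-set-specific code.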
In two large competitions the DSM did better than most teams of trained human data scientists (it beat 615 out of 904 teams), and in a third competition it beat 85.6% of the teams and achieved 95.7% of the top score (i.e. that of the best team). The paper describes the tests, and much else, in far more detail.
This is very big news considering the amount of human behavioural data being collected and stored by companies, governments, and agencies. Most of that data is analyzed, if at all, only for a few key features like <sarcasm>"sales & terrorism"</sarcasm>, because of the time involved and the shortage of data scientists, or of the analysts, data engineers, and machine learning researchers who fill the role of a data scientist (see Fig. 1 in the paper for an example of the typical workflow). The IEEE Spectrum article "Artificial Intelligence Outperforms Human Data Scientists" touches on some of this, and on how data science is seen as guaranteed, well-paid employment.
With the DSM, things will change. Since it automates huge amounts of hard work (and out-competes most human intuition), there is little or no reason not to analyze much more of the data, or all of it, automatically, and to get the results much faster: the proverbial haystacks are going to disappear and be replaced with needles sorted according to any chosen metric.
The DSM uses MySQL with the InnoDB storage engine, and its logic is written in Python. The system tunes its machine learning process with a Gaussian copula process.
(Score: 2) by arslan on Friday October 23 2015, @01:55AM
Adjust some knobs, push a button, wait a little, and out pops a Magic 8 Ball that says this person is the next target. Unequivocally emotion-free... Oh yeah! I bet the spooks are rubbing their grubby paws and cracking a sinister grin right about now.
(Score: 2) by VortexCortex on Friday October 23 2015, @04:20AM
These are fun times for digital hypnotists. Once digital versions of autonomic organelles are introduced, it becomes far easier to hypnotize a government. You see, all you need then is the algorithm by which the system functions, and you can control the system by providing apt inputs.
Something similar happens with the counterintelligence program. Everyone knows the simple equations they use to assess threats based on intent and capabilities, which makes COINTELPRO practically useless. In fact, one can deliberately attract their attention so that other observers can discover the limits of certain capabilities. At least in a system with humans, however bureaucratic they may be, there's a chance someone will say: "Wait, what the fuck, this doesn't actually make sense. It fits perfectly, too perfectly, but our methods aren't working on them. Are we being played?!" With an automated system, you'll be second-guessing the system itself.
A complex digital system works more like a physical brain. As with a physical brain, a hypnotist can supply patterns that lull the target into a suggestible state and then influence its actions. There are many conscious safeguards at the surface of the mind, so a cascade of underlying subconscious functions is activated, without the patient being aware of it, via a pattern of stimulation and relaxation of different brain structures. I'm not talking about inducing a catatonic trance state, but about a topic that has been hotly researched in advertising for decades. You can see the suggestible state on people's faces when they watch TV for more than 10 minutes. You can see the emotional responses invoked by commercials or by propaganda in the evening news.
You can see the digital counterpart when you research the bombing of Dresden, Germany, for a book you're writing, and the next week YouTube suggests videos implying it thinks you're a neo-Nazi. Google is relatively dumb that way, and if they haven't perfected the recommendation system, then this MIT "Big Data" rubbish will fare no better. Once the proper mesmerizing pattern is discovered and applied, the subject becomes more suggestible, and you can nudge it towards the conclusions the hypnotist desires. This is a handy trick if you want to hypnotize YouTube into forgetting your recent viewing pattern, or to ensure spies are never flagged as such by automated systems.
MIT can't release their code if they want defense contracts, because that would be like handing a neurologist's map of your brain to every evil hypnotist who wants to fuck with you. Oh, you've got a larger than average amygdala? Then we need to pulse the aggression circuit softly until the false alarms disarm you first and suppress your hypervigilance, and/or rank physical threats as the more effective distractions.
As we install more automated systems into our government, intelligence apparatus, and war machines, we are moving from 4th generation warfare to 6th generation warfare (5th gen being all that extremely deceptive feint-within-a-feint stuff that Russia likes so much -- better not to tangle up our bureaucracy with that, or you'll have the DoD and CIA accidentally fighting each other by proxy with ISIS and Syrian rebels, heh). You should read up on 4th generation warfare to have any hope of understanding world politics today. In 6th generation warfare we'll need data wizards who weave spells of passive disinformation to enchant without detection, spies with a knack for using "Jedi Mind Tricks" on computers to walk undetected under surveillance, and cyberneticians working as A.I. psychologists to ensure the mental health of an intelligence apparatus and make sure it has not been brainwashed by enemy hypnotists. How would an outsider discover which tracked inputs trigger which actions? We'll need experimental cases: apparently innocent targets who press certain combinations of buttons within the INTEL brain. Since only a very small subgroup of elite hackers is even beginning to study cracking A.I., the saying will hold true: "The Geek shall inherit the Earth".
Therein lies the rub. Even if you watch the watchers, if you're not a hacker you won't know whether they're lying or whether the digital patient has deceived you both, purposefully or accidentally. Every problem in CS can be solved by adding another level of indirection. Every problem in intelligence can be solved by removing another level of indirection. The twain shall never meet satisfactorily. AI is still in its infancy when it comes to intelligence. Would you let an infant tell you what to do? We need more HUMINT. We're better off funding the CIA, not this easily fooled cybercrap.
(Score: 2) by Yog-Yogguth on Wednesday October 28 2015, @09:29PM
A lot of what you say is right; I'll only point out a few things.
There is code in the paper. It's pseudocode, but that's a given even in many math papers (and in many ways this is close to a maths paper). But no, they have not released the full programs and never will, partly because the code is entirely subordinate to the math and methods, which are detailed in the paper. As you and many others must know, papers do not try to give the reader a full education; they only give the last bit.
Judging by the (widespread and shit) coverage I've seen on the internet (and there was a lot of coverage, most of it devoid of almost all content) and by the comments here, they could have released the source, precompiled binaries, and a fucking HOWTO, and people still wouldn't understand or care. Hopefully (and that's not a good thing) that is a misleading or wrong impression.
I had hoped (fuck) that more people would understand what they've done and what is new. I tried to spell it out for them, but almost nobody seems to have understood much, if any, of it.
I think this is a groundbreaking paper as far as automated analysis of human behavior is concerned. It has never been automated to anywhere near this extent before, as far as is known from public or leaked sources, or even, I dare say, by inference: not even Five Eyes (or anybody else) has had this capability. This is the current edge. In some ways it is also deceptively simple, which is probably why it's so fast (orders of magnitude faster than the competition).
All the other reporting on this except the IEEE's was total shit, and even the IEEE Spectrum article wasn't all that good. Reporting that appeared after this submission was likewise shit, written by people who did not seem to understand the paper (looking at NextBigFuture) or its ramifications for the vast stores of untouched data being kept. I worked very hard on this submission (on and off over two days; I haven't written anything this difficult in many, many years) to make it concise but informative, so that a larger group of people might understand what this is. So even though there was almost no discussion here of the many topics this impacts, I have to hope (that's too many times now, getting peeved :D ) that it might help make that happen elsewhere, or internally for individuals.