posted by janrinok on Sunday December 18 2016, @02:55PM

The latest issue of Wired has an interesting article about a 29-year-old mathematician who is using crowdsourced machine learning to run a hedge fund.

Richard Craib is a 29-year-old South African who runs a hedge fund in San Francisco. Or rather, he doesn't run it. He leaves that to an artificially intelligent system built by several thousand data scientists whose names he doesn't know.

Under the banner of a startup called Numerai, Craib and his team have built technology that masks the fund's trading data before sharing it with a vast community of anonymous data scientists. Using a method similar to homomorphic encryption, this tech works to ensure that the scientists can't see the details of the company's proprietary trades, but also organizes the data so that these scientists can build machine learning models that analyze it and, in theory, learn better ways of trading financial securities.

"We give away all our data," says Craib, who studied mathematics at Cornell University in New York before going to work for an asset management firm in South Africa. "But we convert it into this abstract form where people can build machine learning models for the data without really knowing what they're doing."

He doesn't know these data scientists because he recruits them online and pays them for their trouble in a digital currency that can preserve anonymity. "Anyone can submit predictions back to us," he says. "If they work, we pay them in bitcoin."

So, to sum up: They aren't privy to his data. He isn't privy to them. And because they work from encrypted data, they can't use their machine learning models on other data—and neither can he. But Craib believes the blind can lead the blind to a better hedge fund.
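To make the quoted description more concrete, here is a minimal, hypothetical sketch of the participant's side of such a workflow in Python. It assumes the masked data arrives as anonymized numeric feature columns plus a target label; the file names and column names are illustrative placeholders, not Numerai's actual format.

    # Hypothetical participant workflow: train on obfuscated features whose
    # financial meaning is hidden, then submit predictions back to the fund.
    # File and column names below are assumptions, not Numerai's real format.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    train = pd.read_csv("numerai_training_data.csv")    # labeled, masked features
    live = pd.read_csv("numerai_tournament_data.csv")   # unlabeled rows to predict

    feature_cols = [c for c in train.columns if c.startswith("feature")]

    # The features are abstract numbers, so the model is fit without the
    # participant ever learning which trades or securities they describe.
    model = LogisticRegression(max_iter=1000)
    model.fit(train[feature_cols], train["target"])

    # Predictions go back to the fund; payment depends on live performance.
    predictions = model.predict_proba(live[feature_cols])[:, 1]
    pd.DataFrame({"prediction": predictions}).to_csv("submission.csv", index=False)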


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by fritsd (4586) on Sunday December 18 2016, @05:22PM (#442720) Journal

    I think that in this kind of situation, the company with the actual test data outputs (Numerai) holds all the cards.

    They can't share the test data outputs together with the training data and test data inputs, because that would defeat the purpose of the machine learning process: the committee members would just train with supervised learning on the combined training and test data, and you couldn't tell whether a model has a low RMS error because they cheated and overfitted [wikipedia.org], or because they did everything honorably and found a genuinely useful algorithm.

    That implies that the committee members can never verify whether they're being rewarded fairly, either.
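    To make the cheating scenario concrete, here's a toy Python sketch (all data synthetic, nothing from Numerai): a "model" that simply memorizes leaked test labels scores a perfect RMS error while learning nothing.

        # Toy demonstration of why the scorer must keep test labels private:
        # a lookup table of memorized (input, label) pairs gets RMS error 0.
        import numpy as np

        rng = np.random.default_rng(0)
        X_test = rng.normal(size=(100, 5))
        y_test = rng.normal(size=100)   # labels a dishonest participant has seen

        # The "model": memorize every leaked test row verbatim.
        memorized = {tuple(x): y for x, y in zip(X_test, y_test)}
        preds = np.array([memorized[tuple(x)] for x in X_test])

        rms = np.sqrt(np.mean((preds - y_test) ** 2))
        print(f"RMS error on the leaked test set: {rms:.6f}")  # exactly 0.0

        # On genuinely unseen rows the lookup table fails entirely (a KeyError
        # here), so the perfect score says nothing about real predictive skill.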

    Actually I just thought of something: Numerai can put the test data outputs in a public location afterwards. Then the committee members can download the (checksummed!!) used-up test data, check whether it reproduces the score they were rewarded with, and have furious phone calls with any data scientists or students they know, to ask whether those people were also shafted by Numerai, or whether the checksums don't match and each committee member got an individually faked test data set.
    If the (public!!) scoring function gives correct results on the test data, AND the test data is identical to your fellow competitors' downloads of the test data, AND it gives correct results for them too, then that proves Numerai is honorable.
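    A rough sketch of that verification step, assuming Numerai later publishes the used-up test set with its labels plus a SHA-256 checksum; the file names and the RMS scoring function here are my own placeholders:

        # Verify (1) everyone got byte-identical test data and (2) the public
        # scoring function reproduces the score you were actually paid for.
        import hashlib
        import numpy as np
        import pandas as pd

        PUBLISHED_CHECKSUM = "..."  # hypothetical: the checksum the fund announces

        with open("released_test_data.csv", "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()

        # If checksums differ between participants, someone was handed an
        # individually faked test set.
        assert digest == PUBLISHED_CHECKSUM, "test data doesn't match published checksum"

        # Rerun the public scoring function on your own stored predictions and
        # compare against the rewarded score.
        test = pd.read_csv("released_test_data.csv")
        mine = pd.read_csv("my_submitted_predictions.csv")["prediction"]
        rms = np.sqrt(np.mean((mine - test["target"]) ** 2))
        print(f"Recomputed RMS error: {rms:.6f}")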

    Would that work?

    The committee members don't have to publish their code or algorithms either: since the purpose is making oodles of money, they might be reluctant to do that anyway.
