
posted by Fnord666 on Sunday August 02 2020, @04:41PM
from the seriously-cool-maths dept.

IBM completes successful field trials on Fully Homomorphic Encryption:

Yesterday, Ars spoke with IBM Senior Research Scientist Flavio Bergamaschi about the company's recent successful field trials of Fully Homomorphic Encryption. We suspect many of you will have the same questions that we did—beginning with "what is Fully Homomorphic Encryption?"

FHE is a type of encryption that allows direct mathematical operations on the encrypted data. Upon decryption, the results will be correct. For example, you might encrypt 2, 3, and 7 and send the three encrypted values to a third party. If you then ask the third party to add the first and second values, then multiply the result by the third value and return the result to you, you can then decrypt that result—and get 35.
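For the curious, here is a minimal sketch of that workflow in Python, using a toy symmetric somewhat-homomorphic scheme over the integers (in the spirit of the DGHV construction, and purely illustrative; the key size, plaintext modulus T, and noise bound are arbitrary choices, and a real FHE scheme such as IBM's adds lattice structure, public keys, and bootstrapping):

    import secrets

    # Toy somewhat-homomorphic scheme (illustrative only, not IBM's algorithm).
    # Secret key: a large odd integer p. Plaintexts: integers modulo T.
    T = 1_000          # plaintext modulus (results are correct mod T)
    P_BITS = 256       # size of the secret key in bits
    NOISE = 2**32      # bound on the per-ciphertext random noise

    def keygen():
        # Large odd key with its top bit forced, so the noise always stays below it.
        return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1

    def encrypt(p, m):
        q = secrets.randbits(2 * P_BITS)   # random multiple of the key
        r = secrets.randbelow(NOISE)       # small random noise
        return q * p + T * r + (m % T)

    def decrypt(p, c):
        return (c % p) % T                 # strip the key term, then the noise

    p = keygen()
    c2, c3, c7 = encrypt(p, 2), encrypt(p, 3), encrypt(p, 7)

    # A third party can do this arithmetic without ever holding p...
    result = (c2 + c3) * c7

    # ...and only the key holder can read the answer.
    print(decrypt(p, result))              # 35, i.e. (2 + 3) * 7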

You don't ever have to share a key with the third party doing the computation; the data remains encrypted with a key the third party never received. So, while the third party performed the operations you asked it to, it never knew the values of either the inputs or the output. You can also ask the third party to perform mathematical or logical operations on the encrypted data with non-encrypted data; for example, in pseudocode, FHE_decrypt(FHE_encrypt(2) * 5) equals 10.
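Continuing the same toy sketch above (again, an illustration rather than a real FHE library), mixing encrypted and plain values works the same way:

    c = encrypt(p, 2)
    print(decrypt(p, c * 5))    # 10: plaintext 5 times the encrypted 2
    print(decrypt(p, c + 40))   # 42: plaintext 40 added to the encrypted 2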

[...] Although Fully Homomorphic Encryption makes things possible that otherwise would not be, it comes at a steep cost. Charts in the article show the additional compute power and memory resources required to operate on FHE-encrypted machine-learning models: roughly 40 to 50 times the compute and 10 to 20 times the RAM that would be required to do the same work on unencrypted models.

[...] Each operation performed on a floating-point value decreases its accuracy a little bit—a very small amount for additive operations, and a larger one for multiplicative. Since the FHE encryption and decryption themselves are mathematical operations, this adds a small amount of additional degradation to the accuracy of the floating-point values.
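The toy integer scheme sketched above shows an analogous mechanism (this only illustrates noise growth, not IBM's actual error model): every ciphertext carries a hidden noise term that grows slowly under addition and rapidly under multiplication, and once it exceeds the key the result no longer decrypts correctly.

    def noise(p, c):
        # Size in bits of the noise carried by a ciphertext (needs the key).
        return (c % p).bit_length()

    a, b = encrypt(p, 2), encrypt(p, 3)
    print(noise(p, a), noise(p, a + b), noise(p, a * b))
    # e.g. ~42, ~43, and ~84 bits: addition adds the noise terms,
    # multiplication multiplies them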

[...] As daunting as the performance penalties for FHE may be, they're well under the threshold for usefulness—Bergamaschi told us that IBM initially estimated that the minimum efficiency to make FHE useful in the real world would be on the order of 1,000:1. With penalties well under 100:1, IBM contracted with one large American bank and one large European bank to perform real-world field trials of FHE techniques, using live data.

[...] IBM's Homomorphic Encryption algorithms use lattice-based encryption, are significantly resistant to quantum-computing attacks, and are available as open-source libraries for Linux, macOS, and iOS. Support for Android is on its way.


Original Submission

 
  • (Score: 0) by Anonymous Coward on Monday August 03 2020, @09:26AM (5 children)

    by Anonymous Coward on Monday August 03 2020, @09:26AM (#1030664)

    An example would be summary statistics when you need to maintain privacy. I give you blobs of encrypted data about people, and you can compute the row counts, mean age, standard deviations of income, linear model (lm) fits, or whatever. This would allow me to put the encrypted but still usable sensitive data on one of the regular clusters instead of waiting multiple days or weeks for a slow desktop in the secure room, or having to pay for time on the more expensive audited and approved clusters. It can be a lot more time- and money-efficient to do it that way on large data sets instead of doing it another way. A rough sketch of that workflow is below.

    If that is still unclear, I can try again with another example or answer your questions.
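    Reusing the toy scheme from the summary above (a sketch only; a real deployment would use a hardened lattice-based library), the flow for an encrypted mean looks roughly like this: the untrusted cluster only ever adds ciphertexts, and the data owner decrypts the total and does the final division.

        # Data owner: encrypt a sensitive column before shipping it to the cluster.
        ages = [34, 29, 61, 45, 52]
        enc_ages = [encrypt(p, a) for a in ages]

        # Untrusted cluster: sum the ciphertexts without ever seeing the key.
        enc_total = sum(enc_ages)

        # Data owner: decrypt the total and finish the non-homomorphic division.
        mean_age = decrypt(p, enc_total) / len(ages)
        print(mean_age)          # 44.2, computed without exposing any row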

  • (Score: 0) by Anonymous Coward on Monday August 03 2020, @05:55PM (4 children)

    by Anonymous Coward on Monday August 03 2020, @05:55PM (#1030811)

    The question is: wouldn't it usually be cheaper for me to just do the computation myself than to encrypt the data, send it over to you, wait for you to compute on it and send it back, and then decrypt the result?

    I can't imagine too many situations where this wouldn't be the case.

    • (Score: 0) by Anonymous Coward on Monday August 03 2020, @06:38PM

      by Anonymous Coward on Monday August 03 2020, @06:38PM (#1030852)

      There are a few cases.

      It allows someone to cast a vote, receive verification that the vote was counted, but not reveal the way they voted to anyone.

      It allows a group of people who don't trust each other to determine the result of some calculation that affects all of them. For example, a group could determine who won a lottery without trusting each other or anyone external.

      It allows for anonymous collection of statistics which can be totaled without having to reveal anyone's individual contributions.

      It allows a group of people to create an encrypted document that can only be decrypted if a certain fraction of them agree, but without anyone having to share keys with each other.

    • (Score: 0) by Anonymous Coward on Monday August 03 2020, @08:44PM (2 children)

      by Anonymous Coward on Monday August 03 2020, @08:44PM (#1030924)

      I'll give you a more specific example. I have a controlled data set that is in excess of 10 TB: approximately 600 columns and 10 million rows. Since it is controlled, I can only have it in unencrypted form on certain computers. The most powerful machine I can run anything on is a single server-class desktop in the secure room; while beefy, it is very limited compared to one of our clusters, and it's made worse because it is sharing jobs with other people. Jobs on that data set take literal weeks if not months; there are some things I'd love to do with the data, but they would take years of wall time. Jobs on even larger data sets, run on a cluster, take less than a week, and most take a day. It is multiple orders of magnitude faster.

      When comparing costs, encrypting the data once is relatively cheap, as there are a finite number of values and most are repeated (e.g., true/false or categories coded as 1-5); even with a scheme that makes every ciphertext unique, it is still fast since each datum is only handled once. I can then have IT load that data onto the cluster SAN and get results back in a day or two, instead of waiting relatively forever. My productivity goes up because I can iterate on a single data set faster and don't have to do as much context switching between multiple projects. It is also cheaper because we don't have to keep excess capacity for worst-case situations all over the place or suffer from bad scaling; there is a reason clusters became a thing, after all. Additionally, I could use the cheaper general-purpose clusters instead of the more secure, approved clusters for that data, saving money that way. Depending on the exact job, I might even be able to put it on an almost-free but still very fast grid, something that would give my boss a heart attack (and get me fired) if I suggested it for unencrypted controlled data.

      • (Score: 0) by Anonymous Coward on Tuesday August 04 2020, @07:58AM

        by Anonymous Coward on Tuesday August 04 2020, @07:58AM (#1031165)

        This is exactly why health organizations, militaries, the EFF, and the Mozilla Foundation want this ability.

      • (Score: 0) by Anonymous Coward on Tuesday August 04 2020, @08:25AM

        by Anonymous Coward on Tuesday August 04 2020, @08:25AM (#1031170)

        Also it bothers me very much that your post and other superb ones in this thread were not upvoted. Shame on Soylents for letting gold pass by unheeded.

        Shame!