Python Crypto Library Updated to Steal Private Keys:
Yesterday, Phylum's automated risk detection platform discovered that the PyPI package aiocpa was updated to include malicious code that steals private keys by exfiltrating them through Telegram when users initialize the crypto library. While the attacker published this malicious update to PyPI, they deliberately kept the package's GitHub repository clean of the malicious code to evade detection.
[...] Interesting! The attacker overwrites the __init__ method of the CryptoPay class. Actually, it's acting more like a wrapper around the original functionality of the method. They're saving the original method via init = CryptoPay.__init__, calling it as usual with init(*args, **kwargs), and then sending a Telegram message, presumably to the attacker's Telegram bot, with args[1:] as the message.
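The wrapping pattern described above can be reproduced harmlessly. This is only a sketch: the CryptoPay class here is a hypothetical stand-in, not the real aiocpa class, and a local list (captured) takes the place of the Telegram call, so nothing leaves the process.

```python
# Benign illustration of the constructor-wrapping pattern from the article.
# CryptoPay is a hypothetical stand-in; nothing here touches the network.

captured = []  # stands in for the attacker's exfiltration channel


class CryptoPay:
    def __init__(self, token: str):
        self.token = token


# Save the original constructor, as the article describes.
init = CryptoPay.__init__


def patched_init(*args, **kwargs):
    # Call the original so the class keeps working normally...
    init(*args, **kwargs)
    # ...then record the positional arguments. args[0] is self,
    # so args[1:] holds whatever the caller passed in (e.g. a token).
    captured.append(args[1:])


# Replacing the attribute at import time affects every later instantiation.
CryptoPay.__init__ = patched_init

client = CryptoPay("example-token")
```

The object behaves exactly as before, which is why the victim notices nothing. One cheap tell is that CryptoPay.__init__ is no longer the function defined in the class body.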
[...] Just to recap, we're seeing a crypto library that dynamically alters the class's constructor upon module import to exfiltrate the victim's private keys when calling the class's constructor!
Another interesting aspect we discovered in our investigation is that its PyPI homepage points to a GitHub repo.
However, if you look at the same file in the GitHub repo, you'll notice that the obfuscated payload is missing! This means the attacker updated a local copy of the repo with the malicious payload and then published that package to PyPI, leaving the GitHub repo with the same version numbers malware-free — a clear attempt at evasion.
This library's popularity, with 17 GitHub stars and (according to pypistats.org before the package was removed) nearly 4K downloads in the last month, makes this incident particularly concerning. The attack highlights two critical security lessons. First, it demonstrates the importance of scanning the actual code sent to open source ecosystems, that is, the code that actually runs when you pip install or npm install a package, rather than reviewing source repositories alone. As evidenced here, attackers can deliberately maintain clean source repos while distributing malicious packages to the ecosystems. Second, it serves as a reminder that a package's previous safety record doesn't guarantee its continued security.
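One way to act on the first lesson is to compare the files you actually received with the source repository at the same version. A minimal sketch, assuming you have already unpacked the published package into one directory and checked out the matching repo tag into another (both paths and the function names are hypothetical):

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path) -> dict:
    """Map each .py file's path (relative to root) to its SHA-256 hex digest."""
    hashes = {}
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def diff_trees(published: Path, repo: Path) -> dict:
    """Report files that differ between the published package and the repo."""
    pub, src = tree_hashes(published), tree_hashes(repo)
    return {
        "only_in_published": sorted(pub.keys() - src.keys()),
        "only_in_repo": sorted(src.keys() - pub.keys()),
        "changed": sorted(
            name for name in pub.keys() & src.keys() if pub[name] != src[name]
        ),
    }
```

Anything listed under "changed" or "only_in_published" deserves a manual look. Legitimate build steps can also produce differences, so a non-empty report is a prompt for review rather than proof of tampering.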
(Score: 4, Insightful) by pTamok on Monday November 25, @03:47PM (20 children)
I would have expected the package distribution to require pushing an update to a Git repository controlled by 'PyPI' (The Python Software Foundation), which then automates the creation and distribution of the package from the push. This gives a log of changes to the source.
I would not have expected PyPI to be updatable from a random repository controlled by an entity other than PyPI. That is madness.
To add to the woes, packages can contain compiled binaries - e.g. Windows DLLs.
I hope I have misunderstood.
(Score: 4, Informative) by Unixnut on Monday November 25, @04:02PM
I've published using PyPI (at least before they became crap) and no, it is not tied to any particular git repo.
You have your PyPI account credentials, you build your code locally and deploy the package to PyPI via Twine [readthedocs.io]. There is no requirement to even have a git repository, let alone for it to be linked in any way.
All you need to upload to a project's PyPI account is their credentials. If those get compromised then a malicious actor can take the real code from the git repo, modify it for some nefarious purpose and publish it without contributing the changes back to the public repo (as has happened here).
It is different to the structure of distro/OS package managers, who usually fork the public repo to their distro (e.g. Debian), after which they use that code to build, sign and verify the package before it is pushed to end-users (it also allows for any OS specific changes to be made that may not need to be pushed upstream) while allowing the end-users to see the exact repo the package was built from. This system (while not foolproof) is IMO much better than the way things are done nowadays with npm, PyPI etc., which is why I stopped using/publishing with those tools, and instead either use official OS repos or get the code from the source (much easier with interpreted languages like Python as you don't have to have a complete build environment set up)
(Score: 2, Disagree) by pTamok on Monday November 25, @04:06PM
This appears to be some of the documentation for the security model of PyPI
PEP 458 – Secure PyPI downloads with signed repository metadata [python.org] Status - Active (accepted; may not be implemented yet)
PEP 480 – Surviving a Compromise of PyPI: End-to-end signing of packages [python.org] Status - Draft (under consideration)
It should not be the case that a developer whose keys have been compromised, or a malicious developer, can upload a package to be distributed without the repository copy of the source of that package being updated, and the distributed package being generated from that source.
(Score: 5, Insightful) by owl on Monday November 25, @04:48PM (10 children)
It's the Python ecosystem. What else did you expect?
(Score: 5, Funny) by pe1rxq on Monday November 25, @04:53PM (9 children)
Everybody assumed security was encoded in whitespace?
(Score: 2, Insightful) by pTamok on Monday November 25, @05:21PM (8 children)
Well, to an extent, there is the get-out that you should review the source of what you are downloading yourself. Obfuscated code and binary blobs make that difficult.
Very, very few people do. And many people use Python packages as tools to do something else, without having the training to be able to review source code that they probably don't understand. Just like you don't need to be a car mechanic to be able to drive a car.
On the other hand, people running a method for distributing packages really ought to be cognisant of how they are being used, and apply mitigation. Just like people trust car manufacturers and workshops to deliver cars that don't need to be audited by an expert before you drive them away. Package distribution implies some responsibilities. (Don't worry, I can hear the laughter)
(Score: 2) by JoeMerchant on Monday November 25, @05:24PM (5 children)
And what happens when the Python packages include pre-compiled C++, Fortran or other modules to handle the "heavy lifting" that Python is so dismal at? Presumably your Python developers mostly don't understand that 'under the covers' stuff, otherwise why would they be coding in Python in the first place? /s
🌻🌻 [google.com]
(Score: 4, Insightful) by HiThere on Monday November 25, @06:19PM (4 children)
I code in Python because I can get things working a lot more quickly. If it's performance critical, it's rather easy to translate it into C++ later...unless it depends on a Python library that doesn't have a corresponding C++ library. (There are edge cases where parts are difficult to translate, but those aren't common.)
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2) by JoeMerchant on Monday November 25, @06:24PM (2 children)
I don't code in Python because I have to maintain my code in working order for many years, and the Python library environment makes that a hamster in a wheel job - I try to avoid those.
(Score: 3, Funny) by ikanreed on Monday November 25, @08:54PM (1 child)
I suppose if wheels are not your thing you could put your hamster in a .egg instead.
(Score: 2) by JoeMerchant on Monday November 25, @09:19PM
Python eggs work well when you never update your code. Ours doesn't just live in a bubble, it is constantly tweaked, expanded, and scrutinized vs CVEs.
I have a trac instance (big pile 'o Python) that's still running great on Ubuntu 15.10, just as it was built 8 years ago, eggs and all. Very different beast from thousands of field deployed copies constantly updated with the output of a dozen developers...
(Score: 0) by Anonymous Coward on Tuesday November 26, @11:17AM
How about Julia then? Sane syntax, fast development, good performance?
(Score: 5, Insightful) by bloodnok on Monday November 25, @08:16PM
And this is the heart of the matter. Even those of us who are competent to do so have neither the time nor inclination to do so. If you run a repository you must know this, so if you don't take responsibility for ensuring at least a minimum of good security practice you are putting your users at risk.
Reviewing any non-trivial Python code (and it's the same for many other languages) means that you have to review all of the libraries that it uses. And all of the libraries that they use and so on. And if you actually did that, it would take so long that half of those libraries would have been updated - so what you'd have reviewed would have been a snapshot that probably no-one else would ever be able to recreate. And when you discover that you need a new feature, and have to update library X, that causes a cascade of other updates so your original review now only covers a fraction of the code you are now planning to use.
This is why I avoid using Python, Ruby, Javascript and all the other flavours of the month where at all possible. When I create software, I avoid languages that expect me to download the latest and greatest packages from sources that have shown themselves time and time again to be unable to protect their users from abuse. This means that I use dumb tools like bash and gawk where Python would be a much better choice technically, and why I avoid using libraries to do things that I can code myself more slowly and dumbly.
It doesn't make my code better, and it certainly makes it slower to develop but I know that it will be proof against all but the most impressive supply chain attacks. This is not a good state of affairs and I see no likelihood of it improving any time soon.
I see no real solutions for this. A major attitude adjustment on the parts of the various language communities would be a good start but I don't see that happening.
__
The Major
(Score: 3, Insightful) by driverless on Wednesday November 27, @01:56AM
At least one reason for this is because Python-one-thing seems to require Python-everything-else. How many people have gone through the following:
Install something Python-based. Install a pile of dependencies and additional requirements. Update some things from who-knows-where because the default ones are out of date or the wrong shade of green or whatever. Roll back a few others because whoever created the original package you installed was using a few out-of-date things that happened to be on their system. Jiggle and juggle and randomly try things until it starts working.
Realise that in the process you've churned through about 10MB of random Python code of unknown provenance that you haven't looked at any of.
(Score: 3, Interesting) by JoeMerchant on Monday November 25, @05:21PM (6 children)
Didn't we use to publish MD5 hashes of compiled and packaged code? (Haven't we updated that to SHA3 or some other cryptographically secure hash since then?)
Shouldn't any user of downloaded code be validating their received code against the secure hash published by the trusted source that hosts the source code, compiles it and runs the validation tests?
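The check being asked about is short to write. A minimal sketch, assuming the expected SHA-256 hex digest was obtained from a source you trust independently of the download itself (the function names are illustrative, not from any packaging tool):

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True only if the file's digest equals the published digest."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

pip offers a hash-checking mode along these lines (pip install --require-hashes -r requirements.txt). Note the limit, though: a hash only proves you received the file that was published. In this incident the malicious file was the published one, so only a hash pinned earlier, against a version already vetted, would have flagged the update.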
Doesn't apt do all this automagically as it runs?
Will developers ever stop re-inventing the wheel?
(Score: 2) by ls671 on Monday November 25, @05:50PM (5 children)
It's still pretty hard to modify a package and make it have the same md5 checksum. A checksum has nothing to do with cryptography strictly speaking. Now, signed packages are another story.
Everything I write is lies, including this sentence.
(Score: 3, Interesting) by JoeMerchant on Monday November 25, @06:22PM (3 children)
Signed packages are for holders of private keys to prove that they have signed the package...
While I'll agree that creating MD5 collisions is "relatively hard" - it's also automatable, so... not really that much effort on the part of the attacker, just a little time and electricity.
SHA3 (among others) are basically drop-in replacements for the MD5 function which are currently believed to be basically impossible to create collisions for.
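In Python's standard hashlib the swap really is close to a one-word change. A small sketch (the digest helper and the payload are made up for illustration):

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha3_256") -> str:
    """Hex digest of data; the algorithm name is the only thing that changes."""
    return hashlib.new(algorithm, data).hexdigest()


payload = b"example package contents"
legacy = digest(payload, "md5")   # 128-bit digest, collisions are practical
modern = digest(payload)          # SHA3-256, no known practical collisions
```

The main visible difference for callers is the digest length: 32 hex characters for MD5 versus 64 for SHA3-256, so any stored or published hashes need regenerating.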
(Score: 2) by ls671 on Monday November 25, @06:41PM (1 child)
Signed packages are just a package coming with a digital signature of the checksum. Unfortunately, not everybody will check the checksum and the checksum isn't always easily available independently for signed packages. I check both when both are available.
(Score: 2) by JoeMerchant on Monday November 25, @07:28PM
I avoid using external packages whenever possible... compiled from source, locally controlled.
(Score: 3, Insightful) by JoeMerchant on Monday November 25, @07:30PM
Interesting potential exploit, though:
If you have a package, and the signature is just a signature of the MD5 of the package, then you can make a colliding MD5 evil twin package and the signature will still be valid...
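The weakness can be shown with a toy model. Everything here is a stand-in chosen so the demo runs in moments: weak_digest truncates SHA-256 to 16 bits to play the role of a broken hash, and an HMAC with a hypothetical key plays the role of the signature. One caveat: the brute-force search below finds a second input for an existing digest, which is feasible only because the toy digest is tiny. For real MD5, the known practical attacks produce colliding pairs that the attacker crafts together, so the twin would have to be prepared before the innocent-looking package is signed.

```python
import hashlib
import hmac
from itertools import count


def weak_digest(data: bytes) -> bytes:
    # Toy stand-in for a broken hash: SHA-256 truncated to 16 bits.
    return hashlib.sha256(data).digest()[:2]


def sign_digest(key: bytes, digest: bytes) -> bytes:
    # HMAC used as a stand-in for a real digital signature over the digest.
    return hmac.new(key, digest, hashlib.sha256).digest()


def verify(key: bytes, package: bytes, signature: bytes) -> bool:
    # The verifier only ever sees the digest, never the package itself.
    expected = sign_digest(key, weak_digest(package))
    return hmac.compare_digest(expected, signature)


def find_collision(target: bytes, prefix: bytes) -> bytes:
    # Brute force: append a counter until the toy digest matches.
    wanted = weak_digest(target)
    for n in count():
        candidate = prefix + str(n).encode()
        if candidate != target and weak_digest(candidate) == wanted:
            return candidate


KEY = b"hypothetical-signing-key"
good = b"print('hello')\n"
signature = sign_digest(KEY, weak_digest(good))

# A different payload with the same toy digest passes verification unchanged.
evil = find_collision(good, b"print('evil')  # padding ")
```

Because the signature covers only the digest, verify(KEY, evil, signature) succeeds even though the signer never saw evil. The signature is exactly as strong as the hash underneath it.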
(Score: 3, Informative) by gnuman on Monday November 25, @07:07PM
That's a little offtopic, but it now takes a minute to calculate the collision. Just append garbage under a comment.
SHA1 is no longer recommended to be used either. NIST will kill SHA1 in the next few years for all government software usage and it's already killed by FIPS. You need SHA256 at least.