At least 100 instances of malicious AI/ML models were found on the Hugging Face platform, some of which can execute code on the victim's machine, giving attackers a persistent backdoor.
Hugging Face is a tech firm engaged in artificial intelligence (AI), natural language processing (NLP), and machine learning (ML), providing a platform where communities can collaborate and share models, datasets, and complete applications.
JFrog's security team found that roughly a hundred models hosted on the platform feature malicious functionality, posing a significant risk of data breaches and espionage attacks.
This happens despite Hugging Face's security measures, including malware, pickle, and secrets scanning, as well as scrutiny of the models' functionality to uncover behaviors like unsafe deserialization (pickle scanning is sketched below).
[...] The analysts deployed a honeypot to attract and analyze the activity and determine the operators' real intentions, but were unable to capture any commands during the day the connection remained established.
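As a rough illustration of what "pickle scanning" means here: a scanner can walk a pickle's opcode stream without executing it and flag imports of dangerous modules. A toy sketch, not Hugging Face's actual scanner (the blocklist and function name are illustrative):

import pickletools

# Illustrative blocklist -- real scanners use much more complete lists.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}

def scan_pickle(path):
    # Walk the opcode stream WITHOUT executing it; pickle imports show up
    # as GLOBAL opcodes whose argument is a "module name" string pair.
    # (A real scanner would also track STACK_GLOBAL, whose arguments
    # arrive via the preceding string opcodes.)
    findings = []
    with open(path, "rb") as f:
        data = f.read()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "GLOBAL" and arg.split()[0] in SUSPICIOUS_MODULES:
            findings.append((pos, arg))
    return findings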
(Score: 5, Informative) by grant on Saturday March 02 2024, @08:43PM (3 children)
This is because people are distributing Python pickle files in the models for the Python frameworks.
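To make that concrete: a pickle file isn't really data, it's a little program for the unpickler. Here's a minimal, harmless demo of why loading one is code execution (all standard library; the class name is made up):

import os
import pickle

class NotAModel:
    def __reduce__(self):
        # Tells the unpickler: "to rebuild this object, call os.system(...)".
        return (os.system, ("echo pwned",))

payload = pickle.dumps(NotAModel())
pickle.loads(payload)  # runs `echo pwned` -- loading IS executing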
If all you care about is inference, not training, you can avoid Python entirely.
The GGUF format used by llama.cpp and its derivatives is immune to this (pretty much anything non-Python for running LLMs derives from llama.cpp and its ggml library).
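For contrast, GGUF is a dumb binary container: you read magic bytes, a version, and counts with struct, and nothing gets deserialized into live objects. A rough sketch of reading the header (field layout per the GGUF spec as I understand it; v1 used 32-bit counts):

import struct

def gguf_header(path):
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":               # magic bytes
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        # v2+ headers: tensor count and metadata key/value count as uint64.
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
    return version, n_tensors, n_kv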
llama.cpp is also a ton easier to set up than any of the Python stuff, and with the new Vulkan back-end, you can offload layers to the iGPU on your Ryzen laptop without any proprietary drivers/libraries like CUDA/ROCm. The Vulkan back-end works great with the free Mesa drivers and the distribution-packaged libvulkan-dev (Debian). It should also work with anything that has Vulkan drivers, but I haven't personally tried it on an Intel laptop iGPU.
Vulkan will let you access more than the amount of memory reserved as "video memory" in the BIOS. On my shitty Ryzen 3 (the 2-core one made on GlobalFoundries' old node, not TSMC's, so pretty much as low-end as you can get [except I upgraded the RAM to 32GB]), I have access to 16GB of video memory via Vulkan, while the BIOS has a hard-set, non-adjustable 2GB of video memory reserved. A quantized 13B model will offload entirely to GPU on this low-end laptop, and a 34B model will run SLOWLY with a little more than half the layers offloaded. Tiny 3B-ish models like Phi produce multiple tokens per second, which would be pretty usable if not for how terribly the tiny models perform.
On my laptop, GPU offload doesn't really speed inference up versus running on the CPU, but it does leave the CPU cores free, so I can run a model while doing other things. And I get to use the 2GB of RAM that is reserved as video RAM. The fans spin up a lot, but there's no impact (other than thermal) on stuff running on the CPU.
If you have a recent Mac, llama.cpp runs super fast using the Metal back-end to offload layers to the GPU.
https://github.com/ggerganov/llama.cpp [github.com]
whisper.cpp:
https://github.com/ggerganov/whisper.cpp [github.com]
(It uses the same ggml library as llama.cpp, but doesn't support Vulkan.) whisper.cpp will run the Whisper auto-transcription/translation models in the older GGML format (avoiding the Python issue). I use this to transcribe downloaded YouTube instructional videos that don't already have a transcription available, so I can index the video content with recoll. Run it as 'nice -n 20 ionice -c 3 whisper.cpp ...' or it brings the laptop to its knees. Hoping it gets Vulkan support eventually too.
There is also stable-diffusion.cpp, which is unaffiliated with ggerganov:
https://github.com/leejet/stable-diffusion.cpp [github.com]
Generative art.
Haven't used this one yet, but I did try out the original Stable Diffusion Python stuff when it was first announced. The Python stuff was (IMO) terrible to set up.
TheBloke on Hugging Face has quantized versions of most popular models in GGUF/GGML formats for download. Some llama.cpp derivatives like ollama have model downloading built in (no Vulkan support in ollama yet, though, as far as I know).
(Score: 3, Informative) by mth on Saturday March 02 2024, @10:47PM (1 child)
You can use Python with the safetensors format. Using pickle as a distribution format was never a good idea.
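A minimal safetensors round-trip looks something like this (assuming the safetensors and torch packages are installed; the file name is arbitrary):

import torch
from safetensors.torch import save_file, load_file

save_file({"weight": torch.zeros((2, 2))}, "model.safetensors")
tensors = load_file("model.safetensors")  # raw bytes + JSON header; no code runs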
(Score: 2, Informative) by grant on Sunday March 03 2024, @03:36AM
Yeah, using pickle is the issue behind the vulnerability in the Python-based stuff, not Python itself. But with the *.cpp stuff, you get safe models AND easy setup:
For plain CPU-only inference, there are zero deps, just:
make
You're done setting things up. You can now run inference on any llama family model you have downloaded.
For Vulkan, if your system can already run Vulkan stuff like games, there is only a single dep to install (assuming Debian):
apt install libvulkan-dev
make LLAMA_VULKAN=1
If you have never run Vulkan stuff before, you may also need to:
apt install mesa-vulkan-drivers vulkan-tools
No conda/miniconda + virtual env with a million deps, and no files you need to edit to change the back-end (everything assumes CUDA by default); if any of that is documented anywhere besides incomplete write-ups on random websites, I failed to find it.
Maybe I was holding the Python stuff wrong out of ignorance about the platform, but llama.cpp and friends are so easy to set up that it is probably an improvement even for folks who are familiar with the Python way. For folks like me who aren't deep into Python, it was a huge improvement.
(Score: 3, Insightful) by The Vocal Minority on Sunday March 03 2024, @04:49AM
I'm out of mod points, but thanks for this. I do tend to customise and fine-tune the models I get from Hugging Face, so I'm not sure this is as useful for me, but it's worth knowing about all the same.
(Score: 0) by Anonymous Coward on Sunday March 03 2024, @06:04PM
Make sure to pay for an API instead to keep yourself safe. Don't even think about running your own without our guardrails. Google shunting malware to the top of the search results for things they don't like is fine. Nobody needs to know about that. Not an issue. ChatGPT is all you need.
(Score: 2) by maxwell demon on Monday March 04 2024, @08:50AM (3 children)
I guess those developers thought of the other meaning of "intelligence" when they developed their artificial intelligence code.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by Freeman on Monday March 04 2024, @07:51PM (2 children)
It's called "artificial" intelligence for a reason. It's certainly as smart as a Smart TV or a smartphone.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by Freeman on Monday March 04 2024, @07:57PM
Though thinking on the term "smart device", I'm led to remember a show called "Get Smart".
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by maxwell demon on Tuesday March 05 2024, @08:02PM
You seemingly didn't get my comment. Hint: The CIA is not named after its IQ.
The Tao of math: The numbers you can count are not the real numbers.