
SoylentNews is people

posted by janrinok on Saturday March 02 2024, @07:16PM
from the no-sec-in-ai dept.

https://www.bleepingcomputer.com/news/security/malicious-ai-models-on-hugging-face-backdoor-users-machines/

At least 100 instances of malicious AI/ML models were found on the Hugging Face platform, some of which can execute code on the victim's machine, giving attackers a persistent backdoor.

Hugging Face is a tech firm engaged in artificial intelligence (AI), natural language processing (NLP), and machine learning (ML), providing a platform where communities can collaborate and share models, datasets, and complete applications.

JFrog's security team found that roughly a hundred models hosted on the platform feature malicious functionality, posing a significant risk of data breaches and espionage attacks.

This happens despite Hugging Face's security measures, including malware, pickle, and secrets scanning, as well as scrutiny of the models' functionality to discover behaviors like unsafe deserialization.

[...] The analysts deployed a honeypot to attract and analyze the activity and determine the operators' real intentions, but were unable to capture any commands during the period the connection was established (one day).



 
This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Informative) by grant on Saturday March 02 2024, @08:43PM (3 children)

    by grant (4922) on Saturday March 02 2024, @08:43PM (#1347126)

    This is because people distribute models for the Python frameworks as Python pickle files.
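    To illustrate why a pickled model is dangerous, here is a minimal, harmless Python sketch. Unpickling calls whatever callable an object's `__reduce__` method names, so merely loading the bytes runs code. The `Payload` class and its `sum(range(5))` expression are hypothetical stand-ins for a real attacker's payload, which would typically run a shell command instead.

```python
import pickle


class Payload:
    """Hypothetical malicious object; a real attack would call os.system or similar."""

    def __reduce__(self):
        # Instead of saving the object's state, pickle records
        # "call eval with this argument"; pickle.loads() performs that call.
        return (eval, ("sum(range(5))",))


blob = pickle.dumps(Payload())

# Loading the bytes executes eval("sum(range(5))") and returns its result,
# not a Payload instance.
result = pickle.loads(blob)
print(result)
```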

    If all you care about is inference, not training, you can avoid Python.

    The GGUF format used by llama.cpp and its derivatives is immune to this (pretty much any non-Python tool for running LLMs derives from llama.cpp and its ggml library).

    llama.cpp is also a ton easier to set up than any of the Python stuff, and with the new Vulkan back-end, you can offload layers to the iGPU on your Ryzen laptop without any proprietary drivers/libraries like CUDA/ROCm. The Vulkan back-end works great with the free Mesa drivers and the distribution-packaged libvulkan-dev (Debian). It should also work with anything that has Vulkan drivers, but I haven't personally tried it on an Intel laptop iGPU.

    Vulkan lets you access more than the amount of memory reserved as "video memory" in the BIOS. My laptop has a shitty Ryzen 3 (the 2-core one made on GlobalFoundries' old node, not TSMC, so pretty much as low-end as you can get [except I upgraded the RAM to 32GB]). I have access to 16GB of video memory via Vulkan, while the BIOS has a hard-set, non-adjustable 2GB of video memory reserved. A quantized 13B model offloads entirely to the GPU on this low-end laptop, and a 34B model runs SLOWLY with a little more than half the layers offloaded to the GPU. Tiny 3B-ish models like Phi produce multiple tokens per second, which would be pretty usable if not for how terribly the tiny models perform.
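    As a rough back-of-the-envelope check on those model sizes, weight memory is about parameter count times bits per weight, divided by 8. The sketch below assumes about 4.5 bits per weight for a typical 4-bit quantization (an assumption; actual quant formats vary) and ignores the KV cache and other runtime overhead.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB: params * bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 2**30


VRAM_GIB = 16          # Vulkan-visible memory on the laptop described in the comment
BITS_PER_WEIGHT = 4.5  # assumption: typical 4-bit quantization plus scale overhead

for name, params in [("3B", 3e9), ("13B", 13e9), ("34B", 34e9)]:
    size = weights_gib(params, BITS_PER_WEIGHT)
    verdict = "fits" if size < VRAM_GIB else "does not fit"
    print(f"{name}: ~{size:.1f} GiB of weights, {verdict} in {VRAM_GIB} GiB")
```

    Under these assumptions a 13B model needs roughly 6.8 GiB, leaving room for the KV cache, while a 34B model needs roughly 17.8 GiB before any overhead, which is consistent with only part of its layers being offloaded.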

    On my laptop, GPU offload doesn't really speed up inference compared to running on the CPU, but it does leave the CPU cores free, so I can run a model while doing other things. I also get to use the 2GB of RAM that is reserved as video RAM. The fans spin up a lot, but there's no impact (other than thermal) on stuff running on the CPU.

    If you have a recent Mac, llama.cpp runs super fast using the Metal back-end to offload layers to the GPU.

    https://github.com/ggerganov/llama.cpp

    whisper.cpp:
    https://github.com/ggerganov/whisper.cpp
    It uses the same ggml library as llama.cpp, but doesn't support Vulkan. whisper.cpp runs the Whisper automatic transcription/translation models in the older GGML format (avoiding the Python issue). I use it to transcribe downloaded YouTube instructional videos that don't already have a transcript, so I can index the video content with Recoll. Run it as 'nice -n 20 ionice -c 3 whisper.cpp ...' or it brings the laptop to its knees. I'm hoping it eventually gets Vulkan support too.
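    A minimal Python sketch of that transcription workflow, assuming a hypothetical whisper.cpp binary at `./main` and a hypothetical model file `models/ggml-base.en.bin`. whisper.cpp expects 16 kHz mono WAV input, so the audio is converted with ffmpeg first, and the heavy step runs under `nice`/`ionice`. The sketch uses a niceness of 19, the lowest priority; the `nice -n 20` above also works because out-of-range values are clamped to 19.

```python
import subprocess
from pathlib import Path

WHISPER_BIN = "./main"                     # hypothetical path to the whisper.cpp binary
WHISPER_MODEL = "models/ggml-base.en.bin"  # hypothetical model file


def to_wav_cmd(video: Path, wav: Path) -> list[str]:
    """ffmpeg command producing the 16 kHz, mono, 16-bit PCM WAV whisper.cpp expects."""
    return ["ffmpeg", "-i", str(video), "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", str(wav)]


def transcribe_cmd(wav: Path) -> list[str]:
    """whisper.cpp at lowest CPU priority (nice 19) and idle I/O class (ionice -c 3)."""
    return ["nice", "-n", "19", "ionice", "-c", "3",
            WHISPER_BIN, "-m", WHISPER_MODEL, "-f", str(wav), "-otxt"]


def transcribe(video: Path) -> None:
    """Convert, then transcribe; -otxt writes a plain-text transcript Recoll can index."""
    wav = video.with_suffix(".wav")
    subprocess.run(to_wav_cmd(video, wav), check=True)
    subprocess.run(transcribe_cmd(wav), check=True)
```

    The command builders are kept separate from `transcribe` so the argument lists can be checked without running ffmpeg or whisper.cpp.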

    There is also stable-diffusion.cpp, which is not affiliated with ggerganov:
    https://github.com/leejet/stable-diffusion.cpp
    It does generative art. I haven't used it yet, but I did try the original Stable Diffusion Python code when it was first announced. The Python stuff was (IMO) terrible to set up.

    TheBloke on Hugging Face has quantized versions of most popular models in GGUF/GGML formats for download. Some llama.cpp derivatives, like ollama, have model downloading built in (no Vulkan support in ollama yet, though).

  • (Score: 3, Informative) by mth on Saturday March 02 2024, @10:47PM (1 child)

    by mth (2848) on Saturday March 02 2024, @10:47PM (#1347147)

    You can use Python with the safetensors format. Using pickle as a distribution format was never a good idea.
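    The safetensors layout is simple enough to read with the standard library, which shows why it is safe: a file is an 8-byte little-endian header length, a JSON header describing each tensor (`dtype`, `shape`, `data_offsets`), then raw tensor bytes. A minimal sketch for inspection only; `read_safetensors_header` and `tensor_bytes` are hypothetical helper names.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Return the JSON header of a safetensors blob; only JSON is parsed, no code runs."""
    (header_len,) = struct.unpack_from("<Q", data, 0)
    header = json.loads(data[8:8 + header_len])
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def tensor_bytes(data: bytes, header: dict, name: str) -> bytes:
    """Raw bytes of one tensor; data_offsets are relative to the end of the header."""
    (header_len,) = struct.unpack_from("<Q", data, 0)
    begin, end = header[name]["data_offsets"]
    base = 8 + header_len
    return data[base + begin:base + end]


# Build a tiny blob by hand: one float32 tensor of shape [2] holding 1.0 and 2.0.
payload = struct.pack("<2f", 1.0, 2.0)
hdr = json.dumps({"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}).encode()
blob = struct.pack("<Q", len(hdr)) + hdr + payload

h = read_safetensors_header(blob)
print(h["w"]["shape"], struct.unpack("<2f", tensor_bytes(blob, h, "w")))
```

    In practice you would load files with the `safetensors` package itself (for example `safetensors.torch.load_file`); the point here is only that reading the format never calls arbitrary functions.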

    • (Score: 2, Informative) by grant on Sunday March 03 2024, @03:36AM

      by grant (4922) on Sunday March 03 2024, @03:36AM (#1347167)

      Yeah, pickle is the cause of the vulnerability in the Python-based stuff, not Python itself. But with the *.cpp stuff, you get safe models AND easy setup:
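      If you are stuck with pickled files, the Python pickle documentation describes restricting which globals an `Unpickler` may load by overriding `find_class`. A minimal sketch follows; the allow-list contents are an assumption for illustration (a real model checkpoint would need more entries), and this is a mitigation, whereas a data-only format avoids the problem entirely.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list of (module, name) pairs this loader will resolve.
ALLOWED = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle references; refuse anything unlisted.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (eval, ("sum(range(5))",))  # would run code under plain pickle.loads


allowed = restricted_loads(pickle.dumps(OrderedDict(a=1)))
print(allowed)

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print("blocked:", err)
```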

      For plain CPU-only inference, there are zero deps, just:
      make

      You're done setting things up. You can now run inference on any llama family model you have downloaded.

      For Vulkan, if your system can already run Vulkan stuff like games, there is only a single dep to install (assuming Debian):
      apt install libvulkan-dev
      make LLAMA_VULKAN=1

      If you have never run Vulkan stuff before, you may also need to:
      apt install mesa-vulkan-drivers vulkan-tools
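      Those build steps can be scripted. A small Python sketch that picks the make invocation based on whether `vulkaninfo` (from vulkan-tools) is on the PATH; treating that as a sign of a working Vulkan setup is an assumption, and `LLAMA_VULKAN=1` is the Makefile option from this comment, which may change in later llama.cpp versions.

```python
import shutil
import subprocess


def llama_build_cmd(use_vulkan: bool) -> list[str]:
    """make invocation for llama.cpp: CPU-only by default, Vulkan back-end on request."""
    return ["make", "LLAMA_VULKAN=1"] if use_vulkan else ["make"]


def build(llama_dir: str = "llama.cpp") -> None:
    # Assumption: vulkaninfo being installed means Vulkan drivers are usable.
    use_vulkan = shutil.which("vulkaninfo") is not None
    subprocess.run(llama_build_cmd(use_vulkan), cwd=llama_dir, check=True)
```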

      No conda/miniconda plus a virtual env with a million deps, and no files you need to edit to change the back-end (everything assumes CUDA by default). If those edits are documented anywhere other than incomplete posts on random websites, I failed to find that documentation.

      Maybe I was holding the Python stuff wrong out of ignorance about the platform, but llama.cpp and friends are so easy to set up that they are probably an improvement even for folks who are familiar with the Python way. For folks like me who aren't deep into Python, it was a huge improvement.

  • (Score: 3, Insightful) by The Vocal Minority on Sunday March 03 2024, @04:49AM

    by The Vocal Minority (2765) on Sunday March 03 2024, @04:49AM (#1347174)

    I'm out of mod points, but thanks for this. I tend to customise and fine-tune the models I get from Hugging Face, so I'm not sure this is all that useful for me, but it's worth knowing about all the same.