Hiding Android Malware in Hugging Face Repos
I spent my entire Tuesday morning cleaning up a mess because a junior developer treated Hugging Face like a trusted package manager. It isn’t. It’s a massive, mostly unvetted file host. And attackers have absolutely figured this out.
We’ve been warning about supply chain attacks in machine learning for a while. Usually, the fear is a poisoned PyTorch pickle file that executes arbitrary code when you load the model weights. That’s bad enough. But the latest trend I’ve been tracking since late February is actually much dumber—and somehow way more effective.
Malware operators are using Hugging Face repositories to host Android Remote Access Trojans (RATs) and other mobile malware payloads.
Yeah. Android malware. On an AI hub.
Why Hugging Face?
Look at this from an attacker’s perspective. If you want your malicious Android app to download a secondary payload after it bypasses the Google Play Protect checks, where do you host that payload?
If you spin up a random AWS bucket or register totally-not-malware.xyz, security vendors will flag the domain in a matter of hours. The traffic gets blocked at the firewall level. Game over.
But huggingface.co? Every single corporate firewall on the planet whitelisted that domain two years ago. We all blindly trust it because our data science teams literally cannot do their jobs without it. Attackers know this. They’re creating fake user accounts, spinning up empty model repositories, and uploading encrypted .apk files or secondary payloads disguised as datasets.
The malicious Android app on the victim’s phone just makes a quiet GET request to the Hugging Face API, pulls down the RAT, and installs it. HF is basically acting as a bulletproof, free Content Delivery Network (CDN) for hackers.
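To make concrete how little infrastructure this abuse takes: the dropper doesn't need any special API, just the same public `resolve` download endpoint every legitimate client hits. Here's a minimal sketch (the repo and file names are hypothetical, not a real campaign I'm naming):

```python
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL huggingface.co serves repo files from."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# From the dropper's point of view, fetching the payload is one HTTPS GET
# to a universally whitelisted domain (repo name is made up):
payload_url = hf_resolve_url("some-user/innocuous-model", "config.json")
print(payload_url)
```

That's the whole attack surface: one GET request that is indistinguishable, at the network level, from a data scientist pulling model weights.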
The 8MB Config File Gotcha
The theory is one thing; I caught one of these in the wild last month. I was auditing some network logs and noticed a script trying to pull a payload from a repo disguised as a fine-tuned LLaMA adapter. The dead giveaway? A config.json file that was 8.4MB. Who has an eight-megabyte config file? Nobody. I pulled the file down in an isolated sandbox, cracked it open, and it was just a massive base64-encoded blob. It wasn’t even a model. It was a staged payload waiting for a dropper app to fetch it.
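That giant base64 blob suggests a cheap heuristic you can automate when a "config" file makes it past the size check. This is a rough sketch, not a detector I'd bet production on: the 10KB threshold is my own arbitrary cutoff, and it only inspects top-level string values, but it would have flagged the file I found.

```python
import base64
import binascii
import json

def looks_like_staged_payload(raw: bytes, max_value_len: int = 10_000) -> bool:
    """Heuristic: a 'config.json' containing a huge base64-decodable
    string value is almost certainly not a real model config."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return True  # a config that isn't even valid JSON is suspicious on its own
    # Only top-level values are checked here; a real tool would recurse.
    for value in (data.values() if isinstance(data, dict) else []):
        if isinstance(value, str) and len(value) > max_value_len:
            try:
                base64.b64decode(value, validate=True)
                return True  # giant base64 blob masquerading as a config value
            except binascii.Error:
                pass  # long but not base64 -- let it through
    return False

# A normal config passes; a config smuggling an encoded payload does not.
clean = json.dumps({"hidden_size": 4096, "model_type": "llama"}).encode()
blob = json.dumps({"weights": base64.b64encode(b"\x00" * 9000).decode()}).encode()
print(looks_like_staged_payload(clean), looks_like_staged_payload(blob))
```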
This completely breaks the assumption that if you stick to downloading safetensors and JSON files, you’re entirely safe. Sure, the JSON won’t execute on your machine just by downloading it, but if your infrastructure is being used as a pass-through to serve malware to end-users, your IP reputation is going to tank.
Fixing the Blind Pull Problem
We had to change our workflow immediately. I stopped letting my team use snapshot_download without strict pre-flight checks. You cannot just pull an entire repository blindly anymore.
I wrote a quick intercept script using the Hub API. Before we download anything, we list the repository files and check for suspicious extensions or wildly disproportionate file sizes. If it fails, the pull is blocked.
Here’s the exact pre-pull check I’m running right now. I tested this on my M2 Mac running Python 3.12.2 and huggingface_hub version 0.28.1, and we’ve since deployed it to our staging cluster.
```python
from huggingface_hub import HfApi
import sys

def pre_flight_repo_check(repo_id: str) -> bool:
    api = HfApi()
    # Extensions that have absolutely no business in an ML repo
    banned_extensions = ['.apk', '.dex', '.exe', '.sh', '.bat', '.cmd']
    try:
        # files_metadata=True is required, or per-file sizes come back as None
        files = api.model_info(repo_id, files_metadata=True).siblings
    except Exception as e:
        print(f"Failed to fetch repo info: {e}")
        return False
    for file in files:
        filename = file.rfilename.lower()
        # Check 1: Obviously malicious extensions
        if any(filename.endswith(ext) for ext in banned_extensions):
            print(f"BLOCKING PULL: Found banned file type -> {file.rfilename}")
            return False
        # Check 2: Suspiciously large config/text files (size is in bytes)
        # 2MB is a generous upper limit for a normal config.json
        if filename.endswith('.json') and file.size and file.size > 2_000_000:
            print(f"BLOCKING PULL: Anomalous JSON file size ({file.size} bytes) -> {file.rfilename}")
            return False
    return True

# Usage
target_repo = "suspicious-user/fake-llama-adapter"
if pre_flight_repo_check(target_repo):
    print("Repo looks clean. Proceeding with download...")
    # snapshot_download(repo_id=target_repo)
else:
    sys.exit(1)
```
I wired this into our GitHub Actions pipeline last week. It added exactly 1.2 seconds to our build time. I’ll take a one-second delay over accidentally hosting a proxy for an Android banking trojan.
Where This Is Going
The security team at Hugging Face is usually pretty responsive, but they are fighting an uphill battle against the sheer volume of uploads. You can’t manually moderate millions of repositories.
Right now, the platform is too open. I expect we’ll see a massive policy shift by Q1 2027. They’ll likely have to start aggressively rate-limiting or outright banning non-ML file extensions. They might even force domain-level verification for accounts pushing high volumes of traffic that don’t match typical inference patterns.
Until then, you need to treat the Hub like you treat npm or PyPI. Assume everything is hostile until proven otherwise. Check your dependencies, audit the files you’re pulling, and for the love of god, stop running trust_remote_code=True on random weekend projects you found on a forum.
