PyTorch Dependency Poisoned With Malicious Code

An unknown attacker used the PyPI code repository to get developers to download a compromised PyTorch dependency that included malicious code designed to steal system data.

Developers who last week downloaded the nightly builds of the open source PyTorch framework also unknowingly installed a malicious version of the torchtriton dependency found in the Python Package Index, according to PyTorch’s maintainers.

In a blog post this week, PyTorch recommended that anyone who installed the PyTorch nightly build on Linux through pip between December 25 and December 30 uninstall it and switch to the latest nightly binaries, released after December 30.

The maintainers said developers using the PyTorch stable packages were not affected by the malicious binary.
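
For readers who want a quick first check of their own environment, the following is a minimal sketch, not the maintainers' official detection command. It only reports whether a distribution named torchtriton is installed in the current Python environment; it cannot tell whether that copy came from PyPI or from the PyTorch nightly index. The function names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_name: str = "torchtriton") -> str:
    """Build a one-line status message for the given distribution name."""
    version = installed_version(dist_name)
    if version is None:
        return f"{dist_name} is not installed in this environment"
    # Presence alone does not prove compromise; it only flags the
    # environment for the uninstall-and-reinstall step described above.
    return f"{dist_name} {version} is installed; review this environment"


if __name__ == "__main__":
    print(report())
```

A hit from this check is a prompt to follow the maintainers' guidance, not proof that data was exfiltrated.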

‘Security research gone bad’

However, the extent of the attack is unclear. A person taking responsibility for the incident said it was part of a security research project that went awry. They apologized, saying they erred in not making this clear and that they have deleted all the data that was exfiltrated. A copy of the note can be found at the bottom of a Checkmarx blog post about the incident.

In the dependency confusion attack, a malware-laced package called torchtriton, the name of a legitimate dependency, was uploaded to PyPI, an online repository of packages for Python developers. The compromised package carried the same name as the one PyTorch maintainers ship on the PyTorch nightly package index.

“Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository,” they wrote. “This design enables somebody to register a package by the same name as one that exists in a third party index, and pip will install their version by default. This malicious package has the same name torchtriton but added in code that uploads sensitive data from the machine.”
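
The name collision at the heart of this attack can be illustrated with a small sketch. It assumes you already have two lists of package names, one from a private or third-party index and one from the public index; the sample names in the usage example are hypothetical placeholders, and a real audit would query the indexes rather than use hard-coded lists. Names are compared after the normalization Python packaging applies to project names (case-insensitive, with runs of `-`, `_` and `.` treated as equivalent).

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way Python package indexes compare them."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_name_collisions(private_names, public_names):
    """Return normalized names that appear in both indexes.

    Any name returned here could be shadowed by a public upload if the
    installer is allowed to consult the public index for it.
    """
    public = {normalize(n) for n in public_names}
    private = {normalize(n) for n in private_names}
    return sorted(private & public)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    private_index = ["torchtriton", "internal_utils"]
    public_index = ["TorchTriton", "requests"]
    print(find_name_collisions(private_index, public_index))
```

Registering a placeholder package under each privately used name on the public index, as the PyTorch maintainers later did, is one way to make such a check come back empty for outsiders.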

The sensitive data includes nameservers, hostnames, the current username and the working directory name. In addition, the package read a range of files, including /etc/hosts, /etc/passwd, the first 1,000 files in $HOME/*, $HOME/.gitconfig, and $HOME/.ssh/*.

The malicious binary would upload files ranging in size up to 99,999 bytes and send the contents to a specified domain.
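
Anyone assessing their own exposure may want a rough list of which files fell within those limits. The sketch below is an approximation under stated assumptions, not a reconstruction of the malware: it assumes "first 1,000 files" means the first 1,000 regular files in sorted order at the top level of a directory, and that the size limit is applied after that cut. Neither detail is confirmed by the PyTorch post, and the function name is hypothetical.

```python
from pathlib import Path


def files_in_scope(directory, max_files=1000, max_bytes=99_999):
    """List top-level files that fit the reported count and size limits.

    Assumptions (not confirmed by the source): files are taken in sorted
    order, subdirectories are not descended into, and the size filter is
    applied after the first `max_files` files are selected.
    """
    entries = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return [p for p in entries[:max_files] if p.stat().st_size <= max_bytes]


if __name__ == "__main__":
    for path in files_in_scope(Path.home()):
        print(path)
```

The output is best treated as a starting point for deciding which credentials and keys to rotate, alongside the specific files named above.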

PyTorch – like Keras, TensorFlow and JAX – is a framework developers can use for machine learning applications such as natural language processing and computer vision. It’s based on the Torch library and was developed by Meta AI, though it is now under the auspices of the Linux Foundation.

The PyTorch maintainers have taken several steps to fix the issue, including removing torchtriton as a dependency for the nightly packages and replacing it with pytorch-triton. In addition, they registered a dummy package on PyPI to prevent similar attacks.

All nightly packages that depend on torchtriton were removed from the package indices. The maintainers are also asking the PyPI security team to give them ownership of the torchtriton package on PyPI and to delete the malicious version.

PyPI and other open source code repositories have become a target in supply chain attacks. Last month, a malicious package on PyPI was found masquerading as a legitimate SDK from SentinelOne, and in November, Phylum identified a campaign distributing the W4SP malware via PyPI packages.

During the summer, PyPI talked about a phishing attack against developers using the index and said it was offering security keys for two-factor authentication for projects with the most downloads over the previous six months.

In April, the Open Source Security Foundation said it created a community-based working group to address the issue of securing software repositories.

“This attack is the first known significant dependency confusion attack on the PyPi ecosystem,” Zack Tzachi, head of software supply chain for Checkmarx, wrote in the blog post. “Dependency confusion attacks were first revealed by Alex Birsan in 2021. Since then, the technique has been used countless times in both PyPI and NPM package registries.”

While attacks on software library dependencies aren’t new, they’ve become more frequent, said Mike Parkin, senior technical engineer at Vulcan Cyber, adding that the greater visibility should lead to more mitigations.

“The question is whether the primary fix will fall to the repositories to make sure libraries they distribute aren’t compromised or to the developers who use those libraries to make sure the expected library is loaded,” Parkin told The Register.

John Bambenek, principal threat hunter at Netenrich, told The Register that while there are benefits to open source software, there is little institutional protection beyond an almost entirely voluntary effort to address inherent supply chain risks. Until more money is directed to the issue, the problems will continue, Bambenek said. ®
