Uncovering a Massive GitHub Supply Chain Attack: When a Friend's Repo Bites Back

During a routine code review of a colleague’s GitHub repository, I identified an anomalous, highly obfuscated block of code embedded within a standard Python file. The use of randomized variable names and dense encoding strongly indicated malicious intent. Upon further investigation, this isolated finding revealed a sophisticated, large-scale supply chain attack currently affecting hundreds of repositories across GitHub.

This article details the discovery, the reverse-engineering process, and actionable mitigation strategies to secure your development pipelines.

The Suspicious Snippet

The identified snippet utilized classic obfuscation techniques, concealing the script’s primary execution logic behind nested layers of encoding and dynamic execution.

# -*- coding: utf-8 -*-
aqgqzxkfjzbdnhz = __import__('base64')
wogyjaaijwqbpxe = __import__('zlib')
idzextbcjbgkdih = 134
qyrrhmmwrhaknyf = lambda dfhulxliqohxamy, osatiehltgdbqxk: bytes([wtqiceobrebqsxl ^ idzextbcjbgkdih for wtqiceobrebqsxl in dfhulxliqohxamy])
lzcdrtfxyqiplpd = 'eNq9W19z3MaRTy......SN' # Massive base64 string truncated
runzmcxgusiurqv = wogyjaaijwqbpxe.decompress(aqgqzxkfjzbdnhz.b64decode(lzcdrtfxyqiplpd))
ycqljtcxxkyiplo = qyrrhmmwrhaknyf(runzmcxgusiurqv, idzextbcjbgkdih)
exec(compile(ycqljtcxxkyiplo, '<>', 'exec'))

An analysis of the execution flow reveals a four-step staging process:

Decode the payload from Base64.
Decompress the resulting data using Zlib.
Decrypt the byte string via an XOR operation (using the integer key 134).
Execute the decrypted payload directly in memory utilizing the built-in exec() function.
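
The order of these layers can be confirmed on a harmless string. The sketch below is illustrative only: the helper names `wrap` and `unwrap` and the sample text are assumptions, and only the key value 134 is taken from the snippet.

```python
import base64
import zlib

KEY = 134  # single-byte XOR key taken from the snippet above


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (the operation is its own inverse)."""
    return bytes(b ^ key for b in data)


def wrap(plaintext: bytes, key: int = KEY) -> str:
    """Hypothetical helper: XOR, then zlib-compress, then Base64-encode."""
    return base64.b64encode(zlib.compress(xor_bytes(plaintext, key))).decode("ascii")


def unwrap(blob: str, key: int = KEY) -> bytes:
    """Reverse the layers in the order the loader applies them: Base64, zlib, XOR."""
    return xor_bytes(zlib.decompress(base64.b64decode(blob)), key)


sample = b"print('harmless demo')"
blob = wrap(sample)
assert unwrap(blob) == sample  # the round trip restores the original bytes
```

A Base64 string that begins with `eN`, like the one in the snippet, is consistent with zlib data written at the highest compression level, whose header bytes are 0x78 0xDA.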

Cracking the Code

To efficiently process the payload without executing the malicious logic, I developed a custom deobfuscator within a secure, isolated sandbox environment. The objective was straightforward: replace the memory-execution function (exec()) with a standard output command (print()) to safely extract the plaintext payload.

The following script was utilized to neutralize and review the underlying code:

import base64
import zlib

key = 134
# The giant string goes here
payload_base64 = 'eNq9W19z3MaRTy...' 

# Unwrap the layers
decompressed_data = zlib.decompress(base64.b64decode(payload_base64))
decoded_bytes = bytes([b ^ key for b in decompressed_data])
hidden_script = decoded_bytes.decode('utf-8')

print("--------------------------------------------------")
print("🚨 THE HIDDEN PAYLOAD IS: 🚨\n")
print(hidden_script)
print("\n--------------------------------------------------")
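
Copying the key and the string by hand becomes tedious across many infected files. As a rough sketch, the two values could be pulled out statically with Python's `ast` module, which parses source without running it. The function names and the `min_len` threshold are assumptions, and the approach only works while the loader keeps both values as plain constant assignments, as in the snippet above.

```python
import ast
import base64
import zlib


def extract_layers(source: str, min_len: int = 16):
    """Return (xor_key, base64_blob) found in constant assignments, without executing anything."""
    key = None
    blob = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            value = node.value.value
            if isinstance(value, bool):
                continue  # bool is a subclass of int; it is never the key
            if isinstance(value, int) and 0 <= value <= 255 and key is None:
                key = value  # first byte-sized integer constant
            elif isinstance(value, str) and len(value) >= min_len and blob is None:
                blob = value  # first long string constant
    return key, blob


def decode_payload(key: int, blob: str) -> str:
    """Undo Base64, zlib, and XOR, returning text for review (never passed to exec)."""
    raw = zlib.decompress(base64.b64decode(blob))
    return bytes(b ^ key for b in raw).decode("utf-8", errors="replace")
```

Even with static extraction, the decoded text should only be opened inside the isolated sandbox.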

The Monster Inside

Executing the extraction script revealed a highly sophisticated Dropper/Loader engineered for credential theft and system compromise.

(Note: While the complete decrypted payload is extensive, the most critical architectural features are outlined below.)

Blockchain-Based Command & Control (C2): Evading traditional IP-based blocking, the malware leverages the Solana blockchain for C2 communications. It queries the transaction history of a designated wallet and parses encrypted operational commands embedded within transaction “Memos.” This decentralized approach makes infrastructure takedowns exceptionally difficult.
Geofencing and Regional Exclusions: The payload incorporates an _isRussianSystem() function that evaluates the host’s language, timezone, and locale configurations. If these settings indicate a system in Russia or the Commonwealth of Independent States (CIS), execution terminates silently, a standard technique that some threat actors use to avoid scrutiny from domestic law enforcement.
“Bring Your Own Environment” (BYOE) Execution: The malware dynamically identifies the host operating system (Windows, macOS, or Linux) and retrieves a portable, legitimate binary of Node.js directly from the official vendor. It subsequently leverages this runtime to execute a secondary, obfuscated JavaScript payload—indicative of info-stealers such as Lumma or RedLine—designed to exfiltrate passwords, session cookies, and cryptocurrency wallets.
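
For analysts who want to inspect the memo channel described in the first item, the sketch below shows one way to list the memos attached to a wallet's recent transactions through Solana's public JSON-RPC method `getSignaturesForAddress`. It is not code from the malware. The wallet value is a placeholder, the sample response is made up for illustration, and how the malware decrypts the memo text is not covered here.

```python
import json
import urllib.request

RPC_URL = "https://api.mainnet-beta.solana.com"  # public mainnet RPC endpoint
WALLET = "<wallet-address-under-investigation>"  # placeholder, not the real address


def build_request(address: str, limit: int = 20) -> dict:
    """JSON-RPC body asking for the most recent transaction signatures of an address."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getSignaturesForAddress",
        "params": [address, {"limit": limit}],
    }


def extract_memos(response: dict) -> list[str]:
    """Collect the non-empty memo fields from a getSignaturesForAddress response."""
    return [entry["memo"] for entry in response.get("result", []) if entry.get("memo")]


def fetch(address: str, limit: int = 20) -> dict:
    """Send the request to the RPC endpoint (defined here, but not called in this sketch)."""
    body = json.dumps(build_request(address, limit)).encode("utf-8")
    request = urllib.request.Request(
        RPC_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


# Made-up response in the shape the RPC method returns, for offline illustration.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": [
        {"signature": "sig-1", "memo": "opaque-text", "err": None},
        {"signature": "sig-2", "memo": None, "err": None},
    ],
}
print(extract_memos(sample_response))
```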

The Scale of the Infection

To determine the scope of the compromise, I queried GitHub for distinct artifacts of the obfuscated code.

The search results indicated a widespread incident: more than 300 repositories contained the identical malicious signature. Preliminary analysis suggests the payload is injected into project files during the git commit process. The exact initial vector remains under active investigation (candidates include a compromised IDE extension, a malicious npm or PyPI dependency, or a hijacked CLI utility), but the outcome is evident: developers are inadvertently committing and distributing malware within their own codebases.
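
The same artifacts can be checked for in local clones. The following is a minimal sketch, assuming the loader keeps the three markers visible in the snippet shown earlier (`__import__('base64')`, `__import__('zlib')`, and `exec(compile(`). The patterns are a heuristic drawn from that one sample, not an official signature, and the function names are illustrative.

```python
import re
from pathlib import Path

# Heuristic markers modelled on the snippet shown earlier in this article.
MARKERS = [
    re.compile(r"__import__\(\s*['\"]base64['\"]\s*\)"),
    re.compile(r"__import__\(\s*['\"]zlib['\"]\s*\)"),
    re.compile(r"exec\(\s*compile\("),
]


def looks_infected(text: str) -> bool:
    """True only when all three markers appear in the same file."""
    return all(pattern.search(text) for pattern in MARKERS)


def scan_tree(root: str) -> list[Path]:
    """Return the Python files under root that contain every marker."""
    hits = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable entries are skipped rather than aborting the scan
        if looks_infected(text):
            hits.append(path)
    return sorted(hits)
```

Calling `scan_tree` on a directory of clones returns the suspicious paths for manual review. A variant that renames or splits these markers would slip past this check.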

The Real Danger: Poisoned AI Models

A secondary, yet profoundly critical, implication of this attack vector involves the integrity of AI training pipelines.

Machine learning practitioners routinely scrape public GitHub repositories to train Large Language Models (LLMs), code-generation agents, and automated security analysis tools. This process risks ingesting malicious, obfuscated code and integrating it directly into the foundational training datasets.

If an AI model is trained across hundreds of infected repositories, the malware’s structure becomes integrated into the model’s learned syntax. In a production environment, developers utilizing these models could face severe downstream risks, including:

Code generation tools autonomously suggesting obfuscated malware.
Security analysis models failing to flag embedded backdoors due to normalized exposure.
Automated dependency managers recommending compromised packages.

The result is a poisoned model that is extremely difficult to detect. The learned malicious patterns reside in the neural network’s weights and biases rather than in any file, so traditional static analysis, sandboxes, and antivirus heuristics have nothing to inspect. This represents an advanced supply chain attack that transcends individual repositories, potentially compromising the automated tools relied upon for secure software development.
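
One practical response is to filter such files out before they reach a training set. The sketch below assumes a simple heuristic: exclude a source file when it pairs a dynamic-execution call with a long, high-entropy literal. The 200-character minimum, the 4.5 bits-per-character threshold, and the function names are all illustrative assumptions that would need tuning against a real corpus.

```python
import math
import re
from collections import Counter

LONG_TOKEN = re.compile(r"[A-Za-z0-9+/=]{200,}")  # long Base64-like run
DYNAMIC_EXEC = re.compile(r"\b(exec|eval)\s*\(")  # dynamic execution call


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def should_exclude(source: str, min_entropy: float = 4.5) -> bool:
    """Flag files that combine exec/eval with a long, high-entropy literal."""
    if not DYNAMIC_EXEC.search(source):
        return False
    return any(
        shannon_entropy(match.group()) >= min_entropy
        for match in LONG_TOKEN.finditer(source)
    )
```

Requiring both signals keeps ordinary files with embedded data, which have no `exec` or `eval`, in the corpus.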

The Mitigation: A Quick Band-Aid Fix

Until the root cause of the commit hijacking is definitively identified and remediated, interim containment measures are necessary.

Because the malware relies on injecting contiguous, high-entropy Base64 strings, a strict Git pre-commit hook is an effective preventative control. A hook that blocks commits containing anomalously long strings (e.g., 100 or more consecutive Base64-alphabet characters) can halt the local injection before it reaches the remote repository. Legitimate long strings, such as embedded data URIs or long hexadecimal blobs, may also trigger it, so some false positives should be expected.

Below is a conceptual implementation for a .git/hooks/pre-commit file:

#!/bin/bash
# A simple pre-commit hook to catch massive base64 injections

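# Check only added lines, so a commit that removes an injected payload is not rejected
if git diff --cached -U0 | grep -v '^+++ ' | grep -E '^\+.*[a-zA-Z0-9+/]{100,}'; then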
    echo "🚨 SECURITY ALERT: A suspiciously long string (potential Base64 payload) was detected."
    echo "Commit rejected. Please review your code for injected malware."
    exit 1
fi
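
The same check can be written in Python when a file name and line number for each finding would help the review. This is a sketch under the same 100-character assumption as the shell hook. The function names are illustrative, and only lines added by the staged change are inspected.

```python
import re
import subprocess
import sys

LONG_RUN = re.compile(r"[A-Za-z0-9+/]{100,}")
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def find_long_additions(diff_text: str):
    """Return (path, line_number) for each added line with a long Base64-like run."""
    findings = []
    path = None
    line_no = 0
    in_header = True  # True between a "diff --git" line and the first hunk of that file
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_header = True
        elif in_header and line.startswith("+++ "):
            path = line[4:].removeprefix("b/")
        elif (m := HUNK.match(line)):
            in_header = False
            line_no = int(m.group(1))  # first line number on the new side of the hunk
        elif not in_header and line.startswith("+"):
            if LONG_RUN.search(line[1:]):
                findings.append((path, line_no))
            line_no += 1
        elif not in_header and line.startswith(" "):
            line_no += 1  # context lines also advance the new-side counter
    return findings


def main() -> int:
    """Inspect the staged diff and return a non-zero status when anything is flagged."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    ).stdout
    findings = find_long_additions(diff)
    for path, line_no in findings:
        print(f"{path}:{line_no}: suspiciously long Base64-like string", file=sys.stderr)
    return 1 if findings else 0

# When saved as .git/hooks/pre-commit, the file would end with: sys.exit(main())
```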

Conclusion

Supply chain attacks are demonstrating unprecedented levels of sophistication. Threat actors are shifting focus from production environments directly to local development ecosystems, hijacking version control workflows, and leveraging decentralized blockchain infrastructure for resilient C2 operations.

Development teams must proactively audit their repositories, rigorously vet third-party dependencies, and establish automated checks against obfuscated code injections to maintain the integrity of their software supply chains.
