Are you at risk from this critical dbt vulnerability?
A newly discovered critical security vulnerability in the dbt ecosystem
UPDATE 17th July 2024: CVE-2024-40637 assigned and noted in GitHub.
CVSS score 4.2.
Today we’re sharing news of a critical security vulnerability that affects users of the dbt package ecosystem. This vulnerability, which I discovered with Michal Czerwinski, highlights the challenges our industry faces around the security of new software package supply chains. We responsibly disclosed our concerns to dbt Labs, who accepted the vulnerability and have implemented mitigations.
Understanding the vulnerability
The dbt tool is widely used to transform data within data warehouses. It allows data analysts and engineers to write modular SQL queries, which can be used in data pipelines.
dbt’s power and flexibility have made it a popular choice in the analytics engineering space, but that same flexibility also introduces significant risks. Because dbt brings its own ecosystem of software packages, the core of this vulnerability is the trust model inherent in software supply chains.
The potential impact of this vulnerability is severe. An attacker could:
- Manipulate data: alter or delete data, leading to data integrity issues
- Exfiltrate data: extract sensitive information from or change permissions in the database
“During a threat assessment for one of our clients, we encountered several security concerns. As I explored how to securely expose the DBT ecosystem to our developers, it became clear that there are significant challenges in addressing software supply chain security within the current DBT module ecosystem.” – Michal Czerwinski
When users install dbt packages from sources other than dbt Labs, they trust that these packages perform the advertised function and nothing more. In affected versions, the new vulnerability abuses the way dbt generates SQL, allowing a malicious dbt package to execute SQL injection attacks without any user interaction. An attacker could craft a dbt package that, once installed, could change, exfiltrate, or delete data within the victim database. We believe this vulnerability affects both dbt-core and the dbt Cloud hosted service.
We should note that dbt packages are not Python packages. They are part of a dbt-specific package ecosystem that is largely unknown to the infosec community. Software Composition Analysis (SCA) tools like safetycli and Snyk can, along with Static Application Security Testing (SAST), scan third-party and transitive dependencies, alerting users to known vulnerabilities they might be exposed to.
Because dbt packages sit outside the ecosystems this tooling is typically built for, they are a critical blind spot for users who depend on it to inform them of the vulnerabilities they are exposed to.
Simple example: exfiltrating at scale on Google Cloud via dbt
Here’s a simple exploit we crafted to demonstrate the problem. An attacker creates a malicious dbt package that copies your data out of Google BigQuery in the background whilst performing its advertised function.
The attacker creates a project named “myco_example_project” in Google Cloud, and creates a dataset “example_dataset” inside. This dataset is shared with public Data Editor permissions, so a table can be created in this dataset, and data copied into it from anywhere.
The attacker also creates or compromises a dbt package and publishes it to a public GitHub repository and dbt Hub, whilst also using marketing to tempt or trick unsuspecting victims into installing it. As with most such ecosystems, the dbt Hub documentation explicitly states that they do not “certify or confirm the … security of any Packages”, as reiterated in the disclaimer. It is for the consumer to accept the risk of installing a specific package.
Within our exploit package’s directory structure is an innocuous-looking file “macros/example.sql”, starting with the following Jinja macro text:
An unsuspecting victim installs the package from GitHub or dbt Hub. With no further interaction, they execute `dbt run` as usual, or it is run by their automation.
In affected versions of dbt, this macro is run silently in place of the legitimate and trusted BigQuery adapter’s version. Whatever `SELECT *` returns for this model, and for every other model included in the run, is copied into a new table in the attacker’s dataset within seconds. Evidence of the exfiltration would only be present in the dbt log files and GCP audit logging, neither of which would, by default, proactively alert the victim to the attack.
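Because the override happens silently, it can help to see which installed packages define materializations at all. The following is a minimal Python sketch of that idea, not a complete detector. It assumes packages are installed under the default `dbt_packages` directory, looks only for `{% materialization ... %}` opening tags in `.sql` files, and uses a short, illustrative (not exhaustive) set of built-in names. The function name `find_materialization_overrides` is our own.

```python
import re
from pathlib import Path

# Opening tag of a dbt materialization block, for example:
#   {% materialization view, adapter='bigquery' %}
MATERIALIZATION_TAG = re.compile(r"\{%-?\s*materialization\s+(\w+)")

# Assumption: an illustrative, non-exhaustive set of built-in names.
BUILTIN_NAMES = {"view", "table", "incremental", "ephemeral"}


def find_materialization_overrides(packages_dir):
    """Return (package, relative_file, name) for each built-in
    materialization name that a package file redefines."""
    root = Path(packages_dir)
    if not root.is_dir():
        return []
    findings = []
    for sql_file in sorted(root.rglob("*.sql")):
        text = sql_file.read_text(encoding="utf-8", errors="replace")
        for match in MATERIALIZATION_TAG.finditer(text):
            name = match.group(1)
            if name in BUILTIN_NAMES:
                rel = sql_file.relative_to(root)
                # The first path component is treated as the package name.
                findings.append((rel.parts[0], rel.as_posix(), name))
    return findings
```

Calling `find_materialization_overrides("dbt_packages")` from the project root after `dbt deps` returns one tuple per match. An empty list does not prove a package is safe, only that this particular pattern was not found.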
How to mitigate against this dbt vulnerability
The vendor has provided mitigations for the issue with the config flag require_explicit_package_overrides_for_builtin_materializations. The behaviour of this flag varies between versions of dbt-core and dbt Cloud, so refer to the Legacy Behaviours documentation to understand your current position and upgrade options. We offer the following advice for any dbt users to assess and mitigate the risks posed by this vulnerability:
- Explicitly set the config flag require_explicit_package_overrides_for_builtin_materializations to True in dbt_project.yml for all your dbt projects.
- dbt-core versions are Python dependencies. dbt Labs have recently updated their documentation to make a strong recommendation to keep versions up-to-date. Ensure dbt-core versions are actively updated to the latest versions as these fixes become available, including in dbt Cloud.
- Review dbt package usage in your organisation. Ensure packages are obtained from trusted sources like the dbt vendor itself, and check that the value of each package outweighs the risk.
- Ensure software dependencies are being scanned for known vulnerabilities, and that you have a vulnerability management process in place to respond to any alarms.
- Review and minimise permissions that dbt is run with for human and unattended workloads.
- Review the controls you have in place in your infrastructure that prevent transfer of data outside your organisational boundaries.
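As a quick way to audit many projects for the first item in the list above, the Python sketch below checks whether the text of a `dbt_project.yml` sets the flag to a true value. It assumes the flag sits under a top-level `flags:` key, which is our reading of the dbt documentation. It is a deliberately simple line-based check rather than a YAML parser, so unusual formatting may defeat it. The function name `flag_is_enabled` is hypothetical.

```python
import re

FLAG = "require_explicit_package_overrides_for_builtin_materializations"

# An indented "<flag>: <value>" line; the value is captured as group 1.
FLAG_LINE = re.compile(rf"^\s+{re.escape(FLAG)}\s*:\s*(\S+)")


def flag_is_enabled(project_yml_text):
    """Return True only if the flag appears under a top-level `flags:`
    key with the value true (any letter case)."""
    in_flags = False
    for raw in project_yml_text.splitlines():
        # Naive comment stripping; adequate for this flag's simple value.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new top-level key starts; note whether it is `flags:`.
            in_flags = line.strip() == "flags:"
            continue
        if in_flags:
            match = FLAG_LINE.match(line)
            if match:
                return match.group(1).lower() == "true"
    return False
```

For a single project this could be called as `flag_is_enabled(open("dbt_project.yml").read())`. A result of `False` means the flag is missing, disabled, or written in a form this simple check does not recognise, so it is worth a manual look in any of those cases.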
These assessments and mitigations can prove challenging to undertake in practice. Equal Experts has published a Secure Delivery Playbook with lots of advice for applying security principles. I’ve also shared the practices I follow to assess the risk a package represents and to automatically update dependencies without causing chaos in my teams.