Testing and deploying UDFs with dbt

Inspired by our previous post on unit testing dbt and discussion in the dbt forums, I looked into how to get dbt to help me test and deploy UDFs. Although dbt doesn’t directly support UDFs, it turns out it’s actually fairly straightforward to define, test and reference them in your models. I’ll be using BigQuery in this post, but I think the techniques will apply to other warehouses that support UDFs like Snowflake, Redshift and Spark, too.

What’s a UDF and what does it have to do with dbt?

A UDF is a User-Defined Function. Like any other type of function, they’re really helpful to capture and name chunks of logic that you can apply in queries, and I talked about them before, starting from Tip 4 in our Data Warehouse Tips post. dbt takes care of deploying the tables and views that you might use your UDFs in, so robustly referencing a function from your models is important. When I say ‘robustly’, I mean ensuring that the function exists and that you’re calling the function from the appropriate location – for example, avoiding hard-coding your production functions everywhere. There’s no native support for referencing a UDF like there is with `source` or `ref`, nor any native way to deploy a function in a project. This post explains how you can solve those problems.

The example

I have a BigQuery dataset that I use for my personal development work. This is referenced in my own dbt profile. When I run dbt commands on my machine, they take effect here.

There are test and production datasets that my CI system has profiles for. I cannot run ad-hoc dbt commands in these datasets. When I push my changes, the CI system will run any dbt commands necessary to deploy my updates and run tests. I’ll not be talking about that in this post, but it’s important that I’m referencing the UDF that is deployed in the same environment that I’m working in – otherwise I can’t iterate safely, relying on my tests.

The example UDF I’ll use is a simple predicate function, located in `my_schema`, from string to bool that returns true if and only if the string value can be interpreted as a positive integer:

CREATE OR REPLACE FUNCTION my_schema.is_positive_int(a_string STRING)


AS (

  REGEXP_CONTAINS(a_string, r'^[0-9]+$')


Defining a UDF in dbt

We need to write a dbt “macro” – a Jinja template – that parameterises where the UDF should go. In `macros/is_positive_int_udf.sql`:

{% macro is_positive_int_udf() %}

CREATE OR REPLACE FUNCTION {{ target.schema }}.is_positive_int(a_string STRING)


AS (

  REGEXP_CONTAINS(a_string, r'^[0-9]+$')


{% endmacro %}

Note the macro name matches the file and that `target.schema` is templated out.

To have dbt actually deploy this for us, we add a hook in our `dbt_project.yml`:


- '{{ is_positive_int_udf() }}'

Now, when we `dbt run`, the UDF will be deployed.

11:11:52  Running 1 on-run-start hooks

11:11:52  1 of 1 START hook: my_project.on-run-start.0.................... [RUN]

11:11:52  1 of 1 OK hook: my_project.on-run-start.0....................... [OK in 0.49s]

Kind of yucky in the console output with just list indices to identify what’s what –  but not the end of the world.

Testing a UDF in dbt

Now our UDF is schema-aware and is being deployed by dbt, we can write tests for it with minimal ceremony. In `tests/is_positive_int.sql`:

WITH examples AS (

  SELECT 'foo' the_string, FALSE expected


  UNION ALL SELECT '5 not this', FALSE

  UNION ALL SELECT 'still not this 5', FALSE



, test AS (



  {{ target.schema }}.is_positive_int(the_string) actual

  FROM examples



Running `dbt test` runs the pre-hooks and then the tests, perfect.


11:25:22  Running 1 on-run-start hooks

11:25:22  1 of 1 START hook: my_project.on-run-start.0.................... [RUN]

11:25:23  1 of 1 OK hook: my_project.on-run-start.0....................... [OK in 0.58s]


11:25:24  1 of 1 START test is_positive_int............................... [RUN]

11:25:25  1 of 1 PASS is_positive_int..................................... [PASS in 1.02s]

11:25:26  Finished running 1 tests, 1 hooks in 1.46s.

Using the UDF in a Model

We’ve actually already seen how to do this, in the test.

In `models/my_model.sql`:

SELECT {{ target.schema }}.is_positive_int(maybe_positive_int_column)

FROM {{ ref(...) }}


There you have it. In summary, to incorporate UDFs neatly into your dbt models:

  • Write a macro for your UDF, templating the target schema
  • Add a call to macro to `on-run-start` in your `dbt_project.yaml`
  • Write some tests, of course
  • Use the UDF in your models, templating the target schema