DRY data documentation by using dbt Doc Blocks


dbt doc blocks enable DRY (Don’t Repeat Yourself) documentation: field descriptions are defined once in central Markdown files, which improves consistency and maintainability as projects scale. There is no need to redefine the same fields in multiple YAML files. Update a description once, and every model that references it picks up the change.

The Hidden Cost of Poor Documentation

Documentation is often one of the most overlooked aspects of data work. We build pipelines, structure our models, and deliver dashboards. The people already involved understand what is happening, but what happens when someone new joins?
Without clear documentation, it becomes difficult for them to make sense of anything.
With the rise of AI, this problem is only amplified. We increasingly expect to “talk” to our data, but if that data lacks clear definitions, how can we expect accurate answers in return?

Redefining the Same Fields

So how do we actually make data systems more structured and usable? In dbt, one practical approach is to use doc blocks.
In dbt, we use YAML files to describe the columns of a model. It typically looks something like this:

				
models:
  - name: stg_supabase__location
    description: Staging model for the locations with source supabase
    columns:
      - name: location_id
        description: Unique identifier for each location
				
			

This works well for a single model. However, in practice, models often build on top of each other. As a result, you end up redefining the same fields, like IDs, over and over again in each model.

Figure: Redefining the same fields, like IDs, in every model
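As a rough sketch of that duplication, suppose a downstream model builds on the staging model above. The model name dim_location and its description are hypothetical, made up for illustration; only stg_supabase__location comes from the example above. The same column description is written out twice, and nothing keeps the two copies in sync:

```yaml
models:
  - name: stg_supabase__location
    description: Staging model for the locations with source supabase
    columns:
      - name: location_id
        description: Unique identifier for each location

  # Hypothetical downstream model, not part of the original project
  - name: dim_location
    description: Location dimension built on top of the staging model
    columns:
      - name: location_id
        # Same text typed out again; it can silently drift from the copy above
        description: Unique identifier for each location
```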

Introducing dbt Doc Blocks

With doc blocks, we write each field description in a Markdown file instead. Each description is defined once, and dbt's Jinja templating fills it in wherever the doc() function references it.
Example Markdown file:

				
{% docs location_id %}
Unique identifier for each location
{% enddocs %}
				
			

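Example YAML file. The name passed to doc() must match the name of the doc block, and the Jinja expression is quoted so that it stays valid YAML:

models:
  - name: stg_supabase__location
    description: Staging model for the locations with source supabase
    columns:
      - name: location_id
        description: '{{ doc("location_id") }}'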
				
			

Consistent, maintainable and AI-ready documentation

The main advantage of this approach is consistency. We maintain a single, clear definition for each field. If something changes, we update it in one place and that change is reflected everywhere.
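To sketch what that looks like in practice, assume a doc block named location_id, as in the Markdown example above, plus a downstream model. The name dim_location is hypothetical and used only for illustration. Both models point at the same block, so editing the text between {% docs location_id %} and {% enddocs %} changes the description for both:

```yaml
models:
  - name: stg_supabase__location
    columns:
      - name: location_id
        # Quoted so the Jinja expression is read as a YAML string
        description: '{{ doc("location_id") }}'

  # Hypothetical downstream model, not part of the original project
  - name: dim_location
    columns:
      - name: location_id
        # Same doc block, so no second copy of the text exists
        description: '{{ doc("location_id") }}'
```

The rendered descriptions pick up the change the next time dbt parses the project, for example when running dbt docs generate.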

Beyond consistency, this also improves maintainability. As projects grow, duplicated definitions quickly become hard to manage and small differences start to appear.

It also keeps your models cleaner by separating structure from documentation.
Most importantly, it makes your data more usable for AI. AI systems rely on metadata to understand data, so they need clear and consistent definitions to generate accurate answers. If your documentation is fragmented or inconsistent, the output will be too. Centralizing definitions helps ensure that both humans and AI are working with the same understanding.

Would you like to learn more about AI-ready documentation or data governance in general? Contact us now.
