Filters

Filters provide a powerful way to transform and select data in expressions. Inspired by Unix pipes and JQ, filters let you chain operations to process inputs, parameter values, and other data.

Overview

A filter is a reusable transformation that can be applied to data using the pipe operator |. Filters can:

  • Select items based on criteria (e.g., by file type)

  • Transform data structures (e.g., extract field values)

  • Aggregate information (e.g., count by type)

Example:

tasks:
  - name: compile
    needs: [find_sources]
    consumes: [std.FileSet]
    with:
      sv_files:
        value: "${{ inputs | by_filetype('.sv') }}"

The expression inputs | by_filetype('.sv') pipes the inputs array through the by_filetype filter, which returns only items with .sv extensions.

Using Filters

Basic Usage

Apply a filter using the pipe operator:

# Filter by file extension
sv_files: "${{ inputs | by_filetype('.sv') }}"

# Filter by type
filesets: "${{ inputs | by_type('std.FileSet') }}"

# Get all paths
all_paths: "${{ inputs | paths }}"

Filters with Parameters

Some filters accept parameters:

# Filter by specific extension
verilog: "${{ inputs | by_filetype('.v') }}"

# Get first item of type
first_fs: "${{ inputs | first_of_type('std.FileSet') }}"

# Extract specific field
types: "${{ inputs | pluck('type') }}"

Chaining Filters

Chain multiple filters for complex transformations:

# Get unique sorted extensions
extensions: "${{ inputs | paths | extensions | unique | sort }}"

# Get basenames of SystemVerilog files
sv_names: "${{ inputs | by_filetype('.sv') | basenames }}"

# Count then reverse
data: "${{ values | sort | reverse }}"

Combining with Builtins

Filters work seamlessly with JQ-style builtins:

# Count filtered items
sv_count: "${{ inputs | by_filetype('.sv') | length }}"

# Get last path
last_file: "${{ inputs | paths | last }}"

# Split and filter
basename: "${{ path | split('/') | last }}"

Defining Filters

Filters are defined at the package level using the filters key.

Basic Filter

A simple filter without parameters:

package:
  name: myproject

  filters:
    - name: text_files
      export: true
      expr: |
        input[] | select(input.path | split(".") | last | . == "txt")

This filter iterates through input and selects items with .txt extensions.

Parameterized Filter

Filters can accept parameters using with:

filters:
  - name: by_pattern
    export: true
    with: [pattern]
    expr: |
      input[] | select(input.path | test($arg0))

Parameters are accessed as $arg0, $arg1, etc. (positional).

Usage:

matches: "${{ inputs | by_pattern('.*_test\\.sv') }}"

Executable Filters

Filters can execute Python or Shell scripts:

filters:
  - name: process_data
    export: true
    run: |
      import json
      import sys
      data = json.load(sys.stdin)
      result = [item for item in data if item['size'] > 1000]
      print(json.dumps(result))
    shell: python

The script receives data via stdin and outputs JSON via stdout.

Filter Visibility

Filters support visibility modifiers:

Export

Make a filter available to importing packages:

filters:
  - name: public_filter
    export: true
    expr: "input[]"

Local

Hide a filter from child packages:

filters:
  - name: internal_filter
    local: true
    expr: "input[]"

Default Visibility

By default, filters have name visibility (visible to package and children, but not to importers).

filters:
  - name: package_filter
    expr: "input[]"

Qualified Names

Reference filters from other packages:

result: "${{ inputs | mypackage.custom_filter }}"

Standard Library Filters

See Standard Library for documentation of all standard library filters, including:

  • by_filetype(ext) - Filter by file extension

  • by_type(typename) - Filter by data type

  • basenames - Extract filenames

  • extensions - Extract file extensions

  • paths - Get all paths from FileSets

  • first_of_type(typename) - Get first item of type

  • pluck(field) - Extract field values

  • count_by_type - Count by type

Advanced Topics

Runtime vs Static Evaluation

Filters can be evaluated at different times:

Static evaluation (during graph construction):

# params is known at graph construction
count: "${{ params | length }}"

Runtime evaluation (during task execution):

# inputs only known at runtime
file_count: "${{ inputs | length }}"

Expressions referencing inputs or memento are automatically deferred and evaluated at runtime.

Error Handling

Filters should handle missing or invalid data gracefully:

# Safe: returns empty array if no matches
result: "${{ inputs | by_filetype('.sv') }}"

# Safe: returns null if type not found
first: "${{ inputs | first_of_type('MyType') }}"

Working with Types

Filters often work with data types. Access type information:

# Get type field
types: "${{ inputs | pluck('type') }}"

# Filter by type
filesets: "${{ inputs | by_type('std.FileSet') }}"

# Count by type
counts: "${{ inputs | count_by_type }}"

Best Practices

Keep Filters Simple

Prefer simple, single-purpose filters:

# Good: focused filter
- name: sv_files
  expr: "input[] | by_filetype('.sv')"

# Better: reuse built-in
sv_files: "${{ inputs | by_filetype('.sv') }}"

Use Descriptive Names

Filter names should clearly indicate their purpose:

# Good names
- name: verilog_sources
- name: by_author
- name: active_tasks

# Avoid
- name: filter1
- name: process
- name: do_stuff

Document Parameters

When filters take parameters, document them:

filters:
  - name: files_larger_than
    # Parameters:
    #   size_kb: minimum file size in kilobytes
    with: [size_kb]
    expr: |
      input[] | select(input.size > ($arg0 * 1024))

Prefer Expression Filters

Use expr over run when possible:

  • Expression filters are faster (no process spawn)

  • Expression filters are more declarative

  • Expression filters are easier to debug

Only use run for complex logic requiring full scripting language.

Test Your Filters

Test filters with various inputs:

# Test with empty input
result: "${{ [] | my_filter }}"

# Test with single item
result: "${{ [item] | my_filter }}"

# Test with missing fields
result: "${{ items_without_path | my_filter }}"

Known Limitations

Current limitations of the filter system:

map() and select() Arguments

The map() and select() builtins currently evaluate their arguments before execution. This prevents per-item field access inside these operations:

# This does NOT work as expected:
filtered: "${{ inputs | map(select(input.type == 'FileSet')) }}"

# Workaround: use simpler filters
filtered: "${{ inputs | by_type('FileSet') }}"

This is a known limitation and may be addressed in future versions.

Implicit Input Context

Unlike native JQ, dv-flow uses explicit input variables rather than implicit . context:

# JQ style (not supported):
result: "${{ .[] | select(.type == 'A') }}"

# dv-flow style (use explicit input):
result: "${{ input[] }}"

Negative Array Indices

Negative array indices are not currently supported:

# Not supported:
last: "${{ array[-1] }}"

# Use instead:
last: "${{ array | last }}"

See Also