Filters¶
Filters provide a powerful way to transform and select data in expressions. Inspired by Unix pipes and JQ, filters let you chain operations to process inputs, parameter values, and other data.
Overview¶
A filter is a reusable transformation that can be applied to data using the
pipe operator |. Filters can:
Select items based on criteria (e.g., by file type)
Transform data structures (e.g., extract field values)
Aggregate information (e.g., count by type)
Example:
tasks:
- name: compile
needs: [find_sources]
consumes: [std.FileSet]
with:
sv_files:
value: "${{ inputs | by_filetype('.sv') }}"
The expression inputs | by_filetype('.sv') pipes the inputs array
through the by_filetype filter, which returns only items with .sv
extensions.
Using Filters¶
Basic Usage¶
Apply a filter using the pipe operator:
# Filter by file extension
sv_files: "${{ inputs | by_filetype('.sv') }}"
# Filter by type
filesets: "${{ inputs | by_type('std.FileSet') }}"
# Get all paths
all_paths: "${{ inputs | paths }}"
Filters with Parameters¶
Some filters accept parameters:
# Filter by specific extension
verilog: "${{ inputs | by_filetype('.v') }}"
# Get first item of type
first_fs: "${{ inputs | first_of_type('std.FileSet') }}"
# Extract specific field
types: "${{ inputs | pluck('type') }}"
Chaining Filters¶
Chain multiple filters for complex transformations:
# Get unique sorted extensions
extensions: "${{ inputs | paths | extensions | unique | sort }}"
# Get basenames of SystemVerilog files
sv_names: "${{ inputs | by_filetype('.sv') | basenames }}"
# Count then reverse
data: "${{ values | sort | reverse }}"
Combining with Builtins¶
Filters work seamlessly with JQ-style builtins:
# Count filtered items
sv_count: "${{ inputs | by_filetype('.sv') | length }}"
# Get last path
last_file: "${{ inputs | paths | last }}"
# Split and filter
basename: "${{ path | split('/') | last }}"
Defining Filters¶
Filters are defined at the package level using the filters key.
Basic Filter¶
A simple filter without parameters:
package:
name: myproject
filters:
- name: text_files
export: true
expr: |
input[] | select(input.path | split(".") | last | . == "txt")
This filter iterates through input and selects items with .txt extensions.
Parameterized Filter¶
Filters can accept parameters using with:
filters:
- name: by_pattern
export: true
with: [pattern]
expr: |
input[] | select(input.path | test($arg0))
Parameters are accessed as $arg0, $arg1, etc. (positional).
Usage:
matches: "${{ inputs | by_pattern('.*_test\\.sv') }}"
Executable Filters¶
Filters can execute Python or Shell scripts:
filters:
- name: process_data
export: true
run: |
import json
import sys
data = json.load(sys.stdin)
result = [item for item in data if item['size'] > 1000]
print(json.dumps(result))
shell: python
The script receives data via stdin and outputs JSON via stdout.
Filter Visibility¶
Filters support visibility modifiers:
Export¶
Make a filter available to importing packages:
filters:
- name: public_filter
export: true
expr: "input[]"
Local¶
Hide a filter from child packages:
filters:
- name: internal_filter
local: true
expr: "input[]"
Default Visibility¶
By default, filters have name visibility (visible to package and children,
but not to importers).
filters:
- name: package_filter
expr: "input[]"
Qualified Names¶
Reference filters from other packages:
result: "${{ inputs | mypackage.custom_filter }}"
Standard Library Filters¶
See Standard Library for documentation of all standard library filters, including:
by_filetype(ext)- Filter by file extensionby_type(typename)- Filter by data typebasenames- Extract filenamesextensions- Extract file extensionspaths- Get all paths from FileSetsfirst_of_type(typename)- Get first item of typepluck(field)- Extract field valuescount_by_type- Count by type
Advanced Topics¶
Runtime vs Static Evaluation¶
Filters can be evaluated at different times:
Static evaluation (during graph construction):
# params is known at graph construction
count: "${{ params | length }}"
Runtime evaluation (during task execution):
# inputs only known at runtime
file_count: "${{ inputs | length }}"
Expressions referencing inputs or memento are automatically deferred
and evaluated at runtime.
Error Handling¶
Filters should handle missing or invalid data gracefully:
# Safe: returns empty array if no matches
result: "${{ inputs | by_filetype('.sv') }}"
# Safe: returns null if type not found
first: "${{ inputs | first_of_type('MyType') }}"
Working with Types¶
Filters often work with data types. Access type information:
# Get type field
types: "${{ inputs | pluck('type') }}"
# Filter by type
filesets: "${{ inputs | by_type('std.FileSet') }}"
# Count by type
counts: "${{ inputs | count_by_type }}"
Best Practices¶
Keep Filters Simple¶
Prefer simple, single-purpose filters:
# Good: focused filter
- name: sv_files
expr: "input[] | by_filetype('.sv')"
# Better: reuse built-in
sv_files: "${{ inputs | by_filetype('.sv') }}"
Use Descriptive Names¶
Filter names should clearly indicate their purpose:
# Good names
- name: verilog_sources
- name: by_author
- name: active_tasks
# Avoid
- name: filter1
- name: process
- name: do_stuff
Document Parameters¶
When filters take parameters, document them:
filters:
- name: files_larger_than
# Parameters:
# size_kb: minimum file size in kilobytes
with: [size_kb]
expr: |
input[] | select(input.size > ($arg0 * 1024))
Prefer Expression Filters¶
Use expr over run when possible:
Expression filters are faster (no process spawn)
Expression filters are more declarative
Expression filters are easier to debug
Only use run for complex logic requiring full scripting language.
Test Your Filters¶
Test filters with various inputs:
# Test with empty input
result: "${{ [] | my_filter }}"
# Test with single item
result: "${{ [item] | my_filter }}"
# Test with missing fields
result: "${{ items_without_path | my_filter }}"
Known Limitations¶
Current limitations of the filter system:
map() and select() Arguments¶
The map() and select() builtins currently evaluate their arguments
before execution. This prevents per-item field access inside these operations:
# This does NOT work as expected:
filtered: "${{ inputs | map(select(input.type == 'FileSet')) }}"
# Workaround: use simpler filters
filtered: "${{ inputs | by_type('FileSet') }}"
This is a known limitation and may be addressed in future versions.
Implicit Input Context¶
Unlike native JQ, dv-flow uses explicit input variables rather than
implicit . context:
# JQ style (not supported):
result: "${{ .[] | select(.type == 'A') }}"
# dv-flow style (use explicit input):
result: "${{ input[] }}"
Negative Array Indices¶
Negative array indices are not currently supported:
# Not supported:
last: "${{ array[-1] }}"
# Use instead:
last: "${{ array | last }}"
See Also¶
Expressions - Expression syntax and builtins
Standard Library - Standard library filters and tasks
Dataflow & Produces - Data flow and task dependencies
Using Tasks - Using tasks in flows