==================
Incremental Builds
==================

DV Flow Manager supports incremental builds by tracking task execution state
and skipping tasks that are already up-to-date. This can significantly reduce
build times when only a subset of the inputs has changed.

How It Works
============

Tasks record their inputs, outputs, and parameters in an ``exec.json`` file
in their run directory. On subsequent runs, DV Flow Manager checks:

1. **Exec record existence**: If no ``exec.json`` exists, the task must run.
2. **Parameter values**: If recorded parameters differ from current values, re-run.
3. **Input data**: If any input dataset is marked as 'changed', re-run.
4. **Input structure**: If the number, position, or elements of inputs differ, re-run.
5. **Custom up-to-date check**: If provided, invoke the custom method for final confirmation.
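The sequence above can be pictured as a short decision function. The sketch
below is illustrative only: it models the exec record and inputs as plain
dictionaries, and the ``needs_rerun`` helper is not part of DV Flow Manager's
API.

.. code-block:: python

    import json
    import os

    def needs_rerun(rundir: str, params: dict, inputs: list) -> bool:
        """Illustrative model of the up-to-date decision sequence."""
        exec_file = os.path.join(rundir, "exec.json")

        # 1. No exec record: the task has never run
        if not os.path.exists(exec_file):
            return True

        with open(exec_file) as f:
            record = json.load(f)

        # 2. Recorded parameters differ from the current values
        if record.get("params") != params:
            return True

        # 3. An input dataset is marked as changed
        if any(inp.get("changed") for inp in inputs):
            return True

        # 4. The number, position, or elements of the inputs differ
        if record.get("inputs") != inputs:
            return True

        # 5. Otherwise the task is up-to-date, pending any custom check
        return False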
CLI Options
===========

The following CLI options control incremental build behavior:

``-f, --force``
    Force all tasks to run, ignoring up-to-date status. Useful when you want
    to rebuild everything regardless of what has changed.

    .. code-block:: bash

        dfm run my_task --force

``-v, --verbose``
    Show all tasks, including up-to-date ones. By default, tasks that are
    determined to be up-to-date are not displayed in the output. Use this
    option to see all tasks that were evaluated.

    .. code-block:: bash

        dfm run my_task --verbose

Custom Up-to-Date Methods
=========================

For tasks that reference files not explicitly listed in a fileset, you can
provide a custom up-to-date check method. The ``uptodate`` field in a task
definition accepts one of three values:

- ``false``: Always run the task (never consider it up-to-date)
- A non-empty string: Python method to evaluate
- Empty/null: Use the default up-to-date check

YAML Configuration
------------------

.. code-block:: yaml

    tasks:
    - name: compile_sources
      uptodate: false  # Always recompile

    - name: check_dependencies
      uptodate: mymodule.check_deps_uptodate  # Custom check method

Implementing a Custom Check
---------------------------

A custom up-to-date method is an async Python function that receives an
``UpToDateCtxt`` object and returns a boolean indicating whether the task is
up-to-date:

.. code-block:: python

    async def check_deps_uptodate(ctxt: UpToDateCtxt) -> bool:
        """
        Custom up-to-date check that verifies external dependencies.
        Returns True if the task is up-to-date, False if it needs to run.
        """
        import os

        # Check whether a specific file has been modified since the last run
        dep_file = os.path.join(ctxt.rundir, "external_dep.txt")
        if not os.path.exists(dep_file):
            return False

        # Compare timestamps, checksums, etc.
        # ...

        return True

See :doc:`pytask_api` for complete documentation of the ``UpToDateCtxt``
class.

Task Output
===========

When a task is determined to be up-to-date, it is marked as such in the
output:

.. code-block:: text

    << [1] Task mypackage.compile (up-to-date) 0.05ms

The previous output data is loaded from the task's ``exec.json`` file,
allowing downstream tasks to use cached results without re-executing.

The Memento System
==================

DV Flow Manager uses a memento pattern to store task execution state for
incremental builds. Each task can save arbitrary data that describes its
execution, which is used to determine whether re-execution is needed.

How Mementos Work
-----------------

When a task executes:

1. The task runs and produces outputs
2. The task optionally creates a memento (a dictionary of state data)
3. The memento is saved to the task's ``exec.json`` file
4. On the next run, the memento is passed to the up-to-date check

Mementos typically store:

* File timestamps
* Content hashes
* Configuration values
* Dependency versions
* Any data needed to detect changes

Creating Mementos
-----------------

Task implementations return mementos in the ``TaskDataResult``:

.. code-block:: python

    import os

    async def MyTask(ctxt, input) -> TaskDataResult:
        # Perform work
        result_file = os.path.join(ctxt.rundir, "output.txt")
        with open(result_file, "w") as f:
            f.write("result data")

        # Create a memento recording the output file and its timestamp
        memento = {
            "result_file": result_file,
            "timestamp": os.path.getmtime(result_file),
            "parameters": {
                "param1": input.params.param1
            }
        }

        return TaskDataResult(
            changed=True,
            memento=memento
        )

On the next run, this memento is available in ``input.memento`` for
comparison with the current state.

Accessing Mementos
------------------

In up-to-date checks:

.. code-block:: python

    import os

    async def CheckUpToDate(ctxt: UpToDateCtxt) -> bool:
        if ctxt.memento is None:
            return False  # No previous execution

        # Access saved memento data
        prev_timestamp = ctxt.memento.get("timestamp")
        prev_params = ctxt.memento.get("parameters", {})

        # Compare with the current state
        current_timestamp = os.path.getmtime(ctxt.memento["result_file"])
        current_params = {"param1": ctxt.params.param1}

        # Up-to-date only if nothing changed
        return (prev_timestamp == current_timestamp and
                prev_params == current_params)

Memento Best Practices
----------------------

**Keep mementos small**: Only store data needed for change detection.

**Use structured data**: Store dictionaries and lists that serialize to JSON.

**Include version info**: Store a schema version if the memento format might change.

**Handle missing data**: Check for ``None`` and missing keys gracefully.

**Be deterministic**: Ensure memento generation is consistent.
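The sketch below ties the versioning and missing-data practices together,
following the same conventions as the examples above. The
``VersionedUpToDate`` name, the ``MEMENTO_VERSION`` constant, and the memento
keys are illustrative assumptions, not part of DV Flow Manager:

.. code-block:: python

    import os

    MEMENTO_VERSION = 2  # Bump whenever the memento schema changes

    async def VersionedUpToDate(ctxt: UpToDateCtxt) -> bool:
        memento = ctxt.memento
        if memento is None:
            return False  # No previous execution

        # An old or missing schema version means the saved data
        # cannot be trusted; treat the task as out-of-date
        if memento.get("version") != MEMENTO_VERSION:
            return False

        # Handle missing keys gracefully instead of raising KeyError
        result_file = memento.get("result_file")
        if result_file is None or not os.path.exists(result_file):
            return False

        return memento.get("timestamp") == os.path.getmtime(result_file)

As with ``check_deps_uptodate`` above, such a function would be referenced
from the task's ``uptodate`` field.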
Exec.json Structure
===================

Each task's execution data is stored in a JSON file with the following
structure:

.. code-block:: json

    {
      "name": "my_pkg.my_task",
      "status": 0,
      "changed": true,
      "params": {
        "param1": "value1",
        "param2": 42
      },
      "inputs": [
        {
          "type": "std.FileSet",
          "filetype": "verilogSource",
          "files": ["file1.v", "file2.v"]
        }
      ],
      "outputs": [
        {
          "type": "std.FileSet",
          "filetype": "simImage",
          "files": ["sim_image.so"]
        }
      ],
      "memento": {
        "timestamp": 1703456789.123,
        "checksum": "abc123def456"
      },
      "markers": [
        {
          "severity": "warning",
          "message": "Unused signal detected",
          "location": {
            "file": "src/module.v",
            "line": 42,
            "column": 10
          }
        }
      ],
      "exec_info": [
        {
          "cmd": ["verilator", "-c", "file.v"],
          "status": 0
        }
      ],
      "start_time": "2024-12-24T10:30:00Z",
      "end_time": "2024-12-24T10:30:05Z",
      "duration_ms": 5432
    }

Fields
------

* **name**: Fully-qualified task name
* **status**: Exit status (0 = success, non-zero = failure)
* **changed**: Whether the task's output changed from the previous execution
* **params**: Task parameters used for this execution
* **inputs**: List of input data items received from dependencies
* **outputs**: List of output data items produced
* **memento**: Custom state data for incremental builds
* **markers**: Warnings, errors, and info messages
* **exec_info**: Commands executed and their results
* **start_time, end_time, duration_ms**: Timing information

Using Exec Data
---------------

Exec data files can be used for:

* **Debugging**: Inspect what inputs a task received
* **Analysis**: Review parameters and execution time
* **Documentation**: Generate reports on the build process
* **Testing**: Verify task behavior and outputs

Example: Reading exec data:

.. code-block:: python

    import json

    with open("rundir/my_task/exec.json") as f:
        data = json.load(f)

    print(f"Task took {data['duration_ms']}ms")
    print(f"Executed {len(data['exec_info'])} commands")

    for marker in data['markers']:
        print(f"{marker['severity']}: {marker['message']}")

Advanced Incremental Build Patterns
===================================

Content-Based Detection
-----------------------

Instead of timestamps, use content hashes for more reliable change
detection:

.. code-block:: python

    import hashlib
    import os

    async def ComputeHash(filepath):
        hasher = hashlib.sha256()
        with open(filepath, 'rb') as f:
            for chunk in iter(lambda: f.read(4096), b""):
                hasher.update(chunk)
        return hasher.hexdigest()

    async def MyTask(ctxt, input):
        # Process files
        output_file = os.path.join(ctxt.rundir, "output.dat")
        # ... create output ...

        # Store the content hash in the memento
        file_hash = await ComputeHash(output_file)

        return TaskDataResult(
            memento={"output_hash": file_hash}
        )

Dependency Tracking
-------------------

Track external dependencies explicitly:

.. code-block:: python

    import os

    async def TrackDependencies(ctxt: UpToDateCtxt) -> bool:
        if ctxt.memento is None:
            return False  # No previous execution

        # Read the dependency file
        dep_file = os.path.join(ctxt.srcdir, "dependencies.txt")
        if not os.path.exists(dep_file):
            return False

        with open(dep_file) as f:
            current_deps = f.read().splitlines()

        # Has the dependency list itself changed?
        saved_deps = ctxt.memento.get("dependencies", [])
        if set(current_deps) != set(saved_deps):
            return False

        # Has any dependency file changed?
        for dep in current_deps:
            if not os.path.exists(dep):
                return False
            current_time = os.path.getmtime(dep)
            saved_time = ctxt.memento.get(f"dep_time_{dep}")
            if saved_time is None or current_time != saved_time:
                return False

        return True

Multi-Stage Checking
--------------------

Combine multiple checks, ordered from cheapest to most expensive, for
accuracy:

.. code-block:: python

    import os

    async def ComplexUpToDate(ctxt: UpToDateCtxt) -> bool:
        if ctxt.memento is None:
            return False  # No previous execution

        # Quick check: did the parameters change?
        if ctxt.memento.get("params") != ctxt.params.model_dump():
            return False

        # Medium check: did any timestamp change?
        for file in ctxt.memento.get("files", []):
            if os.path.getmtime(file) != ctxt.memento.get(f"time_{file}"):
                return False

        # Expensive check: did any content hash change?
        # (reuses the ComputeHash helper defined above)
        for file in ctxt.memento.get("critical_files", []):
            current_hash = await ComputeHash(file)
            if current_hash != ctxt.memento.get(f"hash_{file}"):
                return False

        return True
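The content-based pattern above stores a hash in the memento but does not
show the matching check. A minimal sketch of that counterpart, reusing the
``ComputeHash`` helper and the ``output.dat`` / ``output_hash`` names from
the Content-Based Detection example (the ``HashUpToDate`` name is
illustrative):

.. code-block:: python

    import os

    async def HashUpToDate(ctxt: UpToDateCtxt) -> bool:
        if ctxt.memento is None:
            return False  # No previous execution

        output_file = os.path.join(ctxt.rundir, "output.dat")
        if not os.path.exists(output_file):
            return False

        # Re-hash the current output and compare it with the
        # hash saved by the previous run
        current_hash = await ComputeHash(output_file)
        return current_hash == ctxt.memento.get("output_hash")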