Connecting vs. full build?

What is the difference between a connecting build and a full build?

A connecting build is when one or more transforms are triggered because one or more input datasets were updated. This allows updates to propagate forward as new data arrives.

I am not sure exactly what you mean by a ‘full build’, but a common pattern is to configure a schedule to build one or more datasets, and all their dependencies, at a certain time. This can be inefficient because the schedule might not line up with the arrival of new data, leading to longer than necessary update times. Also be aware of the ‘force build’ option, which runs transforms and builds datasets, consuming compute and storage, even if there are no updates to data or code.

A Full Build triggers a build of the target datasets and all of their upstream datasets. With a Full Build schedule, the job count may grow as lineage changes, because the scope is defined as everything upstream of the targets.

A Connecting Build lets you build only part of the upstream pipeline by defining input datasets. Just as in a Full Build, all of the datasets upstream of the targets are considered, but only those that are descendants (direct or indirect) of one of the input datasets are built. This lets you define a clear scope for a build without manually selecting all the intermediary datasets.

In both Full Build and Connecting Build, target datasets are always included. While inputs are often set up as triggers for a schedule, it’s not mandatory for any of the inputs to be triggers.
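
To make the difference in scope concrete, here is a minimal sketch that treats lineage as a plain graph and computes which datasets each build type would cover, following the description above. It is a toy model and not the platform's actual API: the `parents` mapping, the dataset names, and the functions `full_build_scope` and `connecting_build_scope` are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def ancestors(parents, targets):
    """Return every dataset strictly upstream of any of the targets."""
    seen = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(parents, inputs):
    """Return every dataset strictly downstream of any of the inputs."""
    # Invert the child -> parents mapping into parent -> children.
    children = defaultdict(set)
    for child, child_parents in parents.items():
        for parent in child_parents:
            children[parent].add(child)
    seen = set()
    stack = list(inputs)
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def full_build_scope(parents, targets):
    """Targets plus all of their upstream datasets."""
    return set(targets) | ancestors(parents, targets)


def connecting_build_scope(parents, inputs, targets):
    """Targets plus the upstream datasets that descend from an input."""
    upstream = ancestors(parents, targets)
    return set(targets) | (upstream & descendants(parents, inputs))


# Hypothetical lineage: raw_a -> clean_a -> joined -> report
#                       raw_b -> clean_b -> joined
parents = {
    "clean_a": ["raw_a"],
    "clean_b": ["raw_b"],
    "joined": ["clean_a", "clean_b"],
    "report": ["joined"],
}

print(sorted(full_build_scope(parents, ["report"])))
# ['clean_a', 'clean_b', 'joined', 'raw_a', 'raw_b', 'report']

print(sorted(connecting_build_scope(parents, ["raw_a"], ["report"])))
# ['clean_a', 'joined', 'report']
```

In this toy lineage, the connecting build with `raw_a` as the input skips `clean_b` because it does not descend from `raw_a`, while the full build covers everything upstream of `report`. In practice, a raw dataset with no transform behind it would have nothing to run, so the sketch only shows which datasets fall inside each scope.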