@lightweight transforms: external & new container-powered capabilities

taylor · January 31, 2024, 5:35am

From the docs

Improved External Transforms Performance with Lightweight Mode: External transforms now support the @lightweight API, resulting in improved performance by reducing startup overhead. This enhancement is ideal for external transforms, as they are typically not Spark-dependent or CPU-intensive, leading to better latency and throughput.

Why this matters

For most datasets in most enterprises, you probably don't need the full power of Spark. Because lightweight transforms run without a Spark cluster, they typically have less startup overhead and can be faster on small datasets.

The ability to hit external endpoints from within lightweight transforms means that they’re one step closer to being a full replacement for Spark-based transforms for most use cases.

taylor · February 3, 2024, 2:53pm

Just a day or two after posting this, there's a new update to @lightweight transforms that significantly expands their usefulness: "Lightweight transforms now support a wide range of data processing engines and bring-your-own-container (BYOC) workflows."

I think that @lightweight transforms should become a lot of users’ default mode of writing transforms. The only big thing I can see that’s missing is support for incremental transforms, which I hope is coming in the not-so-distant future.