How can I determine when it’s appropriate to use Code Workbooks, Code Workspaces, Code Repositories, or Pipeline Builder?
Are you building a data pipeline?
Use Code Repositories or Pipeline Builder
Use Pipeline Builder if:
- You need to build a streaming pipeline
- You don’t know how to code
- You do know how to code but need to build the pipeline quickly
- You need to collaborate with subject-matter experts who need a point-and-click interface to validate the transformation logic
Use Code Repositories if:
- You need capabilities not available in Pipeline Builder (e.g. raw file access)
- You want to create UDFs for use in other repositories or Pipeline Builder files
- Your organization requires pipelines to be written in SQL/Python/Java
- You need to access external services in the context of a pipeline
- You need to apply/remove Markings or work with Encryption Channels
Are you doing analysis, building a model, etc.?
Use Code Workbooks or Code Workspaces
Use Code Workspaces if:
- You want a JupyterLab or RStudio UI/UX
- You want to leverage one machine for compute instead of a horizontally scalable cluster (i.e. no Spark)
Use Code Workbooks if:
- You need to write PySpark, Spark SQL, or SparkR code
- You want to test out logic prior to incorporating it into a Code Repositories pipeline
- You want to mix-and-match different languages in a single file
Are you writing Functions to use in Actions or a Workshop App?
Are you exposing a model via an easy-to-access endpoint?
Use Code Repositories (i.e. wrap the model in a Function and publish that Function)
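If it helps to see the whole guide in one place, here is a rough sketch of it as a small Python lookup. This is only an illustration of the questions above, not anything Foundry provides: the function name `suggest_tools`, the task labels, and the requirement labels (e.g. `"streaming"`, `"raw_file_access"`) are all made up for this example. When the stated needs point at both tools, or at neither, the sketch returns both candidates rather than guessing.

```python
# Hypothetical requirement labels that paraphrase the bullets in this guide.
# None of these names come from a Foundry API.
PIPELINE_BUILDER_NEEDS = {"streaming", "no_code", "fast_delivery", "point_and_click_review"}
CODE_REPOSITORIES_NEEDS = {"raw_file_access", "udfs", "code_mandated",
                           "external_services", "markings_or_encryption"}
CODE_WORKSPACES_NEEDS = {"jupyter_or_rstudio", "single_machine"}
CODE_WORKBOOKS_NEEDS = {"spark_code", "prototype_for_repository", "mixed_languages"}


def _pick(needs, options):
    """Return the tools whose triggers match; fall back to all candidates."""
    matched = [tool for tool, triggers in options.items() if needs & triggers]
    return matched or list(options)


def suggest_tools(task, needs=()):
    """Map a task and a collection of requirement labels to candidate tools."""
    needs = set(needs)
    if task == "pipeline":
        return _pick(needs, {
            "Pipeline Builder": PIPELINE_BUILDER_NEEDS,
            "Code Repositories": CODE_REPOSITORIES_NEEDS,
        })
    if task == "analysis":
        return _pick(needs, {
            "Code Workspaces": CODE_WORKSPACES_NEEDS,
            "Code Workbooks": CODE_WORKBOOKS_NEEDS,
        })
    if task in ("functions", "model_endpoint"):
        # Functions for Actions/Workshop and model endpoints both go here.
        return ["Code Repositories"]
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    print(suggest_tools("pipeline", ["streaming"]))
    print(suggest_tools("analysis", ["spark_code", "mixed_languages"]))
    print(suggest_tools("model_endpoint"))
```

Returning a list keeps the ambiguous cases visible: a pipeline that needs both streaming and raw file access comes back with both tools, which is a signal to weigh the trade-offs by hand.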
Thanks Taylor! This is a very helpful guide.