Choosing between Code Workbooks, Code Workspaces, Code Repositories, or Pipeline Builder

How can I determine when it’s appropriate to use Code Workbooks, Code Workspaces, Code Repositories, or Pipeline Builder?

Are you building a data pipeline?

Use Code Repositories or Pipeline Builder

Use Pipeline Builder if:

  • You need to build a streaming pipeline
  • You don’t know how to code
  • You do know how to code but need to build the pipeline quickly
  • You need to collaborate with subject-matter experts who need a point-and-click interface to validate the transformation logic

Use Code Repositories if:

  • You need capabilities not available in Pipeline Builder (e.g. raw file access)
  • You want to create UDFs for use in other repositories or Pipeline Builder files
  • Your organization requires pipelines to be written in SQL/Python/Java
  • You need to access external services in the context of a pipeline
  • You need to apply/remove Markings or work with Encryption Channels

Are you doing analysis, building a model, etc.?

Use Code Workbooks or Code Workspaces

Use Code Workspaces if:

  • You want a JupyterLab or RStudio UI/UX
  • You want to leverage one machine for compute instead of a horizontally scalable cluster (i.e. no Spark)

Use Code Workbooks if:

  • You need to write PySpark, SparkSQL, or SparkR code
  • You want to test out logic prior to incorporating it into a Code Repositories pipeline
  • You want to mix-and-match different languages in a single file

Are you writing Functions to use in Actions or a Workshop App?

Use Code Repositories

Are you exposing a model via an easy-to-access endpoint?

Use Code Repositories (i.e. wrap the model in a Function and publish that Function)
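
If it helps to see the whole guide in one place, here is a rough sketch of the checklist as a small Python lookup. This is not a Foundry API: the `RULES` table, the `recommend` function, and every goal/need tag are hypothetical names made up for illustration. The post does not rank the criteria against each other, so the sketch simply returns every tool with at least one matching criterion, and falls back to all candidates for that goal when nothing matches.

```python
# Hypothetical encoding of the checklist in this post; none of these
# names come from Foundry itself.
RULES = {
    "pipeline": {
        "Pipeline Builder": {
            "streaming",
            "no_coding_experience",
            "fast_delivery",
            "point_and_click_review",
        },
        "Code Repositories": {
            "raw_file_access",
            "shared_udfs",
            "code_required_by_org",
            "external_services",
            "markings_or_encryption",
        },
    },
    "analysis": {
        "Code Workspaces": {"jupyterlab_or_rstudio", "single_machine_compute"},
        "Code Workbooks": {
            "spark_code",
            "prototype_for_repository",
            "mixed_languages",
        },
    },
    # These two goals have a single answer in the guide, so no criteria.
    "functions": {"Code Repositories": set()},
    "model_endpoint": {"Code Repositories": set()},
}


def recommend(goal, needs=()):
    """Return the tools whose criteria overlap with `needs`.

    Assumption: when no criterion matches, every candidate for the goal
    is returned, because the guide does not state a default.
    """
    candidates = RULES[goal]
    wanted = set(needs)
    matched = [tool for tool, criteria in candidates.items() if criteria & wanted]
    return matched or list(candidates)


if __name__ == "__main__":
    print(recommend("pipeline", ["streaming"]))
    print(recommend("analysis", ["spark_code"]))
    print(recommend("functions"))
```

When the result contains more than one tool (for example, a streaming pipeline that also needs raw file access), the checklist alone does not settle the choice and you will need to weigh the requirements yourself.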

Thanks, Taylor! This is a very helpful guide.