Editing a dataset based on PK with Spark

How can I use Foundry and Spark to go into a dataset and update individual rows according to their keys?

In regular SQL I can just do something like this:

UPDATE TableName
SET Column1 = 'NewValue1', Column2 = 'NewValue2'
WHERE PrimaryKeyColumn = SpecificPrimaryKeyValue;

Unlike with a relational database, you can’t edit Foundry datasets directly*. You need to either (a) change the code that’s used to generate the dataset, or (b) generate a downstream dataset that incorporates the changes you want.

In the case where you need to make a one-off change to certain values, I would suggest making a manually-created dataset by uploading a CSV keyed on the primary key, and then joining that dataset to the one you want to change. That way, in the future you will always have both (a) a record of the manual updates you needed to make, and (b) a way of adding additional updates if needed (you just add more rows to the CSV file and upload a new version to that same dataset).

[UPDATE]: You can now create manually-inputted datasets in Pipeline Builder, which offers a potentially faster path to creating the join dataset than creating a CSV and uploading it.

*Exceptions include manually-created datasets, such as those from uploaded Excel sheets or Fusion documents.
