Merging Datasets in Code Repository Based on a Criterion

Hello all, I’m trying to merge two datasets in Code Repository using PySpark. The merge should check whether the values in a column of one dataset match the values in a column of the other dataset and, where they do, add the corresponding jobtitle and department columns to the merged dataset. The issue I’m having is that the “matched” columns display only null values, and in one instance only one user was matched. I have checked several times for discrepancies between the columns in both datasets, but the problem persists. Any insight would be greatly appreciated!

So it sounds like you’re joining two datasets together? If you can’t get the results you want in code, try using Contour to validate the join logic and investigate what might be amiss.

Hmm, a few questions here. If you do the same thing in Pipeline Builder, do you get the same result? Are the join columns in the same format in both datasets?