Contour Visual - SKU (Long-tail) Analysis

Hi all! I’m looking for some help in Contour. My goal is to show the % of SKUs that comprise 80% of total revenue. I believe I can do this with a Chart > Vertical Bar Chart with a Line Chart overlay (together, a Pareto chart), with each bar representing a SKU, and the following details:

  1. The y-axis is net_sales, so each bar is a SKU’s sales, sorted high to low.
  2. The x-axis is a pre-calculated percent of SKUs, with each SKU bar being 1/X of the total SKUs.
  3. A line chart overlay shows cumulative % of revenue, from 0-100%, which completes the Pareto chart and lets me fix a line at 80%.

I can’t quite get anything to look right, so if anyone has an idea, it would be great to get some guidance.

I’m not sure how to get a nice-looking vertical line in Contour. You might have to use Vega Plots in Quiver or something custom in Code Workbooks or Workspaces for that level of polish.

But I can get a parametrized Pareto chart that conveys what I think you’re after.

I start with a simple dataset where each row is a SKU and its total sales.

Next, I compute what percent of total sales each SKU makes up. I then rank the SKUs from most sales to least and calculate the cumulative sum of that percentage down the ranking. Finally, I categorize each SKU as falling above or below the Pareto value, which is a parameter. In this example I’ve set it to 80%.
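The same three steps can be sketched outside Contour to sanity-check the numbers. This is only an illustration in plain Python: the sample data, the function names, and the column names pct_of_total and cumulative_pct are made up for the example, and the strict "greater than" comparison is an assumption based on the name is_greater_than_pareto_line.

```python
from itertools import accumulate

# Hypothetical sample data: one entry per SKU with its total sales.
sales_by_sku = {"A": 500.0, "B": 250.0, "C": 150.0, "D": 60.0, "E": 40.0}
PARETO_VALUE = 0.80  # the parameter; 80% in this example


def pareto_table(sales_by_sku, pareto_value):
    """Rank SKUs by sales (descending) and flag those past the Pareto value."""
    ranked = sorted(sales_by_sku.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(sales for _, sales in ranked)
    running_totals = accumulate(sales for _, sales in ranked)
    rows = []
    for (sku, sales), running in zip(ranked, running_totals):
        cumulative_pct = running / total
        rows.append({
            "sku": sku,
            "sales": sales,
            "pct_of_total": sales / total,
            "cumulative_pct": cumulative_pct,
            # Assumption: strictly greater than the parameter.
            "is_greater_than_pareto_line": cumulative_pct > pareto_value,
        })
    return rows


def share_of_skus_to_reach(rows, pareto_value):
    """Fraction of SKUs needed for cumulative sales to reach pareto_value."""
    for position, row in enumerate(rows, start=1):
        if row["cumulative_pct"] >= pareto_value:
            return position / len(rows)
    return 1.0


rows = pareto_table(sales_by_sku, PARETO_VALUE)
for row in rows:
    print(row["sku"], row["cumulative_pct"], row["is_greater_than_pareto_line"])
print(share_of_skus_to_reach(rows, PARETO_VALUE))
```

With this sample data, the third-ranked SKU is the first one whose cumulative share passes 80%, so three of the five SKUs are needed to reach the threshold.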

Finally, I plot this on a bar chart and segment by is_greater_than_pareto_line to color each SKU accordingly. There are two visual groups, though no additional line.

Actually, here’s a screenshot showing how to get a line if that’s what you want.

First, use a lag expression to flag the first row that crosses the Pareto value, and save it as is_on_pareto_line:

"is_greater_than_pareto_line" AND NOT lag("is_greater_than_pareto_line", 1) OVER (ORDER BY "sales" DESC)

Then, add a column that holds the max sales value on that row and 0 everywhere else:

CASE
WHEN "is_on_pareto_line" THEN max("sales")
ELSE 0
END

Finally, plot the same sales bar chart and add an overlay (also a bar chart) of that new column.

Since only one row has a non-zero value, and that value equals the maximum sales value in the entire dataset, its bar spans the full height of the chart and reads as a vertical line. If you change the Pareto value, the column recomputes and the line moves.
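To see why exactly one row ends up non-zero, here is a rough plain-Python imitation of the lag expression and the CASE column. The function name pareto_marker and the sample values are hypothetical, and the handling of the first row assumes the usual SQL behavior where lag returns null when there is no previous row, so the condition is not true there.

```python
def pareto_marker(sales, flags):
    """Build the overlay column: max sales on the crossing row, 0 elsewhere.

    `sales` and `flags` are parallel lists already sorted by sales descending;
    `flags` holds the is_greater_than_pareto_line value for each row.
    """
    peak = max(sales)
    marker = []
    previous = None  # stands in for the null that lag() yields on the first row
    for flag in flags:
        # Mirrors: flag AND NOT lag(flag, 1). With a null lag the condition
        # is not true, so the first row never gets the marker (assumption).
        is_on_pareto_line = flag and previous is not None and not previous
        marker.append(peak if is_on_pareto_line else 0)
        previous = flag
    return marker


# Hypothetical sample rows, sorted by sales descending.
sales = [500.0, 250.0, 150.0, 60.0, 40.0]
flags = [False, False, True, True, True]
print(pareto_marker(sales, flags))
```

One consequence worth checking in your own data: if the top-selling SKU alone already exceeds the Pareto value, no row is flagged under this assumption and the line would not appear.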