Cohort benchmarking without the privacy creep: how oddly compares your ROAS to your category

Most benchmark tools work one of two ways. They expose individual merchant data behind a paywall, or they ask you to upload your data into a black box and trust that nothing leaks. oddly's cohort benchmarking does neither. Here is how the architecture works and why it matters for any merchant who has been burned by a benchmark provider.

The setup

Benchmarks tell you whether your ROAS is good for your category. Triple Whale's industry benchmarks exist. Northbeam has them. Polar has them. They are useful, and they all rely on a centralised aggregation step where individual merchant data is collected, normalised, and then served back as a category median.

The catch: the centralised step is opaque. You do not see what gets collected, when it gets anonymised, or who has access. You upload your data and you trust the operator.

oddly's architecture inverts this. The merchant data never leaves the merchant's own logical boundary in identifiable form. The anonymisation runs at insertion time, not retrieval time, and the cohort floors prevent any single merchant from being reidentified.

What gets collected

The benchmark requires three numbers per merchant per week: blended ROAS, total spend band, and category tag. That is it. No customer-level data, no order detail, no product catalogue.
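As a rough sketch, a weekly contribution could be modelled as a record that holds only those three values. The field names, the helper functions, and the spend-band edges are hypothetical, because the post does not publish how oddly draws its bands. Blended ROAS is used here in its usual sense: total revenue divided by total ad spend.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical weekly-spend band edges in USD; the real boundaries are not published.
SPEND_BAND_EDGES = [5_000, 25_000, 100_000, 500_000]
SPEND_BAND_LABELS = ["<5k", "5k-25k", "25k-100k", "100k-500k", "500k+"]


def spend_band(weekly_spend: float) -> str:
    """Coarsen raw weekly spend into a band label so the exact figure is never stored."""
    return SPEND_BAND_LABELS[bisect_right(SPEND_BAND_EDGES, weekly_spend)]


@dataclass(frozen=True)
class WeeklyContribution:
    """The only three values contributed per merchant per week (hypothetical names)."""
    blended_roas: float
    spend_band: str
    category: str


def build_contribution(revenue: float, spend: float, category: str) -> WeeklyContribution:
    # Blended ROAS: total revenue / total ad spend across all channels.
    if spend <= 0:
        raise ValueError("spend must be positive to compute ROAS")
    return WeeklyContribution(
        blended_roas=round(revenue / spend, 2),
        spend_band=spend_band(spend),  # raw spend is discarded after banding
        category=category,
    )
```

Raw revenue and spend exist only for as long as it takes to compute the ratio and the band. Neither value is part of the record.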

Before insertion into the benchmark pool, each merchant's contribution is keyed by a SHA-256 hash computed over a salted merchant ID plus the week boundary. This hash replaces the merchant ID itself. The hashed row is the only thing that lives in the benchmark store. The salt rotates weekly, so even the hashed merchant ID cannot be linked across weeks, and you cannot reconstruct a merchant's trajectory by following the hash.
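Here is a minimal sketch of the keying step, using Python's standard `hashlib` and `secrets` modules. It assumes that the week boundary is the Monday of the ISO week and that a fresh 32-byte random salt is generated each week and discarded afterwards. Both are assumptions for illustration, and `contribution_key` and the other helpers are hypothetical names.

```python
import hashlib
import secrets
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def new_weekly_salt() -> bytes:
    # A fresh random salt each week; discarding old salts is what breaks cross-week linkage.
    return secrets.token_bytes(32)


def contribution_key(merchant_id: str, week: date, salt: bytes) -> str:
    """SHA-256 over salt + merchant ID + week boundary, returned as a hex string."""
    h = hashlib.sha256()
    h.update(salt)
    h.update(merchant_id.encode("utf-8"))
    h.update(b"|")  # separator so the ID and the date cannot run together
    h.update(week_start(week).isoformat().encode("ascii"))
    return h.hexdigest()
```

Within one week, the same merchant always maps to the same key. Once that week's salt is thrown away, no one can recompute the old keys from a merchant ID. A keyed construction such as HMAC-SHA256 (`hmac.new(salt, msg, hashlib.sha256)`) is a common alternative to prefixing the salt. The post specifies salted SHA-256, so the sketch follows that.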

Cohort floor

Every cohort needs a minimum of 5 merchants before the benchmark surfaces. Below the floor, the cohort returns null and the dashboard shows a red "insufficient cohort" state instead of a number. This is the privacy gate.

Five is not a research-grade statistical floor. It is a privacy floor. Below five, a clever observer could narrow down the contributing merchants by category and spend band and start reidentifying them. At five and above, the cohort median is not attributable to any individual merchant.
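The floor reduces to a check that fails closed. In this sketch, the hypothetical `gated_median` returns `None` rather than a number whenever fewer than five merchants contribute:

```python
from statistics import median
from typing import Optional, Sequence

COHORT_FLOOR = 5  # privacy floor from the post: below this, nothing is surfaced


def gated_median(roas_values: Sequence[float]) -> Optional[float]:
    """Cohort median ROAS, or None when the cohort is below the privacy floor."""
    if len(roas_values) < COHORT_FLOOR:
        return None  # gated: no number is computed from too few merchants
    return median(roas_values)
```

The gate sits before the median is computed. A gated cohort therefore never produces a value that could leak through caching or logging.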

What the dashboard shows you

Three states.

Green. Cohort has at least 10 contributing merchants. Median is shown alongside your own value with a percentile bracket (top quartile, middle quartiles, bottom quartile).

Yellow. Cohort has 5 to 9 contributing merchants. Median is shown with a caveat: small cohort, treat as directional. No percentile bracket, because with so few merchants the quartile boundaries are not stable enough to place you reliably.

Red. Cohort has fewer than 5 contributing merchants. No median shown. The dashboard tells you the cohort is gated and you will see numbers when the cohort grows.

The colour states are the privacy contract made visible. You always know whether the number you are looking at is reliable, suggestive, or unavailable.
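The three states map directly onto contributor counts. This sketch assumes that quartile cut points come from Python's `statistics.quantiles` with its default method. The post does not say how oddly computes quartiles, and `cohort_state` and `quartile_bracket` are hypothetical names.

```python
from enum import Enum
from statistics import quantiles
from typing import Optional, Sequence


class CohortState(Enum):
    GREEN = "green"    # 10 or more merchants: median plus quartile bracket
    YELLOW = "yellow"  # 5 to 9 merchants: median only, marked directional
    RED = "red"        # fewer than 5 merchants: gated, nothing shown


def cohort_state(n: int) -> CohortState:
    if n >= 10:
        return CohortState.GREEN
    if n >= 5:
        return CohortState.YELLOW
    return CohortState.RED


def quartile_bracket(own_roas: float, cohort: Sequence[float]) -> Optional[str]:
    """Place the merchant's ROAS against cohort quartiles; only offered in the green state."""
    if cohort_state(len(cohort)) is not CohortState.GREEN:
        return None
    q1, _, q3 = quantiles(cohort, n=4)  # three cut points: Q1, median, Q3
    if own_roas >= q3:
        return "top quartile"
    if own_roas < q1:
        return "bottom quartile"
    return "middle quartiles"
```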

Why this matters

Three reasons it matters in practice, not just as a box to tick for privacy compliance.

Trust survives a leak. If oddly's benchmark store leaked tomorrow, the attacker would have a pile of hashed rows with a rotating salt. The rows do not link to merchants, do not link across weeks, and do not contain the raw spend or revenue. A leak is embarrassing but not material.

The merchant stays in control. Cohort contribution is opt-in. The setting is at /dashboard/settings/data-contributions. Off by default. Toggling on contributes your data and unlocks the cohort view; toggling off stops both. The deal is symmetric: viewing and contributing switch on and off together, so nobody reads the cohort without also feeding it.
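The sketch below shows how a single flag could gate both directions. It uses a hypothetical in-memory `BenchmarkPool` in place of the real store, and `can_view_cohort` is likewise a hypothetical name. It also reflects the opt-out behaviour described at the end of this post: switching off stops new rows but deletes nothing already in the pool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BenchmarkPool:
    """Hypothetical in-memory stand-in for the hashed benchmark store."""
    rows: List[dict] = field(default_factory=list)

    def ingest_week(self, week: str, contributions: Dict[str, dict], opted_in: Dict[str, bool]) -> int:
        """Append this week's rows for opted-in merchants only; returns the number added.

        The merchant ID is used only for the opt-in lookup and is not written to the pool.
        Earlier rows are never removed, so opting out stops accumulation from that week on.
        """
        added = 0
        for merchant_id, row in contributions.items():
            if opted_in.get(merchant_id, False):  # off unless explicitly toggled on
                self.rows.append({"week": week, **row})
                added += 1
        return added


def can_view_cohort(opted_in: Dict[str, bool], merchant_id: str) -> bool:
    # The same flag gates viewing, so seeing the cohort always implies contributing to it.
    return opted_in.get(merchant_id, False)
```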

The cohort defines itself. Categories are merchant-tagged, not algorithmically inferred. A baby apparel brand chooses to be in the apparel cohort and the baby and toddler subcohort. Two layers of granularity, both opt-in. You can stay in the broader category while the subcohort is still below the floor, and join the narrower one once it grows.
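This sketch shows one way a dashboard could choose which layer to display. It prefers the narrower subcohort when that cohort clears the floor and otherwise falls back to the broader category. The fallback rule and `pick_cohort` are assumptions; the post says only that both layers exist and are opt-in. Cohorts here are keyed by category, so a merchant is never compared against rows from another category.

```python
from typing import Dict, Optional, Tuple

COHORT_FLOOR = 5
CohortKey = Tuple[str, Optional[str]]  # (category, subcategory or None for the broad layer)


def pick_cohort(
    category: str,
    subcategory: Optional[str],
    counts: Dict[CohortKey, int],
) -> Optional[CohortKey]:
    """Prefer the subcohort when it clears the floor, else the category, else nothing."""
    if subcategory is not None and counts.get((category, subcategory), 0) >= COHORT_FLOOR:
        return (category, subcategory)
    if counts.get((category, None), 0) >= COHORT_FLOOR:
        return (category, None)
    return None  # both layers are gated
```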

What yellow-gated means in practice

In normal use, the dashboard shows yellow when your category subcohort is between 5 and 9 contributing merchants. You will see the median, you will not see your percentile, and you should treat the comparison as suggestive.

Yellow does not mean the data is bad. It means the cohort is too small to be confident about whether you are above or below median by a meaningful margin. With 5 to 9 datapoints, each merchant represents 11 to 20 percent of the cohort. A small change in your ROAS, or in one other merchant's, can therefore move you across a quartile boundary. Wait for the cohort to grow, or use the broader category cohort.
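A simple percentile rank (the share of the cohort at or below your value) makes the arithmetic concrete. `percentile_rank` is a hypothetical helper:

```python
from typing import Sequence


def percentile_rank(own: float, others: Sequence[float]) -> float:
    """Percentage of the cohort (including you) whose ROAS is at or below yours."""
    cohort = list(others) + [own]
    at_or_below = sum(1 for v in cohort if v <= own)
    return 100.0 * at_or_below / len(cohort)
```

Take four other merchants at 2.0, 2.4, 2.6 and 3.1 ROAS. Moving your own ROAS from 2.5 to 2.7 lifts you from the 60th to the 80th percentile, because in a cohort of five each merchant is worth 20 points. In a cohort of 40, one merchant is worth 2.5 points.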

What it does not do

The benchmark is not a forecast. It tells you where you are relative to your category this week. It does not tell you where you will be next week, what to change, or whether the category itself is healthy. Reallocation recommendations live on the Autopilot tier and use the benchmark as one input, not the only input.

The benchmark also does not cross categories. A baby apparel brand benchmarks against baby and toddler brands, not against electronics or beauty. The cohort architecture is built to prevent cross-category contamination, because a high-margin beauty median would mislead a low-margin apparel operator into bad calls.

What oddly does about this

Cohort benchmarking is gated to the Nudge tier and above. Opt in at /dashboard/settings/data-contributions. Contributions are hashed at insertion with a salt that rotates weekly, so hashes from different weeks cannot be linked. The cohort floor of 5 is the privacy gate, not a statistical claim. Two cohort layers (category, subcategory). Opt out at any time; your historical contributions remain in the hashed pool but stop accumulating from the toggle date forward.
