Cohort Analysis with Interactive Simulation

Author: Oliver Staubli, Founder & Data Scientist
Tags: Data Visualization, E-Commerce, Marketing, Exploratory Data Analysis, Retail, Sales, Examples, Training

Existing customers are 5 to 7 times more valuable than new customers. Period. Despite this well-known fact, many companies still prioritize new customer acquisition at the expense of marketing programs designed to retain existing customers. Back in 2011, a Forrester study found that retailers spend almost 80% of their interactive marketing budgets on customer acquisition.[1]

Evidence that this imbalance hasn't changed much in the last five years can be found in the consistently high priority companies still give to the Conversion Rate as a Key Performance Indicator (KPI). The Retention Rate, by contrast, which measures how many customers have been retained over time, is rarely calculated. For an optimal marketing mix that maximizes Return On Investment (ROI), however, both KPIs must be optimized simultaneously.

The value of existing customers, both returning and repeat purchasers, was analyzed by Adobe in 2012 in a broad study of anonymous data from 33 billion visits to 180 online retail websites in the USA and Europe.[2]
The main findings were:

Benefits of Cohort Analysis

Cohort analysis provides a visual approach to the marketing mix optimization mentioned above. In addition, it illustrates how revenue relates to new and existing customers. At the same time, cohort analysis visualizations make the success or failure of marketing campaigns easy to see.

Comparison of Total Revenue without and with colored cohorts

But first, let's have a look at the definition of cohorts: A cohort is a group of people who share a common characteristic or experience within a defined period.[3]
In the following example, customers with the same acquisition date (or rather, acquisition month) are assigned to the same cohort. It is equally reasonable to assign customers to cohorts by demographic characteristics such as age or gender, or by mobile versus desktop access. In the subsequent cohort analysis, various KPIs such as revenue, number of customers, retention rate and conversion rate can be analyzed and visualized. Cohort analysis thus shows the behavior of customer groups over time in detail.
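As a rough illustration of the grouping step, the following Python sketch assigns customers to cohorts by the month of their first purchase and sums revenue per cohort and calendar month. The transaction records, customer names and function names are invented for this example and are not taken from the simulation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("alice", date(2015, 1, 14), 40.0),
    ("alice", date(2015, 2, 3), 25.0),
    ("bob", date(2015, 2, 20), 60.0),
    ("bob", date(2015, 4, 2), 30.0),
    ("carol", date(2015, 1, 30), 10.0),
]


def month_key(day):
    """Reduce a date to its (year, month) pair."""
    return (day.year, day.month)


def assign_cohorts(transactions):
    """Map each customer to the month of their first purchase (the cohort)."""
    first_purchase = {}
    for customer, day, _amount in transactions:
        if customer not in first_purchase or day < first_purchase[customer]:
            first_purchase[customer] = day
    return {customer: month_key(day) for customer, day in first_purchase.items()}


def revenue_by_cohort_and_month(transactions):
    """Sum revenue per (cohort month, calendar month) pair."""
    cohorts = assign_cohorts(transactions)
    revenue = defaultdict(float)
    for customer, day, amount in transactions:
        revenue[(cohorts[customer], month_key(day))] += amount
    return dict(revenue)


cohort_revenue = revenue_by_cohort_and_month(transactions)
```

In this toy data set, alice and carol both land in the January 2015 cohort and bob in the February 2015 cohort. Grouping by a demographic attribute instead would only require a different `assign_cohorts` function.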

Interactive Simulation

The motivation behind this article was to make cohort analysis more tangible and therefore easier to understand. All the examples I found on the web explained the process of getting from the data via cohort analysis to a visualization (singular!) and how one can read the cohort behavior (singular!) from this chart. For a deeper understanding of cohort analysis visualizations, a variety of behavior examples would be helpful. This finding led to the construction of the following interactive simulation of a fictitious shop: It lets you select and alter the behavior of the cohorts interactively. The transactional data in the background is then adjusted, the cohort analysis applied and the visualizations updated, all in real time. I hope this simulation helps you to "grasp" the various cohort analysis visualizations in a playful way.

Chart 1: Total Revenue, also known as the "Layer-Cake Graph"[4]

Before dealing with the colorful cohorts in the first chart, let's click on the following link: Hide cohorts. The chart below should now depict the traditional view of revenues over the past twelve months without any information about the split by cohorts. Click the link once more to show the cohorts again. As explained earlier, in this example we use the month of acquisition to group the customers into cohorts: Customers in cohort C01 made their first purchase last month (December). Customers in cohort C12 have been customers for 12 months now, as they were converted last January. The grey area represents revenues from even older customers, who were acquired more than 12 months ago.

Chart 1: Total Revenue colored by cohorts

Thanks to the coloring of the cohort revenues in chart 1, the red share of revenue by the new customers can be easily distinguished from the remaining revenue of the repeat purchasers. The following configuration presets display the cohort analysis visualization of different, constant retention rates per cohort:

By clicking the symbol on any chart and changing the slider "Cohort Retention Rate Global", you can try out the effect of different retention rates on the visualizations.
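To make the layer-cake idea concrete, here is a minimal Python sketch of how the total revenue per month can be stacked from the individual cohort revenues. It is not the code behind the simulation: the function names and the start revenue of 100 are assumptions, and the grey layer of customers older than twelve months is left out.

```python
def simulate_cohorts(n_months, start_revenue, retention_rate):
    """Return table[c][m]: revenue of the cohort acquired in month c
    during calendar month m (0.0 before the cohort exists)."""
    table = []
    for c in range(n_months):
        row = [0.0] * n_months
        for m in range(c, n_months):
            # Assumption: revenue shrinks by the same factor every month.
            row[m] = start_revenue * retention_rate ** (m - c)
        table.append(row)
    return table


def total_revenue(table):
    """Height of the layer cake: sum of all cohort layers per calendar month."""
    n_months = len(table[0])
    return [sum(row[m] for row in table) for m in range(n_months)]


def new_customer_share(table, month):
    """Share of a month's revenue that comes from the cohort acquired in it."""
    return table[month][month] / total_revenue(table)[month]


table = simulate_cohorts(n_months=12, start_revenue=100.0, retention_rate=0.5)
totals = total_revenue(table)
share_december = new_customer_share(table, month=11)
```

With a retention rate of 50%, the newest cohort contributes roughly half of the last month's revenue in this sketch. At 0% it would contribute all of it, and at 100% only one twelfth.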

Chart 2: Revenue per Cohort

Maybe the easiest way to explain the second visualization is by using the preset 80% Retention Rate: The revenues per cohort and month in chart 2 are no longer stacked into a total but displayed separately. With the current configuration, each cohort has an initial revenue peak in the month of its acquisition and then declines by 20% each month.
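The shape of a single cohort curve under this preset can be reproduced with a few lines of Python. This is only a sketch: the function name and the start revenue of 1000 are assumptions chosen for illustration.

```python
def cohort_revenue_curve(start_revenue, retention_rate, months):
    """Revenue of one cohort for each month since acquisition (M01, M02, ...).

    Assumes the revenue shrinks by the same factor every month, as in the
    constant-retention presets. Values are rounded to two decimals.
    """
    return [round(start_revenue * retention_rate ** k, 2) for k in range(months)]


curve = cohort_revenue_curve(start_revenue=1000.0, retention_rate=0.8, months=4)
# curve is [1000.0, 800.0, 640.0, 512.0]
```

Each value is 80% of the previous one, which is the 20% monthly decline visible after the initial peak.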

Chart 2: Revenue per cohort

The dashed lines in chart 2 connect all corresponding monthly revenues across cohorts, where "corresponding" refers to the number of months since acquisition: The red dashed line, for example, connects each cohort's revenue in its first month, the month of acquisition. From the fact that the red dashed line runs horizontally we can conclude that in this configuration all cohorts started with the same revenue. And because all the other dashed lines are horizontal as well, the retention rate across all cohorts is in fact constant in this configuration. How this changes with a variable revenue per month and cohort is depicted in the following preset examples: Increasing Start-Revenues and Decreasing Start-Revenues. Clearly evident from the curved dashed lines are the following summer effects: Increasing Summer-Revenues and Decreasing Summer-Revenues. Please also check the impact on chart 1.
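How a dashed line is read off the data can be sketched in Python as follows. The two small tables are made-up examples with three cohorts over four calendar months and 80% retention. The function names are assumptions and do not come from the simulation.

```python
def dashed_line(table, offset):
    """Revenue of every cohort `offset` months after its acquisition month.

    table[c][m] is the revenue of the cohort acquired in month c during
    calendar month m. Cohorts too young to have that month are skipped.
    """
    n_months = len(table[0])
    return [row[c + offset] for c, row in enumerate(table) if c + offset < n_months]


def is_horizontal(line, tolerance=1e-9):
    """A dashed line is horizontal if all connected revenues are equal."""
    return max(line) - min(line) <= tolerance


# Same start revenue for every cohort.
constant = [
    [100.0, 80.0, 64.0, 51.2],
    [0.0, 100.0, 80.0, 64.0],
    [0.0, 0.0, 100.0, 80.0],
]

# Start revenue grows from cohort to cohort, retention stays at 80%.
increasing_start = [
    [100.0, 80.0, 64.0, 51.2],
    [0.0, 110.0, 88.0, 70.4],
    [0.0, 0.0, 120.0, 96.0],
]

red_constant = dashed_line(constant, offset=0)
red_increasing = dashed_line(increasing_start, offset=0)
```

For the first table every dashed line is horizontal. For the second, `red_increasing` rises from cohort to cohort, which corresponds to a rising dashed line in the chart.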

Chart 3: Revenue per cohort (aligned with acquisition)

In contrast to chart 2, the monthly cohort revenues in the last visualization are aligned with the month since acquisition. In the example Increasing Start-Revenues, the red dashed line again denotes the start revenues in the acquisition month M01. This aligned visualization makes it easier to compare how the monthly revenues develop from the acquisition month onwards. With the current configuration you should see all cohort revenues running in parallel, as only the starting height changes across cohorts while the retention rate stays constant.

Chart 3: Revenue per cohort (aligned with acquisition)

In chart 3, the fine red line denotes the revenues of the very last month of the cohort analysis period (December). This red line was already visible at the far right of chart 2. The additional solid grey lines connect the corresponding monthly revenues of earlier months. With fixed start revenues, the visualization clearly shows the difference between the following two configuration examples: Constant Retention and Increasing Retention. In the latter case it is obvious that in the second month after acquisition (M02) the more recent cohorts have a noticeably better retention rate (compare e.g. C02 and C12). In the configuration example Decreasing Retention the revenue curves run exactly the other way round.
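The alignment and the M02 comparison can be expressed in a short Python sketch. The table below is a made-up example with four cohorts, listed oldest first, that all start at 100 but retain 50%, 60% and 80% of their revenue. The function names are assumptions.

```python
def align_with_acquisition(table):
    """Shift each cohort row so that index 0 is its acquisition month (M01).

    table[c][m] is the revenue of the cohort acquired in month c during
    calendar month m, so the first c entries of row c are dropped.
    """
    return [row[c:] for c, row in enumerate(table)]


def m02_retention(aligned):
    """Revenue in M02 relative to M01, for every cohort that has reached M02."""
    return [row[1] / row[0] for row in aligned if len(row) >= 2]


# Rows are ordered from the oldest cohort (top) to the newest (bottom).
table = [
    [100.0, 50.0, 25.0, 12.5],
    [0.0, 100.0, 60.0, 36.0],
    [0.0, 0.0, 100.0, 80.0],
    [0.0, 0.0, 0.0, 100.0],
]

aligned = align_with_acquisition(table)
rates = m02_retention(aligned)
# rates is [0.5, 0.6, 0.8]: newer cohorts keep more of their start revenue
```

After alignment all curves start at the same point, so a higher M02 value directly signals a better retention rate for that cohort.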

Of course, one can construct special cases here as well, for example where the summer has a positive or negative impact on the retention per cohort. In the preset Increasing Summer-Retention, the monthly revenues of cohorts C02 and C12 decrease faster than those of cohort C08, which was acquired in summer. You can see the revenue curves running the opposite way in the preset Decreasing Summer-Retention. Again, it is worth comparing charts 1 and 2 as well for a better understanding.

Cohort Analysis Quiz

Would you like to test whether you can work out randomly generated behaviors yourself from the three charts above? Just click on the link below and compare your conclusion to the actual configuration:
Generate Behavior
Show Solution


  • Cohort Retention Rate Global
  • Cohort Retention Change per Month
  • Cohort Revenue Change per Month
  • Cohort Revenue Change in Summer


  • 0% Retention Rate
  • 50% Retention Rate
  • 100% Retention Rate
  • Retention Change
  • Positive Trend
  • Negative Trend
  • Positive Summer
  • Negative Summer


  1. Forrester - US Interactive Marketing Forecast, 2011 To 2016 (November 2011)
  2. Adobe - The ROI from Marketing to Existing Online Customers (September 2012)
  3. Wikipedia - Cohort study
  4. AnalyzeCore - Cohort analysis with R – “layer-cake graph” (May 2014)
