Exclude outliers from analysis

Published December 20, 2021
Modules
Difficulty

Often when working with big datasets and collecting huge amounts of data, it happens that many outliers end up in the database. In general, outliers are not beneficial to analysis and while representing true data points, they can obscure insights in the data.

Here we demonstrate two examples how you use outliers exclusion in analytics using quantiles. The first case is creating a segmentation of clients with typical purchases, and the second one is an example how to calculate a metric that only calculates typical (without outliers) values.

Prerequisites


Process


In this use case, you will go through the following steps:

  1. Create segmentation of clients with typical purchases.
  2. Create aggregate to define outliers.
  3. Create segmentation for customers who are not outliers.
  4. Create metric to calculate the sum of transactions without outliers.

Create segmentation of clients with typical purchases


We aim to create a segment of clients and include only typical behaviors with regards to the selected analytic. For example, a segment of clients that made typical purchases in a given time period.

Note: In this example, “typical” means that the value of purchases is between 5th and 95th percentiles, however this may be adjusted as needed.

Create aggregate to define outliers


You need to create two separate aggregates (outliers/high and outliers/low) to define the quantile which denotes outlier values.

  1. Go to Analytics icon Analytics > Aggregates > Create aggregate.
  2. As the aggregate type, select Profile.
  3. Enter the name of the aggregate.
  4. Set the Analyze profiles by option to Quantile and select the appropriate value (in this case 95 for high outliers or 5 for low outliers).
  5. Click Choose event.
  6. From the dropdown list, select transaction.charge.
    Note: Events may have different labels between workspaces, but you can always find them by their action name (in this step, it’s transaction.charge).
  7. Select the totalAmount attribute.
  8. Save the aggregate.
  9. Repeat steps 1-7 to create the other aggregate.
    A single aggregate should look more or less like this:
Example setup of aggregate to capture outliers
Outliers

Create segmentation for customers who are not outliers


In this stage, you create a segmentation that list the customers who are not outliers.

  1. Go to Analytics icon Analytics > Segmentations > New segmentation.
  2. Enter the name of the segmentation.
  3. On the canvas, click Choose filter.
  4. From the dropdown list, select the transaction.charge event.
  5. Click the where input that appeared on the canvas.
  6. Click $totalAmount.
    If the attribute is not visible in the list, you can use the search field.
  7. Choose the More than number operator.
  8. Click the Analytics icon icon until it changes to dictionary
  9. From the list of available dictionaries, select the low outlier aggregate you create earlier.
  10. Add the high outlier aggregate by repeating steps 5-9, but change the operator to Less than.
  11. Save the segmentation.

The end result should look more or less like this:

Example setup of segmentation that excludes outliers
Outliers

Calculate the sum of transactions without outliers


In this part, create a simple metric that calculates the sum of transactions for a given time range, without outliers in the transactions.

  1. Go to Analytics > Metrics > New Metric.
    Note: You can find the detailed instructions on creating simple metrics here.
  2. Enter the name of the metric.
  3. As the aggregator, select Sum.
  4. From the dropdown list, select the transaction.charge event.
  5. Click the where input that appeared on the canvas.
  6. Click $totalAmount.
    If the attribute is not visible in the list, you can use the search field.
  7. Choose the More than number operator.
  8. Click the Analytics icon icon until it changes to dictionary
  9. From the list of available dictionaries, select the low outlier aggregate you create earlier.
  10. Add the high outlier aggregate by repeating steps 5-9, but change the operator to Less than.
  11. Save the metric.
Example definition of metric that excludes outliers
Metric without outliers

Check the use case set up on the Synerise Demo workspace


You can check the analyses created in this use case in our Synerise Demo workspace:

If you don’t have access to the Synerise Demo workspace, please leave your contact details in this form, and our representative will contact you shortly.

Read more


😕

We are sorry to hear that

Thank you for helping improve out documentation. If you need help or have any questions, please consider contacting support.

😉

Awesome!

Thank you for helping improve out documentation. If you need help or have any questions, please consider contacting support.

Close modal icon Placeholder alt for modal to satisfy link checker