Exclude outliers from analysis
When you collect large amounts of data, outliers often end up in the database. Although outliers are real data points, they are generally not useful for analysis and can obscure insights in the data.
This use case shows two ways to exclude outliers from analytics by using quantiles. The first is a segmentation of clients with typical purchases, and the second is a metric that is calculated only from typical values (without outliers).
Prerequisites
- Implement tracking code on your website.
- Implement transactions.
Process
In this use case, you will go through the following steps:
- Create segmentation of clients with typical purchases.
- Create aggregate to define outliers.
- Create segmentation for customers who are not outliers.
- Create metric to calculate the sum of transactions without outliers.
Create segmentation of clients with typical purchases
The aim is to create a segment of clients that includes only typical behavior with regard to the selected analytic, for example, a segment of clients who made typical purchases in a given time period.
Create aggregate to define outliers
You need to create two separate aggregates (outliers/high and outliers/low) to define the quantiles that mark the outlier values.
- Go to Analytics > Aggregates > Create aggregate.
- As the aggregate type, select Profile.
- Enter the name of the aggregate.
- Set the Analyze profiles by option to Quantile and select the appropriate value (in this case, 95 for high outliers or 5 for low outliers).
- Click Choose event.
- From the dropdown list, select transaction.charge.
Note: Events may have different labels between workspaces, but you can always find them by their action name (in this step, it’s transaction.charge).
- Select the totalAmount attribute.
- Save the aggregate.
- Repeat the steps above to create the other aggregate.
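To make the role of the two aggregates more concrete, the following minimal Python sketch computes a 5th and a 95th percentile over a list of amounts. It is only an illustration of the quantile idea and does not reproduce how the platform calculates aggregates; the `percentile` helper, the linear interpolation method, and the sample amounts are assumptions made for this example.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical totalAmount values taken from transaction.charge events.
amounts = [1, 20, 22, 25, 27, 30, 31, 33, 35, 38, 40, 900]

# These two values play the role of the low and high outlier aggregates.
low_threshold = percentile(amounts, 5)
high_threshold = percentile(amounts, 95)
```

In this sample, the very small and the very large amounts fall outside the two thresholds, while the bulk of the values lies between them.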
A single aggregate should look more or less like this:
Create segmentation for customers who are not outliers
In this stage, you create a segmentation that lists the customers who are not outliers.
- Go to Analytics > Segmentations > New segmentation.
- Enter the name of the segmentation.
- On the canvas, click Choose filter.
- From the dropdown list, select the transaction.charge event.
- Click the where input that appeared on the canvas.
- Click $totalAmount.
If the attribute is not visible in the list, you can use the search field.
- Choose the More than number operator.
- Click the icon until it changes to dictionary.
- From the list of available dictionaries, select the low outlier aggregate you created earlier.
- Add the high outlier aggregate by repeating steps 5-9, but change the operator to Less than.
- Save the segmentation.
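The logic of the two conditions can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's implementation: the threshold values, the customer IDs, and the rule that a customer belongs to the segment when at least one of their transactions lies between the thresholds are all hypothetical choices made for the example.

```python
# Hypothetical thresholds standing in for the low and high outlier aggregates.
LOW_THRESHOLD = 10.0
HIGH_THRESHOLD = 500.0

# Hypothetical transaction.charge events as (customer ID, totalAmount) pairs.
transactions = [
    ("anna", 25.0),
    ("ben", 2.0),
    ("ben", 40.0),
    ("cleo", 900.0),
    ("dave", 10.0),
]


def is_typical(amount, low, high):
    """Mirror the More than and Less than operators: both bounds are strict."""
    return low < amount < high


def typical_customers(events, low, high):
    """Return customers with at least one transaction between the thresholds."""
    return {customer for customer, amount in events if is_typical(amount, low, high)}


segment = typical_customers(transactions, LOW_THRESHOLD, HIGH_THRESHOLD)
```

Because both operators are strict, an amount exactly equal to a threshold (such as the 10.0 transaction above) does not satisfy the conditions.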
The end result should look more or less like this:
Calculate the sum of transactions without outliers
In this part, you create a simple metric that calculates the sum of transactions in a given time range, excluding the outlier transactions.
- Go to Analytics > Metrics > New Metric.
Note: You can find the detailed instructions on creating simple metrics here.
- Enter the name of the metric.
- As the aggregator, select Sum.
- From the dropdown list, select the transaction.charge event.
- Click the where input that appeared on the canvas.
- Click $totalAmount.
If the attribute is not visible in the list, you can use the search field.
- Choose the More than number operator.
- Click the icon until it changes to dictionary.
- From the list of available dictionaries, select the low outlier aggregate you created earlier.
- Add the high outlier aggregate by repeating steps 5-9, but change the operator to Less than.
- Save the metric.
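As a rough illustration of what such a metric computes, the sketch below sums only the amounts that lie strictly between two thresholds. The threshold values and the sample amounts are hypothetical, the time range filter is left out for brevity, and the code is not meant to represent how the platform evaluates metrics.

```python
# Hypothetical thresholds standing in for the low and high outlier aggregates.
LOW_THRESHOLD = 10.0
HIGH_THRESHOLD = 500.0

# Hypothetical totalAmount values from transaction.charge events.
amounts = [25.0, 2.0, 40.0, 900.0, 10.0, 60.0]


def sum_without_outliers(values, low, high):
    """Sum the values that are more than low and less than high."""
    return sum(value for value in values if low < value < high)


total = sum_without_outliers(amounts, LOW_THRESHOLD, HIGH_THRESHOLD)
plain_total = sum(amounts)  # for comparison: the sum with outliers included
```

Comparing `total` with `plain_total` shows how strongly a single extreme transaction can influence a plain sum.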
Check the use case set up on the Synerise Demo workspace
You can check the analyses created in this use case in our Synerise Demo workspace:
- Segmentation of customers who made a purchase in the last 30 days
- Outlier high aggregate
- Outlier low aggregate
- Segmentation of customers who are not outliers
- Metric that calculates the sum of transactions without outliers
If you don’t have access to the Synerise Demo workspace, please leave your contact details in this form, and our representative will contact you shortly.