Outliers: A World Beyond the Ordinary 📊✨
Master 3 Techniques to Handle Outliers & Conquer the Sunk Cost Fallacy
Hello, data-driven minds, welcome to the 15th edition of DataPulse Weekly!
Another milestone, but we are just getting started 🚀
Each newsletter promises a journey through the fascinating intersections of data, stories, and human experiences.
Whether you're a data professional or simply curious about how data and decisions shape our world, you're in the right place.
Now, let’s dive straight into it.
Today’s Data Menu
📊 Case Study: Outlier Treatment
💹 Metric: Bounce Rate
🧠 Human Bias: Sunk Cost Fallacy
📊 Data Case Study: Outlier Treatment
Most datasets will contain certain values that are significantly different from the majority of the data. These extreme values are known as outliers.
Outliers are not inherently bad. They can occur naturally or result from system errors.
Before deriving insights from data or building machine learning models, it is crucial to identify outliers and apply appropriate outlier treatment methods.
Let’s look at some real-world examples:
Football scores can range from 0 to 149 (yes, that happened).
A single stock market trade can have a transaction value running into the millions.
E-commerce delivery days can be recorded as high as 90 days due to delays from local delivery partners.
Most of the time, outliers will not impact your analysis. However, in certain situations, outliers can significantly distort outcomes.
We are here to provide you with a clear and concise way to treat outliers. But before we discuss techniques to handle outliers, let’s first understand why treating outliers is important:
Enhancing Metric Accuracy: Ensuring that metrics such as mean and standard deviation are not influenced by extreme values.
Improving Data Insights: Gaining more reliable insights into performance, trends, and behaviors, helping to identify areas for improvement.
Better Decision-Making: Making more informed decisions based on data that accurately represents typical patterns.
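To see how much one extreme value can move these metrics, here is a minimal Python sketch. The delivery times are made-up numbers for illustration, not real data.

```python
import statistics

# Hypothetical delivery times in days (illustrative values only).
typical = [2, 3, 3, 4, 4, 5, 5, 6]
with_outlier = typical + [90]  # one extreme entry added

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(
        label,
        "| mean:", round(statistics.mean(values), 2),
        "| median:", statistics.median(values),
        "| std dev:", round(statistics.stdev(values), 2),
    )
```

The mean and standard deviation jump once the 90-day entry is included, while the median stays at 4 days.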
Now that we understand the importance of treating outliers, how do we identify them?
In a previous edition, we covered two commonly used outlier identification techniques: the Z-Score Method and the Interquartile Range (IQR). Check it out for more details (takes 2 minutes)!
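As a quick refresher, both techniques can be sketched in a few lines of Python. This is one common formulation rather than the only one: the 1.5 multiplier, the threshold of 3, the function names, and the sample data are all assumptions chosen for illustration.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical delivery times in days (illustrative values only).
delivery_days = [2, 3, 3, 4, 4, 5, 5, 6, 90]
print(iqr_outliers(delivery_days))
print(zscore_outliers(delivery_days))
```

On this tiny sample, the IQR rule flags the 90-day entry, while the z-score rule with a threshold of 3 flags nothing. With only nine points, a single extreme value inflates the standard deviation so much that its own z-score stays below 3. The z-score method tends to work better on larger samples.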
Once outliers are identified, the next step is deciding how to treat them. Let's explore the three most popular treatments and when to use each:
1. Removing the Outliers
When to Use:
The outliers are due to data entry errors or are irrelevant to the analysis.
The dataset is large enough that removing a few data points won't significantly impact the analysis.
Example: In the case of football match scores, most scores typically range from 0 to 10. However, if there is an entry like the infamous 149-0 match due to an unusual circumstance, such an extreme value can be removed.
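Here is a minimal sketch of removal in Python, assuming the IQR rule is used to decide what counts as an outlier. The function name and the score list are hypothetical.

```python
import statistics

def remove_outliers(values, k=1.5):
    """Drop values outside the IQR fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical goals scored per match, with one extreme entry.
goals = [0, 1, 1, 2, 2, 3, 3, 4, 149]
cleaned = remove_outliers(goals)
print(cleaned)
```

Note that the cleaned list is shorter than the original, which is why removal suits larger datasets where losing a few rows does little harm.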
2. Capping Outliers to Upper or Lower Boundaries
When to Use:
The outliers are genuine observations but extreme.
Removing them might lead to a loss of important information.
The analysis needs to retain all data points for better representation.
Example: In the stock market, a single trade can have a transaction value running into the millions, which is valid but extreme. To manage this, you can cap such values at a reasonable upper boundary based on historical trading volumes.
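Capping can be sketched in Python as follows. The trade values and the 50,000 cap are made-up numbers; in practice the boundary would come from your own historical data, for example a high percentile.

```python
def cap_values(values, lower=None, upper=None):
    """Clamp each value into [lower, upper]; a bound of None is ignored."""
    capped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        capped.append(v)
    return capped

# Hypothetical trade values with one very large transaction.
trade_values = [12_000, 15_500, 9_800, 22_000, 18_300, 4_500_000]
upper_cap = 50_000  # assumed boundary, for illustration only
capped = cap_values(trade_values, upper=upper_cap)
print(capped)
```

Every data point is kept, and only the extreme trade is pulled down to the cap.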
3. Imputing Outliers by Replacing with Median or Mean
When to Use:
The dataset cannot afford to lose any data points.
The outliers are suspected to be errors, and you have a reasonable estimate of their true value.
Example: For e-commerce delivery times, if some entries show extreme values due to local delivery partner delays, replace these outliers with the median delivery time, or with a mean calculated without the outliers (the raw mean is itself pulled upward by them), to ensure consistency.
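A minimal Python sketch of imputation, assuming outliers are flagged with the IQR rule and replaced by the median of the remaining values. The function name and the data are hypothetical.

```python
import statistics

def impute_outliers_with_median(values, k=1.5):
    """Replace values outside the IQR fences with the median of the inliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low <= v <= high]
    replacement = statistics.median(inliers)
    return [v if low <= v <= high else replacement for v in values]

# Hypothetical delivery times in days, with one delayed order.
delivery_days = [2, 3, 3, 4, 90, 4, 5, 5, 6]
imputed = impute_outliers_with_median(delivery_days)
print(imputed)
```

The list keeps its original length and order. Only the 90-day entry changes, taking the median of the other deliveries.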
Conclusion
Outlier treatment is crucial in data analysis and building machine learning models, particularly in datasets where extreme values can distort results.
By identifying and appropriately handling outliers using methods like removal, capping, or imputation, you ensure that your analysis is robust, accurate, and reflective of real-world patterns.
This approach enhances the integrity of your insights, ultimately supporting better decision-making.
Next, we'll explore Bounce Rate, a key web analytics metric for understanding user engagement.
💹 Metric: Bounce Rate
Bounce Rate is the percentage of visitors who leave a website after viewing only one page. It provides insights into how well a site engages its visitors.
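As a quick worked example, the calculation can be sketched in Python. This assumes each visit is counted as one session, and the function name and numbers are made up for illustration. Analytics tools may define a bounce slightly differently.

```python
def bounce_rate(single_page_sessions, total_sessions):
    """Bounce rate as a percentage of all sessions."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return 100.0 * single_page_sessions / total_sessions

# Hypothetical traffic: 420 of 1,000 sessions viewed only one page.
print(bounce_rate(420, 1000))  # 42.0
```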
Importance of Bounce Rate:
User Engagement: High bounce rates can indicate that visitors either found exactly what they needed or were not sufficiently engaged to explore further.
Content Quality: Reflects whether the content meets visitor expectations and needs.
SEO Impact: A high bounce rate may be associated with weaker search engine rankings.
By monitoring and improving the bounce rate, you can enhance user engagement and overall site performance.
Now, let’s move on to the last section of our newsletter.
🧠 Human Bias: Sunk Cost Fallacy
Did you know there are more than 180 ways your brain can trick you? These tricks, called cognitive biases, can negatively impact the way humans process information, think critically and perceive reality. They can even change how we see the world. In this section, we'll talk about one of these biases and show you how it pops up in everyday life.
Imagine you've invested a significant amount of money in a particular stock, believing it would bring substantial returns.
As time goes by, the stock's value starts to plummet. Despite the continual decline, you decide to hold onto the stock, hoping it will rebound.
This is a classic case of the sunk cost fallacy.
The sunk cost fallacy is a human bias where people continue investing in a failing venture because of the time, money, or effort they've already put in.
This bias leads us to make irrational decisions, believing that abandoning the investment means admitting that the previous investments were wasted.
Other examples include:
Companies continue funding failing projects because of significant prior investments.
Couples stay in long-term, unfulfilling relationships because of the time they have invested.
People stay in unsatisfying careers for years due to the significant time and effort already invested, despite the lack of satisfaction or growth.
Understanding the sunk cost fallacy helps us make better decisions by focusing on future benefits rather than past investments. Recognizing this bias allows us to cut our losses and move forward rationally, leading to more effective outcomes.
Remember, recognizing any bias is the first step to overcoming its impact on our decision-making.
Check out the short video clip of Naval Ravikant sharing a relatable story in his unique style. If you haven't heard of Naval, you're in for a treat with his wealth of insightful content.
That wraps up our newsletter for today! If you found this valuable, please consider subscribing and sharing it with just one person—it motivates us to create more content.