When Beer meets Diaper: A Retail Tale! 🍻🩲
Also, let's understand why AOV matters for businesses and how our decisions are influenced by the Anchoring Effect.
Hello, curious minds and data enthusiasts! Welcome to our 3rd edition of DataPulse Weekly, where we unravel the magic behind data and its impact on our daily lives.
Each newsletter promises to be a journey through the fascinating intersections of data, technology, and human experiences. Whether you're a data analyst, a tech enthusiast, or simply curious about how data shapes our world, you're in the right place.
Before we delve into today’s newsletter, we'd like to extend our heartfelt gratitude to all the early believers who have supported our endeavor to bring you this data newsletter. We deeply appreciate the encouragement from our readers, who find our content ‘unique and refreshing’ and value the ‘great analogies’ we offer. Thank you to everyone who has joined us on this newsletter journey. Now, let's dive straight in!
Today’s Data Menu
Data Case Study: When Beer Meets Diaper: A Retail Tale!
Metric of the Week: Average Order (Basket) Value
Visualization Spotlight: E-commerce AOV Trends of Top Countries
Human Bias Focus: Anchoring Effect
Data Nugget: Average vs Median
Data Case Study:
Have you ever heard of the famous retail Diaper and Beer story from the 1990s? The tale goes like this: customers buying diapers were also grabbing beer! Quite peculiar, isn't it? While there's no solid evidence pinpointing which retailer initiated this strategy, it's often associated with Walmart. They reportedly placed diapers and beer together in one section of the retail shop to boost sales of both items.
In today’s data case study, we will look at how a retailer like Walmart might have unlocked this key association through Market Basket Analysis, using one of its best-known algorithms: the Apriori algorithm. Fret not! If you're familiar with basic mathematics, you'll have a solid grasp of this simple yet powerful algorithm by the end of this newsletter.
Let’s apply the Apriori algorithm to a sample dataset of 6 transactions at a retail store, featuring five items: Beer, Diapers, Bread, Chips, and Butter -
Transactions:
Beer, Diapers, Butter
Diapers, Butter, Bread
Beer, Diapers, Bread
Bread, Butter
Beer, Diapers, Chips
Beer, Chips
The Apriori algorithm starts with calculating Support. Support measures the occurrence of an item or itemset within all transactions. In other words -
Support = Transactions containing that item or itemset / Total Transactions
Now, let's proceed with the calculations -
Step 1: Eliminate low-occurring items
Support(Beer) = 4/6 ≈ 0.67 (Transactions containing Beer / Total Transactions)
Explanation: Beer exists in transactions # 1, 3, 5, and 6 out of a total of 6 transactions. Similarly, we calculate the support for other items.
Support(Diapers) = 4/6 ≈ 0.67
Support(Bread) = 3/6 ≈ 0.5
Support(Butter) = 3/6 ≈ 0.5
Support(Chips) = 2/6 ≈ 0.33
Here, we need to set an assumption for the Minimum Support Threshold. This threshold helps us concentrate on important items in our transaction dataset while disregarding those with low occurrence. Let's assume a Minimum Support Threshold of 0.4, meaning an item should appear in at least 40% of transactions. Don't worry about assumptions; algorithms rely on them, and we can adjust them in real-world projects based on desired outcomes. This assumption for the support threshold would remain the same throughout the rest of this exercise.
Given our support threshold of 0.4, beer, diapers, bread, and butter meet the criteria, while chips do not. So, we discard chips at this point and proceed with other items. Pretty intuitive, right? Hold on for 2 more such quick calculations before we conclude this section.
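Step 1 can also be reproduced in a few lines of Python. This is only a minimal sketch of the counting and filtering described above, not a full Apriori implementation; the names `item_supports` and `MIN_SUPPORT` are our own illustrative choices.

```python
from collections import Counter

# The six sample transactions from the walkthrough above.
transactions = [
    {"Beer", "Diapers", "Butter"},
    {"Diapers", "Butter", "Bread"},
    {"Beer", "Diapers", "Bread"},
    {"Bread", "Butter"},
    {"Beer", "Diapers", "Chips"},
    {"Beer", "Chips"},
]
MIN_SUPPORT = 0.4  # the Minimum Support Threshold assumed in this exercise


def item_supports(transactions):
    """Return {item: share of transactions that contain the item}."""
    counts = Counter(item for basket in transactions for item in basket)
    return {item: count / len(transactions) for item, count in counts.items()}


supports = item_supports(transactions)

# Keep only the items that meet or exceed the threshold.
frequent_items = sorted(item for item, s in supports.items() if s >= MIN_SUPPORT)
print(frequent_items)  # ['Beer', 'Bread', 'Butter', 'Diapers']
```

Chips, with a support of 2/6, falls below 0.4 and drops out, matching the manual calculation.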
Step 2: Eliminate low-occurring pairs
Next, we look for combinations of two items (2-itemsets) that also meet or exceed the support threshold:
Support(Diapers, Beer) = 3/6 ≈ 0.50 (Transactions containing both Diapers and Beer / Total Transactions)
Explanation: Diapers and Beer both exist in transactions # 1, 3, and 5 out of a total of 6 transactions. Similarly, we calculate the support for other possible pairs.
Support(Beer, Bread) = 1/6 ≈ 0.17
Support(Beer, Butter) = 1/6 ≈ 0.17
Support(Diapers, Butter) = 2/6 ≈ 0.33
Support(Diapers, Bread) = 2/6 ≈ 0.33
Support(Bread, Butter) = 2/6 ≈ 0.33
Based on our support threshold of 0.4, only one 2-itemset meets or exceeds this criterion: Diapers and Beer, with a support of 0.50. We therefore discard every other pair and carry only Beer and Diapers into the next step.
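The pair-level filtering in Step 2 can be sketched the same way. The snippet below assumes the four items that survived Step 1 and uses a hypothetical helper, `support`, to count how many transactions contain every item of a candidate pair.

```python
from itertools import combinations

# The six sample transactions from the walkthrough above.
transactions = [
    {"Beer", "Diapers", "Butter"},
    {"Diapers", "Butter", "Bread"},
    {"Beer", "Diapers", "Bread"},
    {"Bread", "Butter"},
    {"Beer", "Diapers", "Chips"},
    {"Beer", "Chips"},
]
MIN_SUPPORT = 0.4
frequent_items = ["Beer", "Bread", "Butter", "Diapers"]  # survivors of Step 1


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


# Build every candidate pair from the frequent items and compute its support.
pair_supports = {
    pair: support(pair, transactions) for pair in combinations(frequent_items, 2)
}
frequent_pairs = [pair for pair, s in pair_supports.items() if s >= MIN_SUPPORT]
print(frequent_pairs)  # [('Beer', 'Diapers')]
```

Building pairs only from items that already passed Step 1 is the core idea of Apriori: a pair cannot be frequent if one of its items is not.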
Step 3: Evaluate Item Pair Strength Using Confidence Metric
Confidence - This reflects the probability of purchasing item Y when item X is bought. It quantifies the strength of the association between two items in a transaction. Put simply, it shows how often people buy Beer when they're also buying Diapers.
Now, let's establish a Minimum Confidence Threshold. For this exercise, we'll assume it to be 60%. Remember, assumptions are inherent in data analysis. This indicates that if item X is purchased, item Y should be bought along with it at least 60% of the time.
Now, let's calculate the Confidence for the selected 2-itemsets.
Confidence(Diapers → Beer) = Support(Diapers, Beer) / Support(Diapers) = (3/6) / (4/6) = 0.75, indicating that when Diapers are bought, Beer is also bought 75% of the time in our sample transaction dataset. This meets our 60% threshold criterion.
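For readers who prefer code, here is a minimal sketch of the confidence calculation. The function names `support` and `confidence` and the constant `MIN_CONFIDENCE` are illustrative choices of ours, not part of any library.

```python
# The six sample transactions from the walkthrough above.
transactions = [
    {"Beer", "Diapers", "Butter"},
    {"Diapers", "Butter", "Bread"},
    {"Beer", "Diapers", "Bread"},
    {"Bread", "Butter"},
    {"Beer", "Diapers", "Chips"},
    {"Beer", "Chips"},
]
MIN_CONFIDENCE = 0.6  # the Minimum Confidence Threshold assumed in this exercise


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(X -> Y) = Support(X and Y together) / Support(X)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


conf = confidence({"Diapers"}, {"Beer"}, transactions)
print(round(conf, 2))          # 0.75
print(conf >= MIN_CONFIDENCE)  # True
```

Note that confidence is directional: Confidence(Diapers → Beer) divides by Support(Diapers), while Confidence(Beer → Diapers) would divide by Support(Beer). In this dataset both items have the same support, so the two values happen to coincide.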
Our analysis has highlighted the strong association between Beer and Diapers in our sample data, with a confidence of 75% that Beer is in the basket whenever Diapers are. We won't explore Lift in this article, but it's worth noting that this metric, which is commonly evaluated alongside Support and Confidence, could further sharpen our insights.
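Although Lift is outside the scope of this walkthrough, a quick sketch shows what it would look like on our sample data. We assume the commonly used definition, Lift(X → Y) = Confidence(X → Y) / Support(Y); the helper names are hypothetical.

```python
# The six sample transactions from the walkthrough above.
transactions = [
    {"Beer", "Diapers", "Butter"},
    {"Diapers", "Butter", "Bread"},
    {"Beer", "Diapers", "Bread"},
    {"Bread", "Butter"},
    {"Beer", "Diapers", "Chips"},
    {"Beer", "Chips"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)


def lift(antecedent, consequent, transactions):
    """Lift(X -> Y) = Confidence(X -> Y) / Support(Y), computed from raw counts."""
    n = len(transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    conf = both / count(antecedent, transactions)
    return conf / (count(consequent, transactions) / n)


print(round(lift({"Diapers"}, {"Beer"}, transactions), 3))  # 1.125
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently; with only six transactions, though, this is an illustration rather than evidence.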
This fascinating link between seemingly unrelated items shows how understanding consumer behavior can reveal new opportunities for retailers. As the story goes, the retailer discovered that men often bought beer while purchasing diapers on Friday nights. By strategically placing these items together, it reportedly capitalized on convenience and impulse-buying tendencies and increased sales of both.
Key Takeaways:
The Apriori algorithm is a game-changer in retail, offering a lens into complex consumer behaviors through Market Basket Analysis. It's not just a tool but a strategic asset for giants like Walmart, enhancing marketing, optimizing store layouts, and streamlining inventory. Beyond retail, it identifies cross-sell opportunities in services, guiding targeted customer engagement strategies.
With its simplicity and effectiveness, Apriori cuts through the noise of thousands of transactions, revealing product associations that can dramatically improve business outcomes. It turns data into actionable insights, making it indispensable for businesses aiming to understand and capitalize on customer preferences.
This leads us to explore a fascinating metric significantly influenced by such algorithms: the Average Order Value (AOV).
Metric of the Week: Average Order (Basket) Value
Average Order Value (AOV) measures the average amount spent by customers per transaction with a business. Pretty straightforward, right? The AOV formula is simple -
AOV = Total Revenue in a Specific Period / Number of Orders Placed in the Same Period
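As a quick illustration, the formula translates directly into code. The order totals below are made-up numbers, and `average_order_value` is a hypothetical helper name.

```python
def average_order_value(order_totals):
    """AOV = total revenue in the period / number of orders in the same period."""
    if not order_totals:
        raise ValueError("AOV is undefined when there are no orders in the period")
    return sum(order_totals) / len(order_totals)


# Hypothetical order totals for one week (same currency throughout).
weekly_orders = [40.0, 25.0, 60.0, 35.0]

aov = average_order_value(weekly_orders)
print(aov)  # 40.0
```

Here total revenue is 160 across 4 orders, giving an AOV of 40.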
So, why should one care about AOV?
Increasing the Average Order Value (AOV) directly boosts revenue and order profitability. This can be achieved through strategic cross-selling and offering creative incentives like bundle deals or free shipping thresholds. Additionally, a higher AOV spreads per-order costs such as shipping and handling over more revenue, making each sale more cost-effective. By employing techniques like market basket analysis, businesses can enhance customer engagement and encourage higher spending per transaction, making every order more valuable.
And that's the scoop on AOV. It's not just a number—it's a handy guide for businesses to better understand and improve how much money they make from each sale.
💡 Remember that building a data mindset is effective only when we focus on solving data-related problems. The question below is designed for exactly this kind of practice. We will address it in the last section of this newsletter.
Food for thought:
Imagine you are a social media manager who encounters an influencer boasting an average of 600k views per Instagram video. This opportunity seems like a golden ticket to achieving your goal of 600k new impressions. However, should you go ahead and collaborate with them immediately, or should you pause and consider whether more information is needed?
Visualization Spotlight:
E-commerce AOV Trends of Top Countries:
Human Bias Focus: Anchoring Effect
Did you know there are more than 180 ways your brain can trick you? These tricks, called cognitive biases, can negatively impact the way humans process information, think critically and perceive reality. They can even change how we see the world. In this section, we'll talk about one of these biases and show you how it pops up in everyday life.
In a quiet German courthouse in 2006, a surprising test took place. Judges with more than 10 years of experience, respected for their fairness, faced an unusual challenge: rolling dice.
Here's what happened: These German judges were given a hypothetical case about stealing. Then, they rolled dice that could only show a three or a nine. Strangely, the outcome of the dice affected how long they wanted to send the hypothetical thief to jail.
The judges who got a three said about five months, while those who got a nine said about eight months. Even though they were experienced judges, they still got tricked by this dice game.
That's the Anchoring Effect in action. It occurs when an initial piece of information (the "anchor") serves as a reference point for all subsequent decisions and evaluations, even if it's unrelated or arbitrary. It's a common bias in negotiations, pricing strategies, and financial decisions, where the initial number set can unduly influence perceptions of value and fairness.
Here are some everyday examples of the anchoring effect that most of you might relate to:
You walk into a brand store to purchase a new pair of headphones. The first pair you see is priced at 5k INR. That's pretty steep, right? But then, you see another pair for 3.5k INR. Suddenly, that 3.5k INR feels like a bargain! Why? Because your brain has made the 5k INR its anchor. It's the standard you compare everything else to, even if it's not the best standard to use.
When signing up for a service online, you might see a monthly subscription priced at $10 next to a yearly subscription of $90. The monthly rate serves as an anchor, making the yearly option seem like a bargain since it's presented as saving you $30 over the year, even if you weren't planning to commit to a full year initially.
Remember, understanding any bias is the first step to overcoming its impact on our decision-making.
Data Nugget: Average vs Median
Diving into data's crucial role in decision-making, let’s get back to our previous question: Should we partner with this social media influencer immediately? The answer is - No, it's wise to request additional data, and here’s why it’s essential.
Looking at average views is like seeing the big picture without the details. The average of 600k views per Instagram video sounds impressive, but it might be misleading. Why? Because a few viral hits could inflate this average, giving a skewed perspective of the influencer's usual reach. That's where the median comes in handy. The median number of views tells you the middle value of all their video views when lined up in order. So, if an influencer's median views are significantly lower than their average, it indicates that their typical video might not perform as well as the average suggests. This is crucial for setting realistic expectations for your campaign's reach.
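To make this concrete, here is a small sketch with made-up view counts for six videos, where a single viral hit pulls the average up to 600k. The numbers are purely hypothetical and only illustrate the gap between the two measures.

```python
from statistics import mean, median

# Hypothetical view counts for six videos; the last one went viral.
views = [120_000, 150_000, 90_000, 110_000, 130_000, 3_000_000]

print(mean(views))    # 600000
print(median(views))  # 125000.0
```

The average matches the influencer's headline figure, yet a typical video in this made-up sample reaches only around 125k views, which is the kind of gap worth checking before committing a campaign budget.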
Average vs. Median in Other Use Cases:
Average Use Case: Use the average when you want an overall idea of performance or value and the data is roughly symmetric, without extreme outliers. For instance, calculating the average time spent on a customer service call can give you a good idea of efficiency if most calls are about the same length.
Median Use Case: Use the median when your data set includes outliers or is skewed. For example, when looking at household incomes in a neighborhood, a few very high incomes can skew the average upwards, making it look like everyone earns more than they actually do. The median income would provide a better indicator of the neighborhood's financial situation.
Understanding when to use average and when to use median helps you interpret data more accurately, ensuring that decisions are based on realistic and relevant metrics.
That wraps up our newsletter for today! We've simplified intricate data concepts and will keep doing so in future editions. If you found this helpful, please consider subscribing and sharing it with your friends; it motivates us to create more content. Next time you're shopping for groceries online, don’t forget to take note of the frequently-bought-together items, and if you prefer offline shopping, pay attention to unusual products placed next to each other.
Visualisation Source -