1 RFM : [R]ecency [F]requency [M]onetary

1.1 What is RFM Analysis

RFM (recency, frequency, monetary) analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary). RFM analysis is based on the marketing axiom that “80% of your business comes from 20% of your customers.”

You can click to review more info https://searchdatamanagement.techtarget.com/definition/RFM-analysis

2 An RFM Analysis

2.1 About Our Dataset

The file CDNOW_sample.txt contains the purchasing data for a 1/10th systematic sample of the CDNOW dataset originally used by Fader and Hardie (2001). The master dataset contains the entire purchase history up to the end of June 1998 of the cohort of 23,570 individuals who made their first-ever purchase at CDNOW in the first quarter of 1997.

Each record in this file, 6919 in total, comprises five fields: the customer’s ID in the master dataset, the customer’s ID in the 1/10th sample dataset (ranging from 1 to 2357), the date of the transaction, the number of CDs purchased, and the dollar value of the transaction.

Reference: Fader, Peter S. and Bruce G.,S. Hardie, (2001), “Forecasting Repeat Sales at CDNOW: A Case Study,” Interfaces, 31 (May-June), Part 2 of 2, S94-S107.

2.2 Import Dataset

dataFrame <- read.table("CDNOW_sample.txt")

Now it’s included. We can study on our dataframe.

head(dataFrame)
##   V1 V2       V3 V4    V5
## 1  4  1 19970101  2 29.33
## 2  4  1 19970118  2 29.73
## 3  4  1 19970802  1 14.96
## 4  4  1 19971212  2 26.48
## 5 21  2 19970101  3 63.34
## 6 21  2 19970113  1 11.77

In order to have an idea, we do coding this way.

str(dataFrame)
## 'data.frame':    6919 obs. of  5 variables:
##  $ V1: int  4 4 4 4 21 21 50 71 86 111 ...
##  $ V2: int  1 1 1 1 2 2 3 4 5 6 ...
##  $ V3: int  19970101 19970118 19970802 19971212 19970101 19970113 19970101 19970101 19970101 19970101 ...
##  $ V4: int  2 2 1 2 3 1 1 1 2 1 ...
##  $ V5: num  29.3 29.7 15 26.5 63.3 ...
summary(dataFrame)
##        V1              V2               V3                 V4        
##  Min.   :    4   Min.   :   1.0   Min.   :19970101   Min.   : 1.000  
##  1st Qu.: 5525   1st Qu.: 570.5   1st Qu.:19970222   1st Qu.: 1.000  
##  Median :11749   Median :1193.0   Median :19970415   Median : 2.000  
##  Mean   :11683   Mean   :1175.7   Mean   :19972169   Mean   : 2.382  
##  3rd Qu.:17717   3rd Qu.:1766.0   3rd Qu.:19971030   3rd Qu.: 3.000  
##  Max.   :23569   Max.   :2357.0   Max.   :19980630   Max.   :40.000  
##        V5        
##  Min.   :  0.00  
##  1st Qu.: 14.49  
##  Median : 25.99  
##  Mean   : 35.28  
##  3rd Qu.: 42.97  
##  Max.   :506.97

2.3 Feature Engineering

We don’t need V1 column, then i removed.

dataFrame$V1 <- NULL

If we create columns names, it would be more convenient.

colnames(dataFrame) <- c("customerID","Date","Amount","Price")

Some required conversion operations…

dataFrame$customerID <- as.factor(as.character(dataFrame$customerID))

is.factor(dataFrame$customerID)
## [1] TRUE
dataFrame$Date = as.Date(as.character(dataFrame$Date), "%Y%m%d")

str(dataFrame)
## 'data.frame':    6919 obs. of  4 variables:
##  $ customerID: Factor w/ 2357 levels "1","10","100",..: 1 1 1 1 1112 1112 1581 1692 1803 1914 ...
##  $ Date      : Date, format: "1997-01-01" "1997-01-18" ...
##  $ Amount    : int  2 2 1 2 3 1 1 1 2 1 ...
##  $ Price     : num  29.3 29.7 15 26.5 63.3 ...
Sys.Date() # It's gives todays date.
## [1] "2019-02-19"
referenceDay = max(dataFrame$Date) # Max function gives last day.

Our last day is reference day.

referenceDay
## [1] "1998-06-30"

We should find Recency.

library(dplyr)

rfm_recency <- dataFrame %>% group_by(customerID) %>% 
  summarise(Recency = as.numeric(referenceDay) - as.numeric(max(Date)))

We should find Frequency.

rfm_frequency <- dataFrame %>% group_by(customerID) %>% summarise(Frequency = n())

We should find Monetary.

rfm_monetary <- dataFrame %>% group_by(customerID) %>% summarise(Monetary = sum(Price))

And merge three columns.

rfm <- merge(rfm_recency, rfm_frequency, by="customerID")

rfm <- merge(rfm, rfm_monetary, by="customerID")

Good job, now we continue to clearer analysis.

To know distribution, we use quantile function.

quantile(rfm$Monetary)
##      0%     25%     50%     75%    100% 
##    0.00   19.77   43.06  104.60 6552.70

Creates rankMonetary columns.

rankMonetary <- cut(rfm$Monetary, breaks=c(0,20,45,105,1000,6600))

levels(rankMonetary)
## [1] "(0,20]"          "(20,45]"         "(45,105]"        "(105,1e+03]"    
## [5] "(1e+03,6.6e+03]"

We evaluate Monetary score.

levels(rankMonetary) <- c(1,2,3,4,5)
levels(rankMonetary)
## [1] "1" "2" "3" "4" "5"

To know distribution, we use quantile function.

quantile(rfm$Recency)
##   0%  25%  50%  75% 100% 
##    0  217  473  506  545
rankRecency <- cut(rfm$Recency, breaks=c(0,60,220,473,506,550))

Creates rankRecency columns.

levels(rankRecency) <- c(5,4,3,2,1)

levels(rankRecency)
## [1] "5" "4" "3" "2" "1"

We evaluate Recency score.

To know distribution, we use quantile function.

quantile(rfm$Frequency)
##   0%  25%  50%  75% 100% 
##    1    1    1    3   56
rankFrequency <- cut(rfm$Frequency, breaks=c(0,1,2,3,7,60))

Creates rankFrequency columns.

levels(rankFrequency) <- c(1,2,3,4,5)

levels(rankFrequency)
## [1] "1" "2" "3" "4" "5"

We evaluate Frequency score.

Almost done.

We’re creating current Dataframe.

rfmScores <- data.frame(cbind(rfm$customerID, rankRecency, rankFrequency, rankMonetary))

colnames(rfmScores) <- c("customerID","rankRecency","rankFrequency","rankMonetary")

Let’s see what happened.

head(rfmScores)
##   customerID rankRecency rankFrequency rankMonetary
## 1          1           2             4            3
## 2          2           5             1            2
## 3          3           3             2            2
## 4          4           1             5            4
## 5          5           5             1            2
## 6          6           5             1            1

2.4 Conclusion

For example, we will examine one of the customer. Let’s say, we need detailed information about 51th customer.

rfmScores[rfmScores$customerID == 51,]
##    customerID rankRecency rankFrequency rankMonetary
## 51         51           4             1            1

Yes, let’s examining. This customer has 4 point about rankRecency. It’s so this customer has been shopping recently. This means customer is still our customer. What about other Ranks?

rankFrequency has 1 point. This means, this customer our new customer. I think new customer want to attend new campaign, why not. New customer is suppose to be more shopping.

rankMonetary has 1 point. This means, maybe customer is poor. Maybe customer is stingy, who knows? We never know, but we can create closest guess. Thanks to R Studio and RFM Scoring.

Murat SAHIN | RFM - Analysis & Scoring | December 2018

Please leave your questions, ideas and messages. Thank you, for your review.

This html page has been created by R Studio.