k-means is an unsupervised machine learning algorithm that groups data points into clusters. Using k-means clustering, we can quickly get insights from unlabeled data.
Using k-means for Customer Segmentation
Customer segmentation is the practice of partitioning a customer base into groups of individuals that have similar characteristics.
It is a valuable strategy because a business can target these specific groups of customers and allocate marketing resources effectively.
For example, one group might contain customers who are high-profit and low-risk, that is, more likely to purchase products or subscribe to a service. One business task is then to retain those customers.
We start by importing the libraries needed for the analysis.
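A typical set of imports for this kind of analysis might look like the following. The exact list is an assumption, since the original import cell is not shown here; seaborn is included because lmplot is used later in the walkthrough.

```python
# Assumed imports for this walkthrough; adjust to what your environment provides.
import numpy as np                                 # numerical arrays
import pandas as pd                                # tabular data handling
import matplotlib.pyplot as plt                    # plotting (e.g. the elbow curve)
import seaborn as sns                              # lmplot for visualizing clusters
from sklearn.cluster import KMeans                 # the clustering model
from sklearn.preprocessing import StandardScaler   # feature standardization
```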
Next, we download the dataset that we will be working on.
Read the data
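Reading the data usually comes down to a single `pd.read_csv` call. The sketch below uses a tiny inline CSV as a stand-in for the downloaded file, so it runs on its own; the column names and the rows are assumptions for illustration, not the real dataset.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: a tiny inline CSV with assumed column names.
csv_text = """Customer Id,Age,Years Employed,Income,Defaulted,Address
1,41,6,19,0.0,A01
2,47,26,100,0.0,A02
3,33,10,57,1.0,A03
4,29,4,19,,A04
"""

# With a real file, pass its path to pd.read_csv instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

Note that the empty `Defaulted` field in the last row is read as a missing value (NaN).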
Getting insights from the data and dealing with missing values
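As a rough sketch of this step, we can summarize the data and count the missing values per column before deciding what to do with them. The data below is synthetic and the column names are assumed. Dropping incomplete rows is only one option; filling them with `fillna` is another.

```python
import numpy as np
import pandas as pd

# Synthetic rows with assumed column names; NaN marks a missing value.
df = pd.DataFrame({
    "Age": [41, 47, 33, 29],
    "Income": [19, 100, 57, 19],
    "Defaulted": [0.0, np.nan, 1.0, np.nan],
})

print(df.describe())                    # summary statistics per numeric column
missing_per_column = df.isnull().sum()  # number of missing values per column
print(missing_per_column)

# One option among several: drop every row that contains a missing value.
df_clean = df.dropna().reset_index(drop=True)
```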
We can drop the “Address” column, which is of little use as a feature. It is also categorical, while k-means works on numerical values because it relies on distances between points.
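Dropping a column in pandas could look like this. The small frame is made up for illustration; only the “Address” column name comes from the text.

```python
import pandas as pd

# Made-up rows; "Address" holds categorical codes that k-means cannot use directly.
df = pd.DataFrame({
    "Age": [41, 47, 33],
    "Income": [19, 100, 57],
    "Address": ["A01", "A02", "A03"],
})

# Remove the categorical column and keep only the numerical features.
df = df.drop(columns=["Address"])
print(df.columns.tolist())
```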
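We use StandardScaler to standardize the dataset, so that every feature has zero mean and unit variance and no single feature dominates the distance calculation.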
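A minimal sketch of the standardization step, using a small made-up feature matrix (two assumed features, such as age and income):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: one row per customer, one column per feature.
X = np.array([
    [41.0, 19.0],
    [47.0, 100.0],
    [33.0, 57.0],
    [29.0, 19.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtract each column's mean, divide by its std

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```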
Let’s draw the elbow curve to help choose the number of clusters.
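One common way to draw an elbow curve is to fit k-means for a range of k values and plot the inertia (the within-cluster sum of squared distances) against k. The sketch below assumes synthetic blob data from `make_blobs` in place of the customer dataset, and the range of k is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with three blobs (not the customer dataset).
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

k_values = list(range(1, 8))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

# Plot inertia against k; the "elbow" is where the curve starts to flatten.
fig, ax = plt.subplots()
ax.plot(k_values, inertias, marker="o")
ax.set_xlabel("Number of clusters (k)")
ax.set_ylabel("Inertia")
ax.set_title("Elbow curve")
```

In an interactive session, calling `plt.show()` afterwards displays the figure.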
Let’s run our model and group our customers into three clusters.
Note that each row in our dataset represents a customer, and therefore, each row is assigned a label.
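Fitting the model and attaching a label to each row could be sketched as follows. The data is synthetic and the column names, including the new `Cluster` column, are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table; column names are assumed.
X, _ = make_blobs(n_samples=150, centers=3, n_features=3, random_state=0)
df = pd.DataFrame(X, columns=["Age", "Years Employed", "Income"])
X_scaled = StandardScaler().fit_transform(df)

# Ask for three clusters, as in the text.
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X_scaled)

# labels_ holds one cluster index per row, in the same order as the input rows.
df["Cluster"] = k_means.labels_
print(df["Cluster"].value_counts())
```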
We can check the centroid values by averaging the features in each cluster.
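With the labels stored in a column, a `groupby` gives the per-cluster averages in the original units. The frame below is made up, with assumed column names and hand-assigned labels. If the model was fitted on standardized data, its own `cluster_centers_` attribute holds the centroids in the scaled space instead.

```python
import pandas as pd

# Made-up customers with hand-assigned cluster labels for illustration.
df = pd.DataFrame({
    "Age": [30, 40, 50, 60],
    "Income": [20, 30, 80, 100],
    "Cluster": [0, 0, 1, 1],
})

# Mean of every feature within each cluster.
cluster_means = df.groupby("Cluster").mean()
print(cluster_means)
```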
Visualizing Data using lmplot
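A scatter plot colored by cluster can be drawn with seaborn's `lmplot` by turning off the regression line. The data, the column names, and the choice of axes below are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook
import pandas as pd
import seaborn as sns

# Made-up customers with hand-assigned cluster labels.
df = pd.DataFrame({
    "Age": [25, 28, 40, 43, 55, 58],
    "Income": [20, 24, 50, 55, 110, 120],
    "Cluster": [2, 2, 0, 0, 1, 1],
})

# hue colors the points by cluster; fit_reg=False leaves out the regression lines.
g = sns.lmplot(data=df, x="Age", y="Income", hue="Cluster", fit_reg=False)
g.set_axis_labels("Age", "Income")
```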
k-means has partitioned the customers into three groups because we told the algorithm to generate 3 clusters.
The customers in each cluster are similar to each other in terms of the features included in the dataset.
Now we can create a profile for each group, considering the common characteristics of each cluster. For example, the 3 clusters can be:
- Label-1: OLDER, HIGH INCOME, AND INDEBTED
- Label-0: MIDDLE AGED, MIDDLE INCOME, AND FINANCIALLY RESPONSIBLE
- Label-2: YOUNG, LOW INCOME, AND INDEBTED
From the plot above, it is easy to see that customers segmented as “Label-0” are the ones who have “Never Defaulted” on payment.
We can also see that Label-0 customers have more years of employment than “Label-2” customers but fewer than “Label-1” customers.
We can say that Label-2 customers, who average 3.689 years of employment, are more likely to default on payment.
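Comparisons like these can also be read from a per-cluster summary table rather than from a plot. The sketch below uses made-up numbers chosen only to illustrate the computation; they are not the results from this analysis, and the column names are assumed.

```python
import pandas as pd

# Made-up customers; "Defaulted" is 1 if the customer defaulted, else 0.
df = pd.DataFrame({
    "Cluster": [0, 0, 1, 1, 2, 2],
    "Years Employed": [6, 8, 15, 17, 3, 4],
    "Defaulted": [0, 0, 0, 1, 1, 1],
})

# The mean of a 0/1 column is the share of customers who defaulted.
summary = df.groupby("Cluster").agg(
    mean_years_employed=("Years Employed", "mean"),
    default_rate=("Defaulted", "mean"),
)
print(summary)
```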
Hence, the actionable insights would be:
- Customers with very few years of employment are the most likely to default on payment.
- Customers with more than 10 years of employment are highly indebted and also likely to default on payment.
- Only customers averaging about 7 years of employment are the ones most likely never to default, so it is wise to give loans to such customers.
I hope you enjoyed reading. This clustering algorithm gave us insight into the dataset and led us to group the data into three clusters.
Perhaps the same results could have been achieved another way, but only through multiple tests and experiments.
Happy Reading !!!