This project aims to conduct an in-depth analysis of customer churn within a fictitious company with over 800,000 customers. The primary objective is to understand the key factors driving customer cancellations. Our goal is to recommend actionable solutions to reduce the churn rate effectively and improve customer retention. KPIs for this analysis include customer churn rate and the percentage of canceled customers within specific segments.
Techonologies: Python (Pandas, Plotly)
The dataset used for this analysis has been compiled for a fictitious company and stored in Google Sheets. It includes detailed information about each fictional customer, such as their age, gender, time as a customer, frequency of usage, number of call center calls, days of overdue payments, type of contract, total spending, last interaction date, and a binary indicator (Yes or No) representing whether the customer has canceled the service.
Prior to analysis, the data underwent essential cleaning and transformation to ensure a structured dataset suitable for modeling. Through this dataset, our objective is to identify patterns and insights related to customer churn, enabling us to propose actionable solutions to improve customer retention and overall business performance.
import pandas
tabela = pandas.read_csv("cancelamentos.csv")
display(tabela)
CustomerID | idade | sexo | tempo_como_cliente | frequencia_uso | ligacoes_callcenter | dias_atraso | assinatura | duracao_contrato | total_gasto | meses_ultima_interacao | cancelou | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2.0 | 30.0 | Female | 39.0 | 14.0 | 5.0 | 18.0 | Standard | Annual | 932.00 | 17.0 | 1.0 |
1 | 3.0 | 65.0 | Female | 49.0 | 1.0 | 10.0 | 8.0 | Basic | Monthly | 557.00 | 6.0 | 1.0 |
2 | 4.0 | 55.0 | Female | 14.0 | 4.0 | 6.0 | 18.0 | Basic | Quarterly | 185.00 | 3.0 | 1.0 |
3 | 5.0 | 58.0 | Male | 38.0 | 21.0 | 7.0 | 7.0 | Standard | Monthly | 396.00 | 29.0 | 1.0 |
4 | 6.0 | 23.0 | Male | 32.0 | 20.0 | 5.0 | 8.0 | Basic | Monthly | 617.00 | 20.0 | 1.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
881661 | 449995.0 | 42.0 | Male | 54.0 | 15.0 | 1.0 | 3.0 | Premium | Annual | 716.38 | 8.0 | 0.0 |
881662 | 449996.0 | 25.0 | Female | 8.0 | 13.0 | 1.0 | 20.0 | Premium | Annual | 745.38 | 2.0 | 0.0 |
881663 | 449997.0 | 26.0 | Male | 35.0 | 27.0 | 1.0 | 5.0 | Standard | Quarterly | 977.31 | 9.0 | 0.0 |
881664 | 449998.0 | 28.0 | Male | 55.0 | 14.0 | 2.0 | 0.0 | Standard | Quarterly | 602.55 | 2.0 | 0.0 |
881665 | 449999.0 | 31.0 | Male | 48.0 | 20.0 | 1.0 | 14.0 | Premium | Quarterly | 567.77 | 21.0 | 0.0 |
881666 rows × 12 columns
# 3.1. - Null Values
tabela = tabela.dropna()
# 3.2. - Unnecessary columns
tabela = tabela.drop("CustomerID",axis=1)
display(tabela.info())
<class 'pandas.core.frame.DataFrame'> Index: 881659 entries, 0 to 881665 Data columns (total 11 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 idade 881659 non-null float64 1 sexo 881659 non-null object 2 tempo_como_cliente 881659 non-null float64 3 frequencia_uso 881659 non-null float64 4 ligacoes_callcenter 881659 non-null float64 5 dias_atraso 881659 non-null float64 6 assinatura 881659 non-null object 7 duracao_contrato 881659 non-null object 8 total_gasto 881659 non-null float64 9 meses_ultima_interacao 881659 non-null float64 10 cancelou 881659 non-null float64 dtypes: float64(8), object(3) memory usage: 80.7+ MB
None
During the data analysis phase, we identified the following main reasons for customer cancellations:
By visually presenting these insights, we can better understand the impact of each identified reason on customer cancellations and make informed decisions to improve customer retention.
print("56.7% of the customers cancelled the subscription")
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
print("19.8% of the customers have Monthly subscription")
display(tabela["duracao_contrato"].value_counts(normalize=True).map("{:.1%}".format))
display(tabela.groupby("duracao_contrato").count())
# for column in tabela.columns:
# graphic = px.histogram(tabela, x=column,color="cancelou")
# graphic.show()
56.7% of the customers cancelled the subscription
cancelou 1.0 56.7% 0.0 43.3% Name: proportion, dtype: object
19.8% of the customers have Monthly subscription
duracao_contrato Annual 40.2% Quarterly 40.0% Monthly 19.8% Name: proportion, dtype: object
idade | sexo | tempo_como_cliente | frequencia_uso | ligacoes_callcenter | dias_atraso | assinatura | total_gasto | meses_ultima_interacao | cancelou | |
---|---|---|---|---|---|---|---|---|---|---|
duracao_contrato | ||||||||||
Annual | 354395 | 354395 | 354395 | 354395 | 354395 | 354395 | 354395 | 354395 | 354395 | 354395 |
Monthly | 174205 | 174205 | 174205 | 174205 | 174205 | 174205 | 174205 | 174205 | 174205 | 174205 |
Quarterly | 353059 | 353059 | 353059 | 353059 | 353059 | 353059 | 353059 | 353059 | 353059 | 353059 |
# Main reasons for cancellations below
import plotly.express as px
# a. 100% of monthly contract customers cancelled the subscription
graphic = px.histogram(tabela, x="duracao_contrato",color="cancelou")
graphic.show()
# b. 100% of the customers with 50+ cancelled the subscription
graphic = px.histogram(tabela, x="idade",color="cancelou")
graphic.show()
# c. 100% of the customers that called more than 5x, cancelled the subscription
graphic = px.histogram(tabela, x="ligacoes_callcenter",color="cancelou")
graphic.show()
# d. 100% of the customers who are overdue for more than 20 days cancelled their subscription
graphic = px.histogram(tabela, x="dias_atraso",color="cancelou")
graphic.show()
Based on the analysis and insights, we propose the following actionable solutions to address the identified reasons for customer cancellations:
Implementing these proposed solutions can significantly reduce customer churn, enhance satisfaction, foster long-term relationships with clients, and address the root causes of cancellations. This can lead to a remarkable improvement, reducing the cancellation rate from 56% to an impressive 12%. Regular monitoring and continuous improvements in customer engagement are essential for achieving and sustaining these objectives effectively.
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
# a. Exclude Monthly Payment Method or Offer Discounts
tabela = tabela[tabela["duracao_contrato"]!="Monthly"]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
# b. Targeted Engagement for Customers Over 50 Years
tabela = tabela[tabela["idade"]<50]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
# c. Enhance Call Center Support
tabela = tabela[tabela["ligacoes_callcenter"]<5]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
# d. Proactive Payment Monitoring
tabela = tabela[tabela["dias_atraso"]<=20]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
cancelou 1.0 56.7% 0.0 43.3% Name: proportion, dtype: object
cancelou 0.0 53.9% 1.0 46.1% Name: proportion, dtype: object
cancelou 0.0 63.3% 1.0 36.7% Name: proportion, dtype: object
cancelou 0.0 81.2% 1.0 18.8% Name: proportion, dtype: object
cancelou 0.0 87.8% 1.0 12.2% Name: proportion, dtype: object
Through data-driven insights and analysis, we have identified the primary reasons for customer cancellations, such as issues with monthly contracts, call center interactions, overdue payments, and customer age. By implementing targeted solutions, including excluding the monthly payment method or offering discounts, implementing targeted engagement for customers over 50 years, enhancing call center support, and proactive payment monitoring, our company can reduce the cancellation rate from 56% to an impressive 12%. These strategic actions will lead to improved customer retention, enhanced satisfaction, and better business outcomes, positioning our company for long-term success.