Python Insights - Analyzing Data with Python¶

Case - Reasons for Cancellations Analysis¶

Project Description and KPIs¶

This project aims to conduct an in-depth analysis of customer churn within a fictitious company with over 800,000 customers. The primary objective is to understand the key factors driving customer cancellations. Our goal is to recommend actionable solutions to reduce the churn rate effectively and improve customer retention. KPIs for this analysis include customer churn rate and the percentage of canceled customers within specific segments.

Techonologies: Python (Pandas, Plotly)

Data 📊 - Description and modeling¶

The dataset used for this analysis has been compiled for a fictitious company and stored in Google Sheets. It includes detailed information about each fictional customer, such as their age, gender, time as a customer, frequency of usage, number of call center calls, days of overdue payments, type of contract, total spending, last interaction date, and a binary indicator (Yes or No) representing whether the customer has canceled the service.

Prior to analysis, the data underwent essential cleaning and transformation to ensure a structured dataset suitable for modeling. Through this dataset, our objective is to identify patterns and insights related to customer churn, enabling us to propose actionable solutions to improve customer retention and overall business performance.

Step 0 - Understand the challenge and the company¶

Step 1 - Importing the data base¶

In [ ]:
import pandas
tabela = pandas.read_csv("cancelamentos.csv")

Step 2 - Visualization of the data base¶

In [ ]:
display(tabela)
CustomerID idade sexo tempo_como_cliente frequencia_uso ligacoes_callcenter dias_atraso assinatura duracao_contrato total_gasto meses_ultima_interacao cancelou
0 2.0 30.0 Female 39.0 14.0 5.0 18.0 Standard Annual 932.00 17.0 1.0
1 3.0 65.0 Female 49.0 1.0 10.0 8.0 Basic Monthly 557.00 6.0 1.0
2 4.0 55.0 Female 14.0 4.0 6.0 18.0 Basic Quarterly 185.00 3.0 1.0
3 5.0 58.0 Male 38.0 21.0 7.0 7.0 Standard Monthly 396.00 29.0 1.0
4 6.0 23.0 Male 32.0 20.0 5.0 8.0 Basic Monthly 617.00 20.0 1.0
... ... ... ... ... ... ... ... ... ... ... ... ...
881661 449995.0 42.0 Male 54.0 15.0 1.0 3.0 Premium Annual 716.38 8.0 0.0
881662 449996.0 25.0 Female 8.0 13.0 1.0 20.0 Premium Annual 745.38 2.0 0.0
881663 449997.0 26.0 Male 35.0 27.0 1.0 5.0 Standard Quarterly 977.31 9.0 0.0
881664 449998.0 28.0 Male 55.0 14.0 2.0 0.0 Standard Quarterly 602.55 2.0 0.0
881665 449999.0 31.0 Male 48.0 20.0 1.0 14.0 Premium Quarterly 567.77 21.0 0.0

881666 rows × 12 columns

Step 3 - Data Cleaning | error handling¶

In [ ]:
# 3.1. - Null Values
tabela = tabela.dropna()
# 3.2. - Unnecessary columns
tabela = tabela.drop("CustomerID",axis=1)
display(tabela.info())
<class 'pandas.core.frame.DataFrame'>
Index: 881659 entries, 0 to 881665
Data columns (total 11 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   idade                   881659 non-null  float64
 1   sexo                    881659 non-null  object 
 2   tempo_como_cliente      881659 non-null  float64
 3   frequencia_uso          881659 non-null  float64
 4   ligacoes_callcenter     881659 non-null  float64
 5   dias_atraso             881659 non-null  float64
 6   assinatura              881659 non-null  object 
 7   duracao_contrato        881659 non-null  object 
 8   total_gasto             881659 non-null  float64
 9   meses_ultima_interacao  881659 non-null  float64
 10  cancelou                881659 non-null  float64
dtypes: float64(8), object(3)
memory usage: 80.7+ MB
None

Data Analysis & Insights¶

During the data analysis phase, we identified the following main reasons for customer cancellations:

  • Monthly Contract Customers: All customers on monthly contracts canceled, indicating potential issues with plan attractiveness or pricing.
  • Customers Who Called More Than 5 Times: Customers with more than 5 support calls exhibited higher churn rate, indicating concerns about customer satisfaction and support effectiveness.
  • Customers Overdue for More Than 20 Days: All customers with overdue accounts beyond 20 days canceled, emphasizing the importance of timely payment reminders and collections strategies.
  • Customers with 50+ Years: 100% of customers aged 50 or older canceled, warranting further investigation into the specific reasons behind their churn.

By visually presenting these insights, we can better understand the impact of each identified reason on customer cancellations and make informed decisions to improve customer retention.

In [ ]:
print("56.7% of the customers cancelled the subscription")
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

print("19.8% of the customers have Monthly subscription")
display(tabela["duracao_contrato"].value_counts(normalize=True).map("{:.1%}".format))
display(tabela.groupby("duracao_contrato").count())

# for column in tabela.columns:
#   graphic = px.histogram(tabela, x=column,color="cancelou")
#   graphic.show()
56.7% of the customers cancelled the subscription
cancelou
1.0    56.7%
0.0    43.3%
Name: proportion, dtype: object
19.8% of the customers have Monthly subscription
duracao_contrato
Annual       40.2%
Quarterly    40.0%
Monthly      19.8%
Name: proportion, dtype: object
idade sexo tempo_como_cliente frequencia_uso ligacoes_callcenter dias_atraso assinatura total_gasto meses_ultima_interacao cancelou
duracao_contrato
Annual 354395 354395 354395 354395 354395 354395 354395 354395 354395 354395
Monthly 174205 174205 174205 174205 174205 174205 174205 174205 174205 174205
Quarterly 353059 353059 353059 353059 353059 353059 353059 353059 353059 353059
In [ ]:
# Main reasons for cancellations below
import plotly.express as px

# a. 100% of monthly contract customers cancelled the subscription
graphic = px.histogram(tabela, x="duracao_contrato",color="cancelou")
graphic.show()

# b. 100% of the customers with 50+ cancelled the subscription
graphic = px.histogram(tabela, x="idade",color="cancelou")
graphic.show()

# c. 100% of the customers that called more than 5x, cancelled the subscription
graphic = px.histogram(tabela, x="ligacoes_callcenter",color="cancelou")
graphic.show()

# d. 100% of the customers who are overdue for more than 20 days cancelled their subscription
graphic = px.histogram(tabela, x="dias_atraso",color="cancelou")
graphic.show()

Proposed Solutions & Results: 🎉¶

Based on the analysis and insights, we propose the following actionable solutions to address the identified reasons for customer cancellations:

  • Exclude Monthly Payment Method or Offer Discounts: Since we observed 100% of customers choosing monthly payment method end up canceling, we recommend excluding this option or offering attractive discounts for longer-term payment plans.
  • Targeted Engagement for Customers Over 50 Years: Implement targeted engagement strategy for customers over 50 years old, addressing their specific issues and tailoring offerings accordingly.
  • Enhance Call Center Support:Improve call center efficiency in resolving customer issues to prevent churn after multiple calls.
  • Proactive Payment Monitoring:Implement proactive payment monitoring system, offering flexible options and timely reminders to prevent overdue cancellations.

Implementing these proposed solutions can significantly reduce customer churn, enhance satisfaction, foster long-term relationships with clients, and address the root causes of cancellations. This can lead to a remarkable improvement, reducing the cancellation rate from 56% to an impressive 12%. Regular monitoring and continuous improvements in customer engagement are essential for achieving and sustaining these objectives effectively.

In [ ]:
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

# a. Exclude Monthly Payment Method or Offer Discounts
tabela = tabela[tabela["duracao_contrato"]!="Monthly"]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

# b. Targeted Engagement for Customers Over 50 Years
tabela = tabela[tabela["idade"]<50]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

# c. Enhance Call Center Support
tabela = tabela[tabela["ligacoes_callcenter"]<5]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))

# d. Proactive Payment Monitoring
tabela = tabela[tabela["dias_atraso"]<=20]
display(tabela["cancelou"].value_counts(normalize=True).map("{:.1%}".format))
cancelou
1.0    56.7%
0.0    43.3%
Name: proportion, dtype: object
cancelou
0.0    53.9%
1.0    46.1%
Name: proportion, dtype: object
cancelou
0.0    63.3%
1.0    36.7%
Name: proportion, dtype: object
cancelou
0.0    81.2%
1.0    18.8%
Name: proportion, dtype: object
cancelou
0.0    87.8%
1.0    12.2%
Name: proportion, dtype: object

Conclusion 🙌¶

Through data-driven insights and analysis, we have identified the primary reasons for customer cancellations, such as issues with monthly contracts, call center interactions, overdue payments, and customer age. By implementing targeted solutions, including excluding the monthly payment method or offering discounts, implementing targeted engagement for customers over 50 years, enhancing call center support, and proactive payment monitoring, our company can reduce the cancellation rate from 56% to an impressive 12%. These strategic actions will lead to improved customer retention, enhanced satisfaction, and better business outcomes, positioning our company for long-term success.