Data Science & Machine Learning
June 1, 2025 at 07:22 AM
Today, let's move to the next topic in the Data Science Learning Series:
*Handling Missing Data*
Missing data is very common in real-world datasets, especially in fields like healthcare, finance, and retail.
✅ *What is Missing Data?*
Missing data occurs when some values in a dataset are not recorded. This can be due to:
- Human error
- System failure
- Data corruption
- Intentional skipping (e.g., optional survey questions)
In pandas, missing values are usually represented as NaN (Not a Number). None and NaT (for dates) are also treated as missing.
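Here is a quick sketch of how pandas treats missing markers. The toy DataFrame and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: both np.nan and None count as missing
toy = pd.DataFrame({
    "age": [25, np.nan, 31],
    "city": ["Pune", None, "Delhi"],
})

missing_mask = toy.isna()            # isna() is the same check as isnull()
missing_counts = missing_mask.sum()  # number of missing values per column
print(missing_counts)
```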
🔍 *How to Detect Missing Data*
import pandas as pd
df = pd.read_csv("data.csv")
df.isnull().sum() # Shows count of missing values per column
df.info() # Shows non-null count per column
🧹 *How to Handle Missing Data*
*1. Remove Rows with Missing Values*
df_cleaned = df.dropna()
✅ *Use when: The missing rows are few and won't affect your analysis.*
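dropna() also accepts options that make the removal less aggressive. A minimal sketch on made-up data (the column names are assumptions, not from a real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, np.nan],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

drop_any = toy.dropna()                # drop rows with any missing value
drop_age = toy.dropna(subset=["age"])  # drop only rows where age is missing
keep_2plus = toy.dropna(thresh=2)      # keep rows with at least 2 non-missing values

print(len(toy), len(drop_any), len(drop_age), len(keep_2plus))
```

Using subset or thresh usually keeps more rows than a plain dropna(), which helps when only one or two columns matter for the analysis.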
*2. Fill Missing Values with a Default*
df['column_name'] = df['column_name'].fillna(0) # Replace with 0
df['column_name'] = df['column_name'].fillna(df['column_name'].mean()) # Replace with mean
df['column_name'] = df['column_name'].ffill() # Forward fill
✅ *Use when: You don’t want to lose data and can logically replace the missing values.*
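To see how the fill strategies differ, here is a small sketch on a made-up Series (the values are purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two gaps
s = pd.Series([10.0, np.nan, 30.0, np.nan, 100.0])

filled_zero = s.fillna(0)             # constant default
filled_mean = s.fillna(s.mean())      # mean of 10, 30, 100 (NaN is skipped)
filled_median = s.fillna(s.median())  # median is less sensitive to outliers
filled_ffill = s.ffill()              # carry the previous value forward

print(pd.DataFrame({
    "zero": filled_zero,
    "mean": filled_mean,
    "median": filled_median,
    "ffill": filled_ffill,
}))
```

Forward fill tends to suit ordered data such as time series, while the mean or median is a common choice for unordered numeric columns.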
*3. Check Missing % of Each Column*
(df.isnull().sum() / len(df)) * 100
✅ Helps decide whether to drop or fill based on how much is missing.
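One possible way to act on the percentages is to drop columns above a cutoff. This is only a sketch: the toy data and the 50% DROP_THRESHOLD are assumptions, so pick a cutoff that fits your own dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
toy = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [np.nan, np.nan, np.nan, 52000],
    "city": ["Pune", "Delhi", "Mumbai", "Chennai"],
})

missing_pct = toy.isnull().mean() * 100  # same result as sum() / len(toy) * 100

DROP_THRESHOLD = 50  # assumed cutoff in percent, not a fixed rule
cols_to_drop = missing_pct[missing_pct > DROP_THRESHOLD].index.tolist()
reduced = toy.drop(columns=cols_to_drop)

print(missing_pct)
print(cols_to_drop)
```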
💡 *Real-World Example:*
In a hospital dataset, if Patient Age is missing for 5 out of 5,000 patients, you may just drop those rows.
But if 40% of Blood Pressure values are missing, you might fill them with the mean or median to avoid losing too much data.
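The hospital example could be sketched like this. The tiny table and the column names patient_age and blood_pressure are hypothetical stand-ins for a real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical mini hospital table; column names are assumptions
patients = pd.DataFrame({
    "patient_age": [34, np.nan, 58, 45, 71],
    "blood_pressure": [120.0, 135.0, np.nan, np.nan, 150.0],
})

# Few ages are missing: drop just those rows
cleaned = patients.dropna(subset=["patient_age"]).copy()

# Many blood pressure values are missing: fill with the median of the remaining rows
bp_median = cleaned["blood_pressure"].median()
cleaned["blood_pressure"] = cleaned["blood_pressure"].fillna(bp_median)

print(cleaned)
```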
*React with ❤️ once you're ready for the quiz*
Data Science Learning Series: https://whatsapp.com/channel/0029Va8v3eo1NCrQfGMseL2D/998
Python Cheatsheet: https://whatsapp.com/channel/0029VaiM08SDuMRaGKd9Wv0L/1660