Data Science & Machine Learning
June 1, 2025 at 07:22 AM
Today, let's move to the next topic in the Data Science Learning Series: *Handling Missing Data*

Missing data is very common in real-world datasets, especially in fields like healthcare, finance, and retail.

✅ *What is Missing Data?*
Missing data occurs when some values in a dataset are not recorded. This can be due to:
- Human error
- System failure
- Data corruption
- Intentional skipping (e.g., optional survey questions)

In pandas, missing values are usually represented as NaN (Not a Number). isnull() also flags None and NaT (missing dates) as missing.

🔍 *How to Detect Missing Data*
import pandas as pd

df = pd.read_csv("data.csv")
df.isnull().sum()  # Count of missing values per column
df.info()          # Non-null count and dtype of each column

🧹 *How to Handle Missing Data*

*1. Remove Rows with Missing Values*
df_cleaned = df.dropna()  # Drops every row that has at least one missing value
✅ *Use when: only a few rows have missing values and dropping them won't skew your analysis.*

*2. Fill Missing Values with a Default*
Pick one strategy per column and assign the result back:
df['column_name'] = df['column_name'].fillna(0)                         # Replace with 0
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())  # Replace with the column mean
df['column_name'] = df['column_name'].ffill()                           # Forward fill: repeat the last valid value
(Avoid df['column_name'].fillna(..., inplace=True) on a selected column: newer pandas versions warn that this may not update df. fillna(method='ffill') is also deprecated in favour of .ffill().)
✅ *Use when: you don't want to lose rows and can logically replace the missing values.*

*3. Check the Missing % of Each Column*
(df.isnull().sum() / len(df)) * 100
✅ Helps you decide whether to drop or fill, based on how much is missing.

💡 *Real-World Example:*
In a hospital dataset, if Patient Age is missing for 5 out of 5,000 patients (0.1%), you may just drop those rows. But if 40% of Blood Pressure values are missing, dropping those rows would throw away a large share of your data, so you might fill them with the mean or median instead.

*React with ❤️ once you're ready for the quiz*

Data Science Learning Series: https://whatsapp.com/channel/0029Va8v3eo1NCrQfGMseL2D/998
Python Cheatsheet: https://whatsapp.com/channel/0029VaiM08SDuMRaGKd9Wv0L/1660
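To try the detection and missing-percentage steps without a data.csv file, here is a minimal sketch on a small made-up DataFrame (the column names and values are illustrative assumptions, not real data). It also shows that isnull() counts None in a text column, not just NaN.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame standing in for data.csv (illustrative values only)
df = pd.DataFrame({
    "age": [34, np.nan, 51, 29],
    "blood_pressure": [120, np.nan, np.nan, 130],
    "city": ["Pune", "Delhi", None, "Mumbai"],
})

missing_count = df.isnull().sum()                  # number of missing values per column
missing_pct = (df.isnull().sum() / len(df)) * 100  # share of rows missing, as a percentage

print(missing_count)
print(missing_pct)
```

df.isnull().mean() * 100 is an equivalent shorthand for the percentage, since the mean of a True/False column is the fraction of True values.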
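Removing missing data with dropna() doesn't have to be all-or-nothing. A minimal sketch on made-up data, using only the standard DataFrame.dropna parameters subset, thresh and axis:

```python
import numpy as np
import pandas as pd

# Made-up records: every row has a gap somewhere
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [34, np.nan, 51, np.nan],
    "notes": [np.nan, np.nan, np.nan, "follow-up"],
})

all_complete = df.dropna()               # keep only rows with no missing values at all
age_known = df.dropna(subset=["age"])    # drop rows only where 'age' is missing
mostly_complete = df.dropna(thresh=2)    # keep rows with at least 2 non-missing values
sparse_cols_removed = df.dropna(axis=1)  # drop columns (not rows) that contain any missing value

print(len(all_complete), age_known.index.tolist(), mostly_complete.index.tolist())
```

Here plain dropna() removes every row, because each row is missing something. That is exactly why checking the missing % per column before dropping is worth the extra line.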
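The fill strategies can be compared side by side on one column. This sketch assumes a made-up numeric column named reading; it assigns the result back rather than using inplace=True on a selected column. The median is included alongside the mean because the hospital example mentions both.

```python
import numpy as np
import pandas as pd

# Made-up readings with gaps; 90.0 is a deliberately large value
df = pd.DataFrame({"reading": [10.0, np.nan, 20.0, np.nan, 90.0]})

# Each strategy returns a new Series; none of them changes df by itself
filled_zero = df["reading"].fillna(0)                         # constant default
filled_mean = df["reading"].fillna(df["reading"].mean())      # mean of non-missing values (40.0)
filled_median = df["reading"].fillna(df["reading"].median())  # median of non-missing values (20.0)
filled_ffill = df["reading"].ffill()                          # forward fill: repeat the last valid value

# To keep a result, assign it back to the column
df["reading"] = filled_median
print(df["reading"].tolist())
```

With one large value (90), the mean (40) is pulled well above the median (20). This is worth keeping in mind before filling a skewed column with the mean.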
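The hospital example's reasoning (drop a few rows when little is missing, fill when a lot is missing) can be turned into a small rule. handle_missing and its 5% cutoff are hypothetical choices for illustration, not a pandas feature or a recommended threshold. Treat this as a sketch to adapt to your own data.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, drop_below_pct: float = 5.0) -> pd.DataFrame:
    """Hypothetical helper mirroring the hospital example.

    Columns missing less than drop_below_pct % of values: drop the affected rows.
    Numeric columns missing drop_below_pct % or more: fill with the column median.
    The 5% default is an arbitrary illustration, not a standard threshold.
    """
    missing_pct = df.isnull().mean() * 100  # % missing per column, measured on the input

    # Few gaps: dropping those rows loses little data
    low = [c for c in df.columns if 0 < missing_pct[c] < drop_below_pct]
    out = df.dropna(subset=low) if low else df.copy()

    # Many gaps: fill numeric columns with their median (other columns are left unchanged)
    fill_values = {
        c: out[c].median()
        for c in out.columns
        if missing_pct[c] >= drop_below_pct and pd.api.types.is_numeric_dtype(out[c])
    }
    return out.fillna(value=fill_values)


# Made-up data echoing the hospital example:
# 1 of 40 ages missing (2.5%), 16 of 40 blood pressure readings missing (40%)
patients = pd.DataFrame({
    "age": [np.nan] + list(range(21, 60)),
    "blood_pressure": [np.nan] * 16 + [120.0] * 24,
})
cleaned = handle_missing(patients)
print(len(cleaned), cleaned.isnull().sum().sum())
```

The median is computed after the low-missing rows are dropped, and the input DataFrame is left unchanged because dropna, copy and fillna all return new objects.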