
Data Science Jobs
June 1, 2025 at 07:43 AM
🧠 *Top Data Science Interview Questions & Answers*
1️⃣ *What is the difference between structured and unstructured data?*
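- *Structured data* follows a fixed schema (tables with rows and columns), so it can be stored in a relational database and queried directly.
- *Unstructured data* has no predefined schema (free text, images, videos) and usually needs preprocessing before it can be analyzed.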
2️⃣ *What is multicollinearity? How to remove it?*
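- Multicollinearity occurs when two or more input features are highly correlated with each other. The features carry redundant information, which makes model coefficients unstable and hard to interpret.
- Reduce it by:
• Dropping one variable from each highly correlated pair
• Using dimensionality reduction (e.g., PCA) to replace correlated features with uncorrelated components
• Applying regularization (Ridge shrinks correlated coefficients; Lasso can set some of them to zero)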
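The "drop correlated variables" option can be sketched in plain Python. This is only an illustration: the feature names, the toy values, and the 0.9 threshold are assumptions, and `pearson` and `drop_correlated` are hypothetical helpers, not library functions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: height_in is height_cm in other units, so it is redundant.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [31.0, 22.0, 45.0, 28.0, 39.0],
}
kept = drop_correlated(features)
print(kept)  # ['height_cm', 'age']
```

Which feature of a correlated pair survives depends only on the order of the dict here; in practice you would usually keep the one that is easier to interpret or cheaper to collect.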
3️⃣ *Which algorithms do you use to find the most correlated features?*
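- Correlation matrix: Pearson for linear relationships, Spearman for monotonic (rank-based) relationships
- Mutual information scores, which also capture non-linear dependence between a feature and the target
- Feature importance from tree-based models (Random Forest, XGBoost). Note that this ranks how useful a feature is for prediction rather than measuring correlation directly.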
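As a rough sketch of the mutual information idea, the snippet below scores two discrete features against a target. The feature names and values are made up for illustration, and `mutual_information` is a hypothetical helper that only handles discrete values (continuous features would need binning or an estimator from a library).

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

target = [0, 0, 1, 1, 0, 0, 1, 1]
informative = ["a", "a", "b", "b", "a", "a", "b", "b"]  # fully determines the target
noise = ["x", "y", "x", "y", "x", "y", "x", "y"]        # independent of the target

scores = {
    "informative": mutual_information(informative, target),
    "noise": mutual_information(noise, target),
}
# Rank features from most to least informative about the target.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A score of 0 means the feature tells you nothing about the target; the maximum possible score is the entropy of the target itself.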
4️⃣ *Define entropy.*
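- Entropy measures the randomness or uncertainty in data: H = -Σ p_i log2(p_i), where p_i is the proportion of samples in class i. It is 0 when all samples belong to one class and highest when classes are evenly mixed.
- In decision trees, it measures the impurity of a node. The best feature to split on is the one that reduces entropy the most (highest information gain).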
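Here is a minimal sketch of entropy and information gain for a decision-tree split. The `outlook` and `windy` features and their values are invented for the example, and both helper functions are illustrative rather than taken from any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Parent entropy minus the weighted entropy of the groups created by the split."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes perfectly
windy = ["true", "false", "true", "false"]      # tells us nothing about the label

gain_outlook = information_gain(labels, outlook)
gain_windy = information_gain(labels, windy)
print(entropy(labels), gain_outlook, gain_windy)
```

With these toy values the tree would split on `outlook`, because that split leaves two pure groups while `windy` leaves both groups as mixed as the parent.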
5️⃣ *What is the workflow of Principal Component Analysis (PCA)?*
- Standardize the data → Compute the covariance matrix → Calculate its eigenvectors & eigenvalues → Keep the eigenvectors with the largest eigenvalues (the top components) → Project the data onto them to get the new feature space.
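The workflow can be sketched end to end in plain Python. Two things here are assumptions on my side: the toy data, and the use of power iteration with deflation to find the top eigenvectors (a simple stand-in so the example needs no libraries; real implementations typically use a full eigendecomposition or SVD).

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance_matrix(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenpair(matrix, iterations=200):
    """Dominant eigenvalue and eigenvector of a symmetric matrix via power iteration."""
    d = len(matrix)
    vec = [1.0 / (i + 1) for i in range(d)]  # uneven start, unlikely to be orthogonal to the answer
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec

def pca(rows, n_components=1):
    z = standardize(rows)
    cov = covariance_matrix(z)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        variances.append(value)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    # Project the standardized data onto the selected components.
    projected = [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in z]
    return projected, variances

# Hypothetical data: two strongly correlated features.
rows = [[2.0, 1.9], [4.0, 4.2], [6.0, 5.8], [8.0, 8.1]]
projected, variances = pca(rows, n_components=1)
# After standardizing, total variance equals the number of features.
explained_ratio = variances[0] / len(rows[0])
print(projected, explained_ratio)
```

Because the two features move almost in lockstep, a single component keeps nearly all of the variance, which is exactly the case where PCA pays off.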
6️⃣ *Applications of PCA beyond dimensionality reduction?*
- Noise reduction
- Visualization of high-dimensional data
- Feature extraction
- Data compression
7️⃣ *What is a Convolutional Neural Network (CNN)? Explain its working.*
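- A CNN is a deep learning model designed mainly for grid-like data such as images.
- Convolutional layers slide small learned filters over the input to extract spatial features (edges, textures, shapes), an activation such as ReLU adds non-linearity, pooling layers shrink the feature maps, and fully connected layers at the end perform the classification.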
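To make the layer sequence concrete, here is a tiny forward pass in plain Python: convolution → ReLU → max pooling → one dense output. It is only a sketch. The image, the hand-picked edge kernel, and the dense weights are assumptions; a real CNN learns its kernels and weights during training and would be built with a deep learning framework.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU: negative responses become 0."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def dense(inputs, weights, bias):
    """Single fully connected output unit (weighted sum plus bias)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# 6x6 toy image: dark left half (0), bright right half (1).
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
# Hand-picked vertical edge detector.
kernel = [[-1.0, 0.0, 1.0]] * 3

feature_map = relu(conv2d(image, kernel))  # 4x4, responds where the edge is
pooled = max_pool(feature_map)             # 2x2
flat = [v for row in pooled for v in row]  # flatten for the dense layer
score = dense(flat, weights=[0.25] * 4, bias=0.0)
print(feature_map[0], pooled, score)  # [0.0, 3.0, 3.0, 0.0] [[3.0, 3.0], [3.0, 3.0]] 3.0
```

The feature map is non-zero only where the filter window straddles the dark-to-bright edge, and pooling keeps that strong response while halving each dimension.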
💡 *Double tap ❤️ if you found this helpful!*