
Data Science Jobs
June 16, 2025 at 04:02 PM
Now, let's move on to the next topic in the data science learning series:
*Data Cleaning & Preparation:*
🔹 *Topic: Combining Datasets*
In real-world projects, data often comes from multiple sources:
– Sales from one CSV
– Customer info from another
– Product data from a third
To build a full picture, you need to combine them.
✅ *3 Ways to Combine Datasets in Pandas*
*1. Concatenation (Stacking Data)*
Use it when datasets have the same columns but are split across multiple files or time periods.
df_combined = pd.concat([df1, df2])
Use axis=0 to stack rows on top of each other (the default)
Use axis=1 to place them side by side as columns (rows are aligned by index)
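Here's a minimal sketch of both concat modes. The DataFrames (jan, feb, ids, extra) are small made-up stand-ins, not real files:

```python
import pandas as pd

# Hypothetical monthly sales tables that share the same columns
jan = pd.DataFrame({"order_id": [1, 2], "amount": [100, 250]})
feb = pd.DataFrame({"order_id": [3, 4], "amount": [80, 120]})

# axis=0 (default): stack rows; ignore_index=True renumbers the index 0..n-1
sales = pd.concat([jan, feb], ignore_index=True)
print(sales)

# axis=1: place DataFrames side by side; rows are matched by index label
ids = pd.DataFrame({"customer_id": [10, 11]})
extra = pd.DataFrame({"region": ["North", "South"]})
side_by_side = pd.concat([ids, extra], axis=1)
print(side_by_side)
```

Without ignore_index=True, the stacked result keeps each table's original labels, so its index would read 0, 1, 0, 1.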
*2. Merging (Joining on Keys)*
Use it when datasets share a common key column, such as customer_id.
df_merged = pd.merge(df1, df2, on='customer_id', how='inner')
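A quick sketch of that merge on two tiny hypothetical tables (orders and customers are made-up data for illustration):

```python
import pandas as pd

# Hypothetical data: customer 99 has an order but no customer record
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 99]})
customers = pd.DataFrame({"customer_id": [10, 11, 12], "name": ["Asha", "Ben", "Chen"]})

# Inner merge: keep only orders whose customer_id also exists in customers
merged = pd.merge(orders, customers, on="customer_id", how="inner")
print(merged)
```

Order 3 (customer 99) and customer 12 (no orders) both drop out, because neither key appears in the other table.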
*Merge types:*
inner: only rows whose key appears in both
left: all rows from df1, plus matching data from df2
right: all rows from df2, plus matching data from df1
outer: all rows from both (missing values become NaN)
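To see the four merge types side by side, here's a sketch that counts the rows each one returns. df1 and df2 are hypothetical tables that overlap only on customer_id 2 and 3:

```python
import pandas as pd

# Hypothetical tables: keys 2 and 3 are shared, 1 is only in df1, 4 only in df2
df1 = pd.DataFrame({"customer_id": [1, 2, 3], "orders": [5, 2, 7]})
df2 = pd.DataFrame({"customer_id": [2, 3, 4], "city": ["Pune", "Oslo", "Lima"]})

# Row count produced by each merge type
counts = {how: len(pd.merge(df1, df2, on="customer_id", how=how))
          for how in ["inner", "left", "right", "outer"]}
print(counts)

# indicator=True adds a _merge column: "left_only", "right_only" or "both"
outer = pd.merge(df1, df2, on="customer_id", how="outer", indicator=True)
print(outer)
```

This prints {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}. The _merge column is handy for spotting keys that failed to match.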
*3. Join Method (simplified merge)*
df1.join(df2, how='left')
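join matches rows on the index by default, so either set the key as the index first (e.g. df2.set_index('customer_id')), or pass on= to match a column of df1 against df2's index.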
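A minimal sketch of both ways to use join, on hypothetical orders and customers tables:

```python
import pandas as pd

# Hypothetical data: customer 12 has an order but no customer record
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 12]})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Asha", "Ben"]})

# Option A: make customer_id the index of both frames, then join index-to-index
joined_a = orders.set_index("customer_id").join(
    customers.set_index("customer_id"), how="left"
)

# Option B: match orders' customer_id column against customers' index
joined_b = orders.join(customers.set_index("customer_id"), on="customer_id", how="left")
print(joined_b)
```

Calling orders.join(customers) directly would fail here: both frames have a customer_id column, and join raises an error on overlapping column names unless you pass lsuffix/rsuffix.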
📊 *Real-Life Example:*
You have:
orders.csv → order_id, product_id, customer_id
customers.csv → customer_id, name, age
products.csv → product_id, name, price
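You’ll merge them in two steps: orders with customers on customer_id, then the result with products on product_id. Watch out: customers and products both have a name column, so pandas renames them name_x and name_y unless you pass suffixes.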
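Here's a sketch of that two-step merge. The three DataFrames are small in-memory stand-ins with made-up values; with real files you'd load them via pd.read_csv("orders.csv") and so on. The suffix names _customer/_product are just one possible choice:

```python
import pandas as pd

# In-memory stand-ins for orders.csv, customers.csv and products.csv
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [100, 101, 100],
    "customer_id": [10, 11, 10],
})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Asha", "Ben"], "age": [34, 28]})
products = pd.DataFrame({"product_id": [100, 101], "name": ["Laptop", "Mouse"], "price": [900.0, 25.0]})

# Step 1: attach customer details to every order
full = orders.merge(customers, on="customer_id", how="left")

# Step 2: attach product details; both tables have "name",
# so suffixes renames them to name_customer and name_product
full = full.merge(products, on="product_id", how="left", suffixes=("_customer", "_product"))
print(full)
```

Using how="left" in both steps keeps every order, even one whose customer or product is missing from the lookup tables.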
*React with ❤️ once you're ready for the quiz*
Data Science Learning Series: https://whatsapp.com/channel/0029Va8v3eo1NCrQfGMseL2D/998
Python Cheatsheet: https://whatsapp.com/channel/0029VaiM08SDuMRaGKd9Wv0L/1660