A Beginner’s Guide to Data Cleaning Using Pandas

Most real-world datasets are messy. Before you can run any analysis or build a model, you need to deal with missing values, strange outliers, inconsistent formatting, and incorrect data types. This post walks through the basics of data cleaning using pandas, one of the most popular Python libraries for data manipulation. We’ll use a dataset of Nairobi property listings as our example. It contains information like location, price, number of bedrooms, and date posted. Let’s get started. ...

July 1, 2025 · 2 min · Brian Njenga Mwaura

🏘️ Nairobi Real Estate Market Analysis

This project explores a dataset of Nairobi property listings to uncover patterns in pricing based on location, property type, and features like bedrooms, bathrooms, and house size. The dataset was sourced from Kaggle and cleaned using pandas before performing exploratory data analysis (EDA) and predictive modeling.

🧹 Data Cleaning Summary

We performed the following steps:

- Removed currency symbols and converted prices to integers
- Renamed columns for clarity (price to price_in_ksh, etc.)
- Corrected inconsistent entries (e.g., townhuse to townhouse)
- Handled missing values:
  - Filled missing house_size_sqm using the median by propertytype
- Dropped duplicate and unused columns
- Filtered extreme outliers

📊 Exploratory Data Analysis (EDA)

🔹 Average Price by Location ...
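As a rough illustration of the cleaning steps summarized above, here is a minimal pandas sketch. The sample rows, the "KSh" price format, the location column name, and the 1.5 × IQR outlier rule are all assumptions made for this example; the project's actual data and thresholds may differ. Dropping unused columns is left out because the excerpt does not name them.

```python
import pandas as pd

# Hypothetical sample rows; the real Kaggle dataset's columns and values may differ.
raw = pd.DataFrame({
    "location": ["Kilimani", "Karen", "Kilimani", "Westlands", "Karen", "Kilimani"],
    "propertytype": ["apartment", "townhuse", "apartment", "apartment", "townhouse", "apartment"],
    "price": ["KSh 12,000,000", "KSh 45,000,000", "KSh 12,000,000",
              "KSh 15,500,000", "KSh 50,000,000", "KSh 9,000,000,000"],
    "house_size_sqm": [120.0, None, 120.0, 140.0, 300.0, 110.0],
})


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Remove currency symbols, spaces, and commas, then convert to integers.
    out["price"] = out["price"].str.replace(r"[^\d]", "", regex=True).astype("int64")

    # Rename for clarity.
    out = out.rename(columns={"price": "price_in_ksh"})

    # Correct known misspellings before grouping on this column.
    out["propertytype"] = out["propertytype"].replace({"townhuse": "townhouse"})

    # Fill missing sizes with the median size of the same property type.
    group_median = out.groupby("propertytype")["house_size_sqm"].transform("median")
    out["house_size_sqm"] = out["house_size_sqm"].fillna(group_median)

    # Drop exact duplicate rows, keeping the first occurrence.
    out = out.drop_duplicates()

    # Filter extreme prices with a 1.5 * IQR fence (an assumed rule, not the project's).
    q1, q3 = out["price_in_ksh"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["price_in_ksh"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    return out[in_range].reset_index(drop=True)


cleaned = clean_listings(raw)
print(cleaned)
```

The typo correction runs before the median fill so that the misspelled townhuse row is grouped with the other townhouse listings when its missing size is filled.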

June 29, 2025 · 3 min · Brian Njenga Mwaura