Skip to content

dhanesh456/Retail-Sales-Insights-SQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

6 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›’ SQL Retail Sales Performance Analysis

πŸ‘€ Author

Dhanesh Gaikwad
πŸŽ“ Master’s in Data Science | RMIT University
πŸ“ Melbourne, Australia
πŸ’Ό Data Analyst | SQL β€’ Python β€’ Power BI


🎯 Objective

Analyze retail sales data to uncover sales trends, customer behavior, and category performance using pure SQL.
The goal is to transform raw transactional data into actionable business insights that can guide decision-making, marketing strategy, and operational efficiency.


πŸ“– Overview

This project performs a complete Exploratory Data Analysis (EDA) using SQL only.
It involves database creation, data cleaning, data exploration, and detailed business analysis through structured queries.

Key questions addressed include:

  • Which product categories generate the highest revenue and sales volume?
  • Who are the top customers by spending?
  • Which months and times of day achieve peak sales?
  • How do age and gender influence buying patterns?
  • What are the seasonal trends and high-value customer segments?

🧰 Tech Stack

Component Tool
Language SQL
Database PostgreSQL
Environment pgAdmin 4 / VS Code
Version Control Git & GitHub
Dataset Simulated Retail Sales Data (1,997 records)

πŸ—οΈ Step 1: Database & Table Setup

CREATE DATABASE SQL_Project;

DROP TABLE IF EXISTS retail_sales;
CREATE TABLE retail_sales (
    transaction_id INT PRIMARY KEY,
    sale_date DATE,
    sale_time TIME,
    customer_id INT,
    gender VARCHAR(15),
    age INT,
    category VARCHAR(15),
    quantity INT,
    price_per_unit FLOAT,
    cogs FLOAT,
    total_sale FLOAT
);

βœ… Result: Database and table successfully created.


🧹 Step 2: Data Cleaning

Check for missing values and remove invalid records.

SELECT * FROM retail_sales
WHERE transaction_id IS NULL
   OR sale_date IS NULL
   OR sale_time IS NULL
   OR gender IS NULL
   OR category IS NULL
   OR quantity IS NULL
   OR cogs IS NULL
   OR total_sale IS NULL;

🧾 Result: 0 rows returned β†’ dataset is clean.
βœ… No missing or null values found across any field.


πŸ“Š Step 3: Data Exploration

Total Transactions

SELECT COUNT(*) AS total_sales FROM retail_sales;

Result: 1,997 transactions

Unique Customers

SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM retail_sales;

Result: 149 (Clothing), 141 (Beauty), 144 (Electronics)

Product Categories

SELECT DISTINCT category FROM retail_sales;

Result: Clothing, Beauty, Electronics

βœ… Dataset is well-structured and balanced across product categories.


πŸ’Ό Step 4: Business Questions & Analysis

Q1. 🧾 Sales on 2022-11-05

SELECT * FROM retail_sales WHERE sale_date = '2022-11-05';

Result: 11 transactions
Insight:
Sales were recorded across all categories and genders β€” normal day-to-day activity, showing steady customer engagement.


Q2. πŸ‘• Bulk Clothing Purchases in November 2022

SELECT * FROM retail_sales
WHERE category = 'Clothing'
  AND TO_CHAR(sale_date, 'YYYY-MM') = '2022-11'
  AND quantity > 4;

Result: 17 transactions
Insight:
Bulk clothing sales spiked in November β€” possibly due to festive or seasonal demand.
Customers aged 19–61 show strong interest in clothing during this period.


Q3. πŸ’° Total Revenue & Orders per Category

SELECT category,
       SUM(total_sale) AS total_revenue,
       COUNT(*) AS total_orders
FROM retail_sales
GROUP BY category;
Category Total Revenue Total Orders
Electronics 313,810 684
Clothing 311,070 701
Beauty 286,840 612

Insight:

  • Electronics yields the highest revenue.
  • Clothing leads in number of transactions.
  • Beauty performs consistently but slightly lower.

Q4. πŸ’„ Average Age of Beauty Buyers

SELECT ROUND(AVG(age), 2) AS avg_age
FROM retail_sales
WHERE category = 'Beauty';

Result: 40.42 years
Insight:
The average Beauty buyer is around 40 years old, suggesting that middle-aged customers are the main audience for self-care and cosmetics products.


Q5. πŸ’Ž High-Value Transactions (Total Sale > 1000)

SELECT * FROM retail_sales WHERE total_sale > 1000;

Result: 306 transactions
Insight:
About 15% of all transactions are high-value, primarily from Clothing and Beauty.
These premium purchases are perfect for loyalty or VIP marketing programs.


Q6. πŸ‘©β€πŸ¦° Gender-wise Transactions by Category

SELECT category, gender, COUNT(*) AS total_transactions
FROM retail_sales
GROUP BY category, gender
ORDER BY category;
Category Female Male
Beauty 330 282
Clothing 347 354
Electronics 340 344

Insight:

  • Beauty is female-dominated.
  • Clothing and Electronics show balanced participation across genders.
    πŸ“ˆ Campaigns can be personalized by category demographics.

Q7. πŸ“… Best-Selling Month Each Year

SELECT year, month, avg_sale
FROM (
  SELECT EXTRACT(YEAR FROM sale_date) AS year,
         EXTRACT(MONTH FROM sale_date) AS month,
         AVG(total_sale) AS avg_sale,
         RANK() OVER (PARTITION BY EXTRACT(YEAR FROM sale_date)
                      ORDER BY AVG(total_sale) DESC) AS rank
  FROM retail_sales
  GROUP BY 1, 2
) ranked_sales
WHERE rank = 1;
Year Month Avg Sale
2022 July (7) 541.34
2023 February (2) 535.53

Insight:
Peak sales occur in July and February, likely due to mid-year and early-year discount seasons.


Q8. πŸ† Top 5 Customers by Total Spending

SELECT customer_id,
       SUM(total_sale) AS total_sales
FROM retail_sales
GROUP BY customer_id
ORDER BY total_sales DESC
LIMIT 5;
Customer ID Total Sales
3 38,440
1 30,750
5 30,405
2 25,295
4 23,580

Insight:
Top 5 customers generate a significant portion of total revenue β€” key targets for retention and VIP loyalty programs.


Q9. πŸ‘₯ Unique Customers per Category

SELECT category,
       COUNT(DISTINCT customer_id) AS unique_customers
FROM retail_sales
GROUP BY category;
Category Unique Customers
Beauty 141
Clothing 149
Electronics 144

Insight:
Customer distribution across categories is well-balanced β€” showing healthy interest across all retail segments.


Q10. ⏰ Orders by Time of Day

WITH hourly_sales AS (
    SELECT *,
           CASE
               WHEN EXTRACT(HOUR FROM sale_time) < 12 THEN 'Morning'
               WHEN EXTRACT(HOUR FROM sale_time) BETWEEN 12 AND 17 THEN 'Afternoon'
               ELSE 'Evening'
           END AS shift
    FROM retail_sales
)
SELECT shift, COUNT(*) AS total_orders
FROM hourly_sales
GROUP BY shift;
Shift Total Orders
Evening 1,062
Morning 558
Afternoon 377

Insight:
πŸŒ† Evening sales dominate, making up over half of all transactions.
Promotions and staffing can be optimized for peak evening hours.


πŸ“ˆ Key Business Insights

# Finding Business Impact
1 Electronics leads in revenue Maintain high stock & margin strategy
2 Clothing leads in transaction volume Prioritize promotions and offers
3 Beauty buyers average ~40 years Target middle-aged audience
4 Evening sales dominate Align operations & offers accordingly
5 July & February are peak months Schedule major marketing events
6 Top 5 customers are high-value spenders Create VIP engagement programs

🧠 Skills Demonstrated

  • SQL Data Cleaning & EDA
  • Aggregation Functions (SUM, AVG, COUNT)
  • Ranking & Window Functions (RANK(), PARTITION BY)
  • Common Table Expressions (CTEs)
  • Business Intelligence & Insight Generation
  • Markdown Documentation for Analytics Projects

⭐ Summary

This project transforms raw retail sales data into actionable insights through SQL-based analysis.
It identifies top-performing categories, customer demographics, sales patterns, and seasonal trends that can guide data-driven business decisions.

About

SQL project showcasing data cleaning, exploratory data analysis, and business insights using a retail sales dataset. Includes SQL scripts for setup, analysis, and performance evaluation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors