Overview

In SQL interviews, understanding advanced grouping and join techniques is crucial. This post dives into GROUPING SETS, ROLLUP, CUBE, and types of joins like self-joins, CROSS JOIN, and NATURAL JOIN. We also touch on how to interpret NULLs in grouping results and share interview tips along the way.


1. GROUPING SETS

GROUPING SETS let you define multiple GROUP BY combinations in a single query—ideal for multi-dimensional reports.

Problem

You want to calculate:

  • Revenue per staff per month
  • Revenue per month
  • Revenue per staff

Naively, you’d write:

SELECT NULL AS month, staff_id, SUM(amount)
FROM payment
GROUP BY staff_id

UNION

SELECT month, NULL AS staff_id, SUM(amount)
FROM payment
GROUP BY month

UNION

SELECT month, staff_id, SUM(amount)
FROM payment
GROUP BY month, staff_id
ORDER BY 1, 2;

Better: Use GROUPING SETS

SELECT
    month,
    staff_id,
    SUM(amount) AS total_amount
FROM payment
GROUP BY GROUPING SETS (
    (staff_id),
    (month),
    (month, staff_id)
)
ORDER BY 1, 2;

Interview Tip

  • Explain how GROUPING SETS improves performance and code clarity.
  • Mention that NULL in output may mean “total” for a dimension—distinguish from actual NULLs using the GROUPING() function.

2. CUBE & ROLLUP

Both are shortcuts for predefined grouping sets.

2.1 ROLLUP

Generates subtotals moving top-down in a hierarchy.

Syntax

GROUP BY ROLLUP (col1, col2, col3)

Equivalent to:

GROUP BY GROUPING SETS (
  (col1, col2, col3),
  (col1, col2),
  (col1),
  ()
)

Example: Revenue by Date → Month → Quarter

SELECT 
    'Q' || TO_CHAR(payment_date, 'Q') AS quarter,
    EXTRACT(month FROM payment_date) AS month,
    DATE(payment_date) AS date,
    SUM(amount) AS total_amount
FROM payment
GROUP BY ROLLUP (
    'Q' || TO_CHAR(payment_date, 'Q'),
    EXTRACT(month FROM payment_date),
    DATE(payment_date)
)
ORDER BY 1, 2, 3;

2.2 CUBE

Returns all possible combinations of columns.

Syntax

GROUP BY CUBE (col1, col2, col3)

Equivalent to:

GROUP BY GROUPING SETS (
  (col1, col2, col3),
  (col1, col2),
  (col1, col3),
  (col2, col3),
  (col1),
  (col2),
  (col3),
  ()
)

Example: Revenue by Customer, Staff, and Day

SELECT
    customer_id,
    staff_id,
    DATE(payment_date) AS payment_date,
    SUM(amount) AS total_amount
FROM payment
GROUP BY CUBE (
    customer_id,
    staff_id,
    DATE(payment_date)
)
ORDER BY 1, 2, 3;

3. Understanding GROUPING() Function

Use GROUPING(col) to distinguish real NULL from NULL added by rollups/cubes.

SELECT 
    customer_id,
    staff_id,
    GROUPING(staff_id) AS is_staff_grouped,
    SUM(amount)
FROM payment
GROUP BY CUBE (customer_id, staff_id);
  • GROUPING(col) returns 1 if the column was aggregated (i.e., NULL due to subtotal), else 0.

4. SELF JOIN

A self-join joins a table to itself—commonly used to query hierarchical relationships.

Example: Employee & Manager Names

SELECT
    e.employee_id,
    e.name AS employee_name,
    m.name AS manager_name
FROM employee e
LEFT JOIN employee m
  ON e.manager_id = m.employee_id;

Example: Manager of Manager

SELECT
    e.name AS employee_name,
    m1.name AS manager_name,
    m2.name AS manager_of_manager
FROM employee e
LEFT JOIN employee m1 ON e.manager_id = m1.employee_id
LEFT JOIN employee m2 ON m1.manager_id = m2.employee_id;

5. CROSS JOIN

Returns the Cartesian product—every row from one table paired with every row from another.

Syntax

SELECT
    t1.column1,
    t2.column2
FROM table1 t1
CROSS JOIN table2 t2;

Interview Tip

Mention potential performance risks—used in scenarios like generating all combinations (e.g., calendar x product).


6. NATURAL JOIN

Automatically joins on columns with the same name in both tables.

Syntax

SELECT *
FROM table1
NATURAL JOIN table2;

Caution

  • Implicit behavior—may join on unintended columns.
  • Avoid in production code, but know it for interviews.

Summary for Interviews

ConceptUse CaseCaution
GROUPING SETSCombine multiple GROUP BY clauses into oneAvoid duplicate logic via UNION
ROLLUPHierarchical totals (e.g., Year → Month → Day)Interpreting NULLs
CUBEAll combinations (no hierarchy)Can be expensive
GROUPING()Detect subtotal rowsMust use with ROLLUP/CUBE
SELF JOINHierarchies (e.g. manager → employee)Alias carefully
CROSS JOINAll row combinationsPerformance risk
NATURAL JOINAuto-join on same column namesUnpredictable in large schemas

Further Reading