Common Table Expressions in SQL
Common Table Expressions (CTE)
CTEs are shortcuts you can build into your SQL scripts to help simplify and condense your analyses. In particular, they help simplify complex subqueries and joins in scripts. Introduced in 2005 in SQL, Microsoft’s descriptoin of CTE is: “Specifies a temporary named result set, known as a common table expression (CTE). This is derived from a simple query and defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE or MERGE statement” (source: https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15). CTEs are also extremly useful if you’re working in a program that only allows you to run a single query at a time. In these instances, declaring a CTE and then calling it in a select or join query is really helpful to simplify queries and processess. I’ll dive into some examples now that will hopefully clarify the syntax and use of these expressions.
Syntax
A simple example of a CTE is as follows. Let’s say you have sales data of how much a store sold in an online store. The data is from https://www.kaggle.com/vijayuv/onlineretail.
select top 5 *
from dbo.retail_sales
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|
536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00.0000000 | 2.55 | 17850 | United Kingdom |
536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00.0000000 | 2.75 | 17850 | United Kingdom |
536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
For a simple example just to grasp the syntax, imagine you want to only select item with Stock Code = 22423 and then perform a wide variety of calculations on that sub-population without creating new temp tables or tables.
with cakestand_sales as (
select top 5 *
from dbo.retail_sales
where StockCode = '22423'
)
select *
from cakestand_sales
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|
536477 | 22423 | REGENCY CAKESTAND 3 TIER | 16 | 2010-12-01 12:27:00.0000000 | 10.95 | 16210 | United Kingdom |
536502 | 22423 | REGENCY CAKESTAND 3 TIER | 2 | 2010-12-01 12:36:00.0000000 | 12.75 | 16552 | United Kingdom |
536525 | 22423 | REGENCY CAKESTAND 3 TIER | 2 | 2010-12-01 12:54:00.0000000 | 12.75 | 14078 | United Kingdom |
536528 | 22423 | REGENCY CAKESTAND 3 TIER | 1 | 2010-12-01 13:17:00.0000000 | 12.75 | 15525 | United Kingdom |
536530 | 22423 | REGENCY CAKESTAND 3 TIER | 1 | 2010-12-01 13:21:00.0000000 | 12.75 | 17905 | United Kingdom |
Now, let’s say you want to calculate the average amount of each item sold per day and compare that to the number sold in a particular sale and you want to do this in one query.
with avg_quantity as (
SELECT InvoiceDate, StockCode, avg(Quantity) as avg_quantity
FROM DBO.RETAIL_SALES
GROUP BY InvoiceDate, StockCode
)
select a.InvoiceDate,a.InvoiceNo, a.StockCode, a.Quantity daily_quantity, b.avg_quantity, abs(a.Quantity - b.avg_quantity) difference
from dbo.retail_sales a
left join avg_quantity b
on a.stockCode = b.stockCode and a.InvoiceDate = b.InvoiceDate
InvoiceDate | InvoiceNo | StockCode | daily_quantity | avg_quantity | difference |
---|---|---|---|---|---|
2010-12-01 08:26:00.0000000 | 536365 | 85123A | 6 | 6 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 71053 | 6 | 6 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 84406B | 8 | 8 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 84029G | 6 | 6 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 84029E | 6 | 6 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 22752 | 2 | 2 | 0 |
2010-12-01 08:26:00.0000000 | 536365 | 21730 | 6 | 6 | 0 |
2010-12-01 08:28:00.0000000 | 536366 | 22633 | 6 | 6 | 0 |
2010-12-01 08:28:00.0000000 | 536366 | 22632 | 6 | 6 | 0 |
2010-12-01 08:34:00.0000000 | 536367 | 84879 | 32 | 32 | 0 |
In these two examples, we build one CTE and then applied it to a query. However, you can build as many as you want. Let’s take our previous query but say we also want to add two columns indicating the max and min quantities sold in a day for all products across the store. Then, we could write the following query:
with avg_quantity as (
SELECT InvoiceDate, StockCode, avg(Quantity) as avg_quantity
FROM DBO.RETAIL_SALES
GROUP BY InvoiceDate, StockCode
),
min_max_all_products as (
SELECT InvoiceDate, max(Quantity) as max_quantity, min(Quantity) as min_quantity
FROM dbo.retail_sales
group by InvoiceDate
)
select a.InvoiceDate,a.InvoiceNo, a.StockCode, a.Quantity daily_quantity, b.avg_quantity, abs(a.Quantity - b.avg_quantity) difference, c.max_quantity, c.min_quantity
from dbo.retail_sales a
left join avg_quantity b
on a.stockCode = b.stockCode and a.InvoiceDate = b.InvoiceDate
left join min_max_all_products c
on a.InvoiceDate = c.InvoiceDate
InvoiceDate | InvoiceNo | StockCode | daily_quantity | avg_quantity | difference | max_quantity | min_quantity |
---|---|---|---|---|---|---|---|
2010-12-01 08:26:00.0000000 | 536365 | 85123A | 6 | 6 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 71053 | 6 | 6 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 84406B | 8 | 8 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 84029G | 6 | 6 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 84029E | 6 | 6 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 22752 | 2 | 2 | 0 | 8 | 2 |
2010-12-01 08:26:00.0000000 | 536365 | 21730 | 6 | 6 | 0 | 8 | 2 |
2010-12-01 08:28:00.0000000 | 536366 | 22633 | 6 | 6 | 0 | 6 | 6 |
2010-12-01 08:28:00.0000000 | 536366 | 22632 | 6 | 6 | 0 | 6 | 6 |
2010-12-01 08:34:00.0000000 | 536367 | 84879 | 32 | 32 | 0 | 32 | 2 |