SQL Tips for Beginners
Here I’ve complied some simple SQL tips and queries to help new programmers or anyone that needs a refresher. This is not intended to be all incompassing but just contains the things I use reguarly when viewing and filtering data in SQL. All of this information is for Microsoft SQL Server Management Studio on a PC.
Select Database
About 60% of the time that I open SQL, I go to run a query and get the following error: “Invalid object name”. After about 30 seconds of panic in which I double check my query for any typos, I realize I forgot to add that essential line at the top of my script: “USE database”. In order to use any tables or stored procedures etc. in a query that are saved in a database, you need to let your SQL server know which database you’re using within the server. You only need to run that line of code once and then you can access all tables, stored procedures etc. in that database.
Running a Query
In Microsoft SQL, to run a SQL query, you can use the Excute button with the green arrow button above your query. If you just click the button, your entire query will run. If want to run certain lines of of your SQL query, you can select those lines under the shift key or highlighting them with your mouse then click execute. On your keyboard, the F5 button also works to execute your query.
Building a Query
SQL queries are essentially built on the use of five verbs: SELECT, FROM, WHERE, GROUP BY AND HAVING and are built in that order:
From
The “FROM” function identifies which table/view/object you want to pull data from. To reference a table/object outside of your database write “FROM SCHEMA_NAME.TABLE_NAME”.
Select
SELECT indicates what you want returned in the query. If a field or calculation is not listed in the SELECT statement, it will not appear in the result of the query. If you want to return all columns, use an asterick sign ("SELECT *") as a shortcut to listing out all the column names. In the following example, the query selects two fields from the table dbo.retail_sales and outputs two other columns that are the product of the fields outputted:
select Quantity,
UnitPrice,
Quantity * UnitPrice,
Quantity * UnitPrice as Revenue
from dbo.retail_sales
Quantity | UnitPrice | Revenue | |
---|---|---|---|
6 | 2.55 | 15.30 | 15.30 |
6 | 3.39 | 20.34 | 20.34 |
8 | 2.75 | 22.00 | 22.00 |
6 | 3.39 | 20.34 | 20.34 |
6 | 3.39 | 20.34 | 20.34 |
2 | 7.65 | 15.30 | 15.30 |
6 | 4.25 | 25.50 | 25.50 |
6 | 1.85 | 11.10 | 11.10 |
6 | 1.85 | 11.10 | 11.10 |
32 | 1.69 | 54.08 | 54.08 |
The final two fields outputted above are the exact same calculation but the final field just is renamed for a cleaner and easier to follow output. Next, let’s say you just want to glipse at a couple rows in your output because even here you can see this data goes on for almost 1,000 more rows. Tod do this, just add the word “TOP” after your SELECT statement to indicate you want to show the top X number of rows in your output (the default is 10). In the example below, the query outputs the first five rows of the table. If you delete the “5” after the word TOP,it would automatically default to returning the first ten rows of data.
SELECT TOP 5 *
FROM dbo.retail_sales
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|
536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00.0000000 | 2.55 | 17850 | United Kingdom |
536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00.0000000 | 2.75 | 17850 | United Kingdom |
536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00.0000000 | 3.39 | 17850 | United Kingdom |
You can also use the key word “DISTINCT” that will return the distinct records for the fields in your select statment (ex: SELECT DISTINCT * would return the distinct rows of all fields). This function works similarly to the remove duplicates functionality in Excel.
Where
Okay here is where things get fun y’all. The WHERE statement enables you to filter your data. Similar to filters in other programming languages, you can apply conditional filtering using “AND” and “OR”. Viewing data is great but filtering is really when you get to play around. The key to filtering in SQL (and honestly most other programming languages) is remembering PEMDAS and making sure your parentheses are not only placed in the right location but that you’re not missing any closed paranetheses. This is one of the most common mistakes I make when querying so if you’re getting an error make sure to double check that everything is in the right place and closed.
In the two examples below, you can see that the query intends to focus on two invoice numbers and filter to only show records where the unit price is less than 2.00. However, the second example does not use the parentheses and in its output, you can see two records where the unit price is greater than 2 because the calculation moves from left to right so the only filter on these two records is InvoiceNo = 537446.
SELECT top 5 *
FROM dbo.retail_sales
where unitPrice < 2.00 and (invoiceno = '536365' OR InvoiceNo = '537446')
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|
537446 | 21135 | VICTORIAN METAL POSTCARD SPRING | 8 | 2010-12-07 09:13:00.0000000 | 1.69 | 18055 | United Kingdom |
537446 | 21397 | BLUE POLKADOT EGG CUP | 12 | 2010-12-07 09:13:00.0000000 | 1.25 | 18055 | United Kingdom |
537446 | 22948 | METAL DECORATION NAUGHTY CHILDREN | 24 | 2010-12-07 09:13:00.0000000 | 0.85 | 18055 | United Kingdom |
537446 | 46000M | POLYESTER FILLER PAD 45x45cm | 2 | 2010-12-07 09:13:00.0000000 | 1.55 | 18055 | United Kingdom |
537446 | 72741 | GRAND CHOCOLATECANDLE | 18 | 2010-12-07 09:13:00.0000000 | 1.45 | 18055 | United Kingdom |
SELECT top 5 *
FROM dbo.retail_sales
where unitPrice < 2.00 and invoiceno = '536365' OR InvoiceNo = '537446'
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
---|---|---|---|---|---|---|---|
537446 | 20782 | CAMOUFLAGE EAR MUFF HEADPHONES | 4 | 2010-12-07 09:13:00.0000000 | 5.49 | 18055 | United Kingdom |
537446 | 21135 | VICTORIAN METAL POSTCARD SPRING | 8 | 2010-12-07 09:13:00.0000000 | 1.69 | 18055 | United Kingdom |
537446 | 21175 | GIN + TONIC DIET METAL SIGN | 12 | 2010-12-07 09:13:00.0000000 | 2.10 | 18055 | United Kingdom |
537446 | 21397 | BLUE POLKADOT EGG CUP | 12 | 2010-12-07 09:13:00.0000000 | 1.25 | 18055 | United Kingdom |
537446 | 21411 | GINGHAM HEART DOORSTOP RED | 3 | 2010-12-07 09:13:00.0000000 | 4.25 | 18055 | United Kingdom |
Group By
The group by statement allows you to group the data by at least one field and then output calculations based off that grouping. The most common calculations that I use regularly are max(), min() and count(). Min and max both return the smallest or largest value respectively of hte field provided by the fields grouped by. For example, in our sample data, grouping by the InvoiceNo and returning the max(UnitPrice) would return the largest unit price of an item sold in each invoiceNo. Let’s look at some examples:
SELECT InvoiceNo , InvoiceDate, max(unitprice) as highest_price
FROM dbo.retail_sales
GROUP BY InvoiceNo , InvoiceDate
InvoiceNo | InvoiceDate | highest_price |
---|---|---|
561521 | 2011-07-27 17:06:00.0000000 | 4.65 |
544582 | 2011-02-21 14:38:00.0000000 | 5.75 |
569493 | 2011-10-04 13:35:00.0000000 | 12.50 |
C551463 | 2011-04-28 16:13:00.0000000 | 10.40 |
553466 | 2011-05-17 11:10:00.0000000 | 0.72 |
561060 | 2011-07-24 13:30:00.0000000 | 4.15 |
565328 | 2011-09-02 13:50:00.0000000 | 8.50 |
C577815 | 2011-11-22 09:16:00.0000000 | 4.95 |
551888 | 2011-05-05 09:29:00.0000000 | 12.75 |
C571336 | 2011-10-17 11:43:00.0000000 | 3.75 |
A useful trick is using the count(*) or count(distinct FIELD_NAME) functions as the aggregate in a group by query to get the row count for each group:
SELECT InvoiceNo , count(distinct stockcode) as product_ct
FROM dbo.retail_sales
GROUP BY InvoiceNo
InvoiceNo | product_ct |
---|---|
536365 | 7 |
536366 | 2 |
536367 | 12 |
536368 | 4 |
536369 | 1 |
536370 | 20 |
536371 | 1 |
536372 | 2 |
536373 | 16 |
536374 | 1 |
Having
The final verb, HAVING, is similar to a where clause in that it filters data, but HAVING is only utilized when you have a group by statement. If you have a grouped query and want to filter it, you can use the HAVING statement to filter the rows of your output.
For example, if we add on to our previous example where we calculated the number of distinct products in each invoice to only return invoices with more than 10 products we would do the following:
SELECT InvoiceNo , COUNT(DISTINCT stockcode) as product_ct
FROM dbo.retail_sales
GROUP BY InvoiceNo
HAVING COUNT(DISTINCT stockcode) > 10
InvoiceNo | product_ct |
---|---|
536367 | 12 |
536370 | 20 |
536373 | 16 |
536375 | 16 |
536378 | 19 |
536381 | 34 |
536382 | 12 |
536384 | 13 |
536388 | 14 |
536389 | 14 |
This has hopfeully been a useful quick overview of how to build queries in SQL and how to get started. Next up, I’m going to write up how to do simple joins in SQL so you can link your tables together.