Using R to Program in SQL
Last week I was chatting with a coworker (over Skype because #2020quaratine & I never leave my apartment anymore) and we both agreed that the hardest task to recieve is “hey can you load this data into SQL?”. It doesn’t matter how many times we do it or how good we are at programming, this task is tough. It should be straightforward but even the most simple of data ends up with at least one error. As a result, after a while I realized there had to be an easier way addressing periodic tasks that needed data loaded into SQL.
For example, a project I recently worked on needed us to update a table in a SQL database on a weekly basis with the data from the previous week. So, every Monday, I had to load in the new data into SQL and then union it with the complete table containing all of our data. Instead of just loading the data into SQL (which if I’m being honest would have taken at least 15 minutes) and the unioning in it SQL, I built a Python script that loaded in the data via Python (which takes less than a couple seconds) and then updated my SQL table in the Python script. You can also do this in R and for tasks like this where you don’t need the table you’re loading in to be saved in your database, it’s a really useful trick.
Whether you’re doing it in Python, R, or another language, the general gist is the same. Within your script, you open up a connection to your SQL database and then write your SQL code within the R/Python script.
Here is how I set up the connection to a database. For the purpose of this post, I’m going to be connecting to Microsoft SQL Server and my SQL express server account. In the following query, we set up the connection to the SQL database that we want to use using the odbc and DBI libraries:
library(dplyr)
library(dbplyr)
library(DBI)
## Warning: package 'DBI' was built under R version 3.6.2
library(odbc)
con <- dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "LAALINEHAN\\SQLEXPRESS",
Database = "alinehan",
Trusted_connection = "True")
To query your data in SQL within your R script, you can use the dbGetQuery() function. Here’s a simple example of selecting some data from my SQL database:
dbGetQuery(conn = con, statement = '
select top 10 *
from dbo.retail_sales
'
)
## InvoiceNo StockCode Description Quantity
## 1 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6
## 2 536365 71053 WHITE METAL LANTERN 6
## 3 536365 84406B CREAM CUPID HEARTS COAT HANGER 8
## 4 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6
## 5 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6
## 6 536365 22752 SET 7 BABUSHKA NESTING BOXES 2
## 7 536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6
## 8 536366 22633 HAND WARMER UNION JACK 6
## 9 536366 22632 HAND WARMER RED POLKA DOT 6
## 10 536367 84879 ASSORTED COLOUR BIRD ORNAMENT 32
## InvoiceDate UnitPrice CustomerID Country
## 1 2010-12-01 08:26:00.0000000 2.55 17850 United Kingdom
## 2 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 3 2010-12-01 08:26:00.0000000 2.75 17850 United Kingdom
## 4 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 5 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 6 2010-12-01 08:26:00.0000000 7.65 17850 United Kingdom
## 7 2010-12-01 08:26:00.0000000 4.25 17850 United Kingdom
## 8 2010-12-01 08:28:00.0000000 1.85 17850 United Kingdom
## 9 2010-12-01 08:28:00.0000000 1.85 17850 United Kingdom
## 10 2010-12-01 08:34:00.0000000 1.69 13047 United Kingdom
If you want to use data from your R environment with your SQL connection you can use the paste function to build your query alongside your R objects and variables:
num <- 536365
dbGetQuery(conn = con, statement = paste0('
select *
from dbo.retail_sales
where InvoiceNo = \'', num , '\'')
)
## InvoiceNo StockCode Description Quantity
## 1 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6
## 2 536365 71053 WHITE METAL LANTERN 6
## 3 536365 84406B CREAM CUPID HEARTS COAT HANGER 8
## 4 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6
## 5 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6
## 6 536365 22752 SET 7 BABUSHKA NESTING BOXES 2
## 7 536365 21730 GLASS STAR FROSTED T-LIGHT HOLDER 6
## InvoiceDate UnitPrice CustomerID Country
## 1 2010-12-01 08:26:00.0000000 2.55 17850 United Kingdom
## 2 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 3 2010-12-01 08:26:00.0000000 2.75 17850 United Kingdom
## 4 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 5 2010-12-01 08:26:00.0000000 3.39 17850 United Kingdom
## 6 2010-12-01 08:26:00.0000000 7.65 17850 United Kingdom
## 7 2010-12-01 08:26:00.0000000 4.25 17850 United Kingdom
The query above can easly be adapted for an update statement to solve my problem above of continuously updating a SQL table.