Posted in tSQL

Basic Use Cases of Lag and Lead

Lag and Lead are both window functions available in SQL Server 2012 and forward. Most of you have probably at least heard the term window functions, but what does that mean?  Well, first of all, it has nothing at all to do with the Windows operating system. For those of you that have done some studying up on performance tuning of queries, you’ll remember that the best way to make your query run faster is to limit the data set that you’re acting against as soon as possible in the query/procedure.

A window function will use just the dataset returned by the query, and act upon THAT while still within the query. It can do additional aggregations or perform other tasks on the initial select of the query, without requiring subqueries or other hits to the actual database table. Lag will find the value of n rows back, while lead can find you the value of n rows ahead.  Great, right?  But, what does that mean in practical terms?

Let’s set up some simple queries to illustrate.

/* please run the code setup from the attached document: setupCode.adhocsql
Then, on the same connection, run this code */

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

So, we can see in the results 4 stops on this leg. 

Maybe we need to find out what action is supposed to happen at the next stop. This is how we would have had to write this code in the past, which would have involved 5 table scans and 15 logical reads:

SELECT a. tripNumber
   , a.stopNumber
   , a.stopAction
   , a.actArriveDate 
   , (SELECT TOP 1 stopAction  
      FROM #stops b  
      WHERE a.tripNumber = b.tripNumber 
         AND b.actArriveDate > a.actArriveDate 
      ORDER BY b.actArriveDate) AS nextAction
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

Here are our results:

And, as we mentioned before, the statistics IO and time:

Now, though, we can use LEAD to find out what’s going on in the NEXT row.

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
    , LEAD(a.stopAction,1,'N/A') 
            OVER (PARTITION BY a.tripNumber 
                  ORDER BY a.actArriveDate) AS nextAction FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

As we can see in the result set, the nextAction finds the stopAction from the next row, and we see our default N/A on the last row (because there’s no “next” row!)

Let’s break this down:

The red rectangle denotes the column that we’re choosing. The green rectangle is the offset – that is, how many rows ahead we’re looking for the value that we want. The blue rectangle denotes the default value if there is no “next” row.  The orange rectangle denotes a partition – that is, we’re going to start this over for every tripNumber. And, there must be an order by in the OVER clause for lag and lead because we need to know what will determine the “next” or “previous” row. This is denoted by the pink rectangle. Please note that the order by in the lag/lead does NOT have to be the same order by that the query uses.

This is our statistics IO and time for this query:

The Window Function version causes 1 scan of the table and 3 logical reads. Yay, less work!

LAG allows us to look back at previous rows. Say we want to see how much time elapses between stops….:

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
    , DATEDIFF(mi
         ,LAG(a.actArriveDate,1) 
                 OVER (PARTITION BY a.tripNumber 
                      ORDER BY a.actArriveDate)
         , a.actArriveDate) AS minElapsed
   , LEAD(a.stopAction,1,'N/A') 
                 OVER (PARTITION BY a.tripNumber 
                      ORDER BY a.actArriveDate) AS nextAction 
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

Here, we didn’t specify a default, and so we get NULL for the value of that field. And for this query, we also get 1 table scan and 3 logical reads!

What would this look like if we wanted to find data for a few different trip numbers?

SELECT a.tripNumber
 , a.stopNumber
 , a.stopAction
 , a.actArriveDate
 , DATEDIFF(mi
 ,LAG(a.actArriveDate,1) 
 OVER (PARTITION BY a.tripNumber 
 ORDER BY a.actArriveDate)
 , a.actArriveDate) AS minElapsed
   , LEAD(a.stopAction,1,'N/A') 
 OVER (PARTITION BY a.tripNumber 
 ORDER BY a.actArriveDate) AS nextAction 
FROM #stops a
WHERE a.tripNumber IN (121279,121230,121333) 
ORDER BY a.tripnumber, a.actArriveDate;

Our result set shows the lag/lead row calculations completed within each tripNumber:

Checking on the statistics IO, we still see 1 scan and 3 reads. Woo hoo!

These and other Window Functions can be used for some pretty advanced tSQL, but for now we’ll leave it with these simple usage cases.  Thanks for reading!