Posted in PASS

The T-SQL Tuesday Post I Should Have Written

The November 2017 topic for T-SQL Tuesday was “Folks Who Have Made A Difference”.  And, although I don’t really have a good excuse to not have gotten this written on time, I still want to thank those who have made a difference for me regardless.

The first thank you goes to Steve Ingalls (known to most of my friends as Channel 0 Steve). He engaged me in my first professional database project – redesigning the chemical inventory database for the research branch of the Cleveland VA Hospital. It was a summer project between college years, and it was in (I know, I know *wince*) Access, but the guidance he gave me stood me in good stead then and ever since, as does his continued wisdom and humor.

The next shout goes out to Cheryl Johnson, who was my boss for many years while I worked for The Ohio State University (well, a small section of a segment of.) She allowed me to make mistakes and encouraged me to learn from them. I learned from her to take time in my work, which resulted in pride in what I created or helped create. She also showed me that women have a very definite place in the technology world.

That brings us to the the people who showed me #sqlfamily, including the best hugs ever.  I had the honor to be in Allen White’s Powershell precon at my first PASS Summit. (@SQLRunr) We connected because I might have been the smartass in the front row who likes to talk to presenters when they ask for audience participation. Allen has been nagging supporting me to become more involved with PASS and to give back to the community.  His encouragement has really meant a lot as I try to find ways to become involved – I’m not comfortable with public speaking, but maybe I’ll get there…. someday!

And last but by no means least, thank you to Mindy Curnutt (@sqlgirl). She and Eric Blinn (@SQL2TheSequel)  built up the dream team in our DBA Services group – each one of us had good basic skills, and exceptional skills in certain areas. I learned for the first time what it’s like to really work in a team of great people, and that I was valued for my contributions. This is no small thing. She is involved in many different activities and is inspiring in her work.

Thank you, all of you – for your support, for sharing your knowledge, and for being my friends.

Posted in tSQL

Basic Use Cases of Lag and Lead

Lag and Lead are both window functions available in SQL Server 2012 and forward. Most of you have probably at least heard the term window functions, but what does that mean?  Well, first of all, it has nothing at all to do with the Windows operating system. For those of you that have done some studying up on performance tuning of queries, you’ll remember that the best way to make your query run faster is to limit the data set that you’re acting against as soon as possible in the query/procedure.

A window function will use just the dataset returned by the query, and act upon THAT while still within the query. It can do additional aggregations or perform other tasks on the initial select of the query, without requiring subqueries or other hits to the actual database table. Lag will find the value of n rows back, while lead can find you the value of n rows ahead.  Great, right?  But, what does that mean in practical terms?

Let’s set up some simple queries to illustrate.

/* please run the code setup from the attached document: setupCode.adhocsql
Then, on the same connection, run this code */

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

So, we can see in the results 4 stops on this leg. 

Maybe we need to find out what action is supposed to happen at the next stop. This is how we would have had to write this code in the past, which would have involved 5 table scans and 15 logical reads:

SELECT a. tripNumber
   , a.stopNumber
   , a.stopAction
   , a.actArriveDate 
   , (SELECT TOP 1 stopAction  
      FROM #stops b  
      WHERE a.tripNumber = b.tripNumber 
         AND b.actArriveDate > a.actArriveDate 
      ORDER BY b.actArriveDate) AS nextAction
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

Here are our results:

And, as we mentioned before, the statistics IO and time:

Now, though, we can use LEAD to find out what’s going on in the NEXT row.

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
    , LEAD(a.stopAction,1,'N/A') 
            OVER (PARTITION BY a.tripNumber 
                  ORDER BY a.actArriveDate) AS nextAction FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

As we can see in the result set, the nextAction finds the stopAction from the next row, and we see our default N/A on the last row (because there’s no “next” row!)

Let’s break this down:

The red rectangle denotes the column that we’re choosing. The green rectangle is the offset – that is, how many rows ahead we’re looking for the value that we want. The blue rectangle denotes the default value if there is no “next” row.  The orange rectangle denotes a partition – that is, we’re going to start this over for every tripNumber. And, there must be an order by in the OVER clause for lag and lead because we need to know what will determine the “next” or “previous” row. This is denoted by the pink rectangle. Please note that the order by in the lag/lead does NOT have to be the same order by that the query uses.

This is our statistics IO and time for this query:

The Window Function version causes 1 scan of the table and 3 logical reads. Yay, less work!

LAG allows us to look back at previous rows. Say we want to see how much time elapses between stops….:

SELECT a.tripNumber
    , a.stopNumber
    , a.stopAction
    , a.actArriveDate
    , DATEDIFF(mi
         ,LAG(a.actArriveDate,1) 
                 OVER (PARTITION BY a.tripNumber 
                      ORDER BY a.actArriveDate)
         , a.actArriveDate) AS minElapsed
   , LEAD(a.stopAction,1,'N/A') 
                 OVER (PARTITION BY a.tripNumber 
                      ORDER BY a.actArriveDate) AS nextAction 
FROM #stops a 
WHERE a.tripNumber = 121279 
ORDER BY a.actArriveDate;

Here, we didn’t specify a default, and so we get NULL for the value of that field. And for this query, we also get 1 table scan and 3 logical reads!

What would this look like if we wanted to find data for a few different trip numbers?

SELECT a.tripNumber
 , a.stopNumber
 , a.stopAction
 , a.actArriveDate
 , DATEDIFF(mi
 ,LAG(a.actArriveDate,1) 
 OVER (PARTITION BY a.tripNumber 
 ORDER BY a.actArriveDate)
 , a.actArriveDate) AS minElapsed
   , LEAD(a.stopAction,1,'N/A') 
 OVER (PARTITION BY a.tripNumber 
 ORDER BY a.actArriveDate) AS nextAction 
FROM #stops a
WHERE a.tripNumber IN (121279,121230,121333) 
ORDER BY a.tripnumber, a.actArriveDate;

Our result set shows the lag/lead row calculations completed within each tripNumber:

Checking on the statistics IO, we still see 1 scan and 3 reads. Woo hoo!

These and other Window Functions can be used for some pretty advanced tSQL, but for now we’ll leave it with these simple usage cases.  Thanks for reading!