Hello learners!
This article is next in the series of Windows Functions Part 1 (LAG and LEAD). This article focuses on other Windows Functions; i.e., First_Value and Last_Value.
As the name suggests, these are used to fetch the first value and the last value. But there is one interesting thing to notice that you can use these functions to fetch the first and last values based on the window frame. Window frame is a subset of Subset Of Rows (Partition By). This means you can create a window even inside a "Partition By" clause used to create a window based on the column name provided to "Partition By" clause.
There are some ways you can define a window frame in a table. To begin with, please have an understanding of the terminologies used to create a window.
- Unbounded Preceding
This means that the starting point of the Window Frame is the TOP ROW
- Current Row
This means that the Starting or the endpoint of the Window Frame is the CURRENT ROW.
- Unbounded Following
This means the that the End Point of the Window is the END ROW.
These can be used to create a Window Frame. I will tell you the difference between Rows and Range as well. But first, understand the concept of Window Frame. Below mentioned are some of the Windows Frame.
- Rows/Range Between Unbounded Preceding and Current Row - This means that Window Frame starts at the TOP ROW and ends in the CURRENT ROW.
- Rows/Range Between Unbounded Preceding and Unbounded Following - This means that the Window Frame starts at the TOP ROW and ends at the BOTTOM ROW.
- Rows/Range Between Current Row and Unbounded Following - This means that the Window Frame starts at the CURRENT ROW and ends at the BOTTOM ROW. This frame is also used for the reverse calculations as well. I will show you that as well.
Now, let’s start and understand First_Value and Last_Value with the Window Frames. We will create a table and fill it with some random data.
Now, our test table is ready and we will fire a T-SQL to understand how the Windows Functions work. So, let’s run our T-SQL.
Now, we will understand how this Window Frame works. The default Window Frame is Unbounded Preceding and Current Row, if we do not mention any frame. So, let’s understand the T-SQL and how it is evaluating the values.
- F_VAL= FIRST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID])
This means to get the first value in the Sale_Amt column, which is ordered by SaleID. As mentioned if we do not specify any frame it will be Unbound in the Preceding and Current Row.
- F_VAL_PART= FIRST_VALUE([SALE_AMT]) OVER(PARTITION BY [STATE] ORDER BY [SALEID])
This is evaluating the First Value in Sale_Amt column which is Partitioned on the basis of State name and ordered by the SaleID. So, it is evaluating the first value within a partition as can be seen by the results.
- F_VAL_UP_UF= FIRST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
This is evaluating the First Value between the rows from beginning to end, based on ordering. If we check, the first value is 10000 if we order by SaleID Column.
- F_VAL_UP_CR= FIRST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This is evaluating the First value between the rows from Top and Current Row. The frame will start at the Top and Current Row and check the first value in it, which is 10000. Then comes to the second row and the frame starts at the Top until the second row, hence the first value will be 10000 and it goes on till the end.
- F_VAL_CR_UF= FIRST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
This is evaluating the First value between the Current Row and the Row at the End. When the query is at Delhi – 10000, it checks the frame, which is from the Current Row till the End Row, hence the first value will be the same value as it is picking up the first value between the Current Row and the Last Row.
Now, let’s understand the Last_Value. We will run the same query but with the Last_Value.
The result will be,
As told earlier, the default frame would be Unbound in the Preceding and Current Row, so the first two queries will output the result based on that frame only. The last value functions work a bit differently, as in the case of Partition By clause. If you see, if we run the second query it will not return the Last Value inside a partition but rather working on the default frame Unbounded Preceding and Current Row, which is correct. In order to get the Last Value inside a partition, we need to use Unbounded Preceding and Unbounded Following Window Frame.
- L_VAL= LAST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID])
and
- L_VAL_PART= LAST_VALUE([SALE_AMT]) OVER(PARTITION BY [STATE] ORDER BY [SALEID])
This will return the Last Value on the frames created from Unbounded Preceding and Current Row which is the default frame.
- L_VAL_PART_UP_UF= LAST_VALUE([SALE_AMT]) OVER(PARTITION BY [STATE] ORDER BY [SALEID] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
This is evaluating the Last Value inside a Partition.
- L_VAL_UP_UF= LAST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
This is evaluating the Last Value in the Frame starting at the Top Row and Ending at the Bottom Row, basically the last value in the Sale_Amt Column order by SaleID.
- L_VAL_UP_CR= LAST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This is evaluating the Last Value within a frame starting at the Top Row to the Current Row, hence giving the same values. As you see if the query starts at Delhi 10000, this is the last value within that frame, when it comes to the next one, the frame starts at Delhi-10000 till Delhi-20200, so the Last Value is 20200 and it goes on like this.
- L_VAL_CR_UF= LAST_VALUE([SALE_AMT]) OVER(ORDER BY [SALEID] ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
This is evaluating the Last value within a frame starting at the Current Row and End Row (Bottom Row). So the Last value will always be the value at the End Row.
Now, let’s understand the difference between the Range and Rows. Here we will change the values of the Sale_Amt in order to easily understand the calculations and logic as well.
Now, we will run the query to get an understanding between Rows and Range. Let’s fire it.
When we use Range, it limits the Rows within a partition or you can say it evaluates the rows within a partition and gives you the collective result.
The first two queries are run just to show you the calculations within a Frame.
- ROW_SUM_UP_CR = SUM(SALE_AMT) OVER(ORDER BY [STATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AND
RANGE_SUM_UP_CR = SUM(SALE_AMT) OVER(ORDER BY [STATE] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
This is evaluating the Sum within a frame of Top Row and Current Row. Now if you see closely the sum calculation in a Range is calculating the Sum for the rows within that partition of State, whereas Rows is calculating the Sum irrespective of the Partition of State.
- ROW_SUM_CR_UF = SUM(SALE_AMT) OVER(ORDER BY [STATE] ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
AND
RANGE_SUM_CR_UF = SUM(SALE_AMT) OVER(ORDER BY [STATE] RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
This is evaluating Sum within the frame of Current Row and the End Row. This frame is also used for the REVERSE CALCULATIONS as well.
So, this is all from my side regarding the Windows Functions. I hope you have an understanding of how a Windows Function works with and without Window Frames, what Window Frames are and how many types there are, and how they work.
Thank you for taking time to read this article. Any feedback will be welcomed. Happy Learning!