December 5, 2013 at 9:26 am
Ah, there's no higher form of flattery than plagiarism... So I was once told when a Fortune 500 company "used" my materials.
In the "some good can come from this" category, I didn't know this book existed, and I'm a HUGE fan of window functions. The idea that someone put out a book that blends T-SQL with window functions was enough to make me pop over to Amazon before responding to this thread.
As far as window functions go, we use them a TON in our financial institution BI/analytical work (we do analytics/strategy integration and data warehousing for credit unions).
Being able to put and control counters in queries has been the difference between "oh, that's going to be an ugly query" a couple of years ago and "piece of cake" now.
One of the great uses is controlling the sequence of records when no reliable sequencer exists in the source data. For example, one of the credit union systems we work with has a "sequence number" for phone number records. Seq = 1 is the primary phone number... EXCEPT that sequence numbers do not get reset when old phone numbers are deleted (they do the same thing with email addresses). So Seq = 1 will get you the primary phone/email if and only if the original primary hasn't been deleted, and we all know the hazards of trying to get a MIN() in there and the query performance that results from a correlated subquery. If you are controlling the fetch (or want to show all possible numbers), it doesn't matter that the sequence number doesn't start at 1. If you are trying to create a report that only contains the one primary contact number, it's a hassle.
Before:
select ind.individual_id, ind.last_name, ph.phone_str
from individual ind
inner join phone ph
on ind.individual_id = ph.individual_id
and ph.seq =
(select min(seq)
from phone ph2
where ph2.individual_id = ph.individual_id
)
where ind.individual_id = 123456
With a Window:
with phone_help as
(
select ind.individual_id, ind.last_name, ph.phone_str,
row_number() over (partition by ph.individual_id order by ph.seq) as accurate_sequence /* Make an accurate sequence nbr that is predictable */
from db2inst1.individual ind
inner join db2inst1.phone ph
on ind.individual_id = ph.individual_id
)
select * from phone_help
where individual_id = 123456
and accurate_sequence = 1
Using Row_number() over is a technique we use all the time when we need to systematically control the order of things, especially when that order does not always fall into the structure we need for analysis.
It is also useful for creating "tools" to help with dates. We have a view that uses Row_Number() over ... to put a "sequence" on a list of month-end business dates (there is a table full of retained business dates in the warehouse). The most recent date is 1, the most recent month end is 2, the one before that is 3... and so on. If we need to do a trending report that looks at data over the last 6 month ends, we filter on sequence <= 7 (today = 1, plus 6 month ends). Need 12 months? Sequence <= 13. No more pesky date math.
select eom.month_end_date, count(*) as open_loan_count, sum(ln.balance) as total_balance
from v_monthend_load_dates eom
inner join loan ln
on eom.month_end_date = ln.load_date
and ln.closed = 0
where eom.sequence <= 7
group by eom.month_end_date
(In our view, 1 = today, and may not be a month end... so using sequence <= 7 gives me the most recent data date plus the 6 month-end dates prior to that.)
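For reference, a minimal sketch of what such a date-sequencing view might look like (the table name business_dates and its columns here are assumptions for illustration, not our actual warehouse schema):

create view v_monthend_load_dates as
/* Hypothetical sketch: business_dates is an assumed name for the retained-dates table */
select business_date as month_end_date,
row_number() over (order by business_date desc) as sequence
from business_dates
where is_month_end = 1
or business_date = (select max(business_date) from business_dates);

The descending order by is what makes the most recent date sequence 1, so a simple sequence <= n filter replaces the date math.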
Tada.
Window functions are the greatest thing to happen to SQL since Microsoft put the "T" in it.
December 5, 2013 at 9:29 am
Ranks CustomerID and SalesOrderNumber by TotalDue (using the AdventureWorks2012 database):
select
h.CustomerID,
h.SalesOrderNumber,
h.TotalDue,
rank() over (order by h.TotalDue desc) as TotalRank
from Sales.SalesOrderHeader h
December 5, 2013 at 9:29 am
SELECT
Name
,RANK() OVER (PARTITION BY Age ORDER BY Date)
FROM
dbo.Person WITH (NOLOCK);
December 5, 2013 at 9:30 am
Keep up the good work!
SELECT
Id
,[1] As Note1
,[2] As Note2
,[3] As Note3
FROM
(
SELECT a.Id, a.Note
, RANK() OVER (PARTITION BY a.Id ORDER BY a.CreatedDate DESC, a.Id DESC) AS RankNum
FROM (SELECT Id, Note, CreatedDate FROM MyTable (NOLOCK)) a
) p
PIVOT
(
MAX(Note) FOR RankNum IN ([1], [2], [3])
) AS pvt
December 5, 2013 at 9:31 am
Itzik is a great instructor. Just ask him, his native language is T-SQL!
USE SQLSERVERCENTRAL.COM
;with posts as
(
SELECT RANK() OVER (ORDER BY PostDate) PostNum
,AuthorName
,PostDate
FROM Posts.PostHeader
WHERE ContentTag = 'Window Functions in T-SQL 2012'
)
SELECT AuthorName
,CASE PostNum WHEN 1 THEN 'Expert' ELSE 'Plagiarizer' END AS AuthorStatus
FROM posts
ORDER BY PostNum
;
December 5, 2013 at 9:33 am
Window functions are a great feature and very helpful in solving otherwise hard SQL problems.
Reading a good book on the subject is worth a professional's time and money.
As for this particular incident (and luckily it doesn't happen often), I think it is hard to prevent pre-publication. That makes how it is handled after the fact all the more important. In this case I will say that I am impressed with how SQL Server Central is handling it. Drawing attention to the true author's available work and handing out copies seems a just response to me. I hope the author sees it similarly and that, overall, it gets people interested in seeing more of his work now that they know where to look for it.
As for free stuff...I leave that to others. I am perfectly fine with spending fair money on my personal development!
December 5, 2013 at 9:34 am
Handy for baking in aggregates for use in Reporting Services datasets:
use AdventureWorks2008
go
select
lastname + ', ' + firstname RepName
, CountryRegionName
, StateProvinceName
, COUNT(*) OVER(PARTITION BY CountryRegionName) RepsPerCountry
, SalesYTD SalesYTD_Personal
, SUM(SalesYTD) OVER(PARTITION BY CountryRegionName) SalesYTD_Country
from Sales.vSalesPerson
where JobTitle = 'Sales Representative'
Hey, based on all these use case examples... well.. I trust/hope this doesn't put Itzik's book in the shade 😛
December 5, 2013 at 9:35 am
SELECT
[object_idx] = ROW_NUMBER() OVER (ORDER BY [type_desc]),
[group_idx] = DENSE_RANK() OVER (ORDER BY [type_desc]),
[group_name] = [type_desc],
[group_object_idx] = ROW_NUMBER() OVER (PARTITION BY [type_desc] ORDER BY [name]),
[object_name] = [name],
[objects_in_group] = COUNT(*) OVER (PARTITION BY [type_desc])
FROM sys.objects
ORDER BY 1
December 5, 2013 at 9:39 am
I'm still using a DB with limited window functions, but they make my life much easier. I created and maintain a LIMS system for my soil and water laboratory, and 80% of the functionality is in SQL Server.
I need window functions to create partitioned, on-the-fly indexes so I can build quality control data.
There is a Standard Deviation function, but no MAD or Median functions, so I have to make my own. The window function orders the data within sets by result value and, in case of a tie, by sample ID.
The median is the middle of the data set, so first I need to figure out the middle of the data set. Going with slightly simpler code, I expect two indexes for an even number of samples, and I repeat the single index for an odd number of samples (for example, with RSamples = 7, MAidx = MBidx = 4; with RSamples = 8, MAidx = 4 and MBidx = 5).
Breaking this down into temp tables with anonymous indexes may improve speed.
with RData as (
select t.analyteid,s.setgroupid,r.result,x.sampleid,x.testid,dense_rank() over (partition by t.analyteid,setgroupid order by r.result,x.sampleid) Midx
from qc.soil_runlistsXqcrun(@r1,0) x
inner join soil.results r on x.testid = r.testid
inner join soil.samples s on x.sampleid = s.sampleid
inner join soil.tests t on x.testid = t.testid
where r.qcrunid = @r1
)
,RStats as (
select analyteid,setgroupid,round(avg(result),3,1) Ravg,round(stdev(result),3,1) RSdev, count(sampleid) RSamples
from RData
group by analyteid,setgroupid
)
,RMadIDX as (
select analyteid,setgroupid,((RSamples - 1) / 2) + 1 MAidx,((RSamples - 1) / 2) + case when RSamples % 2 = 0 then 2 else 1 end MBidx
from RStats
)
,RMadMedian as (
select mi.analyteid,mi.setgroupid,round(avg(result),3,1) RMedian
from RData rd inner join RMadIDX mi on rd.setgroupid = mi.setgroupid and rd.analyteid = mi.analyteid
where rd.Midx between mi.MAidx and mi.MBidx
group by mi.analyteid,mi.setgroupid
)
,RMadStats as (
select rm.analyteid,rm.setgroupid,rm.RMedian,round(abs(avg(rs.Ravg - rm.RMedian)),3,1) RMad
from RStats rs inner join RMadMedian rm on rs.analyteid = rm.analyteid and rs.setgroupid = rm.setgroupid
group by rm.analyteid,rm.setgroupid,rm.RMedian
)
select * from RMadStats;
And by posting this, I just figured out I don't need my parameterized query because all the tables I've joined in with it actually provide me all the data. So I will be making adjustments.
Thank you SQL Server Central!
December 5, 2013 at 9:40 am
Windowing functions are very helpful not just within SQL server objects development but also for ASP.NET data driven applications.
I am using them all the time in my applications to fetch the data for my business rules.
Here is a simple example that paginates the data; the BoxNo field is alphanumeric, and the user would like to see the data sorted in numeric order rather than alphanumeric order. For brevity, I have removed other fields and join conditions...
;WITH ranked AS
(
SELECT b.BoxNo,
ROW_NUMBER() OVER (ORDER BY
(CASE WHEN PATINDEX('%[^0-9]%', b.BoxNo) > 5 THEN CONVERT(INT, LEFT(b.BoxNo, 4))
WHEN PATINDEX('%[^0-9]%', b.BoxNo) > 0 THEN CONVERT(INT, LEFT(b.BoxNo, PATINDEX('%[^0-9]%', b.BoxNo) - 1))
ELSE CONVERT(INT, b.BoxNo)
END)
) AS RowRank
FROM [dbo].[BoxRecord] AS b
)
SELECT BoxNo, RowRank
FROM ranked
WHERE RowRank > @startRowIndex AND RowRank <= (@startRowIndex + @maximumRows)
December 5, 2013 at 9:44 am
/*
Delete rows that have a duplicate pathhash value and are older than the newest ID (ID column is ascending INT)
*/
;with DuplicateCandidateRows
as
(
select Id, row_number() over (partition by pathhash order by Id desc) as seqno
from [dbo].[perm_table]
where pathhash is not null
)
delete from [dbo].[perm_table]
where exists (select 'x' from DuplicateCandidateRows dc where dc.Id = perm_table.Id and dc.seqno > 1);
December 5, 2013 at 9:44 am
Here's another example of fetching a random set of rows.
We have 2 uses for this:
1. Sometimes our Credit Union Customers want to do random drawings/giveaways during the month. "Open a new membership and win a George Foreman grill!" (Yeah, that's a real example... the code is cooler than the prize).
2. Every so often those pesky examiners show up and they want to do a random sample to review new loans.
Both are super easy using row_number(), and you're guaranteed to get a different result every time you run the query. There are a couple of ways to do this (you could also just order on newid() and leave the window out of it), but using the Row_Number() window function, you can see the "defined" order of things... and numbers make more sense to marketing folks and auditors than GUIDs.
We typically run these queries 3 times, even though the output is random on its own. Running them more than once makes the randomness visible, which makes the marketing folks and examiners more comfortable with the technique (and demonstrates that the returned recordset differs every single time).
For the drawing: We have 3 prizes to award for new accounts opened last month... And the winners are:
select
top 3 member_nbr, open_date,
row_number() over (order by newid()) as drawing_order
from membership
where open_date >= '2013-11-01' and open_date < '2013-12-01'
order by 3
Auditors are here...they want a 30% sample of loans booked in the last 6 months:
select
top 30 percent member_nbr, loan_nbr, open_date,
row_number() over (order by newid()) as random_sequence
from loan ln
where ln.open_date >= dateadd(m,-6, getdate())
order by 4
And... you could write the results of both queries to a table, add a left outer join against it, and then guarantee that someone can't win 2 months in a row, or doesn't show up on 2 audits within n months.
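A sketch of that exclusion join, assuming a hypothetical prior_winners table that the drawing results get written to (the table and its columns are made-up names for illustration):

select top 3 m.member_nbr, m.open_date,
row_number() over (order by newid()) as drawing_order
from membership m
left outer join prior_winners pw /* hypothetical table of past winners */
on pw.member_nbr = m.member_nbr
and pw.drawing_date >= dateadd(m, -1, getdate())
where m.open_date >= '2013-11-01' and m.open_date < '2013-12-01'
and pw.member_nbr is null /* exclude anyone who won within the last month */
order by drawing_order

The left join plus the IS NULL test filters out prior winners before the random row numbers are assigned, so the top 3 always come from the eligible pool.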
December 5, 2013 at 9:51 am
Here's a simple one I use occasionally to see what Reports are being run most often in Reporting Services:
SELECT C.[NAME]
,ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS [Most_Run]
,COUNT(*) AS [TIMES_RUN]
FROM [dbo].[ExecutionLog] AS E
INNER JOIN [CATALOG] AS C
ON (E.[REPORTID] = C.[ITEMID])
GROUP BY C.[NAME]
December 5, 2013 at 9:52 am
Here's a script that will return the ID closest to 75% and 95% through the table when searching by an ordered key between 1 and 4000. This can be used to partition large fact tables where you know data is being added at a constant rate over time.
The below uses the PERCENT_RANK window function.
/*
The below script generates some records in a table to emulate the surrogate key of a fact table. We are currently partitioning
fact tables logically in our SSAS database by date into Current Month, Last Month and History. The below allows you to
select arbitrary surrogate keys to partition your fact table.
*/
--Check for the temp table and drop it if it exists
IF OBJECT_ID('tempdb..#FactTable') IS NOT NULL
BEGIN
DROP TABLE #FactTable;
END
--Create the table we're going to use
CREATE TABLE #FactTable
(
ImportantID INT NOT NULL,
PRIMARY KEY CLUSTERED (ImportantID)
)
--Populate it with data using the Itzik-Style Cross-Join. More info here: http://www.sqlservercentral.com/articles/T-SQL/74118/
;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
INSERT #FactTable
SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [ImportantID]
FROM E8
--Declare the variables we'll use to set the partitions.
--We want to find the key value that is closest to 75% of the way through the table.
DECLARE @FirstPartition FLOAT = .75;
--We want to find the key value that is closest to 95% of the way through the table.
DECLARE @SecondPartition FLOAT = .95;
--Get the two "split" key values, which we can use to split the fact table into 3 sections.
SELECT 'FirstPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]
FROM
(
--The PERCENT_RANK() window function allows you to see where the current record sits in the ordered
--table. For an INTEGER IDENTITY(1,1) field in a table with 1000 records, the key 1 will be at 0%
--and the key 1000 will be at 100%
--The below will return the [ImportantID] value of 3000 as it is closest to 75% between 1 and 4000
SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]
FROM #FactTable
) [Partitions]
WHERE [ImportantID_PercentRank] < @FirstPartition
UNION ALL
--The below will return the [ImportantID] value of 3800 as it is closest to 95% between 1 and 4000
SELECT 'SecondPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]
FROM
(
SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]
FROM #FactTable
) [Partitions]
WHERE [ImportantID_PercentRank] < @SecondPartition
December 5, 2013 at 9:53 am
I have the book on Kindle, but would love to get it in paperback. It's a great book, well worth the investment.