SQLServerCentral apologizes and you can win a book

  • Ah, there's no higher form of flattery than plagiarism... So I was once told when a Fortune 500 company "used" my materials.

    In the "some good can come from this" category, I didn't know this book existed, and I'm a HUGE fan of window functions...the idea that someone put out a book that blends T-SQL with Window Functions was enough to make me pop over to Amazon before responding to this thread.

    As far as window functions we use them a TON in our financial institution BI/Analytical work (we do analytics/strategy integration and Data Warehousing for Credit Unions)

    Being able to put and control counters in queries has been the difference between "oh, that's going to be an ugly query" a couple years ago to "piece of cake" now.

    One of the great uses is controlling the sequence of records when no reliable sequencer exists in the source data. For example, one of the credit union systems we work with has a "sequence number" for phone number records. Seq=1 is the primary phone number... UNLESS sequence numbers do not get reset when old phone numbers are deleted (they do the same thing with email addresses). Seq = 1 will get you the primary phone/email if and only if the original primary hasn't been deleted... and we all know the hazards of trying to get a MIN() in there and the query performance that results from a subquery...If you are controlling the Fetch (or want to show all possible numbers), it doesn't matter if the sequence number doesn't start at 1. If you are trying to create a report that only contains the one primary contact number...it's a hassle.

    Before:

    select ind.individual_id, ind.last_name, ph.phone_str

    from individual ind

    inner join phone ph

    on ind.individual_id = ph.individual_id

    and ph.seq =

    (select min(seq)

    from phone ph2

    where ph2.individual_id = ph.individual_id

    )

    where ind.individual_id = 123456

    With a Window:

    with phone_help as

    (

    select ind.individual_id, ind.last_name, ph.phone_str,

    row_number() over (partition by ph.individual_id order by ph.seq) as accurate_sequence /*Make an accurate sequence nbr that is predicatable*/

    from db2inst1.individual ind

    inner join db2inst1.phone ph

    on ind.individual_id = ph.individual_id

    )

    select * from phone_help

    where individual_id = 123456

    and accurate_sequence = 1

    Using Row_number() over is a technique we use all the time when we need to systematically control the order of things especially when that order does not always fall into the structure we need for analysis.

    It is also useful to create "tools" to help with dates as well. We have a view that uses the Row_Number() over ... to put a "sequence" on list of month end business dates (there is a table full of retained business dates in the warehouse). The most recent date is 1, the most recent month end is 2, before that is 3... and so on. If we need to do a trending report that looks at data over the last 6 month ends, filter on sequence <= 7 (today=1 + 6 month ends). Need 12 months? Sequence <=13. No more pesky date math.

    select eom.month_end_date, count(*) as open_loan_count, sum(balance) as total_blance

    from v_monthend_load_dates eom

    inner join loan ln

    on eom.month_end_date = ln.load_date

    and ln.closed = false

    where eom sequence between <= 7

    (in our view, 1= today, and may not be a month end... so using sequence <=7 would give me the most recent data date, and the 6 month ends dates prior to that.)

    Tada.

    Window functions are the greatest thing to happen to SQL since Microsoft put the "T" in it.

  • Ranks CustomerID and Sales Order number by Total due (Using the AdventureWorks2012 database):

    select

    h.CustomerID,

    h.SalesOrderNumber,

    h.TotalDue,

    rank() over (order by h.TotalDue desc) as TotalRank

    from Sales.SalesOrderHeader h

  • SELECT

    Name

    ,RANK() OVER (Partitin By Age ORDER BY Date)

    FROM

    dbo.Person WITH (NOLOCK);

  • Keep up the good work!

    SELECT

    Id

    ,[1] As Note1

    ,[2] As Note2

    ,[3] As Note3

    FROM

    (

    SELECT a.Id, a.Note

    , RANK() OVER (PARTITION BY a.Id ORDER BY a.CreatedDate DESC, a.Id DESC) AS RankNum

    FROM (SELECT Id, Note, CreatedDate FROM MyTable (NOLOCK)) a

    ) p

    PIVOT

    (

    MAX(Note) FOR RankNum IN ([1], [2], [3])

    ) AS pvt

  • Itzik is a great instructor. Just ask him, his native language is T-SQL!

    USE SQLSERVERCENTRAL.COM

    ;with posts as

    (

    SELECT RANK() OVER (ORDER BY PostDate) PostNum

    ,AuthorName

    ,PostDate

    FROM Posts.PostHeader

    WHERE ContentTag = 'Window Functions in T-SQL 2012'

    )

    SELECT AuthorName

    ,CASE PostNum WHEN 1 THEN 'Expert' ELSE 'Plagiarizer' END AS AuthorStatus

    FROM posts

    ORDER BY PostNum

    ;

  • Window functions are a great feature and very helpful in solving otherwise hard SQL problems.

    Reading a good book on the subject is worth a processionals time and money.

    As for this particular incident and luckily it doesn't happen often, I think it is hard to prevent pre-publication. That makes how it is handled after the fact all that more important I think. In this case I will say that I am impressed how SQL Server Central its handling it. Attention to the true authors available work and handing out copies seems a just response to me. I hope the author sees it similar and that overall it gets people interested to see more of his work now they know where to look for it.

    As for free stuff...I leave that to others. I am perfectly fine with spending fair money on my personal development!

  • Handy for baking in aggregates for use in Reporting Services datasets:

    use AdventureWorks2008

    go

    select

    lastname + ', ' + firstname RepName

    , CountryRegionName

    , StateProvinceName

    , COUNT(*) OVER(PARTITION BY CountryRegionName) RepsPerCountry

    , SalesYTD SalesYTD_Personal

    , SUM(SalesYTD) OVER(PARTITION BY CountryRegionName) SalesYTD_Country

    from Sales.vSalesPerson

    where JobTitle = 'Sales Representative'

    Hey, based on all these use case examples... well.. I trust/hope this doesn't put Itzik's book in the shade 😛

  • SELECT

    [object_idx] = ROW_NUMBER() OVER (ORDER BY [type_desc]),

    [group_idx] = DENSE_RANK() OVER (ORDER BY [type_desc]),

    [group_name] = [type_desc],

    [group_object_idx] = ROW_NUMBER() OVER (PARTITION BY [type_desc] ORDER BY [name]),

    [object_name] = [name],

    [objects_in_group] = COUNT(*) OVER (PARTITION BY [type_desc])

    FROM sys.objects

    ORDER BY 1

  • I'm still using a DB with limited window functions, but they make my life much easier. I created and maintain a lims system for my soil and water laboratory and 80% of the functionality is in the SQL server.

    I need window functions to create partitioned on the fly indexes so I can build quality control data.

    There is Standard Deviation, but no MAD or Meadian functions, so I have to make my own. The window function will order in sets data by value of the result, and in case of a tie, by sample id.

    Median is the middle of the data set. First, I need to figure out the middle of the data set. Going with slightly more simple code, I expect 2 indexes for even number of samples, and I repeat the single index with odd number of samples.

    Breaking this down into temp tables with anonymous indexes may improve speed.

    with RData as (

    select t.analyteid,s.setgroupid,r.result,x.sampleid,x.testid,dense_rank() over (partition by t.analyteid,setgroupid order by r.result,x.sampleid) Midx

    from qc.soil_runlistsXqcrun(@r1,0) x

    inner join soil.results r on x.testid = r.testid

    inner join soil.samples s on x.sampleid = s.sampleid

    inner join soil.tests t on x.testid = t.testid

    where r.qcrunid = @r1

    )

    ,RStats as (

    select analyteid,setgroupid,round(avg(result),3,1) Ravg,round(stdev(result),3,1) RSdev, count(sampleid) RSamples

    from RData

    group by analyteid,setgroupid

    )

    ,

    RMadIDX as (

    select analyteid,setgroupid,((RSamples -1) /2 ) + 1 MAidx,((RSamples -1) /2 ) + case when RSamples % 2 = 0 then 2 else 1 end MBidx

    from RStats

    )

    ,

    RMadMedian as (

    select mi.analyteid,mi.setgroupid,round(avg(result),3,1) RMedian

    from RData rd inner join RMadIDX mi on rd.setgroupid = mi.setgroupid and rd.analyteid = mi.analyteid

    where rd.Midx between mi.MAidx and mi.MBidx

    group by mi.analyteid,mi.setgroupid

    )

    ,

    RMadStats as (

    select rm.analyteid,rm.setgroupid,rm.RMedian,round(abs(avg(rs.Ravg - rm.RMedian)) ,3,1) RMad

    from RStats rs inner join RMadMedian rm on rs.analyteid = rm.analyteid and rs.setgroupid = rm.setgroupid

    group by rm.analyteid,rm.setgroupid,rm.RMedian

    )

    And by posting this, I just figured out I don't need my parameterized query because all the tables I've joined in with it actually provide me all the data. So I will be making adjustments.

    Thank you SQL Server Central!

  • Windowing functions are very helpful not just within SQL server objects development but also for ASP.NET data driven applications.

    I am using them all the time in my applications to fetch the data for my business rules.

    I am writing a simple example to paginate the data and BoxNo field is AlphaNumeric and user would like to see the sorted data in chronological order and not by AlphaNumeric order. For bravity, I have remove other fields and join conditions....

    SELECT ROW_NUMBER() OVER (ORDER BY

    (case when patindex('%[^0-9]%',BoxNo) > 5 THEN CONVERT(INT,LEFT(b.BoxNo,4))

    WHEN patindex('%[^0-9]%',BoxNo) >0 THEN CONVERT(INT,LEFT(b.BoxNo,patindex('%[^0-9]%',BoxNo)-1))

    ELSE CONVERT(INT,b.BoxNo)

    END)

    ) as RowRank

    FROM [dbo].[BoxRecord] AS b WHERE RowRank > @startRowIndex AND RowRank <= (@startRowIndex + @maximumRows)

  • /*

    Delete rows that have a duplicate pathhash value and are older than the newest ID (ID column is ascending INT)

    */

    ;with DuplicateCandidateRows

    as

    (

    select Id, row_number() over (partition by pathhash order by Id desc) as seqno

    from [dbo].[perm_table]

    where pathhash is not null

    )

    delete from [dbo].[perm_table]

    where exists (select 'x' from DuplicateCandidateRows dc where dc.Id = perm_table.Id and dc.seqno > 1);

  • Here's another example of fetching a random set of rows.

    We have 2 uses for this:

    1. Sometimes our Credit Union Customers want to do random drawings/giveaways during the month. "Open a new membership and win a George Foreman grill!" (Yeah, that's a real example... the code is cooler than the prize).

    2. Every so often those pesky examiners show up and they want to do a random sample to review new loans.

    Both super easy using row number(), and you're guaranteed to get a different result every time you run the query. There are a couple of ways to do this (you could also just order on newid() and leave the window out of it, but using the Row_Number() window function, you can see the "defined" order of things...and numbers make more sense to marketing folks and auditors than GUIDs.

    We typically run these queries 3 times, even though the output is random on its own. Running them more than once drives an exaggerated "higher randomness" that makes the marketing folks and examiners more comfortable with the technique (and proves the returned recordset truly is random every single time).

    For the drawing: We have 3 prizes to award for new accounts opened last month... And the winners are:

    select

    top 3 member_nbr, open_date,

    row_number() over (order by newid()) as drawing_order

    from membership

    where open_date between '2013-11-01 00:00:00' and '2013-11-30 00:00:00'

    order by 3

    Auditors are here...they want a 30% sample of loans booked in the last 6 months:

    select

    top 30 percent member_nbr, loan_nbr, open_date,

    row_number() over (order by newid()) as random_sequence

    from loan ln

    where ln.open_date >= dateadd(m,-6, getdate())

    order by 4

    And... you could write the results of both queries to a table, add in a fancy left outer join to the query and then you could guarantee that someone couldn't win 2 months in a row, or never showed up on 2 audits within n months.

  • Here's a simple one I use occasionally to see what Reports are being run most often in Reporting Services:

    SELECT C.[NAME]

    ,ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS [Most_Run]

    ,COUNT(*) AS [TIMES_RUN]

    FROM [dbo].[ExecutionLog] AS E

    INNER JOIN [CATALOG] AS C

    ON (E.[REPORTID] = C.[ITEMID])

    GROUP BY C.[NAME]

  • Here's a script that will return the ID closest to 75% and 95% through the table when searching by an ordered key between 1 and 4000. This can be used to partition large fact tables where you know data is being added at a constant rate over time.

    The below uses the PERCENT_RANK window function.

    /*

    The below script generates some records in a table to emulate the surrogate key of a fact table. We are currently partitioning

    fact tables logically in our SSAS database by date into Current Month, Last Month and History. The below allows you to

    select arbitary surrogate keys to partition your fact table.

    */

    --Check for the temo table and drop it if it exists

    IF OBJECT_ID('tempdb..#FactTable') IS NOT NULL

    BEGIN

    DROP TABLE #FactTable;

    END

    --Create the table we're going to use

    CREATE TABLE #FactTable

    (

    ImportantID INT NOT NULL,

    PRIMARY KEY CLUSTERED (ImportantID)

    )

    --Populate it with data using the Itzik-Style Cross-Join. More info here: http://www.sqlservercentral.com/articles/T-SQL/74118/

    ;WITH

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), -- 1*10^1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows

    INSERT #FactTable

    SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [ImportantID]

    FROM E8

    --Declare the variables we'll use to set the partitions.

    --We want to find the key value that is closest to 75% of the way through the table.

    DECLARE @FirstPartition FLOAT = .75;

    --We want to find the key value that is closest to 95% of the way through the table.

    DECLARE @SecondPartition FLOAT = .95;

    --Get the two "split" key values, which we can use to split the fact table into 3 sections.

    SELECT 'FirstPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]

    FROM

    (

    --The PERCENT_RANK() window funcation allows you to see where the current record sits in the ordered

    --table. For an INTEGER IDENTITY(1,1) field in a table with 1000 records, the key 1 will be at 0%

    --and the key 1000 will be at 100%

    --The below will return the [ImportantID] value of 3000 as it is closest to 75% between 1 and 4000

    SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]

    FROM #FactTable

    ) [Partitions]

    WHERE [ImportantID_PercentRank] < @FirstPartition

    UNION ALL

    --The below will return the [ImportantID] value of 3800 as it is closest to 95% between 1 and 4000

    SELECT 'SecondPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]

    FROM

    (

    SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]

    FROM #FactTable

    ) [Partitions]

    WHERE [ImportantID_PercentRank] < @SecondPartition

    SQL SERVER Central Forum Etiquette[/url]

  • I have the book on Kindle, but would love to get it in paperback. It's a great book, well worth the investment.

Viewing 15 posts - 136 through 150 (of 287 total)

You must be logged in to reply to this topic. Login to reply