December 5, 2013 at 9:26 am
Ah, there's no higher form of flattery than plagiarism... So I was once told when a Fortune 500 company "used" my materials.
In the "some good can come from this" category, I didn't know this book existed, and I'm a HUGE fan of window functions. The idea that someone put out a book that blends T-SQL with window functions was enough to make me pop over to Amazon before responding to this thread.
As far as window functions go, we use them a TON in our financial institution BI/analytical work (we do analytics/strategy integration and data warehousing for credit unions).
Being able to put and control counters in queries has been the difference between "oh, that's going to be an ugly query" a couple of years ago and "piece of cake" now.
One of the great uses is controlling the sequence of records when no reliable sequencer exists in the source data. For example, one of the credit union systems we work with has a "sequence number" for phone number records. Seq = 1 is the primary phone number... EXCEPT that sequence numbers do not get reset when old phone numbers are deleted (they do the same thing with email addresses). So Seq = 1 will get you the primary phone/email if and only if the original primary hasn't been deleted, and we all know the hazards of trying to get a MIN() in there and the query performance that results from a correlated subquery. If you are controlling the fetch (or want to show all possible numbers), it doesn't matter that the sequence number doesn't start at 1. If you are trying to create a report that only contains the one primary contact number, it's a hassle.
Before:
select ind.individual_id, ind.last_name, ph.phone_str
from individual ind
inner join phone ph
on ind.individual_id = ph.individual_id
and ph.seq =
(select min(seq)
from phone ph2
where ph2.individual_id = ph.individual_id
)
where ind.individual_id = 123456
With a Window:
with phone_help as
(
select ind.individual_id, ind.last_name, ph.phone_str,
row_number() over (partition by ph.individual_id order by ph.seq) as accurate_sequence /* Make an accurate sequence nbr that is predictable */
from db2inst1.individual ind
inner join db2inst1.phone ph
on ind.individual_id = ph.individual_id
)
select * from phone_help
where individual_id = 123456
and accurate_sequence = 1
Using Row_number() over is a technique we use all the time when we need to systematically control the order of things, especially when that order does not always fall into the structure we need for analysis.
It is also useful for creating "tools" to help with dates. We have a view that uses Row_Number() over ... to put a "sequence" on a list of month-end business dates (there is a table full of retained business dates in the warehouse). The most recent date is 1, the most recent month end is 2, the one before that is 3... and so on. If we need to do a trending report that looks at data over the last 6 month ends, we filter on sequence <= 7 (today = 1, plus 6 month ends). Need 12 months? Sequence <= 13. No more pesky date math.
select eom.month_end_date, count(*) as open_loan_count, sum(ln.balance) as total_balance
from v_monthend_load_dates eom
inner join loan ln
on eom.month_end_date = ln.load_date
and ln.closed = 0
where eom.sequence <= 7
group by eom.month_end_date
(In our view, 1 = today, and may not be a month end... so using sequence <= 7 gives me the most recent data date plus the 6 month-end dates prior to that.)
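For reference, a minimal sketch of what such a date-sequencing view might look like (the table name business_dates and its columns here are assumptions for illustration, not our actual warehouse schema):

create view v_monthend_load_dates as
/* Hypothetical sketch: business_dates is an assumed name for the retained-dates table */
select business_date as month_end_date,
row_number() over (order by business_date desc) as sequence
from business_dates
where is_month_end = 1
or business_date = (select max(business_date) from business_dates);

The descending order by is what makes the most recent date sequence 1, so a simple sequence <= n filter replaces the date math.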
Tada.
Window functions are the greatest thing to happen to SQL since Microsoft put the "T" in it.
December 5, 2013 at 9:29 am
Ranks CustomerID and SalesOrderNumber by TotalDue (using the AdventureWorks2012 database):
select
h.CustomerID,
h.SalesOrderNumber,
h.TotalDue,
rank() over (order by h.TotalDue desc) as TotalRank
from Sales.SalesOrderHeader h
December 5, 2013 at 9:29 am
SELECT
Name
,RANK() OVER (PARTITION BY Age ORDER BY Date)
FROM
dbo.Person WITH (NOLOCK);
December 5, 2013 at 9:30 am
Keep up the good work!
SELECT
Id
,[1] As Note1
,[2] As Note2
,[3] As Note3
FROM
(
SELECT a.Id, a.Note
, RANK() OVER (PARTITION BY a.Id ORDER BY a.CreatedDate DESC, a.Id DESC) AS RankNum
FROM (SELECT Id, Note, CreatedDate FROM MyTable (NOLOCK)) a
) p
PIVOT
(
MAX(Note) FOR RankNum IN ([1], [2], [3])
) AS pvt
December 5, 2013 at 9:31 am
Itzik is a great instructor. Just ask him, his native language is T-SQL!
USE SQLSERVERCENTRAL.COM
;with posts as
(
SELECT RANK() OVER (ORDER BY PostDate) PostNum
,AuthorName
,PostDate
FROM Posts.PostHeader
WHERE ContentTag = 'Window Functions in T-SQL 2012'
)
SELECT AuthorName
,CASE PostNum WHEN 1 THEN 'Expert' ELSE 'Plagiarizer' END AS AuthorStatus
FROM posts
ORDER BY PostNum
;
December 5, 2013 at 9:33 am
Window functions are a great feature and very helpful in solving otherwise hard SQL problems.
Reading a good book on the subject is worth a professional's time and money.
As for this particular incident (and luckily it doesn't happen often), I think it is hard to prevent pre-publication. That makes how it is handled after the fact all the more important. In this case I will say that I am impressed with how SQL Server Central is handling it. Drawing attention to the true author's available work and handing out copies seems a just response to me. I hope the author sees it similarly and that, overall, it gets people interested in seeing more of his work now that they know where to look for it.
As for free stuff...I leave that to others. I am perfectly fine with spending fair money on my personal development!
December 5, 2013 at 9:34 am
Handy for baking in aggregates for use in Reporting Services datasets:
use AdventureWorks2008
go
select
lastname + ', ' + firstname RepName
, CountryRegionName
, StateProvinceName
, COUNT(*) OVER(PARTITION BY CountryRegionName) RepsPerCountry
, SalesYTD SalesYTD_Personal
, SUM(SalesYTD) OVER(PARTITION BY CountryRegionName) SalesYTD_Country
from Sales.vSalesPerson
where JobTitle = 'Sales Representative'
Hey, based on all these use case examples... well.. I trust/hope this doesn't put Itzik's book in the shade 😛
December 5, 2013 at 9:35 am
SELECT
[object_idx] = ROW_NUMBER() OVER (ORDER BY [type_desc]),
[group_idx] = DENSE_RANK() OVER (ORDER BY [type_desc]),
[group_name] = [type_desc],
[group_object_idx] = ROW_NUMBER() OVER (PARTITION BY [type_desc] ORDER BY [name]),
[object_name] = [name],
[objects_in_group] = COUNT(*) OVER (PARTITION BY [type_desc])
FROM sys.objects
ORDER BY 1
December 5, 2013 at 9:39 am
I'm still using a DB with limited window functions, but they make my life much easier. I created and maintain a LIMS system for my soil and water laboratory, and 80% of the functionality is in SQL Server.
I need window functions to create partitioned, on-the-fly indexes so I can build quality control data.
There is a Standard Deviation function, but no MAD or Median functions, so I have to make my own. The window function orders the data within sets by result value and, in case of a tie, by sample ID.
The median is the middle of the data set, so first I need to figure out the middle of the data set. Going with slightly simpler code, I expect two indexes for an even number of samples, and I repeat the single index for an odd number of samples (for example, with RSamples = 7, MAidx = MBidx = 4; with RSamples = 8, MAidx = 4 and MBidx = 5).
Breaking this down into temp tables with anonymous indexes may improve speed.
with RData as (
select t.analyteid,s.setgroupid,r.result,x.sampleid,x.testid,dense_rank() over (partition by t.analyteid,setgroupid order by r.result,x.sampleid) Midx
from qc.soil_runlistsXqcrun(@r1,0) x
inner join soil.results r on x.testid = r.testid
inner join soil.samples s on x.sampleid = s.sampleid
inner join soil.tests t on x.testid = t.testid
where r.qcrunid = @r1
)
,RStats as (
select analyteid,setgroupid,round(avg(result),3,1) Ravg,round(stdev(result),3,1) RSdev, count(sampleid) RSamples
from RData
group by analyteid,setgroupid
)
,RMadIDX as (
select analyteid,setgroupid,((RSamples - 1) / 2) + 1 MAidx,((RSamples - 1) / 2) + case when RSamples % 2 = 0 then 2 else 1 end MBidx
from RStats
)
,RMadMedian as (
select mi.analyteid,mi.setgroupid,round(avg(result),3,1) RMedian
from RData rd inner join RMadIDX mi on rd.setgroupid = mi.setgroupid and rd.analyteid = mi.analyteid
where rd.Midx between mi.MAidx and mi.MBidx
group by mi.analyteid,mi.setgroupid
)
,RMadStats as (
select rm.analyteid,rm.setgroupid,rm.RMedian,round(abs(avg(rs.Ravg - rm.RMedian)),3,1) RMad
from RStats rs inner join RMadMedian rm on rs.analyteid = rm.analyteid and rs.setgroupid = rm.setgroupid
group by rm.analyteid,rm.setgroupid,rm.RMedian
)
select * from RMadStats;
And by posting this, I just figured out I don't need my parameterized query because all the tables I've joined in with it actually provide me all the data. So I will be making adjustments.
Thank you SQL Server Central!
December 5, 2013 at 9:40 am
Windowing functions are very helpful not just within SQL server objects development but also for ASP.NET data driven applications.
I am using them all the time in my applications to fetch the data for my business rules.
Here is a simple example that paginates the data; the BoxNo field is alphanumeric, and the user would like to see the data sorted in numeric order rather than alphanumeric order. For brevity, I have removed other fields and join conditions...
;WITH ranked AS
(
SELECT b.BoxNo,
ROW_NUMBER() OVER (ORDER BY
(CASE WHEN PATINDEX('%[^0-9]%', b.BoxNo) > 5 THEN CONVERT(INT, LEFT(b.BoxNo, 4))
WHEN PATINDEX('%[^0-9]%', b.BoxNo) > 0 THEN CONVERT(INT, LEFT(b.BoxNo, PATINDEX('%[^0-9]%', b.BoxNo) - 1))
ELSE CONVERT(INT, b.BoxNo)
END)
) AS RowRank
FROM [dbo].[BoxRecord] AS b
)
SELECT BoxNo, RowRank
FROM ranked
WHERE RowRank > @startRowIndex AND RowRank <= (@startRowIndex + @maximumRows)
December 5, 2013 at 9:44 am
/*
Delete rows that have a duplicate pathhash value and are older than the newest ID (ID column is ascending INT)
*/
;with DuplicateCandidateRows
as
(
select Id, row_number() over (partition by pathhash order by Id desc) as seqno
from [dbo].[perm_table]
where pathhash is not null
)
delete from [dbo].[perm_table]
where exists (select 'x' from DuplicateCandidateRows dc where dc.Id = perm_table.Id and dc.seqno > 1);
December 5, 2013 at 9:44 am
Here's another example of fetching a random set of rows.
We have 2 uses for this:
1. Sometimes our Credit Union Customers want to do random drawings/giveaways during the month. "Open a new membership and win a George Foreman grill!" (Yeah, that's a real example... the code is cooler than the prize).
2. Every so often those pesky examiners show up and they want to do a random sample to review new loans.
Both are super easy using row_number(), and you're guaranteed to get a different result every time you run the query. There are a couple of ways to do this (you could also just order on newid() and leave the window out of it), but using the Row_Number() window function, you can see the "defined" order of things... and numbers make more sense to marketing folks and auditors than GUIDs.
We typically run these queries 3 times, even though the output is random on its own. Running them more than once makes the randomness visible, which makes the marketing folks and examiners more comfortable with the technique (and demonstrates that the returned recordset differs every single time).
For the drawing: We have 3 prizes to award for new accounts opened last month... And the winners are:
select
top 3 member_nbr, open_date,
row_number() over (order by newid()) as drawing_order
from membership
where open_date >= '2013-11-01' and open_date < '2013-12-01'
order by 3
Auditors are here...they want a 30% sample of loans booked in the last 6 months:
select
top 30 percent member_nbr, loan_nbr, open_date,
row_number() over (order by newid()) as random_sequence
from loan ln
where ln.open_date >= dateadd(m,-6, getdate())
order by 4
And... you could write the results of both queries to a table, add a left outer join against it, and then guarantee that someone can't win 2 months in a row, or doesn't show up on 2 audits within n months.
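A sketch of that exclusion join, assuming a hypothetical prior_winners table that the drawing results get written to (the table and its columns are made-up names for illustration):

select top 3 m.member_nbr, m.open_date,
row_number() over (order by newid()) as drawing_order
from membership m
left outer join prior_winners pw /* hypothetical table of past winners */
on pw.member_nbr = m.member_nbr
and pw.drawing_date >= dateadd(m, -1, getdate())
where m.open_date >= '2013-11-01' and m.open_date < '2013-12-01'
and pw.member_nbr is null /* exclude anyone who won within the last month */
order by drawing_order

The left join plus the IS NULL test filters out prior winners before the random row numbers are assigned, so the top 3 always come from the eligible pool.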
December 5, 2013 at 9:51 am
Here's a simple one I use occasionally to see what Reports are being run most often in Reporting Services:
SELECT C.[NAME]
,ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS [Most_Run]
,COUNT(*) AS [TIMES_RUN]
FROM [dbo].[ExecutionLog] AS E
INNER JOIN [CATALOG] AS C
ON (E.[REPORTID] = C.[ITEMID])
GROUP BY C.[NAME]
December 5, 2013 at 9:52 am
Here's a script that will return the ID closest to 75% and 95% through the table when searching by an ordered key between 1 and 4000. This can be used to partition large fact tables where you know data is being added at a constant rate over time.
The below uses the PERCENT_RANK window function.
/*
The below script generates some records in a table to emulate the surrogate key of a fact table. We are currently partitioning
fact tables logically in our SSAS database by date into Current Month, Last Month and History. The below allows you to
select arbitrary surrogate keys to partition your fact table.
*/
--Check for the temp table and drop it if it exists
IF OBJECT_ID('tempdb..#FactTable') IS NOT NULL
BEGIN
DROP TABLE #FactTable;
END
--Create the table we're going to use
CREATE TABLE #FactTable
(
ImportantID INT NOT NULL,
PRIMARY KEY CLUSTERED (ImportantID)
)
--Populate it with data using the Itzik-Style Cross-Join. More info here: http://www.sqlservercentral.com/articles/T-SQL/74118/
;WITH
E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 1*10^1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
INSERT #FactTable
SELECT TOP (4000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) [ImportantID]
FROM E8
--Declare the variables we'll use to set the partitions.
--We want to find the key value that is closest to 75% of the way through the table.
DECLARE @FirstPartition FLOAT = .75;
--We want to find the key value that is closest to 95% of the way through the table.
DECLARE @SecondPartition FLOAT = .95;
--Get the two "split" key values, which we can use to split the fact table into 3 sections.
SELECT 'FirstPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]
FROM
(
--The PERCENT_RANK() window function allows you to see where the current record sits in the ordered
--table. For an INTEGER IDENTITY(1,1) field in a table with 1000 records, the key 1 will be at 0%
--and the key 1000 will be at 100%
--The below will return the [ImportantID] value of 3000 as it is closest to 75% between 1 and 4000
SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]
FROM #FactTable
) [Partitions]
WHERE [ImportantID_PercentRank] < @FirstPartition
UNION ALL
--The below will return the [ImportantID] value of 3800 as it is closest to 95% between 1 and 4000
SELECT 'SecondPartitionKey' [PartitionKey] ,MAX([ImportantID]) [ImportantID]
FROM
(
SELECT [ImportantID], PERCENT_RANK() OVER (ORDER BY [ImportantID]) [ImportantID_PercentRank]
FROM #FactTable
) [Partitions]
WHERE [ImportantID_PercentRank] < @SecondPartition
December 5, 2013 at 9:53 am
I have the book on Kindle, but would love to get it in paperback. It's a great book, well worth the investment.