April 28, 2008 at 7:02 am
Jeff Moden (4/28/2008): It really would help if the OP would post exactly what output is expected from the input in the original post, though.
I agree with all your points, I was only trying to explain why people were still answering a question that appeared to have been answered.
The question asked was definitely answered.
This kind of goes with Steve's editorial today. I can't even guess how many times I've been asked how to do something, only to find out after the fact that my "solution" caused more problems: the person who asked me didn't understand their own problem, but succeeded in asking a question that would get the answer "they" wanted, because earlier attempts at asking the right question got them an answer they didn't want. (whew... talk about a run-on 😉 ) Easy software just makes making mistakes easier.
Obviously as an old Oracle DBA, I am used to things not being easy 🙂
April 29, 2008 at 7:49 am
Ok - in short, here's the problem.
I have the performance of various financial factors (P/E, P/B) - by country and by sector - in a SQL table. The performance is on a daily basis. I need to create a mean, stdev, median, 95th percentile, and 5th percentile for each. I have managed the mean and stdev in the pivot table (using built-in functions). I was hoping for an elegant solution where I pass a column to a function which gives me, for example, the median. The structure is simple:
I pass a column (containing n values) to a function. The function orders these, then chooses the (n/2)th value if n is even, or the ((n+1)/2)th value if n is odd, and returns this one value. Easy enough! The trouble is in passing the array.
Given the structure of the pivot table query, I can only ask it (e.g. if DATA is the name of the column) to SUM(DATA) or MEDIAN(DATA). It fails even on MEDIAN(DATA/5) (I know I can do DATA/5 while pulling out - this is just an example).
The reason I am using the pivot table is that the data is in list format, i.e. stock 1, date 1, data 1; stock 1, date 2, data 2... and so on - with a million-plus rows. The pivot table takes a lot of the hassle out of the presentation...
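For what it's worth, the aggregate being described here can be sketched in a few lines of Python before worrying about how to express it in SQL. This is purely illustrative - the function names and the nearest-rank percentile definition are my own choices, not anything from the posts:

```python
import math

def median(values):
    # Order the n values, then take the ((n+1)/2)th one when n is odd,
    # or the average of the (n/2)th and (n/2+1)th ones when n is even.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def percentile(values, p):
    # Nearest-rank percentile: the smallest value whose rank covers p% of
    # the set. This is one of several common definitions, so check which
    # one the consumers of the report actually expect.
    ordered = sorted(values)
    k = min(max(math.ceil(p / 100 * len(ordered)), 1), len(ordered))
    return ordered[k - 1]
```

The hard part in SQL is exactly what the poster ran into: there is no way to hand a whole column to a scalar function like this, which is why the answers below all push the ordering and rank-picking into the query itself.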
April 29, 2008 at 9:04 am
Can't do a whole lot for you if you don't post the "simple structure" of the data you're talking about... please click on, read, and provide the information from the URL in my signature.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2008 at 9:42 am
Median is mathematically not an aggregate function per se - it's more of a windowing function (meaning it pulls out, unchanged, a specific value in a sequence). You REALLY don't want to compute it as a Transact-SQL scalar function; this should be done as a derived table and then incorporated in.
Median looks something like this in SQL Server 2005:
select groupCol1,groupCol2, orderamount from
(select groupcol1,
groupCol2,
orderamount,
row_number() over (partition by groupcol1,groupcol2 order by orderamount) RN,
Count(*) over (partition by groupcol1,groupcol2) groupcount
from Mytable) r
where rn=ceiling((groupcount+1)*0.5)
You can pivot that if need be - but performance is going to start suffering pretty badly. Perhaps dump this to a temp table, index that, and pivot on the temp table.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 29, 2008 at 8:27 pm
Ok... just sharing... a million-row median calculation can be fast... very fast... here's a chunk of code I posted about 3 years ago... it's demo code, so you'll need to change things to suit your own tables...
--===== If the temp table already exists, drop it
IF OBJECT_ID('TempDB..#Steps') IS NOT NULL
DROP TABLE #Steps
--===== Simulate loading your real temp table with ...
-- A MILLION rows, 17576 reps (3 character "names"), a DeltaT from 0 to 1999,
-- and 4 "Steps" (Step 1, Step 2, etc)
SELECT TOP 1000000 IDENTITY(INT,1,1) AS RowNum,
CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS Rep,
CAST(RAND(CAST(NEWID() AS VARBINARY))*2000 AS INT) AS DeltaT,
'Step '
+ CAST(CAST(RAND(CAST(NEWID() AS VARBINARY))*4 AS INT)+1 AS VARCHAR(25)) AS MileCode
INTO #Steps
FROM Master.dbo.SYSCOLUMNS sc1,
Master.dbo.SYSCOLUMNS sc2
--===== Every table, even temporary ones, should have a Primary Key...
-- Makes all the difference in the world, performance wise, on this script.
ALTER TABLE #Steps
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== And, trust me, you want this index. It's a "covering" index for the
-- ORDER BYs, the JOINs, and the data that will appear in the code that follows.
CREATE INDEX IX_Steps ON #Steps
(Rep,MileCode,DeltaT)
--===== Return the Median for ALL reps by name and by step.
-- It returns all the "analysis" rows in just 32 seconds.
-- 1 MILLION rows, 17576 Reps, and 4 steps are simulated in the table
-- (note that because this is all random data, some reps may be missing a step, but most have 4)
-- for a total of about 70,200 Medians in just 32 seconds
-- (or 1 Median analysis in only 0.0004558 seconds)
SELECT DISTINCT
m.Rep,
m.MileCode,
(
(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT ASC
) lo
ORDER BY DeltaT DESC)
+(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT DESC
) hi
ORDER BY DeltaT ASC)
) / 2 AS MEDIAN
FROM #Steps m
ORDER BY m.Rep,m.MileCode
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2008 at 9:15 pm
..by the way - it's probably worth pointing out that there are multiple definitions of Median.
The solution Jeff is showing is what is called the statistical median (the average of the two middle elements of an ordered set if the set's size is even, and the middle item if it is odd), and mine was the "pure math" (from ordered sets/sequential math) definition (the middle element if the set's size is odd and the (n/2+1)th item if it's even).
Depending on who you talk to - they might expect one definition or the other.
And in terms of efficiency - if you do it as a derived table - it should be quite fast either way. The function would NOT be so lucky however.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 29, 2008 at 10:31 pm
(Jeff - psst)....
(1000000 row(s) affected)
=====================
Jeff's Statistical Median Calc
=====================
(70266 row(s) affected)
SQL Server Execution Times:
CPU time = 70031 ms, elapsed time = 42039 ms.
========================
Matt's Mathematical Median Calc
========================
(70266 row(s) affected)
SQL Server Execution Times:
CPU time = 16938 ms, elapsed time = 18221 ms.
So in case someone doesn't specify which one they want - you might get away with the "faster" one....:)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 29, 2008 at 10:49 pm
One more - this is the "windowed function" version of the statistical median:
select r.Rep,r.MileCode, (r.deltat +r1.DeltaT)*.5 dt from
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r inner join
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r1 on r.rep=r1.rep and r.milecode=r1.milecode
where r.rn=ceiling((r.groupcount+1)*0.5)
and r1.rn=floor((r1.groupcount+1)*0.5)
And perf:
SQL Server Execution Times:
CPU time = 30969 ms, elapsed time = 32954 ms.
So - half as fast as the math median, but twice as fast as the SQL 2000-style version of the statistical median....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 29, 2008 at 11:11 pm
Matt Miller (4/29/2008)
(Jeff - psst)....
(1000000 row(s) affected)
=====================
Jeff's Statistical Median Calc
=====================
(70266 row(s) affected)
SQL Server Execution Times:
CPU time = 70031 ms, elapsed time = 42039 ms.
========================
Matt's Mathematical Median Calc
========================
(70266 row(s) affected)
SQL Server Execution Times:
CPU time = 16938 ms, elapsed time = 18221 ms.
So in case someone doesn't specify which one they want - you might get away with the "faster" one....:)
Outstanding! Heh... that's what I get for not testing both sides first.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 29, 2008 at 11:13 pm
Say... do you happen to have the full test slab for that 16 second version?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2008 at 6:30 am
Jeff Moden (4/29/2008)
Say... do you happen to have the full test slab for that 16 second version?
Sure - I stole your test setup....
--===== If the temp table already exists, drop it
IF OBJECT_ID('TempDB..#Steps') IS NOT NULL
DROP TABLE #Steps
--===== Simulate loading your real temp table with ...
-- A MILLION rows, 17576 reps (3 character "names"), a DeltaT from 0 to 1999,
-- and 4 "Steps" (Step 1, Step 2, etc)
SELECT TOP 1000000 IDENTITY(INT,1,1) AS RowNum,
CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)) AS Rep,
CAST(RAND(CAST(NEWID() AS VARBINARY))*2000 AS INT) AS DeltaT,
'Step '
+ CAST(CAST(RAND(CAST(NEWID() AS VARBINARY))*4 AS INT)+1 AS VARCHAR(25)) AS MileCode
INTO #Steps
FROM Master.dbo.SYSCOLUMNS sc1,
Master.dbo.SYSCOLUMNS sc2
--===== Every table, even temporary ones, should have a Primary Key...
-- Makes all the difference in the world, performance wise, on this script.
ALTER TABLE #Steps
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== And, trust me, you want this index. It's a "covering" index for the
-- ORDER BYs, the JOINs, and the data that will appear in the code that follows.
CREATE INDEX IX_Steps ON #Steps
(Rep,MileCode,DeltaT)
go
--===== Return the Median for ALL reps by name and by step.
-- It returns all the "analysis" rows in just 32 seconds.
-- 1 MILLION rows, 17576 Reps, and 4 steps are simulated in the table
-- (note that because this is all random data, some reps may be missing a step, but most have 4)
-- for a total of about 70,200 Medians in just 32 seconds
-- (or 1 Median analysis in only 0.0004558 seconds)
Print replicate('=',50)
print 'Jeffs Statistical Median'
Print replicate('=',50)
Set statistics time on
SELECT DISTINCT
m.Rep,
m.MileCode,
(
(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT ASC
) lo
ORDER BY DeltaT DESC)
+(SELECT TOP 1 DeltaT FROM
(SELECT TOP 50 PERCENT DeltaT FROM #Steps t
WHERE t.Rep = m.Rep AND t.MileCode = m.MileCode
ORDER BY DeltaT DESC
) hi
ORDER BY DeltaT ASC))
/ 2 AS MEDIAN
FROM #Steps m
ORDER BY m.Rep,m.MileCode
Set statistics time off
Print replicate('=',50)
print 'Windowed Statistical Median'
Print replicate('=',50)
Set statistics time on
select r.Rep,r.MileCode, (r.deltat +r1.DeltaT)*.5 dt from
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r inner join
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r1 on r.rep=r1.rep and r.milecode=r1.milecode
where r.rn=ceiling((r.groupcount+1)*0.5)
and r1.rn=floor((r1.groupcount+1)*0.5)
Set statistics time off
Print replicate('=',50)
print 'Windowed Math Median'
Print replicate('=',50)
Set statistics time on
select r.Rep,r.MileCode, r.deltat from
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r
where r.rn=ceiling((r.groupcount+1)*0.5)
Set statistics time off
I was being funny last night - just remember the "math" version and the "stats" version actually give you different results (based on their different definitions)....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 30, 2008 at 7:11 am
I forgot that my test machine was running a full backup and a virus scan last night...Without the impediments:
(1000000 row(s) affected)
==================================================
Jeffs Statistical Median
==================================================
(70271 row(s) affected)
SQL Server Execution Times:
CPU time = 22266 ms, elapsed time = 12282 ms.
==================================================
Windowed Statistical Median
==================================================
(70271 row(s) affected)
SQL Server Execution Times:
CPU time = 11765 ms, elapsed time = 11950 ms.
==================================================
Windowed Math Median
==================================================
(70271 row(s) affected)
SQL Server Execution Times:
CPU time = 5719 ms, elapsed time = 5727 ms.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 30, 2008 at 8:05 am
Matt Miller (4/30/2008)
I was being funny last night - just remember the "math" version and the "stats" version actually give you different results (based on their different definitions)....
Not a problem... I'm just interested in how you're doing this and wanted to play a bit. Thanks, Matt.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 30, 2008 at 8:18 am
Jeff Moden (4/30/2008)
Matt Miller (4/30/2008)
I was being funny last night - just remember the "math" version and the "stats" version actually give you different results (based on their different definitions)....
Not a problem... I'm just interested in how you're doing this and wanted to play a bit. Thanks, Matt.
By the way - once the caffeine infusion kicked in this morning - I realized that I'm taking two passes to do what I can do in just one in the "windowed Stat version".
Set statistics time off
Print replicate('=',50)
print 'Windowed Stat Median - method #2'
Print replicate('=',50)
Set statistics time on
select r.Rep,r.MileCode, Avg(r.deltat) dt
from
(select m.Rep,m.MileCode,
deltat,
row_number() over (partition by m.Rep,m.MileCode order by DeltaT) RN,
Count(*) over (partition by m.Rep,m.MileCode) groupcount
from #Steps m) r
where r.rn = ceiling((r.groupcount+1)*0.5) or
r.rn = floor((r.groupcount+1)*0.5)
group by r.Rep,r.MileCode
order by r.Rep,r.MileCode
Set statistics time off
Damn near as fast as the math median at this point....
SQL Server Execution Times:
CPU time = 6266 ms, elapsed time = 6421 ms.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?