March 2, 2010 at 1:31 pm
Hi,
Could anyone please help me in writing a query to find the weighted Median for the following data
CREATE TABLE [dbo].[dr_temp](
[PK] [int] IDENTITY(1,1) NOT NULL,
[ID] [int] NOT NULL,
[Code] [nvarchar](10) NOT NULL,
[Value] [int] NOT NULL,
[Volume] [int] NOT NULL
)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',40,30778)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',40,24860)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',40,82043)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',41,136116)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',42,106886)
insert into dr_temp(ID,Code,Volume,Value) values(1,'120',45,179646)
IDCODEValueVolumeRunningTotalAverage
1120403077830778
1120402486055638
11204082043137681
112041136116273797
112042106886380683
112045179646560329280164.5
I'm expectiing the Value to be 45... How
Cumulative/2 value is 280164.5 which actually falls b/w 106886 and 380683 but 380683 is greater than 280164.5 so the value is 45.
Hope this helps.
Kindly help me out .
If this is possible in SSIS also please let me know.
For any questions do reply me.
Thanks
March 2, 2010 at 1:54 pm
Would you mind elaborate the business case?
Especially the part
falls b/w 106886 and 380683 but 380683 is greater than 280164.5 so the value is 45.
Why do you compare the median value to a single Value (106886) instead of the previous running toal (273797)?
Also, whydo you use the value of the next row that is larger than the row that already exceeded your median value? (I would expect the target value to be 42, not 45).
Finally, your sample data don't match your result set (value and volume are mixed).
Please clarify.
March 2, 2010 at 2:03 pm
Hi,
Thanks for the reply
Here is the business rule
280164.5 falls in the range b/w 106886 and 380683
However the ending range i.e 380683 > than 280164.5, due to which we need to select the next column which matched the criteria.
However this might not be happening to othe Code's
Hope this helps.
And oops! :ermm: Column Mismatched.
Thanks
March 2, 2010 at 2:16 pm
It still is not really clear:
280164.5 falls in the range b/w 106886 and 380683.
But it also falls in the range b/w 179646 and 560329. Why not use that one?
March 2, 2010 at 2:20 pm
Hi,
Yep we can use the last one.
Can this done in a SQL Query?
Thanks
March 2, 2010 at 2:35 pm
ashok_raja (3/2/2010)
Hi,Yep we can use the last one.
Can this done in a SQL Query?
Thanks
It depends.
Usually, a SQL query will rely on some logic requirement(s). If both results are possible, which one has to be used as a base to determin "the next row"?
Let's assume your next row would look like the following (ignore the wrong math. I didn't bother to calculate the new RunningTotal value).
ID CODE Value Volume RunningTotal
1 120 46 379646 660329
If we would use the logic from your previous statements, the the result would be 45.
But if we'd also consider your last statement, the result would be either 45 or 46.
That's possible to do with SQL as well. But the question is: What is the correct answer? And why?
March 2, 2010 at 2:41 pm
Hi,
Actually 46 will not be there as the next row will be for different ID
i.e
Insert into dr_temp(ID,Code,Value,Volume) Values(
2,'120',31,2311)
if present then Yes! it matches, I think Let's get the 1st match
B/w 106886 and 380683. Is this Possible.
Thanks
March 2, 2010 at 2:53 pm
ashok_raja (3/2/2010)
Hi,Actually 46 will not be there as the next row will be for different ID
i.e
Insert into dr_temp(ID,Code,Value,Volume) Values(
2,'120',31,2311)
if present then Yes! it matches, I think Let's get the 1st match
B/w 106886 and 380683. Is this Possible.
Thanks
Yes.
Do you have a clustered index on your table and if yes, what columns are included?
Is there any correlation between the clustered index and the volume column? (eg. if I sort the table based on the clustered index column, will the values for the volume column per ID be in order as well?)
March 3, 2010 at 9:47 am
Hi
Apoligize for the confusion..
Here is what I want
ID CODE Value Volume RunningTotal Average
1 120 40 30778 30778
1 120 40 24860 55638
1 120 40 82043 137681
1 120 41 136116 273797
1 120 42 106886 380683
1 120 45 179646 560329 280164.5
Since 280164.5 falls b/w 273797 and 380683, the result set should be
1 120 42 106886 380683
Should be able to find median b/w the running total...
Hope this clears...
Thanks
March 3, 2010 at 10:06 am
ashok_raja (3/3/2010)
HiApoligize for the confusion..
Here is what I want
ID CODE Value Volume RunningTotal Average
1 120 40 30778 30778
1 120 40 24860 55638
1 120 40 82043 137681
1 120 41 136116 273797
1 120 42 106886 380683
1 120 45 179646 560329 280164.5
Since 280164.5 falls b/w 273797 and 380683, the result set should be
1 120 42 106886 380683
Should be able to find median b/w the running total...
Hope this clears...
Thanks
With your reply, none of my questions in my previous post has been answered. :crying:
Hard to help...
March 3, 2010 at 10:09 am
Hi,
There is no index because this is a temp table to find the Median value per Code.
Hope this clears.
Thanks
March 3, 2010 at 10:54 am
Since you're using a temp table - you will want to add a clustered index in order to make the running totals work.
You should start by reading Jeff's article on the matter (so you can understand why the clustered index is such a hot topic all of a sudden):
http://www.sqlservercentral.com/articles/T-SQL/68467/[/url]
Also since you're using temp tables, you should pre-aggregate the data so that there is ONE record per code+value combination. Referring back to your example records, having multiple likes of code=160, value= 40 would make like substantially more complicated than necessary. Praggregating would also allow you to know what your target number is on the running total.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 3, 2010 at 11:20 am
Ok, here's what I came up with.
It is imporant to understand the quirky update concept as well as the requirements that need to be fulfilled. So, I urge you to carefully read the article Chris mentioned (the same article is referenced in my sample code, too...)
SET NOCOUNT ON
--- original data
CREATE TABLE #dr_temp(
[PK] [INT] IDENTITY(1,1) NOT NULL,
[ID] [INT] NOT NULL,
Code [NVARCHAR](10) NOT NULL,
[VALUE] [INT] NOT NULL,
[Volume] [INT] NOT NULL
)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',40,30778)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',40,24860)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',40,82043)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',41,136116)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',42,106886)
INSERT INTO #dr_temp(ID,Code,VALUE,Volume) VALUES(1,'120',45,179646)
-- column added to hold RunningTotal values
ALTER TABLE #dr_temp ADD RunningTotal INT
-- add clustered index to guarantee the specific order for the running total
CREATE CLUSTERED INDEX IX_#dr_temp_ID_Volume ON #dr_temp (ID,PK);
-- perform the "quirky update"
-- for details, please see Jeff Modens related article:
-- Link: http://www.sqlservercentral.com/articles/T-SQL/68467/
DECLARE @PrevID INT
DECLARE @RunningTotal INT
UPDATE #dr_temp
SET
@RunningTotal = RunningTotal =
CASE
WHEN ID = @PrevID THEN @RunningTotal + VolumeELSE Volume
END,
@PrevID = ID
FROM #dr_temp
WITH (TABLOCKX) OPTION (MAXDOP 1)
;WITH cte AS -- calculate the median per ID
(
SELECT
pk,
ID,
MedianDiff = (RunningTotal - (MAX(RunningTotal) OVER (PARTITION BY id))/2.0 )
FROM #dr_temp
), cte2 AS -- find the PK values holding values larger than the median and number thos PK's
(
SELECT
pk AS pkSub,
ID,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY pk ) AS ROW
FROM cte
WHERE MedianDiff > 0
)
-- final output: values of the row that holds the 2nd value larger than the median
SELECT #dr_temp.*
FROM cte2
INNER JOIN #dr_temp
ON cte2.id = #dr_temp.id
AND pksub=pk
WHERE ROW=2
DROP TABLE #dr_temp
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply