April 25, 2014 at 9:27 am
Dear Reader,
Fan: Of Percentile curves in Excel.
This presents in Excel a value (or a count of values) and the percentile which values falls below that value.
So a curve which starts at 0 and finishes at 100 percent and always goes up.
On of the uses can be how the distribution of the number of different values is for a varchar field.
This is represented below.
But could also be about amounts.
Used to do the row based calculations in SQL-server. And the running total in Excel.
But then you have to add a few columns in Excel.
Now I have build a 'template' to do all calculations in SQL-server.
So this is less work, but I am disappointed in the performance of this query.
Can this be improved ?
Am I doing something wrong ?
Typical number of rows: 1 000 000
Typical number of different values: 1 000
(In excel this results in hundreds or less rows).
Or when a numeric field is used, the number of values can be greater or smaller.
Ben
--
-- Calculate a percentiele column for a distribution of a value in a column CCCCCC1 (or in two or more columns).
--
-- For presentation the column Tel is the X axis.
-- column percentile is the Y axis. (Scale from 0 to 100).
declare @work1 varchar(8000)
set @work1 = '
declare @Row_Count bigint
select @Row_Count = count(*) from TTTTTT
;
With
A as (select CCCCC1, CCCCC2, COUNT(*) as tel FROM patients where WWWWWW group by CCCCC1, CCCCC2 ),
B as (select tel,COUNT(*) as freq, tel* COUNT(*) as area from A group by tel),
C as (select ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY tel) nr, * from B),
D as (
select *, tel as curve from C where c.nr = 1
Union all
select C.*, D.curve+c.area as curve from C join D on C.nr = D.nr+1
)
select *, 100.0*curve/@row_count as percentile from D order by tel OPTION (MAXRECURSION 0)
'
set @work1 = REPLACE(@work1,'TTTTTT', '<Tabel_name>')
set @work1 = REPLACE(@work1,'CCCCC1', '<Column_name>')
set @work1 = REPLACE(@work1,'WWWWWW', '1 = 1')
set @work1 = REPLACE(@work1,', CCCCC2', '')
PRINT @WORK1
exec (@work1)
A good example of a percentile curve is the Lorenz Curve,
which shows in a single glance the distribution of wealth.
April 26, 2014 at 1:46 pm
What do you mean by
I am disappointed in the performance
?
Would you please post some sample data (table def and sample data) together with your actual execution plan so we have something to work with?
April 27, 2014 at 9:01 am
well...I didn't know this was called a Lorenz Curve...and neither have I done this in SQL before.
but saw this "simple" explanation http://courses.washington.edu/geog326/Lorenz.pdf in how to do it in Excel.
so...thought I would share the following code to see ifs anywhere need what you are looking for
/*some thoughts on how to get data for Lorenz Curve
intial ideas are based on http://courses.washington.edu/geog326/Lorenz.pdf*/
/*set up some test data*/
USE tempdb
GO
IF OBJECT_ID('tempdb..TransData', 'U') IS NOT NULL
DROP TABLE tempdb..TransData ;
IF OBJECT_ID('tempdb..tempcalc', 'U') IS NOT NULL
DROP TABLE tempdb..tempcalc ;
SELECT TOP 1000000
RegionID = 1 + CAST(Abs(Checksum(Newid()) % 1000 ) AS INT),
PopCount = 1 + CAST(Abs(Checksum(Newid()) % 90) AS INT),
Amount = 1 + CAST(Rand(Checksum(Newid())) * 1000 AS INT)
INTO TransData
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
CROSS JOIN sys.all_columns ac3
/*skew the data a litle to emphasise the results*/
UPDATE TransData
SET PopCount = PopCount * 1.2, Amount = Amount * 0.3
WHERE (RegionID % 3 = 0)
UPDATE TransData
SET PopCount = PopCount * 0.3, Amount = Amount * 2.5
WHERE (RegionID % 23 = 0)
/*solution follows.....there are other methods for running totals but here we use the quirky update method....
http://www.sqlservercentral.com/articles/T-SQL/68467/ by Jeff Moden
be sure you read and fully understand ALL the requirements to ensure data integrity*/
/*first off we will pre calculate the different requirements into "tempcalc"
this could be a cte/temp table...but the new table is created here for further analysis and testing*/
DECLARE @pop AS INT
DECLARE @amt AS INT
SET @pop = ( SELECT SUM(PopCount) FROM TransData )
SET @amt = ( SELECT SUM(Amount) FROM TransData )
SELECT RegionID,
SUM(PopCount) popcnt,
SUM(PopCount) * 1.00 / @pop as poppct,
SUM(Amount) AS amt ,
SUM(Amount) * 1.00 / @amt as amtpct,
(SUM(Amount) * 1.00 / @amt) / (SUM(PopCount) * 1.00 / @pop) as calc,
row_number() over (order by (SUM(Amount) * 1.00 / @amt) / (SUM(PopCount) * 1.00 / @pop)) rn,
cast(0 as numeric (25,13)) as popRT,
cast(0 as numeric (25,13)) as amtRT
INTO tempcalc
FROM TransData
GROUP BY RegionID
/*the CIX below is an absolute requirement when using quirky update*/
CREATE UNIQUE CLUSTERED INDEX [CIX_rn] ON [dbo].[tempcalc]
([rn] ASC )
GO
DECLARE @popRT NUMERIC(25, 13)
DECLARE @amtRT NUMERIC(25, 13)
DECLARE @SafetyCounter INT
SET @popRT = 0
SET @amtRT = 0
SET @SafetyCounter = 1
;WITH
cteWork AS
(
SELECT SafetyCounter = ROW_NUMBER() OVER (order by rn),
[poppct],[amtpct],[popRT],[amtRT]
FROM dbo.tempcalc
)
UPDATE cteWork
SET @popRT = popRT = CASE
WHEN SafetyCounter = @SafetyCounter
THEN @popRT + poppct
ELSE CAST('Safety Counter went out of sync at ' + CAST(@SafetyCounter AS VARCHAR(10)) AS INT)
END,
@amtRT = amtRT = CASE
WHEN SafetyCounter = @SafetyCounter
THEN @amtRT+amtpct
ELSE CAST('Safety Counter went out of sync at ' + CAST(@SafetyCounter AS VARCHAR(10)) AS INT)
END,
@SafetyCounter = @SafetyCounter + 1
FROM cteWork WITH (TABLOCKX)
OPTION (MAXDOP 1)
GO
/*results here*/
select null as RegionID, 0 as popRT , 0 as amtRT, 0 as rn
union all
SELECT RegionID, popRT, amtRT, rn
FROM tempcalc
order by rn
drop table tempcalc
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
April 28, 2014 at 7:31 am
Hello all,
Thanks for your contributions and sorry for not responding.
This weekend I had a fall and have a head injuri. In a few days time everything should be back to normal.
I'll pick up this thread then. Sorry that I am not able to participate at this time.
Ben
April 28, 2014 at 7:42 am
ben.brugman (4/28/2014)
Hello all,Thanks for your contributions and sorry for not responding.
This weekend I had a fall and have a head injuri. In a few days time everything should be back to normal.
I'll pick up this thread then. Sorry that I am not able to participate at this time.
Ben
Sorry to hear that Ben...trust you recover quickly,that is far more important than this thread !
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply