February 22, 2011 at 9:57 am
Hi,
I'm trying to find a way to programmatically pivot a very large data set, the structure of which is:
CREATE TABLE [dbo].[f_Data](
[RecordID] [bigint] NOT NULL,
[RecordTypeID] [int] NOT NULL,
[DataVersionID] [int] NOT NULL,
[VariableName] [nvarchar](128) NOT NULL,
[VariableValue] [int] NULL,
CONSTRAINT [PK_f_Data_1] PRIMARY KEY CLUSTERED
(
[RecordID] ASC,
[VariableName] ASC
)
For each RecordID there are upwards of 2000 different variables that we would like to have a as columns, with a row for each RecordID. Now, I know I could do this using a PIVOT statement - but the prospect of typing out 2000 column names (for each different variable) isn't all that appealing. There's also an added difficulty in that there is a second table with additional data on the variables, such as the order that they should be listed in - this makes building a programmatic-yet-dynamic statement more difficult as well. Though I am not beyond doing that, I'm thinking there has to be an easier way (mainly because I can do exactly what I want in Excel with a few clicks). Any ideas?
February 22, 2011 at 2:31 pm
Building a PIVOT query dynamically is a viable option. Using your DDL I created a simple PIVOT query:
SELECT pvt.RecordID,
pvt.Variable1,
pvt.Variable2
FROM (SELECT RecordID,
VariableName,
VariableValue
FROM dbo.f_Data
) a PIVOT
( MAX(a.VariableValue) FOR a.VariableName IN (Variable1, Variable2) ) AS pvt ;
When building the SELECT and PIVOT column list joining to the external table you mentioned that supplies the column-order for the variable-columns is an extra step in building the SQL string you'll eventually execute however it will not be prohibitively complex.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 22, 2011 at 2:40 pm
As an alternative to the PIVOT syntax you could have a look at the CrossTab and DynamicCrossTab article referenced in my signature.
It's not as easy as using Excel, but Excel isn't a RDBMS (at least not intentionally...).
Edit: stuff like PIVOT usually is done at the presentation layer (e.g. ReportingServices) since it violates normalization.
February 22, 2011 at 3:20 pm
Lutz has a point...dynamically generating a crosstab query would require approximately the same level of effort from you and would almost certainly perform better than a PIVOT. My code example as a crosstab:
SELECT RecordID,
SUM(CASE WHEN VariableName = 'Variable1' THEN VariableValue ELSE NULL END) AS [Variable1],
SUM(CASE WHEN VariableName = 'Variable2' THEN VariableValue ELSE NULL END) AS [Variable2]
FROM (SELECT RecordID,
VariableName,
VariableValue
FROM dbo.f_Data
) d
GROUP BY RecordID
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 22, 2011 at 3:21 pm
opc.three (2/22/2011)
Building a PIVOT query dynamically is a viable option. Using your DDL I created a simple PIVOT query:
SELECT pvt.RecordID,
pvt.Variable1,
pvt.Variable2
FROM (SELECT RecordID,
VariableName,
VariableValue
FROM dbo.f_Data
) a PIVOT
( MAX(a.VariableValue) FOR a.VariableName IN (Variable1, Variable2) ) AS pvt ;
When building the SELECT and PIVOT column list joining to the external table you mentioned that supplies the column-order for the variable-columns is an extra step in building the SQL string you'll eventually execute however it will not be prohibitively complex.
I have actually tried doing the pivot and building the IN clause dynamically (using this approach http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx), but it turns out that the column names actually end up overflowing the max varchar length. It does work fine for a subset of variables, but not the whole dataset. On top of that, I actually need to try and get a description field in the column name as well so the length is a big issue. The columns that will be included will likely be subject to change too - otherwise I would just write the big nasty query once and call it a day.
February 22, 2011 at 3:24 pm
Kevin Dahl (2/22/2011)
I have actually tried doing the pivot and building the IN clause dynamically (using this approach http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx), but it turns out that the column names actually end up overflowing the max varchar length.
Use NVARCHAR(MAX) instead of NVARCHAR(4000)
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
February 23, 2011 at 1:46 pm
opc.three (2/22/2011)
Kevin Dahl (2/22/2011)
I have actually tried doing the pivot and building the IN clause dynamically (using this approach http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx), but it turns out that the column names actually end up overflowing the max varchar length.Use NVARCHAR(MAX) instead of NVARCHAR(4000)
Great tips, thanks guys... The dynamic crosstab was much easier/simpler to build programmatically, and the NVARCHAR(MAX) worked fine as well.
Ordinarily I would leave the summary of the data to the presentation layer, but this particular dataset is going into an OLAP cube.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply