Flatening this table structure

  • Have a table that is of this structure:

    CREATE TABLE [dbo].[Normalized](

    [TimeStamp] [datetime] NOT NULL,

    [ToolID] [smallint] NOT NULL,

    [VariableName] [nvarchar](12) NULL,

    [VariableValue] [nvarchar](25) NULL,

    CONSTRAINT [PK_Normalized] PRIMARY KEY CLUSTERED

    (

    [TimeStamp] ASC,

    [ToolID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    Each Timestamp/ToolID key combination will have roughly 100-400 variables/value pairs associated.

    Any ideas flattening this table so that each row would consist of:

    Timestamp, ToolID, Var1,Var2,VarN

    1/1/2011 12:00AM, 1, Value1, Value2, ValueN

    With the variable names being arbitrary.

    Sample rate is roughly every 40 secs

    There will be a specific LotID variable that will be used to differentiate each run.

    Two Major ways of doing analysis:

    1. Will need to be able to look at each individual run as a whole

    2. Need to be able to compare different runs in a statistical manner (Average, Sample count, StDev, Mean, Mode).

    If you could help shove me in the right direction, it would be greatly appreciated.

    Thank you,

    Tim

  • I think what you're looking for is either a dynamic pivot - this will produce a variable number of columns for each timestampt/toolID key. In a dynamic pivot you will first have to supply a timestamp/toolID and get the variables needed in a column list. Then create a variable containing the SQL PIVOT Statement

    Something like:

    DECLARE @COLS varchar(MAX);

    SELECT @COLS = COALESCE(@COL + ', ',' ') + QUOTENAME(

    VariableValue) FROM dbo.Normalized

    WHERE timestamp = @timestamp and ToolID = @ToolId

    DECLARE @sql varchar(MAX);

    SET @sql = 'SELECT timestamp, ToolId, ' + @COLS + '

    FROM (SELECT timestamp, ToolId, VariableValue

    FROM dbo.Normalized

    WHERE timestamp = @TimeStamp

    AND ToolId = @ToolId) TBL

    PIVOT ( COUNT(ToolID) FOR VariableValue IN ( ' + @COLS + ') ) AS PVT;

    EXEC (@SQL)

    Or if you want all the variable values s in one comma-separated column then you'd be looking at a concatenation. Concatenation methods performance is variable by the number of rows. There are several methods. The STUFF ... FOR XML PATH method may suit 100-400 values.

    STUFF FOR XML Path Concatenations are covered in Jeff Moden's article http://www.sqlservercentral.com/articles/Test+Data/61572/

  • So I figured it out, thanks to your example. I altered the original table slightly.

    CREATE TABLE [dbo].[Normalized](

    [TimeStamp] [datetime] NOT NULL,

    [LotID] [nvarchar](18) NOT NULL,

    [VariableName] [nvarchar](12) NULL,

    [VariableValue] [nvarchar](25) NULL,

    CONSTRAINT [PK_Normalized] PRIMARY KEY CLUSTERED

    (

    [TimeStamp] ASC,

    [LotID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    SET @LotID = 'EB390005'

    SELECT @COLS = COALESCE(@COLS + ', ',' ') + QUOTENAME(

    t1.VariableName) FROM (SELECT Distinct VariableName FROM dbo.Normalized where LotID = @LotID) t1

    order by VariableName asc

    SET @sql = 'SELECT TIMESTAMP, LotID, ' + @COLS + '

    FROM (SELECT TIMESTAMP, LotID, CONVERT(numeric(25,13),isnull(VariableValue,0.0)) as VariableValue, VariableName

    FROM dbo.Normalized

    WHERE

    LotID = '+ char(39) + @LotID + char(39) + ') TBL

    PIVOT ( sum(VariableValue) FOR VariableName IN ( ' + @COLS + ') ) AS PVT;'

    EXEC (@SQL)

    This works well with numeric types. I am having a hard time figuring out how I would treat name/value pairs where the value would have a string.

    Does this force me down the road of cross tabs?

    So I guess no DECLARE statements need apply?

  • Another approach is us FOR XML PATH ....

    CREATE TABLE #Normalized

    ([TimeStamp] [datetime] NOT NULL,

    [ToolID] [smallint] NOT NULL,

    [VariableName] [nvarchar](12) NULL,

    [VariableValue] [nvarchar](25) NULL)

    INSERT #Normalized

    SELECT '12/13/2011 08:30:00', 1,'Avar','1' UNION ALL

    SELECT '12/13/2011 08:30:00', 1,'Avar','2' UNION ALL

    SELECT '12/13/2011 08:30:00', 1,'Avar','test' UNION ALL

    SELECT '12/13/2011 08:30:00', 2,'Bvar','something' UNION ALL

    SELECT '12/13/2011 08:30:00', 2,'Bvar','another' UNION ALL

    SELECT '12/13/2011 08:30:00', 2,'Bvar','xxx' UNION ALL

    SELECT '12/13/2011 08:30:00', 3,'Cvar','yyyyy' UNION ALL

    SELECT '12/13/2011 08:30:00', 3,'Cvar','nothing' UNION ALL

    SELECT '12/13/2011 08:30:00', 3,'Cvar','wow'

    SELECT ', ' + [VariableName] + '=' + [VariableValue]

    FROM #Normalized

    WHERE [ToolID] = 1

    FOR XML PATH('')

    DROP TABLE #Normalized

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • In your dynamic SQL I think you just need to test for numeric

    Where you have

    ' FROM (SELECT TIMESTAMP, LotID, CONVERT(numeric(25,13),isnull(VariableValue,0.0)) as VariableValue, VariableName

    FROM dbo.Normalized

    '

    Replace with

    ' FROM (SELECT TIMESTAMP, LotID,

    CASE

    WHEN ISNUMERIC(VariableValue) = 1 THEN CONVERT(sql_variant,isnull(VariableValue,0.0))

    ELSE variableValue

    END AS VariableValue,

    VariableName

    FROM dbo.Normalized

    '

    This means all your numeric values will be stored in sql_variant columns in the final output. That is if you're sure all VariableValues for the same VariableNames will always be numeric or NULL. Non numerics will still be the original nvarchar(25).

  • Pretty sure that the Pivot expects to do the aggregate function on the data regardless of what it is.

    PIVOT ( sum(VariableValue) FOR VariableName IN ( ' + @COLS + ') ) AS PVT;'

    So if the VariableValue does not contain a numeric, it barfs on it. Maybe it is possible to use a case statement there also?

    Actually, I changed it to use Max and it appears to work(Along with the suggestion for the case statement). I just need to ensure that the data set does not include multiple copies of individual variables for a given timestamp/lotId combo

    Tim

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply