Normalization help for a DBA-wannabie

  • I'm trying to figure out a database that a customer provided. It's either a crap design, or more normalized than I understand.

    It's similar to this:

    Table 1

    Item Shape Descriptor Value

    1 square 1 12

    1 square 2 red

    2 rectangle 1 8

    2 rectangle 2 10

    2 rectangle 3 blue

    3 triangle 1 3

    3 triangle 2 4

    3 triangle 3 5

    3 triangle 4 yellow

    Table 2

    Item Descriptor Column Header

    1 1 Length

    1 2 Color

    2 1 Length

    2 2 Width

    2 3 Color

    3 1 Length side 1

    3 2 Length side 2

    3 3 Length hypotenuse

    3 4 Color

    I need to get this data into a csv file for another part of the project. The csv file will have lots of nulls in it, but I need each column to have the same type of data with a header.

    Even if you say, "It's fifteenth normal form, Google it, Skippy" that will be a big help.

  • What do you want the CSV file to look like for column headers?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ideally something like:

    Shape, Length, Width, Color, Length 1, Length 2, Hypotenuse

    Square, 12,,Red,,,

    Rectangle,8,10,Blue,,,

    Triangle,,,Yellow,3,4,5

  • I think you want to convert rows to columns, so something like this:

    select a.shape, a.value, b.header as length1, c.headers as length2

    from a (first file)

    left outer join b (second file)

    on a.item = b.item

    and b.descriptor = 'length 1'

    left outer join c (second file)

    on a.item = c.item

    and c.descriptor = 'length 1'

    Something like that. If you're transforming these in something like DTS, SSIS, I'd actually load them into a staging table by loading the first file (looks like the parent).

    insert into #temp

    select item, shape, et.

    Before this, create the #temp to look like what you want, with the columns.

    then insert from the second file

    update #temp

    set length1 = b.column

    from b

    where b.item = #temp.item

    and b.column = 'length 1'

    repeat for missing columns, then select out of #temp for the final CSV.

  • quoteEven if you say, "It's fifteenth normal form, Google it, Skippy" that will be a big help.

    It is "fifteenth" normal form but you don't need to Google it, Skippy ... instead, Google "SQL Crosstab" to find out how this pig works ...

    --====================================================================================

    --     Create some sample data to immitate John's real tables

    --     (NOT PART OF THE SOLUTION!!!)

    --====================================================================================

    DECLARE @Table1 TABLE (Item INT, Shape VARCHAR(20), Descriptor INT, Value VARCHAR(20))

    INSERT INTO @Table1(Item,Shape,Descriptor,Value)

    SELECT 1,'square',1,'12' UNION ALL

    SELECT 1,'square',2,'red' UNION ALL

    SELECT 2,'rectangle',1,'8' UNION ALL

    SELECT 2,'rectangle',2,'10' UNION ALL

    SELECT 2,'rectangle',3,'blue' UNION ALL

    SELECT 3,'triangle',1,'3' UNION ALL

    SELECT 3,'triangle',2,'4' UNION ALL

    SELECT 3,'triangle',3,'5' UNION ALL

    SELECT 3,'triangle',4,'yellow'

    DECLARE @Table2 TABLE (Item INT,Descriptor INT, ColumnHeader VARCHAR(30))

    INSERT INTO @Table2 (Item,Descriptor,ColumnHeader)

    SELECT 1,1,'Length' UNION ALL

    SELECT 1,2,'Color' UNION ALL

    SELECT 2,1,'Length' UNION ALL

    SELECT 2,2,'Width' UNION ALL

    SELECT 2,3,'Color' UNION ALL

    SELECT 3,1,'Length side 1' UNION ALL

    SELECT 3,2,'Length side 2' UNION ALL

    SELECT 3,3,'Length hypotenuse' UNION ALL

    SELECT 3,4,'Color'

    --====================================================================================

    --      Solve the problem using a "crosstab" to resolve the "long skinny" tables.

    --====================================================================================

     SELECT 'Shape,Length,Width,Color,Length side 1,Length side 2,Length hypotenuse'

      UNION ALL

     SELECT t1.Shape+','

          + MAX(CASE WHEN t2.ColumnHeader = 'Length'            THEN t1.Value ELSE '' END) + ','

          + MAX(CASE WHEN t2.ColumnHeader = 'Width'             THEN t1.Value ELSE '' END) + ','

          + MAX(CASE WHEN t2.ColumnHeader = 'Color'             THEN t1.Value ELSE '' END) + ','

          + MAX(CASE WHEN t2.ColumnHeader = 'Length side 1 '    THEN t1.Value ELSE '' END) + ','

          + MAX(CASE WHEN t2.ColumnHeader = 'Length side 2'     THEN t1.Value ELSE '' END) + ','

          + MAX(CASE WHEN t2.ColumnHeader = 'Length hypotenuse' THEN t1.Value ELSE '' END)

       FROM @Table1 t1,

            @Table2 t2

      WHERE t1.Item       = t2.Item

        AND t1.Descriptor = t2.Descriptor

      GROUP BY t1.Shape

    ...and if you want it to "auto-magically" figure out the columns, well need to use a little dynamic SQL on this pit-wuffy... lemme know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I ran Jeff's code (because I want to understand it!!) and got what I expected which is:

    Shape,Length,Width,Color,Length side 1,Length side 2,Length hypotenuse

    rectangle,8,10,blue,,,

    square,12,,red,,,

    triangle,,,yellow,3,4,5

    I added another item to the data:

    INSERT INTO @table1...

    ...UNION ALL

    SELECT 4,'square',1,'5' UNION ALL

    SELECT 4,'square',2,'zebra stripe'

    INSERT INTO @table2...

    ...UNION ALL

    SELECT 4,1,'Length' UNION ALL

    SELECT 4,2,'Color'

    Ran the code and got:

    Shape,Length,Width,Color,Length side 1,Length side 2,Length hypotenuse

    rectangle,8,10,blue,,,

    square,5,,zebra stripe,,,

    triangle,,,yellow,3,4,5

    NOT what I expected.  There is only one square, the last one.

    My assumption was that John has lots of diffferent squares, rectangles and triangles, but he knows they are all one of those types. 

    I guess I'll be reading up on crosstabs, too. 

  • I'm learning a whole lot here. Thanks guys.

    In Grasshopper's corollary, I'd really like to be able to have

    square,5,,red : zebra stripe,,,

    In my real application that would be perfect, but may be too much to hope for.

  • quote

    I ran Jeff's code (because I want to understand it!!) and got what I expected which is: Shape,Length,Width,Color,Length side 1,Length side 2,Length hypotenuse rectangle,8,10,blue,,, square,12,,red,,,...

    NOT what I expected.  There is only one square, the last one.

    Sorry about that, folks... this'll do it...

    --====================================================================================

    --     Create some sample data to immitate John's real tables

    --     (NOT PART OF THE SOLUTION!!!)

    --====================================================================================

    DECLARE @Table1 TABLE (Item INT, Shape VARCHAR(20), Descriptor INT, Value VARCHAR(20))

    INSERT INTO @Table1(Item,Shape,Descriptor,Value)

    SELECT 1,'square',1,'12' UNION ALL

    SELECT 1,'square',2,'red' UNION ALL

    SELECT 2,'rectangle',1,'8' UNION ALL

    SELECT 2,'rectangle',2,'10' UNION ALL

    SELECT 2,'rectangle',3,'blue' UNION ALL

    SELECT 3,'triangle',1,'3' UNION ALL

    SELECT 3,'triangle',2,'4' UNION ALL

    SELECT 3,'triangle',3,'5' UNION ALL

    SELECT 3,'triangle',4,'yellow' UNION ALL

    SELECT 4,'square',1,'5' UNION ALL

    SELECT 4,'square',2,'zebra stripe'

    DECLARE @Table2 TABLE (Item INT,Descriptor INT, ColumnHeader VARCHAR(30))

    INSERT INTO @Table2 (Item,Descriptor,ColumnHeader)

    SELECT 1,1,'Length' UNION ALL

    SELECT 1,2,'Color' UNION ALL

    SELECT 2,1,'Length' UNION ALL

    SELECT 2,2,'Width' UNION ALL

    SELECT 2,3,'Color' UNION ALL

    SELECT 3,1,'Length side 1' UNION ALL

    SELECT 3,2,'Length side 2' UNION ALL

    SELECT 3,3,'Length hypotenuse' UNION ALL

    SELECT 3,4,'Color' UNION ALL

    SELECT 4,1,'Length' UNION ALL

    SELECT 4,2,'Color'

    --====================================================================================

    --      Solve the problem using a "crosstab" to resolve the "long skinny" tables.

    --====================================================================================

     SELECT 'Shape,Length,Width,Color,Length side 1,Length side 2,Length hypotenuse'

      UNION ALL

     SELECT d.MyOutput

      FROM (

             SELECT t1.Item,t1.Shape+','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Length'            THEN t1.Value ELSE '' END) + ','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Width'             THEN t1.Value ELSE '' END) + ','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Color'             THEN t1.Value ELSE '' END) + ','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Length side 1 '    THEN t1.Value ELSE '' END) + ','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Length side 2'     THEN t1.Value ELSE '' END) + ','

                  + MAX(CASE WHEN t2.ColumnHeader = 'Length hypotenuse' THEN t1.Value ELSE '' END)

               AS MyOutput

               FROM @Table1 t1,

                    @Table2 t2

              WHERE t1.Item       = t2.Item

                AND t1.Descriptor = t2.Descriptor

              GROUP BY t1.Item,t1.Shape

            )d

    Like I said, this isn't much more than a simple "CrossTab"... the MAX's are simply to overcome the Group By on the CASE statements.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply