SQL help needed with a split function sql server 2008

  • Hi All

    I have a rather easy looking yet complex sql query. Just wondering if anyone could shed some light on it? Basically i have a row which has a column with a comma seperated string... BUT the client wants this converted to expand over multiple rows. Example below.

    Thanks very much in advance!

    My original data

    Col1 Col2 Col3

    John Doe 12,34,4.5,9,10

    Jane Doe 3,9,9

    Desired Outcome

    Col1 Col2 Col3

    John Doe 12

    John Doe 34

    John Doe 4.5

    John Doe 9

    John Doe 10

    Jane Doe 3

    Jane Doe 9

    Jane Doe 9

  • Read this article : http://www.sqlservercentral.com/articles/T-SQL/62867/

    Especially looking out for the section One Final "Split" Trick with the Tally Table

    But really, read the article and understand it before you use the code!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thanks MM. I've just finished reading the entire article and will start working on my sql now 🙂

    If anyone already has a sql already done please feel free to post it as well :hehe:

  • Hey MM, I've just finished my script and it works a treat!!! Thanks so much for your help 😀

  • Sichy (1/9/2011)


    Hey MM, I've just finished my script and it works a treat!!! Thanks so much for your help 😀

    No problem - it was the article that helped I hope - thanks for the feedback - if you feel like posting your script, others may find it and be helped in turn 😀

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Yes of course here is the script

    Step 1 - Create the Tally Table

    --=============================================================================

    -- Setup. Duration - 00:02:31 Rows - 158,533,281

    --=============================================================================

    USE Source_Cleansed_DM

    SET NOCOUNT ON --Supress the auto-display of rowcounts for appearance/speed

    DECLARE @StartTime DATETIME --Timer to measure total duration

    SET @StartTime = GETDATE() --Start the timer

    --=============================================================================

    -- Create and populate a Tally table

    --=============================================================================

    --===== Conditionally drop

    IF OBJECT_ID('dbo.Tally') IS NOT NULL

    DROP TABLE dbo.Tally

    --===== Create and populate the Tally table on the fly

    SELECT TOP 158533281 --158,533,281

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Let the public use it

    GRANT SELECT, REFERENCES ON dbo.Tally TO PUBLIC

    --===== Display the total duration - 221830 Milliseconds duration

    SELECT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' Milliseconds duration'

    select COUNT(*) from dbo.Tally

    Step 2 and 3 - Create temp table and run split to normalize table. (i've commented out the create sample table as this only needs to be run once obviously)

    --===== Create a sample denormalized table with a CSV column

    -- CREATE TABLE #testtable

    -- (PK INT IDENTITY(1,1) PRIMARY KEY CLUSTERED

    -- ,Fact1 nvarchar(20)

    --,Fact2 nvarchar(20)

    -- ,[Values] nvarchar(max))

    --insert into #testtable (Fact1, Fact2, [Values])

    --VALUES ('test1','Fact2','1,2.9,4,9')

    --insert into #testtable (Fact1, Fact2, [Values])

    --VALUES ('test3','Fact4','2,6,7.3,5.5')

    --insert into #testtable (Fact1, Fact2, [Values])

    --VALUES ('test5','Fact4','4,7,9')

    --insert into #testtable (Fact1, Fact2, [Values])

    --VALUES ('test1','Fact9','114,514,5,343.45,8,234,5,912,345')

    --insert into #testtable (Fact1, Fact2, [Values])

    --VALUES ('test1','Fact2','11,4514,5,343.45,82,345,9,1,2345')

    --===== Split or "Normalize" the whole table at once

    SELECT tt.PK,tt.Fact1,tt.Fact2,

    SUBSTRING(','+tt.[Values]+',',N+1,CHARINDEX(',',','+tt.[Values]+',',N+1)-N-1) AS Value

    FROM dbo.Tally t

    CROSS JOIN #testtable tt

    WHERE t.N < LEN(','+tt.[Values]+',')

    AND SUBSTRING(','+tt.[Values]+',',N,1) = ','

  • Sichy (1/9/2011)


    Sichy,

    I'm really curious about this line of code:

    SELECT TOP 158533281 --158,533,281

    What does that number represent? It's a really odd number to use - why not 200 million?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (1/9/2011)


    Sichy (1/9/2011)


    Sichy,

    I'm really curious about this line of code:

    SELECT TOP 158533281 --158,533,281

    What does that number represent? It's a really odd number to use - why not 200 million?

    I'll add to that... why so big? Are you really splitting strings with that many bytes in them?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good question! At first i actually didn't have access to the table and i couldn't imagine how many rows it would initialy have so i simply put 999999999999... When i ran the tally table and did a count of rows i found that it actually maxed out at 158,533,281. So when i needed to create the tally table on a different DB i just replaced my 99999999999 with 158,533,281 to save a few mins on the tally table creation.

    p.s. Turns out the initial table only has about 900,000 rows so now I've changed the number to 1,000,000 to save even more time.

  • Oh forgot to mention, Thanks Jeff Moden for writing the article 🙂

  • Sichy (1/10/2011)


    Good question! At first i actually didn't have access to the table and i couldn't imagine how many rows it would initialy have so i simply put 999999999999... When i ran the tally table and did a count of rows i found that it actually maxed out at 158,533,281. So when i needed to create the tally table on a different DB i just replaced my 99999999999 with 158,533,281 to save a few mins on the tally table creation.

    p.s. Turns out the initial table only has about 900,000 rows so now I've changed the number to 1,000,000 to save even more time.

    There's a bit of a misnomer that you may have taken. The Tally Table [font="Arial Black"]doesn't [/font]need the same number of rows as the table data that you're trying to split. It only needs the same number of rows as the largest number of bytes in a column that you're trying to split. Generally speaking, the largest datatype most folks need to split is VARCHAR(8000). With that in mind, the largest your Tally Table needs to be to be able to do splits on such a column is only 8000 numbers.

    And, thanks for the feedback on the article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ahhh yes i see. Thanks again Jeff your help is greatly appreciated!

    JS

  • Sichy (1/13/2011)


    Ahhh yes i see. Thanks again Jeff your help is greatly appreciated!

    JS

    You bet. Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Viewing 13 posts - 1 through 12 (of 12 total)

    You must be logged in to reply to this topic. Login to reply