Query taking 6 hrs and counting (HELP!)- how to concatenate rows into single string

  • My table looks like this

    email project_name

    test@test.com lab1

    test@test.com lab2

    test@test.com lab3

    test@test.com lab4

    trial@test.com rm1

    trial@test.com rm2

    trial@test.com rm3

    But want it to look like this

    email project_name

    test@test.com lab1, lab2, lab3, lab4

    trial@test.com rm1,rm2,rm3,

    My query looks like this, how can I make it faster? This query has been running for 6hrs already

    select distinct email ,

    STUFF((Select ','+project_name

    from dbo.[UMG sent 2016] as T1

    where T1.email=T2.email

    FOR XML PATH('')),1,1,'') from dbo.[UMG sent 2016] as T2;

    FYI- my table has 2.6m records

  • Please post a copy of the execution plan.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • run sp_whoisactive to see how much work it has been doing, if blocking is happening, what it is waiting on (my guess is cxpacket or some form of IO), use the delay option to see work done in 60 seconds, get the plan, etc, etc.

    sp_whoisactive can be found on sqlblog.com and there is a 30-day series about how to use it. It is self-documented and totally awesomesauce!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I would guess that blocking is happening somewhere in your db. Maybe there's an open transaction. Even with 2.6m rows, it shouldn't take hours to complete.

    However, this should be faster as the concatenation happens once per group instead of once per row.

    SELECT email,

    STUFF((SELECT ','+project_name

    FROM dbo.[UMG sent 2016] as T1

    WHERE T1.email=T2.email

    FOR XML PATH('')),1,1,'')

    FROM dbo.[UMG sent 2016] as T2

    GROUP BY email;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Not sure if this will be any better, but it is what I would have written.

    with basedata as (

    select distinct

    email

    from

    dbo.[UMG sent 2016]

    )

    select

    bd.email,

    ca.Projects

    from

    basedata bd

    cross apply (select STUFF((select ',' + project_name

    from dbo.[UMG sent 2016] as T1

    where T1.email = bd.email

    for xml path(''),TYPE).value('.','varchar(max)'),1,1,''))ca(Projects);

    Also, I would check your indexing on the table, especially if this query is run often.

  • How long does it take to run this?

    DECLARE @basedata table(

    email NVARCHAR(200) COLLATE DATABASE_DEFAULT,

    project_name VARCHAR(200) COLLATE DATABASE_DEFAULT

    PRIMARY KEY (email, project_name)

    )

    INSERT INTO @basedata ( email, project_name )

    SELECT email, project_name

    FROM dbo.[UMG sent 2016]

    WHERE email > '' AND project_name > ''

    GROUP BY email, project_name

    ORDER BY email, project_name

    _____________
    Code for TallyGenerator

  • you can give the complete query and execution plan so we can provide bit more hand full information.

  • That took 46 seconds.

    I tried select * from @basedata, but it didn't work. It says 'must declare the table variable @basedata'. I'm new to sql so apologies if this is a silly question

  • Thanks, that works abit faster but its still very slow. 16hrs and no where near done

  • IA123 (6/8/2016)


    That took 46 seconds.

    I tried select * from @basedata, but it didn't work. It says 'must declare the table variable @basedata'. I'm new to sql so apologies if this is a silly question

    Assuming you are referring to Sergiy's query, then that's roughly how long your query might be expected to take. Since your query is taking many times longer than this, then you should assume that the pseudocode you show in your first post isn't fully representative of your actual query. Now might be a good time to post it.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • IA123 (6/8/2016)


    That took 46 seconds.

    I tried select * from @basedata, but it didn't work. It says 'must declare the table variable @basedata'. I'm new to sql so apologies if this is a silly question

    Table variables exist within a scope of the same batch.

    Just like any other variable.

    When execution is completed all variables, including table variables, are cleared.

    Your solution is ready.

    Just take your query, replace your table name with table variable and place it right after my query which took 46 seconds.

    Declare @basedata ....

    ......

    select email ,

    STUFF((Select ','+project_name

    from @basedata as T1

    where T1.email=T2.email

    FOR XML PATH('')),1,1,'') from @basedata as T2

    GROUP BY email;

    Then run it altogether.

    Please note - I used GROUP BY instead of DISTINCT.

    You should do it too.

    _____________
    Code for TallyGenerator

  • 1) Did you do what I said about sp_whoisactive??

    2) How much memory do you have on this sql server? IIRC XML can be a MASSIVE consumer of memory and if you don't have enough you will be spooling to disk (or you could just be waiting for a memory grant).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • There's something going on with your server. Taking 16 hours to complete a single query is insane.

    We're assuming this is a table, but if it's a view, that could be another problem. I just run the query on my laptop creating almost 8 million rows and the query completed in one minute.

    We need your help to get more information on where the problem might be. Phil asked for the execution plan and Kevin asked you to run sp_whoisactive. Those are good starts to find the root cause of your problem.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • error says table variable must declare can you share the full script will be easy to find where this missing to get the reference .

  • Here was the first part:

    Sergiy (6/7/2016)


    How long does it take to run this?

    DECLARE @basedata table(

    email NVARCHAR(200) COLLATE DATABASE_DEFAULT,

    project_name VARCHAR(200) COLLATE DATABASE_DEFAULT

    PRIMARY KEY (email, project_name)

    )

    INSERT INTO @basedata ( email, project_name )

    SELECT email, project_name

    FROM dbo.[UMG sent 2016]

    WHERE email > '' AND project_name > ''

    GROUP BY email, project_name

    ORDER BY email, project_name

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply