Query taking 6 hrs and counting (HELP!)- how to concatenate rows into single string

Question

Query taking 6 hrs and counting (HELP!)- how to concatenate rows into single string

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110208 More actions · Answer 1

Here was the second part:

Sergiy (6/8/2016)
Declare @basedata ....
......
select email ,
STUFF((Select ','+project_name
from @basedata as T1
where T1.email=T2.email
FOR XML PATH('')),1,1,'') from @basedata as T2
GROUP BY email;

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 2

And then was an instruction, right below the second part:

Then run it altogether.
Please note - I used GROUP BY instead of DISTINCT.
You should do it too.

"run it altogether" means place those 2 parts into 1 batch, one after another, and run it in 1 go.

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 3

Luis Cazares (6/8/2016)
There's something going on with your server. Taking 16 hours to complete a single query is insane.
We're assuming this is a table, but if it's a view, that could be another problem. I just run the query on my laptop creating almost 8 million rows and the query completed in one minute.
We need your help to get more information on where the problem might be. Phil asked for the execution plan and Kevin asked you to run sp_whoisactive. Those are good starts to find the root cause of your problem.

There is nothing wrong with his server.

His query completed in 46 seconds too.

The problem is with DISTINCT.

It's applied to the set returned by SELECT.

It means - he's building an XML for all how many millions non-unique email addresses recorded in that "log" table, and then ask the server to return distinct strings.

With GROUP BY XML's are built only once per each unique email.

Much less work to do for the server.

_____________
Code for TallyGenerator

Luis Cazares SSC Guru Points: 183706 More actions · Answer 4

I mentioned that because he stated the following:

Thanks, that works abit faster but its still very slow. 16hrs and no where near done

Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sergiy SSC Guru Points: 110208 More actions · Answer 5

Luis Cazares (6/8/2016)
I mentioned that because he stated the following:
Thanks, that works abit faster but its still very slow. 16hrs and no where near done
Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.

Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?

It can easily put the server out of memory. And after that everything becomes soooo slooooow.

And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.

2.6 millions scans against 2.6 millions records each.

6 hours don't look so bad, really.

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110208 More actions · Answer 6

Sergiy (6/8/2016)
Luis Cazares (6/8/2016)
I mentioned that because he stated the following:
Thanks, that works abit faster but its still very slow. 16hrs and no where near done
Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.
Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?
It can easily put the server out of memory. And after that everything becomes soooo slooooow.
And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.
2.6 millions scans against 2.6 millions records each.
6 hours don't look so bad, really.

Look at the original query:

select distinct email ,
STUFF((Select ','+project_name
from dbo.[UMG sent 2016] as T1
where T1.email=T2.email
FOR XML PATH('')),1,1,'') from dbo.[UMG sent 2016] as T2;

There is no distinction of any kind for project_name in the subquery.

Same name will be added to the string as many times as how many times that project was reported to that email recipient over the year.

Can you imagine the length of those strings?

_____________
Code for TallyGenerator

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

Sergiy (6/8/2016)
Luis Cazares (6/8/2016)
I mentioned that because he stated the following:
Thanks, that works abit faster but its still very slow. 16hrs and no where near done
Now, even with the DISTINCT, it shouldn't perform that bad. My bet is that the main problem is not in the code, but something we can't see.
Executing 2.6 millions correlated subqueries to build 2.6 millions XML strings?
It can easily put the server out of memory. And after that everything becomes soooo slooooow.
And assuming indexing is "not perfect" for this kind of queries, we're are looking on a table scan for each correlated subquery.
2.6 millions scans against 2.6 millions records each.
6 hours don't look so bad, really.

Except when my 4GB Intel Core i5 laptop would complete the 8 million rows query in 35 minutes. I would agree that sample data and workload aren't representative of the server, but neither should be the capabilities.

In this same laptop, the GROUP BY option takes 4 minutes (the previous test was on my work laptop) just to give some perspective.

I'm afraid that the table might be a complex view that would slow things down horribly. Another option is that the table is too wide and there's no proper index, so that causes slow table scans. Or something is taking all the memory from the server and the query won't finish. Or many other options that are just guesses right now.

I understand your point, but this problem doesn't seem to be just on the code (at least not the code we have seen).

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sergiy SSC Guru Points: 110208 More actions · Answer 8

Luis Cazares (6/8/2016)
but this problem doesn't seem to be just on the code (at least not the code we have seen).

May be, may be not.

Hard to tell from here.

That's why I made up that query populating a table variable.

It excludes all the text manipulations on denormalised data, and prepares the clean data set for further string concatenations.

If it would be still slow - yes that would clearly indicate the issues with the server.

But it took 46 seconds out of badly indexed 2.6 millions rows - well, could be better I must admit, but still acceptable.

I'm pretty sure that the further string concatenations out of the table variable will take not more than several seconds. Dropping the execution time from 6 hours down to 1 minute without doing anything with the environment settings must be a good hint that the biggest part of the problem was actually in the code.

_____________
Code for TallyGenerator