July 28, 2015 at 6:00 am
I've got to say the first thing I thought as I got halfway through the article was "my god they need some indexes on those tables". You'd be surprised how many application developers and support analysts don't realise the importance of indexes on database tables. We had a new system that was performing really slowly, I wan't supporting it another team was, I tried to persuade the support team that they needed to add some indexes to the tables, I even offer to assist them, to get anything like a reasonable response, there reply was they didn't think it would work and then they ignored me after that. Their solution was to rewrite the system at a cost of about £2 million and the result was a non-performing system, again! (The CEO was replaced as a lot of the projects were over budget mainly due to this one, I think). I've advised them again that they need some indexes they have started out adding some to help some reports and to their complete amazement the runtimes have come down from several minutes, where the web-page times out, to about 3 seconds. It can be an expensive mistake if you don't get people who know what they are doing on a system.
July 28, 2015 at 6:35 am
Jonathan AC Roberts (7/28/2015)
I've got to say the first thing I thought as I got halfway through the article was "my god they need some indexes on those tables". You'd be surprised how many application developers don't realise the importance of indexes on their database tables. We had a new system that was performing really slowly, I wan't supporting it another team was, I tried to persuade the support team that they needed to add some indexes to the tables to get anything like a reasonable response, there reply was they didn't think it would work and then they ignored me after that. Their solution was to rewrite the system at a cost of about £2 million and the result was a non-performing system, again! (The CEO was replaced as a lot of the projects were over budget partly due to this one, I think). I've advised them again that they need some indexes they have started out adding some to help some reports and to their complete amazement the runtimes have come down from several minutes where the web-page times out timing out to about 3 seconds. It can be an expensive mistake if you don't get people who know what they are doing on a system.
I agree with you Jonathan, good indexing is a must when working with RBMS. However, as you can see, in this case the fastest code works fine with or without indexes for the simple reason that it updates the whole table. A query such as
UPDATE g SET
OriginalGroupId = @GroupId,
OriginalMemberId = @MemberId
FROM Groups g
WHERE ClientId = @ClientId
Receives a great boost from the index because it won't have to scan the whole table to update a single row.
Basically, you need both, good querying and a good design which includes appropriate indexes.
July 28, 2015 at 6:52 am
Now I tested the third command
Here are the results:
--groups = problematic solution
--groups2= set-based loop
--groups3= set-based solution
--set-based loop vs. set-based solution (225628 Zeile(n) betroffen
select * from groups2 g2 inner join groups3 g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId
--set-based loop vs. problematic solution (56358 Zeile(n) betroffen
select * from groups2 g2 inner join groups g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId
--set-based solution vs. problematic solution (213166 Zeile(n) betroffen
select * from groups3 g2 inner join groups g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId
Here you can see what is wrong:
--set-based loop vs. set-based solution (225628 Zeile(n) betroffen
select * from groups2 g2 inner join groups3 g1 on g1.clientid =g2.clientId where g1.OriginalMemberId<>g2.OriginalMemberId
and g1.OriginalGroupId =g1.GroupId
1788922146965093881788922146962146962
July 28, 2015 at 6:54 am
Michael Meierruth (7/28/2015)
Luis,Why doesn't the GroupsHistory table have a ClientId column?
Why this wierd hierarchy - which it really isn't? It's more like a series.
To answer your first question, that table doesn't have a ClientId because of bad design and it was a real problem that we had to solve. With a good design, this process shouldn't be complicated at all.
This is not really a hierarchy, it's more a change log. However, it has the same structure as one of the most common representations for hierarchies: the Adjacency list.
July 28, 2015 at 6:58 am
aswin.iyer91 (7/27/2015)
Hi Luis,It was a interesting article though highlighting the perception that has been taken up for the RBAR and set based data traversing in SQL. The major thing which i found highlighting is the Indexing that has been done on the table which makes the data parsing quicker which by default since the traverse will get the necessary information from the non clustered indexes, so ideally speaking is that any indexed huge data will support quick data access even RBAR is implemented?
Correct me if i am wrong.
Thanks
With warm Regards
Aswin Sankar M
Yes, indexing will help, but it won't be as good as having a good set-based solution that can run several times faster. In a previous reply, I show which is the main command benefited by the index in the RBAR solution and the reason the index didn't benefit that much the other two.
July 28, 2015 at 7:09 am
karlheinz.ulonska (7/28/2015)
Thanks for your great article.:-)I have a problem with three “While”-loops and was searching for a better solution.
Instead of using the “The problematic solution” now I use your “Set-based Loop”. Thanks for bringing this Idea back in my mind.:-)
But I have some trouble with your article.
First Point: (it deals with the comment from sam.dahl (old Hand))
Code:
select COUNT ([Level]), MemberId, GroupId from GroupsHistory
group by MemberId, GroupId
order by 1 desc
I got some combinations MemberID, GroupId, where Count was greater 1.
Thank you for your comments.
I'm checking your comments, but I'm not sure if you're using the data generator from the article or the one from this forum. The code from the article has a mistake that generates a many to many relationship which causes several problems.
July 28, 2015 at 7:20 am
Hi Luis,
in my first post I use the old data.
in all other posts I use the new script.
So my first problem is solved, because I use of outdated data.
But the second is still actual.The three algorithms have different resultsets.
"Problematic solution" always stops after one step.
"set based solution" have sometimes the same groupid = originalgroupid and memberid = originalmemberid
"set-based loop" works
with greetings
Karl-Heinz
July 28, 2015 at 7:23 am
GoofyGuy (7/27/2015)
Terrific article, and thanks for it! It nicely shows how plan optimisation influences run-times and efficiency.
I agree, nicely done.
July 28, 2015 at 3:06 pm
Nice Article!
I'm defnitly trying this technique 🙂
Grtz,
Theo
July 28, 2015 at 5:58 pm
Great article.
I want to thank you for something else though. You put all of your scripts into a file and included it rather than leaving it to be copied and pasted. Wonderfully considerate of you. Thanks again.
ATBCharles Kincaid
July 31, 2015 at 9:27 pm
Very nicely done, Luis. I like detailed approach you took to the article as a whole and the performance data on each solution.
I agree with your conclusion that a cursor itself is not inherently evil when used to loop through groups of set-based updates. I've used it myself after testing against several alternatives.
Again, well done.
August 10, 2015 at 8:30 am
Nice Article,
there is a small mistake in the code though.
WITH rCTE AS(
SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId
FROM hft.Groups g
JOIN hft.GroupsHistory h ON g.GroupId = h.GroupId AND g.MemberId = h.MemberId
UNION ALL
SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId
FROM rCTE g
JOIN hft.GroupsHistory h ON g.PreviousGroupId = h.GroupId AND g.PreviousMemberId = h.MemberId
)
should be as follows:
WITH rCTE AS(
SELECT g.ClientId, g.GroupId, g.MemberId, h.PreviousGroupId, h.PreviousMemberId
FROM hft.Groups g
JOIN hft.GroupsHistory h ON g.GroupId = h.GroupId AND g.MemberId = h.MemberId
UNION ALL
SELECT g.ClientId, h.GroupId, h.MemberId, h.PreviousGroupId, h.PreviousMemberId
FROM rCTE g
JOIN hft.GroupsHistory h ON g.PreviousGroupId = h.GroupId AND g.PreviousMemberId = h.MemberId
)
August 13, 2015 at 12:27 pm
Great article Luis! 5 Stars sir! I just finished reading it again and it's an excellent article; I can't wait to see more (you are going to write more articles right?).
I really learned something and love the examples you put together. I have actually used what I learned in this article a few times since first reading it. The concept of a set-based loop is new to me and there's really not much out there about it; it's good to have something to refer people to.
Again, great work Luis.
-- Itzik Ben-Gan 2001
August 13, 2015 at 12:56 pm
Thank you all for the nice comments.
You're right Hassan, my code is wrong and that might be the reason for not getting the same results on the 2 approaches. I must admit that I didn't test the correctness of the results for the rCTE and the RBAR approach as those didn't give the adequate performance.
Alan, I'd love to write more articles but I'm missing ideas on good topics. I'll keep an eye on the Articles requested forum to try something.
August 31, 2015 at 3:58 am
Thanks Luis, for a well written article.
There are a couple of things to note when using this technique (I use set based loops regularly) that I think are worth re-iterating here:
1. try to make your WHILE statement perform ONE UPDATE or INSERT - in a similar way to an iTVF, they perform much better with one statement in the WHILE, with no BEGIN..END construct.
2. Watch out for halloween protection spoiling the party if the UPDATE/INSERT is reading from the same table/index it is updating/inserting - you may see this in the form of Spools in the execution plan - they will significantly affect performance.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 16 through 30 (of 49 total)
You must be logged in to reply to this topic. Login to reply