October 9, 2013 at 10:48 am
I've tried running two different queries: one based on a much smaller query that didn't scale, and one that I thought would scale better, since it uses a CTE to pre-process the data set being tested. I've run both separately and cancelled each after an hour. Both join columns are varchar and neither is nullable. Running different portions of each query returns quick results. It's comparing ~56k rows to ~588k rows.
The Thing That Shall Not Scale:
select a1.sam_batch, count(*) as Records
from aww_sample a1,
[p42].sample.dbo.wireless_blocks p2
where substring(a1.sam_phonenumber, 4, 9) = p2.match
and a1.sam_batch between 437 and 449
group by a1.sam_batch
From Query To Eternity:
with cc(sam_batch, phone) as (
select sam_batch, SUBSTRING(sam_phonenumber, 4, 9)
from aww_sample
where sam_batch between 437 and 449
)
select cc.sam_batch, count(*)
from cc,
[p42].sample.dbo.wireless_blocks p2
where cc.phone = p2.match
group by cc.sam_batch
So then I re-wrote things a touch and ran it from the server with the larger table. It completed in 2 seconds (mostly) flat:
with cc(sam_batch, phone) as (
select sam_batch, SUBSTRING(sam_phonenumber, 4, 9)
from [phlsql].capabilities.dbo.aww_sample
where sam_batch between 437 and 449
)
select cc.sam_batch, count(*) as Records
from cc,
sample.dbo.wireless_blocks p2
where cc.phone = p2.match
group by cc.sam_batch
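For reference, the same idea can be made explicit with OPENQUERY run from the p42 side: the quoted text is sent to [phlsql] as-is, so only the filtered (sam_batch, phone) pairs come back over the wire. This is just a sketch using the same object names as above, not something I've timed:
SELECT cc.sam_batch, COUNT(*) AS Records
FROM OPENQUERY([phlsql],
    'SELECT sam_batch, SUBSTRING(sam_phonenumber, 4, 9) AS phone
     FROM capabilities.dbo.aww_sample
     WHERE sam_batch BETWEEN 437 AND 449') AS cc
INNER JOIN sample.dbo.wireless_blocks p2
    ON cc.phone = p2.match
GROUP BY cc.sam_batch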
Is there a smart SQL reason for this, or is the network the culprit?
October 10, 2013 at 3:46 am
erikd (10/9/2013)
Is there a smart SQL reason for this, or is the network the culprit?
Yes, it could be either; there are many factors that could explain your concern (actually, surprise) here.
A couple of days back I saw a post where Paul White explained how the way a query is written (or re-designed) can boost performance.
It basically comes down to how the data is handled in the buffer pool, whether table spooling / worktables happen in tempdb, and how much help the indexes give when retrieving the data (for example a clustered index with key lookups followed by a sort).
So there are other factors too. I spent some time searching for that post so I could refer to it here, but couldn't retrieve it. I hope that either Gail, Jeff or Paul will hit this post and do the explanation. 🙂
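A quick way to see which of those factors is in play (just a sketch; run it on the server that issues the slow query, with the actual execution plan turned on):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the slow query from the first post here ...

-- In the actual plan, a Remote Query / Remote Scan operator returning all
-- ~588k rows from [p42], or heavy Worktable reads in the STATISTICS IO output,
-- points at the linked-server transfer or spooling rather than the local join.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;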
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
October 10, 2013 at 6:17 am
/* ON THE SERVER phlsql */
CREATE PROCEDURE dbo.FetchData
AS
SELECT sam_batch, SUBSTRING(sam_phonenumber, 4, 9) AS phone
FROM capabilities.dbo.aww_sample
WHERE sam_batch BETWEEN 437 AND 449;
GO
/* ON THE SERVER p42 */
CREATE TABLE #Results (sam_batch [DATATYPE], phone VARCHAR(9));
INSERT INTO #Results
EXEC [phlsql].capabilities.dbo.FetchData;
SELECT r.sam_batch, COUNT(*) as Records
FROM #Results r
INNER JOIN sample.dbo.wireless_blocks p2
ON r.phone = p2.match
GROUP BY r.sam_batch;
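One caveat with the INSERT ... EXEC step: the linked server definition for phlsql on p42 needs RPC Out enabled, or the remote procedure call will fail. A sketch, assuming the option is still at its default of off:
-- Run on p42; allows outgoing remote procedure calls through the phlsql linked server.
EXEC sp_serveroption @server = 'phlsql', @optname = 'rpc out', @optvalue = 'true';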
October 10, 2013 at 7:04 am
Sean Pearce (10/10/2013)
/* ON THE SERVER phlsql */
CREATE PROCEDURE dbo.FetchData
AS
SELECT sam_batch, SUBSTRING(sam_phonenumber, 4, 9) AS phone
FROM capabilities.dbo.aww_sample
WHERE sam_batch BETWEEN 437 AND 449;
GO
/* ON THE SERVER p42 */
CREATE TABLE #Results (sam_batch [DATATYPE], phone VARCHAR(9));
INSERT INTO #Results
EXEC [phlsql].capabilities.dbo.FetchData;
SELECT r.sam_batch, COUNT(*) as Records
FROM #Results r
INNER JOIN sample.dbo.wireless_blocks p2
ON r.phone = p2.match
GROUP BY r.sam_batch;
This is probably the best solution because network traffic to support the join is minimised. There's another way to do it, using a join hint to send the small table to the server hosting the big table:
SELECT
    a1.sam_batch,
    COUNT(*) AS Records
FROM aww_sample a1
INNER REMOTE JOIN [p42].sample.dbo.wireless_blocks p2
    ON SUBSTRING(a1.sam_phonenumber, 4, 9) = p2.match
WHERE a1.sam_batch BETWEEN 437 AND 449
GROUP BY a1.sam_batch
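The REMOTE hint asks for the join to be performed at the site of the right-hand table, which is why the small local aww_sample table sits on the left and the [p42] table on the right; Books Online also notes it should only be used when the left input has fewer rows than the right, which fits here (~56k vs ~588k rows).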
October 10, 2013 at 8:33 am
Thanks, Sean and Chris. It was a one-off query, otherwise I would probably have moved some things around. But it got me thinking about what the heck was taking so long.
That's a neat trick with the remote hint. Another simple thing I've seen on here that I'm mad at myself for not knowing.
I'm going to try it out with another query that gets beaten up a bit by comparisons against another remote table. As a novice looking at execution plans, I was somewhat surprised to find the two remote queries each taking up 49% of the entire plan; mostly because I thought other parts of the query were bad enough to take up more than 1% 😉
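(Worth noting: those operator percentages are derived from the optimizer's estimated costs, even in an "actual" plan, so a remote query showing 49% on paper can behave very differently at run time.)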