query performance vs server to server latency

  • I've tried running two different queries: one that was based on a much smaller query that didn't scale, and one that I thought would scale better, since it used a CTE to pre-process the data set to test. I've run both separately, and cancelled each after an hour. Both columns are varchar and neither is nullable. Running different portions of each query returns quick results. It's comparing ~56k rows to ~588k rows.

    The Thing That Shall Not Scale:

    select a1.sam_batch, count(*) as Records

    from aww_sample a1,

    [p42].sample.dbo.wireless_blocks p2

    where substring(a1.sam_phonenumber, 4, 9) = p2.match

    and a1.sam_batch between 437 and 449

    group by a1.sam_batch

    From Query To Eternity:

    with cc(sam_batch, phone) as (

    select sam_batch, SUBSTRING(sam_phonenumber, 4, 9)

    from aww_sample

    where sam_batch between 437 and 449

    )

    select cc.sam_batch, count(*)

    from cc,

    [p42].sample.dbo.wireless_blocks p2

    where cc.phone = p2.match

    group by cc.sam_batch

    So then I re-wrote things a touch and ran from the server with the larger table. It completed in 2 seconds (mostly) flat:

    with cc(sam_batch, phone) as (

    select sam_batch, SUBSTRING(sam_phonenumber, 4, 9)

    from [phlsql].capabilities.dbo.aww_sample

    where sam_batch between 437 and 449

    )

    select cc.sam_batch, count(*) as Records

    from cc,

    sample.dbo.wireless_blocks p2

    where cc.phone = p2.match

    group by cc.sam_batch

    Is there a smart SQL reason for this, or is the network the culprit?

  • erikd (10/9/2013)


    Is there a smart SQL reason for this, or is the network the culprit?

    Yes it could be either OR there are many reasons to explain your concern(actually surprise) here.

    Couple of days back, i have seen a post replied by paul white that how the art of writing (re-designing) of query can boost the performance.

    its basically how the data will be handled by memory buffer

    or table spooling/ work tables happeining in tempDB.

    how the index is taking help or retrieving data with the help of clustered index (key lookup and then sorting happening)

    So there are other factors too. i have spent time to search that post so that i can refer that here but couldnt retreive it . but i hope that either GAIL , JEFF or PAUL will hit this post and do the explanation. 🙂

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • /* ON THE SERVER phlsql */

    CREATE PROCEDURE dbo.FetchData

    AS

    SELECT sam_batch, SUBSTRING(sam_phonenumber, 4, 9) AS phone

    FROM capabilities.dbo.aww_sample

    WHERE sam_batch BETWEEN 437 AND 449;

    GO

    /* ON THE SERVER p42 */

    CREATE TABLE #Results (sam_batch [DATATYPE], phone VARCHAR(9));

    INSERT INTO #Results

    EXEC [phlsql].capabilities.dbo.FetchData;

    SELECT r.sam_batch, COUNT(*) as Records

    FROM #Results r

    INNER JOIN sample.dbo.wireless_blocks p2

    ON r.phone = p2.match

    GROUP BY r.sam_batch;

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Sean Pearce (10/10/2013)


    /* ON THE SERVER phlsql */

    CREATE PROCEDURE dbo.FetchData

    AS

    SELECT sam_batch, SUBSTRING(sam_phonenumber, 4, 9) AS phone

    FROM capabilities.dbo.aww_sample

    WHERE sam_batch BETWEEN 437 AND 449;

    GO

    /* ON THE SERVER p42 */

    CREATE TABLE #Results (sam_batch [DATATYPE], phone VARCHAR(9));

    INSERT INTO #Results

    EXEC [phlsql].capabilities.dbo.FetchData;

    SELECT r.sam_batch, COUNT(*) as Records

    FROM #Results r

    INNER JOIN sample.dbo.wireless_blocks p2

    ON r.phone = p2.match

    GROUP BY r.sam_batch;

    This is probably the best solution because network traffic to support the join is minimised. There's another way to do it, using a join hint to send the small table to the server hosting the big table:

    SELECT

    a1.sam_batch,

    COUNT(*) as Records

    FROM [p42].sample.dbo.wireless_blocks p2

    INNER REMOTE JOIN aww_sample a1

    ON SUBSTRING(a1.sam_phonenumber, 4, 9) = p2.match

    AND a1.sam_batch BETWEEN 437 AND 449

    GROUP BY a1.sam_batch


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Thanks, Sean and Chris. It was a one off query, otherwise I would have probably moved some stuff around. But it got me thinking about what the heck was taking so long.

    That's a neat trick with the remote hint. Another simple thing I've seen on here that I'm mad at myself for not knowing.

    I'm going to try it out with another query that kind of gets beat up by comparing to another remote table. I was, as a novice looking at execution plans, sort of surprised to find the two remote queries taking up 49% of the entire plan a piece; mostly because I thought parts of the query were bad enough to take up more than 1% 😉

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply