Performance issue with the table having 40M rows

  • INSERT INTO #tt
    SELECT DISTINCT dr.terr_grp_id,dr.terry_cd,dl.deal_line,dl.revs_no FROM deal_right dr
    inner join deal_line dl on dr.deal_id=dl.deal_id and dr.revs_no=dl.revs_no
    and dr.deal_line_no=dl.deal_line_no
    inner join #deal dt on dt
    .deal_id=dl.deal_id and dt.revs_no=dl.revs_no
    where dl.title_status_cd NOT IN('NS','REJ','W')

    The above SQL is running for more than 20 mints.
    #deal table will have around 19k records
    deal_right will have around 40M records
    deal_line is having around 8M records

    Please help me in fine tuning the above SQL.
    Please let me know if you need more info, so that I can post it.

  • You will need to give more details for the best answer.  Look at what Indexes are on each table and provide that info.  Table definitions, and some sample data would also help.

    With that being said you are joining to #deal but you are not returning any rows from that table, based on your sample code I would say this is not needed.
    Here is my stab in the dark at it.  This is just making the 'deal_line' a sub query, hoping that the WHERE will drop the number of rows needed to join.
    INSERT INTO #tt
    SELECT dr.terr_grp_id, dr.terry_cd, DL2.deal_line, DL2.revs_no
    FROM deal_right AS dr
        INNER JOIN (SELECT dl.deal_line, dl.revs_no, dl.deal_id, dl.deal_line_no
                    FROM deal_line AS dl
                    WHERE dl.title_status_cd NOT IN ('NS','REJ','W')
                    GROUP BY dl.deal_line, dl.revs_no, dl.deal_id, dl.deal_line_no) AS DL2
            ON dr.deal_id = DL2.deal_id
            AND dr.revs_no = DL2.revs_no
            AND dr.deal_line_no = DL2.deal_line_no
    GROUP BY dr.terr_grp_id, dr.terry_cd, DL2.deal_line, DL2.revs_no
    ;

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • Best of all, post the actual execution plan so we can understand how the optimizer is resolving your query and how well the statistics match your data.

    One immediate point, why a DISTINCT operator? That's an aggregation operation and is frequently costly. Most data, if correctly structured with enforced referential constraints, shouldn't need it. That's the first thing I'd look to eliminate from the query to enhance performance.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Dear Experts, 
                              Index scan is happening for the below SQL. Is there a way to change it to index seek . Also attached the execution plan. avt_deal_restriction is having around 12M records and # lines have .6M records. 
    The final output is around 7500 after grouping. Please help me. 

    INSERT #terr_grp(terr_grp_id,deal_id,revs_no)
       SELECT
        a.terr_grp_id
             ,A.deal_id,
             A.revs_no
       FROM dbo.avt_deal_restriction AS a
       INNER JOIN #lines dt
        ON a.deal_id = dt.deal_id
        AND a.revs_no = dt.revs_no
             and a.deal_line_no=dt.deal_line_no
       WHERE NOT EXISTS (SELECT
        1
       FROM #terr_grp
       WHERE #terr_grp.terr_grp_id = a.terr_grp_id)
            group by a.terr_grp_id
             ,A.deal_id,
             A.revs_no

  • archana.mandiriappan - Wednesday, May 9, 2018 6:08 AM

    Dear Experts, 
                              Index scan is happening for the below SQL. Is there a way to change it to index seek . Also attached the execution plan. avt_deal_restriction is having around 12M records and # lines have .6M records. 
    The final output is around 7500 after grouping. Please help me. 

    INSERT #terr_grp(terr_grp_id,deal_id,revs_no)
       SELECT
        a.terr_grp_id
             ,A.deal_id,
             A.revs_no
       FROM dbo.avt_deal_restriction AS a
       INNER JOIN #lines dt
        ON a.deal_id = dt.deal_id
        AND a.revs_no = dt.revs_no
             and a.deal_line_no=dt.deal_line_no
       WHERE NOT EXISTS (SELECT
        1
       FROM #terr_grp
       WHERE #terr_grp.terr_grp_id = a.terr_grp_id)
            group by a.terr_grp_id
             ,A.deal_id,
             A.revs_no

    Wow, looks a lot different than original post, but OK.
    1.  Is there an index on terr_grp_id on the avt_deal table?  Can you add one?
    2.  Have you tested without joining to the #lines?  Is it faster? # rows returned?  You may want to add an index to this temp table for the fields you are joining on.
    3.  Maybe try and eliminate the rows not needed from the avt_deal table before joining to #lines. (maybe the SQL is smart enough to do that, but i would test it)
    4.  How many rows on the #terr table?  any indexes?

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • So you're moving 3.3 million rows filtered down after the JOINs, etc., from 120 million rows. In short, you're moving close to 3% of the data. That's actually not usually selective enough to see a seek operation being useful in most cases. Selectivity to make a seek useful is usually down around .1% or less (although, sometimes, it's different, varies based on the volume, the query, etc., etc.). It's down to the optimizer choosing to use sequential I/O (scan) versus random I/O (seek). When dealing with large data volumes, the scan can work better. You are absolutely in a position of large data volumes.

    The way the query is written, there is no filtering on the 120 million row table. You're counting on the aggregation of the other table on what is effectively just a join, as was already stated, to do the elimination. I agree with @below86. Try it using a JOIN.

    Also, be sure the  statistics are up to date on everything.  That  #terr_grp  table only seems to have a single row.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • As always, clustering key is the most critical thing.

    Do you (almost) always specify the deal_id when reading the dbo.avt_deal_restriction table?  If so, cluster that table on ( deal_id, revs_no, deal_line_no ) (or put deal_line_no first, depending on how you query the table and how many rows of each type there are). 
    Cluster the #lines table the same way: ( deal_id, revs_no, deal_line_no ).

    That should give you a MERGE join, which is fastest when it's proper to use.

    For #terr_grp, for just an exists, a nonclus index on terr_grp_id is fine.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You're also using a GROUP BY as a replacement for DISTINCT.  That usually means that the criteria is insufficient for the job.

    Also, what you posted is an estimated execution plan, which could be a bit different from the actual execution plan.  For performance problems, we need a bit more.  Please see the article at the second link in my signature line below under "Helpful Links".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply