Performance issue with the table having 40M rows

Question

Performance issue with the table having 40M rows

archana.mandiriappan

SSC Veteran

Points: 256
More actions
May 8, 2018 at 10:06 am

#410483

INSERT INTO #tt
SELECT DISTINCT dr.terr_grp_id,dr.terry_cd,dl.deal_line,dl.revs_no FROM deal_right dr
inner join deal_line dl on dr.deal_id=dl.deal_id and dr.revs_no=dl.revs_no
and dr.deal_line_no=dl.deal_line_no
inner join #deal dt on dt.deal_id=dl.deal_id and dt.revs_no=dl.revs_no
where dl.title_status_cd NOT IN('NS','REJ','W')
The above SQL is running for more than 20 mints.
#deal table will have around 19k records
deal_right will have around 40M records
deal_line is having around 8M records
Please help me in fine tuning the above SQL.
Please let me know if you need more info, so that I can post it.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

below86 SSChampion Points: 11538 More actions · Answer 1

You will need to give more details for the best answer. Look at what Indexes are on each table and provide that info. Table definitions, and some sample data would also help.

With that being said you are joining to #deal but you are not returning any rows from that table, based on your sample code I would say this is not needed.
Here is my stab in the dark at it. This is just making the 'deal_line' a sub query, hoping that the WHERE will drop the number of rows needed to join.
INSERT INTO #tt SELECT dr.terr_grp_id, dr.terry_cd, DL2.deal_line, DL2.revs_no FROM deal_right AS dr INNER JOIN (SELECT dl.deal_line, dl.revs_no, dl.deal_id, dl.deal_line_no FROM deal_line AS dl WHERE dl.title_status_cd NOT IN ('NS','REJ','W') GROUP BY dl.deal_line, dl.revs_no, dl.deal_id, dl.deal_line_no) AS DL2 ON dr.deal_id = DL2.deal_id AND dr.revs_no = DL2.revs_no AND dr.deal_line_no = DL2.deal_line_no GROUP BY dr.terr_grp_id, dr.terry_cd, DL2.deal_line, DL2.revs_no ;

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.

Grant Fritchey SSC Guru Points: 398908 More actions · Answer 2

Best of all, post the actual execution plan so we can understand how the optimizer is resolving your query and how well the statistics match your data.

One immediate point, why a DISTINCT operator? That's an aggregation operation and is frequently costly. Most data, if correctly structured with enforced referential constraints, shouldn't need it. That's the first thing I'd look to eliminate from the query to enhance performance.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

archana.mandiriappan SSC Veteran Points: 256 More actions · Answer 3

Dear Experts,
Index scan is happening for the below SQL. Is there a way to change it to index seek . Also attached the execution plan. avt_deal_restriction is having around 12M records and # lines have .6M records.
The final output is around 7500 after grouping. Please help me.

INSERT #terr_grp(terr_grp_id,deal_id,revs_no) SELECT a.terr_grp_id ,A.deal_id, A.revs_no FROM dbo.avt_deal_restriction AS a INNER JOIN #lines dt ON a.deal_id = dt.deal_id AND a.revs_no = dt.revs_no and a.deal_line_no=dt.deal_line_no WHERE NOT EXISTS (SELECT 1 FROM #terr_grp WHERE #terr_grp.terr_grp_id = a.terr_grp_id) group by a.terr_grp_id ,A.deal_id, A.revs_no

below86 SSChampion Points: 11538 More actions · Answer 4

archana.mandiriappan - Wednesday, May 9, 2018 6:08 AM
Dear Experts,
Index scan is happening for the below SQL. Is there a way to change it to index seek . Also attached the execution plan. avt_deal_restriction is having around 12M records and # lines have .6M records.
The final output is around 7500 after grouping. Please help me.
INSERT #terr_grp(terr_grp_id,deal_id,revs_no) SELECT a.terr_grp_id ,A.deal_id, A.revs_no FROM dbo.avt_deal_restriction AS a INNER JOIN #lines dt ON a.deal_id = dt.deal_id AND a.revs_no = dt.revs_no and a.deal_line_no=dt.deal_line_no WHERE NOT EXISTS (SELECT 1 FROM #terr_grp WHERE #terr_grp.terr_grp_id = a.terr_grp_id) group by a.terr_grp_id ,A.deal_id, A.revs_no

Wow, looks a lot different than original post, but OK.
1. Is there an index on terr_grp_id on the avt_deal table? Can you add one?
2. Have you tested without joining to the #lines? Is it faster? # rows returned? You may want to add an index to this temp table for the fields you are joining on.
3. Maybe try and eliminate the rows not needed from the avt_deal table before joining to #lines. (maybe the SQL is smart enough to do that, but i would test it)
4. How many rows on the #terr table? any indexes?

-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.

Grant Fritchey SSC Guru Points: 398908 More actions · Answer 5

So you're moving 3.3 million rows filtered down after the JOINs, etc., from 120 million rows. In short, you're moving close to 3% of the data. That's actually not usually selective enough to see a seek operation being useful in most cases. Selectivity to make a seek useful is usually down around .1% or less (although, sometimes, it's different, varies based on the volume, the query, etc., etc.). It's down to the optimizer choosing to use sequential I/O (scan) versus random I/O (seek). When dealing with large data volumes, the scan can work better. You are absolutely in a position of large data volumes.

The way the query is written, there is no filtering on the 120 million row table. You're counting on the aggregation of the other table on what is effectively just a join, as was already stated, to do the elimination. I agree with @below86. Try it using a JOIN.

Also, be sure the statistics are up to date on everything. That #terr_grp table only seems to have a single row.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 101118 More actions · Answer 6

As always, clustering key is the most critical thing.

Do you (almost) always specify the deal_id when reading the dbo.avt_deal_restriction table? If so, cluster that table on ( deal_id, revs_no, deal_line_no ) (or put deal_line_no first, depending on how you query the table and how many rows of each type there are).
Cluster the #lines table the same way: ( deal_id, revs_no, deal_line_no ).

That should give you a MERGE join, which is fastest when it's proper to use.

For #terr_grp, for just an exists, a nonclus index on terr_grp_id is fine.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 7

You're also using a GROUP BY as a replacement for DISTINCT. That usually means that the criteria is insufficient for the job.

Also, what you posted is an estimated execution plan, which could be a bit different from the actual execution plan. For performance problems, we need a bit more. Please see the article at the second link in my signature line below under "Helpful Links".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)