April 3, 2022 at 12:47 am
Overview:
Table D (20 mil records, 200 columns)
Table S (400k records, 4 columns)
on coalesce(D.[O_NAME_04],'') = coalesce(S.[O_NAME_04],'')
And coalesce(D.[O_NAME_05],'') = coalesce(S.[O_NAME_05],'')
And coalesce(D.[O_NAME_06],'') = coalesce(S.[O_NAME_06],'');
All the above columns may contain null values
This query takes at least 8-9 hours to produce 355,753,582 rows.
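For context, the ON clause above presumably belongs to a join shaped roughly like this (only the ON clause and the table sizes are from the post; the select list, schema, and table names are assumptions):

```sql
-- Hedged sketch of the offending join; select list and table names assumed.
SELECT D.*, S.*
FROM dbo.D AS D            -- 20M rows, 200 columns
JOIN dbo.S AS S            -- 400k rows, 4 columns
  ON  COALESCE(D.[O_NAME_04], '') = COALESCE(S.[O_NAME_04], '')
  AND COALESCE(D.[O_NAME_05], '') = COALESCE(S.[O_NAME_05], '')
  AND COALESCE(D.[O_NAME_06], '') = COALESCE(S.[O_NAME_06], '');
```

Note that wrapping the join columns in COALESCE makes the predicates non-SARGable (indexes on the raw columns can't be seeked) and, as discussed below, treats NULL and '' as the same matching value.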
About the environment
Microsoft SQL Server 2016 (SP1) (KB3182545) - 13.0.4001.0 (X64) Oct 28 2016 18:17:30 Copyright (c) Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2012 R2 Standard 6.3 <X64> (Build 9600: ) (Hypervisor)
64GB RAM
HD: TBs of free space
What I have done so far:
1. To avoid the COALESCE function, I introduced 3 new columns (bigint)
2. Created a clustered index on them
3. Joined on these 3 new columns
Still no change in performance; if anything, it is taking more time.
Any suggestions to cut down the run time?
Thanks in advance
RR
April 3, 2022 at 1:45 am
All those NULLs and blanks that you're joining on? Yeah... guess what they're doing? A MASSIVE accidental CROSS JOIN, more politely called a "many-to-many" join, on those alone.
Second, are the name "triplets" in either of the two tables actually unique within those tables?
Other than that, without you posting the CREATE TABLE and CREATE INDEX statements for both tables, telling us what you're trying to do, and posting the rest of the code for the offending query, there's not much else we can do to help.
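A quick way to size the problem (column names from the post; table names assumed): count how many rows on each side have all three keys NULL or blank. The join then produces (D blanks) × (S blanks) rows from the blank "key" alone.

```sql
-- How many rows on each side have all three join keys NULL or blank?
SELECT 'D' AS Tbl, COUNT(*) AS BlankKeyRows
FROM dbo.D
WHERE COALESCE([O_NAME_04], '') = ''
  AND COALESCE([O_NAME_05], '') = ''
  AND COALESCE([O_NAME_06], '') = ''
UNION ALL
SELECT 'S', COUNT(*)
FROM dbo.S
WHERE COALESCE([O_NAME_04], '') = ''
  AND COALESCE([O_NAME_05], '') = ''
  AND COALESCE([O_NAME_06], '') = '';
```

For example, if 10,000 rows in D and 300 rows in S have all-blank keys, those rows alone contribute 3,000,000 rows to the result.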
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2022 at 11:54 am
Create two temporary tables which are copies of D and S but with the additional columns of [O_NAME_04_NOT_NULL], [O_NAME_05_NOT_NULL], [O_NAME_06_NOT_NULL] , then add a clustered index on the name columns.
e.g. for table D:
SELECT *,
coalesce(D.[O_NAME_04],'') AS [O_NAME_04_NOT_NULL],
coalesce(D.[O_NAME_05],'') AS [O_NAME_05_NOT_NULL],
coalesce(D.[O_NAME_06],'') AS [O_NAME_06_NOT_NULL]
INTO #D
FROM D;
CREATE CLUSTERED INDEX IX_#D_1 on #D([O_NAME_04_NOT_NULL], [O_NAME_05_NOT_NULL], [O_NAME_06_NOT_NULL]);
Do the same for table S.
Then join these two temporary tables using the _NOT_NULL columns:
ON D.[O_NAME_04_NOT_NULL] = S.[O_NAME_04_NOT_NULL]
AND D.[O_NAME_05_NOT_NULL] = S.[O_NAME_05_NOT_NULL]
AND D.[O_NAME_06_NOT_NULL] = S.[O_NAME_06_NOT_NULL]
Hopefully you will then get a merge join which will be about as fast as it can get.
April 3, 2022 at 5:40 pm
I'm thinking that's still going to result in the massive accidental cross join I previously spoke of, for the rows where all 3 of the join criteria are blank or NULL.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2022 at 8:50 pm
Thanks Jonathan and Jeff for your guidance and support.
Jonathan: Those 3 newly introduced columns contain KEYs I created, and I'm using them for the join. Yes, it is a clustered index.
Jeff: You are right, it is a massive data explosion.
Let me explain why the data exploded. The main table (join keys + data) is the data table, and the reference table (user details + join keys) is the security table. The goal is to figure out who can view what. The output generates a data set for each user. For example, if you and I are at the same level and have the same privilege to see a data set, this logic will produce that data set for you as well as for me. This is used at the reporting level, with an AD-based filter applied.
This was created four years ago by a consultant; however, it is now slowing down owing to data growth.
What am I doing?
When this came to my attention, I noticed the issues above and introduced two intermediate tables.
Even after that, I wasn't able to find any performance improvement. That's a surprise to me.
Hope I have explained enough!
Thanks & Regards
RR
April 3, 2022 at 9:52 pm
Ok... so the first and very important step is to "know thy enemy". With that, we have to know the "gazintas" for both tables. Run the following query once against each of your two tables. You might want to store the results of each run in a working table so that you can do additional analysis later...
SELECT Name_04, Name_05, Name_06, Cnt = COUNT(*)
INTO dbo.SomeWorkingTable --TODO: Change the table name here
FROM dbo.OneOfTheTwoTables --TODO: Change the table name here
GROUP BY Name_04, Name_05, Name_06
;
Then figure out what it really is that you're after from any group of rows identified in the working table as having more than 1 row in the group.
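For example, once the working table exists, the problem groups can be pulled out directly (working-table name from the query above):

```sql
-- Groups with more than one row: each of these multiplies the row
-- count of every matching group on the other side of the join.
SELECT Name_04, Name_05, Name_06, Cnt
FROM dbo.SomeWorkingTable
WHERE Cnt > 1
ORDER BY Cnt DESC;
```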
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2022 at 2:04 am
to produce 355753582 rows
Producing 355 million rows cannot be fast, no matter what indexes are used in a query.
And any suggestions regarding a COALESCE replacement might be generally correct but are totally useless in this case.
You need to rethink the whole approach to the query.
How is that huge recordset used downstream?
If at the end you need to figure out the set of privileges for a specific user, then you need to build a query for a specific user, not for all of them at once.
Then that indexing and other optimization techniques might come to play.
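As a hedged sketch of that per-user shape (the UserName column, table names, and parameter are assumptions; keep the COALESCE wrappers from the original query only if blank/NULL keys are genuinely meant to match):

```sql
DECLARE @UserName sysname = N'some.user';   -- hypothetical parameter

-- Filter the small security table to one user first, then join out
-- to the big data table; no 355M-row intermediate result.
SELECT D.*
FROM dbo.S AS S
JOIN dbo.D AS D
  ON  D.[O_NAME_04] = S.[O_NAME_04]
  AND D.[O_NAME_05] = S.[O_NAME_05]
  AND D.[O_NAME_06] = S.[O_NAME_06]
WHERE S.UserName = @UserName;
```

With appropriate indexes on the three key columns, this touches only the rows one user is entitled to see.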
_____________
Code for TallyGenerator
April 4, 2022 at 2:03 pm
(1) You need to verify that the 3 values are unique on at least one table; otherwise you will get the equivalent of a CROSS JOIN for all non-unique groups. If the 3 values are not unique in one table, add $IDENTITY to the key to make it unique without SQL having to "uniquify" it.
(2) Are you encoding the char values to convert them to a bigint? (3) I don't see why you need a bigint rather than just an int, given that you have fewer than 60M distinct values.
(4) What is the fragmentation of the tables like?
(5) What is the tables' DDL, the query text and the query plan (estimated is fine for now)?
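On point (1), a sketch of the idea (table and column definitions are assumptions): append the identity column to the clustered index key so the index can be declared UNIQUE and SQL Server doesn't have to add a hidden uniquifier to non-unique clustered keys.

```sql
CREATE TABLE #D
(
    ID      int IDENTITY(1,1) NOT NULL,
    Name_04 varchar(100) NOT NULL,
    Name_05 varchar(100) NOT NULL,
    Name_06 varchar(100) NOT NULL
    -- ... remaining columns ...
);

-- Unique because ID is appended; no hidden uniquifier is added.
CREATE UNIQUE CLUSTERED INDEX IX_D
    ON #D (Name_04, Name_05, Name_06, ID);
```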
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 4, 2022 at 8:42 pm
Hey Jeff,
"SELECT Name_04, Name_05, Name_06, Cnt = COUNT(*)"
I executed the above query against the reference (security) table and the findings are below: 1,271 rows in total.
Cnt values: 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 326, 328, 337
April 4, 2022 at 8:49 pm
Hi Sergiy
Thanks for your time. I think so. It's a new learning for me, as the indexed process took more time than the non-indexed query 🙂
The whole idea is to create a data set for each individual user + their access level. The output of these joins brings an additional column, a username, on each row. This username is the key value each user filters on to see only the data they're entitled to.
Thanks & Regards
RR
April 5, 2022 at 3:52 am
The whole idea is to create a data set for each individual user + their access level. The output of these joins brings an additional column, a username, on each row. This username is the key value each user filters on to see only the data they're entitled to.
So, you'd better put those joins into an iTVF (aka "parameterized view") and invoke the function for each particular user/access privilege.
If built correctly, such a function will be lightning fast and very effective.
Building those huge "overall" recordsets is a terrible, though common, approach to this kind of task.
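A hypothetical iTVF sketch along those lines (function name, UserName column, and table names are all assumptions; the three join keys are from the original query):

```sql
-- Inline table-valued function: resolves one user's data set on
-- demand instead of materializing 355M rows for everyone at once.
CREATE FUNCTION dbo.GetUserDataSet (@UserName sysname)
RETURNS TABLE
AS
RETURN
    SELECT D.*
    FROM dbo.S AS S
    JOIN dbo.D AS D
      ON  D.[O_NAME_04] = S.[O_NAME_04]
      AND D.[O_NAME_05] = S.[O_NAME_05]
      AND D.[O_NAME_06] = S.[O_NAME_06]
    WHERE S.UserName = @UserName;
GO
-- Usage from the reporting layer:
-- SELECT * FROM dbo.GetUserDataSet(N'some.user');
```

Because an iTVF is inlined into the calling query, the optimizer can use the tables' indexes directly, exactly as if the per-user WHERE clause had been written inline.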
_____________
Code for TallyGenerator
April 5, 2022 at 10:26 pm
Hey Jeff,
"SELECT Name_04, Name_05, Name_06, Cnt = COUNT(*)"
I executed the above query against the reference (security) table and the findings are below: 1,271 rows in total.
Cnt 299 300 301 302 303 304 305 306 307 308 309 311 312 313 314 315 316 317 318 319 320 321 322 323 326 328 337
So, what is in that table that you need to query all 20 million rows instead of just the 1,271 rows? Keep in mind that you've told us absolutely nothing about what is in the tables other than the 3 columns you're joining on, and you've posted nothing of your query except the join criteria. You've not even explained what exactly it is that you're looking for, other than the very nebulous description of "reference (security) table" and the equally nebulous requirement of "The whole idea is to create a data set for each individual users + their access level."
The code I posted is step 1, and that is to identify the individual users. But, like I said, everything else is a total guess because of how little you've provided. You didn't even say how long the query I provided took to run. :p
--Jeff Moden
Change is inevitable... Change for the better is not.
April 7, 2022 at 8:19 pm
My apologies; COVID delayed my progress.
I'm starting to work on it again slowly.
Regards
RR
April 8, 2022 at 12:44 am
My apologies; COVID delayed my progress.
I'm starting to work on it again slowly.
Regards
RR
You came down with COVID? Sorry to hear and glad you're still around. I know I sound like a Grandma but take care of yourself first and get some good sleep.
--Jeff Moden
Change is inevitable... Change for the better is not.