Any way to reduce the execution time for 8mil + rows table with cross join

  • Dear All,

    I have the following query in my production environment. Table_B has 8 million+ rows; if there are 4 roles in Table_C, the query returns 32 million+ rows.
    The current execution time is about 1 day, and I am trying to reduce it. When I created a nonclustered index on the columns from my WHERE clause, it was a bit faster, but not significantly. Is there any way to reduce the execution time? I also tried to create a nonclustered columnstore index on my local machine with 2.5 million rows, but it failed with a memory error. Any suggestions, please?

    Select (CASE WHEN Table_A.[Column_A] = 'YES' THEN Table_B.Col_A ELSE Table_A.Column_A END) AS Col_A,
    .
    .
    .
    From table_B
    Cross Join Table_A
    inner join TABLE_C on Table_A.colc=Table_C.colc
    where
    (Table_A.[Ag_CD] is NULL
    OR
    Table_A.[Ag_CD]=Table_B.[Ag_CD]
    OR
    Table_A.[Ag_CD]=Table_B.[Ag_CD_New]
    )
    AND
    (Table_A.[Dept] is NULL
    OR
    Table_A.[Dept]=Table_B.[Dept]
    OR
    Table_A.[Dept]=Table_B.[Dept_New]
    )
    AND
    (Table_A.[Org1] is NULL
    OR
    Table_A.[Org1]=Table_B.[Org1]
    OR
    Table_A.[Org1]=Table_B.[Org1_New]
    )
    AND
    (Table_A.[Org2] is NULL
    OR
    Table_A.[Org2]=Table_B.[Org2]
    OR
    Table_A.[CoOrg2]=Table_B.[Org2_New]
    )
    AND
    (Table_A.[Org3] is NULL
    OR
    Table_A.[Org3]=Table_B.[Org3]
    OR
    Table_A.[Org3]=Table_B.[Org3_New]
    )

    Table structure as follows:

    Table_A
    UserID  | Ag_CD | Dept  | Org1  | Org2 | Org3 | Org4 | Org5 | Role
    001|002 |       |       |       |      |      |      |      | R_All
    001     | A1    |       |       |      |      |      |      | R_A
            |       |       |       |      |      |      |      | R_All
    002     | A1    | Dept1 |       |      |      |      |      |
    003     | A1    | Dept2 | OrgA2 |      |      |      |      | R_A
    004     | A1    | Dept2 | OrgA2 | Org2 |      |      |      | R_A
    005     |       |       |       |      |      |      |      | R_B_ALL
    005     | B1    |       |       |      |      |      |      | R_B
    006     | B1    |       |       |      |      |      |      | R_B
    007     | B1    |       |       |      |      |      |      | R_B
    Table_C
    Role    | Leave | Age | AgeGrp | Allowance | Sal | Branch
    R_All   | Y     | Y   | Y      | Y         | N   | Y
    R_A     | Y     | Y   | Y      | N         | N   | N
    R_B_ALL | N     | N   | Y      | Y         | Y   | Y
    R_B     | N     | N   | N      | Y         | Y   | N
    Table_B
    ID | Name  | Ag_CD | Dept  | Org1  | Org2 | Org3 | Org4 | Org5 | Leave | Age | AgeGrp | Allowance | Sal   | Branch | S_Date     | P_Date
    1  | UserA | A1    | Dept1 | OrgA2 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | A      | 01/08/2018 | 08/08/2018
    1  | UserA | A1    | Dept1 | OrgA1 | NULL | NULL | NULL | NULL | 11    | 39  | 35-39  | 700       | 8000  | B      | 01/08/2017 | 12/08/2017
    2  | UserB | A1    | Dept1 | OrgA2 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | B      | 01/08/2018 | 08/08/2018
    3  | UserC | B1    | Dept1 | OrgA1 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | A      | 01/08/2018 | 08/08/2018
    4  | UserD | B1    | Dept2 | OrgA2 | Org2 | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | C      | 01/08/2018 | 08/08/2018

    Thank you for your time!

    Best Regards

  • Firstly, it doesn't look like a CROSS JOIN is needed here. I think the query would produce the same results with an INNER JOIN:
    SELECT *
    FROM table_B
    INNER JOIN Table_A
       ON(Table_A.[Ag_CD] IS NULL
         OR Table_A.[Ag_CD] = Table_B.[Ag_CD]
         OR Table_A.[Ag_CD] = Table_B.[Ag_CD_New])
        AND (Table_A.[Dept] IS NULL
          OR Table_A.[Dept] = Table_B.[Dept]
          OR Table_A.[Dept] = Table_B.[Dept_New])
        AND (Table_A.[Org1] IS NULL
          OR Table_A.[Org1] = Table_B.[Org1]
          OR Table_A.[Org1] = Table_B.[Org1_New])
        AND (Table_A.[Org2] IS NULL
          OR Table_A.[Org2] = Table_B.[Org2]
          OR Table_A.[CoOrg2] = Table_B.[Org2_New])
        AND (Table_A.[Org3] IS NULL
          OR Table_A.[Org3] = Table_B.[Org3]
          OR Table_A.[Org3] = Table_B.[Org3_New])
    INNER JOIN TABLE_C
       ON Table_A.colc = Table_C.colc;

    Have you got an index on TABLE_C(colc)? It might help.
    What indexes have you got on Table_A and Table_B?
    Can you post the DDL and some code to create some sample data?
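
    If there isn't one on TABLE_C(colc), something along these lines might be worth a try (just a sketch; the index name is only an example):

    -- Illustrative only: an index to support the join on colc
    CREATE NONCLUSTERED INDEX IX_Table_C_colc
        ON TABLE_C (colc);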

  • Dear Jonathan,

    Yes, it is effectively an inner join already, since the join conditions are in the WHERE clause.
    My indexes are as follows:

    Table_A=>
    Create Clustered Index on
    (UserID, Ag_CD, Dept, Org1, Org2, Org3, Org4, Org5, Role)

    Table_C =>
    Create Clustered Index on (Role)

    Table_B =>
    Create Index on Table_B
    (Ag_CD,
    Ag_CD_New,
    Dept,
    Dept_New,
    Org1,
    Org1_New,
    Org2,
    Org2_New,
    Org3,
    Org3_New
    )
    Please replace the column name colc with Role.
    Please find the table structure as follows:

    create table Table_A(
    UserID VARCHAR(30),
    Ag_CD VARCHAR(30),
    Dept VARCHAR(30),
    Org1 VARCHAR(30),
    Org2 VARCHAR(30),
    Org3 VARCHAR(30),
    Org4 VARCHAR(30),
    Org5 VARCHAR(30),
    Role VARCHAR(30)
    );

    Create table Table_C(
    Role VARCHAR(10),
    Leave VARCHAR(10),
    Age VARCHAR(10),
    AgeGrp VARCHAR(10),
    Allowance VARCHAR(10),
    Sal VARCHAR(10),
    Branch VARCHAR(10)
    );

    create Table Table_B(
    ID VARCHAR(30),
    Name VARCHAR(30),
    Ag_CD VARCHAR(30),
    S_Date Datetime NOT NULL,
    P_Date Datetime NOT NULL,
    Dept VARCHAR(30),
    Org1 VARCHAR(30),
    Org2 VARCHAR(30),
    Org3 VARCHAR(30),
    Leave INT,
    Age INT,
    AgeGrp VARCHAR(30),
    Allowance VARCHAR(30),
    Sal INT,
    Branch VARCHAR(30),
    Ag_CD_New VARCHAR(30),
    Dept_New VARCHAR(30),
    Org1_New VARCHAR(30),
    Org2_New VARCHAR(30),
    Org3_New VARCHAR(30)

    );

    You may use the sample data from my first post.

    The problem here is that we cannot reduce the amount of data.
    Currently we have 8 million+ rows in Table_B; if we have 4 roles in Table_C, our result set becomes 32 million+ rows. Do you have any ideas for reducing the execution time and making the query faster?

    Thank you so much for your response.

    Best Regards,

  • One way of approaching this is to try to separate the query into more than one query using temporary tables. So in your case you could try creating a temporary table from a join of tableA to tableB, or maybe tableA to tableC, then create indexes on the new temporary table and use that to join to the remaining third table.
    You only need to select the columns that you will actually use in your query, so don't select them all.
    So for example:
    SELECT A.Role, -- Only the columns you need
    ...
    B.ColumnX, ...
    INTO #AandB 
    FROM TableA A
    INNER JOIN TableB B ON
    (A.[Ag_CD] IS NULL
    OR A.[Ag_CD] = B.[Ag_CD]
    OR A.[Ag_CD] = B.[Ag_CD_New])
    AND (A.[Dept] IS NULL
    OR A.[Dept] = B.[Dept]
    OR A.[Dept] = B.[Dept_New])
    AND (A.[Org1] IS NULL
    OR A.[Org1] = B.[Org1]
    OR A.[Org1] = B.[Org1_New])
    AND (A.[Org2] IS NULL
    OR A.[Org2] = B.[Org2]
    OR A.[CoOrg2] = B.[Org2_New])
    AND (A.[Org3] IS NULL
    OR A.[Org3] = B.[Org3]
    OR A.[Org3] = B.[Org3_New])

    Then create any indexes on the temporary table that will help with the next join to TableC:
    CREATE INDEX IX_AandB_1 ON #AandB (col....)
    Then do the select joining #AandB to TableC.
    There will be a bit of trial and error, both on index creation (it's not obvious to me which indexes can help with that join of TableA to TableB) and on which tables are best to join first, but run a few experiments and time the results.
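
    As a rough sketch of that second step, using Role as the join column since you said colc is really Role (the index and alias names are just examples):

    -- Illustrative only: index the temp table on the join column, then join to Table_C
    CREATE INDEX IX_AandB_Role ON #AandB (Role);

    SELECT AB.*,   -- or, better, only the columns you need
           C.Leave, C.Age, C.AgeGrp, C.Allowance, C.Sal, C.Branch
    FROM #AandB AB
    INNER JOIN Table_C C
        ON AB.Role = C.Role;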

  • Cool2018 - Monday, August 20, 2018 2:20 AM

    Dear All,

    I have the following query in my production environment. Table_B has 8 million+ rows; if there are 4 roles in Table_C, the query returns 32 million+ rows.
    The current execution time is about 1 day, and I am trying to reduce it. When I created a nonclustered index on the columns from my WHERE clause, it was a bit faster, but not significantly. Is there any way to reduce the execution time? I also tried to create a nonclustered columnstore index on my local machine with 2.5 million rows, but it failed with a memory error. Any suggestions, please?

    ...

    You mentioned that you were unable to create a nonclustered columnstore index due to a memory error, and you were only looking at 2.5 million rows.   That may suggest you have some kind of RAM constraint.  How much RAM does your server have available to SQL Server, and how much space do all of your databases use?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Do you really want every match from Table_B to Table_A or only the most relevant match?  If you want to list every role, then this join will always be very big.

    For a single "best match" to Table_A, you could add a unique column to Table_A (if it doesn't already have one) and store that column in Table_B. A trigger on Table_B could set the value whenever a row is INSERTed or UPDATEd.

    Assuming the values in Table_B are relatively static, if you need every match, you should consider a separate "junction" table that stores the joins for Table_A to Table_B.  You'd want to add a unique column to both tables -- if they didn't already have it -- to use in the junction table.  You could perhaps use a materialized view or, again, use triggers to add the matching junction table row(s) when either table is INSERTed to or UPDATEd.
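
    For example, a rough sketch of that junction table (all of the names here are made up, not your actual schema):

    -- Illustrative only: one row per matching Table_A / Table_B pair
    CREATE TABLE dbo.Table_A_B_Match (
        A_UID int NOT NULL,  -- the unique column added to Table_A
        B_UID int NOT NULL,  -- the unique column added to Table_B
        CONSTRAINT PK_Table_A_B_Match PRIMARY KEY CLUSTERED (B_UID, A_UID)
    );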

    Btw:
    -- The clus index for Table_A should not have UserID as the first column; rather, it should lead with Ag_CD. If you need UserID in the clus index, move it to the end (see the sketch below).
    -- I'm somewhat confused by the duplicate "ID" values in Table_B. Typically an "ID" is unique.
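
    A sketch of that reordering, assuming the existing clustered index is named CIX_Table_A (otherwise drop it and recreate it under its real name):

    -- Illustrative only: clustered index with Ag_CD leading and UserID moved to the end
    CREATE CLUSTERED INDEX CIX_Table_A
        ON Table_A (Ag_CD, Dept, Org1, Org2, Org3, Org4, Org5, Role, UserID)
        WITH (DROP_EXISTING = ON);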

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Dear ScottPletcher,

    Yes, we need every match from Table_B to Table_A. We need to list every role whenever there is no value for Ag_CD or Dept and so on. As such, if we have 8 million rows in Table_B and 6 roles in Table_A, it becomes 48 million+ rows in our output dataset.
    Also, we cannot add a unique number from Table_A to Table_B, since each ID from Table_B can have more than one role.
    Please assume our Table_A is unique. We already made it unique by concatenating the UserIDs into one value for the same group.
    Is there anything we can tune to make this faster?

    Thank you,

    Best Regards,

    Dear sgmunson,

    Please find below my answers to your questions.

    SELECT object_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');

    SQLServer:Memory Manager    3645440
    SQLServer:Memory Manager    3625208


    SELECT physical_memory_kb FROM sys.dm_os_sys_info;

    16657508

    For DB spaces:

    Database Name | Database_Size | Unallocated space
    DB_NAME       | 77518.00 MB   | 9548.98 MB

    reserved    | data        | index_size | unused
    43842448 KB | 43161632 KB | 453040 KB  | 227776 KB

    Thank you,

    Best Regards,

  • Cool2018 - Thursday, August 23, 2018 2:34 AM

    Dear ScottPletcher,

    Yes, we need every match from Table_B to Table_A. We need to list every role whenever there is no value for Ag_CD or Dept and so on. As such, if we have 8 million rows in Table_B and 6 roles in Table_A, it becomes 48 million+ rows in our output dataset.
    Also, we cannot add a unique number from Table_A to Table_B, since each ID from Table_B can have more than one role.
    Please assume our Table_A is unique. We already made it unique by concatenating the UserIDs into one value for the same group.
    Is there anything we can tune to make this faster?

    Thank you,

    Best Regards,

    The unique number is not "from" one table to another.  It's unique only for the table it's on, such as an identity column or other unique #.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Dear ScottPletcher,

    Noted. Thanks.
    According to your suggestion, what should I put in the junction table? Is it Table_A.uid and Table_B.uid? If I create a materialized view, what would the advantage be for my current problem? Currently we use a clustered columnstore index in our PROD. As far as I know, a clustered columnstore index is well suited to DW workloads and should give the fastest performance, but we still have a performance issue.
    Our prod table is partitioned as well. Do you have any more ideas to resolve this performance issue?

    Thank you,
    Regards,

  • Cool2018 - Thursday, August 23, 2018 3:14 AM

    Dear sgmunson,

    Please find below my answers to your questions.

    SELECT object_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');

    SQLServer:Memory Manager    3645440
    SQLServer:Memory Manager    3625208


    SELECT physical_memory_kb FROM sys.dm_os_sys_info;

    16657508


    For DB spaces:

    Database Name | Database_Size | Unallocated space
    DB_NAME       | 77518.00 MB   | 9548.98 MB

    reserved    | data        | index_size | unused
    43842448 KB | 43161632 KB | 453040 KB  | 227776 KB

    Thank you,

    Best Regards,

    Well, you have just 3.6 GB of memory available to SQL Server on a 16 GB RAM server, and you are expecting a 77 GB database to run faster with that kind of limitation on RAM?
    Hate to say this, but you are seriously RAM constrained. Assuming you have at least 4 cores on this box, you need a serious RAM upgrade. Other than leaving 2 GB out of the 16 GB total for the OS, what else needs RAM on this box? You could probably speed things up at least some just by allocating more of the RAM to SQL Server. If you have apps and web services all on this box, you might also be CPU constrained. You definitely need more RAM; I'd jump right up to 64 GB without a lot of hesitation. And if you measure your CPU usage and find a high percentage, you might want to add more cores if possible, and if that server can't accommodate them, it's time for a new server.
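
    If it's the "max server memory" setting that's capping SQL Server at ~3.6 GB, something along these lines would raise it (a sketch only; 12288 MB is just an example for a 16 GB box, adjust it for whatever else needs RAM):

    -- Check the current setting
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)';

    -- Example: allow SQL Server up to ~12 GB on a 16 GB box
    EXEC sp_configure 'max server memory (MB)', 12288;
    RECONFIGURE;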

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Dear All,

    Do you think a clustered columnstore index is suitable for my case? I read on the Internet that it suits DW workloads, but the examples I saw involved calculations. Our case is just a join and extract; there is no calculation and no GROUP BY, so our query is not like BI-style SQL. However, our table has 8 million+ rows and about 350 columns. Any ideas? I still cannot create that index on my local computer because of a memory error, so I cannot test it. Is there any way to resolve that memory issue?
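
    For reference, this is the kind of statement I have in mind (a minimal sketch; the index name is just an example, and it assumes the table has no existing rowstore clustered index). I read that lowering MAXDOP can reduce the memory needed to build a columnstore index, but I have not been able to verify that yet:

    -- Illustrative only: build a clustered columnstore index,
    -- limiting parallelism to reduce the memory needed for the build
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_Table_B
        ON Table_B
        WITH (MAXDOP = 1);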

    Thank you,

    Best Regards,

  • Cool2018 - Friday, August 24, 2018 8:10 PM

    Dear All,

    Do you think a clustered columnstore index is suitable for my case? I read on the Internet that it suits DW workloads, but the examples I saw involved calculations. Our case is just a join and extract; there is no calculation and no GROUP BY, so our query is not like BI-style SQL. However, our table has 8 million+ rows and about 350 columns. Any ideas? I still cannot create that index on my local computer because of a memory error, so I cannot test it. Is there any way to resolve that memory issue?

    Thank you,

    Best Regards,

    What is the purpose of producing this "data"?  Is this for some kind of Data Warehouse?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
