Any way to reduce the execution time for 8mil + rows table with cross join

  • Dear All,

    I have the following query in my production environment. Table_B has 8 million+ rows; if there are 4 roles in Table_C, the query returns 32 million+ rows.
    The current execution time is about 1 day, and I am trying to reduce it. When I created a nonclustered index on the columns from my WHERE clause, it was a bit faster, but not significantly. Is there any way to reduce the execution time? I also tried to create a nonclustered columnstore index on my local machine with 2.5 million rows, but it failed with a memory error. Any suggestions, please?

    Select (CASE WHEN Table_A.[Column_A] = 'YES' THEN Table_B.Col_A ELSE Table_A.Column_A END) AS Col_A,
    .
    .
    .
    From table_B
    Cross Join Table_A
    inner join TABLE_C on Table_A.colc=Table_C.colc
    where
    (Table_A.[Ag_CD] is NULL
    OR
    Table_A.[Ag_CD]=Table_B.[Ag_CD]
    OR
    Table_A.[Ag_CD]=Table_B.[Ag_CD_New]
    )
    AND
    (Table_A.[Dept] is NULL
    OR
    Table_A.[Dept]=Table_B.[Dept]
    OR
    Table_A.[Dept]=Table_B.[Dept_New]
    )
    AND
    (Table_A.[Org1] is NULL
    OR
    Table_A.[Org1]=Table_B.[Org1]
    OR
    Table_A.[Org1]=Table_B.[Org1_New]
    )
    AND
    (Table_A.[Org2] is NULL
    OR
    Table_A.[Org2]=Table_B.[Org2]
    OR
    Table_A.[CoOrg2]=Table_B.[Org2_New]
    )
    AND
    (Table_A.[Org3] is NULL
    OR
    Table_A.[Org3]=Table_B.[Org3]
    OR
    Table_A.[Org3]=Table_B.[Org3_New]
    )

    Table structure as follows:

    Table_A
    UserID  | Ag_CD | Dept  | Org1  | Org2 | Org3 | Org4 | Org5 | Role
    001|002 |       |       |       |      |      |      |      | R_All
    001     | A1    |       |       |      |      |      |      | R_A
            |       |       |       |      |      |      |      | R_All
    002     | A1    | Dept1 |       |      |      |      |      |
    003     | A1    | Dept2 | OrgA2 |      |      |      |      | R_A
    004     | A1    | Dept2 | OrgA2 | Org2 |      |      |      | R_A
    005     |       |       |       |      |      |      |      | R_B_ALL
    005     | B1    |       |       |      |      |      |      | R_B
    006     | B1    |       |       |      |      |      |      | R_B
    007     | B1    |       |       |      |      |      |      | R_B
    Table_C
    Role    | Leave | Age | AgeGrp | Allowance | Sal | Branch
    R_All   | Y     | Y   | Y      | Y         | N   | Y
    R_A     | Y     | Y   | Y      | N         | N   | N
    R_B_ALL | N     | N   | Y      | Y         | Y   | Y
    R_B     | N     | N   | N      | Y         | Y   | N
    Table_B
    ID | Name  | Ag_CD | Dept  | Org1  | Org2 | Org3 | Org4 | Org5 | Leave | Age | AgeGrp | Allowance | Sal   | Branch | S_Date     | P_Date
    1  | UserA | A1    | Dept1 | OrgA2 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | A      | 01/08/2018 | 08/08/2018
    1  | UserA | A1    | Dept1 | OrgA1 | NULL | NULL | NULL | NULL | 11    | 39  | 35-39  | 700       | 8000  | B      | 01/08/2017 | 12/08/2017
    2  | UserB | A1    | Dept1 | OrgA2 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | B      | 01/08/2018 | 08/08/2018
    3  | UserC | B1    | Dept1 | OrgA1 | NULL | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | A      | 01/08/2018 | 08/08/2018
    4  | UserD | B1    | Dept2 | OrgA2 | Org2 | NULL | NULL | NULL | 10    | 40  | 40-45  | 1000      | 10000 | C      | 01/08/2018 | 08/08/2018

    Thank you for your time!

    Best Regards

  • Firstly, it doesn't look like a CROSS JOIN is needed here. I think the query would produce the same results with an INNER JOIN:
    SELECT *
    FROM table_B
    INNER JOIN Table_A
       ON(Table_A.[Ag_CD] IS NULL
         OR Table_A.[Ag_CD] = Table_B.[Ag_CD]
         OR Table_A.[Ag_CD] = Table_B.[Ag_CD_New])
        AND (Table_A.[Dept] IS NULL
          OR Table_A.[Dept] = Table_B.[Dept]
          OR Table_A.[Dept] = Table_B.[Dept_New])
        AND (Table_A.[Org1] IS NULL
          OR Table_A.[Org1] = Table_B.[Org1]
          OR Table_A.[Org1] = Table_B.[Org1_New])
        AND (Table_A.[Org2] IS NULL
          OR Table_A.[Org2] = Table_B.[Org2]
          OR Table_A.[CoOrg2] = Table_B.[Org2_New])
        AND (Table_A.[Org3] IS NULL
          OR Table_A.[Org3] = Table_B.[Org3]
          OR Table_A.[Org3] = Table_B.[Org3_New])
    INNER JOIN TABLE_C
       ON Table_A.colc = Table_C.colc;

    Have you got an index on TABLE_C(colc)? It might help.
    What indexes have you got on Table_A and Table_B?
    Can you post the DDL and some code to create some sample data?
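
    If there isn't one on TABLE_C(colc), something along these lines might be worth a try (just a sketch; the index name is only an example):

    -- Illustrative only: an index to support the join on colc
    CREATE NONCLUSTERED INDEX IX_Table_C_colc
        ON TABLE_C (colc);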

  • Dear Jonathan,

    Yes, it is effectively an inner join already, since the join conditions are in the WHERE clause.
    My indexes are as follows:

    Table_A=>
    Create Clustered Index on
    (UserID, Ag_CD, Dept, Org1, Org2, Org3, Org4, Org5, Role)

    Table_C =>
    Create Clustered Index on (Role)

    Table_B =>
    Create Index on Table_B
    (Ag_CD,
    Ag_CD_New,
    Dept,
    Dept_New,
    Org1,
    Org1_New,
    Org2,
    Org2_New,
    Org3,
    Org3_New
    )
    Please replace the column name colc with Role.
    Please find the table structure as follows:

    create table Table_A(
    UserID VARCHAR(30),
    Ag_CD VARCHAR(30),
    Dept VARCHAR(30),
    Org1 VARCHAR(30),
    Org2 VARCHAR(30),
    Org3 VARCHAR(30),
    Org4 VARCHAR(30),
    Org5 VARCHAR(30),
    Role VARCHAR(30)
    );

    Create table Table_C(
    Role VARCHAR(10),
    Leave VARCHAR(10),
    Age VARCHAR(10),
    AgeGrp VARCHAR(10),
    Allowance VARCHAR(10),
    Sal VARCHAR(10),
    Branch VARCHAR(10)
    );

    create Table Table_B(
    ID VARCHAR(30),
    Name VARCHAR(30),
    Ag_CD VARCHAR(30),
    S_Date Datetime NOT NULL,
    P_Date Datetime NOT NULL,
    Dept VARCHAR(30),
    Org1 VARCHAR(30),
    Org2 VARCHAR(30),
    Org3 VARCHAR(30),
    Leave INT,
    Age INT,
    AgeGrp VARCHAR(30),
    Allowance VARCHAR(30),
    Sal INT,
    Branch VARCHAR(30),
    Ag_CD_New VARCHAR(30),
    Dept_New VARCHAR(30),
    Org1_New VARCHAR(30),
    Org2_New VARCHAR(30),
    Org3_New VARCHAR(30)

    );

    You may use the sample data from my first post.

    The problem here is that we cannot reduce the amount of data.
    Currently we have 8 million+ rows in Table_B; if we have 4 roles in Table_C, our result set becomes 32 million+ rows. Do you have any ideas for reducing the execution time and making the query faster?

    Thank you so much for your response.

    Best Regards,

  • One way of approaching this is to try to separate the query into more than one query using temporary tables. So in your case you could try creating a temporary table from a join of tableA to tableB, or maybe tableA to tableC, then create indexes on the new temporary table and use that to join to the remaining third table.
    You only need to select the columns that you will actually use in your query, so don't select them all.
    So for example:
    SELECT A.Role, -- Only the columns you need
    ...
    B.ColumnX, ...
    INTO #AandB 
    FROM TableA A
    INNER JOIN TableB B ON
    (A.[Ag_CD] IS NULL
    OR A.[Ag_CD] = B.[Ag_CD]
    OR A.[Ag_CD] = B.[Ag_CD_New])
    AND (A.[Dept] IS NULL
    OR A.[Dept] = B.[Dept]
    OR A.[Dept] = B.[Dept_New])
    AND (A.[Org1] IS NULL
    OR A.[Org1] = B.[Org1]
    OR A.[Org1] = B.[Org1_New])
    AND (A.[Org2] IS NULL
    OR A.[Org2] = B.[Org2]
    OR A.[CoOrg2] = B.[Org2_New])
    AND (A.[Org3] IS NULL
    OR A.[Org3] = B.[Org3]
    OR A.[Org3] = B.[Org3_New])

    Then create any indexes on the temporary table that will help with the next join to TableC:
    CREATE INDEX IX_AandB_1 ON #AandB (col....)
    Then do the select joining #AandB to TableC.
    There will be a bit of trial and error, both on index creation (it's not obvious to me which indexes can help with that join of TableA to TableB) and on which tables are best to join first, but run a few experiments and time the results.
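
    As a rough sketch of that second step, using Role as the join column since you said colc is really Role (the index and alias names are just examples):

    -- Illustrative only: index the temp table on the join column, then join to Table_C
    CREATE INDEX IX_AandB_Role ON #AandB (Role);

    SELECT AB.*,   -- or, better, only the columns you need
           C.Leave, C.Age, C.AgeGrp, C.Allowance, C.Sal, C.Branch
    FROM #AandB AB
    INNER JOIN Table_C C
        ON AB.Role = C.Role;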

  • Cool2018 - Monday, August 20, 2018 2:20 AM

    Dear All,

    I have the following query in my production environment. Table_B has 8 million+ rows; if there are 4 roles in Table_C, the query returns 32 million+ rows.
    The current execution time is about 1 day, and I am trying to reduce it. When I created a nonclustered index on the columns from my WHERE clause, it was a bit faster, but not significantly. Is there any way to reduce the execution time? I also tried to create a nonclustered columnstore index on my local machine with 2.5 million rows, but it failed with a memory error. Any suggestions, please?

    ...

    You mentioned that you were unable to create a nonclustered columnstore index due to a memory error, and you were only looking at 2.5 million rows.   That may suggest you have some kind of RAM constraint.  How much RAM does your server have available to SQL Server, and how much space do all of your databases use?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Do you really want every match from Table_B to Table_A or only the most relevant match?  If you want to list every role, then this join will always be very big.

    For a single "best match" to Table_A, you could add a unique column to Table_A (if it doesn't already have one) and store that column in Table_B. A trigger on Table_B could set the value whenever a row is INSERTed or UPDATEd.

    Assuming the values in Table_B are relatively static, if you need every match, you should consider a separate "junction" table that stores the joins for Table_A to Table_B.  You'd want to add a unique column to both tables -- if they didn't already have it -- to use in the junction table.  You could perhaps use a materialized view or, again, use triggers to add the matching junction table row(s) when either table is INSERTed to or UPDATEd.
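
    For example, a rough sketch of that junction table (all of the names here are made up, not your actual schema):

    -- Illustrative only: one row per matching Table_A / Table_B pair
    CREATE TABLE dbo.Table_A_B_Match (
        A_UID int NOT NULL,  -- the unique column added to Table_A
        B_UID int NOT NULL,  -- the unique column added to Table_B
        CONSTRAINT PK_Table_A_B_Match PRIMARY KEY CLUSTERED (B_UID, A_UID)
    );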

    Btw:
    -- The clus index for Table_A should not have UserID as the first column; rather, it should lead with Ag_CD. If you need UserID in the clus index, move it to the end (see the sketch below).
    -- I'm somewhat confused by the duplicate "ID" values in Table_B. Typically an "ID" is unique.
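
    A sketch of that reordering, assuming the existing clustered index is named CIX_Table_A (otherwise drop it and recreate it under its real name):

    -- Illustrative only: clustered index with Ag_CD leading and UserID moved to the end
    CREATE CLUSTERED INDEX CIX_Table_A
        ON Table_A (Ag_CD, Dept, Org1, Org2, Org3, Org4, Org5, Role, UserID)
        WITH (DROP_EXISTING = ON);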

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Dear ScottPletcher,

    Yes, we need every match from Table_B to Table_A. We need to list every role whenever there is no value for Ag_CD or Dept and so on. As such, if we have 8 million rows in Table_B and 6 roles in Table_A, it becomes 48 million+ rows in our output dataset.
    Also, we cannot add a unique number from Table_A to Table_B, since each ID from Table_B can have more than one role.
    Please assume our Table_A is unique. We already made it unique by concatenating the UserIDs into one value for the same group.
    Is there anything we can tune to make this faster?

    Thank you,

    Best Regards,

    Dear sgmunson,

    Please find below my answers to your questions.

    SELECT object_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');

    SQLServer:Memory Manager    3645440
    SQLServer:Memory Manager    3625208


    SELECT physical_memory_kb FROM sys.dm_os_sys_info;

    16657508

    For DB spaces:

    Database Name | Database_Size | Unallocated space
    DB_NAME       | 77518.00 MB   | 9548.98 MB

    reserved    | data        | index_size | unused
    43842448 KB | 43161632 KB | 453040 KB  | 227776 KB

    Thank you,

    Best Regards,

  • Cool2018 - Thursday, August 23, 2018 2:34 AM

    Dear ScottPletcher,

    Yes, we need every match from Table_B to Table_A. We need to list every role whenever there is no value for Ag_CD or Dept and so on. As such, if we have 8 million rows in Table_B and 6 roles in Table_A, it becomes 48 million+ rows in our output dataset.
    Also, we cannot add a unique number from Table_A to Table_B, since each ID from Table_B can have more than one role.
    Please assume our Table_A is unique. We already made it unique by concatenating the UserIDs into one value for the same group.
    Is there anything we can tune to make this faster?

    Thank you,

    Best Regards,

    The unique number is not "from" one table to another.  It's unique only for the table it's on, such as an identity column or other unique #.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Dear ScottPletcher,

    Noted. Thanks.
    According to your suggestion, what should I put in the junction table? Is it Table_A.uid and Table_B.uid? If I create a materialized view, what would the advantage be for my current problem? Currently we use a clustered columnstore index in our PROD. As far as I know, a clustered columnstore index is well suited to DW workloads and should give the fastest performance, but we still have a performance issue.
    Our prod table is partitioned as well. Do you have any more ideas to resolve this performance issue?

    Thank you,
    Regards,

  • Cool2018 - Thursday, August 23, 2018 3:14 AM

    Dear sgmunson,

    Please find below my answers to your questions.

    SELECT object_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');

    SQLServer:Memory Manager    3645440
    SQLServer:Memory Manager    3625208


    SELECT physical_memory_kb FROM sys.dm_os_sys_info;

    16657508


    For DB spaces:

    Database Name | Database_Size | Unallocated space
    DB_NAME       | 77518.00 MB   | 9548.98 MB

    reserved    | data        | index_size | unused
    43842448 KB | 43161632 KB | 453040 KB  | 227776 KB

    Thank you,

    Best Regards,

    Well, you have just 3.6 GB of memory available to SQL Server on a 16 GB RAM server, and you are expecting a 77 GB database to run faster with that kind of limitation on RAM?
    Hate to say this, but you are seriously RAM constrained. Assuming you have at least 4 cores on this box, you need a serious RAM upgrade. Other than leaving 2 GB out of the 16 GB total for the OS, what else needs RAM on this box? You could probably speed things up at least some just by allocating more of the RAM to SQL Server. If you have apps and web services all on this box, you might also be CPU constrained. You definitely need more RAM; I'd jump right up to 64 GB without a lot of hesitation. And if you measure your CPU usage and find a high percentage, you might want to add more cores if possible, and if that server can't accommodate them, it's time for a new server.
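
    If it's the "max server memory" setting that's capping SQL Server at ~3.6 GB, something along these lines would raise it (a sketch only; 12288 MB is just an example for a 16 GB box, adjust it for whatever else needs RAM):

    -- Check the current setting
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)';

    -- Example: allow SQL Server up to ~12 GB on a 16 GB box
    EXEC sp_configure 'max server memory (MB)', 12288;
    RECONFIGURE;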

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Dear All,

    Do you think a clustered columnstore index is suitable for my case? I read on the Internet that it suits DW workloads, but the examples I saw involved calculations. Our case is just a join and extract; there is no calculation and no GROUP BY, so our query is not like BI-style SQL. However, our table has 8 million+ rows and about 350 columns. Any ideas? I still cannot create that index on my local computer because of a memory error, so I cannot test it. Is there any way to resolve that memory issue?
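
    For reference, this is the kind of statement I have in mind (a minimal sketch; the index name is just an example, and it assumes the table has no existing rowstore clustered index). I read that lowering MAXDOP can reduce the memory needed to build a columnstore index, but I have not been able to verify that yet:

    -- Illustrative only: build a clustered columnstore index,
    -- limiting parallelism to reduce the memory needed for the build
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_Table_B
        ON Table_B
        WITH (MAXDOP = 1);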

    Thank you,

    Best Regards,

  • Cool2018 - Friday, August 24, 2018 8:10 PM

    Dear All,

    Do you think a clustered columnstore index is suitable for my case? I read on the Internet that it suits DW workloads, but the examples I saw involved calculations. Our case is just a join and extract; there is no calculation and no GROUP BY, so our query is not like BI-style SQL. However, our table has 8 million+ rows and about 350 columns. Any ideas? I still cannot create that index on my local computer because of a memory error, so I cannot test it. Is there any way to resolve that memory issue?

    Thank you,

    Best Regards,

    What is the purpose of producing this "data"?  Is this for some kind of Data Warehouse?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
