Table Performance

  • Angels!!!

    I have a table with more than 900,000 rows and would expand exponentially. The table definition is something like (Attribute_type, Attribute_ID, Attribute_data, Row_Status). Some types of Attribute_ID can have duplicates others cannot have duplicates. So basis the Attribute Type I need to check for duplicates before performing any inserts or updates. Because of the huge data in the table the performance is very bad.

    Could anyone suggest how I can enhance the performance.

    Thanks All in anticipation

    Raj

  • Raj

    Please will you post the table DDL so that we can see what constraints you have on the data (primary keys and so on).

    Thanks

    John

  • John,

    The Table is

    CREATE TABLE [CL_PRODUCT_ATTRIBUTE] (

    [PROD_ATTRIBUTE_ID] [bigint] IDENTITY (1, 1) NOT NULL ,

    [ITEM_ATTRIBUTE_ID] [bigint] NOT NULL ,

    [CUST_PRODUCT_ID] [bigint] NOT NULL ,

    [ATTRIBUTE_ID] [bigint] NOT NULL ,

    [DATA] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

    [CREATED_BY] [CREATED_BY] NOT NULL ,

    [CREATED_ON] [CREATED_ON] NOT NULL ,

    [CREATED_IP_ADDRESS] [CREATED_IP_ADDRESS] NOT NULL ,

    [LAST_MODIFIED_BY] [LAST_MODIFIED_BY] NULL ,

    [LAST_MODIFIED_ON] [LAST_MODIFIED_ON] NULL ,

    [LAST_MODIFIED_IP_ADDRESS] [LAST_MODIFIED_IP_ADDRESS] NULL ,

    [ROW_STATUS] [int] NOT NULL

    ) ON [PRIMARY]

    We generally try to check duplicacy in DATA field based on Attribute_ID the Parent table of which has rules about the attribute.

    Thanks and regards

    Raj

  • Raj

    OK.  It looks as if your database design here is not ideal.  There is no primary key constraint on the table.  Is there a column (or combination of columns) that uniquely identifies each row in your table?  How are you checking for duplicates at the moment?  What I would suggest, given your current design, is to join to the parent table of Attrribute_ID to see whether the value needs to be unique.  If it doesn't, then you don't need to do your check.  You will probably need an index on the ATTRIBUTE_ID column for optimal performance.

    John

  • John,

    The only combination which is unique is CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA

    Creating an Index(we cannot have a Unqiue Key) on ATTRIBUTE_ID didn't help us boost the performance. Neither did it creating a unqiue index on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA help in the performance.

    I converted the same table into Oracle and imported data into the table, there I'm getting optimum performance. Though I used a Materialized view in Oracle based on Attribute Types which ensured performance.

    Can it be possible to adopt a similar mechanism in MS SQL. My DB server is in a failover cluster mode (in SAN) and the server configuration is also pretty good.

    Thanks

    Rajesh

  • Raj

    I'm sorry, I don't know anything about Materialized views in Oracle so I can't advise you on that.

    If you had a unique constraint on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA then that would prevent any duplicate data going into the database.

    If you need advice on performance, please say exactly how you are checking for duplicates.

    A good place for you to start would be to study the execution plan for your insert and update queries and look out for table or clustered index scans.

    John

  • John,

    I do a select something like this

    select count(1)

    from CL_PRODUCT_ATTRIBUTE

    where ATRRIBUTE_ID = @attrib_id

    and data = @data;

    to check the duplicates. Having indexes on ATRRIBUTE_ID didn't help boost performance.

  • Raj

    So you're doing this once for every row you're inserting into the table?  Are you using a cursor or a WHILE loop to do that?  I think this is probably the reason for the poor performance.

    You said in your original post that some values of Attrib_ID are allowed duplicates.  How do you check for that when inserting?

    John

  • John,

    We are using this select statement each time we insert a attribute for which data duplicates are not allowed. We have a Attribute Definition table which has the rules set. Also we are not using any cursors we simply do count based on the Attribute and data feilds. If a row exists for that critera we throw an error.

    thanks and regards

    Rajesh

  • Raj

    So do you only ever insert one row at a time?  Is it during the insert itself that you experience the poor performance, or during the duplicate checking?

    John

  • Since you have no Primary Key, I have to agree... might be a problem with the table design a bit...

    Anyway... take the query you posted...

    select count(1)

    from CL_PRODUCT_ATTRIBUTE

    where ATRRIBUTE_ID = @attrib_id

    and data = @data;

    And run it through the index tuning wizard in Query Analyzer... 900k rows is not a big table and you should not be having any problems with it. 

    One of the big problems I see is that you are only checking for dupes on one row at a time... RBAR on steroids!!! ("RBAR" is pronounced "ree-bar" and stands for "Row By Agonzing Row").  I recomment you post the query that makes use of the above code so we might help you with the real problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We are indeed checking for duplicates one row at a time though it is conditional. I'm using it at the time of Updates only. The actual code is called through a procedure the code of which is

    check if attribute can have duplicates. --- no performance problem

    if no then

    select @count = count(1)

    from CL_PRODUCT_ATTRIBUTE

    where ATRRIBUTE_ID = @attrib_id

    and data = @data; ---- definite performance problem

    if @count <=1 then

    update statement -- no performance problem

    else

    return error

    Updating the table doesn't kill the performance but only the select statement causes the kills the performance

  • The only reason why you think there is no performance problem on the UPDATE is because it is only updating 1 row at a time and you are comparing it's performance to the SELECT that does have a performance problem.

    And, I wasn't looking for psuedo code... I was looking the actual proc that does the update so I could show you what to do in your terms.

    Anyway, if you insist on using RBAR to do this, at least run the SELECT through the Index Tuning Wizard... it still won't be as fast as a set based solution but at least we can pull it off the floor...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My PROC is...

    CREATE PROC USP_CL_ATTRIB_U

    (@PROD_ATTRIBUTE_IDBIGINT,

    @AttributeidSMALLINT,

    @CrnIDVARCHAR(40),

    @dataVARCHAR(250),

    @RetValTINYINT = 0 OUT)

    AS

    DECLARE

    --@CustProdIDBIGINT,

    @CountSMALLINT

    @isduplallowchar(1)

    BEGIN

    SET @RetVal = 0

    SET @isdupallow ='N'

    select @isdupallow = IS_DUP_ALLOW

    FROM SU_ATTRIBUTE_DEFINITION

    WHERE ATTRIBUTE_ID = @Attributeid

    IF @isdupallow = 'N'

    BEGIN

    SELECT

    @Count = COUNT(1)

    FROM

    CL_PRODUCT_ATTRIBUTE,

    WHERE

    CL_PRODUCT_ATTRIBUTE.ATTRIBUTE_ID= @AttributeidAND

    CL_PRODUCT_ATTRIBUTE.DATA = @data

    CL_PRODUCT_ATTRIBUTE.ROW_STATUS = 1

    END

    IF @Count > 1 and @isdupallow = 'N' 1

    BEGIN

    SET @RetVal = 1

    END

    ELSE

    UPDATE CL_PRODUCT_ATTRIBUTE

    SET DATA = @data

    WHERE PROD_ATTRIBUTE_ID = @PROD_ATTRIBUTE_ID

    SET @retval = 0

    END

    GO

  • While I'm looking at this, gotta ask... did you run the SELECT through the Index Tuning Wizard, yet?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply