Table Performance

Question

Post reply

Table Performance

Rajesh Majji

SSC Veteran

Points: 254
More actions
December 21, 2006 at 1:07 am

#368583

Angels!!!
I have a table with more than 900,000 rows and would expand exponentially. The table definition is something like (Attribute_type, Attribute_ID, Attribute_data, Row_Status). Some types of Attribute_ID can have duplicates others cannot have duplicates. So basis the Attribute Type I need to check for duplicates before performing any inserts or updates. Because of the huge data in the table the performance is very bad.
Could anyone suggest how I can enhance the performance.
Thanks All in anticipation
Raj

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Raj

Please will you post the table DDL so that we can see what constraints you have on the data (primary keys and so on).

Thanks

John

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 2

John,

The Table is

CREATE TABLE [CL_PRODUCT_ATTRIBUTE] (

[PROD_ATTRIBUTE_ID] [bigint] IDENTITY (1, 1) NOT NULL ,

[ITEM_ATTRIBUTE_ID] [bigint] NOT NULL ,

[CUST_PRODUCT_ID] [bigint] NOT NULL ,

[ATTRIBUTE_ID] [bigint] NOT NULL ,

[DATA] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,

[CREATED_BY] [CREATED_BY] NOT NULL ,

[CREATED_ON] [CREATED_ON] NOT NULL ,

[CREATED_IP_ADDRESS] [CREATED_IP_ADDRESS] NOT NULL ,

[LAST_MODIFIED_BY] [LAST_MODIFIED_BY] NULL ,

[LAST_MODIFIED_ON] [LAST_MODIFIED_ON] NULL ,

[LAST_MODIFIED_IP_ADDRESS] [LAST_MODIFIED_IP_ADDRESS] NULL ,

[ROW_STATUS] [int] NOT NULL

) ON [PRIMARY]

We generally try to check duplicacy in DATA field based on Attribute_ID the Parent table of which has rules about the attribute.

Thanks and regards

Raj

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 3

Raj

OK. It looks as if your database design here is not ideal. There is no primary key constraint on the table. Is there a column (or combination of columns) that uniquely identifies each row in your table? How are you checking for duplicates at the moment? What I would suggest, given your current design, is to join to the parent table of Attrribute_ID to see whether the value needs to be unique. If it doesn't, then you don't need to do your check. You will probably need an index on the ATTRIBUTE_ID column for optimal performance.

John

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 4

John,

The only combination which is unique is CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA

Creating an Index(we cannot have a Unqiue Key) on ATTRIBUTE_ID didn't help us boost the performance. Neither did it creating a unqiue index on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA help in the performance.

I converted the same table into Oracle and imported data into the table, there I'm getting optimum performance. Though I used a Materialized view in Oracle based on Attribute Types which ensured performance.

Can it be possible to adopt a similar mechanism in MS SQL. My DB server is in a failover cluster mode (in SAN) and the server configuration is also pretty good.

Thanks

Rajesh

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 5

Raj

I'm sorry, I don't know anything about Materialized views in Oracle so I can't advise you on that.

If you had a unique constraint on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA then that would prevent any duplicate data going into the database.

If you need advice on performance, please say exactly how you are checking for duplicates.

A good place for you to start would be to study the execution plan for your insert and update queries and look out for table or clustered index scans.

John

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 6

John,

I do a select something like this

select count(1)

from CL_PRODUCT_ATTRIBUTE

where ATRRIBUTE_ID = @attrib_id

and data = @data;

to check the duplicates. Having indexes on ATRRIBUTE_ID didn't help boost performance.

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 7

Raj

So you're doing this once for every row you're inserting into the table? Are you using a cursor or a WHILE loop to do that? I think this is probably the reason for the poor performance.

You said in your original post that some values of Attrib_ID are allowed duplicates. How do you check for that when inserting?

John

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 8

John,

We are using this select statement each time we insert a attribute for which data duplicates are not allowed. We have a Attribute Definition table which has the rules set. Also we are not using any cursors we simply do count based on the Attribute and data feilds. If a row exists for that critera we throw an error.

thanks and regards

Rajesh

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 9

Raj

So do you only ever insert one row at a time? Is it during the insert itself that you experience the poor performance, or during the duplicate checking?

John

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 10

Since you have no Primary Key, I have to agree... might be a problem with the table design a bit...

Anyway... take the query you posted...

select count(1)

from CL_PRODUCT_ATTRIBUTE

where ATRRIBUTE_ID = @attrib_id

and data = @data;

And run it through the index tuning wizard in Query Analyzer... 900k rows is not a big table and you should not be having any problems with it.

One of the big problems I see is that you are only checking for dupes on one row at a time... RBAR on steroids!!! ("RBAR" is pronounced "ree-bar" and stands for "Row By Agonzing Row"). I recomment you post the query that makes use of the above code so we might help you with the real problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 11

We are indeed checking for duplicates one row at a time though it is conditional. I'm using it at the time of Updates only. The actual code is called through a procedure the code of which is

check if attribute can have duplicates. --- no performance problem

if no then

select @count = count(1)

from CL_PRODUCT_ATTRIBUTE

where ATRRIBUTE_ID = @attrib_id

and data = @data; ---- definite performance problem

if @count <=1 then

update statement -- no performance problem

else

return error

Updating the table doesn't kill the performance but only the select statement causes the kills the performance

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 12

The only reason why you think there is no performance problem on the UPDATE is because it is only updating 1 row at a time and you are comparing it's performance to the SELECT that does have a performance problem.

And, I wasn't looking for psuedo code... I was looking the actual proc that does the update so I could show you what to do in your terms.

Anyway, if you insist on using RBAR to do this, at least run the SELECT through the Index Tuning Wizard... it still won't be as fast as a set based solution but at least we can pull it off the floor...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Rajesh Majji SSC Veteran Points: 254 More actions · Answer 13

My PROC is...

CREATE PROC USP_CL_ATTRIB_U

(@PROD_ATTRIBUTE_IDBIGINT,

@AttributeidSMALLINT,

@CrnIDVARCHAR(40),

@dataVARCHAR(250),

@RetValTINYINT = 0 OUT)

AS

DECLARE

--@CustProdIDBIGINT,

@CountSMALLINT

@isduplallowchar(1)

BEGIN

SET @RetVal = 0

SET @isdupallow ='N'

select @isdupallow = IS_DUP_ALLOW

FROM SU_ATTRIBUTE_DEFINITION

WHERE ATTRIBUTE_ID = @Attributeid

IF @isdupallow = 'N'

BEGIN

SELECT

@Count = COUNT(1)

FROM

CL_PRODUCT_ATTRIBUTE,

WHERE

CL_PRODUCT_ATTRIBUTE.ATTRIBUTE_ID= @AttributeidAND

CL_PRODUCT_ATTRIBUTE.DATA = @data

CL_PRODUCT_ATTRIBUTE.ROW_STATUS = 1

END

IF @Count > 1 and @isdupallow = 'N' 1

BEGIN

SET @RetVal = 1

END

ELSE

UPDATE CL_PRODUCT_ATTRIBUTE

SET DATA = @data

WHERE PROD_ATTRIBUTE_ID = @PROD_ATTRIBUTE_ID

SET @retval = 0

END

GO

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 14

While I'm looking at this, gotta ask... did you run the SELECT through the Index Tuning Wizard, yet?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)