December 21, 2006 at 1:07 am
Angels!!!
I have a table with more than 900,000 rows, and it is growing rapidly. The table definition is something like (Attribute_type, Attribute_ID, Attribute_data, Row_Status). Some types of Attribute_ID can have duplicates; others cannot. So, based on the attribute type, I need to check for duplicates before performing any inserts or updates. Because of the huge volume of data in the table, the performance is very bad.
Could anyone suggest how I can improve the performance?
Thanks All in anticipation
Raj
December 21, 2006 at 1:53 am
Raj
Please will you post the table DDL so that we can see what constraints you have on the data (primary keys and so on).
Thanks
John
December 21, 2006 at 2:02 am
John,
The Table is
CREATE TABLE [CL_PRODUCT_ATTRIBUTE] (
[PROD_ATTRIBUTE_ID] [bigint] IDENTITY (1, 1) NOT NULL ,
[ITEM_ATTRIBUTE_ID] [bigint] NOT NULL ,
[CUST_PRODUCT_ID] [bigint] NOT NULL ,
[ATTRIBUTE_ID] [bigint] NOT NULL ,
[DATA] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[CREATED_BY] [CREATED_BY] NOT NULL ,
[CREATED_ON] [CREATED_ON] NOT NULL ,
[CREATED_IP_ADDRESS] [CREATED_IP_ADDRESS] NOT NULL ,
[LAST_MODIFIED_BY] [LAST_MODIFIED_BY] NULL ,
[LAST_MODIFIED_ON] [LAST_MODIFIED_ON] NULL ,
[LAST_MODIFIED_IP_ADDRESS] [LAST_MODIFIED_IP_ADDRESS] NULL ,
[ROW_STATUS] [int] NOT NULL
) ON [PRIMARY]
We generally check for duplicates in the DATA field based on ATTRIBUTE_ID; the parent table of ATTRIBUTE_ID holds the rules for each attribute.
Thanks and regards
Raj
December 21, 2006 at 2:35 am
Raj
OK. It looks as if your database design here is not ideal. There is no primary key constraint on the table. Is there a column (or combination of columns) that uniquely identifies each row in your table? How are you checking for duplicates at the moment? What I would suggest, given your current design, is to join to the parent table of ATTRIBUTE_ID to see whether the value needs to be unique. If it doesn't, then you don't need to do your check. You will probably need an index on the ATTRIBUTE_ID column for optimal performance.
John
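[Editor's note: to illustrate John's suggestion, a sketch of the lookup against the rules table plus a supporting index. The rules table and flag names (SU_ATTRIBUTE_DEFINITION, IS_DUP_ALLOW) are taken from later posts in this thread; the index name is hypothetical.]

```sql
-- Look up whether this attribute allows duplicates before doing any check.
SELECT IS_DUP_ALLOW
FROM SU_ATTRIBUTE_DEFINITION
WHERE ATTRIBUTE_ID = @attrib_id

-- An index to support the duplicate check (hypothetical name):
CREATE INDEX IX_CL_PRODUCT_ATTRIBUTE_ATTR_DATA
ON CL_PRODUCT_ATTRIBUTE (ATTRIBUTE_ID, DATA)
```

With this index in place, a lookup on ATTRIBUTE_ID and DATA can seek rather than scan the 900,000-row table.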
December 21, 2006 at 3:07 am
John,
The only combination which is unique is CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA
Creating an index (we cannot have a unique key) on ATTRIBUTE_ID didn't help us boost the performance. Neither did creating a unique index on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA.
I converted the same table to Oracle and imported the data into it, and there I'm getting optimal performance. I used a materialized view in Oracle based on attribute types, which ensured the performance.
Is it possible to adopt a similar mechanism in MS SQL? My DB server is in a failover cluster (on a SAN) and the server configuration is also pretty good.
Thanks
Rajesh
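[Editor's note: SQL Server's closest analogue to an Oracle materialized view is an indexed view. A minimal sketch, not from the thread (the view and index names are hypothetical); SCHEMABINDING and COUNT_BIG(*) are required before the clustered index can be built.]

```sql
-- Pre-aggregate row counts per (ATTRIBUTE_ID, DATA) pair.
CREATE VIEW dbo.V_ATTRIBUTE_DATA_COUNTS
WITH SCHEMABINDING
AS
SELECT ATTRIBUTE_ID, DATA, COUNT_BIG(*) AS ROW_COUNT
FROM dbo.CL_PRODUCT_ATTRIBUTE
GROUP BY ATTRIBUTE_ID, DATA
GO

-- The unique clustered index is what makes the view "materialized".
CREATE UNIQUE CLUSTERED INDEX IX_V_ATTR_DATA
ON dbo.V_ATTRIBUTE_DATA_COUNTS (ATTRIBUTE_ID, DATA)
GO
```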
December 21, 2006 at 3:29 am
Raj
I'm sorry, I don't know anything about Materialized views in Oracle so I can't advise you on that.
If you had a unique constraint on CUST_PRODUCT_ID, ATTRIBUTE_ID and DATA then that would prevent any duplicate data going into the database.
If you need advice on performance, please say exactly how you are checking for duplicates.
A good place for you to start would be to study the execution plan for your insert and update queries and look out for table or clustered index scans.
John
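[Editor's note: a unique constraint of the kind John mentions might be declared as below (constraint name is hypothetical). It would reject duplicates at insert time, with no separate SELECT needed; it would not suit attributes where duplicates are allowed.]

```sql
-- Enforce uniqueness declaratively on the combination Raj identified.
ALTER TABLE CL_PRODUCT_ATTRIBUTE
ADD CONSTRAINT UQ_CL_PRODUCT_ATTRIBUTE
UNIQUE (CUST_PRODUCT_ID, ATTRIBUTE_ID, DATA)
```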
December 21, 2006 at 4:17 am
Raj
So you're doing this once for every row you're inserting into the table? Are you using a cursor or a WHILE loop to do that? I think this is probably the reason for the poor performance.
You said in your original post that some values of Attrib_ID are allowed duplicates. How do you check for that when inserting?
John
December 21, 2006 at 4:51 am
John,
We run this select statement each time we insert an attribute for which data duplicates are not allowed. We have an attribute definition table that has the rules set. Also, we are not using any cursors; we simply do a count based on the attribute and data fields. If a row exists for those criteria, we throw an error.
thanks and regards
Rajesh
December 21, 2006 at 6:27 am
Raj
So do you only ever insert one row at a time? Is it during the insert itself that you experience the poor performance, or during the duplicate checking?
John
December 21, 2006 at 1:35 pm
Since you have no Primary Key, I have to agree... might be a problem with the table design a bit...
Anyway... take the query you posted...
select count(1)
from CL_PRODUCT_ATTRIBUTE
where ATTRIBUTE_ID = @attrib_id
and data = @data;
And run it through the Index Tuning Wizard in Query Analyzer... 900K rows is not a big table and you shouldn't be having any problems with it.
One of the big problems I see is that you are only checking for dupes on one row at a time... RBAR on steroids!!! ("RBAR" is pronounced "ree-bar" and stands for "Row By Agonizing Row"). I recommend you post the query that makes use of the above code so we might help you with the real problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
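[Editor's note: a sketch of the set-based check Jeff is hinting at, validating a whole batch in one pass instead of one SELECT per row. The staging table #NEW_ROWS is an assumption; the rules table names come from later posts in this thread.]

```sql
-- Hypothetical staging table #NEW_ROWS(CUST_PRODUCT_ID, ATTRIBUTE_ID, DATA)
-- holding the incoming batch. Flag every incoming row whose (ATTRIBUTE_ID, DATA)
-- already exists, but only for attributes that forbid duplicates.
SELECT n.CUST_PRODUCT_ID, n.ATTRIBUTE_ID, n.DATA
FROM #NEW_ROWS n
JOIN SU_ATTRIBUTE_DEFINITION d
  ON d.ATTRIBUTE_ID = n.ATTRIBUTE_ID
JOIN CL_PRODUCT_ATTRIBUTE p
  ON p.ATTRIBUTE_ID = n.ATTRIBUTE_ID
 AND p.DATA = n.DATA
WHERE d.IS_DUP_ALLOW = 'N'
  AND p.ROW_STATUS = 1
```

Rows returned by this query are the violations; the rest of the batch can then be inserted in a single statement.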
December 21, 2006 at 10:17 pm
We are indeed checking for duplicates one row at a time, though it is conditional. I'm using it at the time of updates only. The actual code is called through a procedure, the logic of which is:
check if attribute can have duplicates --- no performance problem
if no then
    select @count = count(1)
    from CL_PRODUCT_ATTRIBUTE
    where ATTRIBUTE_ID = @attrib_id
    and data = @data; ---- definite performance problem
if @count <= 1 then
    update statement -- no performance problem
else
    return error
Updating the table doesn't kill the performance; only the select statement does.
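[Editor's note: one incremental change for the slow step above. An EXISTS test can stop at the first match instead of counting every qualifying row; a sketch, assuming an index on (ATTRIBUTE_ID, DATA). Excluding the row being updated is an assumption about the intent of the "<= 1" test.]

```sql
-- Stop at the first duplicate rather than counting them all.
IF EXISTS (SELECT 1
           FROM CL_PRODUCT_ATTRIBUTE
           WHERE ATTRIBUTE_ID = @attrib_id
             AND DATA = @data
             AND ROW_STATUS = 1
             AND PROD_ATTRIBUTE_ID <> @PROD_ATTRIBUTE_ID) -- ignore the row being updated
    SET @RetVal = 1 -- duplicate found: signal the error
ELSE
    UPDATE CL_PRODUCT_ATTRIBUTE
    SET DATA = @data
    WHERE PROD_ATTRIBUTE_ID = @PROD_ATTRIBUTE_ID
```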
December 21, 2006 at 10:28 pm
The only reason you think there is no performance problem on the UPDATE is that it is only updating 1 row at a time, and you are comparing its performance to the SELECT that does have a performance problem.
And, I wasn't looking for pseudocode... I was looking for the actual proc that does the update, so I could show you what to do in your terms.
Anyway, if you insist on using RBAR to do this, at least run the SELECT through the Index Tuning Wizard... it still won't be as fast as a set-based solution, but at least we can pull it off the floor...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2006 at 11:02 pm
My PROC is...
CREATE PROC USP_CL_ATTRIB_U
(@PROD_ATTRIBUTE_ID BIGINT,
@Attributeid SMALLINT,
@CrnID VARCHAR(40),
@data VARCHAR(250),
@RetVal TINYINT = 0 OUT)
AS
DECLARE
--@CustProdID BIGINT,
@Count SMALLINT,
@isdupallow CHAR(1)
BEGIN
SET @RetVal = 0
SET @Count = 0
SET @isdupallow = 'N'

SELECT @isdupallow = IS_DUP_ALLOW
FROM SU_ATTRIBUTE_DEFINITION
WHERE ATTRIBUTE_ID = @Attributeid

IF @isdupallow = 'N'
BEGIN
    SELECT @Count = COUNT(1)
    FROM CL_PRODUCT_ATTRIBUTE
    WHERE CL_PRODUCT_ATTRIBUTE.ATTRIBUTE_ID = @Attributeid
    AND CL_PRODUCT_ATTRIBUTE.DATA = @data
    AND CL_PRODUCT_ATTRIBUTE.ROW_STATUS = 1
END

IF @Count > 1 AND @isdupallow = 'N'
BEGIN
    SET @RetVal = 1
END
ELSE
BEGIN
    UPDATE CL_PRODUCT_ATTRIBUTE
    SET DATA = @data
    WHERE PROD_ATTRIBUTE_ID = @PROD_ATTRIBUTE_ID
    SET @RetVal = 0
END
END
GO
December 21, 2006 at 11:09 pm
While I'm looking at this, gotta ask... did you run the SELECT through the Index Tuning Wizard, yet?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 20 total)