Long, complicated problem (SQL Svr 7)

Question

Long, complicated problem (SQL Svr 7)

Nicholas Wang

Default port

Points: 1463
More actions
August 12, 2003 at 11:43 am

#106019

This is going to be long... so please bear with me.
Our sales team uses a CRM system that uses SQL Server 7 as backend. Customer purchase our "online products", and these products are created onto our website, which uses Oracle as backend.
Our DQ department now has to look at every single sale, check the data in the CRM System and compare it to the product created on our website, make sure that all the attributes match.
Now, I've been tasked to create a "process" to check everything automatically. Only those with problems (discrepancies between the 2 systems) should be returned for manual inspection.
Our website's product data are heavily guarded, I can only get a snapshot of a part of it 4 times a day. The data I got are the attributes of the products, I store them into a table in SQL Server 7.
Coming up - the script I've created (so far):

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Nicholas Wang Default port Points: 1463 More actions · Answer 1

 



/* This script search for all companies that have a new product pending Data Quality check/approval.

** Inside those companies, we want to compare each and every product between the CRM system (SQL Server 7),

** and the web data (loaded into a table in SQL Server 7 - *web products attributes table*).

** If discrepancies exist, insert that company's ID into a table called ErrorCompTable.

*/



DECLARE @comp_idint

DECLARE @web_row_idint

--DECLARE all web_product_attributes variables

DECLARE @tempvarchar(100)





DECLARE company_cursor CURSOR

FOR (SELECT company_id

FROM st_cases

WHERE status = 'New'

) -- this returns all the company_ids that have new products.







OPEN company_cursor

FETCH NEXT FROM company_cursor INTO @comp_id



WHILE @@FETCH_STATUS = 0

BEGIN



/******************************************************************************************************

**Inside the COMPANY_CURSOR, we declare and open another cursor called row_cursor,

**and loop through all the products inside company @comp_id.

******************************************************************************************************/



DECLARE row_cursor CURSOR

FOR (SELECT /*attributes*/

FROM /*web products attributes table*/

WHERE company_id = @comp_id

)



OPEN row_cursor

FETCH NEXT FROM row_cursor

INTO /*@web_row_id, @ web_products_attributes... etc*/



WHILE @@FETCH_STATUS = 0

BEGIN





/******************************************************************************************************

**Inside the ROW_CURSOR, we compare one product's attributes between the web's (@web_products_attributes)

**and CRM one by one. (Linked by CRM_row_id - @web_row_id.)  

        **      IF <> THEN we insert the company_id into table *ErrorCompTable* and BREAK from

**row_cursor & company_cursor (skip to next company).

******************************************************************************************************/



SET @temp = (SELECT CRM_product_attributes FROM CRM_product_table

WHERE crm_row_id = @web_row_id)

IF @temp <> CRM_product_attributes

BEGIN

  INSERT INTO ErrorCompTable @comp_id

  BREAK

END



/******************************************************************************************************

**REPEAT FOR PRODUCT'S ALL ATTRIBUTES ... ... ... etc etc...

******************************************************************************************************/



/******************************************************************************************************

**DONE!  After we've checked all the attributes for the previous row (@web_row_id),

**we FETCH NEXT web_row_id & attributes from the cursor into the variables, then go back

**in the WHILE loop.

******************************************************************************************************/

FETCH NEXT FROM row_cursor INTO /*@web_row_id, @ web_products_attributes... etc*/

END-- End row_cursor WHILE loop



/******************************************************************************************************

**If we BREAK out, we will be here, which closes the row_cursor and fetch the next company

**and start again.

******************************************************************************************************/

CLOSE row_cursor

DEALLOCATE row_cursor





FETCH NEXT FROM company_cursor INTO @comp_id



END-- End company_cursor WHILE loop



CLOSE company_cursor

DEALLOCATE company_cursor

Nicholas Wang Default port Points: 1463 More actions · Answer 2

OK, the problem is performance... the script takes more than 7 minutes to run in my testing environment! I obviously can't have that run as a job on the production server!

I think the biggest problem is that each them I loop through the cursor to check every single product's attributes, I have to find the attributes in multiple tables in the CRM system's SQL Server 7 DB.

SELECT crm_attribute1 FROM this_table

IF @web_attribute1 <> crm_attribute1

...

SELECT crm_attribute2 FROM this_table JOIN that_table

IF @web_attribute2 <> crm_attribute2

...

Now, if we were on SQL Svr 2000, then I think I should have created an Indexed View for all the CRM system's product attributes, right? That way, inside the cursor, it doesn't have to look for attributes by joining tables & stuff. The Index should help a lot too!

However, we're on SQL Server 7 and it doesn't support Indexed View. Is there ANY WAY I can make this process faster?

(Also, as a side question, since I'm inserting the company_ids into a table, should I enclose this scrpit in a transaction? If so, how? )

Thanks for any help!

timfoster Mr or Mrs. 500 Points: 566 More actions · Answer 3

I may have mis-read this but...

Can you not join the CRM and your snapshot tables selecting only Comp_id and use the where clause to exclude matched records.

Something like...

SELECT Comp_ID

FROM CRM c

JOIN Snapshot s

ON JOIN COLUMNS

WHERE c.f1 <> s.f1

Or c.f2 <> s.f2

Etc...

Obviously when you create your snapshot table you can apply indexes to it.

NPeeters SSChampion Points: 12652 More actions · Answer 4

If I understand you correctly, in the *web products attribute table* you have one row per product containing all the attributes.

In your CRM table, you have one row per product and per product attribute.

And it is hell to start comparing these :).

You could remove all of the cursors by comparing each attribute one at a time in a join.



SELECT web.company_id

FROM web

  LEFT OUTER JOIN

     (SELECT crm_row_id, attribute_value

      FROM crm

      WHERE attribute_id = <matches>) attr

  ON web.row_id = attr.crm_row_id

WHERE web.attribute_value <> attr.attribute_value

   OR attr.attribute_value = NULL

The attr subquery in the join, just selects the correct row from the crm table with the correct attribute to compare.

Repeat this for each attribute.

Nicholas Wang Default port Points: 1463 More actions · Answer 5

NPeeters is mostly right. The web table has 1 row per product, the crm table also has 1 row per product HOWEVER... there are many different categories of products. So, the table is like this:

CREATE TABLE crm_product_attributes

(

prod_category int NOT NULL,

crm_row_id int NOT NULL,

att_col_1 NULL,

att_col_2 NULL,

...

att_col_20 NULL

)

So, depending on the different categories, products can have different attributes. Also, customers can purchase "additional" attributes products to add to their original purchase! These "additional attributes" are add onto the "Related_crm_row_id" to combine into ONE row in the web table!!!!! (Ugly huh?!)

Some attributes on the web table like "sales_person_id", "company_name" are stored in different tables in the CRM db.

The only link between the CRM tables and the web table is the row_id (product level only!) I cannot link them with company_id.

Indexes were built on the web table and the CRM tables.

------------------------------------------------------------------------------

I think we haven't seen a working solution yet, right?

Now, a co-worker suggested that I get rid of the cursors because SQL Server sucks at it. (I'm relatively new to the DB Dev scene and have been reading SQL Server Central for about 1 year. I tend to think that SQL is fast and powerful, it's just us mortals writing bad codes on it that makes it slow. 🙂 Anyways, he suggested that in SQL Server, "Less is More"... I should try separating the 2 cursors into 2 stored procedures and write codes to loop through them.

I haven't started writing it that way yet, what do you all think?

Again, thanks for any input on this!!!

Calvin Lawson SSChampion Points: 11030 More actions · Answer 6

Breaking this into two procs may definitely help, but you really need to avoid nesting cursors. That's your performance problem, I'm sure.

Try to figure out how to do one of those cursors as a batch, then use a "While" loop for the other one. Or use "While" loops for both if you have to. Below's an example of a loop from an earlier discussion:

If object_id('tempdb..#table1') is not null Drop Table #table1

Create Table #table1 (Number int, Name_ varchar(50))

Insert #table1 Values (1, 'ee,ed,ff,er,fsd')

Insert #table1 Values (2, 'rr,kk')

If object_id('tempdb..#table2') is not null Drop Table #table2

Create Table #table2 (Number int, Name_ varchar(10))

While 1 = 1 --Always - Loop is controlled by "Break"

Begin

Insert Into #table2

Select t1.Number

,

Case

When Charindex(',', t1.Name_) > 0

Then Replace(Left(t1.Name_, (Charindex(',', t1.Name_))), ',', '')

Else t1.Name_

End

From #table1 t1

Where Len(Name_) <> 0

IF @@Rowcount = 0 Break --Once there are no more records to process break loop

Update #table1

Set Name_ =

Case When Charindex(',', Name_) > 0

Then Right(Name_, Len(Name_) - Charindex(',', Name_))

Else ''

End

From #table1

Where Len(Name_) <> 0

End

Select *

From #table2

Signature is NULL

P1Daniels SSC Journeyman Points: 86 More actions · Answer 7

First, I should say that I can't endorse the use of cursors, but that is really a 'do as I say and not as I do' sort of thing. Second, what is the total number of comparisons you have to make? Unless it is in the tens of thousands (at least), you shouldn't be seeing the performance problem.

Some things to check:

Mark your cursor as 'forward_only'. By default SQL 2000 will use a dynamic cursor, which is slower, unless you specify otherwise...Since you are using 7.0, I'm not sure what your cursor options are (I know 'Fast_Forward_Only' is not available) but you may want to look into that.

Many problems that appear to require a cursor, in fact, do not. I know that the challenge I set for my team is 'no cursors' in production'...Your friend gave you a good suggestion when he recommended SP's (using output parameters, I presume). However, case statements, cross-joins, self-joins, and temp tables can get you a long way. I had a bit of trouble following the description, but if you had a subset of some dummy data, I'd give it a go as a challenge.

Write a short script that creates the tabels and inserts some values to recreate the problem...I'm sure someone will pick it up and run with it.

ciao ciao

P1Daniels SSC Journeyman Points: 86 More actions · Answer 8

Are you saying that you are supposed check the CRM system to find out if the attributes of the product you sold match the attributes of the product you put into production? Shouldn't they do that BEFORE they put it into production?

Also, unless you are either comparing a huge number of variables or running against an awful big # of records, you shouldn't really have this problem. SQL 7 was not great with cursors, but it wasn't that bad either. I would check both the box this process is running on, the priority being given to SQL Server within the Windows OS, as well as how much memory is available to the process.

ciao ciao

Nicholas Wang Default port Points: 1463 More actions · Answer 9

Thursday, Aug. 14th, I was in the middle of rewriting the query as Calvin suggested when suddenly, VOOooooommmmm... ... the power went out! Now I moved to NYC in beginning of 2002 from Seattle, where I had experience a wide power outage in the winter of 92. It was actually quite fun, I was in high school, my friends came over and we tell ghost stories... sigh, those were the days. This time, I started walking at ~ 4:30pm, from 31st St & 9th Ave (West side) all the way across the Queensboro bridge (59th St and 1st Ave)... I was lucky enough to caught a bus after that, got home at ~8:30pm... completely exhausted in the heat! Man! What an experience.

So anyway, now I'm back, trying to figure this problem out.

P1Daniels, it's funny you mentioned. The company I work at is one of the biggest internet company in located in NYC, however, the sales process is done completely manually! Sales people call clients and get sales, enter sales info into the CRM system, then "Order Fulfillment" people fulfill these sales onto our WEB system. So as you can probably guess, there are tons of user errors in between! The scipt I'm developing here is meant to check all the new sales data that have recently been fulfilled, and notify my group to correct them if errors exist.

I'll look into the "forward_only" cursor option, however, I really want to try rewriting this whole thing without cursors, but I don't know how... can someone please give me some hints?

The general process should go like this, right?...

CREATE TABLE #Company_table (company_id INT)

INSERT INTO #Company_table

SELECT company_id

FROM CRM_system WHERE "new sales cases exist"

SELECT 1st company_id FROM #Company_table

WHILE --(don't know how to condition this loop)

BEGIN

(Call a stored procedure with company_id as input, the sproc should check all products' attributes and return value 1 or 0 depending if errors exist)

IF (error exist), INSERT company_id INTO error_table

SELECT next company_id?

END

Hmmm... I guess my biggest question is "How to condition the loop to loop through the company_ids". Should I add an identity column to the #company_table?

Thanks for letting me pick your brains 🙂 Much appreciated!! Thanks all!

NPeeters SSChampion Points: 12652 More actions · Answer 10

A different approach maybe?

Is it possible to build a table that links a 'web-product-attribute' to a 'crm-product-attribute' in some way? Is this some kind of general structure that is always applied, regardless of the product / customer who purchases the stuff?

If that's the case, it should be fairly straightforward to build this as a set-based solution.

Is it in some way possible to post some (simple) sample data? This makes the discussion a lot easier...

Calvin Lawson SSChampion Points: 11030 More actions · Answer 11

Man, that power outage was some crazy stuff. Only inevitable a complex self-referencing system, though (Chaos!). I heard people were being pretty cool about things, though; probably have a baby boom in another 9 months. 🙂

Npeeters has the right idea about how this should be done...in a linking table and using sets instead of loops..much much better peformance. However, as long as company_id is unique:

if object_ID('tempdb..#Company_table') is not null drop table #Company_table

CREATE TABLE #Company_table (company_id INT, error varchar(25))

if object_ID('tempdb..#Error_Table') is not null drop table #Error_Table

create table #Error_Table (company_id INT, error varchar(25))

INSERT INTO #Company_table Values (1, 'Error')

INSERT INTO #Company_table Values (2, 'Good')

INSERT INTO #Company_table Values (3, 'Error')

INSERT INTO #Company_table Values (4, 'Good')

INSERT INTO #Company_table Values (5, 'Good')

INSERT INTO #Company_table Values (6, 'Error')

declare @minID int,

@maxID int

select @minID = 0,

@maxID = max(company_id)

FROM #Company_table

While @minID < @maxID

begin

Select top 1 @minID = company_id

From#Company_table

Wherecompany_id > @minID

--This is where you'd do your error checking

Insert#Error_Table

--sp_CheckforError @minID

Select*

From#Company_table

Wherecompany_id = @minID and error = 'error'

end

Select * from #Error_Table

Signature is NULL

Nicholas Wang Default port Points: 1463 More actions · Answer 12

Yes! Yes! We're getting closer... Let me first try to make up some sample data:

if object_ID('tempdb..#CRM_table') is not null drop table #CRM_table

CREATE TABLE #CRM_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime, related_row_id INT NULL)

if object_ID('tempdb..#WEB_table') is not null drop table #WEB_table

create table #WEB_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime)

INSERT INTO #CRM_table Values (001, 5001, 'Single Advertiser', 5, '2003-12-31', NULL)

INSERT INTO #CRM_table Values (001, 5002, 'Additional Advertiser', 10, '2003-12-31', 5001)

INSERT INTO #CRM_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31', NULL)

INSERT INTO #CRM_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30', NULL)

INSERT INTO #WEB_table Values (001, 5001, 'Member Advertiser', 15, '2003-12-31')

INSERT INTO #WEB_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31')

INSERT INTO #WEB_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30')

/* There we go, this is a simplified version where CRM's product attributes are all in one table. I agree that changing to a set based solution will make this go much much faster! So even though I can't create an indexed view in SQL Svr 7, I think I'm still going to try to create a view in the CRM system to mirror what's in the WEB system, so that I can compare all the products at once in a set based solution.

However, as you can see in company 001, row 5001 and 5002. CRM system allows for customer to buy "additional" attributes, like 5002, these attributes are add on to the original product - row 5001. Only the total combined product is shown on the WEB system. This is the challenge that I'm facing when trying to create the view today.

I think once the view is created, I can do:

*/

if object_ID('tempdb..#Error_table') is not null drop table #Error_table

create table #Error_table (company_id INT)

declare @minID int,

@maxID int

select @minID = 0,

@maxID = max(company_id)

FROM #CRM_table

While @minID < @maxID

begin

Select top 1 @minID = company_id

From #CRM_table

Where company_id > @minID

--This is where you'd do your error checking

DECLARE @return_error INT

EXEC @return_error = usp_Check_CRM_WEB_errors @minID

IF @return_error > 0

Insert #Error_Table VALUES @minID

ELSE

PRINT 'Company ' + @minID + ' has no error.'

end

Select * from #Error_Table

Nicholas Wang Default port Points: 1463 More actions · Answer 13

The SPROC usp_Check_CRM_WEB_errors will be a set based solution, like this?

CREATE PROC usp_Check_CRM_WEB_errors

@comp_id

AS

SELECT crm.row_id

FROM #CRM_table crm

LEFT OUTER JOIN #WEB_table web

ON crm.row_id = web.row_id

WHERE crm.company_id <> web.company_id

OR crm.prod_type <> web.prod_typ

OR crm.number_of_ad <> web.number_of_ad

OR crm.end_date <> web.end_date

OR web.row_id IS NULL

IF @@ROWCOUNT = 0

RETURN(0)

ELSE

RETURN(1)

GO

Calvin Lawson SSChampion Points: 11030 More actions · Answer 14

Yeah, that looks great! Even with a un-indexed view it should run faster than row by row processing.

Is there any reason you need to check each company_id separately? If you could check them all at once then you could remove your last loop.

Signature is NULL