August 12, 2003 at 11:43 am
This is going to be long... so please bear with me.
Our sales team uses a CRM system that uses SQL Server 7 as backend. Customer purchase our "online products", and these products are created onto our website, which uses Oracle as backend.
Our DQ department now has to look at every single sale, check the data in the CRM System and compare it to the product created on our website, make sure that all the attributes match.
Now, I've been tasked to create a "process" to check everything automatically. Only those with problems (discrepancies between the 2 systems) should be returned for manual inspection.
Our website's product data are heavily guarded, I can only get a snapshot of a part of it 4 times a day. The data I got are the attributes of the products, I store them into a table in SQL Server 7.
Coming up - the script I've created (so far):
August 12, 2003 at 12:20 pm
/* This script search for all companies that have a new product pending Data Quality check/approval.
** Inside those companies, we want to compare each and every product between the CRM system (SQL Server 7),
** and the web data (loaded into a table in SQL Server 7 - *web products attributes table*).
** If discrepancies exist, insert that company's ID into a table called ErrorCompTable.
*/
DECLARE @comp_idint
DECLARE @web_row_idint
--DECLARE all web_product_attributes variables
DECLARE @tempvarchar(100)
DECLARE company_cursor CURSOR
FOR (SELECT company_id
FROM st_cases
WHERE status = 'New'
) -- this returns all the company_ids that have new products.
OPEN company_cursor
FETCH NEXT FROM company_cursor INTO @comp_id
WHILE @@FETCH_STATUS = 0
BEGIN
/******************************************************************************************************
**Inside the COMPANY_CURSOR, we declare and open another cursor called row_cursor,
**and loop through all the products inside company @comp_id.
******************************************************************************************************/
DECLARE row_cursor CURSOR
FOR (SELECT /*attributes*/
FROM /*web products attributes table*/
WHERE company_id = @comp_id
)
OPEN row_cursor
FETCH NEXT FROM row_cursor
INTO /*@web_row_id, @ web_products_attributes... etc*/
WHILE @@FETCH_STATUS = 0
BEGIN
/******************************************************************************************************
**Inside the ROW_CURSOR, we compare one product's attributes between the web's (@web_products_attributes)
**and CRM one by one. (Linked by CRM_row_id - @web_row_id.)
** IF <> THEN we insert the company_id into table *ErrorCompTable* and BREAK from
**row_cursor & company_cursor (skip to next company).
******************************************************************************************************/
SET @temp = (SELECT CRM_product_attributes FROM CRM_product_table
WHERE crm_row_id = @web_row_id)
IF @temp <> CRM_product_attributes
BEGIN
INSERT INTO ErrorCompTable @comp_id
BREAK
END
/******************************************************************************************************
**REPEAT FOR PRODUCT'S ALL ATTRIBUTES ... ... ... etc etc...
******************************************************************************************************/
/******************************************************************************************************
**DONE! After we've checked all the attributes for the previous row (@web_row_id),
**we FETCH NEXT web_row_id & attributes from the cursor into the variables, then go back
**in the WHILE loop.
******************************************************************************************************/
FETCH NEXT FROM row_cursor INTO /*@web_row_id, @ web_products_attributes... etc*/
END-- End row_cursor WHILE loop
/******************************************************************************************************
**If we BREAK out, we will be here, which closes the row_cursor and fetch the next company
**and start again.
******************************************************************************************************/
CLOSE row_cursor
DEALLOCATE row_cursor
FETCH NEXT FROM company_cursor INTO @comp_id
END-- End company_cursor WHILE loop
CLOSE company_cursor
DEALLOCATE company_cursor
August 12, 2003 at 12:34 pm
OK, the problem is performance... the script takes more than 7 minutes to run in my testing environment! I obviously can't have that run as a job on the production server!
I think the biggest problem is that each them I loop through the cursor to check every single product's attributes, I have to find the attributes in multiple tables in the CRM system's SQL Server 7 DB.
SELECT crm_attribute1 FROM this_table
IF @web_attribute1 <> crm_attribute1
...
SELECT crm_attribute2 FROM this_table JOIN that_table
IF @web_attribute2 <> crm_attribute2
...
Now, if we were on SQL Svr 2000, then I think I should have created an Indexed View for all the CRM system's product attributes, right? That way, inside the cursor, it doesn't have to look for attributes by joining tables & stuff. The Index should help a lot too!
However, we're on SQL Server 7 and it doesn't support Indexed View. Is there ANY WAY I can make this process faster?
(Also, as a side question, since I'm inserting the company_ids into a table, should I enclose this scrpit in a transaction? If so, how? )
Thanks for any help!
August 13, 2003 at 1:14 am
I may have mis-read this but...
Can you not join the CRM and your snapshot tables selecting only Comp_id and use the where clause to exclude matched records.
Something like...
SELECT Comp_ID
FROM CRM c
JOIN Snapshot s
ON JOIN COLUMNS
WHERE c.f1 <> s.f1
Or c.f2 <> s.f2
Etc...
Obviously when you create your snapshot table you can apply indexes to it.
August 13, 2003 at 1:31 am
If I understand you correctly, in the *web products attribute table* you have one row per product containing all the attributes.
In your CRM table, you have one row per product and per product attribute.
And it is hell to start comparing these :).
You could remove all of the cursors by comparing each attribute one at a time in a join.
SELECT web.company_id
FROM web
LEFT OUTER JOIN
(SELECT crm_row_id, attribute_value
FROM crm
WHERE attribute_id = <matches>) attr
ON web.row_id = attr.crm_row_id
WHERE web.attribute_value <> attr.attribute_value
OR attr.attribute_value = NULL
The attr subquery in the join, just selects the correct row from the crm table with the correct attribute to compare.
Repeat this for each attribute.
August 13, 2003 at 9:12 am
NPeeters is mostly right. The web table has 1 row per product, the crm table also has 1 row per product HOWEVER... there are many different categories of products. So, the table is like this:
CREATE TABLE crm_product_attributes
(
prod_category int NOT NULL,
crm_row_id int NOT NULL,
att_col_1 NULL,
att_col_2 NULL,
...
...
att_col_20 NULL
)
So, depending on the different categories, products can have different attributes. Also, customers can purchase "additional" attributes products to add to their original purchase! These "additional attributes" are add onto the "Related_crm_row_id" to combine into ONE row in the web table!!!!! (Ugly huh?!)
Some attributes on the web table like "sales_person_id", "company_name" are stored in different tables in the CRM db.
The only link between the CRM tables and the web table is the row_id (product level only!) I cannot link them with company_id.
Indexes were built on the web table and the CRM tables.
------------------------------------------------------------------------------
I think we haven't seen a working solution yet, right?
Now, a co-worker suggested that I get rid of the cursors because SQL Server sucks at it. (I'm relatively new to the DB Dev scene and have been reading SQL Server Central for about 1 year. I tend to think that SQL is fast and powerful, it's just us mortals writing bad codes on it that makes it slow. 🙂 Anyways, he suggested that in SQL Server, "Less is More"... I should try separating the 2 cursors into 2 stored procedures and write codes to loop through them.
I haven't started writing it that way yet, what do you all think?
Again, thanks for any input on this!!!
August 13, 2003 at 12:38 pm
Breaking this into two procs may definitely help, but you really need to avoid nesting cursors. That's your performance problem, I'm sure.
Try to figure out how to do one of those cursors as a batch, then use a "While" loop for the other one. Or use "While" loops for both if you have to. Below's an example of a loop from an earlier discussion:
If object_id('tempdb..#table1') is not null Drop Table #table1
Create Table #table1 (Number int, Name_ varchar(50))
Insert #table1 Values (1, 'ee,ed,ff,er,fsd')
Insert #table1 Values (2, 'rr,kk')
If object_id('tempdb..#table2') is not null Drop Table #table2
Create Table #table2 (Number int, Name_ varchar(10))
While 1 = 1 --Always - Loop is controlled by "Break"
Begin
Insert Into #table2
Select t1.Number
,
Case
When Charindex(',', t1.Name_) > 0
Then Replace(Left(t1.Name_, (Charindex(',', t1.Name_))), ',', '')
Else t1.Name_
End
From #table1 t1
Where Len(Name_) <> 0
IF @@Rowcount = 0 Break --Once there are no more records to process break loop
Update #table1
Set Name_ =
Case When Charindex(',', Name_) > 0
Then Right(Name_, Len(Name_) - Charindex(',', Name_))
Else ''
End
From #table1
Where Len(Name_) <> 0
End
Select *
From #table2
Signature is NULL
August 16, 2003 at 4:38 pm
First, I should say that I can't endorse the use of cursors, but that is really a 'do as I say and not as I do' sort of thing. Second, what is the total number of comparisons you have to make? Unless it is in the tens of thousands (at least), you shouldn't be seeing the performance problem.
Some things to check:
Mark your cursor as 'forward_only'. By default SQL 2000 will use a dynamic cursor, which is slower, unless you specify otherwise...Since you are using 7.0, I'm not sure what your cursor options are (I know 'Fast_Forward_Only' is not available) but you may want to look into that.
Many problems that appear to require a cursor, in fact, do not. I know that the challenge I set for my team is 'no cursors' in production'...Your friend gave you a good suggestion when he recommended SP's (using output parameters, I presume). However, case statements, cross-joins, self-joins, and temp tables can get you a long way. I had a bit of trouble following the description, but if you had a subset of some dummy data, I'd give it a go as a challenge.
Write a short script that creates the tabels and inserts some values to recreate the problem...I'm sure someone will pick it up and run with it.
ciao ciao
ciao ciao
August 16, 2003 at 4:48 pm
Are you saying that you are supposed check the CRM system to find out if the attributes of the product you sold match the attributes of the product you put into production? Shouldn't they do that BEFORE they put it into production?
Also, unless you are either comparing a huge number of variables or running against an awful big # of records, you shouldn't really have this problem. SQL 7 was not great with cursors, but it wasn't that bad either. I would check both the box this process is running on, the priority being given to SQL Server within the Windows OS, as well as how much memory is available to the process.
ciao ciao
ciao ciao
August 17, 2003 at 5:12 pm
Thursday, Aug. 14th, I was in the middle of rewriting the query as Calvin suggested when suddenly, VOOooooommmmm... ... the power went out! Now I moved to NYC in beginning of 2002 from Seattle, where I had experience a wide power outage in the winter of 92. It was actually quite fun, I was in high school, my friends came over and we tell ghost stories... sigh, those were the days. This time, I started walking at ~ 4:30pm, from 31st St & 9th Ave (West side) all the way across the Queensboro bridge (59th St and 1st Ave)... I was lucky enough to caught a bus after that, got home at ~8:30pm... completely exhausted in the heat! Man! What an experience.
So anyway, now I'm back, trying to figure this problem out.
P1Daniels, it's funny you mentioned. The company I work at is one of the biggest internet company in located in NYC, however, the sales process is done completely manually! Sales people call clients and get sales, enter sales info into the CRM system, then "Order Fulfillment" people fulfill these sales onto our WEB system. So as you can probably guess, there are tons of user errors in between! The scipt I'm developing here is meant to check all the new sales data that have recently been fulfilled, and notify my group to correct them if errors exist.
I'll look into the "forward_only" cursor option, however, I really want to try rewriting this whole thing without cursors, but I don't know how... can someone please give me some hints?
The general process should go like this, right?...
CREATE TABLE #Company_table (company_id INT)
INSERT INTO #Company_table
SELECT company_id
FROM CRM_system WHERE "new sales cases exist"
SELECT 1st company_id FROM #Company_table
WHILE --(don't know how to condition this loop)
BEGIN
(Call a stored procedure with company_id as input, the sproc should check all products' attributes and return value 1 or 0 depending if errors exist)
IF (error exist), INSERT company_id INTO error_table
SELECT next company_id?
END
Hmmm... I guess my biggest question is "How to condition the loop to loop through the company_ids". Should I add an identity column to the #company_table?
Thanks for letting me pick your brains 🙂 Much appreciated!! Thanks all!
August 18, 2003 at 2:53 am
A different approach maybe?
Is it possible to build a table that links a 'web-product-attribute' to a 'crm-product-attribute' in some way? Is this some kind of general structure that is always applied, regardless of the product / customer who purchases the stuff?
If that's the case, it should be fairly straightforward to build this as a set-based solution.
Is it in some way possible to post some (simple) sample data? This makes the discussion a lot easier...
August 18, 2003 at 12:29 pm
Man, that power outage was some crazy stuff. Only inevitable a complex self-referencing system, though (Chaos!). I heard people were being pretty cool about things, though; probably have a baby boom in another 9 months. 🙂
Npeeters has the right idea about how this should be done...in a linking table and using sets instead of loops..much much better peformance. However, as long as company_id is unique:
if object_ID('tempdb..#Company_table') is not null drop table #Company_table
CREATE TABLE #Company_table (company_id INT, error varchar(25))
if object_ID('tempdb..#Error_Table') is not null drop table #Error_Table
create table #Error_Table (company_id INT, error varchar(25))
INSERT INTO #Company_table Values (1, 'Error')
INSERT INTO #Company_table Values (2, 'Good')
INSERT INTO #Company_table Values (3, 'Error')
INSERT INTO #Company_table Values (4, 'Good')
INSERT INTO #Company_table Values (5, 'Good')
INSERT INTO #Company_table Values (6, 'Error')
declare @minID int,
@maxID int
select @minID = 0,
@maxID = max(company_id)
FROM #Company_table
While @minID < @maxID
begin
Select top 1 @minID = company_id
From#Company_table
Wherecompany_id > @minID
--This is where you'd do your error checking
Insert#Error_Table
--sp_CheckforError @minID
Select*
From#Company_table
Wherecompany_id = @minID and error = 'error'
end
Select * from #Error_Table
Signature is NULL
August 18, 2003 at 3:04 pm
Yes! Yes! We're getting closer... Let me first try to make up some sample data:
if object_ID('tempdb..#CRM_table') is not null drop table #CRM_table
CREATE TABLE #CRM_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime, related_row_id INT NULL)
if object_ID('tempdb..#WEB_table') is not null drop table #WEB_table
create table #WEB_table (company_id INT, row_id INT, prod_type varchar(50), number_of_ad INT, end_date datetime)
INSERT INTO #CRM_table Values (001, 5001, 'Single Advertiser', 5, '2003-12-31', NULL)
INSERT INTO #CRM_table Values (001, 5002, 'Additional Advertiser', 10, '2003-12-31', 5001)
INSERT INTO #CRM_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31', NULL)
INSERT INTO #CRM_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30', NULL)
INSERT INTO #WEB_table Values (001, 5001, 'Member Advertiser', 15, '2003-12-31')
INSERT INTO #WEB_table Values (002, 5003, 'Single Advertiser', 10, '2004-03-31')
INSERT INTO #WEB_table Values (003, 5004, 'Single Advertiser', 8, '2003-11-30')
/* There we go, this is a simplified version where CRM's product attributes are all in one table. I agree that changing to a set based solution will make this go much much faster! So even though I can't create an indexed view in SQL Svr 7, I think I'm still going to try to create a view in the CRM system to mirror what's in the WEB system, so that I can compare all the products at once in a set based solution.
However, as you can see in company 001, row 5001 and 5002. CRM system allows for customer to buy "additional" attributes, like 5002, these attributes are add on to the original product - row 5001. Only the total combined product is shown on the WEB system. This is the challenge that I'm facing when trying to create the view today.
I think once the view is created, I can do:
*/
if object_ID('tempdb..#Error_table') is not null drop table #Error_table
create table #Error_table (company_id INT)
declare @minID int,
@maxID int
select @minID = 0,
@maxID = max(company_id)
FROM #CRM_table
While @minID < @maxID
begin
Select top 1 @minID = company_id
From #CRM_table
Where company_id > @minID
--This is where you'd do your error checking
DECLARE @return_error INT
EXEC @return_error = usp_Check_CRM_WEB_errors @minID
IF @return_error > 0
Insert #Error_Table VALUES @minID
ELSE
PRINT 'Company ' + @minID + ' has no error.'
end
Select * from #Error_Table
August 18, 2003 at 3:12 pm
The SPROC usp_Check_CRM_WEB_errors will be a set based solution, like this?
CREATE PROC usp_Check_CRM_WEB_errors
@comp_id
AS
SELECT crm.row_id
FROM #CRM_table crm
LEFT OUTER JOIN #WEB_table web
ON crm.row_id = web.row_id
WHERE crm.company_id <> web.company_id
OR crm.prod_type <> web.prod_typ
OR crm.number_of_ad <> web.number_of_ad
OR crm.end_date <> web.end_date
OR web.row_id IS NULL
IF @@ROWCOUNT = 0
RETURN(0)
ELSE
RETURN(1)
GO
August 18, 2003 at 3:34 pm
Yeah, that looks great! Even with a un-indexed view it should run faster than row by row processing.
Is there any reason you need to check each company_id separately? If you could check them all at once then you could remove your last loop.
Signature is NULL
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply