July 5, 2010 at 1:48 am
Hi guys, I need your help please.
The sales department enters orders in the ERP system. Sometimes two people enter the same order. My goal is to trace and identify the duplicate orders and alert the user entering the later order that "a similar order already exists in the system".
The order contains a variety of data to be completed, but I believe the most important fields for the duplicate check are the following:
Client's order number - Our Item Code - Item Quantity - Delivery Week (or date) AND/OR Ship Week (or date).
Clients order number: Not always entered (depends if the customer provides us his order number)
Item Code: Always entered
Item Quantity: Always entered
Delivery Week (and/or date): May or may not be entered in the system, depending on whether the Ship Week is entered
Ship Week (and/or date): May or may not be entered in the system, depending on whether the Delivery Week is entered
Sometimes both Delivery Week and Ship Week may be entered in the system.
In my opinion, the data above should be used to produce some kind of comparison key. That key would then be checked against the rest of the orders to find any duplicates.
What algorithm should I use? I want something simple and fast that still works even if the client's order number and one of the Delivery or Ship weeks are missing. Any ideas? For instance, should I just concatenate the fields, convert them to ASCII, and use that as the comparison key? Would that be efficient, or would it produce faulty results?
Any suggestions will be greatly appreciated.
July 5, 2010 at 5:57 am
July 5, 2010 at 6:14 am
Hi,
Here are two articles that will help you find and delete duplicate records in a SQL table:
How to delete duplicate records or rows among identical rows in a table where no primary key exists and T-SQL ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) to Delete Duplicate Rows in SQL Table
I once started building a merge system with dynamically defined criteria for selecting duplicate rows and merging them, and it actually worked. But since it had to run after work hours, there was not enough time to compare all the records against each other.
I had some ideas for finding and merging duplicates on SQL 2008, but a SQL 2005 solution does not seem practical when you work with a large amount of data.
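The ROW_NUMBER() delete pattern from the second article can be sketched as follows. This is only an illustration: the table name dbo.Orders and its columns are placeholders, not the poster's actual schema.

```sql
-- Sketch of the ROW_NUMBER() delete pattern the linked article describes.
-- Rows are numbered within each duplicate group; everything after the
-- first row in a group is deleted.
;WITH Numbered AS
(
    SELECT ROW_NUMBER() OVER
           (PARTITION BY ItemCode, QTY, DWeek
            ORDER BY ItemCode) AS RowNo
    FROM dbo.Orders
)
DELETE FROM Numbered
WHERE RowNo > 1;   -- keeps exactly one row per duplicate group
```

Deleting through the CTE works in SQL Server because the CTE references a single base table; the same query with SELECT instead of DELETE lets you preview which rows would go.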
July 5, 2010 at 8:22 am
Dfalir
The order contains a variety of data to be completed, but I believe the most important fields for the duplicate check are the following:
Client's order number - Our Item Code - Item Quantity - Delivery Week (or date) AND/OR Ship Week (or date).
I would disagree with your assumption for example:
You could have customer 'A' order item 'B', quantity 1, with delivery date 'C'. Another customer 'Z' could also order item 'B', quantity 1, with the same delivery date 'C'. That condition would raise your alert, and a false alert at that.
How do you:
1. Determine the address to which the item is to be delivered?
2. To whom is the invoice to be addressed?
3. Are customers assigned account numbers?
4. Other unique customer data in other tables?
In other words, what makes the 2 valid orders different?
Provide that data and someone will most likely give you a workable solution. For a tested solution, please provide the table definition(s) and some sample data, as per the first link in my signature block, and someone may then be able to assist you.
July 5, 2010 at 8:46 am
Hi Ron, and thank you very much for your concern; that was some quick and sharp thinking!
I couldn't agree more, and in general you are right. However, you should know that we are a flexible-materials manufacturing company, meaning we build products specifically for each customer, and these products cannot be sold to another customer because they carry that customer's brand name on the packaging. The packaging for Bic razors (the things you shave with every morning), for example, cannot be sold to any company other than Bic itself. I didn't want to bother you with the details, though.
Now imagine that I want to detect whether two people on our sales team have entered the same order twice: for the same customer, the same quantity, and the same delivery date. The data I mentioned in my first post is the data that should be used to identify the duplication.
Some customers don't provide us with their order number, and they may also order the exact same quantity of goods at the exact same price! In that case, the only thing that may distinguish the orders is a different delivery date. In the unlikely scenario that the customer orders the exact same product and quantity for the exact same delivery day (say they called a day after the initial order to double the quantity, and the salesperson created a new order instead of updating the existing one), I still want to warn our sales people to double-check with the customer.
A customer order may be 2-3 tons of packaging, so I wouldn't mind them losing a few minutes double-checking compared with the cost of producing double the quantity of goods. You must be getting bored by now, I bet! 🙂
These are the kinds of orders I want to "catch" - not to delete, just to warn the user.
But even if my choice of fields is wrong and other elements of each order turn out to matter, that - if I am not mistaken - is not the real problem. My problem is: would it work if, for instance, I concatenated all the data, produced a hash with HASHBYTES, and compared it against each other order? Is there something faster and/or more efficient?
We can always add fields to, or remove them from, the comparison key.
I hope to get an answer; my post was very, very long... sorry. 🙂
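For reference, the concatenate-and-hash idea the poster asks about could be sketched like this. The table and column names here are hypothetical, chosen to match the fields described in the first post; the delimiters and ISNULL wrappers are needed so that NULLs and adjacent values (e.g. '12'+'3' vs '1'+'23') do not collide.

```sql
-- Sketch only: build one hash per order from the identifying columns.
-- Table/column names are assumptions based on the post, not a real schema.
SELECT OrderID,
       HASHBYTES('SHA1',
            ISNULL(ClientOrderNumber, '')                 + '|'
          + ItemCode                                      + '|'
          + CAST(ItemQuantity AS VARCHAR(20))             + '|'
          + ISNULL(CONVERT(VARCHAR(10), DeliveryWeek, 120), '') + '|'
          + ISNULL(CONVERT(VARCHAR(10), ShipWeek, 120), '')
       ) AS OrderHash
FROM dbo.Orders;
```

Note that for this purpose hashing is usually no faster than a plain multi-column GROUP BY or EXISTS check on the same columns, which the optimizer can support with an index; also, before SQL Server 2016 HASHBYTES input is limited to 8000 bytes.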
July 5, 2010 at 12:49 pm
Assuming I read and properly understood your explanation:
CREATE TABLE #T(ClientOrderNumber VARCHAR(10), ItemCode VARCHAR(10),QTY INT, DWeek DATETIME,SWeek DATETIME)
INSERT INTO #T
SELECT '7808','7765',100,NULL, GETDATE() + 10 UNION ALL
SELECT NULL,'123',1000,NULL, NULL UNION ALL
SELECT NULL,'7765',100,NULL, GETDATE() + 10 UNION ALL
SELECT NULL,'8910',5100,NULL, GETDATE() + 5
This may be the answer you are seeking.
;WITH
cteDupeItemCode AS
(SELECT ItemCode FROM #T GROUP BY ItemCode
HAVING COUNT(*) > 1)
SELECT source.ItemCode, source.Qty, DWeek,SWeek
FROM #T source
INNER JOIN cteDupeItemCode dupe --understand this is the key making it all work
ON source.ItemCode = dupe.ItemCode
ORDER BY source.Itemcode;
Result:
ItemCode  Qty  DWeek  SWeek
7765      100  NULL   2010-07-15 14:36:57.030
7765      100  NULL   2010-07-15 14:36:57.030
Or this may be what you require:
;WITH numbered AS
(SELECT rowno = ROW_NUMBER() OVER
        (PARTITION BY ItemCode, qty ORDER BY ItemCode),
        ClientOrderNumber, ItemCode, QTY, DWeek, SWeek
 FROM #T)
SELECT * FROM numbered WHERE rowno > 1
Result:
rowno  ClientOrderNumber  ItemCode  QTY  DWeek  SWeek
2      7808               7765      100  NULL   2010-07-15 14:36:57.030
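Since the original poster said orders that differ only in delivery date are legitimate, one possible variant (a sketch, using the same #T sample table) is to include the week columns in the partition key, so only rows matching on every identifying field are flagged:

```sql
-- Variant: treat ALL identifying columns as the duplicate key, so orders
-- that differ only in delivery/ship week are NOT flagged.
-- Note: PARTITION BY groups NULLs together, which matches the requirement
-- that a missing week should still compare as equal.
;WITH numbered AS
(
    SELECT ROW_NUMBER() OVER
           (PARTITION BY ItemCode, QTY, DWeek, SWeek
            ORDER BY ItemCode) AS rowno,
           ClientOrderNumber, ItemCode, QTY, DWeek, SWeek
    FROM #T
)
SELECT *
FROM numbered
WHERE rowno > 1;
```

Whether ClientOrderNumber belongs in the partition key depends on how the poster wants to treat orders where only one of the two entries carries the customer's number.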
July 6, 2010 at 2:01 am