August 20, 2010 at 6:12 pm
Hello,
Any help or advice is really appreciated
I have the following table
Table A
vendorname
walmart
walmartinc
walmartinc
cubfoods
cubfoods
cubfood
asper
apser
I need the view the count of the duplicated vendors. if I perform the following
select vendorname,count(*) as duplicates from vendors
group by vendorname having count(*) >1
the following o/p is displayed
Vendorname duplicates
walmartinc 2
cubfoods 2
As we can see that (walmart, walmart inc),(asper,apser) are the same vendors; but because of the extra character or number its been considered unique; and because of which I am unable to specify the correct number of unique and duplicate vendors in the vendor table of nearly 15000 records.
please advice me how can i avoid this problem; is there a way to compare two columns based on the first 7 characters.
Thank you
August 21, 2010 at 2:01 am
Hi T2512
I have done a small script that will help you get started. This script deletes the duplicate records in one go. Just a word of caution, please make sure that there are no transactions referring these venders who are deleted
-If you have not placed the appropriate FK’s, data will be deleted and you will have orphaned detail transactions
-If your application is spanned over many databases and the vender tables PK is referred in different DB’s please make sure that you do the required verifications before deleting the vender data.
-
-
-CREATE TABLE Vendar ( IID int identity(1,1) , Name varchar(100) )
-CREATE TABLE #Vendartemp ( IID int ,Name Varchar(20))
-
-
-INSERT INTO Vendar(NAME) VALUES ( 'WALMART')
-INSERT INTO Vendar(NAME) VALUES ( 'WALMART INC')
-INSERT INTO Vendar(NAME) VALUES ( 'WALMART INC')
-INSERT INTO Vendar(NAME) VALUES ( 'apser INC')
-INSERT INTO Vendar(NAME) VALUES ( 'apser')
-INSERT INTO Vendar(NAME) VALUES ( 'cubfoods')
-INSERT INTO Vendar(NAME) VALUES ( 'apser')
-INSERT INTO Vendar(NAME) VALUES ( 'WALMART')
-
-
-Select * from Vendar
-
-
---identifies the duplicate records
-INSERT INTO #Vendartemp ( Name ,IID)
-Select A.Name , A.IID
-from Vendar A
-join Vendar B ON A.name = B.name
-where A.Name in
-(
-Select Name Name
-From Vendar
-Group by Name
-having COUNT(1) > 1
-)
-and A.IID > B.IID
-
-SELECT * FROM #Vendartemp
-
-DELETE FROM VENDAR
-where VENDAR.IID IN
-(
-Select VENDAR.IID
-FROM #Vendartemp B
-Where VENDAR.Name = B.Name AND VENDAR.IID < B.IID
-)
-Select * from Vendar
You can use the substring function to identify the first 7 characters in a column
Hope this helps
August 21, 2010 at 5:23 am
Using the first 7 characters will not identify asper as the same vendor as apser. Building a CASE statement in a CTE allows you to pre-map the variant spellings of vendor names to a single name eg vendor names appearing as asper or apser are mapped to asper. You can then run your query from the CTE to give you the desired output.
IF NOT OBJECT_ID('tempdb.dbo.#Vendors', 'U') IS NULL DROP TABLE #Vendors
SELECT 'walmart' AS vendorname INTO #Vendors UNION ALL
SELECT 'walmartinc' UNION ALL
SELECT 'walmartinc' UNION ALL
SELECT 'cubfoods' UNION ALL
SELECT 'cubfoods' UNION ALL
SELECT 'cubfood' UNION ALL
SELECT 'asper' UNION ALL
SELECT 'apser'
;WITH cte AS
(
SELECT CASE
WHEN vendorname IN ('walmart', 'walmartinc') THEN 'walmart'
WHEN vendorname IN ('cubfoods', 'cubfood') THEN 'cubfoods'
WHEN vendorname IN ('asper', 'apser') THEN 'asper'
ELSE vendorname END AS vendorname
FROM #Vendors
)
SELECT vendorname, COUNT(*) AS duplicates
FROM cte
GROUP BY vendorname
HAVING COUNT(*) > 1
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply