July 31, 2012 at 8:37 am
I have a table ('Documents') that stores the names and locations of documents, for example the location of property deeds and wills. There is some duplication of data on the table where the customers name, the description of the document and the storage reference are exactly the same on more than one record.
The table structure is this:
IDNAMEDescriptionStatusLocationStorageRef
37448JONESWill In StorageCellar1234
38724JONESWill In StorageCellar1234
I want to write a query that is going to list all the duplicate records. I think I need a loop to compare the data in the columns but I've never written one in SQL before so I could with a few pointers please.
I think I need to loop through each row on the Name column first, then the Description and then StorageRef. Then if the data in all three columns is the same, it should be listed on the results. I think I will need to declare variables for each of these three columns so @Name, @Description and @StorageRef. I think I know how to pass a value to a variable ('set @Name = (select Name from Documents)', but I don't know how to how to loop through the ID column so that I can compare the different values of the variables - I think I need a 'BEGIN' somewhere?
Just to be clear, I'm not expecting or asking anyone to write it for me, but to explain the basic principles of how to write a loop; even if you just point me in the direction of a tutorial that will explain it.
Yours hopefully
July 31, 2012 at 8:58 am
You absolutely do NOT need a loop for this. You need to group by the columns that can contain your duplicates.
select Name, StorageRef, COUNT(*)
from YourTable
group by Name, StorageRef
having COUNT(*) > 1
There are VERY few times you need to loop or use cursors. SQL is horribly inefficient at looping.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
July 31, 2012 at 9:09 am
Ooooh, that's brilliant! I'm great at writing loops at the best of times so this is fantastic.
Thanks v much
July 31, 2012 at 9:12 am
sjerromeharris (7/31/2012)
Ooooh, that's brilliant! I'm great at writing loops at the best of times so this is fantastic.Thanks v much
You are welcome.
Next time you think about writing a loop, instead create a new post on SSC and we can show you several ways to avoid it.
Check out this article. The same logic applies as it does to cursors. http://www.sqlservercentral.com/articles/T-SQL/66097/[/url]
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
July 31, 2012 at 9:31 am
Here's another way:
-- ID of the first row of the dupeset is rn = 1, subsequent dupes have rn > 1
SELECT
ID,
NAME,
[Description],
Location,
StorageRef,
rn = ROW_NUMBER() OVER (PARTITION BY NAME, [Description], Location, StorageRef ORDER BY ID)
FROM ( -- a quick and convenient way to create a small sample data set
SELECT ID = 37448, NAME = 'JONES', [Description] = 'Will', Location = 'In Storage Cellar', StorageRef = 1234 UNION ALL
SELECT 38724, 'JONES', 'Will', 'In Storage Cellar', 1234
) s
-- which can be written as a CTE like so:
;WITH CTEDeleter AS (
SELECT
ID,
NAME,
[Description],
Location,
StorageRef,
rn = ROW_NUMBER() OVER (PARTITION BY NAME, [Description], Location, StorageRef ORDER BY ID)
FROM MyTable s
)
DELETE FROM CTEDeleterWHERE rn > 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 16, 2012 at 8:36 am
I have been asked to provide another report very similar to the one above. Its on the same table so the structure is the same.
Whereas the first report was to show records where the clients name and the storage ref were exactly the same and therefore duplicates, this scenario is where two different clients have been given the same storage ref:
NAMESTATUSLOCATIONSTORAGEREF
Jones in storagecellar1234
Bloggsin storagesafe1234
I'm sure the select statement would be very similar to the one below as I still need to count the number of occurrences of the storageref, but I don't know how to list each Name that any duplicate storagerefs relate to.
select Name, StorageRef, COUNT(*)
from YourTable
group by Name, StorageRef
having COUNT(*) > 1
Thanks
August 16, 2012 at 8:53 am
sjerromeharris (8/16/2012)
I have been asked to provide another report very similar to the one above. Its on the same table so the structure is the same.Whereas the first report was to show records where the clients name and the storage ref were exactly the same and therefore duplicates, this scenario is where two different clients have been given the same storage ref:
NAMESTATUSLOCATIONSTORAGEREF
Jones in storagecellar1234
Bloggsin storagesafe1234
I'm sure the select statement would be very similar to the one below as I still need to count the number of occurrences of the storageref, but I don't know how to list each Name that any duplicate storagerefs relate to.
select Name, StorageRef, COUNT(*)
from YourTable
group by Name, StorageRef
having COUNT(*) > 1
Thanks
Try this:
WITH myCTE
AS
(
select DISTINCT Name, StorageRef
from YourTable
)
SELECT
ct.storageref,
[Names] = STUFF((SELECT ',' + ct1.[Name] FROM myCTE ct1 WHERE ct1.[StorageRef] = ct.[StorageRef] FOR XML PATH('')), 1, 1, ''),
COUNT(*)
FROM
myCTE ct
GROUP BY ct.StorageRef
HAVING COUNT(*) > 1
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply