Help with syntax for a loop

  • I have a table ('Documents') that stores the names and locations of documents, for example the location of property deeds and wills. There is some duplication of data on the table where the customers name, the description of the document and the storage reference are exactly the same on more than one record.

    The table structure is this:

    IDNAMEDescriptionStatusLocationStorageRef

    37448JONESWill In StorageCellar1234

    38724JONESWill In StorageCellar1234

    I want to write a query that is going to list all the duplicate records. I think I need a loop to compare the data in the columns but I've never written one in SQL before so I could with a few pointers please.

    I think I need to loop through each row on the Name column first, then the Description and then StorageRef. Then if the data in all three columns is the same, it should be listed on the results. I think I will need to declare variables for each of these three columns so @Name, @Description and @StorageRef. I think I know how to pass a value to a variable ('set @Name = (select Name from Documents)', but I don't know how to how to loop through the ID column so that I can compare the different values of the variables - I think I need a 'BEGIN' somewhere?

    Just to be clear, I'm not expecting or asking anyone to write it for me, but to explain the basic principles of how to write a loop; even if you just point me in the direction of a tutorial that will explain it.

    Yours hopefully

  • You absolutely do NOT need a loop for this. You need to group by the columns that can contain your duplicates.

    select Name, StorageRef, COUNT(*)

    from YourTable

    group by Name, StorageRef

    having COUNT(*) > 1

    There are VERY few times you need to loop or use cursors. SQL is horribly inefficient at looping.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Ooooh, that's brilliant! I'm great at writing loops at the best of times so this is fantastic.

    Thanks v much

  • sjerromeharris (7/31/2012)


    Ooooh, that's brilliant! I'm great at writing loops at the best of times so this is fantastic.

    Thanks v much

    You are welcome.

    Next time you think about writing a loop, instead create a new post on SSC and we can show you several ways to avoid it.

    Check out this article. The same logic applies as it does to cursors. http://www.sqlservercentral.com/articles/T-SQL/66097/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Here's another way:

    -- ID of the first row of the dupeset is rn = 1, subsequent dupes have rn > 1

    SELECT

    ID,

    NAME,

    [Description],

    Location,

    StorageRef,

    rn = ROW_NUMBER() OVER (PARTITION BY NAME, [Description], Location, StorageRef ORDER BY ID)

    FROM ( -- a quick and convenient way to create a small sample data set

    SELECT ID = 37448, NAME = 'JONES', [Description] = 'Will', Location = 'In Storage Cellar', StorageRef = 1234 UNION ALL

    SELECT 38724, 'JONES', 'Will', 'In Storage Cellar', 1234

    ) s

    -- which can be written as a CTE like so:

    ;WITH CTEDeleter AS (

    SELECT

    ID,

    NAME,

    [Description],

    Location,

    StorageRef,

    rn = ROW_NUMBER() OVER (PARTITION BY NAME, [Description], Location, StorageRef ORDER BY ID)

    FROM MyTable s

    )

    DELETE FROM CTEDeleterWHERE rn > 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I have been asked to provide another report very similar to the one above. Its on the same table so the structure is the same.

    Whereas the first report was to show records where the clients name and the storage ref were exactly the same and therefore duplicates, this scenario is where two different clients have been given the same storage ref:

    NAMESTATUSLOCATIONSTORAGEREF

    Jones in storagecellar1234

    Bloggsin storagesafe1234

    I'm sure the select statement would be very similar to the one below as I still need to count the number of occurrences of the storageref, but I don't know how to list each Name that any duplicate storagerefs relate to.

    select Name, StorageRef, COUNT(*)

    from YourTable

    group by Name, StorageRef

    having COUNT(*) > 1

    Thanks

  • sjerromeharris (8/16/2012)


    I have been asked to provide another report very similar to the one above. Its on the same table so the structure is the same.

    Whereas the first report was to show records where the clients name and the storage ref were exactly the same and therefore duplicates, this scenario is where two different clients have been given the same storage ref:

    NAMESTATUSLOCATIONSTORAGEREF

    Jones in storagecellar1234

    Bloggsin storagesafe1234

    I'm sure the select statement would be very similar to the one below as I still need to count the number of occurrences of the storageref, but I don't know how to list each Name that any duplicate storagerefs relate to.

    select Name, StorageRef, COUNT(*)

    from YourTable

    group by Name, StorageRef

    having COUNT(*) > 1

    Thanks

    Try this:

    WITH myCTE

    AS

    (

    select DISTINCT Name, StorageRef

    from YourTable

    )

    SELECT

    ct.storageref,

    [Names] = STUFF((SELECT ',' + ct1.[Name] FROM myCTE ct1 WHERE ct1.[StorageRef] = ct.[StorageRef] FOR XML PATH('')), 1, 1, ''),

    COUNT(*)

    FROM

    myCTE ct

    GROUP BY ct.StorageRef

    HAVING COUNT(*) > 1

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply