ROW_NUMBER to distinguish between identical records

  • Hi all,

    One of my clients has a setup in which they have projects at multiple schools. Each project is classified by a domain. In some cases, a school could have multiple projects for the same domain.

    In the database, these projects are all represented by a unique key (IDENTITY column), but I need some way of giving the user a more logical way of differentiating between the projects. Initially, I had done this by using ROW_NUMBER in the base query, but I realised recently that this would give erroneous results if the number of rows returned differed based off the query parameters. For example, if a school had three projects for the same domain, and one employee was assigned to projects P1 and P2, while the other to P1 and P3, using a simple row numbering would result in both employees having the projects assigned the row numbers 1 and 2.

    I've come up with the following query to solve the problem. Just wanted feedback on A) whether there are any flaws in my logic, and B) whether there is a better way of doing what I'm trying to do.

    CREATE TABLE #Project

    (

    ID INT IDENTITY PRIMARY KEY,

    DomainID INT,

    SchoolID INT,

    Name VARCHAR(500)

    )

    CREATE TABLE #Domain

    (

    ID INT IDENTITY PRIMARY KEY,

    Name VARCHAR(500)

    )

    CREATE TABLE #School

    (

    ID INT IDENTITY PRIMARY KEY,

    Name VARCHAR(500)

    )

    INSERT INTO #Project (DomainID, SchoolID, Name) VALUES (1, 1, 'P1D1S1'), (1, 2, 'P2D1S2'), (2, 2, 'P3D2S2'), (1, 1, 'P4D1S1'), (1, 1, 'P5D1S1')

    INSERT INTO #Domain (Name) VALUES ('D1'), ('D2')

    INSERT INTO #School (Name) VALUES ('S1'), ('S2')

    SELECT

    p1.ID,

    #Domain.Name + ' - ' + #School.Name + ' - ' + CAST(rowNum AS VARCHAR) AS ProjectWithSchoolAndDomain

    FROM #Project p1

    JOIN #Domain ON DomainID = #Domain.ID

    JOIN #School ON SchoolID = #School.ID

    CROSS APPLY

    (

    SELECT

    p2.ID,

    ROW_NUMBER() OVER (PARTITION BY p2.DomainID, p2.SchoolID ORDER BY p2.ID) AS rowNum

    FROM #Project p2

    WHEREp2.DomainID = p1.DomainID

    AND p2.SchoolID = p1.SchoolID

    ) o

    WHERE p1.ID = o.ID

    DROP TABLE #Project

    DROP TABLE #Domain

    DROP TABLE #School

  • First, the Project table should be keyed on ( DomainID, SchoolID, ID ), if, as your example query implies, Domain is the more dominant "parent" relation. Reverse the first two if School is the more dominant "parent" relation. But, in any case, not on just ID.

    ( ID INT IDENTITY, DomainID INT, SchoolID INT, Name VARCHAR(500), CONSTRAINT Project__PK PRIMARY KEY ( DomainID, SchoolID, ID ) )

    Second, yes, the query can be touched up as well. That's much more trivial, and I'll get back to that later if no one else beats me to it.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You're filtering your inner query by the outer query values. That will affect the results of the inner query and thus the row numbers. To ensure consistent row numbers, do not use any dependencies external to the query where the row numbers are generated.

    WITH cte AS

    ( SELECT p2.ID,

    rowNum = ROW_NUMBER() OVER (PARTITION BY p2.DomainID,

    p2.SchoolID

    ORDER BY p2.ID

    )

    FROM #Project AS p2

    )

    SELECT p1.ID,

    ProjectWithSchoolAndDomain = #Domain.Name + ' - ' + #School.Name + ' - ' + CAST(p2.rowNum AS VARCHAR)

    FROM #Project AS p1

    JOIN #Domain

    ON p1.DomainID = #Domain.ID

    JOIN #School

    ON p1.SchoolID = #School.ID

    JOIN cte AS p2

    ON p1.ID = p2.ID;

    Update: On further inspection, it looks like your query will work as written because your row_number partitioning and ordering are self-contained in the inner query and your filtering is on the same columns as your partitioning.

    I would still be hesitant to use the reference to an external resource in general because if the rows are eliminated before row_numbers are assigned, you'll get different results. Although it works, you'll need to be careful if you modify it in the future or re-use the strategy elsewhere.

    Wes
    (A solid design is always preferable to a creative workaround)

  • kramaswamy (11/20/2016)


    Hi all,

    One of my clients has a setup in which they have projects at multiple schools. Each project is classified by a domain. In some cases, a school could have multiple projects for the same domain.

    In the database, these projects are all represented by a unique key (IDENTITY column), but I need some way of giving the user a more logical way of differentiating between the projects. Initially, I had done this by using ROW_NUMBER in the base query, but I realised recently that this would give erroneous results if the number of rows returned differed based off the query parameters. For example, if a school had three projects for the same domain, and one employee was assigned to projects P1 and P2, while the other to P1 and P3, using a simple row numbering would result in both employees having the projects assigned the row numbers 1 and 2.

    I've come up with the following query to solve the problem. Just wanted feedback on A) whether there are any flaws in my logic, and B) whether there is a better way of doing what I'm trying to do.

    CREATE TABLE #Project

    (

    ID INT IDENTITY PRIMARY KEY,

    DomainID INT,

    SchoolID INT,

    Name VARCHAR(500)

    )

    CREATE TABLE #Domain

    (

    ID INT IDENTITY PRIMARY KEY,

    Name VARCHAR(500)

    )

    CREATE TABLE #School

    (

    ID INT IDENTITY PRIMARY KEY,

    Name VARCHAR(500)

    )

    INSERT INTO #Project (DomainID, SchoolID, Name) VALUES (1, 1, 'P1D1S1'), (1, 2, 'P2D1S2'), (2, 2, 'P3D2S2'), (1, 1, 'P4D1S1'), (1, 1, 'P5D1S1')

    INSERT INTO #Domain (Name) VALUES ('D1'), ('D2')

    INSERT INTO #School (Name) VALUES ('S1'), ('S2')

    SELECT

    p1.ID,

    #Domain.Name + ' - ' + #School.Name + ' - ' + CAST(rowNum AS VARCHAR) AS ProjectWithSchoolAndDomain

    FROM #Project p1

    JOIN #Domain ON DomainID = #Domain.ID

    JOIN #School ON SchoolID = #School.ID

    CROSS APPLY

    (

    SELECT

    p2.ID,

    ROW_NUMBER() OVER (PARTITION BY p2.DomainID, p2.SchoolID ORDER BY p2.ID) AS rowNum

    FROM #Project p2

    WHEREp2.DomainID = p1.DomainID

    AND p2.SchoolID = p1.SchoolID

    ) o

    WHERE p1.ID = o.ID

    DROP TABLE #Project

    DROP TABLE #Domain

    DROP TABLE #School

    I have to wonder what could possibly be more valuable than the users having the project names on their list? You can easily use row numbers to provide a numbered list for each user, but provide them with the Project ID as well as the name in addition to that row number, and you in theory don't have any further identification issues. Is there something I'm missing here?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply