Newbie Question

  • Based on the sample data:

    insert into users values ('john','doe','test@test.com','N') -- ID 1

    insert into users values ('john','doe','test@test.com','N') -- ID 2

    insert into users values ('jack','doe','test@jack.com','N') -- ID 3

    insert into users values ('jack','doe','test@doe.com','N') -- ID 4

    Whoa....

    I see the diff as I was setting up the above. ID 1 and 2 are John Doe at test.com where as ID 3 and 4 aren't the same; ID 3 is at jack.com and ID 4 is at doe.com. However, this does not fit your description as both ID 3 and ID 4 are the same name.

    John, does that answer your question?

  • Well, not really. I see what you are saying, but I'm still not getting what makes a user ID valid. By the OP's definition,

    A valid ID would be a user that exists in the user table but is not able to login and get to the financial reviewer queue...

    .

    I think we're missing a piece of the puzzle here. Just from looking at the data, how do we tell which user IDs are valid and which are not? With both UserIDs 1 & 2 being the same person, what makes #1 the valid ID? Is is just because it comes first?

    On another line of thought, I think we can get you the query to identify and correct these issues, but it sounds to me like you should investigate the root cause of why this is happening, correct it, and put a constraint in the database to prevent this from happening again.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I agree. I have a feeling we are only getting a piece of the puzzle, not the whole puzzle.

  • The problem had to do with a bad implementation. That root cause was identified and resolved. There is a lot of data pointing to old users that no longer use that ID and now a bunch of financial reports are pointing to users that cannot access them.

    So in theory it is very easy to sit down at a console and update every row with an update statement for the 136 affected users for each column that they may be in, however I want to write a cursor or a script to handle this kind of issue in the future.

    I don't have enough practice with writing sql scripts and comparing Data outside of the very basics.

    I have two rows for John Doe, they are the same person, the only thing that is different are the ID's. For John Doe he is listed 2x with ID 1 and 2. ID 1 is invalid so for every column that 1 shows up in the financial_report table I want to replace it with the valid user ID, number 2.

    I just figured that if I wrote a SQL script to grab the ID's search in the financial_report table where the ID is in one of the 5 columns I could therefore save my time and in the future when this happens have a script happy or a foundation to replicate this across similar issues.

  • Okay, two questions. Regarding John Doe (ID 1 and ID 2), how do we know ID 2 is correct, because the ID number is higher? Second, in the sample data you have two Jack Doe's, how do we know that those two are not duplicates, and if so which one is correct?

  • itsmeman (1/7/2010)


    I have two rows for John Doe, they are the same person, the only thing that is different are the ID's. For John Doe he is listed 2x with ID 1 and 2. ID 1 is invalid so for every column that 1 shows up in the financial_report table I want to replace it with the valid user ID, number 2.

    Great, I think we understand the intent of the script. What makes ID 1 the invalid ID? By the way, your example data shows that ID 2 is the invalid one and you replace it with ID 1.

    Just for the record, I don't think this is a difficult SQL script create at all. The challenge is in getting the data to tell the story. If we can't look at the data and figure out which IDs are good and which are bad, we can't write a script that corrects the problem.

    So back to the question, what makes an ID valid and what makes an ID invalid?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • After some thought... I been thinking about that myself but haven't been able to figure out how to identify myself...

    It's best to say we want the lowest ID... So if there is a 1 and a 2 with the same first and last name we want the number 1 to update all the values where 2 exists in the database in one of those 5 columns.

    The first user ID has been active for sometime, some how a user was generated later down the road and is now being assigned to reports when they cannot even login because there is no userID assigned to them.

    From what I have seen it is always the first ID that is kept, the 2nd ID with the higher value of the duplicates is inactivated.

  • Great. We can go with using the first (lowest ID) as the valid ID. Now, one more thing that needs cleared up and I think Lynn touched on this already. Your sample data shows that you are correcting the entries for John Doe (IDs 1 & 2) but why aren't you fixing the entries for Jack Doe?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • The idea is to originally grab all duplicates in the users table and then scan through the financial_Reports and replace the value of the identified duplicate users highest ID with the lowest ID. You would fix both records. Not just John Doe but all identified problem rows in users.

  • I think you are missing the point of our questions. In your sample data, there are two duplicate users based solely on first and last name; John Doe and Jack Doe. In your expected results the only record corrected was the one for John Doe. What about Jack Doe in that sample data? Why weren't those id's corrected?

  • Lynn, I think he is saying that both should have been corrected. I think this was an oversight in the example data.

    Anyhow, here's a start. This will identify users w/ multile IDs, find the valid one using MIN, identify the invalid ones, and update the financialreviewer2 column. This works based on my understanding of the problem and at this point, is limited to updating only one column. If this work like the OP suggests, we can either create multiple updates (one for each column) or create a way to update them all at once.

    DECLARE @users TABLE (

    id int IDENTITY (1, 1) NOT NULL ,

    firstname varchar (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    lastname varchar (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    email varchar (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    isinactive varchar (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    )

    insert into @users values ('john','doe','test@test.com','N')

    insert into @users values ('john','doe','test@test.com','N')

    insert into @users values ('jack','doe','test@jack.com','N')

    insert into @users values ('jack','doe','test@doe.com','N')

    DECLARE @financial_reports TABLE (

    ID int IDENTITY (1, 1) NOT NULL ,

    financialreviewer1 int NULL ,

    financialreviewer2 int NULL ,

    financialreviewer3 int NULL ,

    signingfinancialprovider1 int NULL ,

    signingfinancialprovider2 int NULL ,

    signingfinancialprovider3 int NULL

    )

    insert into @financial_reports values ('1','2','','','','')

    insert into @financial_reports values ('3','4','','','','')

    SELECT * FROM @financial_reports

    UPDATE f

    SETf.FinancialReviewer2 = u.ValidID

    FROM @financial_reports f

    INNER JOIN (

    --==== build out a table w/ invalid-to-valid mappings

    SELECT ID as InvalidID,

    u2.ValidID

    FROM @Users u

    INNER JOIN (

    --==== identify users w/ multiple iDs

    --==== and find the valid one (MIN)

    SELECT FirstName,

    LastName,

    MIN(ID) as ValidID

    FROM @Users

    GROUP BY FirstName, LastName

    ) u2

    ON u.FirstName = u2.FirstName

    AND u.LastName = u2.LastName

    AND u.ID <> u2.ValidID

    ) u ON f.financialReviewer2 = u.InValidID

    SELECT * FROM @financial_reports

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • That is a great example for how to accomplish something I described without a cursor.

    Here we write cursors because it is pre-determined and believed that it keeps better data integrity with row by row checking.

    I come from an application support background and I have built a couple web apps before, but nothing sophisticated so my knowledge is basic at best about SQL, I can write basic queries but not think of them like how you did. I would have never thought to write an update statement the way you did John. That is a very good example and will help me accomplish what I need to accomplish.

    I very much appreciate your patients and example. Thank you.

    John do you know of any any SQL exercises to get some practice in, I often find technical books are very straight forward with examples but no questions and answers kind of a thing.

    I really liked this article: http://www.sqlservercentral.com/articles/SQL+Puzzles/2973/

    It took me longer then 5 minutes to write out my own code and it runs pretty fast but it was nothing like the examples on that URL.

    DECLARE @count int

    DECLARE @dividebythree int

    DECLARE @dividebyfive int

    DECLARE @bizz varchar(4)

    DECLARE @buzz varchar(4)

    DECLARE @bizzbuzz varchar(9)

    set @bizz = 'bizz'

    set @buzz = 'buzz'

    set @bizzbuzz = 'bizzbuzz'

    set @count = 0

    while (@count < 100)

    BEGIN

    set @count = @count + 1

    select @dividebythree = @count % 3

    select @dividebyfive = @count % 5

    if @count % 3 = 0 and @count % 5 = 0

    print @bizzbuzz

    else

    if @dividebythree = 0

    print @bizz

    else

    if @dividebyfive = 0

    print @buzz

    else

    print @count

    END

  • itsmeman (1/8/2010)


    That is a great example for how to accomplish something I described without a cursor.

    Here we write cursors because it is pre-determined and believed that it keeps better data integrity with row by row checking.

    This last statement is just simply not true. If this was the case, why do all of the leaders in the SQL Server world write non-cursor based solutions? That is a belief that needs to change if you want get a decent level of performance out of your SQL Server applications. A guy/gal that was proficient in writing SET based SQL solutions could come into your shop and WOW everyone at the performance differences between the cursors already programmed into your systems and their SET based alternatives. It can be a night/day different in many cases.

    I come from an application support background and I have built a couple web apps before, but nothing sophisticated so my knowledge is basic at best about SQL, I can write basic queries but not think of them like how you did. I would have never thought to write an update statement the way you did John. That is a very good example and will help me accomplish what I need to accomplish.

    I very much appreciate your patients and example. Thank you.

    John do you know of any any SQL exercises to get some practice in, I often find technical books are very straight forward with examples but no questions and answers kind of a thing.

    A great start would be to keep yourself into the forums here at SSC. Lookup and read articles on SET based coding, look at articles posted by the forum regulars like Jeff Moden, Lynn Pettis, Grant Fritchey, Gail Shaw, RBarryYoung, and many others.

    Getting into SET based SQL coding is a mindset thing. You have to get away from thinking about what you need to do to a Row of data and instead think about what you need to do to a Column of data. You perform actions against SETs of data, now against a single row. So in everything you do, you are building and comparing dataSETs.

    The solution that I gave you was quite simple to write. You'll notice that I included some comments. From those comments, you can see what I'm doing. All I've really done here is to build out a dataset that represents the Invalid-to-Valid ID mappings. That dataset is joined back into the target table for the update. Because I used an INNER JOIN against the target table, all of the invalid IDs from the financialreviewer2 column are updated at once.....not row-by-row.

    So how can you get enough practice and exposure to SET based programming to change how you think about approaching SQL coding? Use the forums here at SSC to learn! Take time each day to go through some of the questions posted in the TSQL forum, look at the questions posted there and think about how you would solve them, mock up a quick example (or better yet, hope that the poster included example data and DDL in the post) and give it a shot on your local test system. Keep in mind this doesn't mean you need to start answering threads! Just sit back, watch, and learn from those who do answer them. Try to write out your own SET based solution to the problem and then compare that to the posts from others who are answering the questions. At first, you may be spending more time analyzing others' SET based solutions to see how they work, but after a short while, you'll understand what they are doing and begin to be able to produce your own solutions similar to the UPDATE statement I gave you.

    By the way, what I've just described is exactly how I learned to code SET based SQL solutions. I cut my teeth on database programming in a launguage/database platform that was very row centric so I had to learn to think in SETs and I found it the complete opposite from what I was used to. I'm now, and have been for a long time, to the point to where it's automatic and I don't even consider loops, cursors, or row based procedural coding. Keep in mind, I'm not patting my own back here; I'm point out that practicing SET based coding can produce good coding habits that lead to a change of mindset. Once you're mindset is right, you'll be a cut above the run-of-the-mill SQL Server developer.

    I really liked this article: http://www.sqlservercentral.com/articles/SQL+Puzzles/2973/

    It took me longer then 5 minutes to write out my own code and it runs pretty fast but it was nothing like the examples on that URL.

    ....you'll notice that one of the requirements of that SQL puzzle was "no cursors". :Whistling:

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I will put your advice to good work. It is very good.

    Thank you for the help. I appreciate it.

  • Another thing I didn't mention is just ask questions. Don't be afraid to ask questions 'cuz that's how you'll learn. If you have cursors that you just don't think can be written as SET based, create a quality post (using the tips in the article) and post it as a question. You'll get great advice and help for many of the regulars here.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply