January 7, 2009 at 11:08 pm
Me too. Fair enough.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 8, 2009 at 2:46 am
If one were to pose a question along the lines of the teacher's lesson from John's post people would still try to 'do the math, draw the picture and write the poem'.
The point would be lost on certain people as it is already, they still would learn nothing.
It is clear that a number of posters did not understand the question/article and jumped to a conclusion. How do you educate these people?
Should they be ignored or told 'no you are wrong, please read the problem as posted fully and understand it before posting next time'?
--Shaun
Hiding under a desk from SSIS Implemenation Work :crazy:
January 8, 2009 at 5:52 am
I agree they need to be told and I very much appreciate you going to bat on this subject... but folks, especially folks like myself, can and should do it without any caustic edge whatsoever. It's sometimes a hard thing to do especially when you see it day in and day out, both here and at work, but John is correct. If folks like me are gonna do things, we need to do them right. I've seen what happens on other forums when even tiny flames become the rule rather than the exception. I should have left it at "go back and read" instead of rubbing it in later.
Barry was actually NOT out of line... I was the only one that used any inflammatory words or innuendo. He was absolutely correct that it would make for an interesting study.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2009 at 6:16 am
In the spur of the moment, we all occasionally forget that people who post articles, or contribute to forums, are our guests, and therefore deserve courtesy. At the same time, most of us enjoy the spontaneity of the replies, though it can mean the occasional flashes of exasperation, even when directed at us! It is just a matter of getting the balance right. I suspect it is better to err on the side of politeness. It is very easy to unintentionally hurt someone's feelings, and I'd hate to discourage contributions.
Best wishes,
Phil Factor
January 8, 2009 at 6:48 am
Well said. All o' youse guys. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 30, 2009 at 3:27 am
Phil Factor (8/5/2008)
Surely this is the simplest solution? (be warned of a nasty catch when creating these 'quirky updates'. the order of execution of the update is -variable assignments first then column assignments-both left to right!)
Very nice solution.
I Just prefer little change, it's more readable
[font="Courier New"]--create the sample table
DECLARE @Sample TABLE (TheName VARCHAR(4),Identifier INT, sex VARCHAR(1))
INSERT INTO @Sample (TheName,Identifier,Sex)
SELECT 'ABC', 24, 'M' UNION ALL
SELECT 'ABC', 24,'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'PQRS', 25, 'F' UNION ALL
SELECT 'XYZ', 24, 'M' UNION ALL
SELECT 'XYZ', 25, 'M'
DECLARE @hash VARCHAR(80), @sex VARCHAR(1)
UPDATE @sample
SET
@sex=CASE WHEN COALESCE(@hash,'')
<>TheName+CONVERT(VARCHAR(5),Identifier)+sex
THEN 'd' ELSE SEX END,
@hash= TheName+CONVERT(VARCHAR(5),Identifier)+sex,
Sex = @sex
DELETE FROM @sample WHERE sex='d'
SELECT * FROM @sample[/font]
October 30, 2009 at 4:27 am
I haven't read through all 12 pages of post admittedly but don't think this has been suggested...
The rules said no new columns but didn't say anything about altering existing columns, should you not have one with a datatype long enough to accept the following. Or of course you could concatenate it onto the int columns assuming it would not exceed int - the existing datatypes would drive whether you needed to alter the table. So worst case scenario:
DROP TABLE Play
CREATE TABLE Play(TheName VARCHAR(4),Identifier INT, sex VARCHAR(1))
INSERT INTO Play (TheName,Identifier,Sex)
SELECT 'ABC', 24, 'M' UNION ALL
SELECT 'ABC', 24,'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'LMN', 27, 'M' UNION ALL
SELECT 'PQRS', 25, 'F' UNION ALL
SELECT 'XYZ', 24, 'M' UNION ALL
SELECT 'XYZ', 25, 'M'
ALTER TABLE Play
ALTER COLUMN Sex varchar(20)
UPDATE Play
SET Sex = Sex + convert(char,abs(checksum(newid())))
DELETE a
FROM Play a
WHERE substring(Sex,2,19) =
(SELECT min(substring(Sex,2,19))
FROM Play b
WHERE a.TheName = b.TheName
AND a.Identifier = b.Identifier
AND left(a.sex,1) = left(b.sex,1))
UPDATE Play
SET Sex = left(Sex,1)
ALTER TABLE Play
ALTER COLUMN Sex varchar(1)
October 30, 2009 at 5:25 am
Good brain training but are the restrictions as mentioned realistic? Which company would definitely not want to use a temp table or table variable?
October 30, 2009 at 6:51 am
-- Create table
CREATE TABLE dbo.xSource (Name VARCHAR(50), Age TINYINT, Sex CHAR(1))
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Aron', 18, 'M')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Ira', 24, 'F')
INSERT INTO xSource VALUES ('Joe', 49, 'M')
INSERT INTO xSource VALUES ('Joe', 49, 'M')
INSERT INTO xSource VALUES ('John', 30, 'F')
INSERT INTO xSource VALUES ('John', 30, 'F')
INSERT INTO xSource VALUES ('Eva', 30, 'F')
INSERT INTO xSource VALUES ('GB', 44, 'M')
INSERT INTO xSource VALUES ('GB', 44, 'M')
SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age
-- Step 1 Get list of only DUPLICATE rows
INSERT INTO xSource (Name, Age, Sex)
SELECT '~'+Name, Age, Sex FROM xSource GROUP BY Name, Age, Sex Having COUNT(*) > 1
-- Part 2 Delete DUPLICATE rows
DELETE xSource
FROM xSource DLT
INNER JOIN (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST
ON DLT.Name = LST.Name
AND DLT.Age = LST.Age
AND DLT.Sex = LST.SEX
-- Step 3 Repopulate Distinct Purged rows
INSERT INTO xSource (Name, Age, Sex)
SELECT Name, Age, Sex
FROM (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST
-- Step 4 Cleanup
DELETE xSource
WHERE SUBSTRING(Name,1,1)='~'
SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age
October 30, 2009 at 6:54 am
Thanks for the memories. A time when storage and memory was rare and expensive and you had to really know how the guts of the operating system, disk system, and memory allocation worked. Some of the old techniques are still useful today.
Great article and a great solution. 😎
October 30, 2009 at 7:09 am
Stephen.Richardson (10/30/2009)
-- Create tableCREATE TABLE dbo.xSource (Name VARCHAR(50), Age TINYINT, Sex CHAR(1))
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Stephen', 43, 'M')
INSERT INTO xSource VALUES ('Aron', 18, 'M')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Sharon', 38, 'F')
INSERT INTO xSource VALUES ('Ira', 24, 'F')
INSERT INTO xSource VALUES ('Joe', 49, 'M')
INSERT INTO xSource VALUES ('Joe', 49, 'M')
INSERT INTO xSource VALUES ('John', 30, 'F')
INSERT INTO xSource VALUES ('John', 30, 'F')
INSERT INTO xSource VALUES ('Eva', 30, 'F')
INSERT INTO xSource VALUES ('GB', 44, 'M')
INSERT INTO xSource VALUES ('GB', 44, 'M')
SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age
-- Step 1 Get list of only DUPLICATE rows
INSERT INTO xSource (Name, Age, Sex)
SELECT '~'+Name, Age, Sex FROM xSource GROUP BY Name, Age, Sex Having COUNT(*) > 1
-- Part 2 Delete DUPLICATE rows
DELETE xSource
FROM xSource DLT
INNER JOIN (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST
ON DLT.Name = LST.Name
AND DLT.Age = LST.Age
AND DLT.Sex = LST.SEX
-- Step 3 Repopulate Distinct Purged rows
INSERT INTO xSource (Name, Age, Sex)
SELECT Name, Age, Sex
FROM (SELECT SUBSTRING(Name, 2, 49) AS Name, Age, Sex FROM xSource WHERE SUBSTRING(Name,1,1)='~') LST
-- Step 4 Cleanup
DELETE xSource
WHERE SUBSTRING(Name,1,1)='~'
SELECT Name, Age, Sex FROM xSource ORDER BY Name, Sex, Age
Question: Why, in step 3 and 4, do you insert and delete rather than perform an UPDATE xSource Name = SUBSTRING(Name,2,49) WHERE SUBSTRING(Name,1,1)='~'?
Is doing steps 3 and 4 more efficient than the 'replace in place'?
<Caveat: I haven't had my coffee yet, so my coding may be slightly off, but I hope my intent is clear>
October 30, 2009 at 7:16 am
Funny how this article just came out today... I was actually on a similar thread last week and someone came up with a great solution...
My personal opinion the article is interesting but ... hmmm...
How about these solutions from this thread...
http://www.sqlservercentral.com/Forums/Topic793765-145-1.aspx
There's an undocumented identity key for every row of any table that can be used... in one simple delete statement you can delete the duplicate rows...
I was amazed...
Check it out.
Thanks
October 30, 2009 at 7:23 am
The orginal request was to purge the system of "duplicate" rows. Some rows are duplicated and some are not. Additionaly some names lie Terry may be either a male or female therefore you could have a 30 year old Terry male and female therefore not duplicate rows. If you do an update you change all rows matching criteria (I did not try update top 1). Personaly I like security, NO MISTAKES. In production the absolute worst word in the the human lanquage is "oooopppssss", or "I think".
October 30, 2009 at 7:23 am
jghali (10/30/2009)
Funny how this article just came out today... I was actually on a similar thread last week and someone came up with a great solution...My personal opinion the article is interesting but ... hmmm...
How about these solutions from this thread...
http://www.sqlservercentral.com/Forums/Topic793765-145-1.aspx
There's an undocumented identity key for every row of any table that can be used... in one simple delete statement you can delete the duplicate rows...
I was amazed...
Check it out.
Thanks
Please read the thread its not about deleting duplicates its about keeping them 😀
Hiding under a desk from SSIS Implemenation Work :crazy:
October 30, 2009 at 7:37 am
Your off-hand comment about naming the field "Gender" rather than "Sex" piqued my curiosity. Why? As I understand the words, sex means biological differences: chromosomes and sexual organs. Gender refers to the characteristics a society or culture delineates as masculine or feminine. There's some blurring of the definitions, but I think they generally hold. So why do you prefer Gender for the column label?
Viewing 15 posts - 106 through 120 (of 156 total)
You must be logged in to reply to this topic. Login to reply