August 5, 2008 at 6:41 am
bob.holbrey (8/5/2008)
A really simple option which is able to take advantage that "age" is a number...Create a table "test" with the following data as shown:
select * from test
Name  Age  Sex
ABC   24   M
ABC   24   M
DEF   24   M
DEF   24   M
GHI   26   F
GHI   26   F
GHI   26   F
PQRS  25   F
insert test
select distinct name, age*1000, sex
from test
delete from test where age<1000
update test
set age=age/1000
select * from test
Name  Age  Sex
ABC   24   M
DEF   24   M
GHI   26   F
PQRS  25   F
I believe the desired solution was to delete only the FIRST original row and keep the other duplicates - so your result should include multiple GHI rows... but not the PQRS row, since it was a single original row:
ABC 24 M
ABC 24 M
LMN 27 M
LMN 27 M
LMN 27 M
PQRS 25 F
XYZ 24 M
XYZ 25 M
Now I would wish to DELETE the 1st original row (rows 1, 3, 6, 7) and keep the other duplicates from the above data. And if there is a row without any duplicates present, it will be deleted too (e.g. row 6).
August 5, 2008 at 6:42 am
bob.holbrey (8/5/2008)
A really simple option which takes advantage of the fact that "age" is a number...
select distinct name, age*1000, sex
Right Bob, that was another option. However, I did not take it because I felt that I could not assume that AGE was any larger than TINYINT (one byte), and "age*1000" could have been out of its range.
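For anyone curious, the range issue is easy to demonstrate (a minimal sketch, assuming AGE really was declared TINYINT):

```sql
-- TINYINT holds 0-255, so age*1000 cannot be written back to the column.
DECLARE @t TABLE (Age TINYINT);
INSERT INTO @t (Age) VALUES (24);

-- The multiplication itself is fine (the result is promoted to INT)...
SELECT Age * 1000 AS Scaled FROM @t;   -- 24000

-- ...but storing it back would fail with an arithmetic overflow error:
-- UPDATE @t SET Age = Age * 1000;
```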
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 6:54 am
Good article, anything that gets this many responses is, IMHO, a good article. Being a relative newcomer to the field (late 90's) I never had to deal with the lack of storage issues as you had to, so would never have come up with that solution. Of course this is what I love about SSC, never saw the original thread, but now I have some new ways to solve the problem.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 5, 2008 at 7:18 am
Before I get into my first impression I will say it is a good article. And you should be aware that people can have specific habits they don't realize, which can lead to specific solutions.
But my first impression was of the first example, for the contact numbers. I noticed you made 4 passes over the data instead of just one. And even though your option works, the issue the customer faced was that they had two separate updates, which caused the problem. If both had been handled in the same UPDATE the issue does not occur, and I would have done something like this:
BEGIN Transaction
UPDATE CONTACT_NUMBERS
SET CallOrder = (CASE WHEN CallOrder = 1 THEN 2 ELSE 1 END)
COMMIT Transaction
as this makes one pass across the data and adjusts all records at the same time, so the conflict does not occur.
With all that said, you still offer people other ways they could do things should there be a limitation.
Additionally, I would add that you also have to weigh the system impact of the choice you make. As I stated, in your first example you make 4 passes over the dataset, and you also write all the touched rows twice to the log file. The method I present makes 1 pass and logs each row only once.
Lastly, I would add that if you are going to provide examples, please be consistent. In the contact number example you provide the script to build the sample set, but in the case of the example which is the focus of the article you do not. Just keep this in mind when you write further articles: be consistent in your presentation style throughout the article.
August 5, 2008 at 7:56 am
Warning - Newcomer here - This is the first challenge I've tried to actually get a result out of. Let me know what you think.
I chose to find a duplicate and remove it. This is repeated until there are no more duplicates left. I ran this through SQL 2005... no luck with 2000.
Thanks for your input
- Jon
-----------------------------------------------------------
/*This First part is just to setup the example
--CREATE THE TEST TABLE
CREATE TABLE tblTest
(
NAMES varchar(50) ,
AGE varchar(50),
SEX varchar(50)
)
--INSERT VALUE INTO TEST TABLE
INSERT Into tblTest
Select 'ABC','24', 'M'
UNION ALL Select 'ABC','24', 'M'
UNION ALL Select 'LMN','27', 'M'
UNION ALL Select 'LMN','27', 'M'
UNION ALL Select 'LMN','27', 'M'
UNION ALL Select 'PQRS','25', 'F'
UNION ALL Select 'XYZ','24', 'M'
UNION ALL Select 'XYZ','25', 'M'
--DROP TABLE IF NEEDED
DROP TABLE TBLTEST
*/
DECLARE @NAMES VARCHAR(50)
DECLARE @AGE VARCHAR(50)
DECLARE @SEX VARCHAR(50)
WHILE 1 = 1
BEGIN
SELECT @NAMES = (SELECT TOP(1) NAMES FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM TBLTEST GROUP BY NAMES, AGE, SEX) T1
WHERE [COUNT] > 1)
SELECT @AGE = (SELECT TOP(1) AGE FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM TBLTEST GROUP BY NAMES, AGE, SEX) T1
WHERE [COUNT] > 1)
SELECT @SEX = (SELECT TOP(1) SEX FROM (SELECT NAMES, AGE, SEX, COUNT(*) AS [COUNT] FROM TBLTEST GROUP BY NAMES, AGE, SEX) T1
WHERE [COUNT] > 1)
;
IF @NAMES IS NULL
BEGIN
BREAK
END
ELSE
WITH TMP AS
(SELECT * FROM TBLTEST
WHERE NAMES = @NAMES
AND [AGE] = @AGE
AND [SEX] = @SEX)
DELETE TOP(1) FROM TMP
END
August 5, 2008 at 8:09 am
Antares686 (8/5/2008)
But my first impression was of the first example, for the contact numbers. I noticed you made 4 passes over the data instead of just one. And even though your option works, the issue the customer faced was that they had two separate updates, which caused the problem. If both had been handled in the same UPDATE the issue does not occur, and I would have done something like this:
BEGIN Transaction
UPDATE CONTACT_NUMBERS
SET CallOrder = (CASE WHEN CallOrder = 1 THEN 2 ELSE 1 END)
COMMIT Transaction
as this makes one pass across the data and adjusts all records at the same time, so the conflict does not occur.
True, however, recall that I qualified this example in the article:
And yes, I do know that there are other ways to do this correctly, especially with a CASE function. For reasons that I cannot get into, that was not an option here.
The reason I did not use it is because the customer was not familiar with the CASE function and I was talking them through this over the phone. Since I could not see what they were typing, I was concerned that they could make a simple typo, (not visible to me) that would take their situation from bad to worse.
Consequently, I opted for the slower but safer option of preserving the command structure that they already had and just executing it twice.
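For readers who don't have the article open, the "execute it twice" shape of that fix would have looked roughly like this (a sketch only - the table and column names are taken from the post above, and the interim value of 0 is my own assumption, chosen as a value not otherwise in use):

```sql
-- Swap CallOrder values 1 and 2 without CASE, reusing the customer's
-- existing single-condition UPDATE pattern. The value 0 is a placeholder
-- assumed not to be a real CallOrder, so the two real values never collide.
BEGIN TRANSACTION;
UPDATE CONTACT_NUMBERS SET CallOrder = 0 WHERE CallOrder = 1;
UPDATE CONTACT_NUMBERS SET CallOrder = 1 WHERE CallOrder = 2;
UPDATE CONTACT_NUMBERS SET CallOrder = 2 WHERE CallOrder = 0;
COMMIT TRANSACTION;
```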
Lastly, I would add that if you are going to provide examples, please be consistent. In the contact number example you provide the script to build the sample set, but in the case of the example which is the focus of the article you do not. Just keep this in mind when you write further articles: be consistent in your presentation style throughout the article.
Good point, and I did have the INSERTs for a bunch of test data too, I just forgot to include them. Oh well. For anyone who still wants some sample data, here is the data from Chris Morris's post:
DROP TABLE #Testing
CREATE TABLE #Testing (
[Name] Varchar(4),
Age TINYINT,
Sex Char(1)
)
INSERT INTO #Testing ([name], Age, Sex)
SELECT 'ABC',24,'M' UNION ALL --
SELECT 'ABC',24,'M' UNION ALL
SELECT 'DEF',24,'M' UNION ALL --
SELECT 'DEF',24,'F' UNION ALL
SELECT 'GHI',26,'F' UNION ALL --
SELECT 'GHI',26,'F' UNION ALL
SELECT 'GHI',26,'F' UNION ALL
SELECT 'GHI',26,'F' UNION ALL
SELECT 'GHI',26,'F' UNION ALL
SELECT 'LMN',27,'M' UNION ALL --
SELECT 'LMN',27,'M' UNION ALL
SELECT 'LMN',27,'M' UNION ALL
SELECT 'PQRS',25,'F' UNION ALL --
SELECT 'XYZ',24,'M' UNION ALL --
SELECT 'XYZ',25,'M' --
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 8:25 am
I ran into this problem before and came up with a solution using DISTINCT, which may not be the most efficient, but it is very quick and easy, and seems to work well on smaller tables.
The gist is to make a copy of the table you want to remove the dups from then truncate that table and then Select Distinct back into the table from the copy. This example starts with 7 rows and ends up with 3.
--Setup
Create table TAB (
Fname varchar(50),
Age int,
Sex char(1))
Insert into TAB
Select 'ABC',24,'M' Union ALL
Select 'ABC',24,'M' Union ALL
Select 'ABC',25,'M' Union ALL
Select 'DEF',35,'F' Union ALL
Select 'DEF',35,'F' Union ALL
Select 'DEF',35,'F' Union ALL
Select 'DEF',35,'F'
--Add all records to temp table
Select *
into #temp
From TAB
--Clear out table and add records back in
Truncate table TAB
Insert into TAB (Fname, Age, Sex)
Select Distinct Fname, Age, Sex
From #temp
--Output
Select *
From TAB
--Cleanup
drop table TAB, #temp
August 5, 2008 at 8:27 am
Excellent article !:)
Actually you gave a variety of good & tasty food in a single box. If you write more articles like this, it would be a great help to SQL developers who would like to improve their SQL knowledge. Again, Great Article! 🙂
karthik
August 5, 2008 at 8:33 am
try as hard as you can;-
and still the dupe removers and requirements non-readers supply solutions to the problem already
answered/solved and buried by rbarryyoung, chris.morriss and Phil Factor.
All legends.
*Shaun drinks a swig of calm juice*
People pay attention;-
We neither need nor want any additional code or solutions unless they address the issue involving all of the requirements of the article and provide something more than has already been written!
*mutters they won't read this anyway, at least I tried!*
Hiding under a desk from SSIS Implementation Work :crazy:
August 5, 2008 at 8:41 am
Shaun McGuile (8/5/2008)
try as hard as you can;- and still the dupe removers and requirements non-readers supply solutions to the problem already
answered/solved and buried by rbarryyoung, chris.morriss and Phil Factor.
All legends.
*Shaun drinks a swig of calm juice*
People pay attention;-
We neither need nor want any additional code or solutions unless they address the issue involving all of the requirements of the article and provide something more than has already been written!
*mutters they won't read this anyway, at least I tried!*
Sorry I wasn't one of the first 5 responders; I believe my solution is one of the easiest so far and definitely needs to be posted.
August 5, 2008 at 8:44 am
I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from?
Thanks,
Tim
Insert into SOURCE(Name, Age, Sex)
Select Name
, Age
, CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field
From TAB
Join Tally ON Tally.Number <= Ascii(Sex)/2
Where Tally.Number Between 2 and 127
August 5, 2008 at 8:45 am
You used a temp table - forbidden by the article requirements.
Sorry,
Just havin' a bad mood moment, please don't take it personally, I get worked up on long-running threads.... ;)
Hiding under a desk from SSIS Implementation Work :crazy:
August 5, 2008 at 8:47 am
Dang, I thought I was onto something there. I went back and re-read the requirements and broke all kinds of stuff. Thanks for the 'heads up'.
August 5, 2008 at 8:50 am
timmy_patterson (8/5/2008)
I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from? Thanks,
Tim
Insert into SOURCE(Name, Age, Sex)
Select Name
, Age
, CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field
From TAB
Join Tally ON Tally.Number <= Ascii(Sex)/2
Where Tally.Number Between 2 and 127
TAB looks like a typo - it probably should be SOURCE.
And Tally is the tally table you should have in any and all databases - if you do not have one, you need one, so get one - search this site for articles. 🙂
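For anyone who lands here without one: a tally table is just a single-column table of sequential integers. One common way to build it is below (a sketch; the 11000-row count and the dbo.Tally name are just conventions, size it for your own needs):

```sql
-- Build a Tally table of the numbers 1 through 11000, using a cross join
-- of a large system table as a cheap row source, then key it for speed.
SELECT TOP 11000
       IDENTITY(INT, 1, 1) AS Number
INTO   dbo.Tally
FROM   master.dbo.syscolumns sc1
CROSS JOIN master.dbo.syscolumns sc2;

ALTER TABLE dbo.Tally
  ADD CONSTRAINT PK_Tally_Number PRIMARY KEY CLUSTERED (Number);
```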
Hiding under a desk from SSIS Implementation Work :crazy:
August 5, 2008 at 9:14 am
If you know that you will not have more than 128 possible "duplicate" records, I agree that this is a clever solution. In the initial design of the table, unless you are trying to allow for some interesting possibilities, why wouldn't this already be a bit field? Secondly, why would you choose the low-order bit to retain the original value? When you tally the results, you will either need to save the bit when manipulating the result, or add 2 for each tally. Using the high-order bit would allow you to add 1 in either case and then just use a mask to check that high bit.
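To make the comparison concrete, here is a sketch of the two packings being discussed (the values are purely illustrative, not from the article):

```sql
-- Low-order-bit scheme: the preserved bit sits in bit 0,
-- the tally occupies the remaining bits.
DECLARE @lo TINYINT;
SET @lo = 5 * 2 + 1;                        -- tally = 5, preserved bit = 1
SELECT @lo / 2 AS Tally, @lo & 1 AS Kept;   -- 5, 1
SET @lo = @lo + 2;                          -- incrementing the tally costs +2

-- High-order-bit scheme: the preserved bit sits in bit 7, the tally in the
-- low 7 bits (which is where the 128-duplicate ceiling comes from).
DECLARE @hi TINYINT;
SET @hi = 128 + 5;                               -- bit set, tally = 5
SELECT @hi & 127 AS Tally, @hi & 128 AS KeptBit; -- 5, 128
SET @hi = @hi + 1;                               -- a plain +1 each time
```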