August 5, 2008 at 9:17 am
karthikeyan (8/5/2008)
Excellent article !:)Actually you gave a variety of good & tasty food in a single box. If you write more articles like this , it would help very much for sql developer who would like to improve their SQL knowledge. Again, Great Article ! ๐
Thanks for the feedback, Karthik.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 9:23 am
timmy_patterson (8/5/2008)
I'm having trouble understanding Step 4. Where do tables 'TAB' and 'Tally' come from?Thanks,
Tim
Insert into SOURCE(Name, Age, Sex)
Select Name
, Age
, CHAR(ASCII(Sex) & 1) --extract and preserve the [Sex] field
From TAB
Join Tally ON Tally.Number <= Ascii(Sex)/2
Where Tally.Number Between 2 and 127
Oops, another typo! TAB should be SOURCE. A Tally table is just another name for a "Numbers" table. In other words, a table that has every integer from 1 to "X". In our case X would not have to be greater than 127. Many DBA's keep them around pre-made because of how useful they are for performance and for avoing cursors (See Jeff Moden's many articles on this).
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 9:37 am
I took Shaun's recommendation and found a good article here: http://www.sqlservercentral.com/articles/TSQL/62867/.
I never heard of a Tally table before. This is good stuff. I already know places where I can use it to boost performance.
I've been masking bits for years. I even do in in SQL but I've never used it like you did. Very clever. The most valuable nugget I got from this whole thing is the Tally table. I can't wait to use it. :satisfied:
Many thanks for the great article and thanks to all the great responses!
August 5, 2008 at 9:37 am
Ryan Beesley (8/5/2008)
If you know that you will not have more than 128 possible "duplicate" records, I agree that this is a clever solution.
Yes, and that seems like a reasonable assumption on my part. After all what would be the purpose of so many identical records?
Also, as I stated, I could have gotten at least 4 more bits from the NAME column, If I needed them. Makes it more complex, but that would take me up to 2047 duplicates.
In the initial design of the table, unless you are trying to allow for some interesting possibilities, why wouldn't this already be a bit field?
Well, it could have been, but it wasn't, as evidenced by the fact that it held either an "M" which is not legal for a bit field. I didn't design the table or the problem, someone posted it here. Back in the day, of course, we would never use 8 bits where 1 would do, but today we've got the space so we use it to make things easier on ourselves.
Secondly, why would you choose the low order bit to retain the original value. When you tally the results, you will either need to save the bit when manipulating the result and or add 2 for each tally. Using the high order bit would allow you to add 1 in either case and then just use a mask to check that high bit.
Sorry, I cannot see a difference either way and I just find it easier to keep track of a low order bit, in my head. Maybe you could post an example of the procedure using the high-order bit instead?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 10:15 am
Barry - how cool to see something so Old School. I took my first Fortran course in about 1970, and became a staff programmer at UH soon after. You're right - people would cram anything into a byte ๐ to save some space, and if you had a whole "word", jeez - you would put an entire record in that quicker than you could say "MOUNT TAPE". Thanks for the memories...
Ken
August 5, 2008 at 11:03 am
Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower
I've looked very closely, and I can't see any difference between those two. Or maybe you meant "01100110 in lower"?
August 5, 2008 at 11:21 am
Awesome article. It is good to know and makes me very glad that memory was much less constrained than that by the time I started programming.
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
August 5, 2008 at 11:55 am
Jonathan Rynd (8/5/2008)
Likewise, the F's are stored as "01000110" in upper-case and "01000110" in lower
I've looked very closely, and I can't see any difference between those two. Or maybe you meant "01100110 in lower"?
Right you are. A previous commentor picked up on this also, my apologies.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 11:58 am
timothyawiseman (8/5/2008)
Awesome article. It is good to know and makes me very glad that memory was much less constrained than that by the time I started programming.
Thanks for the feedback Timothy.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 12:03 pm
Great article and great solutions to an "interesting" requirement set!
I admit I didn't read the original post, but the assumption Chris's and Phil's solutions rely on (that duplicate rows are grouped together) seems to be implied by the original poster's phrase: "Now i would wish to 'DELETE' the 1st original row ( Row no 1,3,6,7)and keep the other duplicates from above data. "
I agree with what I think Tao is hinting at, though. Back in the day when tiny disk drives cost more than my house, a solution that nearly doubled the table size wouldn't likely fly. Of course it's not likely a system would be designed to store duplicates, either. And, if any were found, they would be deleted, not preserved.
Also, the concept of storing a tally table would be a pretty hard sell. "So you're suggesting we have a permanent table of nothing but consectutive numbers? REALLY? Well, how high should we count? I say we use long ints for this table of yours and we'll take the storage cost out of your salary."
Anyway, as reluctant as I am to suggest this, (please don't hate me, Jeff) if one assumes duplicate rows are grouped together, a RBAR solution to this interesting problem is likely what would have been adopted back then. Of course you can forget about concurrency if you're RBAR grinding through GBs of data.
Please don't get me wrong. I totally understand and promote set based solutions over RBAR and have used tally tables to great advantage. It's just interesting to see hybrid solutions combining 70s constraints, goals, and techiques with modern ones. We certainly have it better than we used to, but we're doing a lot more now, too.
August 5, 2008 at 12:15 pm
60s and 70s ๐
August 5, 2008 at 1:01 pm
WOW! It's not often I get to learn as much from the comments as from the article which originated the comments! Of course, I'm at that stage where data storage is not a problem - it's the algorithm used to retrieve the data out of my grey-matter storage device.
Thanks to all for the lessons learned!
August 5, 2008 at 1:52 pm
Back in the day, of course, we would never use 8 bits where 1 would do, but today we've got the space so we use it to make things easier on ourselves.
Except that, in this case, SQL would still allocate a full byte for that single bit column. The remaining seven bits would be wasted. Now, if you had multiple bit columns (up to eight) you would begin to see a savings.
Andrew
--Andrew
August 5, 2008 at 2:14 pm
Awesome article.
And awesome technique.
And although "prohibited" in the article (a "Loop" is used), I cannot withstand to post another possible solution to this type of task:
[font="Courier New"]DECLARE @Current nvarchar(4000); -- Iterator for the distinct rows
-- Select first group of dupes (or single record)
-- And delete the first occurrence using the WHERE clause
-- and ROWCOUNT
SET @Current = (SELECT MIN(Name + CAST(Age AS CHAR(2)) + Sex) FROM TAB);
SET ROWCOUNT 1; -- Limit DELETE to one row
DELETE TAB WHERE Name + CAST(Age AS CHAR(2)) + Sex = @Current
SET ROWCOUNT 0; -- Reset
-- Loop through each of the remaining group and delete one
-- occurrence of the group.
WHILE 1=1
BEGIN
--Chose next group
SET @Current = (SELECT MIN(Name + CAST(Age AS CHAR(2)) + Sex)
FROM (SELECTName, Age, Sex
FROMTAB
WHEREName + CAST(Age AS CHAR(2)) + Sex > @Current) A
);
IF @Current IS NULLBREAK; -- Exit WHILE if no more groups.
SET ROWCOUNT 1; -- Limit to one row
DELETE TAB WHERE Name + CAST(Age AS CHAR(2)) + Sex = @Current;
SET ROWCOUNT 0; -- Reset
END;[/font]
And while we are at it, does anyone know how to add color IF codes? When I add them, they display as text like here:
Best Regards,
Chris Bรผttner
August 5, 2008 at 2:39 pm
Christian Buettner (8/5/2008)
Awesome article.And awesome technique.
...
And while we are at it, does anyone know how to add color IF codes? When I add them, they display as text like here:
Thanks, Christian.
And you have to quote the argument in the color IFCode: [ color="red" ]
(the buttons get it wrong).
You can also use the Simple-Talk Prettifier for color-coded SQL:http://extras.sqlservercentral.com/prettifier/prettifier.aspx
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
Viewing 15 posts - 46 through 60 (of 156 total)
You must be logged in to reply to this topic. Login to reply