April 7, 2013 at 2:06 pm
Jeff Moden (4/7/2013)
With the understanding that I've not seen nor have I been able to produce such a failure except when an indeterminate formula was used as the target of the ORDER BY, I guess the next question would be, is there a code example somewhere that demonstrates failure of the TOP (MaxIntValue)/ORDER BY? The one that MS published never seems to fail.
As long as you are not querying tables with more than maxint rows, this method currently always works. Just as the version with TOP 100 PERCENT always worked until it stopped working in SQL Server 2005. But as far as I know, it's not documented, nor guaranteed.
Also, is there a work around that will produce the desired behavior other than the obvious method of shifting to FOR XML PATH? Or, is that also subject to ORDER BY failing (which would really screw up some real XML)?
There are two issues here, and we should take care not to confuse them.
1: ORDER BY in a subquery (or view) - except for some specific exceptions, this is not allowed. In the situations where it is allowed (off the top of my head, only TOP and FOR XML), the ORDER BY is guaranteed to determine which rows qualify or are disqualified for the TOP, and for FOR XML it is guaranteed that the rows are processed in the specified order when the XML is formed. But it is not guaranteed to do anything else. If you have TOP 5 ... ORDER BY FirstName in a subquery or view, you will only get names like Aaron and Abby, but unless there is an ORDER BY in the outer query as well, you can still get them in any order. They usually stay in alphabetical order, but this is not guaranteed. For the FOR XML construct, the result is just a single row, so ordering no longer applies at all.
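To illustrate with a minimal sketch (the names dbo.People and FirstName are hypothetical):

-- The inner ORDER BY guarantees WHICH five rows qualify for the TOP...
SELECT p.FirstName
FROM (SELECT TOP (5) FirstName
      FROM dbo.People
      ORDER BY FirstName) AS p;
-- ...but not the order in which the outer query returns them. To guarantee
-- the output order, repeat the sort in the outer query: ORDER BY p.FirstName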
2: Concatenation by variable assignment (SELECT @x = @x + Column FROM ...), either with or without ORDER BY. I once saw a website with a number of examples that were reproducible and that showed this technique to be unreliable. Unfortunately, I lost the URL and have never been able to find it again. And frankly, I have no idea whether any of those examples would still reproduce on newer versions (SQL Server 2005 or later - yes, it is THAT long ago that I found that site).
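For reference, the pattern in question looks like this (a sketch with hypothetical names dbo.SomeTable and SomeColumn; as described above, the behaviour is undefined):

DECLARE @x VARCHAR(MAX);
SET @x = '';

SELECT @x = @x + st.SomeColumn + ','
FROM dbo.SomeTable st
ORDER BY st.SomeColumn; -- NOT guaranteed to control the concatenation order

SELECT @x;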
There is a very weird article in the Knowledge Base: http://support.microsoft.com/kb/287515. It first says that "the correct behaviour for an aggregate concatenation query is undefined" (which, as far as I know, is still true). However, further down it then provides a workaround "to achieve the expected results from an aggregate concatenation query", without bothering to specify what the "expected results" of this undefined behaviour should be.
The only guaranteed methods I know to correctly concatenate strings are:
1: FOR XML
2: Procedural code (i.e. looping over the result set)
3: Creating your own CLR user-defined aggregate function. When this feature was new, there was no option to preserve order in CLR user-defined aggregates (there was syntax for it, but it had no effect). I have no idea if this has changed since; I think it has not in SQL Server 2008, but I stopped paying attention after that.
If you use ORDER BY and FOR XML at the same level of the query, you can rely on the results being added to the XML result in the desired order. If you use ORDER BY at a different subquery-level, it's anyone's guess.
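As a minimal sketch of the FOR XML method (again with the hypothetical names dbo.SomeTable and SomeColumn), with the ORDER BY at the same level as the FOR XML:

SELECT STUFF((SELECT ',' + st.SomeColumn
              FROM dbo.SomeTable st
              ORDER BY st.SomeColumn -- same level as the FOR XML, so the order is honored
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
            , 1, 1, '') AS ConcatenatedList;

The TYPE directive plus the .value() call also undoes the XML entitization of characters such as & and <.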
June 13, 2013 at 10:23 am
I have a quick question, in the original post the final output is row by row. How do you get it into columns using id? Thanks.
Disregard. I figured it out; I needed to key off the tally value! What an amazingly quick solution! Thanks again.
June 13, 2013 at 10:21 pm
rayoftennessee (6/13/2013)
I have a quick question, in the original post the final output is row by row. How do you get it into columns using id? Thanks. Disregard. I figured it out; I needed to key off the tally value! What an amazingly quick solution! Thanks again.
Hi Ray,
Thanks for stopping by and the thoughtful comment.
Do you have a coded solution that you'd like to share for the problem you mentioned above? It might help someone else out with a similar problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2013 at 3:39 am
*uuh*
Hi, Jeff,
my thanks to you for your great articles here on SSC as well as your comments outside.
And it was good of you to address the question about the use of recursion in detail: I learned a lot today!
Best regards
JP
________________________________________________________
If you set out to do something, something else must be done first.
January 2, 2014 at 5:37 pm
Hi Jeff,
I apologize if you've addressed this elsewhere, but I was sort of curious whether you'd done any testing of tally/inline tally tables vs. sequences in SQL Server 2012?
http://technet.microsoft.com/en-us/library/ff878091.aspx
I'd be interested to see them thrown into the mix.
Thanks,
Erik
January 2, 2014 at 5:43 pm
I'm not sure that sequences work the same way as a Tally table. It seems that their purposes are different.
Could you explain how you would use a sequence instead of a tally table?
January 2, 2014 at 6:18 pm
Hi Luis,
I recently saw a PostgreSQL example where a series of dates was generated that reminded me a bit of tally table code. That's what got me curious about sequences in MS SQL. I'm not entirely sure how I'd use it here, but wanted to ask in case one of you vastly smarter people had already thought of something 🙂
http://pgexercises.com/questions/aggregates/rollingavg.html
select dategen.date,
(
-- correlated subquery that, for each day fed into it,
-- finds the average revenue for the last 15 days
select sum(case
when memid = 0 then slots * facs.guestcost
else slots * membercost
end) as rev
from cd.bookings bks
inner join cd.facilities facs
on bks.facid = facs.facid
where bks.starttime > dategen.date - interval '14 days'
and bks.starttime < dategen.date + interval '1 day'
)/15 as revenue
from
(
-- generates a list of days in august
select cast(generate_series(timestamp '2012-08-01',
'2012-08-31','1 day') as date) as date
) as dategen
order by dategen.date;
January 3, 2014 at 8:18 am
Thanks for stopping by, Erik.
From what I understand, the SEQUENCEs of 2012 aren't the same as the "generate_series" functionality of PostgreSQL. SEQUENCEs still rely on rows being present, whereas "generate_series" generates a row set itself. Not quite the same thing.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2014 at 8:36 am
Generate_Series sounds like a great function, but it's not the same as Sequences in SQL Server (or Oracle FWIW).
Sequences are used to fetch values, but they keep track of the last value used so that the next use of the sequence returns the following value. Sequences are similar to identity column values, but they're used differently and are independent objects.
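A minimal sketch (the sequence name dbo.DemoSeq is hypothetical):

CREATE SEQUENCE dbo.DemoSeq AS int START WITH 1 INCREMENT BY 1;

SELECT NEXT VALUE FOR dbo.DemoSeq; -- returns 1
SELECT NEXT VALUE FOR dbo.DemoSeq; -- returns 2; the sequence remembered the last value used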
For something similar to generate_series, some people have constructed iTVFs (inline table-valued functions) that build tally or calendar tables on the fly.
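For example, here's a self-contained sketch that generates the days of August 2012 (similar to the generate_series example above) from a small cascading-CTE tally, with no permanent objects assumed:

WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(N)) -- 10 rows
    ,E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 100 rows, plenty for one month
SELECT TOP (31)
       TheDate = DATEADD(dd, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1, '20120801')
FROM E2
ORDER BY TheDate;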
January 3, 2014 at 9:00 am
Yeah, I saw the limitations last night after I had a chance to mess with it. I was hoping there would be a quick way to use the sequence values as positions in data or something. Ah well.
Thanks for the replies
January 3, 2014 at 9:40 am
Disney uses an AS/400-based PMS which uses sequences on every record. I never started a conversation as to its purpose, but I'm assuming that it provides a means of (almost) guaranteeing that each transaction record (regardless of table) is (almost) uniquely identified. The sequence reset at 10,000,000,000. Depending on the retention level, the same sequence number could appear more than once at the same time in the db regardless of table, but the probability of the same sequence number occurring more than once in the same table was minuscule. If memory serves, the primary key was composed of the transaction date plus the sequence number. I can't remember if the sequence was stored separately or combined with the trans date. It's only been 13 years.
July 4, 2014 at 5:59 am
Thank you for this article.
I'm very new to the SQL Server field and it was very useful for me, but I have one question which isn't clear enough for me:
Suppose I want to use a "Tally table" in my procedure, and my co-worker is planning to use a "Tally table" in his procedure.
Do we have to use a different Tally table for each procedure?
If so, I think there will be many different "Tally tables" in our database (one Tally table for each procedure).
If we can use only one Tally table for all procedures, then how do we fill this table with sequential data? I need 1,000 rows; somebody else could decide that's too much and decrease the amount...
Please clear this up for me...
July 4, 2014 at 8:03 am
You don't need a different tally table for each procedure. You only need one and you'll just determine the range when you query it.
You could also create a view instead of a physical tally table:
CREATE VIEW vTally
WITH SCHEMABINDING
AS
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) n
FROM E4
Or a table-valued function:
CREATE FUNCTION fTally(@From int, @To int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max
SELECT TOP( @To - @From + 1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + @From - 1 n
FROM E4
Both run with 0 reads because the rows are calculated on the fly.
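Usage sketches for both (assuming they were created in the default dbo schema):

SELECT n FROM dbo.vTally WHERE n <= 100; -- the first 100 numbers from the view
SELECT n FROM dbo.fTally(5, 15);         -- the numbers 5 through 15 from the function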
July 4, 2014 at 9:35 am
Personally, I'd much prefer the iTVF (inline Table Valued Function) over the view because it runs as fast as the view and makes it much easier to control the range of numbers returned.
Shifting gears a bit, I don't try to calculate ranges that start with anything other than 0 or 1 in my Tally function because it adds a bit (small but still present) of unnecessary overhead. Most of the work I do starts at 0 or 1 and, when dealing with large ranges and large amounts of data, every bit helps even at the speed of today's machines. If I need a range to be offset, I do that calculation externally. I also build non-whole-number and non-sequential sequences externally from just a base 0-or-1 sequence, using whatever function works best for me.
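For example, a sketch of doing the offset externally (using the fnTally function posted in the next reply):

SELECT OffsetN = t.N + 99 -- shifts 1 through 100 to 100 through 199 outside the function
FROM dbo.fnTally(1, 100) t;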
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2014 at 1:07 pm
my2ndbox (7/4/2014)
Thank you for this article. I'm very new to the SQL Server field and it was very useful for me, but I have one question which isn't clear enough for me:
Suppose I want to use a "Tally table" in my procedure, and my co-worker is planning to use a "Tally table" in his procedure.
Do we have to use a different Tally table for each procedure?
If so, I think there will be many different "Tally tables" in our database (one Tally table for each procedure).
If we can use only one Tally table for all procedures, then how do we fill this table with sequential data? I need 1,000 rows; somebody else could decide that's too much and decrease the amount...
Please clear this up for me...
You only need one Tally Table or one Tally function such as that which Luis posted. I use both.
My Tally Table is set up for 11,000 rows because I do a fair bit of work with mortgages; 11,000 rows is a number just a little larger than 30 years' worth of days. The article explains how to build one. Make sure that you don't forget the unique clustered index in the form of a PK.
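A sketch along the lines of the article's approach (the object names here are conventional rather than quoted from the article):

--===== Build 11,000 sequential numbers from a cross join of a large system view.
SELECT TOP 11000
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2;

--===== Add the unique clustered index in the form of a PK.
ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N);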
The Tally function that I use looks like this. It will allow a starting value of either 0 or 1 with virtually no overhead for making that decision.
/****** Object: UserDefinedFunction [dbo].[fnTally] Script Date: 07/04/2014 15:02:49 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Purpose:
Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Billion.
As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.
Usage:
--===== Syntax example (Returns BIGINT)
SELECT t.N
FROM dbo.fnTally(@ZeroOrOne, @MaxN) t
;
Notes:
1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
Refer to the following URL for how it works. http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
will cause the sequence to start at 1.
3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
4. If @MaxN is negative or NULL, a "TOP" error will be returned.
5. @MaxN must be a positive number >= the value of @ZeroOrOne, up to and including 1 Billion. If a larger
number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
that many values, you should consider using a different tool. ;-)
6. There will be a substantial reduction in performance if "N" is sorted in descending order. If a descending
sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
very fast, especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
If @ZeroOrOne is 0 in this case, remove the "+1" from the code.
DECLARE @MaxN BIGINT;
SELECT @MaxN = 1000;
SELECT DescendingN = @MaxN-N+1
FROM dbo.fnTally(1,@MaxN);
7. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
Revision History:
Rev 00 - Unknown - Jeff Moden
- Initial creation with error handling for @MaxN.
Rev 01 - 09 Feb 2013 - Jeff Moden
- Modified to start at 0 or 1.
Rev 02 - 16 May 2013 - Jeff Moden
- Removed error handling for @MaxN because of exceptional cases.
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1) --10E1 or 10 rows
, E3(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c) --10E3 or 1 Thousand rows
, E9(N) AS (SELECT 1 FROM E3 a, E3 b, E3 c) --10E9 or 1 Billion rows
SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
UNION ALL
SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E9 -- Values from 1 to @MaxN
;
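A couple of quick sanity checks for the function above (assuming it was created in the dbo schema as shown):

SELECT N FROM dbo.fnTally(1, 10); -- returns 1 through 10
SELECT N FROM dbo.fnTally(0, 10); -- returns 0 through 10 (11 rows)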
--Jeff Moden
Change is inevitable... Change for the better is not.