January 3, 2021 at 7:12 pm
Itzik Ben-Gan (definitely one very smart guy in the world of SQL Server and T-SQL) published an interesting challenge back on the 9th of December, 2020. You can find that challenge at the following link...
https://sqlperformance.com/2020/12/t-sql-queries/number-series-challenge
To summarize...
Itzik posted the basis of his world-famous "GetNums" function, which generates a sequence of numbers. The version he posted is his most recent (date unknown) and can be found at the following link...
https://tsql.solidq.com/SourceCodes/GetNums.txt
To save you a click and a little bit of time, here's the function from the link above...
----------------------------------------------------------------------
-- Š Itzik Ben-Gan, SolidQ
-- For more, see 5-day Advanced T-SQL Course:
-- http://tsql.solidq.com/t-sql-courses/
----------------------------------------------------------------------
IF OBJECT_ID(N'dbo.GetNums', N'IF') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@low AS BIGINT, @high AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS (SELECT c FROM (SELECT 1 UNION ALL SELECT 1) AS D(c)),
L1 AS (SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS (SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS (SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS (SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS (SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
FROM L5)
SELECT TOP(@high - @low + 1) @low + rownum - 1 AS n
FROM Nums
ORDER BY rownum;
GO
In both the "challenge" article at the first link and the code above, he's programmed the function to return a numeric sequence that starts at some low value (virtually any value possible in a BIGINT) and ends at some high value based on two user-provided parameters.
The "challenge" Itizik has issued is to improve on the performance of that function (it's already nasty fast) because Microsoft has not yet seen fit to provide an equivalent since it was requested by Erland Sommarskog way back in Feb 2007. See the following link to see the still-open request to MS (please also take the time to up-vote it... much appreciated) ...
Getting back to Itzik's challenge, I had a question form in my mind. As a pre-amble to the question, I've found that I use "1" as a starting number about 95% of the time and "0" as a starting number about 4.9% of the time. I can't remember the last time I've needed a number other than "0" or "1" as a starting number but I know I've done it a couple of times over the last couple of decades and so I'll say that I've only needed to do so about 0.1% of the time.
So, my penultimate question about all of this is...
As a percentage of all the times you've used a numerical sequence generator, how often have you needed to generate a sequence that started with something other than "1" or "0"?
It would also be interesting if you described why you needed to start a sequence at other than "1" or "0".
Thanks for the help, folks.
Oh... what is my ultimate question? I'll save that for later in the discussion that follows this post because I don't want to taint anyone's thoughts with such a question.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2021 at 7:26 pm
how often have you needed to generate a sequence that started with something other than "1" or "0"?
I guess about every time when I wanted a sequence of something time-based: days, months, weeks, years.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
January 3, 2021 at 8:04 pm
how often have you needed to generate a sequence that started with something other than "1" or "0"?
I guess about every time when I wanted a sequence of something time-based: days, months, weeks, years.
Interesting and thanks for the feedback, Erland...
Are you saying that you calculate the actual (for example) underlying date serial number to create a date sequence? Why wouldn't you just add a sequence to the start date (or whatever)?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2021 at 8:10 pm
Typically, I have table with the sequences I need, at least if I need them in more than one place in the database.
I discuss time-based tables in my article here: http://www.sommarskog.se/Short%20Stories/table-of-numbers.html
In section 2.1 you also find a query with a sequence that does not start on 0 or 1. (But as noted later in the article, it is not a very practical query.)
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
January 3, 2021 at 8:47 pm
To be sure, I don't see any temporal queries in section 2.1... just the construct for your million row table.
Section 2.2 has a couple of temporal queries where you use the numbers table to create dates based on n-1 (which is "0" for the first value of your numbers table). You didn't actually use a value from your numbers table that started at something other than 0 or 1. Sure... as the the sequence progressed, other values were used but the basis of the queries was to add the sequence of values to the low/starting date and the first value of the sequence was n-1, which is "0". You didn't, for example, convert '19971201' to a date serial number (35763) and start your search in the numbers table for that number.
That's what I'm talking about when I say that I usually use a sequence that starts at "0" or "1". You seem to do the same.
Section 2.3 (first query in that section for finding missing IDs) does, indeed, start with whatever the lowest value in the range of IDs that you're checking but you immediately poo-poo that idea (and I agree) and then demonstrate the use of LEAD followed by the use of your numbers table... which again starts at 1 and you use a formula to create the offset and range (which I also agree with).
My point is that you mostly use your numbers table starting at "1" or "n-1" (effectively "0") and not a number larger than 1 (again, I agree).
Anyway, thank you for your feedback, Erland.
Getting back to the posted question, is there anyone out there that uses a sequence that doesn't start at "0" or "1" other than to correctly demonstrate the wrong way to do something?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2021 at 8:54 pm
Sorry about the incorrect section reference. The time-based tables are in chapter 4.
I guess that in many cases where we are using sequence, we want a sequence, and what number it starts and stops on is largely irrelevant, but it is often easier to start on 1 or 0.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
January 3, 2021 at 9:15 pm
If you actually need a sequence, then we have a CREATE SEQUENCE statement in the language I need to just generate a set of integers then I would use
SELECT (units.i + tens + hundreds + .. ) AS num
FROM
VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) AS units
CROSS JOIN
VALUES (00, 10, 20, 30, 40, 50, 60, 70, 80, 90) AS tens
CROSS JOIN
VALUES (000, 100, 200, 300, 400, 500, 600, 700, 800, 900) AS hundreds
CROSS JOIN
...
If you need a minimal element in this set to be something other than one or zero, then simply add a where clause. This reduces all the computations down to simple addition and only costs a little extra typing.
Please post DDL and follow ANSI/ISO standards when asking for help.
January 3, 2021 at 9:56 pm
If you actually need a sequence, then we have a CREATE SEQUENCE statement in the language I need to just generate a set of integers then I would use
SELECT (units.i + tens + hundreds + .. ) AS num FROM VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) AS units CROSS JOIN VALUES (00, 10, 20, 30, 40, 50, 60, 70, 80, 90) AS tens CROSS JOIN VALUES (000, 100, 200, 300, 400, 500, 600, 700, 800, 900) AS hundreds CROSS JOIN ...
If you need a minimal element in this set to be something other than one or zero, then simply add a where clause. This reduces all the computations down to simple addition and only costs a little extra typing.
Thanks, Joe. There's a way similar to what you're posted that produces and can produce a very large sequence (about 4.3 billion rows) with only 3 nested loops in the execution plan. Please see the following article for one way...
https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
Shifting gears back to the subject at hand.... You've been around for quite a while... Have you ever needed to start such an "auxiliary" sequence to replace a While Loop and haven't started it at "0" or "1"? If so, what did the code do?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2021 at 10:47 pm
Getting back to Itzik's challenge, I had a question form in my mind. As a pre-amble to the question, I've found that I use "1" as a starting number about 95% of the time and "0" as a starting number about 4.9% of the time. I can't remember the last time I've needed a number other than "0" or "1" as a starting number but I know I've done it a couple of times over the last couple of decades and so I'll say that I've only needed to do so about 0.1% of the time.
So, my penultimate question about all of this is...
As a percentage of all the times you've used a numerical sequence generator, how often have you needed to generate a sequence that started with something other than "1" or "0"?
It would also be interesting if you described why you needed to start a sequence at other than "1" or "0".
Thanks for the help, folks.
Oh... what is my ultimate question? I'll save that for later in the discussion that follows this post because I don't want to taint anyone's thoughts with such a question.
In my case I would say the numbers are closer to 75% to 80% of the time using 1. Then 20% to 25% using 0. A lot of time the questions are "what's missing?" from a range of dates or some sequential list. Using min_val+fn.n to begin the sequence is very typical also.
An example of a tally function with beginning and end parameters would be Jonathan's daterange function. At first I found it very useful and used it frequently. Lately though I've just been using the tally functions and calculating dates as needed. Also, I think Sergey's Tally Generator produces sequences with different starting points.
Sometimes I use the fnNumbers function I came up with because I think it's visually approachable. It copies the (very convenient) 1 or 0 parameter from fnTally.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
January 3, 2021 at 10:56 pm
Jeff Moden wrote:Getting back to Itzik's challenge, I had a question form in my mind. As a pre-amble to the question, I've found that I use "1" as a starting number about 95% of the time and "0" as a starting number about 4.9% of the time. I can't remember the last time I've needed a number other than "0" or "1" as a starting number but I know I've done it a couple of times over the last couple of decades and so I'll say that I've only needed to do so about 0.1% of the time.
So, my penultimate question about all of this is...
As a percentage of all the times you've used a numerical sequence generator, how often have you needed to generate a sequence that started with something other than "1" or "0"?
It would also be interesting if you described why you needed to start a sequence at other than "1" or "0".
Thanks for the help, folks.
Oh... what is my ultimate question? I'll save that for later in the discussion that follows this post because I don't want to taint anyone's thoughts with such a question.
In my case I would say the numbers are closer to 75% to 80% of the time using 1. Then 20% to 25% using 0. A lot of time the questions are "what's missing?" from a range of dates or some sequential list. Using min_val+fn.n to begin the sequence is very typical also.
An example of a tally function with beginning and end parameters would be Jonathan's daterange function. At first I found it very useful and used it frequently. Lately though I've just been using the tally functions and calculating dates as needed. Also, I think Sergey's Tally Generator produces sequences with different starting points.
Sometimes I use the fnNumbers function I came up with because I think it's visually approachable. It copies the (very convenient) 1 or 0 parameter from fnTally.
Thanks for the feedback, Steve. So, from the sounds of it, you don't use a starting value of other than "0" or "1", correct?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 3, 2021 at 11:31 pm
Thanks for the feedback, Steve. So, from the sounds of it, you don't use a starting value of other than "0" or "1", correct?
Those are my two choices so yes. If I were to guess as to the frequency where a constant was added to the generated sequence it's maybe 40% of the time.
With help from 'Thom A' we could probably get some "real" answers from the Stack Overflow database. Above a certain points total, like 25k (which is a lot more than I have), they permit full access to query their db. I've posted many answers (maybe 50+) using fnTally, fnNumbers, or a CTE using (ORDER BY (SELECT NULL)). Thom A (Larnu) has 50k points and has probably also posted a large number of tally based answers.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
January 4, 2021 at 1:03 am
That's not a bad idea. I think I may actually have a copy of that database, now that you mention it. Heh... if nothing else, it would make for good practice search for specific data in text written by a shedload of different people all with 20 different styles of writing each. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2021 at 1:17 am
I use 2 monster .sql scripts to answer SSC/SO questions. One currently has 13,787 lines and contains 32 occurrences of 'fnTally' of which 24 specify the sequence begin with 1. Two has 6,859 lines and contains 14 occurrences of fnTally of which 9 specify beginning with a 1. In total 33/46 or about 72%.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
January 4, 2021 at 8:59 am
out of the normal but in a particular application from one of my clients sequences always started at 1000.
And on a similar situation, multi-tenant, each tenant would start at 1.000.000 intervals so we would know who it belonged to and so we could "merge" data onto a single DW without having conflicting data (not my design!!)
January 4, 2021 at 1:02 pm
out of the normal but in a particular application from one of my clients sequences always started at 1000.
And on a similar situation, multi-tenant, each tenant would start at 1.000.000 intervals so we would know who it belonged to and so we could "merge" data onto a single DW without having conflicting data (not my design!!)
Thanks, Frederico. I appreciate the feedback.
That's not the kind of "sequence" I'm looking for, though. What I was looking for was "number sequences" from a sequence generator bit of code such at Itzik Ben-Gan's "GetNums" or "fnTally" or even a Tally/Numbers table.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 23 total)
You must be logged in to reply to this topic. Login to reply