April 20, 2020 at 4:30 pm
April 20, 2020 at 5:28 pm
DEFINITELY not an ideal solution, but it works with the sample data (plus an extra row a ways in the future). It assumes the documentID reflects the order in which documents arrived, not the arrival date. What I mean is, document ID 11 will have an arrival date greater than or equal to that of document ID 10.
I say not ideal because it is not pretty code to read, but it meets the requirement of the title (CTE):
DECLARE @tmpTable TABLE
(
[documentID] INT
, [arrivalDate] DATE
);
INSERT INTO @tmpTable
(
[documentID]
, [arrivalDate]
)
VALUES
(1, '2019-01-02')
, (2, '2019-01-02')
, (3, '2019-01-02')
, (4, '2019-01-03')
, (5, '2019-01-03')
, (6, '2019-01-04')
, (7, '2019-01-04')
, (8, '2019-01-04')
, (9, '2019-01-04')
, (10, '2019-01-07')
, (11, '2019-02-03');
WITH [cte]
AS
(
SELECT
[documentID]
, [arrivalDate]
, ROW_NUMBER() OVER (PARTITION BY
[arrivalDate]
ORDER BY [documentID]
) AS [rn]
FROM @tmpTable
)
, [cte2]
AS
(
SELECT
[cte].[documentID]
, [cte].[arrivalDate]
, [cte].[rn]
, CASE
WHEN [cte].[rn] > 2
THEN DATEADD( DAY
, 1
, [cte].[arrivalDate]
)
ELSE [cte].[arrivalDate]
END AS [finaldate]
FROM [cte]
)
, [cte3]
AS
(
SELECT
[cte2].[documentID]
, [cte2].[arrivalDate]
, ROW_NUMBER() OVER (PARTITION BY
[cte2].[finaldate]
ORDER BY [cte2].[documentID]
) AS [rn]
, [cte2].[finaldate]
FROM [cte2]
)
, [cte4]
AS
(
SELECT
[cte3].[documentID]
, [cte3].[arrivalDate]
, [cte3].[rn]
, CASE
WHEN [cte3].[rn] > 2
THEN DATEADD( DAY
, 1
, [cte3].[finaldate]
)
ELSE [cte3].[finaldate]
END AS [finaldate]
FROM [cte3]
)
, [cte5]
AS
(
SELECT
[cte4].[documentID]
, [cte4].[arrivalDate]
, ROW_NUMBER() OVER (PARTITION BY
[cte4].[finaldate]
ORDER BY [cte4].[documentID]
) AS [rn]
, [cte4].[finaldate]
FROM [cte4]
)
, [cte6]
AS
(
SELECT
[cte5].[documentID]
, [cte5].[arrivalDate]
, [cte5].[rn]
, CASE
WHEN [cte5].[rn] > 2
THEN DATEADD( DAY
, 1
, [cte5].[finaldate]
)
ELSE [cte5].[finaldate]
END AS [finaldate]
FROM [cte5]
)
, [cte7]
AS
(
SELECT
[cte6].[documentID]
, [cte6].[arrivalDate]
, ROW_NUMBER() OVER (PARTITION BY
[cte6].[finaldate]
ORDER BY [cte6].[documentID]
) AS [rn]
, [cte6].[finaldate]
FROM [cte6]
)
, [cte8]
AS
(
SELECT
[cte7].[documentID]
, [cte7].[arrivalDate]
, [cte7].[rn]
, CASE
WHEN [cte7].[rn] > 2
THEN DATEADD( DAY
, 1
, [cte7].[finaldate]
)
ELSE [cte7].[finaldate]
END AS [finaldate]
FROM [cte7]
)
SELECT
[cte8].[documentID]
, [cte8].[arrivalDate]
, [cte8].[finaldate]
FROM [cte8];
Eight CTEs to make it work, though, and you would likely need more CTEs for any real-world data. For example, if rows 10 and 11 both had the same arrival date as row 9, you would need an additional 2 CTEs to handle that case.
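To see why each pair of CTEs only buys one more day, here is a small Python check (the naming is mine, not from the post) of the pass that each CTE pair performs: rank the rows sharing a date by documentID, then push everything ranked above 2 one day out. Rows pushed onto an existing day are deliberately not re-ranked until the next pass, mirroring the CTE behavior:

```python
from collections import defaultdict
from datetime import date, timedelta

def one_pass(docs):
    """One CTE pair: re-rank rows sharing a date by id, push rank > 2 one day out."""
    by_date = defaultdict(list)
    for doc_id, day in sorted(docs, key=lambda x: (x[1], x[0])):
        by_date[day].append(doc_id)
    out, changed = [], False
    for day, ids in by_date.items():
        for rank, doc_id in enumerate(ids, start=1):
            if rank > 2:
                # pushed rows are not merged into the next day's bucket
                # within this pass -- that happens on the next re-rank
                out.append((doc_id, day + timedelta(days=1)))
                changed = True
            else:
                out.append((doc_id, day))
    return sorted(out), changed

# the sample data from the post above
docs = list(enumerate(
    [date(2019, 1, 2)] * 3 + [date(2019, 1, 3)] * 2
    + [date(2019, 1, 4)] * 4 + [date(2019, 1, 7), date(2019, 2, 3)], start=1))

passes, changed = 0, True
while changed:
    docs, changed = one_pass(docs)
    passes += changed
print(passes)  # 4 pushing passes, matching the 4 rank-and-push CTE pairs
```

More rows sharing a day means more passes before things settle, hence more CTEs.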
An alternative method would be a WHILE loop, which handles more data without having to add more CTEs:
DECLARE @tmpTable2 TABLE
(
[documentID] INT
, [arrivalDate] DATE
, [FinalDate] DATE
, [rn] INT
);
DECLARE @maxRN INT;
SELECT
@maxRN = MAX([rownum].[rn])
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY
[arrivalDate]
ORDER BY [documentID]
) AS [rn]
FROM @tmpTable
) AS [rownum];
INSERT INTO @tmpTable2
(
[documentID]
, [arrivalDate]
, [FinalDate]
, [rn]
)
SELECT
[documentID]
, [arrivalDate]
, [arrivalDate]
, ROW_NUMBER() OVER (PARTITION BY
[arrivalDate]
ORDER BY [documentID]
)
FROM @tmpTable;
WHILE (@maxRN > 2)
BEGIN
UPDATE
@tmpTable2
SET
[FinalDate] = DATEADD( DAY
, 1
, [FinalDate]
)
WHERE [rn] > 2;
UPDATE [t2]
SET
[t2].[rn] = [updated].[rn]
FROM @tmpTable2 AS [t2]
JOIN
(
SELECT
[documentID]
, ROW_NUMBER() OVER (PARTITION BY
[FinalDate]
ORDER BY [documentID]
) AS [rn]
FROM @tmpTable2
) AS [updated]
ON [t2].[documentID] = [updated].[documentID];
SELECT
@maxRN = MAX([rownum].[rn])
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY
[FinalDate]
ORDER BY [documentID]
) AS [rn]
FROM @tmpTable2
) AS [rownum];
END;
SELECT
[documentID]
, [arrivalDate]
, [FinalDate]
FROM @tmpTable2;
Not the prettiest code, and since it uses a WHILE loop it is probably not the most efficient either, but I think it is better than stacking CTEs...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
April 20, 2020 at 6:26 pm
Consumable Test Data:
CREATE TABLE #t
(
documentID int NOT NULL PRIMARY KEY
,ArrivedDate date NOT NULL
);
INSERT INTO #t
VALUES (1, '20190102')
,(2, '20190102')
,(3, '20190102')
,(4, '20190103')
,(5, '20190103')
,(6, '20190104')
,(7, '20190104')
,(8, '20190104')
,(9, '20190104')
,(10, '20190107');
This works with the test data but needs more work:
WITH DocOrder
AS
(
SELECT documentID, ArrivedDate
,ROW_NUMBER() OVER (ORDER BY documentID) AS Ord
FROM #t
)
,BaseDate
AS
(
SELECT MIN(ArrivedDate) AS ArrivedDate
FROM #t
)
SELECT D.documentID, D.ArrivedDate
,CASE
WHEN X.FinalDate >= D.ArrivedDate
THEN X.FinalDate
ELSE D.ArrivedDate
END AS FinalDate
FROM DocOrder D
CROSS JOIN BaseDate B
CROSS APPLY
(
VALUES(DATEADD(day, (Ord - 1) / 2, B.ArrivedDate))
) X (FinalDate);
April 20, 2020 at 8:09 pm
Nice work Ken! Your solution is cleaner than mine. I did find a flaw with it, though: if you have a gap in the dates, you can get into trouble. For example, if 2 more rows were inserted with the same date as row 10, you end up with 3 rows containing 20190107 instead of the 2 we are expecting.
With any real-world data, you will likely run into problems with gaps.
Also, as you are making a new column called Ord, will that not be identical to the documentID column?
Note: I am not trying to pick on your code by any means; I was just testing it out and trying some other use cases. It works better than mine with the provided sample data, as you have no loops and you don't end up with a boatload of CTEs like I did (8 CTEs in a single query is a bit high).
April 20, 2020 at 8:37 pm
Thanks for the comments Brian. I am aware of the flaws with real data but do not have time to look at it in more detail at the moment. There might be scope for creating groups or, failing that, writing to a temp table and using a while loop to sort out the problems. The idea of Ord was to cope with any gaps in documentID.
An interesting problem.
April 21, 2020 at 2:12 am
This appears to work but it's not simple. It defines the anchor records as those having the minimum ArrivedDate where doc_row_num <= 2 and doc_row_num <= doc_count. Then it inserts additional rows, calculating the end_dt (by twos) unless the calculated date is less than the ArrivedDate; in that case, the end_dt is set equal to the ArrivedDate.
;with
mm_dt_cte(min_dt) as (select min(ArrivedDate) from #t),
doc_days_cte(arrive_dt, doc_count) as (
select
ArrivedDate,
count(*)
from #t
group by ArrivedDate),
r_cte(doc_id, arrive_dt, doc_row_num, row_num, min_dt, doc_count) as (
select
t.documentID,
t.ArrivedDate,
row_number() over (partition by ArrivedDate order by ArrivedDate) as doc_row_num,
row_number() over (order by ArrivedDate) as row_num,
mdc.min_dt,
ddc.doc_count
from
#t t
join
mm_dt_cte mdc on t.ArrivedDate=mdc.min_dt
join
doc_days_cte ddc on t.ArrivedDate=ddc.arrive_dt),
anchor_cte(doc_id) as (
select doc_id
from r_cte
where arrive_dt=min_dt
and doc_row_num<=2
and doc_row_num<=doc_count)
select
rc.doc_id, rc.arrive_dt, rc.arrive_dt end_dt
from
r_cte rc
join
anchor_cte ac on rc.doc_id=ac.doc_id
union all
select
t.documentID, t.ArrivedDate,
iif(dateadd(dd, (row_number() over (order by ArrivedDate)+1)/2, mdc.min_dt)<ArrivedDate,
ArrivedDate,
dateadd(dd, (row_number() over (order by ArrivedDate)+1)/2, mdc.min_dt))
from
#t t
cross join
mm_dt_cte mdc
where not exists
(select 1 from anchor_cte ac where t.documentID=ac.doc_id);
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können ("From the paradise that Cantor created for us, no one shall be able to expel us")
April 21, 2020 at 10:47 am
This query seems to work with the OP's test data but, like Brian's and my initial set-based attempts, will fail with more realistic data.
On looking at this again, the options seem to be a loop (cursors are slow in SQL Server), the CLR (a pain to enable and maintain securely), or the Quirky Update, e.g.:
While I am in awe of Jeff Moden’s meticulous testing and documentation, for me it really runs against the grain to use such a non-relational and officially undocumented technique. If the problem was for an application, I would probably suggest solving it in the middle tier. In SQL Server, ‘being a bear of exceedingly little brain’, I can see no reasonable set-based alternative.
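For what it's worth, if this did move to the middle tier as suggested, the whole problem reduces to one greedy pass in document order. A minimal Python sketch (names and structure are mine, not from the thread), assuming at most two documents per day and that a document can never be finalized before it arrives:

```python
from datetime import date, timedelta

def assign_final_dates(docs, per_day=2):
    """docs: (doc_id, arrival_date) pairs sorted by doc_id.
    Returns (doc_id, arrival_date, final_date) triples with at most
    per_day documents per final date, and never before arrival."""
    result, current, used = [], None, 0
    for doc_id, arrived in docs:
        if current is None or arrived > current:
            current, used = arrived, 0  # arrival date is the floor
        elif used >= per_day:
            current, used = current + timedelta(days=1), 0
        used += 1
        result.append((doc_id, arrived, current))
    return result

# the OP's sample data, including the far-future row
docs = [(1, date(2019, 1, 2)), (2, date(2019, 1, 2)), (3, date(2019, 1, 2)),
        (4, date(2019, 1, 3)), (5, date(2019, 1, 3)), (6, date(2019, 1, 4)),
        (7, date(2019, 1, 4)), (8, date(2019, 1, 4)), (9, date(2019, 1, 4)),
        (10, date(2019, 1, 7)), (11, date(2019, 2, 3))]
for row in assign_final_dates(docs):
    print(*row)
```

Because it walks in documentID order instead of re-ranking per date, gaps such as the jump to 2019-02-03 fall out naturally, which is exactly where the set-based attempts struggle.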
New test data:
CREATE TABLE #t
(
documentID int NOT NULL PRIMARY KEY
,ArrivedDate date NOT NULL
);
INSERT INTO #t
VALUES (1, '20190102')
,(2, '20190102')
,(3, '20190102')
,(4, '20190103')
,(5, '20190103')
,(6, '20190104')
,(7, '20190104')
,(8, '20190104')
,(9, '20190104')
,(10, '20190107')
,(11, '20190107')
,(12, '20190107')
,(13, '20190107')
,(14, '20190107')
,(15, '20190107');
Solution using the Quirky Update:
CREATE TABLE #Quirky
(
documentID int NOT NULL PRIMARY KEY
,ArrivedDate date NOT NULL
,FinalDate date NULL
);
INSERT INTO #Quirky(documentID, ArrivedDate)
SELECT documentID, ArrivedDate
FROM #t;
DECLARE @FinalDate date = (SELECT DATEADD(day, -1, MIN(ArrivedDate)) FROM #t)
,@FNum int = 2
,@Anchor int;
UPDATE Q
SET @Anchor = documentID
,@FNum =
CASE
WHEN @FNum = 1 AND ArrivedDate > @FinalDate
THEN 1
WHEN @FNum = 2
THEN 1
ELSE 2
END
,@FinalDate = FinalDate =
CASE
WHEN @FNum = 1 AND ArrivedDate > @FinalDate
THEN ArrivedDate
WHEN @FNum = 1
THEN DATEADD(day, 1, @FinalDate)
ELSE @FinalDate
END
FROM #Quirky Q WITH (TABLOCKX)
OPTION (MAXDOP 1);
SELECT documentID, ArrivedDate, FinalDate
FROM #Quirky;
April 21, 2020 at 2:18 pm
I agree with Ken here that the application side is better suited for this.
Personally, I would refrain from using undocumented features as they can change behavior OR be removed completely without any notice.
If SQL is your only option for generating that data set, I would recommend a loop of some sort (cursor or while loop, which both come with the performance problems) or the undocumented feature of a quirky update (which comes with some risk as well).
scdecade - your example has the same problem as Ken's original one, where gaps in the dates throw everything off. Using the sample data in Ken's latest post, your solution produces 3 dates on the 7th.
I was thinking about grouping the dates and then doing something with it based on that, but that feels like it is going to get very messy very quickly as you can't just "reset" the value when the date has a gap as you may have had enough cases the previous day that you can't solve it that way.
In my opinion, a better long-term solution would be to create another column on the table and have an application update it with the Final Date as documents are completed. What if 3 documents come in on day 1 and you are sick on day 2, so the 3rd case doesn't get resolved until day 3? Or day 2 is a Saturday, so you are not doing anything with the document until day 4?
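The weekend scenario can be folded into the same kind of greedy middle-tier pass by only handing out weekday slots. A sketch with assumptions that are mine, not the thread's (Monday-Friday working days, still two per day; holidays and sick days would need a real calendar table):

```python
from datetime import date, timedelta

def next_working_day(day):
    """Advance one day, then skip over Saturday (5) and Sunday (6)."""
    day += timedelta(days=1)
    while day.weekday() >= 5:
        day += timedelta(days=1)
    return day

def assign_working_dates(docs, per_day=2):
    """Greedy pass in documentID order; final dates land on weekdays only."""
    result, current, used = [], None, 0
    for doc_id, arrived in docs:
        # earliest allowed slot: the arrival date, bumped off a weekend if needed
        floor = arrived if arrived.weekday() < 5 else next_working_day(arrived)
        if current is None or floor > current:
            current, used = floor, 0
        elif used >= per_day:
            current, used = next_working_day(current), 0
        used += 1
        result.append((doc_id, arrived, current))
    return result

docs = [(1, date(2019, 1, 2)), (2, date(2019, 1, 2)), (3, date(2019, 1, 2)),
        (4, date(2019, 1, 3)), (5, date(2019, 1, 3)), (6, date(2019, 1, 4)),
        (7, date(2019, 1, 4)), (8, date(2019, 1, 4)), (9, date(2019, 1, 4)),
        (10, date(2019, 1, 7)), (11, date(2019, 2, 3))]
for doc_id, arrived, final in assign_working_dates(docs):
    print(doc_id, arrived, final)
```

With the sample data, the overflow from 2019-01-04 (a Friday) skips the weekend and lands on Monday the 7th, and the 2019-02-03 arrival (a Sunday) is finalized on Monday the 4th.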
April 21, 2020 at 5:41 pm
Yeah, I noticed that too. My next attempt would be to use CROSS APPLY containing as many IIF functions as necessary and pass in a ton of variables.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 22, 2020 at 3:24 pm
Another approach would be to use recursion to find the sequence breaks:
WITH DocOrder
-- needed as gaps in documentID would mess up the /2 calculation
AS
(
SELECT documentID, ArrivedDate
,CAST(ROW_NUMBER() OVER (ORDER BY documentID) AS int) AS Ord
FROM #t
)
,Breaks
AS
(
SELECT 1 AS documentID
,1 AS Ord
,(SELECT MIN(ArrivedDate) FROM #t) AS MinDate
,1 AS rn
UNION ALL
SELECT D.documentID
,D.Ord - B.Ord + 1 AS Ord
,D.ArrivedDate AS MinDate
,CAST(ROW_NUMBER() OVER (ORDER BY D.documentID) AS int) AS rn
FROM Breaks B
CROSS JOIN DocOrder D
WHERE B.rn = 1
AND D.documentID > B.documentID
AND D.ArrivedDate > DATEADD(day, (D.Ord - B.Ord + 1) / 2 - 1, B.MinDate)
)
,Grps
AS
(
SELECT documentID AS MinId
,COALESCE(LEAD(documentID) OVER (ORDER BY documentID) - 1, 2147483647) AS MaxId
,MinDate
FROM Breaks
WHERE rn = 1
)
,GrpOrd
AS
(
SELECT T.documentID, T.ArrivedDate, G.MinDate
,ROW_NUMBER() OVER (PARTITION BY G.MinId ORDER BY T.documentID) AS Ord
FROM #t T
JOIN Grps G
ON T.documentID BETWEEN G.MinId AND G.MaxId
)
SELECT documentID, ArrivedDate
,DATEADD(day, (Ord - 1) / 2, MinDate) AS FinalDate
FROM GrpOrd
ORDER BY documentID
OPTION (MAXRECURSION 100); -- default, limit = 32767
If there was a large amount of data, or a lot of sequence breaks, then recursion would be slow. It might then be better to materialize some of the CTEs:
CREATE TABLE #DocOrder
(
documentID int NOT NULL PRIMARY KEY
,ArrivedDate date NOT NULL
,Ord int NOT NULL
);
INSERT INTO #DocOrder
SELECT documentID, ArrivedDate
,ROW_NUMBER() OVER (ORDER BY documentID) AS Ord
FROM #t;
CREATE TABLE #Breaks
(
documentID int NOT NULL PRIMARY KEY
,Ord int NOT NULL
,MinDate date NOT NULL
);
INSERT INTO #Breaks
SELECT 1, 1, MIN(ArrivedDate)
FROM #t;
WHILE (1=1)
BEGIN
WITH Breaks
AS
(
SELECT TOP (1) documentID, Ord, MinDate
FROM #Breaks
ORDER BY documentID DESC
)
INSERT INTO #Breaks
SELECT TOP (1)
D.documentID
,D.Ord - B.Ord + 1 AS Ord
,D.ArrivedDate AS MinDate
FROM Breaks B
CROSS JOIN #DocOrder D
WHERE D.documentID > B.documentID
AND D.ArrivedDate > DATEADD(day, (D.Ord - B.Ord + 1) / 2 - 1, B.MinDate)
ORDER BY D.documentID;
IF @@ROWCOUNT = 0
BREAK;
END;
WITH Grps
AS
(
SELECT documentID AS MinId
,COALESCE(LEAD(documentID) OVER (ORDER BY documentID) - 1, 2147483647) AS MaxId
,MinDate
FROM #Breaks
)
,GrpOrd
AS
(
SELECT T.documentID, T.ArrivedDate, G.MinDate
,ROW_NUMBER() OVER (PARTITION BY G.MinId ORDER BY T.documentID) AS Ord
FROM #t T
JOIN Grps G
ON T.documentID BETWEEN G.MinId AND G.MaxId
)
SELECT documentID, ArrivedDate
,DATEADD(day, (Ord - 1) / 2, MinDate) AS FinalDate
FROM GrpOrd
ORDER BY documentID;
April 22, 2020 at 7:54 pm
It's conditional counting. SQL is terrible at this.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 23, 2020 at 5:39 pm
It's conditional counting. SQL is terrible at this.
To add to that,
I agree with Ken here that the application side is better suited for this.
Personally, I would refrain from using undocumented features as they can change behavior OR be removed completely without any notice.
If SQL is your only option for generating that data set, I would recommend a loop of some sort (cursor or while loop, which both come with the performance problems) or the undocumented feature of a quirky update (which comes with some risk as well).
scdecade - your example has the same problem as Ken's original one, where gaps in the dates throw everything off. Using the sample data in Ken's latest post, your solution produces 3 dates on the 7th.
I was thinking about grouping the dates and then doing something with it based on that, but that feels like it is going to get very messy very quickly as you can't just "reset" the value when the date has a gap as you may have had enough cases the previous day that you can't solve it that way.
In my opinion, a better long-term solution would be to create another column on the table and have an application update it with the Final Date as documents are completed. What if 3 documents come in on day 1 and you are sick on day 2, so the 3rd case doesn't get resolved until day 3? Or day 2 is a Saturday, so you are not doing anything with the document until day 4?
Heh... personally, you would use a "loop of some sort"? A "Quirky Update" IS a "loop of some sort" known as a "Pseudo-Cursor". Don't look for that term in MS documentation; they don't have a clue about it. Also, the "Quirky Update" (not under that name, of course) is actually documented in the UPDATE documentation in the form of @variable = column = expression. They do have a warning about not knowing what values it might use, but that's because the people that wrote the documentation don't actually have a clue as to how to properly use it with more than one value at a time. There's also a pretty easy method to check whether it's working correctly and to have it report an error if it ever stops working (and it has been working since at least SQL Server 6.5 and also works in Sybase).
I'll also state that even if you DO want to avoid the use of the "Quirky Update", there are (since 2012) supported methods like LEAD, LAG, ROWS PRECEDING, and ROWS FOLLOWING that can do a lot of what the "Quirky Update" does, although it does take a bit more thinking outside the box for things like this. I've not tried it yet, but there's probably a decent way to do this using some non-quirky, supported method without resorting to some form of RBAR (and a recursive CTE for this IS RBAR).
If I get some time, I'll take a crack at this.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2020 at 7:02 pm
>> I have a table with 2 fields [sic] DocumentID and ArrivedDate. <<
You might want to actually read the standards for SQL. A field is a subset of a column, such as the year within a date. Ignoring that, let's go on with the rest of your posting.
>> I am able to elaborate only 2 documents in a day. How can I add a field (Final Date) with the correct days (every day with a max of 2 doc)?<<
No, you're missing a whole concept here. This is a constraint. Constraints should be enforced in the DDL so that you never have to bother with them in the actual data. Also, one of the basic rules of posting on an SQL forum is that we need the DDL, which will include those constraints.
CREATE TABLE Foobar
(foobar_id CHAR(5) NOT NULL,
document_id CHAR(10) NOT NULL
REFERENCES Documents(document_id),
posting_date DATE NOT NULL,
posting_nbr INTEGER NOT NULL
CHECK (posting_nbr IN (1, 2)),
PRIMARY KEY (posting_date, document_id, posting_nbr),
..);
..
Please post DDL and follow ANSI/ISO standards when asking for help.
April 23, 2020 at 8:55 pm
>> I have a table with 2 fields [sic] DocumentID and ArrivedDate. <<
You might want to actually read the standards for SQL. A field is a subset of a column, such as the year within a date. Ignoring that, let's go on with the rest of your posting.
Totally agreed with all of that. Unfortunately, MS documentation is riddled with the same mistake.
>> I am able to elaborate only 2 documents in a day. How can I add a field (Final Date) with the correct days (every day with a max of 2 doc)?<<
No, you're missing a whole concept here. This is a constraint. Constraints should be enforced in the DDL so that you never have to bother with them in the actual data. Also, one of the basic rules of posting on an SQL forum is that we need the DDL, which will include those constraints.
Not quite correct, Joe. In this case, it doesn't have a thing to do with constraints. The OP receives the data on certain days, regardless of number of rows, and has been required to spread out anything more than two rows per day in a FIFO manner across all the days (even some missing days) going forward.
--Jeff Moden
Change is inevitable... Change for the better is not.