April 5, 2009 at 7:09 am
john.arnott (4/3/2009)
Chris Morris (4/3/2009)
GSquared (4/3/2009)
Anyone else have any ideas on this?There were two other ideas which I've not had time to follow up yet. The first was to derive an average slope for each row, as value over column number, then calculate the missing col1 value from any values on the same row to permit ordering by col1 alone.......
Clever, Chris. You couldn't use col1 alone, actually. For collisions on col1, you'd want to use the slope as a secondary order key. Only thing left, perhaps, would be to examine the standard deviation of the domain values as a quantifiable indication of confidence in the resulting row order.
You're right John, thanks for pointing this out. I went for the easier option 😉
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 6, 2009 at 3:23 am
Assuming that the zero values really are nulls in the database then I suggest the following:-
select
coalesce( col1, 0 ) col1,
coalesce( col2, 0 ) col2,
coalesce( col3, 0 ) col3
from data
order by
coalesce( col1, col2, col3, 0 ),
coalesce( col2, col3, 0),
coalesce( col3, 0 )
April 6, 2009 at 7:22 am
Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 6, 2009 at 7:51 am
GSquared (4/6/2009)
Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.
If that was exactly as posted, then it was 39,968 rows - it takes five or six minutes on this lappy.
(39968 row(s) affected) / 00:05:07 (00:06:30 without clustered index)
Changing the upper bound of the tally number filter back to your original value will do the trick:
where number between 10 and 1000
The whole batch including creating the sample data runs in 00:00:04 on this lappy. I've got another method which runs slightly slower than this but it's even more ugly, man 😉
If I get time later I'll run Chris B's, yours and my solutions on here for comparison. I'm sure there's a fourth solution out there which would combine the elegance of yours and Chris B's solutions with the benefit of a little more performance.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 6, 2009 at 8:05 am
Sweet. This has certainly been an interesting problem to follow.
Anyone heard from the OP lately?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 6, 2009 at 8:39 am
Bob Hovious (4/6/2009)
Sweet. This has certainly been an interesting problem to follow.Anyone heard from the OP lately?
Nope. Bet he's busy with his nose in a book :hehe:
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 6, 2009 at 9:04 am
Bob Hovious (4/6/2009)
This has certainly been an interesting problem to follow.
Hey Bob, I'm still messing about with derivatives of this - have you got a "final version" that you'd like to have included? I'd be interested and grateful.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 6, 2009 at 9:12 am
No, I haven't looked at this since I realized that my shift-left version was not the ultimate answer. You guys had already made considerable progress, so I've just been sitting on the bench rooting from the sidelines. However it still keeps tickling the back of my brain. If an elegant solution should suddenly occur to me, I'll post it up here.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 6, 2009 at 9:44 am
Hi
This is very intresting question, but kind of list in somany replies...could any please tell me the solution/algorithm.
April 6, 2009 at 10:25 am
Christian Buettner (4/4/2009)
jsanborn (4/2/2009)
Since this was posted for a community of database professionals, I didn't think a great deal of detailed explanation was needed just like I didn't think it necessary to explain what a database or a table is.That's one thing thats been preying on my mind for a while now:
Yes, this is a forum for database professionals. But I hope you have realized, that your "problem" actually is not a database problem, nor a relational problem. Ordering rows is far from relational. And the fact that SQL allows you to order rows doesn't make it a database problem. I would rather put it into application development, since it more reflects array sorting.
The problem you posted is not easy to solve in SQL. You want to have columns sorted individually, but the sorting in each column is non-deterministic because of the nulls. Therefore you can only "compare" the full rows against each other. Unfortunately(really?), ORDER BY in SQL doesn't allow you to compare each two rows indivdually (meaning here: apply different column comparison for each pair of rows, depending on the horizontal position of nulls), therefore you have to somehow "rank" the full rows first (like I did with the COUNT) and then use the rank to do the sorting.
If I have not overlooked something, we still do not know what your intention was regarding this question or whether it was actually solved. It would be nice if you could shed some light here.
I agree absolutely that if this is a "real" problem that the original poster has, he/she should solve it with a tool more appropriate to the situation. SQL is very powerful and, as demonstrated by the many alternative ideas presented here, could probably be used to provide results in the order specified. But then, of course, something else would be done for the ordered data to provide any value. Graphs? Some sort of statistical analysis? Who knows? The ordering would probably best be left to an application outside of SQL.
If this were a forum of bicycling enthusiasts, we may have threads on subjects like derailleur design, tire inflation for different conditions, or whatever. If someone posted a question, "How do I get a load of lumber home?", there may be several solutions proffered -- attach a trailer to your bike, arrange slings on either side of the bike to carry balanced loads, etc. The best answer would actually be "have the lumber yard deliver, or obtain (borrow or rent) a truck" .
April 6, 2009 at 10:37 am
I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.
April 6, 2009 at 11:43 am
jsanborn (4/6/2009)
I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.
As they say here "it depends". The solutions provided by ChrisB and GSquared are elegant and compact, certainly not cumbersome, but are not particularly performant. Tailoring the solution more tightly to actual data would provide considerable scope for performance enhancement but at the expense of simplicity.
More than one solution has been proposed and proven and one or two additional possibilites have been discussed - if this was a challenge, as some have suggested, then the challenge has been met.
It's not necessarily the end of the thread when the OP bows out. Members occasionally retain an interest. For those few who do, here are some timings from three or four variants on successful solutions, including those provided by ChrisB and GSquared:
drop table #test
create table #test (
ID int identity, Col1 float, Col2 float, Col3 float, Col4 float, Col5 float, Col6 float, Col7 float, Col8 float);
--
insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)
select 2.35,3.01,3.49,4.25,4.79,5.36,5.82,6.31 union all
select NULL,3.59,4.32,NULL,5.21,NULL,NULL,7.12 union all
select NULL,NULL,NULL,5.23,5.41,6.33,6.89,7.99 union all
select 4.12,4.66,5.31,5.92,NULL,6.87,7.34,NULL;
--
insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)
select Col1+number, Col2+number, Col3+number, Col4+number, Col5+number, Col6+number, Col7+number, Col8+number
from dbo.Numbers
cross join #test
where number between 10 and 10000;
--
CREATE CLUSTERED INDEX [seqm] ON #test
([col1] ASC
)WITH (PAD_INDEX = OFF) ON [PRIMARY]
GO
-- GSquared final version =====================
declare @max-2 float;
select @max-2 = 1.79E+308;
;with CTE (CTEID, Seq1) as
(select t1.id, count(*)
from #test t1
inner join #test t2
on isnull(t1.col1,@Max) > isnull(t2.col1,0)
and isnull(t1.col2,@Max) > isnull(t2.col2,0)
and isnull(t1.col3,@Max) > isnull(t2.col3,0)
and isnull(t1.col4,@Max) > isnull(t2.col4,0)
and isnull(t1.col5,@Max) > isnull(t2.col5,0)
and isnull(t1.col6,@Max) > isnull(t2.col6,0)
and isnull(t1.col7,@Max) > isnull(t2.col7,0)
and isnull(t1.col8,@Max) > isnull(t2.col8,0)
group by t1.id)
select #test.*
from #test
left outer join CTE
on id = cteid
order by seq1;
-- (39968 row(s) affected) / 00:13:19
-- (3968 row(s) affected) / 00:00:13
--==============================================
-- ChrisB
declare @max-2 float;
select @max-2 = 1.79E+308;
SELECT *
, (SELECT COUNT(*) FROM #test
WHERE ISNULL(col1,0) <= ISNULL(al.col1,@Max)
AND ISNULL(col2,0) <= ISNULL(al.col2,@Max)
AND ISNULL(col3,0) <= ISNULL(al.col3,@Max)
AND ISNULL(col4,0) <= ISNULL(al.col4,@Max)
AND ISNULL(col5,0) <= ISNULL(al.col5,@Max)
AND ISNULL(col6,0) <= ISNULL(al.col6,@Max)
AND ISNULL(col7,0) <= ISNULL(al.col7,@Max)
AND ISNULL(col8,0) <= ISNULL(al.col8,@Max)) AS Ordr
FROM #test al
ORDER BY Ordr;
-- (39968 row(s) affected) / 00:11:36
-- (3968 row(s) affected) / 00:00:13
--======================================
-- ChrisM version 1
SELECT c.*
FROM #test c
ORDER BY
CASE WHEN NOT (c.col1 IS NULL) THEN c.col1 ELSE
(SELECT MAX(col1) FROM #test WHERE (col1 is NOT null) AND (col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL) THEN ISNULL(c.col2, 0) ELSE
(SELECT MAX(col2) FROM #test WHERE (col2 is NOT null) AND (col1 < c.col1 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL) THEN ISNULL(c.col3, 0) ELSE
(SELECT MAX(col3) FROM #test WHERE (col3 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL) THEN ISNULL(c.col4, 0) ELSE
(SELECT MAX(col4) FROM #test WHERE (col4 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL) THEN ISNULL(c.col5, 0) ELSE
(SELECT MAX(col5) FROM #test WHERE (col5 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL) THEN ISNULL(c.col6, 0) ELSE
(SELECT MAX(col6) FROM #test WHERE (col6 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col7 < c.col7 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL) THEN ISNULL(c.col7, 0) ELSE
(SELECT MAX(col7) FROM #test WHERE (col7 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col8 < c.col8)) END,
CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL AND c.col8 IS NULL) THEN ISNULL(c.col8, 0) ELSE
(SELECT MAX(col8) FROM #test WHERE (col8 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7)) END
-- (39968 row(s) affected) / 00:05:02
-- (3968 row(s) affected) / 00:00:03
--======================================
-- ChrisM version 2
SELECT Seq, ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8 FROM (
SELECT STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq, *
FROM #test t
WHERE col1 IS NOT NULL
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) FROM #test WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +
STR(col2,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *
FROM #test t
WHERE col1 IS NULL AND col2 IS NOT null
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) FROM #test WHERE (col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +
STR(col3,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *
FROM #test t
WHERE COALESCE(col1, col2) IS NULL AND col3 IS NOT null
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) FROM #test WHERE (col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +
STR(col4,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *
FROM #test t
WHERE COALESCE(col1, col2, col3) IS NULL AND col4 is NOT NULL
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) FROM #test WHERE (col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +
STR(col5,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *
FROM #test t
WHERE COALESCE(col1, col2, col3, col4) IS NULL AND col5 IS NOT NULL
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) FROM #test WHERE (col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +
STR(col6,8,2)+STR(0,8,2)+STR(0,8,2), *
FROM #test t
WHERE COALESCE(col1, col2, col3, col4, col5) IS NULL AND col6 is NOT null
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) FROM #test WHERE (col7 <= t.col7 OR col8 <= t.col8)) +
STR(col7,8,2)+STR(0,8,2), *
FROM #test t
WHERE COALESCE(col1, col2, col3, col4, col5, col6) IS NULL AND col7 is NOT null -- 00:00:00 (no rows returned)
UNION ALL
SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) FROM #test WHERE (col8 <= t.col8)) +
STR(col8,8,2), *
FROM #test t
WHERE COALESCE(col1, col2, col3, col4, col5, col6, col7) IS NULL AND col8 is NOT null
) d
ORDER BY Seq
-- (39968 row(s) affected) / 00:02:26
-- (3968 row(s) affected) / 00:00:03
--======================================
-- ChrisM version 3 ("condensed" version 2)
SELECT ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8
FROM (SELECT *, STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq
FROM #test t
WHERE t.col1 IS NOT NULL
UNION ALL
SELECT *, (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) + STR(MAX(col8),8,2)
FROM #test
WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8))
FROM #test t
WHERE t.col1 IS NULL
) d
ORDER BY Seq
-- (39968 row(s) affected) / 00:06:11
-- (3968 row(s) affected) / 00:00:08
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 6, 2009 at 3:44 pm
If you do this:
update #test
set col1 = isnull(col1,0),
col2 = isnull(col2,0),
col3 = isnull(col3,0),
col4 = isnull(col4,0),
col5 = isnull(col5,0),
col6 = isnull(col6,0),
col7 = isnull(col7,0),
col8 = isnull(col8,0)
;with CTE (CTEID, Seq1) as
(select t1.id, count(*)
from #test t1
inner join #test t2
on t1.col1 > t2.col1
and t1.col2 > t2.col2
and t1.col3 > t2.col3
and t1.col4 > t2.col4
and t1.col5 > t2.col5
and t1.col6 > t2.col6
and t1.col7 > t2.col7
and t1.col8 > t2.col8
group by t1.id)
select #test.*
from #test
left outer join CTE
on id = cteid
order by seq1;
It takes under 1 second, for the same dataset. That's the real solution.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 7, 2009 at 12:59 am
GSquared (4/6/2009)
If you do this:
update #test
set col1 = isnull(col1,0),
col2 = isnull(col2,0),
col3 = isnull(col3,0),
col4 = isnull(col4,0),
col5 = isnull(col5,0),
col6 = isnull(col6,0),
col7 = isnull(col7,0),
col8 = isnull(col8,0)
;with CTE (CTEID, Seq1) as
(select t1.id, count(*)
from #test t1
inner join #test t2
on t1.col1 > t2.col1
and t1.col2 > t2.col2
and t1.col3 > t2.col3
and t1.col4 > t2.col4
and t1.col5 > t2.col5
and t1.col6 > t2.col6
and t1.col7 > t2.col7
and t1.col8 > t2.col8
group by t1.id)
select #test.*
from #test
left outer join CTE
on id = cteid
order by seq1;
It takes under 1 second, for the same dataset. That's the real solution.
Hi GSquared,
I cannot get your solution to work.
If I allow 10000 rows (from the pivot table), the query runs forever.
If I allow only 1000 rows, I get incorrect results.
I also doubt that the code reflects what needs to be done.
By setting the NULL columns to 0, you remove their "sorting ambivalence", meaning they could sort before or after specific numbers in the same column, depending on other columns that are not null.
Maybe I am missing something, if so, I apologize.
Best Regards,
Chris Büttner
April 7, 2009 at 2:37 am
Hi Chris,
Very fancy stuff that you posted. Wasn't even aware you could do such fancy stuff in an ORDER BY:-).
Actually I did want to tell you that your queries did not work, but then I realized that mine did not either. Reason: The data in table #test is flawed.
SELECT A.id, b.id, A.col2, B.col2, a.col3, B.col3 FROM #test A, #test b
WHERE A.col2 > b.col2 AND a.col3 < B.col3
No matter how you sort, either col2 or col3 will remain unsorted (could apply to other combinations as well).
Best Regards,
Chris Büttner
Viewing 15 posts - 76 through 90 (of 180 total)
You must be logged in to reply to this topic. Login to reply