Multi Column Sort

Question

Multi Column Sort

Viewing 15 posts - 76 through 90 (of 181 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

john.arnott (4/3/2009)
Chris Morris (4/3/2009)
GSquared (4/3/2009)
Anyone else have any ideas on this?
There were two other ideas which I've not had time to follow up yet. The first was to derive an average slope for each row, as value over column number, then calculate the missing col1 value from any values on the same row to permit ordering by col1 alone.......
Clever, Chris. You couldn't use col1 alone, actually. For collisions on col1, you'd want to use the slope as a secondary order key. Only thing left, perhaps, would be to examine the standard deviation of the domain values as a quantifiable indication of confidence in the resulting row order.

You're right John, thanks for pointing this out. I went for the easier option 😉

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jim Turner Newbie Points: 9 More actions · Answer 2

Assuming that the zero values really are nulls in the database then I suggest the following:-

select

coalesce( col1, 0 ) col1,

coalesce( col2, 0 ) col2,

coalesce( col3, 0 ) col3

from data

order by

coalesce( col1, col2, col3, 0 ),

coalesce( col2, col3, 0),

coalesce( col3, 0 )

GSquared SSC Guru Points: 260824 More actions · Answer 3

Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

GSquared (4/6/2009)
Chris, on the case-statement version, running on the same hardware as the prior tests, took 8:59 (nearly 9 minutes) the first run, 8:54 the second.

If that was exactly as posted, then it was 39,968 rows - it takes five or six minutes on this lappy.

(39968 row(s) affected) / 00:05:07 (00:06:30 without clustered index)

Changing the upper bound of the tally number filter back to your original value will do the trick:

where number between 10 and 1000

The whole batch including creating the sample data runs in 00:00:04 on this lappy. I've got another method which runs slightly slower than this but it's even more ugly, man 😉

If I get time later I'll run Chris B's, yours and my solutions on here for comparison. I'm sure there's a fourth solution out there which would combine the elegance of yours and Chris B's solutions with the benefit of a little more performance.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 5

Sweet. This has certainly been an interesting problem to follow.

Anyone heard from the OP lately?

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Bob Hovious (4/6/2009)
Sweet. This has certainly been an interesting problem to follow.
Anyone heard from the OP lately?

Nope. Bet he's busy with his nose in a book :hehe:

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Bob Hovious (4/6/2009)
This has certainly been an interesting problem to follow.

Hey Bob, I'm still messing about with derivatives of this - have you got a "final version" that you'd like to have included? I'd be interested and grateful.

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 8

No, I haven't looked at this since I realized that my shift-left version was not the ultimate answer. You guys had already made considerable progress, so I've just been sitting on the bench rooting from the sidelines. However it still keeps tickling the back of my brain. If an elegant solution should suddenly occur to me, I'll post it up here.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Vijaya Kadiyala SSCrazy Eights Points: 9149 More actions · Answer 9

Hi

This is very intresting question, but kind of list in somany replies...could any please tell me the solution/algorithm.

Thanks -- Vijaya Kadiyala
www.dotnetvj.com
SQL Server Articles For Beginers

john.arnott SSChampion Points: 11882 More actions · Answer 10

Christian Buettner (4/4/2009)
jsanborn (4/2/2009)
Since this was posted for a community of database professionals, I didn't think a great deal of detailed explanation was needed just like I didn't think it necessary to explain what a database or a table is.
That's one thing thats been preying on my mind for a while now:
Yes, this is a forum for database professionals. But I hope you have realized, that your "problem" actually is not a database problem, nor a relational problem. Ordering rows is far from relational. And the fact that SQL allows you to order rows doesn't make it a database problem. I would rather put it into application development, since it more reflects array sorting.
The problem you posted is not easy to solve in SQL. You want to have columns sorted individually, but the sorting in each column is non-deterministic because of the nulls. Therefore you can only "compare" the full rows against each other. Unfortunately(really?), ORDER BY in SQL doesn't allow you to compare each two rows indivdually (meaning here: apply different column comparison for each pair of rows, depending on the horizontal position of nulls), therefore you have to somehow "rank" the full rows first (like I did with the COUNT) and then use the rank to do the sorting.
If I have not overlooked something, we still do not know what your intention was regarding this question or whether it was actually solved. It would be nice if you could shed some light here.

I agree absolutely that if this is a "real" problem that the original poster has, he/she should solve it with a tool more appropriate to the situation. SQL is very powerful and, as demonstrated by the many alternative ideas presented here, could probably be used to provide results in the order specified. But then, of course, something else would be done for the ordered data to provide any value. Graphs? Some sort of statistical analysis? Who knows? The ordering would probably best be left to an application outside of SQL.

If this were a forum of bicycling enthusiasts, we may have threads on subjects like derailleur design, tire inflation for different conditions, or whatever. If someone posted a question, "How do I get a load of lumber home?", there may be several solutions proffered -- attach a trailer to your bike, arrange slings on either side of the bike to carry balanced loads, etc. The best answer would actually be "have the lumber yard deliver, or obtain (borrow or rent) a truck" .

jsanborn Old Hand Points: 348 More actions · Answer 11

I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

jsanborn (4/6/2009)
I agree. I think an application solution would be best. It looks like any possible SQL solution is going to be very cumbersome. Thanks all.

As they say here "it depends". The solutions provided by ChrisB and GSquared are elegant and compact, certainly not cumbersome, but are not particularly performant. Tailoring the solution more tightly to actual data would provide considerable scope for performance enhancement but at the expense of simplicity.

More than one solution has been proposed and proven and one or two additional possibilites have been discussed - if this was a challenge, as some have suggested, then the challenge has been met.

It's not necessarily the end of the thread when the OP bows out. Members occasionally retain an interest. For those few who do, here are some timings from three or four variants on successful solutions, including those provided by ChrisB and GSquared:

drop table #test

create table #test (

ID int identity, Col1 float, Col2 float, Col3 float, Col4 float, Col5 float, Col6 float, Col7 float, Col8 float);

--

insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

select 2.35,3.01,3.49,4.25,4.79,5.36,5.82,6.31 union all

select NULL,3.59,4.32,NULL,5.21,NULL,NULL,7.12 union all

select NULL,NULL,NULL,5.23,5.41,6.33,6.89,7.99 union all

select 4.12,4.66,5.31,5.92,NULL,6.87,7.34,NULL;

--

insert into #test (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8)

select Col1+number, Col2+number, Col3+number, Col4+number, Col5+number, Col6+number, Col7+number, Col8+number

from dbo.Numbers

cross join #test

where number between 10 and 10000;

--

CREATE CLUSTERED INDEX [seqm] ON #test

([col1] ASC

)WITH (PAD_INDEX = OFF) ON [PRIMARY]

GO

-- GSquared final version =====================

declare @Max float;

select @Max = 1.79E+308;

;with CTE (CTEID, Seq1) as

(select t1.id, count(*)

from #test t1

inner join #test t2

on isnull(t1.col1,@Max) > isnull(t2.col1,0)

and isnull(t1.col2,@Max) > isnull(t2.col2,0)

and isnull(t1.col3,@Max) > isnull(t2.col3,0)

and isnull(t1.col4,@Max) > isnull(t2.col4,0)

and isnull(t1.col5,@Max) > isnull(t2.col5,0)

and isnull(t1.col6,@Max) > isnull(t2.col6,0)

and isnull(t1.col7,@Max) > isnull(t2.col7,0)

and isnull(t1.col8,@Max) > isnull(t2.col8,0)

group by t1.id)

select #test.*

from #test

left outer join CTE

on id = cteid

order by seq1;

-- (39968 row(s) affected) / 00:13:19

-- (3968 row(s) affected) / 00:00:13

--==============================================

-- ChrisB

declare @Max float;

select @Max = 1.79E+308;

SELECT *

, (SELECT COUNT(*) FROM #test

WHERE ISNULL(col1,0) <= ISNULL(al.col1,@Max)

AND ISNULL(col2,0) <= ISNULL(al.col2,@Max)

AND ISNULL(col3,0) <= ISNULL(al.col3,@Max)

AND ISNULL(col4,0) <= ISNULL(al.col4,@Max)

AND ISNULL(col5,0) <= ISNULL(al.col5,@Max)

AND ISNULL(col6,0) <= ISNULL(al.col6,@Max)

AND ISNULL(col7,0) <= ISNULL(al.col7,@Max)

AND ISNULL(col8,0) <= ISNULL(al.col8,@Max)) AS Ordr

FROM #test al

ORDER BY Ordr;

-- (39968 row(s) affected) / 00:11:36

-- (3968 row(s) affected) / 00:00:13

--======================================

-- ChrisM version 1

SELECT c.*

FROM #test c

ORDER BY

CASE WHEN NOT (c.col1 IS NULL) THEN c.col1 ELSE

(SELECT MAX(col1) FROM #test WHERE (col1 is NOT null) AND (col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL) THEN ISNULL(c.col2, 0) ELSE

(SELECT MAX(col2) FROM #test WHERE (col2 is NOT null) AND (col1 < c.col1 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL) THEN ISNULL(c.col3, 0) ELSE

(SELECT MAX(col3) FROM #test WHERE (col3 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL) THEN ISNULL(c.col4, 0) ELSE

(SELECT MAX(col4) FROM #test WHERE (col4 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL) THEN ISNULL(c.col5, 0) ELSE

(SELECT MAX(col5) FROM #test WHERE (col5 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col6 < c.col6 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL) THEN ISNULL(c.col6, 0) ELSE

(SELECT MAX(col6) FROM #test WHERE (col6 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col7 < c.col7 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL) THEN ISNULL(c.col7, 0) ELSE

(SELECT MAX(col7) FROM #test WHERE (col7 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col8 < c.col8)) END,

CASE WHEN NOT (c.col1 IS NULL AND c.col2 IS NULL AND c.col3 IS NULL AND c.col4 IS NULL AND c.col5 IS NULL AND c.col6 IS NULL AND c.col7 IS NULL AND c.col8 IS NULL) THEN ISNULL(c.col8, 0) ELSE

(SELECT MAX(col8) FROM #test WHERE (col8 is NOT null) AND (col1 < c.col1 OR col2 < c.col2 OR col3 < c.col3 OR col4 < c.col4 OR col5 < c.col5 OR col6 < c.col6 OR col7 < c.col7)) END

-- (39968 row(s) affected) / 00:05:02

-- (3968 row(s) affected) / 00:00:03

--======================================

-- ChrisM version 2

SELECT Seq, ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8 FROM (

SELECT STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq, *

FROM #test t

WHERE col1 IS NOT NULL

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) FROM #test WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

STR(col2,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

FROM #test t

WHERE col1 IS NULL AND col2 IS NOT null

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) FROM #test WHERE (col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

STR(col3,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

FROM #test t

WHERE COALESCE(col1, col2) IS NULL AND col3 IS NOT null

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) FROM #test WHERE (col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

STR(col4,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

FROM #test t

WHERE COALESCE(col1, col2, col3) IS NULL AND col4 is NOT NULL

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) FROM #test WHERE (col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

STR(col5,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2), *

FROM #test t

WHERE COALESCE(col1, col2, col3, col4) IS NULL AND col5 IS NOT NULL

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) FROM #test WHERE (col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8)) +

STR(col6,8,2)+STR(0,8,2)+STR(0,8,2), *

FROM #test t

WHERE COALESCE(col1, col2, col3, col4, col5) IS NULL AND col6 is NOT null

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) FROM #test WHERE (col7 <= t.col7 OR col8 <= t.col8)) +

STR(col7,8,2)+STR(0,8,2), *

FROM #test t

WHERE COALESCE(col1, col2, col3, col4, col5, col6) IS NULL AND col7 is NOT null -- 00:00:00 (no rows returned)

UNION ALL

SELECT (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) FROM #test WHERE (col8 <= t.col8)) +

STR(col8,8,2), *

FROM #test t

WHERE COALESCE(col1, col2, col3, col4, col5, col6, col7) IS NULL AND col8 is NOT null

) d

ORDER BY Seq

-- (39968 row(s) affected) / 00:02:26

-- (3968 row(s) affected) / 00:00:03

--======================================

-- ChrisM version 3 ("condensed" version 2)

SELECT ID, Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8

FROM (SELECT *, STR(col1,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2)+STR(0,8,2) AS Seq

FROM #test t

WHERE t.col1 IS NOT NULL

UNION ALL

SELECT *, (SELECT STR(MAX(col1),8,2) + STR(MAX(col2),8,2) + STR(MAX(col3),8,2) + STR(MAX(col4),8,2) + STR(MAX(col5),8,2) + STR(MAX(col6),8,2) + STR(MAX(col7),8,2) + STR(MAX(col8),8,2)

FROM #test

WHERE (col2 <= t.col2 OR col3 <= t.col3 OR col4 <= t.col4 OR col5 <= t.col5 OR col6 <= t.col6 OR col7 <= t.col7 OR col8 <= t.col8))

FROM #test t

WHERE t.col1 IS NULL

) d

ORDER BY Seq

-- (39968 row(s) affected) / 00:06:11

-- (3968 row(s) affected) / 00:00:08

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

GSquared SSC Guru Points: 260824 More actions · Answer 13

If you do this:

update #test

set col1 = isnull(col1,0),

col2 = isnull(col2,0),

col3 = isnull(col3,0),

col4 = isnull(col4,0),

col5 = isnull(col5,0),

col6 = isnull(col6,0),

col7 = isnull(col7,0),

col8 = isnull(col8,0)

;with CTE (CTEID, Seq1) as

(select t1.id, count(*)

from #test t1

inner join #test t2

on t1.col1 > t2.col1

and t1.col2 > t2.col2

and t1.col3 > t2.col3

and t1.col4 > t2.col4

and t1.col5 > t2.col5

and t1.col6 > t2.col6

and t1.col7 > t2.col7

and t1.col8 > t2.col8

group by t1.id)

select #test.*

from #test

left outer join CTE

on id = cteid

order by seq1;

It takes under 1 second, for the same dataset. That's the real solution.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 14

GSquared (4/6/2009)
If you do this:
update #test
set col1 = isnull(col1,0),
col2 = isnull(col2,0),
col3 = isnull(col3,0),
col4 = isnull(col4,0),
col5 = isnull(col5,0),
col6 = isnull(col6,0),
col7 = isnull(col7,0),
col8 = isnull(col8,0)
;with CTE (CTEID, Seq1) as
(select t1.id, count(*)
from #test t1
inner join #test t2
on t1.col1 > t2.col1
and t1.col2 > t2.col2
and t1.col3 > t2.col3
and t1.col4 > t2.col4
and t1.col5 > t2.col5
and t1.col6 > t2.col6
and t1.col7 > t2.col7
and t1.col8 > t2.col8
group by t1.id)
select #test.*
from #test
left outer join CTE
on id = cteid
order by seq1;
It takes under 1 second, for the same dataset. That's the real solution.

Hi GSquared,

I cannot get your solution to work.

If I allow 10000 rows (from the pivot table), the query runs forever.

If I allow only 1000 rows, I get incorrect results.

I also doubt that the code reflects what needs to be done.

By setting the NULL columns to 0, you remove their "sorting ambivalence", meaning they could sort before or after specific numbers in the same column, depending on other columns that are not null.

Maybe I am missing something, if so, I apologize.

Best Regards,

Chris Büttner

Christian Buettner-167247 SSChampion Points: 13729 More actions · Answer 15

Hi Chris,

Very fancy stuff that you posted. Wasn't even aware you could do such fancy stuff in an ORDER BY:-).

Actually I did want to tell you that your queries did not work, but then I realized that mine did not either. Reason: The data in table #test is flawed.

SELECT A.id, b.id, A.col2, B.col2, a.col3, B.col3 FROM #test A, #test b

WHERE A.col2 > b.col2 AND a.col3 < B.col3

No matter how you sort, either col2 or col3 will remain unsorted (could apply to other combinations as well).

Best Regards,

Chris Büttner