August 26, 2010 at 6:35 am
Hi SSCians, I have a weird requirement from a client. Although I have figured out a way to do it, I would like to know whether it can be done faster.
I have simplified the problem and removed the other factors at play. This exercise has to be done for around half a million rows, and the output would have around 500 rows.
-Record table has recordid and recordvalue
-Recordid could have multiple recordvalues
-Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)
-The csv (col2) has to be sorted within and the output should be sorted based on col2
-Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.
--================================================
--Problem:
--0. Record table has recordid and recordvalue
--1. Recordid could have multiple recordvalues
--2. Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)
--3. The csv (col2) has to be sorted within and the output should be sorted based on col2
--4. Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.
--================================================
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
select * from #test
--================================================
if object_id('tempdb..#testresult') is not null drop table #testresult
create table #testresult(recordid int, recordvalue varchar(50), rownum int identity(1,1))
--================================================
--This is where I do the churning. Is there a better way to do this?
insert into #testresult
select recordid
,stuff((select ', ' + recordvalue from #test where recordid = t.recordid order by recordvalue for xml path('')),1,2,'') [value]
from #test t
group by recordid
order by [value], recordid
select * from #testresult
--================================================
drop table #test
drop table #testresult
--================================================
Thanks in advance.
-- arjun
https://sqlroadie.com/
August 26, 2010 at 6:42 am
Arjun
I think I'd write a simple SSIS package. There are tasks in SSIS that are specifically designed to manipulate text files.
John
August 26, 2010 at 6:53 am
John, I am not reading from text files, and SSIS is out of scope. This is for a sort of search engine, and all the data is in the database.
- arjun
https://sqlroadie.com/
August 26, 2010 at 8:58 pm
I tend to use FOR XML to concatenate values together. In some cases, you can also use the quirky update.
Here is the FOR XML version.
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
WITH Records AS (
SELECT DISTINCT
RecordID
, ( SELECT
CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END
+ RecordValue
FROM #Test as sub
WHERE sub.RecordID = main.RecordID
ORDER BY RecordValue
FOR XML PATH('')
) AS RecordValues
FROM #Test AS main
)
SELECT RecordID, RecordValues, ROW_NUMBER() OVER(ORDER BY RecordValues) AS RowNum
FROM Records
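For completeness, the variable-concatenation alternative (often lumped in with the "quirky update" family) looks roughly like this. It relies on undocumented behavior, since SQL Server does not guarantee that ORDER BY is honored during multi-row variable assignment, so treat it as an illustrative sketch rather than a safe implementation:

```sql
-- Sketch: concatenate one recordid's values via variable assignment.
-- Ordering is NOT guaranteed by the product; shown for comparison only.
DECLARE @csv varchar(50);
SET @csv = '';
SELECT @csv = @csv
            + CASE WHEN @csv = '' THEN '' ELSE ', ' END
            + recordvalue
FROM #test
WHERE recordid = 1
ORDER BY recordvalue;
SELECT @csv AS RecordValues;  -- typically yields 'a4, b6, c1' for recordid 1
```

It also only handles one recordid per pass, which is why FOR XML PATH is usually the better fit for set-based concatenation.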
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 27, 2010 at 1:53 am
Thanks Drew. I will compare the execution times and post the results.
- arjun
https://sqlroadie.com/
August 27, 2010 at 8:29 am
Actually, I was thinking about this last night, and because of the number of records you may want to batch the processing. Here's one example. You would adjust the TOP PERCENT to process more or fewer records. You may want to SET ROWCOUNT instead of using a TOP PERCENT.
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
DECLARE @Test2 TABLE (
RecordID int PRIMARY KEY CLUSTERED
, RecordValues varchar(50) NULL
);
INSERT @Test2 (RecordID)
SELECT DISTINCT RecordID
FROM #test
WHILE @@ROWCOUNT > 0
UPDATE main
SET RecordValues = (
SELECT
CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END
+ RecordValue
FROM #Test as sub
WHERE sub.RecordID = main.RecordID
ORDER BY RecordValue
FOR XML PATH('')
)
FROM @Test2 AS main
WHERE RecordID IN (
SELECT TOP 20 PERCENT RecordID
FROM @Test2
WHERE RecordValues IS NULL
)
SELECT RecordID, RecordValues, ROW_NUMBER() OVER(ORDER BY RecordValues) AS RowNum
FROM @Test2
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 27, 2010 at 9:12 am
drew.allen (8/27/2010)
You may want to SET ROWCOUNT instead of using a TOP PERCENT.
SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.
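For instance, TOP (n) can go directly in an UPDATE, and the parentheses are required in DML statements. A minimal sketch with a hypothetical table and column:

```sql
-- Hypothetical batch: process up to 1000 unprocessed rows per statement,
-- replacing the deprecated SET ROWCOUNT pattern.
UPDATE TOP (1000) dbo.WorkQueue
SET Processed = 1
WHERE Processed = 0;
```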
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
August 27, 2010 at 12:34 pm
WayneS (8/27/2010)
drew.allen (8/27/2010)
You may want to SET ROWCOUNT instead of using a TOP PERCENT.SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.
I only suggested SET ROWCOUNT because I didn't want to use a subquery to specify the TOP clause, and I got an error message when I put the TOP clause in the main UPDATE query. It turns out that the parentheses are required in the UPDATE clause, and I had left them off.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 30, 2010 at 1:38 am
Drew, thanks for the input. If you check my initial post, I had also used FOR XML to do the concatenation, so there isn't much difference between your query and mine, except for the ROW_NUMBER part.
I can't use batch processing because I need the entire set of recordids in one go. This is only a small part of the query, and I have to do some other operations with this result. Anyway, nice to hear from you.
Thanks Wayne. Point noted.
- arjun
https://sqlroadie.com/
August 30, 2010 at 3:22 am
Limit the number of RecordIDs that the correlated subquery runs for.
As it stands, you run the subquery first, for every RecordID, and group later.
INSERT  #TestResult
SELECT  t.RecordID,
        STUFF(f.Value, 1, 2, '') AS Value
FROM    (
        SELECT  RecordID
        FROM    #Test
        GROUP BY RecordID
        ) AS t
CROSS APPLY (
        SELECT  ', ' + x.RecordValue
        FROM    #Test AS x
        WHERE   x.RecordID = t.RecordID
        ORDER BY ', ' + x.RecordValue
        FOR XML PATH('')
        ) AS f(Value)
N 56°04'39.16"
E 12°55'05.25"
August 30, 2010 at 11:17 am
Arjun Sivadasan (8/30/2010)
I can't use batch processing as I need the entire set of recordids at one go. This is only a small part of the query and I have do some other operations with this result. Anyway, nice to hear from you.
This doesn't prevent you from using batch processing, it just limits the scope of the batch. I was suggesting using batches in this one particular step of the overall process, not running the whole process on small batches.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 31, 2010 at 3:43 am
OK Drew, somehow I don't think I can do that. I will explain the simplified logic of the query.
There are records, and records consist of fields. Each record will have different values for the fields.
User can pick a field from a field-list and perform a search.
The logic of the query is:
1. Find the correct version of records based on date and other parameters.
2. Sort the recordids based on the field selected by the user. --This is the part I had posted.
3. Generate a row number according to the sort.
4. Based on the page accessed by the user, filter for the row number range.
5. Get all fields and other info related to the filtered records.
So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.
If my understanding is wrong, please correct me; but I think I cannot limit the scope of the batch in step 2.
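Steps 3 and 4 above can be sketched as follows, assuming hypothetical @PageNumber and @PageSize parameters and the #testresult table from the first post:

```sql
-- Sketch: number the fully sorted set, then filter to one page.
-- The ROW_NUMBER runs over the whole set, so numbering stays stable
-- across pages; only the final filter narrows the output.
DECLARE @PageNumber int = 2, @PageSize int = 50;

WITH numbered AS (
    SELECT recordid, recordvalue,
           ROW_NUMBER() OVER (ORDER BY recordvalue, recordid) AS rownum
    FROM #testresult
)
SELECT recordid, recordvalue, rownum
FROM numbered
WHERE rownum BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize;
```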
- arjun
https://sqlroadie.com/
August 31, 2010 at 6:18 am
Arjun Sivadasan (8/31/2010)
So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.
Look back at the post where I suggested using batches and you will see that I have already addressed this concern. The WHILE loop batches the records to process your second step, and the last SELECT statement includes all of the records to get the row numbers for your third step. So it is indeed possible to use batches in one step and then use the entire set in a subsequent step.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 31, 2010 at 6:34 am
Yes, I get the point now. Sorry for acting like a dumbo. I will try it out and post the results. Thanks for being patient Drew.
- arjun
https://sqlroadie.com/
August 31, 2010 at 6:36 am
Hi SwePeso, I will try the query out and let you know if I am able to get better performance out of it. Thanks for the reply.
- arjun
https://sqlroadie.com/