August 26, 2010 at 6:35 am
Hi SSCians, I have a weird requirement from a client. Although I have figured out a way to do it, I would like to know whether it can be done faster.
I have simplified the problem and removed the other factors at play. This exercise has to be done for around half a million rows, and the output would have around 500 rows.
-Record table has recordid and recordvalue
-Recordid could have multiple recordvalues
-Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)
-The csv (col2) has to be sorted within and the output should be sorted based on col2
-Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.
--================================================
--Problem:
--0. Record table has recordid and recordvalue
--1. Recordid could have multiple recordvalues
--2. Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)
--3. The csv (col2) has to be sorted within and the output should be sorted based on col2
--4. Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.
--================================================
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
select * from #test
--================================================
if object_id('tempdb..#testresult') is not null drop table #testresult
create table #testresult(recordid int, recordvalue varchar(50), rownum int identity(1,1))
--================================================
--This is where I do the churning. Is there a better way to do this?
insert into #testresult
select recordid
,stuff((select ', ' + recordvalue from #test where recordid = t.recordid order by recordvalue for xml path('')),1,2,'') [value]
from #test t
group by recordid
order by [value], recordid
select * from #testresult
--================================================
drop table #test
drop table #testresult
--================================================
Thanks in advance.
-- arjun
https://sqlroadie.com/
August 26, 2010 at 6:42 am
Arjun
I think I'd write a simple SSIS package. There are tasks in SSIS that are specifically designed to manipulate text files.
John
August 26, 2010 at 6:53 am
John, I am not reading from text files, and SSIS is out of scope. This is for a sort of search engine, and all the data is in the database.
- arjun
https://sqlroadie.com/
August 26, 2010 at 8:58 pm
I tend to use FOR XML to concatenate values together. In some cases, you can also use the quirky update.
Here is the FOR XML version.
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
WITH Records AS (
SELECT DISTINCT
RecordID
, ( SELECT
CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END
+ RecordValue
FROM #Test as sub
WHERE sub.RecordID = main.RecordID
ORDER BY RecordValue
FOR XML PATH('')
) AS RecordValues
FROM #Test AS main
)
SELECT RecordID, RecordValues, ROW_NUMBER() OVER(ORDER BY RecordValues) AS RowNum
FROM Records
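For completeness, the variable-concatenation alternative (often lumped in with the "quirky update" family) looks roughly like this. It relies on undocumented behavior, since SQL Server does not guarantee that ORDER BY is honored during multi-row variable assignment, so treat it as an illustrative sketch rather than a safe implementation:

```sql
-- Sketch: concatenate one recordid's values via variable assignment.
-- Ordering is NOT guaranteed by the product; shown for comparison only.
DECLARE @csv varchar(50);
SET @csv = '';
SELECT @csv = @csv
            + CASE WHEN @csv = '' THEN '' ELSE ', ' END
            + recordvalue
FROM #test
WHERE recordid = 1
ORDER BY recordvalue;
SELECT @csv AS RecordValues;  -- typically yields 'a4, b6, c1' for recordid 1
```

It also only handles one recordid per pass, which is why FOR XML PATH is usually the better fit for set-based concatenation.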
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 27, 2010 at 1:53 am
Thanks Drew. I will compare the execution times and post the results.
- arjun
https://sqlroadie.com/
August 27, 2010 at 8:29 am
Actually, I was thinking about this last night, and because of the number of records you may want to batch the processing. Here's one example. You would adjust the TOP PERCENT to process more or fewer records. You may want to SET ROWCOUNT instead of using a TOP PERCENT.
if object_id('tempdb..#test') is not null drop table #test
create table #test(recordid int, recordvalue char(2))
--================================================
insert into #test
select 1, 'b6'
union
select 1, 'a4'
union
select 1, 'c1'
union
select 2, 'a2'
union
select 2, 'b7'
union
select 3, 'a1'
union
select 3, 'b3'
DECLARE @Test2 TABLE (
RecordID int PRIMARY KEY CLUSTERED
, RecordValues varchar(50) NULL
);
INSERT @Test2 (RecordID)
SELECT DISTINCT RecordID
FROM #test
WHILE @@ROWCOUNT > 0
UPDATE main
SET RecordValues = (
SELECT
CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END
+ RecordValue
FROM #Test as sub
WHERE sub.RecordID = main.RecordID
ORDER BY RecordValue
FOR XML PATH('')
)
FROM @Test2 AS main
WHERE RecordID IN (
SELECT TOP 20 PERCENT RecordID
FROM @Test2
WHERE RecordValues IS NULL
)
SELECT RecordID, RecordValues, ROW_NUMBER() OVER(ORDER BY RecordValues) AS RowNum
FROM @Test2
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 27, 2010 at 9:12 am
drew.allen (8/27/2010)
You may want to SET ROWCOUNT instead of using a TOP PERCENT.
SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.
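For instance, TOP (n) can go directly in an UPDATE, and the parentheses are required in DML statements. A minimal sketch with a hypothetical table and column:

```sql
-- Hypothetical batch: process up to 1000 unprocessed rows per statement,
-- replacing the deprecated SET ROWCOUNT pattern.
UPDATE TOP (1000) dbo.WorkQueue
SET Processed = 1
WHERE Processed = 0;
```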
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
August 27, 2010 at 12:34 pm
WayneS (8/27/2010)
drew.allen (8/27/2010)
You may want to SET ROWCOUNT instead of using a TOP PERCENT.SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.
I only suggested SET ROWCOUNT because I didn't want to use a subquery to specify the TOP clause, and I got an error message when I put the TOP clause in the main UPDATE query. It turns out that the parentheses are required in the UPDATE clause, and I had left them off.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 30, 2010 at 1:38 am
Drew, thanks for the input. If you check my initial post, I had also used FOR XML to do the concatenation, so there isn't much difference between your query and mine, except for the ROW_NUMBER part.
I can't use batch processing because I need the entire set of recordids in one go. This is only a small part of the query, and I have to do some other operations with this result. Anyway, nice to hear from you.
Thanks Wayne. Point noted.
- arjun
https://sqlroadie.com/
August 30, 2010 at 3:22 am
Limit the number of RecordIDs that the correlated subquery runs for.
As it stands, you run the subquery first, for every RecordID, and group later.
INSERT  #TestResult
SELECT  t.RecordID,
        STUFF(f.Value, 1, 2, '') AS Value
FROM    (
        SELECT  RecordID
        FROM    #Test
        GROUP BY RecordID
        ) AS t
CROSS APPLY (
        SELECT  ', ' + x.RecordValue
        FROM    #Test AS x
        WHERE   x.RecordID = t.RecordID
        ORDER BY ', ' + x.RecordValue
        FOR XML PATH('')
        ) AS f(Value)
N 56°04'39.16"
E 12°55'05.25"
August 30, 2010 at 11:17 am
Arjun Sivadasan (8/30/2010)
I can't use batch processing as I need the entire set of recordids at one go. This is only a small part of the query and I have do some other operations with this result. Anyway, nice to hear from you.
This doesn't prevent you from using batch processing, it just limits the scope of the batch. I was suggesting using batches in this one particular step of the overall process, not running the whole process on small batches.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 31, 2010 at 3:43 am
OK Drew, somehow I don't think I can do that. I will explain the simplified logic of the query.
There are records, and records consist of fields. Each record will have different values for the fields.
User can pick a field from a field-list and perform a search.
The logic of the query is:
1. Find the correct version of records based on date and other parameters.
2. Sort the recordids based on the field selected by the user. --This is the part I had posted.
3. Generate a row number according to the sort.
4. Based on the page accessed by the user, filter for the row number range.
5. Get all fields and other info related to the filtered records.
So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.
If my understanding is wrong, please correct me; but I think I cannot limit the scope of the batch in step 2.
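Steps 3 and 4 above can be sketched as follows, assuming hypothetical @PageNumber and @PageSize parameters and the #testresult table from the first post:

```sql
-- Sketch: number the fully sorted set, then filter to one page.
-- The ROW_NUMBER runs over the whole set, so numbering stays stable
-- across pages; only the final filter narrows the output.
DECLARE @PageNumber int = 2, @PageSize int = 50;

WITH numbered AS (
    SELECT recordid, recordvalue,
           ROW_NUMBER() OVER (ORDER BY recordvalue, recordid) AS rownum
    FROM #testresult
)
SELECT recordid, recordvalue, rownum
FROM numbered
WHERE rownum BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize;
```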
- arjun
https://sqlroadie.com/
August 31, 2010 at 6:18 am
Arjun Sivadasan (8/31/2010)
So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.
Look back at the post where I suggested using batches and you will see that I have already addressed this concern. The WHILE loop batches the records to process your second step, and the last SELECT statement includes all of the records to get the row numbers for your third step. So it is indeed possible to use batches in one step and then use the entire set in a subsequent step.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 31, 2010 at 6:34 am
Yes, I get the point now. Sorry for acting like a dumbo. I will try it out and post the results. Thanks for being patient Drew.
- arjun
https://sqlroadie.com/
August 31, 2010 at 6:36 am
Hi SwePeso, I will try the query out and let you know if I am able to get better performance out of it. Thanks for the reply.
- arjun
https://sqlroadie.com/