I suspect that there's a better way to minimize IO, but ...

  • Hi gang,

    I'm currently running the following commands, and they work. But I suspect there's a better way, one that avoids the DISTINCT and doesn't run the subquery for each record where it's not necessary. As it stands, the subquery runs for every record, and then the DISTINCT extracts the unique occurrences. I think this can be accomplished by avoiding the DISTINCT, but I just haven't been able to get a grip on how to do it.

    The desired output is as this code produces - a distinct list of Values, with a comma-separated list of all Codes per Value.

    On a related note... is there a way to get the CSV list without the leading comma as part of the subquery? Right now, I have to come back after it's run and remove it. (A sketch of one option follows the code below.)

    declare @temp TABLE (
        Value int,
        Code numeric(5,3))

    insert into @temp
    select 1, 1.725 UNION
    select 1, 1.818 UNION
    select 2, 2.110 UNION
    select 2, 2.575 UNION
    select 3, 8.257 UNION
    select 3, 9.258

    declare @t2 table (Value int, Code varchar(50))

    insert into @t2
    select distinct
        Value,
        Code = (select ', ' + convert(varchar(6), Code) from @temp where Value = t.Value FOR XML PATH(''))
    from @temp t

    update @t2 set Code = stuff(Code, 1, 2, '')

    select * from @t2
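
    For the leading comma: one option (the replies below arrive at the same pattern) is to wrap the parenthesised subquery in STUFF, so the ', ' is removed in the same statement and the follow-up UPDATE isn't needed. A minimal sketch against the same @temp data:

    -- Sketch: same output as above, but STUFF strips the leading ', '
    -- inside the SELECT itself, so no separate UPDATE is required
    select distinct
        Value,
        Code = stuff((select ', ' + convert(varchar(6), Code)
                      from @temp
                      where Value = t.Value
                      FOR XML PATH('')), 1, 2, '')
    from @temp t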

    Thanks,

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi Wayne,

    Does your table have to be a variable? I had a go with a bigger data set and got really bad results from it; it got a lot quicker once I added an index and a GROUP BY.

    --Jeff Moden's test table
    SELECT TOP 1000000
        RowNum = IDENTITY(INT,1,1),
        SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
        SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
        SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
        SomeHex12 = RIGHT(NEWID(),12)
    INTO dbo.JBMTest
    FROM Master.dbo.SysColumns t1
    CROSS JOIN Master.dbo.SysColumns t2

    --===== Add the Primary Key
    ALTER TABLE dbo.JBMTest
        ADD PRIMARY KEY CLUSTERED (RowNum)

    -- equivalent of your query on the large test data – takes _forever_ to run! 16 minutes and counting!
    select distinct
        SomeLetters2,
        Code = stuff((select ', ' + convert(varchar(6), SomeMoney) from JBMTest where SomeLetters2 = t.SomeLetters2 FOR XML PATH('')), 1, 2, '')
    from JBMTest t

    -- add an index to the SomeLetters2 column
    CREATE NONCLUSTERED INDEX [IX_SomeLetters] ON [dbo].[JBMTest]
    (
        [SomeLetters2] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    -- GROUP BY on the indexed column runs pretty quick: c. 10 seconds
    SELECT
        SomeLetters2,
        Code = stuff((select ', ' + convert(varchar(6), SomeMoney) from JBMTest WHERE SomeLetters2 = t.SomeLetters2 FOR XML PATH('')), 1, 2, '')
    FROM JBMTest t
    GROUP BY SomeLetters2

    Heh... after about half an hour of testing I figured it out. The answer is so simple that I just didn't think of it.

    @Allister: Thanks for the test data!

    I took Allister's test data and executed the query (which is just like yours, Wayne). I stopped the execution after about 5 minutes.

    I tried a quirky update solution, which doesn't work, and a CLR version, which took about 5 seconds (great!).

    After that I took a second look at the initial query. The major problem was just the DISTINCT, which operates on the final result and causes the sub-query to be executed 1,000,000 times. Just remove the DISTINCT and use a GROUP BY, and it takes about 7-8 seconds on my system.

    [font="Courier New"]DBCC FREEPROCCACHE WITH NO_INFOMSGS

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

    GO

    ---============= FOR XML =======================

    IF (OBJECT_ID('tempdb..#ResultsXML') IS NOT NULL)

       DROP TABLE #ResultsXML

    CREATE TABLE #ResultsXML (Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED, SomeLetters VARCHAR(100), Code VARCHAR(MAX))

    -- equivalent of your query on the large test data – takes _forever_ to run! 16 minutes and counting!

    INSERT INTO #ResultsXML

       SELECT

            SomeLetters2,

            Code = STUFF(

                      (

                         SELECT ', ' + CONVERT(VARCHAR(6), SomeMoney)

                         FROM JBMTest

                         WHERE SomeLetters2 = t.SomeLetters2

                         FOR XML PATH('')

                      ),

                      1, 2, '')

         FROM JBMTest t

         GROUP BY SomeLetters2

    [/font]

    Greets

    Flo

  • Sorry Flo,

    shouldn't have left the DISTINCT query in the middle of my solution!

    For reference, it finally finished running after 23:53... the indexed GROUP BY took 00:10.

    The CLR route sounds promising; I really must make some time to research this area, as a lot of my time is spent on C# development.

    Allister

  • Allister Reid (5/24/2009)


    Sorry Flo,

    shouldn't have left the DISTINCT query in the middle of my solution!

    For reference, it finally finished running after 23:53... the indexed GROUP BY took 00:10.

    The CLR route sounds promising; I really must make some time to research this area, as a lot of my time is spent on C# development.

    Allister

    Hi Allister

    Nothing to apologize for! The DISTINCT query reflected Wayne's approach, and I didn't have to write it since you did 😛

    Nice to hear that you're interested in SQL CLR. CLR is cool for some things in SQL Server, but most of the time T-SQL is the better way for the usual data tasks.
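
    For what it's worth, the T-SQL side of wiring up such a CLR concatenation aggregate looks roughly like this. It's only a sketch: the assembly name, class name, and file path are made up for illustration, and the C# aggregate body itself isn't shown here.

    -- Sketch: registering a hypothetical CLR string-concat aggregate
    -- (assembly name, class name, and path are illustrative only)
    CREATE ASSEMBLY ConcatAggregates
    FROM 'C:\clr\ConcatAggregates.dll'
    WITH PERMISSION_SET = SAFE
    GO

    CREATE AGGREGATE dbo.ConcatCsv (@value VARCHAR(100))
    RETURNS VARCHAR(MAX)
    EXTERNAL NAME ConcatAggregates.[ConcatCsv]
    GO

    -- Usage against the test table: no correlated subquery at all
    SELECT SomeLetters2, dbo.ConcatCsv(CONVERT(VARCHAR(6), SomeMoney)) AS Code
    FROM dbo.JBMTest
    GROUP BY SomeLetters2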

    Greets

    Flo

    Heh-heh... and I wrote the GROUP BY solution in my first post as well as the DISTINCT 😉

    I've been thinking about dynamic custom data types via CLR for user-defined forms; I take it you have to implement access functions for each type... maybe you can point me in the direction of some decent articles/walkthroughs?

  • Allister Reid (5/24/2009)


    Heh-heh... and I wrote the GROUP BY solution in my first post as well as the DISTINCT 😉

    :w00t:

    I didn't see the GROUP BY. Really. So I have to apologize!

    I've been thinking about dynamic custom data types via CLR for user-defined forms; I take it you have to implement access functions for each type... maybe you can point me in the direction of some decent articles/walkthroughs?

    I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)

    I don't have any good articles/walkthroughs for SQL CLR. I'm primarily a C# developer, so I just had a look at BOL and its (not very good) samples. I've never read a book about SQL CLR.

    Flo

  • Florian Reischl (5/24/2009)


    Allister Reid (5/24/2009)


    Heh-heh... and I wrote the GROUP BY solution in my first post as well as the DISTINCT 😉

    :w00t:

    I didn't see the GROUP BY. Really. So I have to apologize!

    No worries... easy to miss when it takes forever for the script to actually reach that point! I can't take any credit for it anyway, because I got it from another post (actually about 4 or 5 other posts) hereabouts.

    I've been thinking about dynamic custom data types via CLR for user-defined forms; I take it you have to implement access functions for each type... maybe you can point me in the direction of some decent articles/walkthroughs?

    I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)

    Seriously, don't worry about it, I don't entirely understand it myself yet, and it's pretty off topic! But I've kinda been going in the direction of a system where application administrators can define forms, which are serialised and saved to the database; application users then fill out specific instances of these forms with specific details.

    You could think of the system as a configurable questionnaire, where application admins define the structure of the forms and application users create specific instances of the forms by filling them out; obviously these instances would be saved to the database for future collation/processing/querying. It is these custom objects that I was considering as candidates for custom types, the reason being that I thought this might get round the problem of having un-queryable serialised data in the database.

    Allister

  • Allister Reid (5/24/2009)


    I've been thinking about dynamic custom data types via CLR for user-defined forms; I take it you have to implement access functions for each type... maybe you can point me in the direction of some decent articles/walkthroughs?

    I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)

    Seriously, don't worry about it, I don't entirely understand it myself yet, and it's pretty off topic! But I've kinda been going in the direction of a system where application administrators can define forms, which are serialised and saved to the database; application users then fill out specific instances of these forms with specific details.

    You could think of the system as a configurable questionnaire, where application admins define the structure of the forms and application users create specific instances of the forms by filling them out; obviously these instances would be saved to the database for future collation/processing/querying. It is these custom objects that I was considering as candidates for custom types, the reason being that I thought this might get round the problem of having un-queryable serialised data in the database.

    Sounds like MS InfoPath ;-).

    It depends on the type of serialization you use. Avoid binary, but you can use the XML data type. You can use one table that contains the plain XML to ensure the availability of the original data. In addition, use an EAV design (if you have to support any kind of user data) and pull the important information into regular database columns, so you can use plain SQL to query the main information. For the really special things, you can query the XML data. A sketch of what I mean follows.
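
    To make that concrete, here is a minimal sketch of the XML + EAV idea. All table and column names (FormInstance, FormAttribute, and so on) are made up for illustration:

    -- Sketch: keep the original document as XML, promote key answers to an EAV table
    CREATE TABLE dbo.FormInstance (
        InstanceId INT IDENTITY PRIMARY KEY,
        FormName   VARCHAR(100) NOT NULL,
        FormData   XML NOT NULL            -- the full, original document
    )

    CREATE TABLE dbo.FormAttribute (
        InstanceId INT NOT NULL REFERENCES dbo.FormInstance (InstanceId),
        Attribute  VARCHAR(100) NOT NULL,
        Value      VARCHAR(500) NULL,
        PRIMARY KEY (InstanceId, Attribute)
    )

    INSERT INTO dbo.FormInstance (FormName, FormData)
    VALUES ('Questionnaire1', '<form><q id="Age">42</q><q id="City">Graz</q></form>')

    -- Promote the important answers into regular columns
    INSERT INTO dbo.FormAttribute (InstanceId, Attribute, Value)
    SELECT InstanceId, 'Age', '42'
    FROM dbo.FormInstance
    WHERE FormName = 'Questionnaire1'

    -- Main information via plain SQL against the EAV columns ...
    SELECT i.InstanceId, a.Value AS Age
    FROM dbo.FormInstance i
    JOIN dbo.FormAttribute a ON a.InstanceId = i.InstanceId AND a.Attribute = 'Age'

    -- ... and the really special things straight out of the XML
    SELECT InstanceId,
           FormData.value('(/form/q[@id="City"])[1]', 'VARCHAR(100)') AS City
    FROM dbo.FormInstance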

    For the front-end, you can either use a 3rd-party tool like Altova Authentic or use XSLT to dynamically transform your templates into an HTML layout. A submit button can call some JavaScript to gather all the entered user data into an XML format and call a web service to store the data.

    Maybe consider opening a new thread for this discussion and posting a link here 😉

    Flo

  • Allister Reid (5/24/2009)


    Hi Wayne,

    Does your table have to be a variable? I had a go with a bigger data set and got really bad results from it; it got a lot quicker once I added an index and a GROUP BY.

    No, and in fact it's only a table variable for this post, to make it easy for y'all to test against. I'm actually using a temp table with indexes myself.
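
    For reference, a minimal sketch of that temp-table version against the sample data from my first post (the table and index names here are just illustrative):

    -- Sketch: temp table instead of the table variable, with an index on Value
    CREATE TABLE #temp (Value int, Code numeric(5,3))

    INSERT INTO #temp
    SELECT 1, 1.725 UNION
    SELECT 1, 1.818 UNION
    SELECT 2, 2.110 UNION
    SELECT 2, 2.575 UNION
    SELECT 3, 8.257 UNION
    SELECT 3, 9.258

    CREATE NONCLUSTERED INDEX IX_temp_Value ON #temp (Value)

    -- GROUP BY + STUFF, as worked out above
    SELECT Value,
           Code = STUFF((SELECT ', ' + CONVERT(varchar(6), Code)
                         FROM #temp
                         WHERE Value = t.Value
                         FOR XML PATH('')), 1, 2, '')
    FROM #temp t
    GROUP BY Value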

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes



  • Florian Reischl (5/24/2009)


    Heh... after about half an hour of testing I figured it out. The answer is so simple that I just didn't think of it.

    @Allister: Thanks for the test data!

    I took Allister's test data and executed the query (which is just like yours, Wayne). I stopped the execution after about 5 minutes.

    I tried a quirky update solution, which doesn't work, and a CLR version, which took about 5 seconds (great!).

    After that I took a second look at the initial query. The major problem was just the DISTINCT, which operates on the final result and causes the sub-query to be executed 1,000,000 times. Just remove the DISTINCT and use a GROUP BY, and it takes about 7-8 seconds on my system.

    Flo & Allister - I want to thank you for helping out with this. I could swear that I had tried the group by... I know that I had at least thought about it. And I had tried out the stuff function in the first select without the additional set of () around it. Geez, it really is this "simple" stuff that trips one up sometimes. And your testing pretty much confirmed what I thought about using the distinct in the first place.. there would be some performance problems when scaling up (though I was pretty happy with what I was getting). Plus, I was kinda concerned about being the victim of Gail's double-edged ninja sword.:-)

    Again, thanks!!!!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes



  • Hi Wayne

    Glad that we could help!

    Flo
