May 24, 2009 at 5:36 am
Hi gang,
I'm currently running the following commands, and they work. But I suspect there's a better way that avoids the DISTINCT and avoids running the subquery for each record where it isn't necessary. As it is now, the subquery runs for every record, and then the DISTINCT pulls out the unique occurrences. I think this can be accomplished by avoiding the DISTINCT, but I just haven't been able to get a grip on how.
The desired output is what this code produces: a distinct list of Values, each with a comma-separated list of all the Codes for that Value.
On a related note... is there a way to get the CSV list without the leading comma as part of the subquery? Right now I'm having to come back after it's been run and remove it.
declare @temp TABLE (
Value int,
Code numeric(5,3))
insert into @temp
select 1, 1.725 UNION
select 1, 1.818 UNION
select 2, 2.110 UNION
select 2, 2.575 UNION
select 3, 8.257 UNION
select 3, 9.258
declare @t2 table (Value int, Code varchar(50))
insert into @t2
select distinct
Value,
Code = (select ', ' + convert(varchar(6), Code) from @temp where Value = t.Value FOR XML PATH(''))
from @temp t
update @t2 set Code = stuff(Code, 1, 2, '')
select * from @t2
Thanks,
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 24, 2009 at 8:57 am
Hi Wayne,
Does your table have to be a variable? I had a go with a bigger data set and got really bad results on it; it got a lot quicker once I added an index and a GROUP BY.
--Jeff Moden's test table
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== Add the Primary Key
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
-- equivalent of your query on the large test data - takes _forever_ to run! 16 minutes and counting!
select distinct
SomeLetters2,
Code = stuff((select ', ' + convert(varchar(6), SomeMoney) from JBMTest where SomeLetters2 = t.SomeLetters2 FOR XML PATH('')), 1, 2, '')
from JBMTest t
-- add an index to SomeLetters2 column
CREATE NONCLUSTERED INDEX [IX_SomeLetters] ON [dbo].[JBMTest]
(
[SomeLetters2] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
--GROUP BY on the indexed column runs pretty quickly, c. 10 seconds
SELECT
SomeLetters2,
Code = stuff((select ', ' + convert(varchar(6), SomeMoney) from JBMTest WHERE SomeLetters2 = t.SomeLetters2 FOR XML PATH('')), 1, 2, '')
FROM JBMTest t
GROUP BY someletters2
May 24, 2009 at 10:26 am
Heh... After about half an hour of testing I figured it out. The answer is so simple that I didn't think of it.
@Allister: Thanks for the test data!
I took Allister's test data and executed the query (which is just like yours, Wayne). I stopped the execution after about 5 minutes.
I tried a quirky update solution, which didn't work, and a CLR version, which took about 5 seconds (great!).
After that I took a second look at the initial query. The major problem was just the DISTINCT, which is applied to the final result and causes the sub-query to be executed 1,000,000 times. Just remove the DISTINCT and use a GROUP BY and it takes about 7-8 seconds on my system.
[font="Courier New"]DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
GO
---============= FOR XML =======================
IF (OBJECT_ID('tempdb..#ResultsXML') IS NOT NULL)
DROP TABLE #ResultsXML
CREATE TABLE #ResultsXML (Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED, SomeLetters VARCHAR(100), Code VARCHAR(MAX))
-- GROUP BY version of the query on the large test data - takes about 7-8 seconds
INSERT INTO #ResultsXML
SELECT
SomeLetters2,
Code = STUFF(
(
SELECT ', ' + CONVERT(VARCHAR(6), SomeMoney)
FROM JBMTest
WHERE SomeLetters2 = t.SomeLetters2
FOR XML PATH('')
),
1, 2, '')
FROM JBMTest t
GROUP BY SomeLetters2
Greets
Flo
May 24, 2009 at 10:53 am
Sorry Flo,
shouldn't have left the distinct query in the middle of my solution!
For reference, it finally finished running after 23:53... the indexed GROUP BY took 00:10.
The CLR route sounds promising; I really must make some time to research this area, as most of my time is spent on C# development.
Allister
May 24, 2009 at 11:16 am
Allister Reid (5/24/2009)
Sorry Flo, shouldn't have left the distinct query in the middle of my solution!
For reference, it finally finished running after 23:53... the indexed GROUP BY took 00:10.
The CLR route sounds promising; I really must make some time to research this area, as most of my time is spent on C# development.
Allister
Hi Allister
Nothing to apologize for! The distinct query reflected Wayne's approach, and I didn't have to write it since you did :-)
Nice to hear that you are interested in SQL CLR. CLR is cool for some things in SQL Server, but most of the time T-SQL is the better way for the usual data tasks.
Greets
Flo
May 24, 2009 at 11:26 am
heh-he... and I wrote the group by solution in me first post as well as the distinct :-)
Have been thinking about dynamic custom data types via CLR for user defined forms, I take it you have to implement access functions for each type... maybe you can point me in the direction of any decent articles/walkthroughs?
May 24, 2009 at 11:36 am
Allister Reid (5/24/2009)
heh-he... and I wrote the group by solution in me first post as well as the distinct :-)
:w00t:
I didn't see the group by. Really. So I have to apologize!
Have been thinking about dynamic custom data types via CLR for user defined forms, I take it you have to implement access functions for each type... maybe you can point me in the direction of any decent articles/walkthroughs?
I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)
I don't have any very good articles/walkthroughs for SQL CLR. I'm primarily a C# developer, so I just had a look at BOL and its (not very good) samples. I never read a book about SQL CLR.
Flo
May 24, 2009 at 11:56 am
Florian Reischl (5/24/2009)
Allister Reid (5/24/2009)
heh-he... and I wrote the group by solution in me first post as well as the distinct :-)
:w00t:
I didn't see the group by. Really. So I have to apologize!
No worries... easy to miss if it takes forever for the script to actually reach this point! I can't take any credit for it anyway, coz I got it from another post (actually about 4 or 5 other posts) here-abouts.
Have been thinking about dynamic custom data types via CLR for user defined forms, I take it you have to implement access functions for each type... maybe you can point me in the direction of any decent articles/walkthroughs?
I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)
Seriously, don't worry about it, I don't entirely understand it myself yet, and it's pretty off topic! But I've kinda been going in the direction of having a system where application administrators can define forms, which are serialised and saved to the database, and then application users fill out specific instances of these forms with specific details.
You could think of the system as a configurable questionnaire, where application admins define the structure of the forms and application users create specific instances of the forms by filling them out; obviously these instances would be saved to the database for future collation/processing/querying. It is these custom objects that I was considering as candidates for creating as custom types. The reason being that I thought this might get round the problem of having un-queryable serialised data in the database.
Allister
May 24, 2009 at 12:25 pm
Allister Reid (5/24/2009)
Have been thinking about dynamic custom data types via CLR for user defined forms, I take it you have to implement access functions for each type... maybe you can point me in the direction of any decent articles/walkthroughs?
I'm not sure if I understood you here. (Er.. I'm sure I didn't :-D)
Seriously, don't worry about it, I don't entirely understand it myself yet, and it's pretty off topic! But I've kinda been going in the direction of having a system where application administrators can define forms, which are serialised and saved to the database, and then application users fill out specific instances of these forms with specific details.
You could think of the system as a configurable questionnaire, where application admins define the structure of the forms and application users create specific instances of the forms by filling them out; obviously these instances would be saved to the database for future collation/processing/querying. It is these custom objects that I was considering as candidates for creating as custom types. The reason being that I thought this might get round the problem of having un-queryable serialised data in the database.
Sounds like MS InfoPath ;-).
It depends on the type of serialization you use. Avoid binary, but you can use the XML data type. You can use one table which contains the plain XML to ensure the availability of the original data. In addition, use an EAV design (if you have to be able to support any kind of user data) and get the important information into usual database columns. That way you can use plain SQL to query the main information. For really special things you can query the XML data.
For the front-end you can either use a 3rd party tool like Altova Authentic, or use XSLT which dynamically transforms your templates into an HTML layout. A submit button can call some JavaScript to get all the entered user data into an XML format and call a web-service to store the data.
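Just to make the idea a bit more concrete, here is a rough sketch (the table, column and attribute names below are only placeholders, not a finished design):
-- One table keeps the submitted form as plain XML, so the original data is always available.
CREATE TABLE dbo.FormInstance (
    FormInstanceId INT IDENTITY(1,1) PRIMARY KEY,
    FormXml        XML NOT NULL
);
-- An EAV table holds the important fields shredded into normal columns for plain SQL queries.
CREATE TABLE dbo.FormInstanceValue (
    FormInstanceId INT           NOT NULL REFERENCES dbo.FormInstance (FormInstanceId),
    AttributeName  VARCHAR(100)  NOT NULL,
    AttributeValue VARCHAR(4000) NULL,
    PRIMARY KEY (FormInstanceId, AttributeName)
);
-- The main information can be queried with usual SQL...
SELECT FormInstanceId, AttributeValue
FROM dbo.FormInstanceValue
WHERE AttributeName = 'CustomerName';
-- ...and the really special things can still be pulled out of the XML.
SELECT FormInstanceId,
       FormXml.value('(/form/question[@id="q42"]/answer)[1]', 'VARCHAR(100)') AS Answer
FROM dbo.FormInstance;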
Maybe consider opening a new thread for this discussion and post a link here :-)
Flo
May 24, 2009 at 8:46 pm
Allister Reid (5/24/2009)
Hi Wayne, does your table have to be a variable? I had a go with a bigger data set and got really bad results on it; it got a lot quicker once I added an index and a GROUP BY.
No, and in fact it is only a table variable in this post to make it easy for y'all to test against. I'm actually using a temp table with indexes myself.
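Roughly what I mean, as a sketch (same columns as the table variable above; the real table and index names are different):
CREATE TABLE #temp (
    Value int,
    Code numeric(5,3)
);
-- index on the column the correlated subquery filters on
CREATE NONCLUSTERED INDEX IX_temp_Value ON #temp (Value);
-- the same FOR XML PATH query from my first post then runs against #temp instead of @temp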
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 24, 2009 at 9:00 pm
Florian Reischl (5/24/2009)
Heh... After about half an hour of testing I figured it out. The answer is so simple that I didn't think of it. @Allister: Thanks for the test data!
I took Allister's test data and executed the query (which is just like yours, Wayne). I stopped the execution after about 5 minutes.
I tried a quirky update solution, which didn't work, and a CLR version, which took about 5 seconds (great!).
After that I took a second look at the initial query. The major problem was just the DISTINCT, which is applied to the final result and causes the sub-query to be executed 1,000,000 times. Just remove the DISTINCT and use a GROUP BY and it takes about 7-8 seconds on my system.
Flo & Allister - I want to thank you for helping out with this. I could swear that I had tried the group by... I know that I had at least thought about it. And I had tried out the STUFF function in the first select without the additional set of () around it. Geez, it really is this "simple" stuff that trips one up sometimes. And your testing pretty much confirmed what I thought about using the distinct in the first place... there would be some performance problems when scaling up (though I was pretty happy with what I was getting). Plus, I was kinda concerned about being the victim of Gail's double-edged ninja sword. :-)
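For anyone who trips over the same thing: the subquery has to be wrapped in its own set of parentheses before it goes into STUFF, since STUFF expects a single expression as its first argument. A sketch against the @temp table variable from my first post:
SELECT
    Value,
    -- note the extra () around the whole subquery, inside the STUFF() call
    Code = STUFF((SELECT ', ' + CONVERT(VARCHAR(6), Code)
                  FROM @temp
                  WHERE Value = t.Value
                  FOR XML PATH('')), 1, 2, '')
FROM @temp t
GROUP BY Value;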
Again, thanks!!!!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 25, 2009 at 4:53 am
Hi Wayne
Glad that we could help!
Flo