May 30, 2014 at 2:13 pm
Hi,
Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.
Count(*) with all three UNIONs
11011
Count(*) with two UNIONs
11012
Count(*) with one UNION
36485
Thanks in advance for any insight
May 30, 2014 at 2:19 pm
sqldriver (5/30/2014)
Hi,Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.
Count(*) with all three UNIONs
11011
Count(*) with two UNIONs
11012
Count(*) with one UNION
36485
Thanks in advance for any insight
Can you post the actual code? Can't what you are talking about from this.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 30, 2014 at 2:24 pm
Sean Lange (5/30/2014)
sqldriver (5/30/2014)
Hi,Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.
Count(*) with all three UNIONs
11011
Count(*) with two UNIONs
11012
Count(*) with one UNION
36485
Thanks in advance for any insight
Can you post the actual code? Can't what you are talking about from this.
Here it is.
Again, this is legacy code that I'm trying to re-write. I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.
insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)
select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db..tbl_collections c
join tab_master_db..tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1
union
select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db.dbo.tbl_collections c
join tab_master_db.dbo.tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1
union
select dbo.udf_ParseString(c.img_output_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db.dbo.tbl_collections c
join tab_master_db.dbo.tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.img_output_dir,'\',4)) = 1
May 30, 2014 at 2:32 pm
sqldriver (5/30/2014)
I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.
LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:
Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 30, 2014 at 2:55 pm
Sean Lange (5/30/2014)
sqldriver (5/30/2014)
I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.
LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:
Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.
Oh, how'd you know? 🙂
WHILE @CNT <= @Occurrence and @Pos <> 0
BEGIN
SET @Pos = patindex(@Delimiter, @InputString)
SET @CurString = left(@InputString,@Pos-1)
SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)
SET @cnt = @cnt + 1
END
I'm going to Moden-ize the splitter as part of this re-write.
But I'm still curious why the counts go down with each UNION added to the insert.
Thanks
May 30, 2014 at 3:04 pm
sqldriver (5/30/2014)
Sean Lange (5/30/2014)
sqldriver (5/30/2014)
I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.
LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:
Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.
Oh, how'd you know? 🙂
WHILE @CNT <= @Occurrence and @Pos <> 0
BEGIN
SET @Pos = patindex(@Delimiter, @InputString)
SET @CurString = left(@InputString,@Pos-1)
SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)
SET @cnt = @cnt + 1
END
I'm going to Moden-ize the splitter as part of this re-write.
But I'm still curious why the counts go down with each UNION added to the insert.
Thanks
I assume you mean that if you run each of the three select statements they return a different number of rows? That is because there is something in the data that is causing that. It doesn't have anything to do with the UNION, it is all about the data in those tables.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 30, 2014 at 3:28 pm
Sean Lange (5/30/2014)
sqldriver (5/30/2014)
Sean Lange (5/30/2014)
sqldriver (5/30/2014)
I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.
LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:
Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.
Oh, how'd you know? 🙂
WHILE @CNT <= @Occurrence and @Pos <> 0
BEGIN
SET @Pos = patindex(@Delimiter, @InputString)
SET @CurString = left(@InputString,@Pos-1)
SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)
SET @cnt = @cnt + 1
END
I'm going to Moden-ize the splitter as part of this re-write.
But I'm still curious why the counts go down with each UNION added to the insert.
Thanks
I assume you mean that if you run each of the three select statements they return a different number of rows? That is because there is something in the data that is causing that. It doesn't have anything to do with the UNION, it is all about the data in those tables.
Sort of.
When I run this:
insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)
select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db..tbl_collections c
join tab_master_db..tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1
36485 rows get inserted to tbl_
When I run this:
insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)
select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db..tbl_collections c
join tab_master_db..tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1
union
select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db.dbo.tbl_collections c
join tab_master_db.dbo.tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1
11012 get inserted to tbl_
And finally, when I run this:
insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)
select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db..tbl_collections c
join tab_master_db..tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1
union
select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db.dbo.tbl_collections c
join tab_master_db.dbo.tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1
union
select dbo.udf_ParseString(c.img_output_dir,'\',4) as JobOrderNumber
, j.JobFolder
, j.DatabaseName
, j.DedupAcrossMultipleDB
, j.jobid
, m.endtime
from tab_master_db.dbo.tbl_collections c
join tab_master_db.dbo.tbl_jobs j
on c.jobid = j.jobid
left outer join #Metrics m
on c.jobid = m.jobid
where isnumeric(dbo.udf_ParseString(c.img_output_dir,'\',4)) = 1
11011 get inserted to tbl_
It's confusing the heck out of me. :blush:
May 30, 2014 at 4:54 pm
It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
May 30, 2014 at 5:59 pm
mister.magoo (5/30/2014)
It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.
:angry:
May 30, 2014 at 6:24 pm
sqldriver (5/30/2014)
mister.magoo (5/30/2014)
It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.:angry:
:unsure: :angry:
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply