UNION and INSERT

  • Hi,

    Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.

    Count(*) with all three UNIONs

    11011

    Count(*) with two UNIONs

    11012

    Count(*) with one UNION

    36485

    Thanks in advance for any insight

  • sqldriver (5/30/2014)


    Hi,

    Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.

    Count(*) with all three UNIONs

    11011

    Count(*) with two UNIONs

    11012

    Count(*) with one UNION

    36485

    Thanks in advance for any insight

    Can you post the actual code? Can't what you are talking about from this.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/30/2014)


    sqldriver (5/30/2014)


    Hi,

    Just came across some weird legacy code in my system where an INSERT is followed by three UNIONs of the exact same query. What's the purpose of that? Deduplication or something? I've never seen it before and I'm curious how it works.

    Count(*) with all three UNIONs

    11011

    Count(*) with two UNIONs

    11012

    Count(*) with one UNION

    36485

    Thanks in advance for any insight

    Can you post the actual code? Can't what you are talking about from this.

    Here it is.

    Again, this is legacy code that I'm trying to re-write. I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.

    insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)

    select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db..tbl_collections c

    join tab_master_db..tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1

    union

    select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db.dbo.tbl_collections c

    join tab_master_db.dbo.tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1

    union

    select dbo.udf_ParseString(c.img_output_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db.dbo.tbl_collections c

    join tab_master_db.dbo.tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.img_output_dir,'\',4)) = 1

  • sqldriver (5/30/2014)

    I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.

    LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:

    Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/30/2014)


    sqldriver (5/30/2014)

    I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.

    LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:

    Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.

    Oh, how'd you know? 🙂

    WHILE @CNT <= @Occurrence and @Pos <> 0

    BEGIN

    SET @Pos = patindex(@Delimiter, @InputString)

    SET @CurString = left(@InputString,@Pos-1)

    SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)

    SET @cnt = @cnt + 1

    END

    I'm going to Moden-ize the splitter as part of this re-write.

    But I'm still curious why the counts go down with each UNION added to the insert.

    Thanks

  • sqldriver (5/30/2014)


    Sean Lange (5/30/2014)


    sqldriver (5/30/2014)

    I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.

    LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:

    Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.

    Oh, how'd you know? 🙂

    WHILE @CNT <= @Occurrence and @Pos <> 0

    BEGIN

    SET @Pos = patindex(@Delimiter, @InputString)

    SET @CurString = left(@InputString,@Pos-1)

    SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)

    SET @cnt = @cnt + 1

    END

    I'm going to Moden-ize the splitter as part of this re-write.

    But I'm still curious why the counts go down with each UNION added to the insert.

    Thanks

    I assume you mean that if you run each of the three select statements they return a different number of rows? That is because there is something in the data that is causing that. It doesn't have anything to do with the UNION, it is all about the data in those tables.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (5/30/2014)


    sqldriver (5/30/2014)


    Sean Lange (5/30/2014)


    sqldriver (5/30/2014)

    I take no responsibility for repeatedly calling a function on an insert or using ISNUMERIC on a function like it's my birthday.

    LOL. Yes that is particularly horrible. I have a feeling the ParseString udf has a loop or xml in there too. :w00t:

    Obviously you are trying to rewrite to get rid of some of the garbage you inherited. However, those three queries are NOT the same thing. Look closely at the parameter to scalar udf, they are different in each of the 3 queries. Looks like a proper splitter and a tally table would make this parsing exercise work in less than a fraction of the time it does currently.

    Oh, how'd you know? 🙂

    WHILE @CNT <= @Occurrence and @Pos <> 0

    BEGIN

    SET @Pos = patindex(@Delimiter, @InputString)

    SET @CurString = left(@InputString,@Pos-1)

    SET @InputString = right(@InputString,len(@InputString)-len(@CurString)-1)

    SET @cnt = @cnt + 1

    END

    I'm going to Moden-ize the splitter as part of this re-write.

    But I'm still curious why the counts go down with each UNION added to the insert.

    Thanks

    I assume you mean that if you run each of the three select statements they return a different number of rows? That is because there is something in the data that is causing that. It doesn't have anything to do with the UNION, it is all about the data in those tables.

    Sort of.

    When I run this:

    insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)

    select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db..tbl_collections c

    join tab_master_db..tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1

    36485 rows get inserted to tbl_

    When I run this:

    insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)

    select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db..tbl_collections c

    join tab_master_db..tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1

    union

    select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db.dbo.tbl_collections c

    join tab_master_db.dbo.tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1

    11012 get inserted to tbl_

    And finally, when I run this:

    insert tbl_ (JobOrderNumber, Folder, DatabaseName,DedupAcrossMultipleDB, r, LatestMetricDate)

    select dbo.udf_ParseString(c.source_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db..tbl_collections c

    join tab_master_db..tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.source_dir,'\',4)) = 1

    union

    select dbo.udf_ParseString(c.extr_output_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db.dbo.tbl_collections c

    join tab_master_db.dbo.tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.extr_output_dir,'\',4)) = 1

    union

    select dbo.udf_ParseString(c.img_output_dir,'\',4) as JobOrderNumber

    , j.JobFolder

    , j.DatabaseName

    , j.DedupAcrossMultipleDB

    , j.jobid

    , m.endtime

    from tab_master_db.dbo.tbl_collections c

    join tab_master_db.dbo.tbl_jobs j

    on c.jobid = j.jobid

    left outer join #Metrics m

    on c.jobid = m.jobid

    where isnumeric(dbo.udf_ParseString(c.img_output_dir,'\',4)) = 1

    11011 get inserted to tbl_

    It's confusing the heck out of me. :blush:

  • It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (5/30/2014)


    It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.

    :angry:

  • sqldriver (5/30/2014)


    mister.magoo (5/30/2014)


    It is probably because the presence of the UNION is removing duplicates, not just between the three queries but within the results of each query as well.

    :angry:

    :unsure: :angry:

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 10 posts - 1 through 9 (of 9 total)

    You must be logged in to reply to this topic. Login to reply