Is there an joiner much like the 8kSplitter?

  • anthony.green (8/1/2012)


    Chris,

    Just a simple execute the two views, the original came back in 12 seconds, yours was over 2 minutes.

    I get around 15-60ms for either using the backup file you posted. What's happening here? Are you testing with a different data set? Both views return 1 row from this data.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • There is a mistake in the view, I posted the correct line in an earlier post.

    The line

    join StandardsGroupsClassification ccmap

    ON ccmap.PID = s.SID

    should be

    join StandardsGroupsClassification ccmap

    ON ccmap.PID = s.SubID

    This should then bring back the whole result set of around 290k rows.

    The view has now changed completly and the new base view is in the most recent post.

    This new view runs in between 10 and 11 seconds.

  • I made a slight change to the function Chris wrote. I changed it from a mtvf to an itvf. Please check it out.

    CREATE FUNCTION [dbo].[GetCIDs2]

    (@SID INT)

    RETURNS TABLE WITH SCHEMABINDING

    AS RETURN(

    with CIDList (CID) as (

    SELECT

    csg.CID

    FROM

    ContentStandardsGroup csg

    INNER JOIN StandardsGroupMapping map

    ON map.PID = csg.PID

    WHERE

    map.SID = @sid

    UNION

    SELECT

    cs.CID

    FROM

    ContentStandard cs

    WHERE

    cs.SID = @sid

    )

    select

    CIDs = STUFF((

    SELECT ',' + CID

    FROM CIDList

    FOR XML PATH(''),TYPE).value('.','varchar(max)'),1,1,'')

    );

  • Lynn Pettis (8/1/2012)


    I made a slight change to the function Chris wrote...

    πŸ˜€ πŸ˜€

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/1/2012)


    Lynn Pettis (8/1/2012)


    I made a slight change to the function Chris wrote...

    πŸ˜€ πŸ˜€

    Take this as a "seal" (imagine a picture of a seal here) of approval? I can't download the database here to test. Have to wait until I get home.

  • Lynn Pettis (8/1/2012)


    ChrisM@Work (8/1/2012)


    Lynn Pettis (8/1/2012)


    I made a slight change to the function Chris wrote...

    πŸ˜€ πŸ˜€

    Take this as a "seal" (imagine a picture of a seal here) of approval? I can't download the database here to test. Have to wait until I get home.

    The seal had a significant meaning in a previous life. Of course, Lynn. You're the safe pair of hands on this thread, I'm just taking the opportunity to muck around with stuff to see what happens.

    Careful when you get around to comparing those functions with the new view - the result set is quite big (read wide).

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • OK the examples so far are kind of specific to the data at hand, and not as generic as I was thinking a Joiner function would be.

    i was playing with trying to build something more generic using a TABLE TYPE for a parameter to a function;

    while it works, even my example is slow as heck. on my own dedicated dev machine, it's 6 seconds for just 105 rows of data; running the FOR XML directly is near-instant.

    abstracting the logic out for "reuse" is making this pretty darn slow.

    seems like trying to make something generic would just not be worth it, I think?

    here's my pair of objects, a TYPE TABLE and a crappy ITVF concat function:

    /* Create a user-defined table type */

    CREATE TYPE intVarcharForConCat AS TABLE

    (

    ID int,

    Descriptor VARCHAR(50)

    );

    --now a function that would do the FOR XML for concatenation.

    GO

    CREATE FUNCTION ConcatStuff(@MyTable dbo.intVarcharForConCat READONLY)

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN(

    SELECT DISTINCT

    t.ID,

    sq.Columns As Descriptors

    FROM @MyTable t

    INNER JOIN (SELECT

    ID,

    Columns = STUFF(

    (SELECT

    ',' + Descriptor

    FROM @MyTable sc

    WHERE sc.ID = s.ID

    FOR XML PATH('')

    )

    ,1,1,'')

    FROM @MyTable s

    ) sq ON t.ID = sq.ID

    )

    and what i thought was a basic simple test:

    Declare @MyTable dbo.intVarcharForConCat

    --populate with sample data

    INSERT INTO @MyTable

    SELECT OBJECT_ID,name from sys.columns

    --test the concat function

    SELECT * FROM dbo.ConcatStuff(@MyTable)

    some code cleanup for those who need it:

    --now cleanup

    DROP FUNCTION ConcatStuff

    DROP TYPE intVarcharForConCat;

    the equivilent FOR XML directly:

    SELECT DISTINCT

    t.name,

    sq.Columns

    FROM sys.tables t

    JOIN (

    SELECT OBJECT_ID,

    Columns = STUFF((SELECT ',' + name

    FROM sys.columns sc

    WHERE sc.object_id = s.object_id

    FOR XML PATH('')),1,1,'')

    FROM sys.columns s

    ) sq ON t.object_id = sq.object_id

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • well , answering my own question as to why performance sux:

    the plan has nested loop for the inner join, i think based on the Estimated # of rows being = 1 (and the actual # rows being a lot more) because it cannot calculate the # rows from a @Table;

    if i explicitly change the join to INNER HASH JOIN for example, i get some more acceptable performance.

    i think that might be one of those rare situations where i need to give the optimizer some help.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Rather than using an optimizer hint why not just change the table variable to a temp table and let the optimizer do its thing.

  • andersg98 (8/1/2012)


    Rather than using an optimizer hint why not just change the table variable to a temp table and let the optimizer do its thing.

    mostly because a function can only use table variables or table types and not temp tables.

    I was locked into an ITVF function to mentally match the reverse of the DelimitedSplit8K function, rather than another solution.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks Lynn for Chris's modified function.

    Both that function and the one in the DB backup I provided both execute in the same amount of time.

  • anthony.green (8/2/2012)


    Thanks Lynn for Chris's modified function.

    Both that function and the one in the DB backup I provided both execute in the same amount of time.

    Anthony - "Chris's modified function" is Lynn's version πŸ˜‰

    I've tested three different "Joiner" functions now using your DB backup and (local) instance, and running the results into a @blackhole variable (SELECT @variable = column FROM...) to diminish display/data transit times.

    The FOR XML PATH() function runs in about 20s.

    The QU/table variable function takes around 10 minutes and cripples this PC.

    A function using SELECT @VARIABLE = @VARIABLE + column (same generic code that we use to construct dynamic cross-tabs) also takes about 10 minutes.

    FOR XML PATH() is the only method I know of which can be constructed as an inline function.

    Here are the two alternative functions I tested with:

    CREATE FUNCTION [dbo].[GetCIDs2]

    (@SID INT)

    RETURNS @Result TABLE (LongCID VARCHAR(8000)) WITH SCHEMABINDING

    AS

    BEGIN

    DECLARE @Accumulator VARCHAR(8000)

    SET @Accumulator = ''

    SELECT @Accumulator = @Accumulator + ',' + CAST(CID AS VARCHAR(10))

    FROM (

    SELECT seq = ROW_NUMBER() OVER(ORDER BY CID), CID -- force order

    FROM (

    SELECT csg.CID

    FROM dbo.ContentStandardsGroup csg

    JOIN dbo.StandardsGroupMapping map

    ON map.PID = csg.PID

    WHERE map.SID = @sid

    UNION

    SELECT cs.CID

    FROM dbo.ContentStandard cs

    WHERE cs.SID = @sid

    ) d -- 383 rows for SID = 4946

    ) x

    INSERT INTO @Result SELECT @Accumulator;

    RETURN

    END

    CREATE FUNCTION [dbo].[GetCIDs]

    (@SID INT)

    RETURNS @Result TABLE (LongCID VARCHAR(8000)) WITH SCHEMABINDING

    AS

    BEGIN

    DECLARE @Accumulator TABLE (seq INT, CID INT);

    INSERT INTO @Accumulator (seq, CID)

    SELECT

    seq = ROW_NUMBER() OVER(ORDER BY CID), -- force order

    CID

    FROM (

    SELECT csg.CID

    FROM dbo.ContentStandardsGroup csg

    JOIN dbo.StandardsGroupMapping map

    ON map.PID = csg.PID

    WHERE map.SID = @sid

    UNION

    SELECT cs.CID

    FROM dbo.ContentStandard cs

    WHERE cs.SID = @sid

    ) d; -- 383 rows for SID = 4946

    DECLARE @LongCID VARCHAR(8000); SELECT @LongCID = '';

    UPDATE @Accumulator SET

    @LongCID = @LongCID + ',' + CAST(CID AS VARCHAR(10));

    INSERT INTO @Result SELECT NULLIF(@LongCID,'');

    RETURN

    END

    Test harness for any one function:

    DECLARE @Blackhole VARCHAR(8000), @Startdate DATETIME;

    SET @Startdate = GETDATE();

    WITH SIDs AS

    (

    SELECT SID

    FROM ContentStandard

    GROUP BY SID

    UNION

    SELECT SID

    FROM ContentStandardsGroup CSG

    INNER JOIN StandardsGroupMapping SGM

    ON SGM.PID = CSG.PID

    )

    SELECT

    --s.SID,

    --s.Label,

    --s.State,

    --clAge.ClassID AS AgeGradeID,

    --s.SubID,

    --s.Description,

    @Blackhole = FUNC.LongCID

    FROM dbo.Standard AS s

    INNER JOIN dbo.StandardSetAgeGrade AS ssAge ON ssAge.AGID = s.AGID

    INNER JOIN dbo.Classification AS clAge ON clAge.ParentID = '002' AND clAge.ClassID = ssAge.ParentClassID

    INNER JOIN SIDs T1 ON S.SID = T1.SID

    CROSS APPLY dbo.getCIDs(s.SID) AS FUNC

    WHERE s.Active = 1;

    SELECT DATEDIFF(ms,@Startdate,GETDATE()); -- 0 / 0

    GO

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 12 posts - 46 through 56 (of 56 total)

You must be logged in to reply to this topic. Login to reply