Loop through two tables to populate third

  • I have two tables, one contains a list of Users and one contains a list of Skills. I want to create a third table and populate it such that every user has every skill. This is a simple representation of what I need to do - in my real application I need to call a UDF before each insert into #tblUsersSkills.

    CREATE TABLE #tblUsers

    (

    UserID int,

    UserName varchar(50)

    )

    GO

    INSERT INTO #tblUsers

    (UserID, UserName)

    SELECT 1, 'Bill' UNION ALL

    SELECT 2, 'Jim' UNION ALL

    SELECT 3, 'Mary'

    GO

    CREATE TABLE #tblSkills

    (

    SkillID int,

    Skill varchar(50)

    )

    GO

    INSERT INTO #tblSkills

    (SkillID, Skill)

    SELECT 1, 'Running' UNION ALL

    SELECT 2, 'Jumping' UNION ALL

    SELECT 3, 'Skipping'

    GO

    CREATE TABLE #tblUsersSkills

    (

    UserID int,

    SkillID int

    )

    DECLARE @user-id int

    SELECT @user-id = MIN(UserID) FROM #tblUsers

    WHILE @user-id IS NOT NULL --begin to loop through the users

    BEGIN

    DECLARE @SkillID int

    SELECT @SkillID = MIN(SkillID) FROM #tblSkills

    WHILE @SkillID IS NOT NULL --begin to loop through the skills

    BEGIN

    --I need to call a User Defined Function here in my real application to increment a date

    INSERT INTO #tblUsersSkills (UserID, SkillID)

    VALUES(@UserID, @SkillID)

    SELECT @SkillID = MIN(SkillID) FROM #tblSkills WHERE SkillID > @SkillID

    END

    SELECT @user-id = MIN(UserID) FROM #tblUsers WHERE UserID > @user-id

    END

    SELECT * FROM #tblUsersSkills

    DROP table #tblUsers

    DROP table #tblSkills

    DROP table #tblUsersSkills

    The code above works - and I seem to have avoided using a cursor. But it seems very RBAR to me (to coin an acronmym I have seen used here.)

    I can't use this ...

    INSERT INTO #tblUsersSkills (UserID, SkillID)

    SELECT #tblUsers.UserID, #tblSkills.SkillID

    FROM #tblUsers

    CROSS JOIN #tblSkills

    ... because I need to call a function before doing the insert into #tblUsersSkills

    Is there a better - faster - simpler - way of doing this?

  • if you have to call a UDF for each insert, by definition you have to RBAR it. I'm curious why you have to call a function before each insert versus populating the table then doing a set based approach to whatever the function is doing. Either way, with that requirement...welcome to RBAR.

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • What exactly is the function doing? Is it updating a date field in the existing table?

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • The code I posted is made up - the real tables all to do with medicene and would not easily make sense.

    But, to continue using my (pseudo) code - if we allow for an extra field called DifficultyID in #tblUsersSkills, this code works okay ...

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, dbo.fnGetDifficulty(UserID, SkillID)

    FROM #tblUsers

    CROSS JOIN #tblSkills

    WHERE dbo.fnGetDifficulty(UserID, SkillID) > 0

    ... but I don't want to insert a row into #tblUsersSkills if the UDF dbo.fnGetDifficulty returns 0

    In the code above is dbo.fnGetDifficulty evaluated twice?

    Is this a bad thing?

    Is the RBAR code in my first post more efficient?

    Thank you for your replies.

  • Yes the function will be called twice and as for the difference between the RBAR version and the select, you have to test.

    We can't say the duplicate evaluation is good or bad without some knowledge of the underlying function code. If it is hitting other tables heavily it could be awful compared to the RBAR version.

    Sorry I can't be more help.

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • sku370870 (6/13/2011)


    The code I posted is made up - the real tables all to do with medicene and would not easily make sense.

    But, to continue using my (pseudo) code - if we allow for an extra field called DifficultyID in #tblUsersSkills, this code works okay ...

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, dbo.fnGetDifficulty(UserID, SkillID)

    FROM #tblUsers

    CROSS JOIN #tblSkills

    WHERE dbo.fnGetDifficulty(UserID, SkillID) > 0

    ... but I don't want to insert a row into #tblUsersSkills if the UDF dbo.fnGetDifficulty returns 0

    In the code above is dbo.fnGetDifficulty evaluated twice?

    Is this a bad thing?

    Is the RBAR code in my first post more efficient?

    Thank you for your replies.

    If you are working with SQL Server 2005 or later you can avoid the double-call to the UDF by using CROSS APPLY. With no change to your UDF you could use:

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, DifficultyID

    FROM #tblUsers

    CROSS JOIN #tblSkills

    CROSS APPLY

    (

    SELECT DifficultyID=dbo.fnGetDifficulty(UserID, SKillID)

    ) diffid

    WHERE DifficultyID > 0

    You could also change your UDF to a table-valued function which returns a row containing the DifficultyID if it is greater than 0 and returns nothing if the DifficultyID is 0. Then you could write it like this:

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, DifficultyID

    FROM #tblUsers

    CROSS JOIN #tblSkills

    CROSS APPLY dbo.fnGetDifficulty(UserID, SKillID)

    As others have indicated, how efficient any of this is can't be predicted at all without knowing how the function works. In general, the more RBAR in the function, the more you are going to wind up with in your final solution. It's always possible, however, that the optimizer would see a way of distributing the work of the function over the set of data in a non-RBAR way (but in my experience, that's usually not the way to bet). In this example, intuitively, I would suspect that whatever is going on in fnGetDifficulty could be represented as a table which could be joined to rather than using the CROSS APPLY, but I know you said that the example was "made up" so it may not be at all similar to the real issue you are dealing with. If you keep us in the dark, you can't expect a lot of clarity in return.:unsure:

    Good luck.

    - Les

  • I'm not sure how so good or bad is to perform a double evaluation, but I'm afraid that you can avoid it.

    About your SQL code, if I'm rigth you don't need to use CROSS JOIN because #tblUsers and #tblSkills aren't related.

    INSERT INTO #tblUsersSkills (UserID, SkillID, DifficultyID)

    SELECT u.UserID, s.SkillID, dbo.fnGetDifficulty(u.UserID, s.SKillID)

    FROM #tblUsers u, #tblSkills s

    WHERE dbo.fnGetDifficulty(u.UserID, s.SKillID)

    Tip: I used to use aliases for tables to add clarity to the code 🙂

    Have a nice day.

  • lnoland (6/15/2011)


    sku370870 (6/13/2011)


    The code I posted is made up - the real tables all to do with medicene and would not easily make sense.

    But, to continue using my (pseudo) code - if we allow for an extra field called DifficultyID in #tblUsersSkills, this code works okay ...

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, dbo.fnGetDifficulty(UserID, SkillID)

    FROM #tblUsers

    CROSS JOIN #tblSkills

    WHERE dbo.fnGetDifficulty(UserID, SkillID) > 0

    ... but I don't want to insert a row into #tblUsersSkills if the UDF dbo.fnGetDifficulty returns 0

    In the code above is dbo.fnGetDifficulty evaluated twice?

    Is this a bad thing?

    Is the RBAR code in my first post more efficient?

    Thank you for your replies.

    If you are working with SQL Server 2005 or later you can avoid the double-call to the UDF by using CROSS APPLY. With no change to your UDF you could use:

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, DifficultyID

    FROM #tblUsers

    CROSS JOIN #tblSkills

    CROSS APPLY

    (

    SELECT DifficultyID=dbo.fnGetDifficulty(UserID, SKillID)

    ) diffid

    WHERE DifficultyID > 0

    You could also change your UDF to a table-valued function which returns a row containing the DifficultyID if it is greater than 0 and returns nothing if the DifficultyID is 0. Then you could write it like this:

    INSERT INTO #tblUsersSkills(UserID, SkillID, DifficultyID)

    SELECT UserID, SkillID, DifficultyID

    FROM #tblUsers

    CROSS JOIN #tblSkills

    CROSS APPLY dbo.fnGetDifficulty(UserID, SKillID)

    As others have indicated, how efficient any of this is can't be predicted at all without knowing how the function works. In general, the more RBAR in the function, the more you are going to wind up with in your final solution. It's always possible, however, that the optimizer would see a way of distributing the work of the function over the set of data in a non-RBAR way (but in my experience, that's usually not the way to bet). In this example, intuitively, I would suspect that whatever is going on in fnGetDifficulty could be represented as a table which could be joined to rather than using the CROSS APPLY, but I know you said that the example was "made up" so it may not be at all similar to the real issue you are dealing with. If you keep us in the dark, you can't expect a lot of clarity in return.:unsure:

    Good luck.

    - Les

    I've seen Cross Apply used in solutions posted on here and other forums and never knew what it did. Now I do! Thanks very much for your reply.

  • Narud (6/16/2011)


    I'm not sure how so good or bad is to perform a double evaluation, but I'm afraid that you can avoid it.

    About your SQL code, if I'm rigth you don't need to use CROSS JOIN because #tblUsers and #tblSkills aren't related.

    INSERT INTO #tblUsersSkills (UserID, SkillID, DifficultyID)

    SELECT u.UserID, s.SkillID, dbo.fnGetDifficulty(u.UserID, s.SKillID)

    FROM #tblUsers u, #tblSkills s

    WHERE dbo.fnGetDifficulty(u.UserID, s.SKillID)

    Tip: I used to use aliases for tables to add clarity to the code 🙂

    Have a nice day.

    I was using CROSS JOIN because the tables are not related - I didn't realise you could Select from unrelated tables just by sticking a comma between them.

    A little knowledge is a dangerous thing, and I only have a little knowledge.

    Thanks very much for your reply - slowly I'm learning more about SQL Server.

  • It's good to know that my answer was at last a little bit useful for you. But don't thing that you have a little knowledge. Even the experts may learn new things every day. Only who thinks that knows everything can't learn anything new.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply