I found an RBAR statement, now what?

  • Hello All,

    This post is about a query used to populate customer mailing list tables based on which client's list each customer is on.

    I have a query that populates these tables by pulling data from various other tables and combining it into one table per client. The data that goes into these tables has to meet various criteria, one of them being that each customer can be in only one client table. For example, if customer_A is a customer of both Client_B and Client_C, the customer should appear in only ONE of those two tables, so that the customer receives only one mailing instead of two exact copies. I have the following snippet of code to determine whether or not a customer will be placed into the current table:

    and not exists
    (
        select 1
        from list..list_names_active ll
        where l.name_id = ll.name_id
          and ll.list_id in
          (
              select 139
              union all
              select 140
              union all
              select r.list_id
              from list..listRank r
              where r.id < (select id from list..listRank where list_id = 122)
                and r.country = 'us'
          )
    )

    When this line was added in, performance completely fell off, which, based on Jeff Moden's explanations of RBAR and triangular joins, makes perfect sense. My issue is that I'm having trouble rewriting this line so that it is set-based. I tried switching from "not exists" to "not in," which was even worse according to the execution plan!
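    Roughly, the "not in" version looked like this (a sketch built from the snippet above, not the exact production code):

    and l.name_id not in
    (
        select ll.name_id
        from list..list_names_active ll
        where ll.list_id in
        (
            select 139
            union all
            select 140
            union all
            select r.list_id
            from list..listRank r
            where r.id < (select id from list..listRank where list_id = 122)
              and r.country = 'us'
        )
    )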

    I apologize in advance if this isn't enough information. Any insight anyone could provide would be greatly appreciated.

    Thanks,
    Nate

  • I would look at the emailTable populating process.

    1 - create a distinct list of users who should receive email

    2 - populate table information by level of validity

    create table #tEmailList (...)

    insert into #tEmailList
    select ...
    union
    select ...

    update #tEmailList
    set informationByTable = ...
    from table1
    ...

    -- repeat for each source table
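    A rough sketch of that shape, borrowing the table and column names from the original snippet (the exact columns, list_id values, and precedence rules here are assumptions, not your real logic):

    -- step 1: one row per customer who should get a mailing (UNION de-duplicates)
    create table #tEmailList
    (
        name_id   int not null primary key,
        client_id int null
    )

    insert into #tEmailList (name_id)
    select name_id from list..list_names_active where list_id = 139
    union
    select name_id from list..list_names_active where list_id = 140

    -- step 2: assign each customer to exactly one client, in priority order
    update t
    set    client_id = 122            -- hypothetical: client 122 has first claim
    from   #tEmailList t
    where  t.client_id is null
      and  exists (select 1
                   from list..list_names_active ll
                   where ll.name_id = t.name_id
                     and ll.list_id = 122)

    -- repeat one update per client; a customer already claimed is skipped,
    -- so each name_id ends up in only one client's table

    Each client table can then be populated with a simple join against #tEmailList, with no correlated subquery evaluated per row.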

  • Please do the following:

    1) Read the first article I reference below in my signature block regarding asking for assistance.

    2) Post the DDL for the tables involved in your query.

    3) Post sample data for the tables as shown in the article in #1.

    4) Post the expected results based on the sample data provided in #3.

    5) Post your current code with the RBAR issue.

    Doing this will allow us to help you better by providing you with better answers and tested code.
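    For example, something along these lines is all that's needed for #2 and #3 (the table and rows here are purely hypothetical, just to show the format):

    -- trimmed-down, hypothetical DDL for one of the tables involved
    create table dbo.list_names_active
    (
        name_id int not null,
        list_id int not null
    )

    -- a handful of representative (non-production) rows
    insert into dbo.list_names_active (name_id, list_id)
    select 1, 139 union all
    select 1, 140 union all
    select 2, 122 union all
    select 3, 140

    The expected results for #4 would then just be the rows you expect to end up in each client table for that sample data, typed out by hand.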

  • Lynn,

    Thank you, but I will work on Darryl's suggestions. The code in question was posted already. Adding that line in takes the overall process from 4 hours to well over 24 hours, so that is the problem. Also, I do get the expected results, over 20 hours later. The DDL would be for 5 or 6 tables, and to get the full effect of my problem you would need sample data on the order of 200 million records per table. Unfortunately, I don't know of an efficient way to generate that kind of data without giving out production data. I was simply wondering if folks knew of a better technique than the one I was trying to use.

    Thank you in advance,

    Nate

  • SwedishOrr (6/10/2009)


    Lynn,

    Thank you, but I will work on Darryl's suggestions. The code in question was posted already. Adding that line in takes the overall process from 4 hours to well over 24 hours, so that is the problem. Also, I do get the expected results, over 20 hours later. The DDL would be for 5 or 6 tables, and to get the full effect of my problem you would need sample data on the order of 200 million records per table. Unfortunately, I don't know of an efficient way to generate that kind of data without giving out production data. I was simply wondering if folks knew of a better technique than the one I was trying to use.

    Thank you in advance,

    Nate

    First, if you are talking about the snippet in your original post, that isn't enough for me to do anything with, really.

    Second, sample data doesn't need to be production data, just something that is at least representative of it, and you don't need a lot, just enough to run the code. Usually 10 to 20 records per table is good.

    Third, the expected results are based on the sample data and don't even have to be generated by code; they can be hand generated.
