View With Joins vs Table

  • I find myself in a situation where I have a table in which I need all the fields, but it is also supplemented by several lookup tables.  For example, the main table (Table A) might have a name field that is matched against a lookup table (Table B) to display a clean version of that name alongside the dirty version.  So if Table A has the name 'Jimm' and Table B has an entry matching 'Jimm' to the cleaned value 'Jim', then both the original 'Jimm' and the cleaned-up 'Jim' are displayed.  The same applies to fields such as city, state, zip, etc.

    I'm currently using left joins to pull all of the records in Table A, together with the cleaned-up versions from the lookup tables, all in a view, so the user doesn't know the data resides in several tables and doesn't have to worry about joining them; they want the data in a de-normalized format.  My problem is performance with this view because of all the joins.  I've indexed the join fields in all of the tables, but it still doesn't run as quickly as needed.  Is there a better practice for this, or should I create the physical table they need based on the joins in the view?  I'm reluctant to go with the second option right now because I feel it is redundant and a waste of space.
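    For reference, a minimal sketch of the kind of view described above (the table and column names here are illustrative, not the poster's actual schema):

    ```sql
    -- Hypothetical view: base table left-joined to lookup tables on the raw values.
    CREATE VIEW dbo.vw_CleanedRecords
    AS
    SELECT  a.RecordID,
            a.RawName,                  -- e.g. 'Jimm'
            n.CleanName,                -- e.g. 'Jim'
            a.RawCity,
            c.CleanCity,
            a.RawState,
            s.CleanState
    FROM    dbo.TableA        AS a
    LEFT JOIN dbo.NameLookup  AS n ON n.RawName  = a.RawName
    LEFT JOIN dbo.CityLookup  AS c ON c.RawCity  = a.RawCity
    LEFT JOIN dbo.StateLookup AS s ON s.RawState = a.RawState;
    ```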

  • Well, how big are the lookup tables, and how many of them are there?

    One option is to store the cleaned values on the base table and handle the lookups either at record insert/update time or as part of a batch process, which would remove the need for the view.  Another option, depending on how many lookups there are and how big they are, would be to split the view into multiple views and tell the users to select from the correct one (generally, when users/developers say they have to have every column in a single select, they're lying); that could improve the performance of the individual views.
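    The first option might look something like this (assumed schema; column and table names are made up for illustration):

    ```sql
    -- Persist the cleaned value directly on the base table.
    ALTER TABLE dbo.TableA ADD CleanName varchar(100) NULL;

    -- Batch refresh, e.g. run after each data load, instead of
    -- resolving the lookup in the view on every read.
    UPDATE  a
    SET     a.CleanName = n.CleanName
    FROM    dbo.TableA      AS a
    JOIN    dbo.NameLookup  AS n ON n.RawName = a.RawName;
    ```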

  • The base table has ~4 million records.  There are four different lookup tables having record counts of 270, 7000, 24000, and 42000.  The main reason of trying to avoid putting the clean names in the base table is because I did not want to update six fields on those 4 million (and growing)  records every month when new data is inserted.  I can't necessarily handle it at insert time since all records will have to reflect the new changes.

  • I'd need to see the query plan to know for sure what is going on, but, based solely on your description, there's a limited amount you can do.

    1) Cluster the lookup tables on the lookup value.
    2) Limit the number of rows you return to the user -- and thus need to do lookups on -- at one time.  No human can work on anything close to 4M rows at a time.
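    Both suggestions sketched in T-SQL (index and view names here are assumptions, not the poster's schema):

    ```sql
    -- 1) Cluster each lookup table on the value being joined on,
    --    so each lookup is a single clustered-index seek.
    CREATE CLUSTERED INDEX CIX_NameLookup_RawName
        ON dbo.NameLookup (RawName);

    -- 2) Return a page of rows at a time rather than all ~4M,
    --    so only that page's rows need the lookups resolved.
    SELECT  *
    FROM    dbo.vw_CleanedRecords
    ORDER BY RecordID
    OFFSET 0 ROWS FETCH NEXT 500 ROWS ONLY;
    ```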

    SQL DBA, SQL Server MVP (07, 08, 09)

  • I would imagine returning 4 million+ rows would be slow with or without the lookups.

  • I'll continue to play around with indexing on the joins to see if I can improve that.  For now I've created a physical table, based on the view, with the layout they are requesting, which significantly improves performance.  I'll continue to import all of the data into the base table and then drop/recreate the final table monthly when new data is inserted.  There is one lookup field that will have to be updated daily in the final table in case anything has changed.  Is the best way to handle this just to update the field in all 4 million records based on the current lookup values?  The vast majority should remain the same from day to day, so any way to filter those out would be ideal.
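    One way to filter out the unchanged rows is to put the comparison in the update's WHERE clause, so only rows whose looked-up value actually differs are touched (a sketch; the names are assumptions, not the poster's schema):

    ```sql
    -- Daily refresh that skips rows already holding the current lookup value.
    UPDATE  f
    SET     f.CleanName = n.CleanName
    FROM    dbo.FinalTable  AS f
    JOIN    dbo.NameLookup  AS n ON n.RawName = f.RawName
    WHERE   f.CleanName <> n.CleanName
            OR (f.CleanName IS NULL AND n.CleanName IS NOT NULL);
    ```

    Since most rows are unchanged, this keeps the daily write volume (and transaction-log growth) to just the delta rather than all 4 million rows.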
