a ridiculous data problem

  • I face a ridiculous data problem. Technically it is not complex, but it is so ridiculous and new to me that I was hoping somebody had some insights. The real problem is that the data spec somebody built is terrible. Let me try a simplified example.

    As a fictional example, consider the shoe business. A master shoe table is available with every shoe and all of its corresponding attributes, such as size, width, color, style, material, occasion, and gender.

    However, a data spec has been created in which shoe data is passed with the fewest possible attributes. For instance, if I receive one row that specifies only size, color, style, and gender, I am to interpret that implicit data as meaning all shoes with that size, color, style, and gender. I then have to determine all rows in the master table that match that combination.

    To make this more complicated, the attributes passed to me may vary. They could arrive in any combination, so I cannot hard-code a match on specific columns.

    I want to throttle the boneheads who came up with this data delivery spec, but I cannot change it; I have to deal with it. Incidentally, I may be dealing with over 100 attributes in any combination! ARGH!

    How would you handle this?

    TIA.

  • I recommend you read through Gail Shaw's blog on catch-all queries, you can find it here:

    http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
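
    In case a concrete shape helps, here is a minimal sketch of the kind of catch-all query the article discusses, applied to the shoe example. The procedure, column, and parameter names are made up for illustration; any parameter left NULL is simply ignored.

    CREATE PROCEDURE dbo.usp_find_shoes
        @size  varchar(10) = NULL,   -- hypothetical attribute parameters;
        @color varchar(20) = NULL,   -- NULL means "do not filter on this attribute"
        @style varchar(30) = NULL
    AS
    BEGIN
        SELECT s.*
        FROM dbo.shoe_master AS s
        WHERE (@size  IS NULL OR s.size  = @size)
          AND (@color IS NULL OR s.color = @color)
          AND (@style IS NULL OR s.style = @style);
          -- ...one optional predicate per attribute
    END;

    The article goes into why this shape can produce poor plans and what RECOMPILE and dynamic SQL trade off against it.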


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • That was an interesting article indeed, Evil Kraig F.

    I would be comparing two tables (vs. a set of variables or one row). Using my shoe example, I would compare the master shoe table with all attributes populated against an incoming shoe table that did not have all attributes populated. The result should be all rows in the master table that fit whatever attributes are in the incoming table.

    I was thinking I could use ISNULL for the join on the incoming table such as...

    SELECT a.*
    FROM dbo.shoe_master a
    JOIN dbo.shoe_incoming b
        ON  a.size = ISNULL(b.size, a.size)
        AND a.color = ISNULL(b.color, a.color)
        -- ...and repeat for all attributes

    It feels wrong to me. I cannot recall ever doing this, though I probably have at some point long ago. It does produce the result I need.

    Any thoughts on this method?
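
    One caveat with the ISNULL form, assuming the same hypothetical tables and columns: if a column in the master table can itself be NULL, a.size = ISNULL(b.size, a.size) will never match those rows, because NULL = NULL does not evaluate to true. Writing the predicates out longhand sidesteps that:

    SELECT a.*
    FROM dbo.shoe_master a
    JOIN dbo.shoe_incoming b
        ON  (b.size IS NULL OR a.size = b.size)     -- missing incoming attribute = match anything
        AND (b.color IS NULL OR a.color = b.color)
        -- ...and repeat for all attributes

    Either way, when every predicate is optional the optimizer has very little to work with, which is the problem the catch-all article digs into.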

  • dglane (4/30/2012)


    That was an interesting article indeed, Evil Kraig F.

    I would be comparing two tables (vs. a set of variables or one row). Using my shoe example, I would compare the master shoe table with all attributes populated against an incoming shoe table that did not have all attributes populated. The result should be all rows in the master table that fit whatever attributes are in the incoming table.

    I was thinking I could use ISNULL for the join on the incoming table such as...

    SELECT a.*
    FROM dbo.shoe_master a
    JOIN dbo.shoe_incoming b
        ON  a.size = ISNULL(b.size, a.size)
        AND a.color = ISNULL(b.color, a.color)
        -- ...and repeat for all attributes

    It feels wrong to me. I cannot recall ever doing this, though I probably have at some point long ago. It does produce the result I need.

    Any thoughts on this method?

    Your method is HORRIBLY BAD from both a performance AND a concurrency perspective. You either break it down into all the permutations with sub-sprocs, like Gail shows in her blog, or you use dynamic SQL (being VERY sure you guard against SQL injection if you do). Otherwise you have an application that simply cannot perform or scale. Maybe you don't need that kind of performance, in which case your method will be acceptable for you.
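
    For reference, a bare-bones sketch of the dynamic SQL route, again with made-up shoe attribute names. Only the text of the WHERE clause is built dynamically; the values themselves travel as sp_executesql parameters instead of being concatenated into the string, which is the guard against injection.

    DECLARE @size  varchar(10) = '9',     -- example incoming values (hypothetical);
            @color varchar(20) = NULL;    -- NULL = attribute not supplied

    DECLARE @sql nvarchar(max) = N'SELECT s.* FROM dbo.shoe_master AS s WHERE 1 = 1';

    IF @size IS NOT NULL
        SET @sql = @sql + N' AND s.size = @size';
    IF @color IS NOT NULL
        SET @sql = @sql + N' AND s.color = @color';
    -- ...one IF per attribute

    EXEC sys.sp_executesql
        @sql,
        N'@size varchar(10), @color varchar(20)',
        @size  = @size,
        @color = @color;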

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Understood, KGB. This is not associated with an app. It pertains to a batch import. We would be receiving a table with some combination of attributes populated for each row and then need to determine which master rows relate to the partially-populated incoming rows.

  • dglane (4/30/2012)


    Understood, KGB. This is not associated with an app. It pertains to a batch import. We would be receiving a table with some combination of attributes populated for each row and then need to determine which master rows relate to the partially-populated incoming rows.

    I am a performance tuning nut, but I still ALWAYS tell my clients that if a process is running acceptably (no blocking, no deadlocking, not running too long, etc.), then you should move on to one of the likely MANY other things on your hit list! Just because you CAN make a process run 100X faster doesn't mean you NEED to. And if you don't need to - move on. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Your mileage may vary, but I've seen impressive results from adding the RECOMPILE option mentioned in Gail's article.

    Wasn't exactly the same scenario, but similar enough that you might get some benefit from it!
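
    For anyone who has not used it, that just means appending the hint to the catch-all statement so a fresh plan is compiled for the actual parameter values on every execution (hypothetical parameters again):

    SELECT s.*
    FROM dbo.shoe_master AS s
    WHERE (@size  IS NULL OR s.size  = @size)
      AND (@color IS NULL OR s.color = @color)
      -- ...remaining optional predicates
    OPTION (RECOMPILE);   -- plan is built per execution, so the NULL parameters can be optimized away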

    Otherwise, I'd go the dynamic SQL route.

  • dglane (4/30/2012)


    That was an interesting article indeed, Evil Kraig F.

    I would be comparing two tables (vs. a set of variables or one row).

    See, a single row, to me anyway, is merely a parameter set, and I see no reason to treat it as anything else. Unless this is many rows of differing filters, which changes not only the expectation but the entire paradigm.

    It feels wrong to me. I cannot recall ever doing this, though I probably have at some point long ago. It does produce the result I need.

    Any thoughts on this method?

    Yes, I personally would not do it this way, and I would not use a joined table. I would feed the parameter set to a catch-all query built with dynamic SQL, using sp_executesql with parameters to help block injection attacks.

    You've explained this about as well as you can, and you've simplified the question for easier comprehension, but in things like this the devil's in the details. Are you simplifying and changing the components for ease or for confidentiality? If it's for ease, I'd recommend you post up a chunk of DDL, procs, expected inbound data, indexes, etc., as well as the full real process, which we'll be happy to try to help you with.

    If it's for confidentiality I'd start with the parameters to a catch-all-query and go from there. If your inbound filter list has multiple rows, depending on a number of factors but primarily indexing and selectivity, you might cursor the filter list (if it's within a reasonable number) into parameters to a catchall that dumps to a local temp table for final distribution after the cursor loopage. This allows you to multi-seek for different result sets without ending up with a huge mess and allows for different optimization plans to get involved with that data seeking so you don't end up hosed.
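
    Roughly, that cursor-plus-temp-table shape could look like the sketch below. Everything here is hypothetical: dbo.usp_find_shoe_keys stands in for a parameterised catch-all (or dynamic SQL) procedure that returns just the matching master keys, and only two attributes are shown.

    CREATE TABLE #matches (shoe_id int NOT NULL);     -- collects matches across all filter rows

    DECLARE @size varchar(10), @color varchar(20);

    DECLARE filter_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT b.size, b.color                        -- ...plus the other attribute columns
        FROM dbo.shoe_incoming AS b;

    OPEN filter_cur;
    FETCH NEXT FROM filter_cur INTO @size, @color;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- each incoming filter row becomes one parameterised, separately optimised lookup
        INSERT INTO #matches (shoe_id)
        EXEC dbo.usp_find_shoe_keys @size = @size, @color = @color;

        FETCH NEXT FROM filter_cur INTO @size, @color;
    END;

    CLOSE filter_cur;
    DEALLOCATE filter_cur;

    SELECT DISTINCT m.shoe_id                         -- final, de-duplicated result
    FROM #matches AS m;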

    Again, this all depends on a hundred little factors.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
