Too much code for this T-SQL

  • Just to add to the clamor, normalization will most likely speed your queries and such up, not slow them down. Done wrong, it'll break everything, of course. Done right, it will make the whole database faster, easier to work with, and more reliable.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks GSquared. Much appreciated.

  • Unfortunately, the people who wrote the NHS datasets are morons....

    The data gets extracted from the source systems in this flat-file format and loaded into the DW exactly as it arrives.

    My immediate reaction the first time I saw it was, "What if there is a 15th operation?"

    You could create a view which normalises the data:

    CREATE VIEW vwOperations AS
    SELECT recordID, 1 AS operationNumber, Operation1 AS Operation FROM sourceRecord WHERE Operation1 IS NOT NULL
    UNION ALL
    SELECT recordID, 2 AS operationNumber, Operation2 AS Operation FROM sourceRecord WHERE Operation2 IS NOT NULL
    UNION ALL
    -- ... repeat for Operation3 through Operation14 (UNION ALL avoids an unnecessary de-duplication step)

    And then simply query the operation field in the view for the required text.

    It won't make the query any quicker but it does help with code re-use.

  • aaron.reese (7/6/2011)


    Unfortunately, the people who wrote the NHS datasets are morons....

    The data gets extracted from the source systems in this flat-file format and loaded into the DW exactly as it arrives.

    My immediate reaction the first time I saw it was, "What if there is a 15th operation?"

    You could create a view which normalises the data:

    CREATE VIEW vwOperations AS
    SELECT recordID, 1 AS operationNumber, Operation1 AS Operation FROM sourceRecord WHERE Operation1 IS NOT NULL
    UNION ALL
    SELECT recordID, 2 AS operationNumber, Operation2 AS Operation FROM sourceRecord WHERE Operation2 IS NOT NULL
    UNION ALL
    -- ... repeat for Operation3 through Operation14 (UNION ALL avoids an unnecessary de-duplication step)

    And then simply query the operation field in the view for the required text.

    It won't make the query any quicker but it does help with code re-use.

    I'd recommend doing it just the other way around: use a normalized table and create a cross-tab (or pivoted) view if needed.

    The fact that the source data arrive in that shape doesn't mean the table structure has to mirror it. Design flaws in the source tables need to be corrected when the data are loaded.
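
    A minimal sketch of that approach, purely for illustration (the table, view, and column names here are assumptions, and the pivot list is truncated):

    -- Normalized base table: one row per record per operation.
    CREATE TABLE dbo.Operations
    (
        recordID        INT          NOT NULL,
        operationNumber TINYINT      NOT NULL,
        Operation       VARCHAR(100) NOT NULL,
        CONSTRAINT PK_Operations PRIMARY KEY CLUSTERED (recordID, operationNumber)
    );
    GO

    -- Pivoted view for anyone who still wants the old wide layout.
    CREATE VIEW dbo.vwOperationsWide
    AS
    SELECT recordID,
           [1] AS Operation1,
           [2] AS Operation2,
           [3] AS Operation3          -- ... continue through [14] AS Operation14
    FROM (SELECT recordID, operationNumber, Operation FROM dbo.Operations) AS src
    PIVOT (MAX(Operation) FOR operationNumber IN ([1], [2], [3])) AS p;     -- extend the IN list to [14]
    GO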



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Since it's not always possible to modify the raw input data from the source (something I've seen many times), the best solution in my opinion is often to create a set of procedures, possibly using Integration Services, that take the raw data as input and feed it into a normalized set of tables as output.

    This is of course not always possible, especially if disk space is an issue, but I find it often works out much better than trying to formulate all your queries and procedures around a non-normalized set of data.
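
    For what it's worth, the T-SQL side of that load step can be as simple as an UNPIVOT from the wide staging table into the normalized target. This is only a sketch: the staging and target table names are assumptions, and the column list is truncated.

    -- Assumes dbo.stg_sourceRecord holds the flat-file rows as loaded (Operation1..Operation14, same data type)
    -- and dbo.Operations is the normalized target (recordID, operationNumber, Operation).
    INSERT INTO dbo.Operations (recordID, operationNumber, Operation)
    SELECT u.recordID,
           CAST(REPLACE(u.operationColumn, 'Operation', '') AS TINYINT) AS operationNumber,
           u.Operation
    FROM dbo.stg_sourceRecord
    UNPIVOT (Operation FOR operationColumn IN (Operation1, Operation2, Operation3)) AS u;   -- extend to Operation14
    -- UNPIVOT already skips rows where the OperationN column is NULL.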

    LutzM (7/6/2011)


    aaron.reese (7/6/2011)


    Unfortunately, the people who wrote the NHS datasets are morons....

    The data gets extracted from the source systems in this flat-file format and loaded into the DW exactly as it arrives.

    My immediate reaction the first time I saw it was, "What if there is a 15th operation?"

    You could create a view which normalises the data:

    CREATE VIEW vwOperations AS
    SELECT recordID, 1 AS operationNumber, Operation1 AS Operation FROM sourceRecord WHERE Operation1 IS NOT NULL
    UNION ALL
    SELECT recordID, 2 AS operationNumber, Operation2 AS Operation FROM sourceRecord WHERE Operation2 IS NOT NULL
    UNION ALL
    -- ... repeat for Operation3 through Operation14 (UNION ALL avoids an unnecessary de-duplication step)

    And then simply query the operation field in the view for the required text.

    It won't make the query any quicker but it does help with code re-use.

    I'd recommend doing it just the other way around: use a normalized table and create a cross-tab (or pivoted) view if needed.

    The fact that the source data arrive in that shape doesn't mean the table structure has to mirror it. Design flaws in the source tables need to be corrected when the data are loaded.

  • Ninja's_RGR'us (6/28/2011)


    No offense, but I run reports on 10s of millions of records in subseconds with 10-40 tables in joins on a 2-core, 4 GB RAM server... not slow when you know your stuff.

    No offense, but I bet this has MUCH less to do with "your stuff" and way MORE to do with the amount of data and especially the distribution of the data values - at least if this is a routine and normal occurrence. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/7/2011)


    Ninja's_RGR'us (6/28/2011)


    No offense, but I run reports on 10s of millions of records in subseconds with 10-40 tables in joins on a 2-core, 4 GB RAM server... not slow when you know your stuff.

    No offense, but I bet this has MUCH less to do with "your stuff" and way MORE to do with the amount of data and especially the distribution of the data values - at least if this is a routine and normal occurrence. 😎

    Divide and conquer.

    I didn't say 10M rows per table; I was talking about the combined total across all tables.

  • Ninja's_RGR'us (7/7/2011)


    TheSQLGuru (7/7/2011)


    Ninja's_RGR'us (6/28/2011)


    No offense, but I run reports on 10s of millions of records in subseconds with 10-40 tables in joins on a 2-core, 4 GB RAM server... not slow when you know your stuff.

    No offense, but I bet this has MUCH less to do with "your stuff" and way MORE to do with the amount of data and especially the distribution of the data values - at least if this is a routine and normal occurrence. 😎

    Divide and conquer.

    I didn't say 10M rows per table; I was talking about the combined total across all tables.

    Divide and conquer - so you are doing interim steps joining some of the tables at each step and not this:

    SELECT ...
    FROM table1
    JOIN table2 ON ..
    JOIN table3 ON ..
    JOIN table4 ON ..
    JOIN table5 ON ..
    JOIN table6 ON ..
    ...
    JOIN table32 ON ..
    WHERE ...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you know what I'm doing, then why do you ask? :w00t:

    A few pointers from my current client. Even though this is an ERP, the system gets about 500 reads per write. To find out for your client(s), start a trace and select the Audit Logout event, then take the sum of reads over the sum of writes to get the ratio. I let mine run for 2 weeks to make sure I had accurate numbers (including end of month). The figures were pretty much good from day 1, though.
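
    If it helps anyone, the tally itself is a short query once the trace file is on disk (a sketch only; the file path is made up, so point it at your own server-side trace file):

    -- Audit Logout (EventClass 15) reports the cumulative Reads/Writes for each session,
    -- so summing them across the trace gives the overall read:write ratio.
    SELECT SUM(CAST(Reads AS BIGINT))  AS TotalReads,
           SUM(CAST(Writes AS BIGINT)) AS TotalWrites,
           SUM(CAST(Reads AS BIGINT)) * 1.0 / NULLIF(SUM(CAST(Writes AS BIGINT)), 0) AS ReadsPerWrite
    FROM sys.fn_trace_gettable(N'C:\Traces\AuditLogout.trc', DEFAULT)
    WHERE EventClass = 15;   -- Audit Logout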

    I'm currently writing reports straight against the PROD ERP db. Knowing the R/W ratio is insanely skewed, I pretty much allow myself to add any index I need (assuming a big improvement on at least one query). Now, I don't go overboard and build 15 ten-column indexes on the same table... or even one for that matter, but you get the idea.

    Here I have the luxury of adding anything I want, then coming back a month later to see if it's being used. If it isn't, I verify that the report itself is still in use and then I whack the index. So far I haven't had to kill many of them... and only when another report built a better index.
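
    The "is it being used" check is just a peek at the index usage DMV (a sketch; the table name is a placeholder, and keep in mind the counters reset when the instance restarts):

    -- Seeks/scans/lookups mean the index is being read; updates only means it is pure write overhead.
    SELECT i.name AS index_name,
           us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
           ON us.object_id = i.object_id
          AND us.index_id = i.index_id
          AND us.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID(N'dbo.SomeReportTable');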

    I usually write the whole report as a single query. Once I'm done, I check whether plan optimization was terminated before completion (I prefer zero warnings, not even "good enough plan found").

    Since I know indexes and stats are maintained daily (and more), I assume the plan won't magically get better by itself (though I'm starting to look into setting up hourly update-stats jobs).
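
    That early-termination check can also be run against the plan cache instead of plan by plan (a sketch; run it in the database you care about, and note it can be a heavy query on a large cache):

    -- Flags cached plans whose optimization stopped early ("TimeOut", "MemoryLimitExceeded"
    -- or "GoodEnoughPlanFound") instead of running to completion.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT st.text,
           qp.query_plan,
           qp.query_plan.value('(//StmtSimple/@StatementOptmEarlyAbortReason)[1]', 'varchar(50)') AS early_abort_reason
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason]') = 1;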

    I then divide the query and apply the best filters to speed it up myself (knowing how the users tend to use the reports and then checking the usage history). I start with the parameters that are not optional and should filter out the most rows. Then I leave the optional parameter filters for other steps.

    For multi-value parameters, I've recently found that a temp table with a PK seems to always beat a temp table without one, and that either beats the crap out of using the splitter function straight in the WHERE clause, or a table variable.

    No, I've not tested with stats only. But I know that PK constraints are used and considered by the optimizer, and a PK seems more natural to me than stats.
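
    To make that concrete, the multi-value parameter pattern ends up looking something like this (a sketch; the splitter function, parameter, table and column names are all placeholders):

    -- Materialize the parameter list once, with a PK so the optimizer gets a unique,
    -- ordered structure (and reasonable estimates) to join against.
    CREATE TABLE #SelectedCustomers (CustomerID INT NOT NULL PRIMARY KEY);

    INSERT INTO #SelectedCustomers (CustomerID)
    SELECT DISTINCT CAST(s.Item AS INT)
    FROM dbo.DelimitedSplit8K(@CustomerList, ',') AS s;      -- any splitter works here

    -- The report query then joins the temp table instead of calling the splitter
    -- (or stuffing an IN list) straight into the WHERE clause.
    SELECT o.OrderID, o.OrderDate, o.Amount
    FROM dbo.Orders AS o
    JOIN #SelectedCustomers AS c
      ON c.CustomerID = o.CustomerID
    WHERE o.OrderDate >= @StartDate;                         -- mandatory filter applied first, as above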

    Add/edit indexes and stats as required. Rinse, repeat.

    PS: Yes, I can pretty much take as much time as I need for any report. I have a manager who knows the cost of bad performance and knows I know my stuff, so he lets me do my voodoo.

    Any pointers from your consulting gigs Kev?

  • Sounds like you got a sweet gig there!

    My only thing was that I have not seen many-table joins be efficient outside of tight PK/FK scenarios. The slightest variance in estimations on inputs or joins and the plan can go to hell in a heartbeat when you have too many things tied together.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (7/7/2011)


    Sounds like you got a sweet gig there!

    My only thing was that I have not seen many-table joins be efficient outside of tight PK/FK scenarios. The slightest variance in estimations on inputs or joins and the plan can go to hell in a heartbeat when you have too many things tied together.

    Agreed. I've seen similar results here.

    The db here is really well designed. All tables have a clustered PK, often a couple more unique constraints, and then all the indexes I can throw at them. IIRC the data/index ratio was around 70/30 before I started adding more indexes in. So all that definitely helps me squeeze more joins in :w00t:.
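
    (For anyone curious, that ratio is easy to eyeball from the partition stats; a rough sketch, counted in 8 KB pages for the current database:)

    -- index_id 0/1 = heap or clustered index (the data itself); anything above 1 = nonclustered indexes.
    SELECT SUM(CASE WHEN index_id IN (0, 1) THEN used_page_count ELSE 0 END) AS data_pages,
           SUM(CASE WHEN index_id > 1 THEN used_page_count ELSE 0 END)       AS index_pages
    FROM sys.dm_db_partition_stats
    WHERE OBJECTPROPERTY(object_id, 'IsUserTable') = 1;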

    Also most of the reports I'm currently doing are the "drill me down 8 levels from here" kind to get this specialized data. So depending on where the main filter is on the report I either start top to bottom or the reverse (if I have to split the queries).

    This place is as close to OLTP/OLAP heaven as you can imagine!

  • I was wrong about the good enough plan "warning".

    Gail has some sweet info on it, and the bottom line is that stopping at a "good enough plan found" is better than pushing the optimizer until there is no warning at all (she posted a link in the discussion).

    http://www.sqlservercentral.com/Forums/FindPost1140232.aspx

    The good news is that I didn't use that technique too often, nor did it increase query planning time much (if at all).
