Flat files - any advantages?

  • Jeff,

    The queries are long, not because they are complex, but because they operate on huge amounts of data. 

    We also use tables that are pre-prepared from all the data for some of the more common analyses, as well as views to keep the time down. 

    I do actually think that I have my answer here.  There was valid logic in the decision to leave certain tables denormalized.  We can certainly move things to a more normalized state, but we have to use care. 

    David

     

  • David...

    What do you call "huge?"   How many denormalized rows are we talking about?  I work with a denormalized input from a million customer IVR and there are currently almost 20 million rows in the detail table... the report I create is by hour of the day for each day for a whole month  9 parameter rows times 31 days times 24 time periods for each row including daily totals, etc... AND, it prints a summary... the whole shootin' match takes about 12 seconds even though there are 20 million rows in the table.  To me, 20 million rows is not huge.

    I'm thinking that 4 hours is a bit much for what you're trying to do... credits to Navy beans says RBAR, poorly constructed views, and, perhaps, a misunderstanding of how to use indexes (if there are any)... maybe even the total absence of a primary key which, even when used on a mere Identity column, can speed up a query provided the devlopers haven't killed the ability to use indexes with poorly formed WHERE clauses or poorly formed joins. 

    Anyway, glad your happy with the "denormalized" answer.  Let us know how things turn out...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    I am not sure how much information about the application I can reveal in a post, so I cannot be specific. 

    I think that the major problem that we have is with application function.  Since it is used for data analysis, we have to give the user quite a bit of power in the generation of queries.  The where clause can get very large when certain selections are made.  We may have to figure out a way to eliminate choices that to the user is the simple of clicking a "Select All" button, but could create a very large number of conditions in the where clause. 

    Right now most of the queries that I am generating are against views in several databases (one database for each year since 2000 - there will be one generated for each coming year).  I only hit the other (non-year) databases for the UI display items.  Each view is 32,000,000 rows and 75 columns.  There are no indexes on the view - only a simple identity would make sense, and I am not sure if this would help speed things up. 

    Internally, each year is run on a thread as a separate query on a web service.  When the threads all complete they are combined into a single DataSet.  Simple queries return in seconds.  I have not tested very long running queries simply because I had to get on with development, but testing limits is coming up soon.  We may have to make changes in both the database and application. 

    David

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply