Weird pattern of behaviour

  • Hi, Group.

    I have a data load process that I run to load test data. The application under test runs on WebLogic 9.2 and the data load script is in Python. The backend database can be either Oracle or SQL Server. When I use Oracle I have no problems. When I try SQL Server, however, I see the pattern illustrated in the chart linked below. The chart plots time in minutes on the x-axis and events loaded per minute on the y-axis. You can see that as time goes on, the number of events loaded decreases. The only difference between the Oracle test and the SQL Server test is the database. I use the exact same hardware for both sets of tests; Oracle and SQL Server share the same machine, but only one database is up and running at a time. I'm halfway through a book I bought a couple of days ago (High Performance SQL Server DBA) and all the queries that I have run show nothing wrong (so far; maybe the answer lies in the other half of the book).

    My question is this: does anyone have any suggestions of things to check? Some quick hits would help; I'm more familiar with Oracle than SQL Server, so I don't really know where to start. Another thing to note is that while the data load was underway, the SQL Server process gradually consumed more CPU and memory. By the time I killed the data load, SQL Server was taking about 100% CPU and just under 1 GB of RAM.

    The SQL Server hardware is:

    HP ProLiant DL360 G4 (X86-based PC)

    4x 3.4GHz CPU

    Total Physical Memory 3,583.47 MB

    Thanks, Max

    http://www.bodoin.com/myimages/performance.jpg

  • Does the table being loaded into have many indexes? Does it have a clustered index? If it has a clustered index, does the data being inserted have a natural ascending order (for example, a date/timestamp) that matches the clustered index?

    Is the database using the SIMPLE recovery model? If not, can you change it to SIMPLE for the duration of the load?
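
    If it helps, checking and switching the model looks roughly like this (SQL Server 2005 or later; 'TestDB' is a made-up name, substitute your own):

        -- Check the current recovery model ('TestDB' is hypothetical)
        SELECT name, recovery_model_desc
        FROM sys.databases
        WHERE name = 'TestDB';

        -- Switch to SIMPLE recovery for the duration of the load
        ALTER DATABASE TestDB SET RECOVERY SIMPLE;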

  • There are actually as many as 5 tables being populated, and they have a bunch of indexes on them. The data load consists only of inserts; once data is in the system, it's never updated. I understand that the work of updating the indexes could be a contributing factor to the poor performance, although I don't understand why performance degrades slowly over many hours and then completely falls apart.

    Each table has a clustered index on the PRIMARY KEY which is a sequence number (generated by the application).

    I don't know about the SIMPLE recovery model, so I will need to look into that some more.

    The first time I noticed this problem I decided that using SQL Server's default database settings was probably a bad idea, so I added filegroups and adjusted the sizing accordingly, i.e. made sure the files were big enough that they didn't need constant growth, etc.
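
    For what it's worth, the file setup was along these lines (the filegroup, file names, and sizes here are invented for illustration, not my real values):

        -- Add a dedicated filegroup and a pre-sized data file with a fixed growth increment
        ALTER DATABASE TestDB ADD FILEGROUP EventData;

        ALTER DATABASE TestDB
        ADD FILE (
            NAME = TestDB_Data2,
            FILENAME = 'D:\Data\TestDB_Data2.ndf',
            SIZE = 10GB,
            FILEGROWTH = 512MB
        ) TO FILEGROUP EventData;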

    The ironic thing about this exercise is that this is only the data load. The *real* test was going to be running queries against the loaded data, but I ran into this problem first.

    Cheers, Max

  • Have you confirmed that the data and/or log files are not auto-growing?

    If there is a percentage-based growth increment, it could explain the gradually worsening performance, as each increment grabbed from the filesystem is larger than the last.

    If you aren't in SIMPLE recovery mode, and the inserts aren't in the order of the clustered index (therefore generating large numbers of page splits), you could be generating a large volume of log writes, forcing the log to auto-grow repeatedly.
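
    You can check the current file settings from inside the database with something like this (SQL Server 2005 or later):

        -- File sizes and growth settings; size and growth are stored in 8 KB pages
        SELECT name,
               size * 8 / 1024 AS size_mb,
               CASE WHEN is_percent_growth = 1
                    THEN CAST(growth AS varchar(10)) + ' %'
                    ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
               END AS growth_setting
        FROM sys.database_files;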

  • The files are set to allow autogrowth, but I sized them so that they don't need to grow. I also just checked, and each file still has plenty of free space. I don't use percentage growth; I specify a fixed amount.

    I have switched the database to the SIMPLE recovery model now.

    I'm glad you mentioned page splits, though. The book I am reading devotes just under one page to page splits, so I don't really know what to make of the number. I started a new load about 30 minutes ago and I am seeing over 9,000 page splits (EDIT: just checked, that's page splits/sec, not total page splits). I know from looking at this number before that it climbs to over 100,000. However, I don't know whether that is bad or not; some metrics are relative. I think I can reduce the number of page splits by specifying a fill factor of something less than 100%.
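
    As an aside, the same counter can be read from T-SQL with something like this (the value is cumulative since the server started, so you have to sample it twice and take the difference to get a true per-second rate):

        SELECT counter_name, cntr_value
        FROM sys.dm_os_performance_counters
        WHERE object_name LIKE '%Access Methods%'
          AND counter_name = 'Page Splits/sec';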

    I've also been looking at Windows counters. Here's what I can report:

    1. available memory is over 3 GB

    2. average pages / second is 0.008

    3. average disk queue length is 0.269

    4. processor queue length is 0

    So from looking at that handful of metrics, it doesn't look like the OS is at fault. The trouble is I've been through all of the diagnostic SQL scripts that I have and nothing has raised a flag. If I am to believe the results of these scripts, everything is A-OK, but that's clearly not the case.

    Cheers, Max

  • My data load is loading a lot of data in quick order. I think I have done an OK job of sizing the files; is it possible for me to size a table as well, i.e. the same as Oracle's initial extents, max extents, etc.? Perhaps if I created the table with enough extents to begin with, MSSQL wouldn't need to keep allocating more. I checked some more details:

    Checkpoint pages/sec 63524

    Extent Deallocations/sec 5

    Extents Allocated/sec 1438

    FreeSpace Page Fetches/sec 24

    FreeSpace Scans/sec 24

    Lazy writes/sec 0

    Mixed page allocations/sec 262

    Page Deallocations/sec 21

    Page lookups/sec 30823686

    Page reads/sec 3005

    Page Splits/sec 11333

    Page writes/sec 64573

    Readahead pages/sec 384

    Without really understanding these values, they look OK other than the Page xxx/sec counters. Those numbers seem very high, which probably makes sense since I am blasting the server with data.

    All things being equal, I would like to see a steady load. I'm not trying to compare the performance with Oracle, that's not the goal, but I would like to see a constant level of performance in terms of events loaded per minute.

    Thanks, Max

    EDIT: apologies for the formatting. I spent a while lining it all up, forgetting that the forum software would likely wipe out all the whitespace without special formatting, e.g. tags (if they can be used here).

  • I think that with the high number of page splits I am seeing, coupled with the high percentage of fragmentation, I am going to mess with the fill factor. There doesn't appear to be a rule of thumb, so I just need to jump in. I'm starting with 50 for now; I can fine-tune it later as necessary. Fingers crossed that this solves my problem... for now.
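
    For the record, I'm applying it with something along these lines, one statement per table (the table name here is made up):

        -- Rebuild every index on the table, leaving 50% of each leaf page empty
        ALTER INDEX ALL ON dbo.Events REBUILD WITH (FILLFACTOR = 50);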

  • Quite bizarre. Even after changing the fill factor to 50%, I am getting a lot of page splits, and my indexes are still fragmented after only a couple of minutes of the data load running.
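
    For reference, I'm measuring fragmentation with something like this (SQL Server 2005 or later; 'TestDB' is a stand-in for the real database name):

        SELECT OBJECT_NAME(ips.object_id) AS table_name,
               i.name AS index_name,
               ips.avg_fragmentation_in_percent,
               ips.page_count
        FROM sys.dm_db_index_physical_stats(DB_ID('TestDB'), NULL, NULL, NULL, 'LIMITED') AS ips
        JOIN sys.indexes AS i
          ON i.object_id = ips.object_id
         AND i.index_id = ips.index_id
        ORDER BY ips.avg_fragmentation_in_percent DESC;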

  • Hi, have you tried dropping all indexes before the load and adding them back after you are done?

  • No, I can't do that. The database is going to be in production and the indexes will be required. I understand your point, though; if this were a one-off exercise then that would perhaps be a good idea. I may even test that theory just to see if the pattern changes. Even though it's not a valid solution, it might shed some more light on the problem.
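
    If I do test it, the experiment would look something like this for each nonclustered index (names invented):

        -- Drop the index before the load...
        DROP INDEX IX_Events_EventTime ON dbo.Events;

        -- ...run the load, then recreate the index afterwards
        CREATE NONCLUSTERED INDEX IX_Events_EventTime
            ON dbo.Events (event_time);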

    Last night I created a tuning trace file - 420 MB. I started the Database Engine Tuning Advisor before I left for home, and when I got in this morning it was at exactly the same spot as I left it. I would have hoped that it could tell me something in 14 hours.

    Thanks, Max

  • Have you checked locks and waits? That might shed some light on where the bottleneck is.

    We generally try to load our data off-hours, when we can drop all the indexes and then add them back after the load is done. You should re-index the affected tables after the load anyway.
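
    A quick way to see the top waits (SQL Server 2005 or later; the values are cumulative since the last restart):

        SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
        FROM sys.dm_os_wait_stats
        ORDER BY wait_time_ms DESC;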


  • Yip, there are no blocking locks or deadlocks.
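
    (I'm checking with something along these lines; it comes back empty while the load is running.)

        -- Sessions currently blocked by another session
        SELECT session_id, blocking_session_id, wait_type, wait_time
        FROM sys.dm_exec_requests
        WHERE blocking_session_id <> 0;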

    The primary reason for this data load was so that I would have data to test queries against, but the process of loading it identified this problem. I probably shouldn't have used the term 'data load', because that's confusing. 'Data load' implies a process where (a) a file of data arrives, (b) the data is loaded, and (c) users start using the data. That's not the case here. This data load is representative of a heavy volume of transactions being loaded into the system via the application's APIs. There is nothing that can be done to schedule when data comes into the system; a company with a worldwide presence could be loading data 24 hours a day, so there is potentially no downtime to allow recreation of indexes. Apologies for any confusion that may have caused.

    Thanks, Max

  • Some more questions then:

    - Are there any triggers on the tables being inserted into?

    - Are there referential integrity constraints? Are the tables containing the primary key side of these RI constraints the tables being inserted into?

    Regarding fillfactor: 50% FF is the default. To optimize for data writes (at the expense of reads), you'd set FF lower than 50%. To optimize for reads, you'd set FF to a high percentage.

    Regarding this:

    >>Each table has a clustered index on the PRIMARY KEY which is a sequence number (generated by the application).

    What is the data type/nature of this generated key? Is it in a naturally ascending order, or is it something like a GUID?

    A generated key that has no natural order may not be an optimal candidate for a clustered index and may contribute to the large number of page splits you're seeing.
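
    To illustrate the difference with two hypothetical tables: with a monotonically increasing key, new rows always append to the last page; with a random key such as a GUID, rows land on arbitrary pages and split them.

        -- Monotonic clustered key: inserts append to the end, few page splits
        CREATE TABLE dbo.SeqLoad (
            id      INT IDENTITY(1,1) NOT NULL,
            payload CHAR(200) NOT NULL,
            CONSTRAINT PK_SeqLoad PRIMARY KEY CLUSTERED (id)
        );

        -- Random clustered key: inserts scatter across pages, splitting them
        CREATE TABLE dbo.GuidLoad (
            id      UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
            payload CHAR(200) NOT NULL,
            CONSTRAINT PK_GuidLoad PRIMARY KEY CLUSTERED (id)
        );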


  • 1. no triggers

    2. no constraints

    In fact, there is nothing in the schema except for tables and indexes.

    The fillfactor default is not 50%; it's 0 or 100, which basically amount to the same thing.

    The PRIMARY KEY data types are all integers. The application uses Kodo, which is a JDO implementation. Kodo uses a table called JDO_SEQUENCE to generate all the PRIMARY KEY values for all the tables. I think I will try configuring the tables with NONCLUSTERED PRIMARY KEYs to see if that makes a difference.
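
    Roughly like this (the table and column names are invented, but the idea is the same):

        CREATE TABLE dbo.Events (
            event_id INT NOT NULL,
            event_ts DATETIME NOT NULL,
            payload  VARCHAR(4000) NULL,
            CONSTRAINT PK_Events PRIMARY KEY NONCLUSTERED (event_id)
        );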

    Thanks, Max

    I am running DTA again with a smaller trace file and it seems to be processing fine, but it's still slow. Hopefully I'll have some results from that shortly as well.

    EDIT: DTA completed with no recommendations.

  • Changing the PRIMARY KEYs to NONCLUSTERED has made no difference at all. I am seeing the exact same performance, page splits, and index fragmentation.
