server slows after a couple hours

  • we found an answer to our perfmon problem with the missing sql counters - since we are running x64 windows and 32 bit sql, we need to run perfmon with the /32 switch. Good to know.

  • There is a profile measure for page splits, you should look at that given your insert description

    Can you list the structure of the table? How are you managing the fillfactor of your indexes?

  • CREATE TABLE [dbo].[TableName](

    [col1] [int] IDENTITY(1,1) NOT NULL,

    [col2] [int] NOT NULL,

    [col3] [int] NOT NULL,

    [col4] [int] NOT NULL,

    [col5] [int] NOT NULL,

    [col6] [decimal](10, 6) NOT NULL,

    [col7] [decimal](10, 6) NOT NULL,

    [col8] [decimal](10, 6) NOT NULL,

    [col9] [bit] NOT NULL DEFAULT (0),

    [col10] [bit] NOT NULL DEFAULT (1),

    [col11] [varchar](20) NOT NULL,

    [col13] [datetime] NOT NULL DEFAULT (getdate()),

    [col14] [timestamp] NOT NULL,

    CONSTRAINT [PK_PrimaryKey] PRIMARY KEY CLUSTERED

    (

    [FHLBRateID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE UNIQUE NONCLUSTERED INDEX [UIX_Index] ON [dbo].[TableName]

    (

    [Col3] DESC,

    [Col2] DESC,

    [Col4] DESC,

    [Col5] DESC

    )WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

  • I can't seem to find the page split, can you tell me where to find that?

  • Sql Server:Access Methods

  • IF FHLBRateID is your primary key and not your identity column that would make the possibilty of page splits being more likely. If FHLBRateID isn't the identity and PK, could you move your clustered index to the indentity?

  • FHLBRateID is both the identity and primary key

  • OK, it is not likely page splits then.

    Let's review since we've gone back and forth on this

    1. Has the problem happened since Tempdb was grown?

    2. How is the database set to grow and how much free space does it have?

    3. How long since the last incident?

    4. What is your average time between incidents?

    5. Have you done any tracing with Profiler?

    David

  • the problem just occured again, so growing tempDB didn't seem to have an impact.

    the data file is about 13GB, 6GB free space, set to grow by 25MB, but I just changed it to grow by 10%. I don't think it has been growing because it has a lot of free space, way more than it is set to grow by.

    avg time between incidents is from 1 hour to 12 hours. happens more during the day while users are in system. avg is probably close to 2-3 hours though.

    we have traced and don't see an increase in activity

    we tried a reindex and update of stats on the database that has the impact.

  • So there isn't any indication of an IO spike during the incident, hmmm.

    Can you describe again what is happening during an incident and how long its last. Does it affect everyone, is it system wide?

  • it only seems to affect 1 database. users affected are about 7 out of 150. apps affected are 3 out of 150.

    basically it's just a huge slowdown. a query that should take 1 second takes 2-3 minutes, and that is causing timeouts on the client side. It lasts until we restart the sql service.

    we even tried detaching that database and reattaching it to see if that would clear the issue, but that didn't work. We are trying to figure out why the problem goes away after a restart of the service. what steps does sql go through on a restart of the sevices - We know it clears proc cache, so we tried that manually. no luck there. it also disconnects all the connections - so we tried killing all the connections and that did not help. what else does SQL do when it shuts down and comes back up?

    so we are trying to find the root of the problem, but we are also trying to figure out why stopping and stating sql fixes the issue for a while.

  • I didn't see it mentioned above, you might want to check sys.dm_os_waiting_tasks for the spid running the query and see what it is waiting on.

  • good suggestion. I'll try that next time we get the bahavior and post the results

  • It seems you have reasonably eliminated IO and general performance possiblities. The service starts does a varity of things from flushing users, memory, and clearing temp db etc.

    Since a restart solves the problem that would point to memory issues but that should affect every databases and process on the server. Are you getting any memory or stack dumps in your errorlog?

    If not then, I think you can eliminate memory for now. I have to ask and don't mean to offend but how experienced are you in breaking down the trace data?

    Since it is only affecting certain users and not the entire server that would tend to point away from Sql Server and\or hardware back to your applications and your code. If that is where the problem lies, the trace should show it if your tracing when the incident occurs. Any chance your missing something in the trace analysis?

    David

  • I don't claim to be a trace expert, but I think I have been pretty thorough with them. something I can note here that does point towards SQL and not the app code, is that in the traces, I can see a few sprocs that normally run in 1 second taking 2-3 minutes with the exact same parameters. We are tracing for statements and how long they are taking. is there anything else you think I should be tracing or looking at?

Viewing 15 posts - 16 through 30 (of 42 total)

You must be logged in to reply to this topic. Login to reply