January 7, 2009 at 11:33 am
A while after upgrading our PeopleSoft databases to SQL 2005, odd performance problems are popping up here and there. Things are sometimes running very slowly, sometimes timing out completely.
If you're not familiar with how PeopleSoft uses SQL, it uses API Server Cursors. Here is how a query is run through PeopleSoft on a 5 million row table:
-- statement 1
declare @p6 int
set @p6=8193
exec sp_cursorprepare
@p1 output
,N'@P1 datetime
,@P2 datetime'
,N'SELECT
A.BUSINESS_UNIT
, A.TRANSACTION_GROUP
, (CONVERT(CHAR(10),A.ACCOUNTING_DT,121))
, A.DISTRIB_TYPE
, A.COST_ELEMENT
, A.ACCOUNT
, A.DEPTID
, A.OPERATING_UNIT
, A.PRODUCT
, A.MONETARY_AMOUNT
FROM
PS_CM_ACCTG_LINE A
WHERE A.ACCOUNTING_DT BETWEEN @P1 AND @P2'
,1
,@p5 output
,@p6 output
select @p1, @p5, @p6
-- statement 2
declare @p2 int
set @p2=180159821
declare @p3 int
set @p3=16
declare @p4 int
set @p4=1
declare @p5 int
set @p5=265
exec sp_cursorexecute
1073746738
,@p2 output
,@p3 output
,@p4 output
,@p5 output
,'2008-01-12 00:00:00:000'
,'2008-01-12 00:00:00:000'
select @p2, @p3, @p4, @p5
-- statement 3
exec sp_cursorfetch
180159821
,2
,0
,265
When I run it through PeopleSoft query and capture the workload through profiler, the execution plan shows a clustered index scan (500,000) rows. I can't tell that it ever completes.
When I run the raw query through MSSMS, it takes a few seconds to return 300 rows. The execution plan shows a key lookup (1 row) on the clustered index followed by a scan (3000 rows) of a nonclustered index.
I saw a KB article entitled FIX: SQL Server 2005 performance may be slower than SQL Server 2000 performance when you use an API server cursor. Also I see a few other things here and here on what might be the same issue. I can't tell from any of these though if it applies to my situation.
Anybody have any ideas? Ever seen this?
Thanks!
Rob
January 7, 2009 at 11:40 am
Several things. One, after upgrading, did you update all your statistics on the database? Also, where, from a functional view, are you experiencing the problem? We had issues with PO's in our Finance system after moving to SQL Server 2005 from SQL Server 2000.
What finally helped was when our PeopleSoft admin ran a trace on the PeopleSoft side that we found a fairly large table that wasn't indexed, and once that table was properly indexed, our performance issues went a way. Why things worked fine under SQL Server 2000, we don't know.
I'm going to ask our PeopleSoft Admin to join SSC, and see if she might be able to help you out some here.
January 7, 2009 at 11:56 am
Thanks Lynn. I did update stats after the upgrade. We run jobs nightly that defrag indexes that get over 10% fragmentation.
This particular table has 9 indexes. The clustered index of 5 million rows is .01% fragmented (according to dm_db_index_physical_stats). There are only a few indexes with more than 1% fragmentation and they are all under 100 pages.
Overall we're getting a lot of complaints in seemingly random areas (CRM 360, hourly FS SQR jobs, changing billing status on orders in FS). Updates/inserts seem fine. The problem appears to be with selects. The most problems are when we try to run anything through PeopleSoft query.
In our app server logs we are seeing cmdtux timeouts that correspond with jolt exceptions in the web server logs. These in turn correspond to PeopleSoft query timeouts.
I'm concerned that the execution plan appears to be different when running the query through managment studio vs through PeopleSoft. There is a Oracle MetaLink bug for the issue (ostensibly) but the article is blank and doesn't provide a link to the fix mentioned in the header. Go figure :hehe:
January 7, 2009 at 12:03 pm
Our PeopleSoft developers aren't too happy with the Oracle site for PeopleSoft either. Our Admin appears to be out to lunch at the moment. As soon as I see her (I did email this thread to her) I'll see about getting her to try and help out.
We spent months trying to figure out our issue, I had even shot gunned numerous indexes recommended by DTA, but that didn't help completely. Like I said, having her figure out how to do a trace from the PeopleSoft side really helped to isolate the problem.
January 7, 2009 at 2:27 pm
Is it only your batch processes that are timing out or are you getting timeouts when the users are loading a page on-line (i.e. bringing up a purchase order)?
Also, are the same processes experiencing performance issues or is the same process sometimes slow and sometimes ok? The performance problems that we experienced were irregular and seemed to occur when several things were happening simultaneously. This made the problem very difficult to isolate and reproduce. Since we couldn't reproduce the problem at will, we couldn't test any attempted solutions reliably either.
Which PeopleSoft application and version is experiencing the problem (i.e. Finance, HCM, SCM, CRM)?
Did you upgrade to SQL 2005 on any applications that are performing ok? We upgraded our HCM version 9 to 2005 with no issues. Our 8.9 version Finance system, though, had a lot of performance problems. The primary symptom that we saw on the DB server were long disk IO latch wait times. Is this similar to what you are experiencing?
January 7, 2009 at 3:09 pm
Is it only your batch processes that are timing out or are you getting timeouts when the users are loading a page on-line (i.e. bringing up a purchase order)?
It is happening during both. It appears to happen anytime PeopleSoft must look something up on a big table.
Also, are the same processes experiencing performance issues or is the same process sometimes slow and sometimes ok?
The same processes consistently. We can reproduce the behavior in DEV and STG as well.
Which PeopleSoft application and version is experiencing the problem (i.e. Finance, HCM, SCM, CRM)?
FS and CRM. We are only using FS, CRM and Portal. We use Portal for single-sign-on only.
Did you upgrade to SQL 2005 on any applications that are performing ok?
Both FS and CRM are experiencing the issue.
The primary symptom that we saw on the DB server were long disk IO latch wait times. Is this similar to what you are experiencing?
I do see high IO latch wait. This is caused by the differences in the execution plans that SQL selects when queries are run using sp_cursorprepare/sp_cursorexecute. I'm trying to figure out how to post images to my site so I can show the difference between the execution plans and the indexes being used.
I have one query on a large table I've been using to test. Keep in mind, if I run it using PeopleSoft Query Manager it will bomb as it tries to do a clustered index scan on 500k rows. If I copy/paste the SQL out of PeopleSoft and run it through Management Studio, it does a key lookup first (1 row) and then a non-clustered index seek on 3000 rows. It runs in under 1 sec. I added an index on ACCOUNTING_DT to test performance. It now uses that index and runs instantly through Management Studio. Run it through PS query though and it still does the clustered index scan and bombs. If I run that query using PS Query 2-tier, it won't bomb because my timeout is higher than the PIA. However, I can see that it still uses a bad execution plan and does the clustered index scan.
Why would it use 2 different execution plans depending on if sp_cursorprepare/sp_cursorexecute is used?
BTW - I tried rolling back from SQL Native Client to the SQL Server ODBC links on my app server but it didn't change the behavior.
Edit: I was able to get the following text execution plans.
Here is the plan when I copy/paste the raw SQL and run it through management studio:
|--Compute Scalar(DEFINE: ([Expr1002]=CONVERT(char(10),[FSSTG].[dbo].[PS_CM_ACCTG_LINE].[ACCOUNTING_DT] as ...
|--Nested Loops(Inner Join, OUTER REFERENCES: ([A].[BUSINESS_UNIT], [A].[TRANSACTION_GROUP], [A].[ACCOUNTING_DT], ...
|--Index Seek(OBJECT: ([FSSTG].[dbo].[PS_CM_ACCTG_LINE].[PSHCI_ACCTG_LINE] AS [A]), SEEK: ([A].[ACCOUNTING_DT] ...
|--Clustered Index Seek(OBJECT: ([FSSTG].[dbo].[PS_CM_ACCTG_LINE].[PS_CM_ACCTG_LINE] AS [A]), SEEK: ...
And here is what happens when PeopleSoft runs it:
|--Compute Scalar(DEFINE: ([Expr1002]=CONVERT(char(10),[FSSTG].[dbo].[PS_CM_ACCTG_LINE].[ACCOUNTING_DT] as ...
|--Clustered Index Scan(OBJECT: ([FSSTG].[dbo].[PS_CM_ACCTG_LINE].[PS_CM_ACCTG_LINE] AS [A]), WHERE: ...
January 7, 2009 at 3:52 pm
What application and tools version are you on?
January 7, 2009 at 4:54 pm
CRM ?
FS 8.9
PeopleTools 8.47.14.
I think... I'll double-check when I'm back in the office tom.
Thx,
Rob
January 7, 2009 at 9:40 pm
Run the following query on you system(s) and let us know the results for your PeopleSoft databases.
select
name,
snapshot_isolation_state,
snapshot_isolation_state_desc,
is_read_committed_snapshot_on
from
sys.databases
January 8, 2009 at 7:19 am
master1ON0
tempdb0OFF0
model0OFF0
msdb1ON0
CRMSTG0OFF1
FSSTG0OFF1
PPSTG0OFF1
CRMDMO0OFF1
FSDMO0OFF1
PPDMO0OFF1
CRMDEV0OFF1
FSDEV0OFF1
PPDEV0OFF1
January 8, 2009 at 10:40 am
After talking with our PeopleSoft Admin regarding the issues we had experienced and did to try to correct our issue, and seeing the results of the query I asked you to run; we thought of one thing for you try with one of your development databases.
Turn OFF READ_COMMITTED_SNAPSHOT on a development database and see if that has an affect on your current issue.
Currently, we only have READ_COMMITTED_SNAPSHOT turned on for our HR system which is running version 9.0. Our finance system is on version 8.9 and we have READ_COMMITTED_SNAPSHOT turned off on that database.
READ_COMMITTED_SNAPSHOT when ON makes more use of tempdb and the upgrade instruction we followed indicated that this should only be turned on for a version upgrade of PeopleSoft which we only did with HR (8.8 to 9.0).
Not saying this will solve your problem, but it may be part of it.
January 8, 2009 at 11:35 am
I turned RCSI OFF in FSSTG, tried the query again and it timed out (runs slow through 2-tier, past the 3-tier's timeout).
Interesting to note that this doesn't affect ALL queries... just some. The ones it does affect though, it affects consistently.
PeopleSoft told us to rewrite our queries or increase the PIA timeouts 😛
January 8, 2009 at 11:50 am
In a way, it reminds me of our problem which actually turned out to be a fairly large UNINDEXED table. The problem I had was that the table that turned out to be the problem never showed up when I would run a tuning trace through DTA. It took our PeopleSoft Admin figuring out how to run a trace from the application side to finally id the problem table.
January 8, 2009 at 1:02 pm
Looking through the various CU's for SQL Server 2005 SP2, I see 3 hot bugs regarding server-side cursor performance changes when upgrading from 2000 to 2005. I'm going to apply CU #10 to STG, enable the hotfixes specific to that issue and see if that helps the issue.
January 8, 2009 at 10:59 pm
Sorry, had a few meetings, and then had to leave work for another meeting (non-work related). Let us know if that helps your performance issue.
Also, if you copy the SQL code to a file, you can use that file as input to DTA and see if there are any indexes that you may want to create. Take the suggestions with a grain of salt, don't just implement tham because it says you should, and keep in mind that if you want them as part of the system, you may have to for go covered indexes (included columns). I haven't had any time to figure out how to get those types of indexes into the PeopleSoft system so that the PeopleSoft tools can support them. You really don't want to have to have two different sets of indexes to support.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply