Query execution speed half on anything other than x5570 core 0

Question

Post reply

Query execution speed half on anything other than x5570 core 0

aidan-867828

Valued Member

Points: 74
More actions
August 20, 2009 at 2:51 am

#213576

Hi, I'm really stumped on this and can't find anything to try and resolve it. Any hints of clues as to what to look for would be really appreciated.
I installed 64bit SQL Server 2005 on a single physical CPU Nehalem Xeon x5570 with HT (so 8 virtual cores are shown to the OS) on a Windows Server 2008 Standard edition Dell box.
I noticed the queries were not running as fast as I expected (around the same speed as a previous generation Xeon box) and, after trying everything else I could think of, I messed with the affinity mask settings. My results found that for CPU bound queries (that do not seem to run in parallel anyway) if the CPU bound part schedules on virtual core 0 they complete about twice as fast as if they schedule on any other core. Actually this is true no matter what the affinity mask is set to. But the only way to ensure they schedule on core 0 is to set the processor affinity mask for that core only.
Obviously this doesn't scale well, but I don't really have much experience of this CPU / system setup, so if anyone knows what I'm doing wrong, or how to address this I'd really appreciate it, as most of the queries for the main system running on this are CPU bound.
Thanks.
UPDATE: this is still the case with Hyper Threading turned off and even with some of the physical cores turned off. If the query schedules on core zero it runs twice as fast as any other core.
Additional info, there's nothing more fancy than some large tables, in-line table valued functions, scalar functions and some other views in the database (ie no user external stored procedures, c# or any other non-SQL stuff).

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

aidan-867828 Valued Member Points: 74 More actions · Answer 1

Added SQL Server 2005 SP3 but still the same issue. Anyone got any ideas?

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

No idea offhand. This might be something worth contacting customer support about. I could understand if the Hyperthreading was enabled, but not when it's disabled.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

aidan-867828 Valued Member Points: 74 More actions · Answer 3

I'm starting to wonder if it's something to do with the non uniform memory architecture of the Xeon x5570. There's 6GB of ram on this box. I don't know how it's physically installed yet (I'll open the box this weekend). Do you know if there are any known issues with this and SQL Server?

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

SQL from 2005 onwards handles the NUMA architecture just fine. However, with 4 cores (how many physical processors?) and 6 GB memory, I don't see how the memory can be equal for each NUMA node, unless it's 2 physical processors with 3GB memory each.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

aidan-867828 Valued Member Points: 74 More actions · Answer 5

One physical CPU (quad core). Is there going to be any solution with 6GB, (ie if I find one physical memory module) Or does it sould like I am going to need to change the memory setup?

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

I'm not familiar with the x5570 and I'm not as familiar with NUMA as I would like to be. As far as I know, a NUMA node would consist of one physical cpu and a certain amount of memory. Could be mistaken there. I suggest you investigate Intel's technical specs of the x5570 for details on how it behaves and how the NUMA nodes are set.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

aidan-867828 Valued Member Points: 74 More actions · Answer 7

aidan-867828

Valued Member

Points: 74

September 3, 2009 at 3:38 am

#1048604

Update.

Memory is 3x2GB Dual Rank UDIMMs 1333Mhz

bcronce SSCrazy Points: 2597 More actions · Answer 8

NUMA:

The OS defaults to allocating memory from the current NUMA node that the thread is executing. If the thread changes nodes or runs out of available memory on the current node, it will then allocate memory from a different node.

I know SQL2008 is NUMA aware and tries to take advantage of reduced latency/extra bandwidth of NUMA, but there shouldn't be a huge diff between which node you're on.

The QPI connect between the nodes is 25.6 GB/s and very low latency. It is still additional latency, so you will get reduced performance and possible contention for the other node trying to access it's own data, but it should still be peppy

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 9

1) check the sql log just after startup for NUMA messages. If you don't see them then it isn't doing it I think, at least soft numa. I doubt that a computer with a single CPU would be configured for hardware numa.

2) NUMA can be a collection of CPUs per node, not just one.

3) IIRC it is (or used to be) best to have match pairs of memory modules. You have an odd number. not sure if this is relevant in your architecture or not.

4) I would be on the phone to both hardware manufacturer and Microsoft on this one.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Paul White SSC Guru Points: 150468 More actions · Answer 10

Hi Aidan,

If you could run the following and upload the results it would help enormously.

Thanks

Paul

-- scheduler info (inc. NUMA)

select parent_node_id,

scheduler_id,

cpu_id,

[status],

is_online,

is_idle,

preemptive_switches_count,

context_switches_count,

idle_switches_count,

current_tasks_count,

runnable_tasks_count,

current_workers_count,

active_workers_count,

work_queue_count,

pending_disk_io_count,

load_factor,

yield_count,

last_timer_activity,

failed_to_create_worker

-- quantum_length_us

from sys.dm_os_schedulers

where scheduler_id < 256

order by

scheduler_id;

-- useful system info - as seen by SQL Server

select cpu_count,

hyperthread_ratio,

physical_memory_in_bytes,

virtual_memory_in_bytes,

bpool_committed,

bpool_commit_target,

bpool_visible,

stack_size_in_bytes,

os_quantum,

os_priority_class,

max_workers_count,

scheduler_count

from sys.dm_os_sys_info;

-- selected configuration options

select name,

value,

value_in_use,

is_dynamic

from sys.configurations

where name in

(

'max worker threads',

'priority boost',

'affinity mask',

'min server memory (MB)',

'max server memory (MB)',

'lightweight pooling',

'affinity I/O mask'

)

order by

name asc;

-- wait stats (cpu)

select top (10)

wait_type,

waiting_tasks_count,

wait_time_ms,

max_wait_time_ms,

signal_wait_time_ms

from sys.dm_os_wait_stats

order by

signal_wait_time_ms desc;

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

aidan-867828 Valued Member Points: 74 More actions · Answer 11

Hi Paul

Because of the issues, at the moment the server has cores 2 and 3 disabled in the bios with cores 0 and 1 enabled and the affinity masks set with core 0 for processor and core 1 for I/O. So I'm not sure how useful these results will be. When I get a chance I can re-boot the machine and reset the bios and affinity masks to default (or another setting if you would prefer). But as it currently is here are the results:

parent_node_id scheduler_id cpu_id status is_online is_idle preemptive_switches_count context_switches_count idle_switches_count current_tasks_count runnable_tasks_count current_workers_count active_workers_count work_queue_count pending_disk_io_count load_factor yield_count last_timer_activity failed_to_create_worker

-------------- ------------ ------ ------------------------------------------------------------ --------- ------- ------------------------- ---------------------- ------------------- ------------------- -------------------- --------------------- -------------------- -------------------- --------------------- ----------- ----------- -------------------- -----------------------

0 0 0 VISIBLE ONLINE 1 0 271357 2537913 3041200 12 0 22 11 0 0 15 8259666 708727658 0

0 1 1 VISIBLE OFFLINE 0 1 2 5 792 3 0 4 1 0 0 1 797 708408901 0

64 255 255 VISIBLE ONLINE (DAC) 1 1 4 9 794 2 0 3 1 0 0 1 801 708410336 0

(3 row(s) affected)

cpu_count hyperthread_ratio physical_memory_in_bytes virtual_memory_in_bytes bpool_committed bpool_commit_target bpool_visible stack_size_in_bytes os_quantum os_priority_class max_workers_count scheduler_count

----------- ----------------- ------------------------ ----------------------- --------------- ------------------- ------------- ------------------- -------------------- ----------------- ----------------- ---------------

2 2 6431227904 8796092891136 165568 585924 585924 2093056 40000 128 512 1

(1 row(s) affected)

name value value_in_use is_dynamic

----------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------

affinity I/O mask 2 2 0

affinity mask 1 1 1

lightweight pooling 0 0 0

max server memory (MB) 2147483647 2147483647 1

max worker threads 0 0 0

min server memory (MB) 0 0 1

priority boost 1 1 0

(7 row(s) affected)

wait_type waiting_tasks_count wait_time_ms max_wait_time_ms signal_wait_time_ms

------------------------------------------------------------ -------------------- -------------------- -------------------- --------------------

SLEEP_BPOOL_FLUSH 51251 38657 31 11762

WRITELOG 87202 662395 936 5943

ASYNC_NETWORK_IO 929545 1021338 1996 4773

PAGEIOLATCH_EX 96972 74942 546 4290

SLEEP_TASK 432690 1030449 93 4196

LAZYWRITER_SLEEP 726376 1414146146 85544474 1138

PAGEIOLATCH_SH 17446 47127 374 795

PAGELATCH_UP 591 9781 530 234

LATCH_EX 61 1466 109 140

BACKUPBUFFER 8341 21403 327 109

(10 row(s) affected)

Paul White SSC Guru Points: 150468 More actions · Answer 12

I'll have a look at the results you posted, but the site formatting makes it kinda hard 🙂

It may pay to save the results to a CSV file (right-click and save as) and attach that instead?

It would certainly be better to have the results when everything is reset to default.

One last question: which exact OS are you running?

Thanks anyway!

Paul

edit: It is rarely a good idea to enable the 'priority boost' option - when you reset to default, please don't forget this one.

It would also be cool to get the results of my sys.dm_os_schedulers query when your query is executing on core zero, and then with it running on a problematic core. If you are able to get a third reading when allowing it to run in parallel, that would be extra-awesome!

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

I second the priority boost comment. I also note max memory is defaulted - not best practice, but likely not a culprit here.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

aidan-867828 Valued Member Points: 74 More actions · Answer 14

Sorry about the previous text format. I couldn't get IE 8 to upload anything (I'm on another machine using IE 6 and it seems to work from here).

The machine is Windows Server 2008 Standard Edition SP 2

Dell Server R610 X5570 2.93Ghz.

Attached are the files. For the file with the results from Core 0 executing the query, I had to use the affinity mask to make core 0 execute (so I hope that doesn't invalidate the results).

Thanks for your help.