Two new TPC-E OLTP benchmark submissions have shown up in the last month or so, since I last recapped three recent submissions back on June 15. The TPC-E benchmark is a very useful comparison and sizing tool for OLTP workloads. Assuming you have enough I/O performance to drive the workload, TPC-E performance is primarily limited by CPU performance. The TPC-E benchmark is not as dependent on a completely unrealistic, oversized I/O subsystem as the older TPC-C benchmark was. You are also not allowed to use RAID 0 for your storage subsystem with TPC-E (like you could with TPC-C).
The TPC-E benchmark has results going back to 2007, with a number of different processor families and generations represented from both Intel and AMD (although there are only a few results for systems with AMD processors, which tells you something pretty significant). This means that you can probably find a submitted system that has a very similar set of processors, if not an exact match, for an existing or planned system that you want to compare.
The first benchmark result is for an IBM System x3850 X5 submitted on June 27, 2011, that has a score of 2,862.61 tpsE. This is for a four socket server, with the 10-core, 2.4GHz Intel Xeon E7-4870 processor. With hyper-threading enabled, this gives you 80 logical processors, along with 1TB of RAM. This system has a total of 105 spindles, primarily using Direct Attached Storage (DAS), which is not an insane amount by any means, especially since they are only 10K SAS drives in six RAID 5 arrays. It is also using an 11.6TB initial database size.
The second benchmark result is for a Fujitsu PRIMEQUEST 1800E2 submitted on July 27, 2011, that has a score of 4,414.79 tpsE. This is for an eight socket server, with the 10-core, 2.4GHz Intel Xeon E7-8870 processor. With hyper-threading enabled, this gives you 160 logical processors, along with 2TB of RAM. This system has a total of 360 spindles, using Direct Attached Storage (DAS). There are 360 64GB SLC SSDs, with RAID 5 for the data files and RAID 10 for the log file. It is also using an 18.4TB initial database size.
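The logical processor counts quoted for both systems follow from simple arithmetic. Here is a minimal sketch, assuming hyper-threading exposes two logical processors per physical core (the function name is just for illustration):

```python
# Assumption: hyper-threading exposes 2 logical processors per physical core.
THREADS_PER_CORE = 2

def logical_processors(sockets, cores_per_socket, hyper_threading=True):
    """Return the logical processor count for a server."""
    threads = THREADS_PER_CORE if hyper_threading else 1
    return sockets * cores_per_socket * threads

# Four-socket and eight-socket systems with 10-core processors, as in the text
print(logical_processors(4, 10))  # IBM System x3850 X5
print(logical_processors(8, 10))  # Fujitsu PRIMEQUEST 1800E2
```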
These are actually two pretty interesting benchmark submissions. The Fujitsu system is essentially twice the size of the IBM system (eight sockets vs. four sockets), with twice the RAM (2TB vs. 1TB), and what would seem to be a much more powerful I/O subsystem (SLC SSDs vs. 10K SAS drives), with over triple the spindle count for the bigger system, along with RAID 10 for the log file.
Yet despite all of this, the eight socket system does not have a tpsE score that is twice as high as the four socket system. A score of 4,414.79 tpsE is only 1.54 times higher than a score of 2,862.61 tpsE. This means we do not see linear scaling as we go from four sockets to eight sockets using the current top of the line Intel Xeon E7 processor.
This is probably a limitation of Non-Uniform Memory Access (NUMA), since we did see much closer to linear scaling going from two sockets to four sockets with this same Intel Xeon E7 family. There was a recent benchmark for an IBM System x3690 X5 system submitted on May 27, 2011, that has a score of 1,560.70 tpsE. This is for a two socket server, with the 10-core, 2.4GHz Intel Xeon E7-2870 processor. With hyper-threading enabled, this gives you 40 logical processors, along with 512GB of RAM. A score of 2,862.61 tpsE is actually 1.83 times higher than a score of 1,560.70 tpsE. This is much closer to linear scaling as you go from a two socket system to a four socket system.
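If you want to check the scaling numbers yourself, here is a small sketch that uses only the three tpsE scores quoted above. The "efficiency" figure is my own illustrative measure (actual speedup divided by the ideal 2x from doubling the socket count), not something reported by TPC.

```python
# tpsE scores quoted in the text, keyed by socket count
scores = {2: 1560.70, 4: 2862.61, 8: 4414.79}

def scaling(small, large):
    """Return (speedup, efficiency) going from `small` to `large` sockets.

    Efficiency is the speedup divided by the ideal linear speedup.
    """
    speedup = scores[large] / scores[small]
    ideal = large / small
    return speedup, speedup / ideal

for small, large in [(2, 4), (4, 8)]:
    speedup, efficiency = scaling(small, large)
    print(f"{small} -> {large} sockets: {speedup:.2f}x, "
          f"{efficiency:.0%} of linear")
```

By this measure, the jump from two to four sockets gets about 92% of linear scaling, while the jump from four to eight sockets gets only about 77%.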
What this means for you is that you would probably be better off (from an overall CPU capacity perspective) with two four socket servers instead of one eight socket server, assuming you can split or partition your workload between the servers. By the same logic, you would also be better off with two two socket servers instead of one four socket server, although the advantage is smaller there.
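To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the workload partitions perfectly across servers with no overhead, so adding up tpsE scores is only an approximation of combined CPU capacity, not a valid TPC-E result. The function name is hypothetical.

```python
# tpsE scores quoted in the text, keyed by socket count
scores = {2: 1560.70, 4: 2862.61, 8: 4414.79}

def split_gain(small_sockets, count, large_sockets):
    """Compare `count` smaller servers against one larger server.

    Assumes a perfectly partitioned workload (an idealization).
    Returns (combined tpsE, ratio versus the single larger server).
    """
    combined = count * scores[small_sockets]
    return combined, combined / scores[large_sockets]

for small, large in [(4, 8), (2, 4)]:
    combined, ratio = split_gain(small, 2, large)
    print(f"Two {small}-socket servers: {combined:,.2f} tpsE combined, "
          f"{ratio:.2f}x one {large}-socket server")
```

Under that assumption, two four socket servers come out roughly 30% ahead of the eight socket system, while two two socket servers come out roughly 9% ahead of the four socket system.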