October 6, 2010 at 8:13 am
The manufacturer is STEC, judging by the model number; it's from their Zeus range.
The process is a heavy user of TempDB. We also tried putting user database log and data files on the SSD, with no advantage noted.
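For anyone wanting to quantify how heavily a process leans on TempDB, a minimal sketch against the standard space-usage DMV (run it from anywhere while the ETL is executing; the breakdown into user/internal objects and version store is what matters):

-- Rough breakdown of current TempDB space usage, in MB
SELECT SUM(user_object_reserved_page_count)     * 8 / 1024.0 AS user_objects_mb,
       SUM(internal_object_reserved_page_count) * 8 / 1024.0 AS internal_objects_mb,
       SUM(version_store_reserved_page_count)   * 8 / 1024.0 AS version_store_mb,
       SUM(unallocated_extent_page_count)       * 8 / 1024.0 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;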
---------------------------------------------------------------------
October 6, 2010 at 10:43 am
It's kind of bizarre that you're not seeing any improvement. One of my colleagues at this client site told me they had moved TempDB to an SSD and saw incredible performance gains. I've also heard anecdotally through the years that it works wonders. If you operate on the assumption that the SSD isn't the problem then I suppose you have to examine the layers above the SSD.
1. Is it part of the SAN? If so, is it fibre channel or iSCSI?
2. Is there a storage cache between the server and the SSD? Most SANs provide large caches and write to the disks asynchronously.
3. Is the SSD properly configured within the storage array, i.e. does the SSD work well and play well with the software that runs the storage array, SAN or otherwise?
4. How is the network performing?
Some older SANs may have some inherent hardware limitations, for example, all devices must be configured as RAID 5.
I'm sure there are more but that's all I can think of off the top of my head. Good luck.
"Beliefs" get in the way of learning.
October 6, 2010 at 3:30 pm
Robert Frasca (10/6/2010)
It's kind of bizarre that you're not seeing any improvement. One of my colleagues at this client site told me they had moved TempDB to an SSD and saw incredible performance gains. I've also heard anecdotally through the years that it works wonders. If you operate on the assumption that the SSD isn't the problem then I suppose you have to examine the layers above the SSD.
1. Is it part of the SAN? If so, is it fibre channel or iSCSI?
2. Is there a storage cache between the server and the SSD? Most SANs provide large caches and write to the disks asynchronously.
3. Is the SSD properly configured within the storage array, i.e. does the SSD work well and play well with the software that runs the storage array, SAN or otherwise?
4. How is the network performing?
Some older SANs may have some inherent hardware limitations, for example, all devices must be configured as RAID 5.
It's fibre channel with a 256 GB cache (IIRC), and no known configuration problems. The SAN is all carved up as RAID 5, but the SSD itself is RAID 0 (this was just a test; we had no issues but wanted to see if the SSD helped while we had one available).
The SAN is very new.
---------------------------------------------------------------------
October 7, 2010 at 9:09 pm
I had a client switch over to SSDs for most of the storage on the instance. The first database moved to SSD was TempDB. We didn't see much improvement right off. Then the heavy workload hit and it made a remarkable difference. The server had 32 GB RAM, and it seemed that most of the TempDB work before the heavy workload hit was actually being done in memory. When memory pressure forced physical IO to TempDB, the performance difference was very noticeable.
Todd Fifield
October 8, 2010 at 3:30 am
Just for info: the ETL process we used during the testing runs for about 20 hours and shifts about 200 GB of data (that's the size of the database it creates at the end of it).
Our feeling is that, as the SAN has a 256 GB cache, all the activity takes place in the cache, so the speed of the disk is taken out of the equation.
The server itself has 38 GB of memory assigned to SQL Server (64-bit).
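One way to sanity-check the feeling that the cache is absorbing everything is to look at per-file latency with the standard file-stats DMV; a minimal sketch (the join to sys.master_files is just to get the file names, and what counts as "low" latency is your call):

-- Average latency per read/write for every file since the last restart.
-- If TempDB and the user database files show low ms/read and ms/write,
-- the disks (SSD or not) are unlikely to be the bottleneck.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_ms DESC;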
---------------------------------------------------------------------
October 8, 2010 at 6:07 am
george sibbald (10/8/2010)
The server itself has 38 GB of memory assigned to SQL Server (64-bit).
Is that server supporting any other apps besides SQL Server? How much memory is on the machine total? Are there multiple instances of SQL Server on the server?
Is it a physical server or a virtual server? How many CPUs?
Does the SQL Server service user account have the "lock pages in memory" user right?
How much memory is SQL Server actually using?
Where is the source of the data that the ETL processes are using to create the database? The same server, a different server? It's possible that your bottleneck isn't TempDB, it's the source.
Speaking of bottlenecks, have you used perfmon to figure out where your bottleneck is? Are you seeing a lot of queued I/Os on the SSD or any other devices? Memory pressure? CPU pressure? What is your MAXDOP setting? (Parallelism can be a performance killer in certain scenarios.) Are there indexes on the target tables? (If so, you might drop the indexes, load the data and then add the indexes back.) The wait-stats query below is one quick place to start.
I'm just curious because something doesn't seem right so I figured I'd ask as many questions as I could think of and maybe it will trigger some brain cells for you and others.
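As a concrete starting point, a minimal top-waits query against sys.dm_os_wait_stats (a standard DMV; the wait types filtered out below are just an illustrative subset of the benign ones):

-- Top waits since the last restart (or since the stats were last cleared)
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0        AS wait_time_sec,
       signal_wait_time_ms / 1000.0 AS signal_wait_sec,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TASK_STOP',
                        'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'CHECKPOINT_QUEUE')
ORDER BY wait_time_ms DESC;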
"Beliefs" get in the way of learning.
October 8, 2010 at 6:30 am
Could you give IOMeter a shot on your previous disks and on the new SSD to see the differences?
Only test 8K and 64K transfer sizes, read and write, sequential and random, with up to as many concurrent outstanding I/Os as you have CPU cores.
How many disks is the SSD replacing? An SSD might be very good, but enough spindles will end up doing the same job.
Are you alone on your SAN, or is the SAN under other load?
If you are alone you might end up using all the cache, which is a situation that will not happen once you are using most of your SAN (even if that amount of cache seems huge, it's actually tiny compared to the full size of the SAN).
October 8, 2010 at 7:48 am
Thanks for all the interest guys, I appreciate it, but the SSD was only on temporary loan, so we thought we would have a look to see what it bought us while we had the chance. We were not suffering performance problems; we just wanted to try this out.
I seem to have monopolised the OP's thread, sorry about that.
The server is a pretty meaty BL460c with 48 GB RAM in total and 2 x 6-core CPUs, 64-bit throughout. SQL Server is capped at 38 GB and uses all of that. Only SSRS is also on the server. Lock pages in memory is not set, but I will be trying that.
The only pressure we have seen is some paging, so we will be increasing the memory allocation, eventually sticking 64 GB in the production box. IO wait types don't even register in the top ten. The SAN is already shared by other servers.
I don't think we have missed anything, although like you the results were not what I was expecting. I just don't think this setup is stressed by the load we are putting through it, and if we want any improvements they would have to come from the coding/ETL process design.
It probably shows that if the disks are not your problem (specifically, no waiting and good figures for sec per read and write), an SSD might not be a silver bullet. Having said that, when costs come down they will still be a no-brainer, and one day they will be the only type you can get. 🙂
---------------------------------------------------------------------
October 8, 2010 at 8:21 am
I would definitely make sure the "lock pages in memory" right is granted. That is a strongly recommended best practice for 64-bit SQL Server.
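If you do grant it, one way to confirm SQL Server is actually using locked pages is sys.dm_os_process_memory (available on SQL Server 2008 and later); a minimal sketch, where "greater than zero" is simply how you'd read the result:

-- locked_page_allocations_kb > 0 means the buffer pool is using locked pages
SELECT physical_memory_in_use_kb / 1024  AS physical_memory_in_use_mb,
       locked_page_allocations_kb / 1024 AS locked_pages_mb
FROM sys.dm_os_process_memory;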
It does sound like you might be able to gain some efficiencies by revisiting the architecture of your ETL processes. It's been my experience that there are lots of little things that can make a big difference. While I'm sure you're aware of all or most of these things, sometimes these things can slip through the cracks.
1. Make sure the target database is in the simple or bulk-logged recovery model to enable minimally logged operations.
2. Try to commit with some frequency during large table loads. If you try to load ten million rows without a commit, it can cause interesting things to happen, particularly if you have to roll back after 9,999,999 rows. I usually try to commit every 50,000 to 100,000 rows (see the batching sketch after this list). It also lets you make the process restartable.
3. As I mentioned earlier, drop any indexes on tables being loaded and add them back in after the load. Pre-sorting your data prior to load can be advantageous as well.
4. Make sure your log file is pre-allocated to a size that can handle the entire load. If the log has to keep extending it will kill performance but it won't be clearly visible.
5. Obviously, you know about isolating TempDB, but if you haven't already you might consider creating multiple TempDB data files (see the TempDB sketch after this list). Some say you should have one file per CPU as a rule of thumb; others say one for every two CPUs is adequate. I believe Paul Randal blogged about this a year or two ago. It really depends on your load.
6. Running load jobs in parallel can be really efficient particularly if the tables are in separate filegroups and the filegroups are on separate storage devices.
Obviously, this is not a comprehensive list and factors such as foreign key constraints, the complexity of the transformations, and the quality of the data architecture can all play a role. This is just a short list of some of the things I look at first when I'm on ETL performance consulting engagements.
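To make points 1 and 2 concrete, here's a minimal sketch of a batched, restartable load. The names (TargetDW, dbo.StagingOrders, dbo.Orders) are made up purely for illustration, and the 50,000 batch size is just the rule of thumb mentioned above:

-- Assumes the target database has already been put into a minimally logged model, e.g.
-- ALTER DATABASE TargetDW SET RECOVERY BULK_LOGGED;   (TargetDW is a made-up name)
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move one batch from staging into the target and delete it from staging,
    -- so a failed run can simply be restarted and picks up where it left off.
    DELETE TOP (50000) s
    OUTPUT deleted.*
    INTO dbo.Orders
    FROM dbo.StagingOrders AS s;

    SET @rows = @@ROWCOUNT;

    COMMIT TRANSACTION;
END;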
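And for point 5, each extra TempDB data file is just an ALTER DATABASE. The drive letter, file names and sizes below are made up, and the number of files should follow whichever per-CPU rule of thumb you settle on; keeping the files equally sized lets TempDB's proportional-fill algorithm spread the IO evenly:

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 4096MB, FILEGROWTH = 512MB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf',
          SIZE = 4096MB, FILEGROWTH = 512MB);
-- ...and so on, for as many files as the rule of thumb calls for.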
"Beliefs" get in the way of learning.
October 8, 2010 at 9:23 am
Robert, thanks, we have ticks in most of those boxes; some I am not sure about as I have not seen the code.
As for the lock pages in memory I use this as my bible:
It refers to 'possible' improvements with 64-bit, so this being our first large-scale 64-bit server I started without it, as it's not required to actually access the memory, but I will do a comparison run.
---------------------------------------------------------------------