First thing, there are no bad operators, just bad parents, uh, I mean query writers, or database designers, or ORM tools. Why do I say this? Because all the operators within a query execution plan serve a purpose. They are there to fulfill a task. Depending on where and when you see them, they’re doing exactly what you ask of them. The issues come up because you’re asking them to do a task that they may not be well suited for. This comes from inappropriate structures and inappropriate code. Lazy spools are not really bad (that was just link bait). In fact, depending on the query, what’s being done, how you’re retrieving data, what data is being retrieved, the lazy spool is actually awesome. But, it’s good to know what these things are doing because, like magic, the lazy spool comes with a price.
The spool operators within SQL Server are indications of the need to store data for reuse. That’s it. A spool is just gathering up information, putting it into a temp table so that it can use that data again. Key word and trick phrase in that last sentence, temp table. Again, depending on the needs of the query, temporary storage is a great solution. BUT, you are creating more stuff into tempdb, additional overhead to your disks, all that’s associated with this. So if your system is already under load in this area, depending on the query in question, you may need to adjust in order to eliminate the spool. The use of tempdb is the price you pay for the lazy spool.
A lazy spool is a non-blocking operator. That means that it doesn’t stop the flow of data while it gets loaded. Instead, it acts more or less as a pass through, letting data flow through it as needed by the downstream operations. That makes a lot sense when you consider that the lazy spool is loaded in a “lazy” fashion, or, as requested. The eager spool for example stops all operations until it completes loading it’s temporary object, loading in an “eager” fashion. Once data is loaded into the lazy spool, it can be used by other operators within the plan.
Let’s see this in action. I’m going to call a recursive CTE query from AdventureWorks2012
EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 9
This query results in the following execution plan (click on it to expand it).
Reading plans that involve spools, is where you really see the need to understand both the logical and physical processing order within an plan. The reason I say that is that if you just read this from right to left, the physical order, you’re likely to think that the spool is being loaded all the way over there on the right. Instead, it’s being loaded by Node ID 5, which I show below.
This shows how the logical processing order, left to right, like reading a book, has to be taken into account to deal with query plans. Now, in this case, the spool is how the optimizer is dealing with the recursive nature of the query. It loads data into a spool operator in a lazy fashion and that data is there when needed later within the operations of the query. In short, the lazy spool in this case is doing a good thing and helping you out.
So, why the lazy spool is bad? Well, it isn’t. It’s just not free.
For more on spools, you can read an older post of mine here, excellent information from Fabiano Amorim here, and an example of forcing a spool to improve performance by Paul White here.
Want to talk more query tuning? Let’s get together in Dallas on November 1st, 2013 where I’ll be doing an all day pre-conference seminar before SQL Saturday 255.