I have a web app that recently timing out and the timing out exception is actually coming from the SQL Server. When I run the execution plan I found that there is one item called Table Spool (Lazy Spool) which is costing about 20%. I thought it was caused by my full text search but when I drilled down further more is because of DISTINCT keyword. So I decided to change to use GROUP BY instead. In my case it is only a single column so It won’t make any difference at all. Once I’ve changed that my web application running fast and no more timeout
I got this explanation from this website
3) We do an all-AMPs RETRIEVE step from … by way of an
all-rows scan with no residual conditions into Spool x
(group_amps), which is redistributed by hash code to all AMPs.
Then we do a SORT to order Spool 1 by the sort key in spool field1
eliminating duplicate rows.
First there’s a redistribution, then duplicate rows are removed:
Efficient, if there are just a few rows per value [per AMP].
Spool size is always about the same, but may be extremely skewed → 2646: No more Spool Space
Explain Group By:
3) We do an all-AMPs SUM step to aggregate from … by way
of an all-rows scan with no residual conditions, and the grouping
identifier in field 1025. Aggregate Intermediate Results are
computed globally, then placed in Spool x.
First each AMP removes duplicate rows locally (first aggregate) and hashes/redistributes the resulting
rows, then there’s a second aggregation to remove duplicate rows:
Efficient, if there are lots of rows per value [per AMP].
Large number of rows per value Spool → small spool size
Small number of rows per value Spool → large spool size
Spool is never skewed.
Other interesting fact quoted from this article/discussion
INDEXING: Take a look at your indices to make sure that they’re all covering the columns that you’re selecting out of the tables. You’ll want to aim to get all the columns included in JOINs and WHERE clauses within the indices. All other columns that are in the SELECT statements should be INCLUDEd, or covered, by the index.
OPERATORS: See if you can get rid of the not equals (“<>”) operators, in favor of a single greater than or less than operator. Can this statement
and T.CurrentHorizon <> 0 be changed to this
and T.CurrentHorizon > 0?
JOINS: Get rid of the subqueries that are JOINing to tables outside of themselves. For instance, this line
and FV2.elementId = FV.elementID might be causing some problems. There’s no reason you can’t move that out of a subquery and into a JOIN to
dbo.aowCollectedFact FV, given that you’re GROUPing (DISTINCT) in the main query already.
DISTINCT: Change it to a GROUP BY. I’ve got no reason other than, because it’s good practice and takes two minutes.
LAST NOTE: The exception to all the above might be to leave the final subquery, the
IF NOT EXISTS, as a subquery. If you change it to a JOIN, it’ll have to be a
LEFT JOIN...WHERE NULL statement, which can actually cause spooling operations. No great way to get around that one.