November 30, 2008 at 4:03 pm
I'm looking for an efficient method of creating grouped totals as displayed on the left side of this page:
http://www.vast.com/cars/used-for-sale-Ford
Here is another example:
http://www.newegg.com/Store/SubCategory.aspx?SubCategory=14&name=Internal-Hard-Drives
Currently, I store my initial search results in a temp table and then re-query it for each of the various "reports" shown on the left side. However, under heavy traffic loads I run into deadlocks. Is there a better way to perform this type of reporting after a database search? I need something that is lightning fast, like the two sites above.
November 30, 2008 at 5:25 pm
I don't see anything obviously wrong with your current method; however, the details that you have given us are very sketchy. Please read this article on how to get better results from these forums and then post your table definitions and the code that you are currently using: http://www.sqlservercentral.com/articles/Best+Practices/61537/
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 30, 2008 at 5:27 pm
I just want to add that there is no obvious reason why you should be getting deadlocks with this approach. This leads me to believe there is more to this, and thus we need more information.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
November 30, 2008 at 8:47 pm
I've written up some sample code to illustrate what I have so far. There seems to be a lot of overhead in inserting the records into the temp table just to re-query them. Although this sample is small, consider an initial query that could return 500,000 to 1,000,000 records to be inserted into a temp table. My initial theory was that re-querying the temp table would be faster than re-querying the main table, because the average search returns far fewer records than the main table contains. In the scenario below, the #CarData temp table stands in for the main table I would query.
if object_id('tempdb.dbo.#CarData') IS NOT NULL
drop table [#CarData]
--Create temp table to hold initial results
CREATE TABLE #CarData
(
ID INT,
[Year] smallint,
Make varchar(50),
Model varchar(50),
Price INT,
Miles INT
)
INSERT INTO #CarData VALUES (37161, 1990, 'Oldsmobile', 'Cutlass Supreme', 10000, 10000)
INSERT INTO #CarData VALUES (37166,2004,'Nissan','Maxima',22995,41330)
INSERT INTO #CarData VALUES (37167,2003,'Pontiac','Vibe',13595,27855)
INSERT INTO #CarData VALUES (37168,2007,'Nissan','Altima',22745,1)
INSERT INTO #CarData VALUES (37170,2007,'Nissan','Altima',22745,1)
INSERT INTO #CarData VALUES (37179,2007,'Cadillac','DTS',48005,1)
INSERT INTO #CarData VALUES (37188,2007,'Nissan','Altima',26085,1)
INSERT INTO #CarData VALUES (37194,2007,'Cadillac','DTS',46440,1)
INSERT INTO #CarData VALUES (37196,2006,'Hyundai','Sonata',15995,22885)
INSERT INTO #CarData VALUES (37197,2006,'Chevrolet','Malibu',14595,30952)
INSERT INTO #CarData VALUES (37200,2006,'Saturn','Ion',14995,32207)
INSERT INTO #CarData VALUES (37201,2006,'Chevrolet','Malibu',13995,30201)
INSERT INTO #CarData VALUES (37202,2006,'Pontiac','Grand Prix',17995,29625)
INSERT INTO #CarData VALUES (37203,2006,'Pontiac','Grand Prix',14995,28058)
INSERT INTO #CarData VALUES (37204,2006,'Chevrolet','Impala',15595,27580)
INSERT INTO #CarData VALUES (37205,2007,'Nissan','Maxima',33475,1)
INSERT INTO #CarData VALUES (37214,2002,'Ford','Focus',9995,68191)
INSERT INTO #CarData VALUES (37216,2007,'Pontiac','Grand Prix',22580,1)
INSERT INTO #CarData VALUES (37217,2007,'Nissan','Versa',16765,1)
INSERT INTO #CarData VALUES (37218,2007,'Nissan','Versa',16765,1)
INSERT INTO #CarData VALUES (37219,2007,'Nissan','Versa',16765,1)
--Declare & set testing variables.
DECLARE @iStartYear INT,
@iEndYear INT,
@iMilesLow INT,
@iMilesHigh INT,
@iPriceRangeLow INT,
@iPriceRangeHigh INT
SET @iStartYear = 1955
SET @iEndYear = 2009
SET @iMilesLow = 1
SET @iMilesHigh = 100000
SET @iPriceRangeLow = 1
SET @iPriceRangeHigh = 100000
--Store Variables Declaration
DECLARE @strParams nvarchar(4000)
SET @strParams = '@iStartYear INT, @iEndYear INT, @iMilesLow INT, @iMilesHigh INT, @iPriceRangeLow INT, @iPriceRangeHigh INT'
--Store SQL Command
DECLARE @strSQL nvarchar(4000)
--Wipe our temporary holding table.
if object_id('tempdb.dbo.#TempData') IS NOT NULL
drop table [#TempData]
--Create temp table to hold initial results
CREATE TABLE #TempData
(
[AutoNum] [int] IDENTITY(1, 1)
NOT NULL,
[ID] [int],
[Year] [smallint],
[Make] [varchar](50),
[Model] [varchar](50),
[Price] [int],
[Miles] [int]
)
--Create our query with conditional WHERE clauses.
SET @strSQL = '
INSERT INTO #TempData (ID,[Year],Make,Model,Price,Miles)
SELECT ID, [Year], Make, Model, Price, Miles
FROM #CarData
WHERE ID > 1'
IF @iStartYear > 0
SET @strSQL = @strSQL + ' AND [Year] >= @iStartYear'
IF @iEndYear > 0
SET @strSQL = @strSQL + ' AND [Year] <= @iEndYear'
IF @iPriceRangeLow > 0
SET @strSQL = @strSQL + ' AND [Price] >= @iPriceRangeLow'
IF @iPriceRangeHigh > 0
SET @strSQL = @strSQL + ' AND [Price] <= @iPriceRangeHigh'
IF @iMilesLow > 0
SET @strSQL = @strSQL + ' AND [Miles] >= @iMilesLow'
IF @iMilesHigh > 0
SET @strSQL = @strSQL + ' AND [Miles] <= @iMilesHigh'
--SELECT @strSQL
--EXECUTE QUERY
EXEC sp_executesql @strSQL, @strParams, @iStartYear, @iEndYear, @iMilesLow, @iMilesHigh, @iPriceRangeLow, @iPriceRangeHigh
--Query to return search results
SELECT ID,
[Year],
Make,
Model,
Price,
Miles
FROM #TempData
ORDER BY AutoNum
--Guided Search Results
SELECT CASE WHEN Price BETWEEN 0 AND 4999 THEN 'Under $5K'
WHEN Price BETWEEN 5000 AND 9999 THEN '$5K to $10K'
WHEN Price BETWEEN 10000 AND 14999 THEN '$10K to $15K'
WHEN Price BETWEEN 15000 AND 19999 THEN '$15K to $20K'
WHEN Price BETWEEN 20000 AND 24999 THEN '$20K to $25K'
WHEN Price BETWEEN 25000 AND 29999 THEN '$25K to $30K'
WHEN Price BETWEEN 30000 AND 39999 THEN '$30K to $40K'
WHEN Price BETWEEN 40000 AND 49999 THEN '$40K to $50K'
WHEN Price BETWEEN 50000 AND 74999 THEN '$50K to $75K'
WHEN Price BETWEEN 75000 AND 99999 THEN '$75K to $100K'
WHEN Price >= 100000 THEN 'Over $100K'
END AS PriceRange,
COUNT(1) AS PriceCount
FROM #TempData
WHERE Price IS NOT NULL
GROUP BY CASE WHEN Price BETWEEN 0 AND 4999 THEN 'Under $5K'
WHEN Price BETWEEN 5000 AND 9999 THEN '$5K to $10K'
WHEN Price BETWEEN 10000 AND 14999 THEN '$10K to $15K'
WHEN Price BETWEEN 15000 AND 19999 THEN '$15K to $20K'
WHEN Price BETWEEN 20000 AND 24999 THEN '$20K to $25K'
WHEN Price BETWEEN 25000 AND 29999 THEN '$25K to $30K'
WHEN Price BETWEEN 30000 AND 39999 THEN '$30K to $40K'
WHEN Price BETWEEN 40000 AND 49999 THEN '$40K to $50K'
WHEN Price BETWEEN 50000 AND 74999 THEN '$50K to $75K'
WHEN Price BETWEEN 75000 AND 99999 THEN '$75K to $100K'
WHEN Price >= 100000 THEN 'Over $100K'
END
ORDER BY MIN(Price) --sort the buckets by price rather than alphabetically
--Guided Search: Year Counts
SELECT [year],
COUNT([Year]) AS YearCount
FROM #TempData
GROUP BY [Year]
ORDER BY [Year] DESC
November 30, 2008 at 10:03 pm
Thanks. I need to check a few things:
1) I don't see any transactions here. Are there any transactions in the actual code that is getting the deadlocks?
2) Do you know which statement is executing when you get the deadlock?
3) Any keys or indexes on #CarData?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 1, 2008 at 9:32 am
I'll address each of your questions respectively:
1. After further review, it doesn't appear the deadlocks are coming from this process, but rather from some other ones. This process is the victim and not the cause.
2. Not applicable, since we addressed this in #1.
3. I've tried various indexes, but with little effect. Since the WHERE clause is conditional, you never know what the user will submit, so each search could benefit from indexes on different fields. If you build indexes across multiple fields, the number of permutations becomes very large; if you create multiple indexes, you take a serious performance hit when inserting the data into the temp table. In the end, the net result is negligible.
Keep in mind that this is a small data set and a small table design example. A real-world search would have triple the number of conditions in the WHERE clause and four times the number of fields in the temp table.
December 1, 2008 at 10:00 am
awinter (12/1/2008)
1. After further review, it doesn't appear the deadlocks are coming from this process, but rather from some other ones. This process is the victim and not the cause.
Well, a process cannot be a deadlock victim unless it was a participant in (and thus partly a cause of) the deadlock.
Do you perhaps mean that you are getting Blocking and this query is slow or timing out? Or that you are getting deadlocks and this procedure is slow or timing out at that same time?
In either event, if this procedure is not the cause, then I am a little uncertain what problem we are trying to fix with it.
3. I've tried various indexes, but with little effect. Since the WHERE clause is conditional, you never know what the user will submit, so each search could benefit from indexes on different fields. If you build indexes across multiple fields, the number of permutations becomes very large; if you create multiple indexes, you take a serious performance hit when inserting the data into the temp table. In the end, the net result is negligible.
I understand that, but I still need to know what the primary key is, any foreign keys and what the Clustered Index, if any, is.
Keep in mind that this is a small data set and a small table design example. A real-world search would have triple the number of conditions in the WHERE clause and four times the number of fields in the temp table.
We understand this too. Many of us have implemented these also.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 1, 2008 at 10:26 am
Just an idea... are you able to conditionally choose which index to build depending on which conditions are actually used in the query?
Here's the idea...
Create an index on the PK and any useful FKs, then build one compound index to satisfy the WHERE clause. That way only one index is created for the query, and assuming the search is sargable, it should help performance quite a bit.
I never did that so let me know how it goes.
December 1, 2008 at 10:57 am
RBarryYoung:
I see your points on deadlocking. We'll leave that part out of the discussion for the sake of simplicity. My main objective was to see if there's a better way of doing a Guided Search. You mentioned that other people have implemented a Guided Search as well. Do you know if they have done it in the same manner as I am proposing? Have they used a 3rd language like .net or ColdFusion to produce the grouped results on the page, or do they all use SQL Server to generate those totals?
Ninja's_RGR'us:
That's an interesting concept. It would take nearly the same amount of programming and testing as the rest of the process combined, but it may produce the read performance needed to have efficient results while maintaining a low overhead of indexing since it's specific to each search. SQL 2005's Database Engine Tuning Advisor doesn't work with temp tables, does it?
Thanks again you guys for your ideas. It's really appreciated.
December 1, 2008 at 11:12 am
I don't really see why this would require that much testing...
SET @CI = 'CREATE NONCLUSTERED INDEX #IX_TmpTable_spName ON #Temp ('
IF @Param1 IS NOT NULL OR @Param2 IS NOT NULL
BEGIN
SET @CI = @CI + 'ColumnA, '
END
....
--trim the last 2 characters and add the closing parenthesis
--then EXEC the CREATE INDEX command
The real tricky part is that running dynamic SQL here requires a global temp table... or a fairly complex if/else sequence in a sub-procedure to build the index. That's where the whole problem lies.
Another trick here is to start checking for the best candidate columns and make sure those are placed leftmost in the column list. It's pretty easy to do once you know the data.
Keep in mind that the stored procedure will recompile after the CREATE INDEX command... so make sure you create the index before loading the table.
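A minimal, runnable sketch of this conditional-index idea, built against the #TempData table and the search parameters from the sample script earlier in the thread (the index name IX_TempData_Search and the column choices are only illustrative). Note that a local temp table created in the calling batch is visible to dynamic SQL run in the same session, so the CREATE INDEX can simply be EXEC'd without a global temp table:
--Build one compound index whose columns match the filters that were actually supplied.
DECLARE @CI nvarchar(500)
SET @CI = 'CREATE NONCLUSTERED INDEX IX_TempData_Search ON #TempData ('
IF @iStartYear > 0 OR @iEndYear > 0
SET @CI = @CI + '[Year], '
IF @iPriceRangeLow > 0 OR @iPriceRangeHigh > 0
SET @CI = @CI + 'Price, '
IF @iMilesLow > 0 OR @iMilesHigh > 0
SET @CI = @CI + 'Miles, '
--Only create the index if at least one column was appended.
IF @CI LIKE '%,%'
BEGIN
--Trim the trailing comma and close the column list.
SET @CI = LEFT(@CI, LEN(@CI) - 1) + ')'
EXEC (@CI)
END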
December 1, 2008 at 11:21 am
awinter (12/1/2008)
RBarryYoung:I see your points on deadlocking. We'll leave that part out of the discussion for the sake of simplicity. My main objective was to see if there's a better way of doing a Guided Search. You mentioned that other people have implemented a Guided Search as well. Do you know if they have done it in the same manner as I am proposing? Have they used a 3rd language like .net or ColdFusion to produce the grouped results on the page, or do they all use SQL Server to generate those totals?
Let me clarify a few things: What I was saying is that there are a lot of folks here (including me) who have implemented the kind of search procedure where users can select search values for any combination of a large number of columns. Your "Guided Search" is that plus a lot more. Though I am sure there are folks here who have done Guided Searches as well, I am not one of them. Nonetheless, I do not think that is a barrier, except that I cannot answer your question about "what does everyone else do?". However, I am a believer in the principle of convergent engineering, so I am sure that we can help.
Determining a better/best approach in SQL Server (or any database) has a lot to do with the production environment, the data design, the schema context, the expected load, the execution intensity and the data sizes. Unlike OO and 3GL development, these things tend to be much more interdependent and less separable in the database. And leaving aside the deadlocking/blocking issue is likely to leave off a big part of the picture.
So, it would be helpful if you could fill in some of this information: How many rows are in the source table (#CarData) and how much space is it occupying? Is changing this table an option? How many transactions per minute do you get or are you expecting, and how frequently is this specific procedure executed (during peak hours, for both)?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 1, 2008 at 3:53 pm
OK, I have looked through your procedure a few times and I do not see anything wrong, but it probably does not scale very well.
I would say that there are 3 obvious approaches:
1) Filtered dump to a temp table and re-query (the approach you are using): simple compared to the other two and relatively easy to implement. Should be OK for low-to-medium table sizes and request rates.
2) Summary table with periodic re-population: Write a job that uses the GROUP BY ... WITH CUBE clause to build an up-to-date dimensional statistical summary table (i.e., a cube) every 5 to 10 seconds. Use 2 (or 3) different output tables and use aliases to swap/rotate them. The users then simply build their individualized summary statistics from this cube table using the GROUPING columns and GROUP BY (a minimal sketch follows after this post). This should be good for high request volumes, but may have trouble with either large row counts in the source table or a very high number of cube "cells".
3) Summary table with live/continuous statistical updating: Create a cube table in the same form as above, and populate it during periods when no updates are going on. Then have every modification to the source table also update the summary statistics in this table (by changing the write procedures or through triggers). If this slows down the writes, or causes deadlocks, you may need to de-synch it by using Service Broker. You may also need to completely re-populate the statistics occasionally to ensure they stay reasonably accurate. This should be good for large row counts in the source table or even very high numbers of cube "cells", but may have problems with high request volumes and particularly with high write volumes.
For both large rowcounts and high request volumes, you may need some hybrid approach, and you will certainly end up throwing lots of HW and developer resources at either (2) or (3) to realize their full potential.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
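For option (2) above, a minimal sketch of the cube-building job and a consumer query, assuming a permanent dbo.CarData source table shaped like the sample data; dbo.CarCube_A is a hypothetical name, and in practice you would build into the idle table of the pair and then repoint a synonym or view at it:
--Rebuild the summary cube: one row per (Make, Year) cell plus the WITH CUBE rollup rows.
IF OBJECT_ID('dbo.CarCube_A') IS NOT NULL
DROP TABLE dbo.CarCube_A
SELECT Make,
[Year],
GROUPING(Make) AS AllMakes,   --1 = row aggregates across all makes
GROUPING([Year]) AS AllYears, --1 = row aggregates across all years
COUNT(*) AS CarCount,
MIN(Price) AS MinPrice,
MAX(Price) AS MaxPrice
INTO dbo.CarCube_A
FROM dbo.CarData
GROUP BY Make, [Year] WITH CUBE
--The "Year counts" panel then reads pre-aggregated cells instead of rescanning the source:
SELECT [Year], CarCount AS YearCount
FROM dbo.CarCube_A
WHERE AllMakes = 1 AND AllYears = 0
ORDER BY [Year] DESC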
December 1, 2008 at 6:07 pm
Ummm... wouldn't an indexed view work just fine here?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 1, 2008 at 8:08 pm
I don't think that an Indexed View can be based on GROUP BY, can it?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 1, 2008 at 9:07 pm
I believe indexed views require SQL Server Enterprise Edition, but we're only running Standard. In addition, do any of your other recommendations require Enterprise to implement?
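For what it's worth, an indexed view can be based on GROUP BY, with restrictions: it must be schema-bound, must include COUNT_BIG(*), and needs a unique clustered index on the grouping columns. Only Enterprise Edition matches it automatically, but Standard can still read it directly with the NOEXPAND hint. A minimal sketch, assuming a permanent dbo.CarData table shaped like the sample data:
CREATE VIEW dbo.vCarYearCounts
WITH SCHEMABINDING
AS
SELECT [Year],
COUNT_BIG(*) AS YearCount
FROM dbo.CarData
GROUP BY [Year]
GO
CREATE UNIQUE CLUSTERED INDEX IX_vCarYearCounts ON dbo.vCarYearCounts ([Year])
GO
--On Standard Edition, reference the view explicitly with NOEXPAND to read the materialized rows:
SELECT [Year], YearCount
FROM dbo.vCarYearCounts WITH (NOEXPAND)
ORDER BY [Year] DESC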