Guided Search Method

  • I'm looking for an efficient method of creating grouped totals as displayed on the left side of this page:

    http://www.vast.com/cars/used-for-sale-Ford

    Here is another example:

    http://www.newegg.com/Store/SubCategory.aspx?SubCategory=14&name=Internal-Hard-Drives

    Currently, I'm using a temp table to store my initial results and then re-querying it for all the various "reports" I want on the left side. However, under heavy traffic loads, I run into deadlocks. Is there a better method of performing this type of reporting after a database search? I need something lightning fast, like the two sites above.

  • I don't see anything obviously wrong with your current method; however, the details that you have given us are very sketchy. Please read this article on how to get better results from these forums, and then post your table definitions and the code that you are currently using: http://www.sqlservercentral.com/articles/Best+Practices/61537/

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I just want to add that there is no obvious reason why you should be getting deadlocks with this approach. This leads me to believe that there is more to this, and thus we need more information.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I've written up some sample code to illustrate what I have so far. It would seem there is a lot of overhead in inserting the records into the temp table just to re-query them. Although this sample is small, consider an initial query that could return 500,000 to 1,000,000 records to be inserted into a temp table. My initial theory was that it would be faster to re-query the temp table than the main table, because the average search would return far fewer records than the main table contains. In the scenario below, the #CarData temp table stands in for the main table I would query from.

    if object_id('tempdb.dbo.#CarData') IS NOT NULL
        drop table [#CarData]

    --Create temp table to stand in for the main table being searched
    CREATE TABLE #CarData
    (
        ID INT,
        [Year] smallint,
        Make varchar(50),
        Model varchar(50),
        Price INT,
        Miles INT
    )

    INSERT INTO #CarData VALUES (37161, 1990, 'Oldsmobile', 'Cutlass Supreme', 10000, 10000)
    INSERT INTO #CarData VALUES (37166, 2004, 'Nissan', 'Maxima', 22995, 41330)
    INSERT INTO #CarData VALUES (37167, 2003, 'Pontiac', 'Vibe', 13595, 27855)
    INSERT INTO #CarData VALUES (37168, 2007, 'Nissan', 'Altima', 22745, 1)
    INSERT INTO #CarData VALUES (37170, 2007, 'Nissan', 'Altima', 22745, 1)
    INSERT INTO #CarData VALUES (37179, 2007, 'Cadillac', 'DTS', 48005, 1)
    INSERT INTO #CarData VALUES (37188, 2007, 'Nissan', 'Altima', 26085, 1)
    INSERT INTO #CarData VALUES (37194, 2007, 'Cadillac', 'DTS', 46440, 1)
    INSERT INTO #CarData VALUES (37196, 2006, 'Hyundai', 'Sonata', 15995, 22885)
    INSERT INTO #CarData VALUES (37197, 2006, 'Chevrolet', 'Malibu', 14595, 30952)
    INSERT INTO #CarData VALUES (37200, 2006, 'Saturn', 'Ion', 14995, 32207)
    INSERT INTO #CarData VALUES (37201, 2006, 'Chevrolet', 'Malibu', 13995, 30201)
    INSERT INTO #CarData VALUES (37202, 2006, 'Pontiac', 'Grand Prix', 17995, 29625)
    INSERT INTO #CarData VALUES (37203, 2006, 'Pontiac', 'Grand Prix', 14995, 28058)
    INSERT INTO #CarData VALUES (37204, 2006, 'Chevrolet', 'Impala', 15595, 27580)
    INSERT INTO #CarData VALUES (37205, 2007, 'Nissan', 'Maxima', 33475, 1)
    INSERT INTO #CarData VALUES (37214, 2002, 'Ford', 'Focus', 9995, 68191)
    INSERT INTO #CarData VALUES (37216, 2007, 'Pontiac', 'Grand Prix', 22580, 1)
    INSERT INTO #CarData VALUES (37217, 2007, 'Nissan', 'Versa', 16765, 1)
    INSERT INTO #CarData VALUES (37218, 2007, 'Nissan', 'Versa', 16765, 1)
    INSERT INTO #CarData VALUES (37219, 2007, 'Nissan', 'Versa', 16765, 1)

    --Declare & set testing variables.
    DECLARE @iStartYear INT,
            @iEndYear INT,
            @iMilesLow INT,
            @iMilesHigh INT,
            @iPriceRangeLow INT,
            @iPriceRangeHigh INT

    SET @iStartYear = 1955
    SET @iEndYear = 2009
    SET @iMilesLow = 1
    SET @iMilesHigh = 100000
    SET @iPriceRangeLow = 1
    SET @iPriceRangeHigh = 100000

    --Parameter definition list for sp_executesql
    DECLARE @strParams nvarchar(4000)
    SET @strParams = '@iStartYear INT, @iEndYear INT, @iMilesLow INT, @iMilesHigh INT, @iPriceRangeLow INT, @iPriceRangeHigh INT'

    --Holds the dynamic SQL command
    DECLARE @strSQL nvarchar(4000)

    --Wipe our temporary holding table.
    if object_id('tempdb.dbo.#TempData') IS NOT NULL
        drop table [#TempData]

    --Create temp table to hold the initial (filtered) results
    CREATE TABLE #TempData
    (
        [AutoNum] [int] IDENTITY(1, 1) NOT NULL,
        [ID] [int],
        [Year] [smallint],
        [Make] [varchar](50),
        [Model] [varchar](50),
        [Price] [int],
        [Miles] [int]
    )

    --Build our query with conditional WHERE clauses.
    --The constant-true anchor (ID > 1) lets each condition be appended with AND.
    SET @strSQL = '
    INSERT INTO #TempData (ID, [Year], Make, Model, Price, Miles)
    SELECT ID, [Year], Make, Model, Price, Miles
    FROM #CarData
    WHERE ID > 1'

    IF @iStartYear > 0
        SET @strSQL = @strSQL + ' AND [Year] >= @iStartYear'
    IF @iEndYear > 0
        SET @strSQL = @strSQL + ' AND [Year] <= @iEndYear'
    IF @iPriceRangeLow > 0
        SET @strSQL = @strSQL + ' AND [Price] >= @iPriceRangeLow'
    IF @iPriceRangeHigh > 0
        SET @strSQL = @strSQL + ' AND [Price] <= @iPriceRangeHigh'
    IF @iMilesLow > 0
        SET @strSQL = @strSQL + ' AND [Miles] >= @iMilesLow'
    IF @iMilesHigh > 0
        SET @strSQL = @strSQL + ' AND [Miles] <= @iMilesHigh'

    --SELECT @strSQL  --uncomment to inspect the generated statement

    --Execute the parameterized dynamic query
    EXEC sp_executesql @strSQL, @strParams, @iStartYear, @iEndYear, @iMilesLow, @iMilesHigh, @iPriceRangeLow, @iPriceRangeHigh

    --Query to return the search results
    SELECT ID, [Year], Make, Model, Price, Miles
    FROM #TempData
    ORDER BY AutoNum

    --Guided Search - price range counts
    --(Bucket upper bounds end at 4999/9999/etc. so the ranges do not overlap.
    -- Note that ORDER BY PriceRange sorts the labels alphabetically, not by price.)
    SELECT CASE WHEN Price BETWEEN 0 AND 4999 THEN 'Under $5K'
                WHEN Price BETWEEN 5000 AND 9999 THEN '$5K to $10K'
                WHEN Price BETWEEN 10000 AND 14999 THEN '$10K to $15K'
                WHEN Price BETWEEN 15000 AND 19999 THEN '$15K to $20K'
                WHEN Price BETWEEN 20000 AND 24999 THEN '$20K to $25K'
                WHEN Price BETWEEN 25000 AND 29999 THEN '$25K to $30K'
                WHEN Price BETWEEN 30000 AND 39999 THEN '$30K to $40K'
                WHEN Price BETWEEN 40000 AND 49999 THEN '$40K to $50K'
                WHEN Price BETWEEN 50000 AND 74999 THEN '$50K to $75K'
                WHEN Price BETWEEN 75000 AND 99999 THEN '$75K to $100K'
                WHEN Price >= 100000 THEN 'Over $100K'
           END AS PriceRange,
           COUNT(1) AS PriceCount
    FROM #TempData
    WHERE Price IS NOT NULL
    GROUP BY CASE WHEN Price BETWEEN 0 AND 4999 THEN 'Under $5K'
                  WHEN Price BETWEEN 5000 AND 9999 THEN '$5K to $10K'
                  WHEN Price BETWEEN 10000 AND 14999 THEN '$10K to $15K'
                  WHEN Price BETWEEN 15000 AND 19999 THEN '$15K to $20K'
                  WHEN Price BETWEEN 20000 AND 24999 THEN '$20K to $25K'
                  WHEN Price BETWEEN 25000 AND 29999 THEN '$25K to $30K'
                  WHEN Price BETWEEN 30000 AND 39999 THEN '$30K to $40K'
                  WHEN Price BETWEEN 40000 AND 49999 THEN '$40K to $50K'
                  WHEN Price BETWEEN 50000 AND 74999 THEN '$50K to $75K'
                  WHEN Price BETWEEN 75000 AND 99999 THEN '$75K to $100K'
                  WHEN Price >= 100000 THEN 'Over $100K'
             END
    ORDER BY PriceRange

    --Guided Search - year counts
    SELECT [Year],
           COUNT([Year]) AS YearCount
    FROM #TempData
    GROUP BY [Year]
    ORDER BY [Year] DESC

  • Thanks. I need to check a few things:

    1) I don't see any transactions here. Are there any transactions in the actual code that is getting the deadlocks?

    2) Do you know which statement is executing when you get the deadlock?

    3) Any keys or indexes on #CarData?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I'll address each of your questions in turn:

    1. After further review, it doesn't appear the deadlocks are coming from this process, but rather from some other ones. This process is the victim and not the cause.

    2. Not applicable, since we addressed this in #1.

    3. I've tried various indexes, but with little effect. Since the WHERE clause is conditional, you never know what the user will submit, so each search could call for indexes on different fields. If you index across multiple fields, the number of permutations becomes very large; if you apply multiple indexes, you take a serious performance hit when inserting the data into the temp table. In the end, the net result is negligible.

    Keep in mind that this is a small data set and small table design example. A real world search would have triple the number of conditions in the WHERE clause and 4 times the number of fields in the temp table.

  • awinter (12/1/2008)


    1. After further review, it doesn't appear the deadlocks are coming from this process, but rather from some other ones. This process is the victim and not the cause.

    Well, a process cannot be a deadlock victim unless it was a participant in (and thus partly the cause of) the deadlock.

    Do you perhaps mean that you are getting Blocking and this query is slow or timing out? Or that you are getting deadlocks and this procedure is slow or timing out at that same time?

    In either event, if this procedure is not the cause, then I am a little uncertain what problem we are trying to fix with it.

    3. I've tried various indexes, but with little effect. Since the WHERE clause is conditional, you never know what the user will submit, so each search could call for indexes on different fields. If you index across multiple fields, the number of permutations becomes very large; if you apply multiple indexes, you take a serious performance hit when inserting the data into the temp table. In the end, the net result is negligible.

    I understand that, but I still need to know what the primary key is, any foreign keys and what the Clustered Index, if any, is.

    Keep in mind that this is a small data set and small table design example. A real world search would have triple the number of conditions in the WHERE clause and 4 times the number of fields in the temp table.

    We understand this too. Many of us have implemented these also.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Just an idea... are you able to conditionally choose which index to build depending on which conditions are set to be used in the query?

    Here's the idea...

    Create an index on the PK and useful FKs, then build a compound index to satisfy the WHERE clause. This means that only one index will be created for the query, and assuming that the search is sargable, you should help performance there quite a bit.

    I never did that so let me know how it goes.

  • RBarryYoung:

    I see your points on deadlocking. We'll leave that part out of the discussion for the sake of simplicity. My main objective was to see if there's a better way of doing a Guided Search. You mentioned that other people have implemented a Guided Search as well. Do you know if they have done it in the same manner as I am proposing? Have they used a 3rd language like .net or ColdFusion to produce the grouped results on the page, or do they all use SQL Server to generate those totals?

    Ninja's_RGR'us:

    That's an interesting concept. It would take nearly the same amount of programming and testing as the rest of the process combined, but it may produce the read performance needed to have efficient results while maintaining a low overhead of indexing since it's specific to each search. SQL 2005's Database Engine Tuning Advisor doesn't work with temp tables, does it?

    Thanks again you guys for your ideas. It's really appreciated.

  • I don't really see how much testing this would require...

    DECLARE @CI nvarchar(4000)
    SET @CI = 'CREATE NONCLUSTERED INDEX IX_TempData_Search ON #TempData ('

    IF @iStartYear > 0 OR @iEndYear > 0
    BEGIN
        SET @CI = @CI + '[Year], '
    END

    --...repeat for each remaining parameter/column pair (Price, Miles, etc.)...

    --Trim the trailing ', ' and add the closing parenthesis
    --(LEN ignores the trailing space, so dropping one more character removes the comma)
    SET @CI = LEFT(@CI, LEN(@CI) - 1) + ')'

    EXEC sp_executesql @CI

    The real tricky part is that running dynamic SQL here requires a global temp table, or else a fairly complex IF/ELSE sequence in a sub-procedure to build the index. That's where the whole problem lies.

    Another trick here is to start by checking for the best candidates and making sure those are put leftmost in the column list. It's pretty easy to do once you know the data.

    Keep in mind that the sp will recompile after the index command, so make sure you create the index first, before loading the table.
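
    With all six sample parameters from the earlier script set, the command built by the sketch above would come out as something like:

    CREATE NONCLUSTERED INDEX IX_TempData_Search ON #TempData ([Year], Price, Miles)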

  • awinter (12/1/2008)


    RBarryYoung:

    I see your points on deadlocking. We'll leave that part out of the discussion for the sake of simplicity. My main objective was to see if there's a better way of doing a Guided Search. You mentioned that other people have implemented a Guided Search as well. Do you know if they have done it in the same manner as I am proposing? Have they used a 3rd language like .net or ColdFusion to produce the grouped results on the page, or do they all use SQL Server to generate those totals?

    Let me clarify a few things: what I was saying is that there are a lot of folks here (including me) who have implemented the kind of search procedure where users can select search values for any combination of a large number of columns. Your "Guided Search" is that plus a lot more. Though I am sure that there are folks here who have done Guided Searches also, I am not one of them. Nonetheless, I do not think that that is a barrier, except that I cannot answer your question about "what does everyone else do?". However, I am a believer in the principle of convergent engineering, so I am sure that we can help.

    Determining a better/best approach in SQL Server (or any database) has a lot to do with the production environment, the data design, the schema context, the expected load, the execution intensity, and the data sizes. Unlike OO and 3GL development, these things tend to be a lot more interdependent and less separable in the database. And leaving aside the deadlocking/blocking issue is likely to leave out a big part of the picture.

    So, it would be helpful if you could fill in some of this information: how many rows are in the source table (#CarData), and how much space is it occupying? Is changing this table an option? How many transactions/min do you get or are you expecting, and how frequently is this specific procedure executed (during peak hours for both)?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK, I have looked through your procedure a few times and I do not see anything wrong, but it probably does not scale very well.

    I would say that there are 3 obvious approaches:

    1) Filtered dump to a temp table & re-query (the approach that you are using): simple compared to the other two, and relatively easy to implement. Should be OK for low-to-medium table sizes and request rates.

    2) Summary table with periodic re-population: write a job that uses the GROUP BY...WITH CUBE clause to build an up-to-date dimensional statistical summary table (i.e., a Cube) every 5 to 10 seconds. Use 2 (or 3) different output tables and use aliases to swap/rotate them. Then the users simply build their individualized summary statistics from this cube table using the GROUPING columns and GROUP BY (see the sketch after this list). This should be good for high request volumes, but may have trouble with either large row counts in the source table or a very high number of cube "cells".

    3) Summary table with live/continuous statistical updating: create a Cube table in the same form as above, and populate it during periods when no updates are going on. Then have every modification to the source table also update the summary statistics in this table (by changing the write procedures or through triggers). If this slows down the writes, or causes deadlocks, you may need to de-synch it by using Service Broker. You may also need to completely re-populate the statistics occasionally to ensure that they stay reasonably accurate. This should be good for large row counts in the source table or even very high numbers of Cube "cells", but may have problems with high request volumes and particularly with high write volumes.

    For both large rowcounts and high request volumes, you may need some hybrid approach, and you will certainly end up throwing lots of HW and developer resources at either (2) or (3) to realize their full potential.
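
    To make approach (2) concrete, here is a minimal sketch of the periodic cube build. The dbo.Cars source table, the staging table name, and the two dimension columns are illustrative stand-ins, not from the original post:

    --Rebuild the summary cube into a staging table, then swap it in under
    --the name the read queries use (e.g., via sp_rename or a synonym).
    IF OBJECT_ID('dbo.CarCube_Staging') IS NOT NULL
        DROP TABLE dbo.CarCube_Staging

    SELECT Make,
           [Year],
           COUNT_BIG(*)     AS CarCount,
           GROUPING(Make)   AS AllMakes,  --1 = subtotal row across all makes
           GROUPING([Year]) AS AllYears   --1 = subtotal row across all years
    INTO dbo.CarCube_Staging
    FROM dbo.Cars
    GROUP BY Make, [Year] WITH CUBE

    Each guided-search panel is then a cheap query against this much smaller table; for example, the year counts are simply the rows with AllMakes = 1 AND AllYears = 0.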

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Ummm... wouldn't an indexed view work just fine here?

    --Jeff Moden


  • I don't think that an Indexed View can be based on GROUP BY, can it?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I believe indexed views require SQL Server Enterprise, but we're only running Standard. In addition, do any of your other recommendations require Enterprise to implement?
