Need help developing quick way to search millions of records...

  • Yes, I did say millions... I am trying to develop web services that will let users search a financial archive database with millions upon millions of records. Clustered indexed views can be applied, but mostly at the table level or on a first-level view only. The problem is that assembling a master view requires compiling many sub-views and/or tables, or using multiple nested SELECT statements. Making matters worse, I usually end up using TOP, MIN, or some other construct that disqualifies second-level and higher views from being indexed (see the sketch below). The search criteria can range from 1 to 10 different variables, some exact matches, some using CONTAINS or LIKE.
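
    For context, a minimal indexed-view sketch (the table and column names are hypothetical, not the poster's schema). SQL Server refuses to index a view that uses TOP, MIN/MAX, subqueries, or references to other views, which is the restriction described above:

        -- An indexed view must be schema-bound, use two-part names, and
        -- reference base tables directly; TOP, MIN/MAX, subqueries, and
        -- other views are all disallowed.
        CREATE VIEW dbo.vTradeTotals
        WITH SCHEMABINDING
        AS
        SELECT  AccountId,
                YEAR(TradeDate) AS TradeYear,
                SUM(Amount)     AS TotalAmount,  -- Amount must be non-nullable
                COUNT_BIG(*)    AS RowCnt        -- required when GROUP BY is used
        FROM    dbo.Trades
        GROUP BY AccountId, YEAR(TradeDate);
        GO

        CREATE UNIQUE CLUSTERED INDEX IX_vTradeTotals
            ON dbo.vTradeTotals (AccountId, TradeYear);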

    Unfortunately, this DB was built for archiving, not for searching...

    Another possibility would be to search each individual table in a stored procedure and return composite primary keys, which you would then use to search the next table, and so on down the line...

    I was also contemplating a batch run that writes a work table holding everything matching the initial search criteria, then applies indexes to that table; a rough sketch of that staged approach follows below.
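
    A minimal sketch of that staged idea, with entirely hypothetical table, column, and parameter names (Trades, TradeDetails, @FromDate, and so on), not the actual archive schema:

        -- Stage 1: hit the most selective table first and keep only the
        -- composite keys, indexed up front.
        DECLARE @FromDate DATE          = '2010-01-01',
                @ToDate   DATE          = '2010-12-31',
                @Pattern  NVARCHAR(100) = N'%bond%';

        CREATE TABLE #Candidates
        (
            AccountId INT  NOT NULL,
            TradeDate DATE NOT NULL,
            PRIMARY KEY (AccountId, TradeDate)
        );

        INSERT INTO #Candidates (AccountId, TradeDate)
        SELECT AccountId, TradeDate
        FROM   dbo.Trades
        WHERE  TradeDate BETWEEN @FromDate AND @ToDate;

        -- Stage 2: probe the next table with those keys instead of
        -- joining the full archive tables to each other.
        SELECT d.*
        FROM   #Candidates AS c
        JOIN   dbo.TradeDetails AS d
          ON   d.AccountId = c.AccountId
         AND   d.TradeDate = c.TradeDate
        WHERE  d.[Description] LIKE @Pattern;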

    Any suggestions? Please...?

  • Robert -

    Wow! Talk about an open-ended question. To provide any help at all, we would need a lot more information about the structure of your database and the requirements for your "search". There may be an answer, but without a lot more information I wouldn't even begin to propose a solution.

    In your post you referred to using both "contains" and "like" as possible search criteria, which at least tells us that full-text indexes are part of the mix, but without knowing a lot more about your database we're going to be shooting in the dark.

    Joe

  • I've had an idea for this sort of situation that I've never implemented, but I think it should work well: turn all searches into text searches (CONTAINS). Create keywords for range queries. For example, if you have a money field you are searching and your user enters a range of $100 to $2000, your query might turn this into a CONTAINS search with various keywords that represent the range, e.g., (Money100To200 OR Money200To500 OR Money500To1000 OR Money1000To2000). For more precise filtering, you could then add regular SQL clauses ... AND Money BETWEEN @LowerRange AND @UpperRange ...

    You would need to add a field to your table for the generated keywords and populate it (an initial pass to backfill, triggers to maintain it, etc.); a sketch of the idea follows below.
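
    A minimal T-SQL sketch of that keyword approach, assuming a hypothetical Transactions table; the bucket boundaries and names below are illustrative only:

        -- Hypothetical keyword column (not the original poster's schema).
        ALTER TABLE dbo.Transactions
            ADD SearchKeywords NVARCHAR(400) NULL;
        GO

        -- Initial pass: tag every row with the bucket its amount falls in.
        UPDATE dbo.Transactions
        SET    SearchKeywords =
                   CASE
                       WHEN Amount >= 100  AND Amount < 200  THEN N'Money100To200'
                       WHEN Amount >= 200  AND Amount < 500  THEN N'Money200To500'
                       WHEN Amount >= 500  AND Amount < 1000 THEN N'Money500To1000'
                       WHEN Amount >= 1000 AND Amount < 2000 THEN N'Money1000To2000'
                       ELSE N'MoneyOther'
                   END;

        -- Assumes a full-text catalog (ftCatalog) and a unique key index
        -- (PK_Transactions) already exist:
        -- CREATE FULLTEXT INDEX ON dbo.Transactions (SearchKeywords)
        --     KEY INDEX PK_Transactions ON ftCatalog;

        -- Query: coarse full-text filter first, exact range check second.
        DECLARE @LowerRange MONEY = 100, @UpperRange MONEY = 2000;

        SELECT *
        FROM   dbo.Transactions
        WHERE  CONTAINS(SearchKeywords,
                   N'"Money100To200" OR "Money200To500" OR "Money500To1000" OR "Money1000To2000"')
        AND    Amount BETWEEN @LowerRange AND @UpperRange;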

  • Aaron, about 8 years ago we had a similar problem: dealing with millions of rows, we couldn't get the queries under a second no matter how we covered them with indexes, so we finally built an OLAP cube. It took an hour to build, but the queries ran in milliseconds (non-cached)! This was a big eye-opener for us. Because the cube took an hour to build, we used two cubes and switched back and forth between them: while one was building, the other was being used in an Internet app. I don't remember much more than this, but OLAP is awesome for really fast queries. Of course the queries were canned, but with more advanced use of OLAP they don't have to be. Hope this helps.
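
    The build-one-serve-one switch described above can be sketched in plain T-SQL using two pre-aggregated summary tables and a synonym in place of Analysis Services cubes; every object name here is hypothetical:

        -- Rebuild the idle copy (B) while readers keep using A.
        TRUNCATE TABLE dbo.SearchSummary_B;

        INSERT INTO dbo.SearchSummary_B (AccountId, TradeYear, TotalAmount)
        SELECT   AccountId, YEAR(TradeDate), SUM(Amount)
        FROM     dbo.Trades
        GROUP BY AccountId, YEAR(TradeDate);

        -- Swap: all queries read through dbo.SearchSummary, so repointing
        -- the synonym switches them to the fresh copy in one step.
        BEGIN TRAN;
            DROP SYNONYM IF EXISTS dbo.SearchSummary;  -- SQL Server 2016+ syntax
            CREATE SYNONYM dbo.SearchSummary FOR dbo.SearchSummary_B;
        COMMIT;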

  • Most everything you need to know and more is here: http://www.sommarskog.se/dyn-search.html (a minimal sketch of the catch-all pattern that article describes appears below).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
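
    For reference, a minimal sketch of the catch-all search pattern that article covers; the procedure, table, and column names here are hypothetical:

        -- Each predicate collapses to TRUE when its parameter is NULL;
        -- OPTION (RECOMPILE) lets the optimizer build a plan for the
        -- parameters actually supplied on each call.
        CREATE PROCEDURE dbo.SearchArchive
            @AccountId   INT           = NULL,
            @FromDate    DATE          = NULL,
            @ToDate      DATE          = NULL,
            @Description NVARCHAR(100) = NULL
        AS
        SELECT t.AccountId, t.TradeDate, t.Amount, t.[Description]
        FROM   dbo.Trades AS t
        WHERE (t.AccountId  = @AccountId OR @AccountId IS NULL)
        AND   (t.TradeDate >= @FromDate  OR @FromDate  IS NULL)
        AND   (t.TradeDate <= @ToDate    OR @ToDate    IS NULL)
        AND   (t.[Description] LIKE N'%' + @Description + N'%'
               OR @Description IS NULL)
        OPTION (RECOMPILE);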
