Need help with strategy / architecture for searching large dataset

Question

Need help with strategy / architecture for searching large dataset

tdemille-1102706

SSC Enthusiast

Points: 135
More actions
May 30, 2009 at 12:13 pm

#384685

I'm an application engineer and while I'm comfortable in SQL Server I'm certainly not a SQL MVP. We have a few very large datasets and I'm trying to design a good search architecture. We have processes that push the data in from application servers and I can live with slow writes as reading the data during a search is our important criteria. I'm not worried about disk space, I'll duplicate the data as much as necessary.
In short, we are trying to search millions (and growing) of rows of MLS (real estate Multiple Listing Service) data. We allow user's of our web site to search on 40 different criteria (roughly), for instance:
"between 200 and 250K, 2 bedrooms, 1+ bathrooms, in a certain area, with pool, golf, etc. etc. etc... "
They can also add a key word to the search (which is currently done using like and is very slow).
The search is currently accomplished by using an SP to build dynamic sql and then executing it. This can involve multiple joins as well.
I have been considering storing the MLS data as a JSON object in a
ntext, blob or nvarchar type field and building a seperate table or
set of tables / indexes for the search with an fkey back to the MLS
table (that just has a property ID and a JSON object).
One data warehouse expert friend of mine suggested using an attribute
table approach. Each column now in the MLS table would be an
attributetypeID in a table that looked something like
propertyID AttributeTypeID Value
He had suggested this would be super fast way to search.
I also have questions about possibly leveraging full text search to do
this. We could come up with a nonsense keyword for each MLS column,
for instance a property with 3 bedrooms, we could use keyword 3B3B.
We could then create a table to full text index like this
propertyID WordList
=================
where wordlist contained a list of our nonsense words...
someone searching for all properties with 3 beds and two baths the
query would be somethinng like
select propertyID where contains ('3B3B' and '2BA2BA')
So those are just two ideas, but I'm open to anything. I'm also open to an expert in the field contacting me privately about a short consulting gig.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 1

First the good news...

Described scenario is a text book case of a single star-schema datamart with a single FACT table in the center surrounded by DIMension tables.

Now the not so good news...

In my experience the best available technology to deal with this scenario is... Oracle bitmap indexes.

Last but not least...

You may want to explore star-schema concept as described by Kimball -no reason not to implement it on SQL Server.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.