January 6, 2010 at 10:46 am
Hi,
Thanks for your help in advance, being snowed in has made this problem more annoying.
Basically I would search for keywords associated to a book abstract, eg title, author, country (lots of other fields of info) etc
To do this I was going to create a free text index on a view which is a conglomeration of about 20 tables linked by left outer joins (because a book may or may not be associated with a country or countries). When I do this however the view takes over 1 hour to create and is about a 100 rows large
the sql looks like so:SELECT ABST.ABSTRACT_ID, ABST.USER_ABSTRACT_ID, dbo.R_REF_TYPE.REF_TYPE, ABST.AUTHORS, ABST.EDITORS,
ABST.ARTCHAP_TITLE, ABST.BOOKPROC_TITLE, dbo.R_PUBLISHER.PUBLISHER, dbo.R_JOURNAL.JOURNAL, ABST.YEAR,
ABSTOUTLIT.OUTCROP_LITH_LOG, ABSTPALCON.PALAEORECONSTRUCTION, ABSTWELL.LOG_NAME, ABSTISO.ISOPACH,
ABSTPALGEO.PALAEOGEOGRAPHIC, ABSTLIT.LITHOSTRATIGRAPHY, ABSTFLD.FIELD, ABSTBAS.BASIN, ABST.ABSTRACT_ID AS Expr1,
dbo.R_REGION.REGION, CTRYCODE.COUNTRY, TECH.TECHNIQUE, USGSBAS.PROVINCE_NAME, dbo.R_PERIOD.PERIOD_NAME, dbo.R_ERA.ERA,
dbo.R_STAGE.STAGE, dbo.R_EPOCH.EPOCH
FROM dbo.ABSTRACT AS ABST LEFT OUTER JOIN
dbo.R_REF_TYPE ON ABST.REFTYPE_ID = dbo.R_REF_TYPE.ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_ISOPACH AS ABSTISO ON ABST.ABSTRACT_ID = ABSTISO.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_LITHOSTRAT AS ABSTLIT ON ABST.ABSTRACT_ID = ABSTLIT.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_OUTCROP_LITH AS ABSTOUTLIT ON ABST.ABSTRACT_ID = ABSTOUTLIT.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_PALAEOGEOGRAPHIC AS ABSTPALGEO ON ABST.ABSTRACT_ID = ABSTPALGEO.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_WELL_LOG AS ABSTWELL ON ABST.ABSTRACT_ID = ABSTWELL.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_FIELD AS ABSTFLD ON ABST.ABSTRACT_ID = ABSTFLD.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_PALAEORECONSTRUCTION AS ABSTPALCON ON ABST.ABSTRACT_ID = ABSTPALCON.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_BASIN AS ABSTBAS ON ABST.ABSTRACT_ID = ABSTBAS.ABSTRACT_ID LEFT OUTER JOIN
dbo.R_JOURNAL ON ABST.JOURNAL_ID = dbo.R_JOURNAL.ID LEFT OUTER JOIN
dbo.R_PUBLISHER ON ABST.PUBLISHER_ID = dbo.R_PUBLISHER.ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_GEO_TIME AS ABSTGEOT RIGHT OUTER JOIN
dbo.R_PERIOD ON ABSTGEOT.PERIOD_ID = dbo.R_PERIOD.ID RIGHT OUTER JOIN
dbo.R_EPOCH ON ABSTGEOT.EPOCH_ID = dbo.R_EPOCH.ID RIGHT OUTER JOIN
dbo.R_STAGE ON ABSTGEOT.STAGE_ID = dbo.R_STAGE.ID RIGHT OUTER JOIN
dbo.R_ERA ON ABSTGEOT.ERA_ID = dbo.R_ERA.ID ON ABST.ABSTRACT_ID = ABSTGEOT.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_USGS AS ABSTUSGS RIGHT OUTER JOIN
dbo.R_USGS_BASIN AS USGSBAS ON ABSTUSGS.ID = USGSBAS.ID ON ABST.ABSTRACT_ID = ABSTUSGS.ABSTRACT_ID LEFT OUTER JOIN
dbo.R_TECHNIQUE AS TECH LEFT OUTER JOIN
dbo.REL_ABSTRACT_TECHNIQUE AS ABSTTECH ON TECH.ID = ABSTTECH.TECHNIQUE_ID ON
ABST.ABSTRACT_ID = ABSTTECH.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_COUNTRY AS ABSTCTRY RIGHT OUTER JOIN
dbo.R_COUNTRY_CODE AS CTRYCODE ON ABSTCTRY.ID = CTRYCODE.ID ON ABST.ABSTRACT_ID = ABSTCTRY.ABSTRACT_ID LEFT OUTER JOIN
dbo.REL_ABSTRACT_REGION AS ABSTREG ON ABST.ABSTRACT_ID = ABSTREG.ABSTRACT_ID RIGHT OUTER JOIN
dbo.R_REGION ON ABSTREG.REGION_ID = dbo.R_REGION.ID
Any recommendations as to what to do would be really greatly appreciated. One idea I did have was to combine all the column fields into on nvarchar(max) column and free text index this.
Thanks for your opinions, really appreciated.
Oliver
January 6, 2010 at 12:48 pm
Hi,
Do not combine the data into 1 column, you'll break the first normal form and this beats the entire purpose of the database.
I would start by analysing the first two joins and work on transforming rows into columns or in other words flatten the data. You'll probably end up with hundreds of columns, but it'll be easier for you to analyse the data in a long term run. Flattening the data is also a time consuming process (depending on how much data you are working with).
January 7, 2010 at 4:32 am
Many Thanks for the reply, I have used the rows to row code and then created views for each of the tables and brought this into a single view.
SELECT
t1.abstract_ID,
List = substring((SELECT ( ', ' + ISOPACH )
FROM dbo.REL_ABSTRACT_ISOPACH t2
WHERE t1.abstract_ID = t2.abstract_ID
ORDER BY
abstract_ID
FOR XML PATH( '' )
), 3, 1000 )FROM dbo.REL_ABSTRACT_ISOPACH t1
GROUP BY abstract_ID
Seems to have cut down the query process to under 20 sec's for 237000 rows.
I understand you concerns about breaking the normalisation rule, but this is only for producing a full-text search on the data.
Cheers for the help,
Oliver
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply