January 24, 2006 at 5:43 am
I have queries that nearly always based on part-code
is it better to use substring or like ?
e.g
where substring(partCode,1,4)='ukd5'
or
where partcode like 'ukd5%'
January 24, 2006 at 6:02 am
Depends if the table has indexes
If not then either will use a Table Scan
If the column is indexed SUBSTRING will use Index Scan and LIKE will use Index Seek
If the table has index (but the column has not) then both will use Index Scan
Any combination can affect performance and you should check the execution plan(s)
Far away is close at hand in the images of elsewhere.
Anon.
January 24, 2006 at 6:18 am
thank you David,
In this context, where there is a clustered index on the column, the LIKE uses the clustered index seek and is over 20 times faster
3 secs for the seek compared to 1min 11 seconds for the scan
happy days!
January 25, 2006 at 7:07 am
It depends on the functionality you are looking for. If you trying to find all records where partcode starts with, contains, or ends with some value, then the LIKE operator is the correct choice. If you want to find the value at an exact position within the part code, SUBSTRING is a better choice.
January 25, 2006 at 9:22 am
If this was something you were going to be regularly searching for, you could create a calculated column with a formula of:
substring(PartCode, 1, 4)
Then build an index on this calculated column - the database will then use the index to search instead of performing a table scan.
January 25, 2006 at 9:29 am
ah - this might help me
the part code is normally 10 chars long (alphanumeric)
but is normally used in first 3, first 4, first 7, or whole 10
so if i made calculated colums for 3,4,7 then created an index for each of these the database would choose the most optimal index to use depending on what I was searching for or would i have specy the calulated column in the where clause?
January 25, 2006 at 9:33 am
Just specify the Calculated column in the where clause and the query optimiser should choose the appropriate index. You can check by looking at the Execution Plan.
Beware that creating too many indexes will have a performance impact on your Inserts, Updates & Deletes since these new indexes will have to be kept up to date whenever your change the data.
January 25, 2006 at 9:39 am
well we are fortunate - no updates or deletes, just inserts once a quarter (aprx 2.4 million records)
is there any benefit in not bothering with calculated column, and just have a fixed column for each ?
January 25, 2006 at 9:54 am
The data in calculated columns will not add to the storage requirement for the table - it is in effect virtual data. However, by creating the index you will persist the value of the calculated column in the index even though the value is not stored in the table.
With a table of this size it would probably be beneficial to drop the index prior to the quarterly data population and recreate it after the import completes.
June 26, 2006 at 7:45 am
i just got back to this issue
I prepared a test:
Table: 76 million rows
Query: Matches 360k rows and sums up into 26 rows
like 'ukd5%' - indexed partCode column: 8secs
= 'ukd5' - indexed calculated column of substring(partCode,1,4): 7secs
= 'ukd5' - indexed normalised column of substring(partCode,1,4): 7secs
so the '=' is marginally faster than the 'like',
and there was no direct improvement from normalising the column in this case
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply