Heap vs Clustered Wildcard Search

josh-1127203 SSC Eights! Points: 876 · Answer 1

I wouldn't say that the part number thing is horribly overloaded, but I understand where you are going with that. We take the manufacturer part number (123) and add a prefix (ABC, DEF, etc.) to create a unique ItemCode (ABC123) in our system. I am looking for a long-term solution here, not a band-aid, so I certainly appreciate you pushing to look at the underlying issue. With that said, I could add an additional column for MfrPartNumber and put the second part of the ItemCode in there, then index both columns and remove the wildcard prefix.

E.G.
ItemCode: ABC123
MfrPN: 123
ItemCode: DEF123
MfrPN: 123

Would that be the optimal approach that you would go after for a long term solution?
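
A minimal sketch of that split, for illustration only. The table name dbo.Items, the fixed three-character prefix, the column size, and the index names are all assumptions; the thread only gives the ItemCode column and its varchar(250) type.

```sql
-- Hypothetical table name (dbo.Items) and index names.
-- Assumes every ItemCode is a 3-character prefix followed by the manufacturer part number.
ALTER TABLE dbo.Items
    ADD MfrPartNumber varchar(247) NULL;
GO

-- Backfill: everything after the 3-character prefix.
UPDATE dbo.Items
SET MfrPartNumber = SUBSTRING(ItemCode, 4, LEN(ItemCode))
WHERE MfrPartNumber IS NULL;
GO

-- Index both columns, as described above.
CREATE UNIQUE INDEX IX_Items_ItemCode ON dbo.Items (ItemCode);
CREATE INDEX IX_Items_MfrPartNumber ON dbo.Items (MfrPartNumber);
GO

-- No leading wildcard is needed any more, so the predicate can seek on the new index.
SELECT ItemCode
FROM dbo.Items
WHERE MfrPartNumber = '123';
```

If prefixes turn out to vary in length, the backfill expression would need a different rule; the rest of the sketch stays the same.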

And to complicate matters, on the same subject, we also have a Descr column that holds a product description (e.g. "Red Widget with Triangle Pieces"). We also have an optional search on this column, where we may need to look up everything containing "Triangle". Is this column best served by a Full Text Index?

Again, thanks in advance for your feedback. It's much appreciated.

josh-1127203 SSC Eights! Points: 876 · Answer 2

For S&G, I tried adding a Full Text Index on the ItemCode column. I also added the following items: ABC123456, DEF123456, GHI123456.

SELECT ItemCode FROM TABLE WHERE ItemCode LIKE '%123456%'
returns all 3 results

SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 0 results

If I add in an item ABC-123456

SELECT ItemCode FROM TABLE WHERE Contains(ItemCode, '123456')
returns 1 result (ABC-123456; not ABC123456, DEF123456, GHI123456)

So it appears that the Full Text Index is not searching partial strings unless there is a separator. Same goes if I use Full Text Index on the Descr field:

TEST BOX
TEST BOXRED
TEST REDBOX

SELECT ItemCode FROM TABLE WHERE Contains(Descr, 'box')
returns 1 result (TEST BOX)

SELECT ItemCode FROM TABLE WHERE Contains(Descr, '"box*"')
returns 2 results (TEST BOX, TEST BOXRED)
but no TEST REDBOX

Is this by design for Full Text Index?
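
For anyone who wants to reproduce the test, a setup along these lines should work. This is a sketch, not the actual schema: the table, constraint, and catalog names are made up, and ItemCode is used as the full-text key only to keep the example short. Full-text population runs asynchronously, so the CONTAINS queries may return nothing until the index has finished populating.

```sql
-- Hypothetical names throughout (dbo.FtTest, PK_FtTest, FtTestCatalog).
CREATE TABLE dbo.FtTest
(
    ItemCode varchar(250) NOT NULL CONSTRAINT PK_FtTest PRIMARY KEY,
    Descr    varchar(250) NOT NULL
);
GO

CREATE FULLTEXT CATALOG FtTestCatalog;
GO

-- A full-text index needs a unique, single-column, non-nullable key index.
CREATE FULLTEXT INDEX ON dbo.FtTest (ItemCode, Descr)
    KEY INDEX PK_FtTest
    ON FtTestCatalog
    WITH CHANGE_TRACKING AUTO;
GO

INSERT INTO dbo.FtTest (ItemCode, Descr)
VALUES ('ABC123456', 'TEST BOX'),
       ('DEF123456', 'TEST BOXRED'),
       ('GHI123456', 'TEST REDBOX');
GO

-- Whole-word term.
SELECT ItemCode, Descr FROM dbo.FtTest WHERE CONTAINS(Descr, 'box');

-- Prefix term: double quotes inside the string, asterisk at the end of the word.
SELECT ItemCode, Descr FROM dbo.FtTest WHERE CONTAINS(Descr, '"box*"');
```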

Grant Fritchey SSC Guru Points: 398909 · Answer 3

In reply to josh-1127203 - Wednesday, May 17, 2017 9:47 AM:

Yeah, that seems good. Probably a good choice on full text too. Again, hard to say for certain, testing will be your friend there.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 398909 · Answer 4

In reply to josh-1127203 - Wednesday, May 17, 2017 12:08 PM:

Add wild cards to the search '*box*'. See what you get then.

Jacob Wilkins One Orange Chip Points: 27976 · Answer 5

In reply to josh-1127203 - Wednesday, May 17, 2017 12:08 PM:

It's by design. You can do prefix searches, but can't do suffix searches (barring hacks like storing the REVERSE of a string and indexing that).
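
A rough sketch of the REVERSE hack, using an ordinary index rather than a full-text one. The table name dbo.Items and the column and index names are hypothetical. Note that this only covers "ends with" searches; a match in the middle of the string still needs a leading wildcard.

```sql
-- Hypothetical table, column, and index names.
-- Store the reversed ItemCode as a persisted computed column and index it.
ALTER TABLE dbo.Items
    ADD ItemCodeReversed AS REVERSE(ItemCode) PERSISTED;
GO

CREATE INDEX IX_Items_ItemCodeReversed ON dbo.Items (ItemCodeReversed);
GO

-- "Ends with 123456" becomes "reversed value starts with 654321",
-- which is a trailing-wildcard LIKE and can use an index seek.
SELECT ItemCode
FROM dbo.Items
WHERE ItemCodeReversed LIKE REVERSE('123456') + '%';
```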

Grant Fritchey SSC Guru Points: 398909 · Answer 6

In reply to Jacob Wilkins - Thursday, May 18, 2017 11:48 AM:

Oh, you can't use the wild card on both? My bad. Maybe I should reread the chapter I wrote on fulltext indexes.

Jacob Wilkins One Orange Chip Points: 27976 · Answer 7

Yeah, they've been saying it's high on their list for a while now (see https://connect.microsoft.com/SQLServer/feedback/details/758588/full-text-leading-wildcard-suffix-search, for example), but it hasn't been done yet as far as I know.

The documentation is careful to say that prefix search is supported while quietly omitting any mention of the lack of suffix search.

jcelko212 32090 SSCrazy Eights Points: 9303 · Answer 8

josh-1127203 - Tuesday, May 16, 2017 10:21 AM

Please read a book on RDBMS. The identity column is not a column! It's a table property imposed by physical storage. If you use that as a key, it means you do not have a valid relational design and we should shoot you. Okay, send you to reeducation to learn the right terms.

By definition, let me repeat that, by definition, a key has to be a subset of columns in the entity that is modeled by the table. In your case it looks like a thing called "item_code" is the expected key. But I am only guessing, since you didn't bother to follow forum rules and the etiquette of the last 30 years of SQL forums and post DDL.

Based on 40+ years of programming: first get the logical design right, then worry about the implementation. Getting the wrong answer very fast is much, much, much worse than getting the right answer a little slower. You'll poison everything in your system.

>> Right now the table is a heap and I have a query that does a wildcard against the item_code column (varchar(250)): <<

Oh dear God in heaven, I certainly hope not! First of all, you don't know what a "code" is or the ISO 11179 standards; it's what's called an "attribute property" and it has a specific meaning as to the kind of attribute it is. What you should have had for a key was an "item_id": an identifier for each item, not an attribute that puts it in a category.

Also, I've been at this for over 40 years and have never seen a variable-length code of 250 characters. In fact, I cannot find anything like that in the ISO standards. Essentially, your problem really is that you don't know how to design a table, how to model data, or anything else related to abstraction and standards. If I'm wrong, please post an example of a VARCHAR(250) encoding scheme for your items. I'd love to use it in one of my books as a bad example of design.

In a properly designed schema, there is an identifier for each entity (usually the key), and the encoding schemes for the various attributes are "x CHAR(n) NOT NULL CHECK (x LIKE '...')" or, if it is a quantity or magnitude, "x <numeric data type> [NOT NULL] CHECK (x <numeric predicate>)" or "x <temporal data type> [NOT NULL] CHECK (x <temporal predicate>)" to validate the data.
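
As a rough illustration of those three constraint shapes, here is a sketch with an invented table. The column names, the six-character pattern, and the ranges are all hypothetical. The bracketed character classes in LIKE are a T-SQL extension rather than ANSI syntax, and under a case-insensitive collation [A-Z] will also accept lowercase letters.

```sql
-- Hypothetical table; patterns and ranges are invented to show the three shapes.
CREATE TABLE Parts
(
    -- Fixed-length identifier validated by a pattern.
    item_id    CHAR(6) NOT NULL PRIMARY KEY
               CHECK (item_id LIKE '[A-Z][A-Z][A-Z][0-9][0-9][0-9]'),

    -- Quantity validated by a numeric predicate.
    order_qty  INTEGER NOT NULL
               CHECK (order_qty > 0),

    -- Date validated by a temporal predicate.
    intro_date DATE NOT NULL
               CHECK (intro_date >= '2000-01-01')
);
```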

>> I really want to get this table over to cluster but doing so slows the query that is used. <<

When I'm teaching classes, one of the things I stress is that you need to design data. I have a horrible feeling that your insanely long item_code is a total mess. Let me give an example that you will understand if you have been to a library. Have you ever considered how libraries organized their shelves before there was Dewey Decimal Classification? Any way they wanted to, and as personnel changed, so did the classifications. Every library was different. Having been in the bookstore business, I actually ran into one New Age feminist bookstore in Atlanta in the late 1970s that classified their books by the color of the binding. No, really! It made the shelves look pretty.

You probably need to sit down and actually design your item_code if that is the main search criterion. I've got a whole book and a lot of articles on how to design encoding schemes. I happen to like hierarchical encoding schemes like Dewey, because it's really easy to use simple string matches on them (I know that "5%" is science, "51%" is mathematics within science, etc.).

We don't have any details or DDL or anything else about your real database. All you're caring about is how to get the best performance out of a really crappy design. This is not professional. Do you want to be a database professional or just a code monkey?
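
A small sketch of why a hierarchical code is easy to search. The Books table and its columns are hypothetical, and the Dewey code is simplified to three digits. Each level of the hierarchy is a trailing-wildcard LIKE, which can seek on an ordinary index.

```sql
-- Hypothetical table; Dewey code simplified to its three-digit class.
CREATE TABLE Books
(
    isbn       CHAR(13) NOT NULL PRIMARY KEY,
    dewey_code CHAR(3)  NOT NULL CHECK (dewey_code LIKE '[0-9][0-9][0-9]'),
    title      VARCHAR(200) NOT NULL
);

CREATE INDEX IX_Books_dewey_code ON Books (dewey_code);

-- All of science (the 500s).
SELECT isbn, title FROM Books WHERE dewey_code LIKE '5%';

-- Mathematics within science (the 510s).
SELECT isbn, title FROM Books WHERE dewey_code LIKE '51%';
```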

Please post DDL and follow ANSI/ISO standards when asking for help.

josh-1127203 SSC Eights! Points: 876 · Answer 9

jcelko212 32090 - Thursday, May 18, 2017 12:16 PM