Search for entire set of records.

  • I have rather odd requirement that I can't figure out how to construct. It's a botanical database, and the taxonomic description has as one of its elements an author string. It can be a single person, or a construct of several people, with various forms of punctuation between. Here is a sample of some of the possible strings:

    F. Muell. ex Hieron. in Engl. & Prantl

    (Hook.f. & Thomson) Dalla Torre & Harms

    (Mertens et W. D. J. Koch) Boissier

    Cosson et Durieu de Maisonneuve ex Cosson

    (Champ. ex Benth.) Champ. ex Benth.

    (Harv. ex Benth.) Torre & Hillc.

    (Hook. & Arn.) Hook. & Arn.

    Baill. ex Dalla Torre & Harms

    Schrenk ex Fisch., C.A.Mey. & Ave-Lall.

    (Bluff, Nees & Schauer) Rchb.

    (Graebn.) Dalla Torre & Harms

    Banks & Sol. ex G.Forst.

    The possible delimiters are 'ex', 'in', '&', and ',', and there may be parentheses starting at the left end, and encompassing one or more of the names. The individual names are in a separate table, each with its own unique ID, and I turn these strings into a set of records, each with its own group ID, which ties them all together, and an ordering field, that specifies left to right how the names are to appear, and some other auxiliary fields that specify delimiters and parentheses. This all works very nicely, but it's a huge pain in the fundament to investigate whether a particular configuration already exists. There is no way (that I have been able to think of, at least) to do it with set-based commands. I have to scan through the entire table of string constructs, and for each SET, assemble the entire string and compare it to what the user has assembled in the app running this. If there is a match, an indicator tells the user that the construct already exists and gives the option to use it. If there is nothing matching, the user may save the construct, and it is then available for future assignment to additional taxons (a particular set of people may have described more than one taxon).

    I thought of assembling the entire construct into one text field, but then I lose the binding of the name used in the construct to the table of names. That is, I could make a field something like:

    '( 123 ex 456 ) in 789 & 135'

    that would represent

    (F. Muell. ex Hieron.) in Engl. & Prantl,

    Where

    123 = F. Muell.

    456 = Hieron.

    789 = Engl.

    135 = Prantl

    But that loses the binding of name to ID, which I now have in the table of individual records.

    Something like a persistent grouping query would do the trick, like a persistent computed column, but I haven’t been able to find that anything like that exists. Any notions? Or am I going at the whole problem badly?

  • Does order of names matter?

  • It's crucial. That's what the ordering field is for.

  • OK, missed that on the first and second read of the article. My bad. the approach I have used in the past does not maintain RI, but your second table would be able to keep that in place to avoid deleting a single person/name. I used a varbinary to hold concatenated integers, which will keep the order. Searching is relatively fast depending on size of the string. If things get slow, use a hash of the concatenation or even the originating string, such as MD5, which limits the overall size and is very fast for searching.

  • Hm, you've lost me. Second table? Do you mean keep the table where I now store the set of records from which the author string is assembled? And another table, where I maintain the assembled IDs in one string as you describe, with a varbinary field, and use that for searching?

  • Table 1

    Record ID

    hash to speed searching

    concatenated integers for full comparison

    other CRUD stamps (who, when,etc)

    optional - original text string for display purposes

    Table 2

    Record ID FK constraint to Table 1

    Order Number

    Name Key - FK constraint to Table 3

    Table 3

    Name Key

    Name String

  • Okay, I see. I was concentrating on the disadvantages inherent in each method, and didn't think of using both together. It would require manual maintenance of the coherence between the set of records in Table2 and the concatenated IDs in Table1, which would make it more complex, but it might be worth it from a performance point of view.

    I can't use it exactly as you specify, because the delimiters are important - F. Muell. ex Hieron. is different from F. Muell. & Hieron., and from F. Muell. in Hieron. and from F. Muell., Hieron., and I have to keep track of where the parentheses are as well, (F. Muell. ex Hieron.) in Engl. & Prantl is different from F. Muell. ex Hieron. in Engl. & Prantl, but those are implementation details.

    I'll look into some making this as automatic and bulletproof as possible, and see how it works. Maybe even via a trigger? Yeah, I know, but this table will see lots of lookups and only occasional adds (constantly fewer as time goes on, and available combinations increase), and almost no updates or deletes, so it might be workable.

    Thanks for the hint.

  • If you need to keep the delimiters, add them as names. put another column on the name table to indicate that is what they are, but it keeps all the data in a single location versus portion of it in code. The concatenation would keep everything.

    I try to stay away from triggers as well and typically place the code in a procedure that does the lookup and insert if missing.

    one last bit of advice, the hash is the key to the performance. I use it for our wait stats utility that is comparable to Confio Ignite. I would spend all day comparing huge select statements and plans, but I can build and find the hash in milliseconds.

  • OK, that's a good idea, I'll do it that way. The DB doesn't care whether it's a real name, or a delimiter - it'll string it all together regardless.

    Not sure about the trigger - I know they're generally considered bad manners, but I think it's because they get misused a lot. They're a useful tool, if applied only in the proper circumstances. This may be a little too involved for a trigger, but I'll see how it looks once I get to work.

    Thanks again.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply