Looking up a dimension's key when the comparison data is a blob

  • When adding rows to a fact table, I need to lookup the correct key for a dimension for the fact. The dimension itself is a "detailed" dimension that contains very long text. The text is the only way to retrieve the key (I also have a checksum column but I don't think I can gurantee there won't be collisions).

    SSAS refuses to use the text as a lookup since it's a "blob" (It's varchar(max)).

    I would just insert the strings into the dimension as the fact rows are inserted but there's a lot of duplicate text so I don't want there to be a row in the dimension for every fact row.

    How can I map these strings to the fact rows?

  • You can use a SQL query with a left outer join to determine if a row is a new one or not.

    This is not a very fast and efficient method if your dimension grows very large.

    I would consider creating a trustworthy checksum or hash to create a more compact natural key.

    If the collision rate is acceptable, why not?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Could you elaborate on why it is the only way to get they key? Also, I just thought of an idea to better use hashes, perhaps hash it with SHA1 and MD5 and use those for the comparison. I'm thinking the collision possibility when you have 2 hashes with 2 different methodologies might help..

    Not sure, I just thought of it...

    CEWII

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply