Relationally Divided over EAV

  • I remember hearing a surgeon talk about his surgical memoir. It really struck me. He said, the first 10 years, you learn how to cut. The next 10 years, you learn when to cut. The next 10 years, you learn when not to cut.

  • SwePeso (12/9/2013)


    Here is a link to download the schema of my hybrid solution.

    It can also be found on my web page http://www.sqltopia.com[/quote%5D

    Just starting to try to get my head around your schema...a couple of questions.

    1) Most of the tables have primary keys, except factHeuristic and factPair, however both of these have Unique Clustered Indexes and no nullable columns. Any chance of an explanation as to why you didn't go for a PK? I'm thinking it has got something to do with the Partition Scheme?

    2) I'm interested in using this technique for an Oracle solution, but Oracle doesn't support included columns in indexes. Any thoughts on my chances of success - or is this key? Could I for example create new or extend existing indexes to literally cover these included columns?

    I'm working from the pdf document...I wonder has it been written up in more detail elsewhere?

    It really does look very promising, but not easy for a aging brain like mine to digest.

    Many thanks,

    David McKinney.

  • Stephen Hirsch wrote:

    I remember hearing a surgeon talk about his surgical memoir. It really struck me. He said, the first 10 years, you learn how to cut. The next 10 years, you learn when to cut. The next 10 years, you learn when not to cut.

    If literally true, then I want a surgeon with gray hair.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Hoping the broken links can be mended... please...

    Thanks for re-posting as an editorial - may just be at the right time for a hell-hole of an "unknown" attributes per customer problem with a horrendously wide table...

  • I'm a little late to the party due to the repost. But I look forward to reading more on this topic. We had a really terrible EAV implementation, so I instinctively avoided it afterwards. And I probably still should in most cases, but I never expected to see an article pointing out that EAV can have its uses.

    I realize,  however, that in IT many things called "best practices" are really more like "received wisdom" or even "urban legends." I admire the experts here on SSC because you always scrutinize everything and can break down when things are useful or helpful, and when they aren't. That is more valuable to me than just doing things automatically because someone said so. Even when the "said so" is correct, this site has taught me it's important to understand why it is correct.

    Thanks.

    -- webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Where I have seen EAVs it has been when someone wanted a flexible schema or something that, in 2024, would suggest a use case for a either a document DB or for the JSON capabilities within an RDBMS.

    Do EAV models remain relevant given the evolution of DB tech since Tony wrote this editorial?

  • Yes.

    Not only because of the simplicity, but for the streamlined storage need and search optimization. You only need one index for all attributes.

    The EAV model is extremely well suited for Relational Division. One of the largest implementations of my hybrid EAV data model currently hold 200 billion rows (in one table) and no matter which combination of search parameters you use, the execution time is still under 500 ms (IO most often less than 120).

    My hybrid EAV data model is currently used in one of the largest cancer research projects ever put together in the USA, and I anticipate the number of records to be even higher than that.


    N 56°04'39.16"
    E 12°55'05.25"

  • Yes,  implementing multi-value columns in JSON is a better solution than an EAV table design, and the app developer may end up serializing the resultset as JSON regardless, so you're giving them exactly what they want.

    From what I've been, it's only a subset of queries for a specific table that need the extended fields.

    So, here is a quick fix for anyone who needs to performance tune a query using JSON columns like this:

    SELECT SalesOrderNumber,
    OrderDate,
    JSON_VALUE(Info, '$.Customer.Name') AS CustomerName
    FROM Sales.SalesOrderHeader
    WHERE JSON_VALUE(Info, '$.Customer.Name') = N'Aaron Campbell'

    The trick is that you can index computed columns:

    https://learn.microsoft.com/en-us/sql/relational-databases/json/index-json-data?view=sql-server-ver16#index-json-properties-by-using-computed-columns

    FYI: Despite popular belief, the computed columns don't even need to be persisted.

    Also, for the index to be used, you don't have the modify the original SQL query to reference the new computed column name, because it's the expression, not the computed column, that gets indexed.

    This is like on my TOP 10 list of things to look for and refactor when I'm asked to tune the queries on an application. But I guess we all have to re-configure our approach to performance tuning to match the typical design patterns of whatever organization we working for at the moment.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • webrunner wrote:

    I'm a little late to the party due to the repost. But I look forward to reading more on this topic. We had a really terrible EAV implementation, so I instinctively avoided it afterwards. And I probably still should in most cases, but I never expected to see an article pointing out that EAV can have its uses.

    I realize,  however, that in IT many things called "best practices" are really more like "received wisdom" or even "urban legends." I admire the experts here on SSC because you always scrutinize everything and can break down when things are useful or helpful, and when they aren't. That is more valuable to me than just doing things automatically because someone said so. Even when the "said so" is correct, this site has taught me it's important to understand why it is correct.

    Thanks.

    -- webrunner

    + 1 Million, WebRunner.  That, the fact that such discussions are actually encouraged (rather than being discouraged like on other websites),  and the people that practice that are what keep me here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well... Do that for 100+ attributes and you need 100+ indexes.

    And still you can't use more than one index when searching for

    WHERE treatment = @treatment AND bp = @bp AND ward = @ward;

    While it is true that the table does not grow in size with calculated columns, the indexes sure does grow. And inserts will block all indexes referencing the JSON document and corresponding column.

     


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso wrote:

    Well... Do that for 100+ attributes and you need 100+ indexes.

    And still you can't use more than one index when searching for

    WHERE treatment = @treatment AND bp = @bp AND ward = @ward;

    While it is true that the table does not grow in size with calculated columns, the indexes sure does grow. And inserts will block all indexes referencing the JSON document and corresponding column.

    Yeah, if a JSON attribute requires an index (it's actually used as a search predicate), then that's a clue that it really needs to be refactored as a properly normalized column.

    If a table is essentially a collection of JSON documents, then you're better off using Azure CosmosDB or MongoDB.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I tried Cosmos. Still 2 magnitudes slower.

    Haven't tried Mongo yet, but I don't expect it to be faster.


    N 56°04'39.16"
    E 12°55'05.25"

  • I wish the original demonstration had been videoed.

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply