Today we have a guest editorial as Steve is on vacation.
Over the past 30 years or so there have been numerous improvements to database software. Engines have been improved to speed up transaction processing. Replication has become more stable and easier to administer. Compression and encryption have been added, as have new data types. The ability to audit data, transactions, and schema changes has also been greatly improved.
But it is still up to administrators, developers, and end users to carefully craft the schema and be vigilant about identifying bad data or correcting for weakly "typed" schemas. Often, default data types are used or automatically created. For example, an ETL process or data import may load dates or numbers into varchar fields, which creates numerous issues when trying to join related tables on columns with mismatched data types. It is time to start developing database intelligence.
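To make the problem concrete, here is a minimal T-SQL sketch (the table and column names are invented for illustration) of the kind of mismatched-type join that results:

    -- Hypothetical tables: an import lands OrderDate as varchar,
    -- while the calendar table uses a real date type.
    CREATE TABLE dbo.ImportedOrders (
        OrderID   int         NOT NULL,
        OrderDate varchar(20) NOT NULL  -- should have been date
    );

    CREATE TABLE dbo.Calendar (
        CalendarDate date NOT NULL PRIMARY KEY
    );

    -- The join only works through implicit conversion of OrderDate.
    -- Bad values ('N/A', '2023-02-30') hide until the query fails at
    -- runtime, and the conversion can prevent efficient index use
    -- on ImportedOrders.
    SELECT o.OrderID, c.CalendarDate
    FROM dbo.ImportedOrders AS o
    JOIN dbo.Calendar AS c
      ON o.OrderDate = c.CalendarDate;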
Ideally, the future of database excellence lies in the engine's ability to understand not only the structure but the data itself. As developers add schema objects, the engine would recognize what structures are being built and create the relationships needed to help with accuracy, adding primary keys, foreign keys, and indexes.
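Today, the closest a DBA gets is querying the engine's own telemetry by hand. As one example, SQL Server's missing-index DMVs expose indexes the optimizer wished it had; a rough sketch of how that review looks today:

    -- Candidate indexes the optimizer wanted, ranked by estimated
    -- benefit. An intelligent engine would act on this automatically;
    -- today a DBA reviews the list and creates indexes by hand.
    SELECT TOP (20)
           d.statement                 AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           s.user_seeks + s.user_scans AS potential_uses,
           s.avg_user_impact           AS estimated_improvement_pct
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
      ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s
      ON s.group_handle = g.index_group_handle
    ORDER BY s.avg_user_impact * (s.user_seeks + s.user_scans) DESC;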
The database intelligence software would continually monitor the health of the database, dropping unused indexes, tuning existing ones, and adding missing ones. The engine would also understand the types of tables in play: transactional, reference, and intersection. To understand what data the tables contain, it would have to mimic human behavior, knowing to ask, "What are these data types?" It would correct data types based on the data in the columns and adjust the structure to ensure the database is strongly typed. For poorly designed schemas, it would create exception tables, automatically move the offending records into them, and then notify an admin about the anomaly.
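Developers hand-roll this exception-table pattern today. A minimal sketch, reusing the hypothetical ImportedOrders table from above: rows whose varchar dates will not parse are moved aside, after which the column can finally be strongly typed:

    -- Exception table to capture rows that do not fit the intended type.
    CREATE TABLE dbo.ImportedOrders_Exceptions (
        OrderID    int         NOT NULL,
        OrderDate  varchar(20) NULL,
        CapturedAt datetime2   NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Isolate the rows whose OrderDate will not convert to a date.
    INSERT INTO dbo.ImportedOrders_Exceptions (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.ImportedOrders
    WHERE TRY_CONVERT(date, OrderDate) IS NULL;

    DELETE FROM dbo.ImportedOrders
    WHERE TRY_CONVERT(date, OrderDate) IS NULL;

    -- With the bad rows captured, the column can be strongly typed.
    ALTER TABLE dbo.ImportedOrders
    ALTER COLUMN OrderDate date NOT NULL;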
The software would be robust enough to understand the attributes of the data and identify the anomalies and outliers within it. As the database intelligence software deepened its understanding of the data and structure, it would learn how the data is connected and optimize for speed and accuracy. The software could handle both relational and unstructured data in the same database.
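As a crude stand-in for what such software might automate, here is a simple hand-written outlier check (the dbo.OrderAmounts table is hypothetical) flagging values more than three standard deviations from the mean:

    -- Hypothetical table of order amounts.
    CREATE TABLE dbo.OrderAmounts (
        OrderID int           NOT NULL,
        Amount  decimal(10,2) NOT NULL
    );

    -- Flag amounts more than three standard deviations from the mean,
    -- the kind of screening an analyst writes by hand today.
    WITH stats AS (
        SELECT AVG(Amount)   AS mean_amount,
               STDEV(Amount) AS stdev_amount
        FROM dbo.OrderAmounts
    )
    SELECT o.OrderID, o.Amount
    FROM dbo.OrderAmounts AS o
    CROSS JOIN stats AS s
    WHERE s.stdev_amount > 0
      AND ABS(o.Amount - s.mean_amount) > 3 * s.stdev_amount;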
This new intelligent database engine would also alert users to bad data, having the ability to recognize, clean, audit, and isolate it, and it would be able to assess overall data quality. Many of the tedious, time-consuming tasks that developers and analysts now do manually would become part of the core engine.
Imagine an engine with the intelligence to detect and isolate bad data, and to correct schema and structure automatically and intelligently. Perhaps the engines of the future will no longer be called databases but "informationbases". Imagine the value gained by taking all of the best practices employed by developers and DBAs and integrating them into an intelligent engine. Imagine the time this would save. I see a future where the moment you create an object in a database, be it a table, view, or stored procedure, the engine analyzes it against the current system and "auto corrects" it, providing the most efficient storage and access.