November 19, 2013 at 7:49 am
Hi all,
I work with a database that has no primary keys defined on any tables. As a customer's consultant I have no control over the supplier's database. It frustrates me that I have no control over the database and I'm looking for some points of discussion so I can go back to the supplier to ask them to review the design and consider making some changes.
So I guess I'm just looking to provoke a discussion about this in order to gather some points I can then discuss with the supplier.
Anyone got any comments...?
November 20, 2013 at 11:04 am
Questions:
1. Are there any clustered indexes on the tables? Without them, indexing will be inefficient since each table without a clustered index will be a heap.
2. Are you talking about natural primary keys or setting up IDENTITY columns that will be used as clustered primary keys? If IDENTITY columns exist then you are one step closer to making efficient indexing.
Sometimes vendors treat SQL Server as if it were a kind of set-and-forget file system. That's not what it is.
Perhaps we can help you educate your client.
Thanks
John
November 20, 2013 at 11:45 am
it might be the product of someone who thought to keep their database design as if it were secret and proprietary, and have no PK's so you can't deduce the design by following Foreign Key relationships.
if that's true you could probably run a trace when the application is running, parse the JOIN criteria of any statements, and infer the PK/FK's to create.
I've seen that quite a few times here on SSC, where someone thinks their database design, and the relationship between Invoice and InvoiceDetails was unique to their thought pattern and no one else would think of it.
Lowell
November 20, 2013 at 2:15 pm
Thanks, perhaps a bit more background is in order.
The database was originally built years ago in Progress (old OpenEdge) but some Clients run it through the OpenEdge SQL DataServer which basically migrates the schema and data into SQL and then uses a schema placeholder to translate OpenEdge database queries into SQL queries. This means that the database can run on SQL whilst the main GUI and interface doesn't need re-coding and can continue to run in Progress 4GL.
There genuinely are no (clustered) primary keys and very few clustered keys; most indices are non-clustered indices that yes will just exist on the heap. It's not a case of them trying to keep the db a secret as it is fairly self-explanatory; I used to work for the supplier and know the db inside out.
I'd like an idea for what kind of impact having no primary, clustered keys at all would have on your typical transactional database...
November 20, 2013 at 2:27 pm
Drammy (11/20/2013)
Thanks, perhaps a bit more background is in order.(snip)
I'd like an idea for what kind of impact having no primary, clustered keys at all would have on your typical transactional database...
A lot. The heap would be a big slow-down.
Is there an identity-type column (int or bigint) on any table?
You can likely change the underlying indexing to clustered and rebuild all indexes, and gain performance, without impacting the app. Can you set up a test environment to verify this and to regression test before running such a script on production? You may not be able to specify PK in the DDL, since that brings restrictions along with it. But you can still create an unique clustered index on the natural primary key and achieve the identical effect.
Obviously, with no defined PK, features like replication are not going to be a future option. If there is an unique identity key (a sequential integer field), you can cluster that column and reap maximum indexing benefits, especially since inserts will always go to the end of the cluster.
Then once you get indexes clustered on all tables, go find Michelle Ufford's dba_indexdefrag freeware and schedule yourself a weekly index rebuild job that runs on a weekend schedule. If you wish, another step to the same job could update statistics, and you would reap best speed and keep your DB nice and clean.
Thanks
John.
December 2, 2013 at 4:10 am
Plus the fact that no unique constraints radically changes the optimizers choices for retrieving data. I'll bet you see a LOT more scans than you should since the optimizer has no way of knowing if your data is unique.
Honestly, I'd be hard pressed to list the very large number of issues you're seeing here that you shouldn't in terms of performance. But the biggest worry would be data quality. Your business can quite easily insert duplicate data if there are no constraints in place (and probably has). That would be the first, and most important, point I'd raise.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 2, 2013 at 6:09 am
Thanks for all the guidance and help.
I have plenty to be going back to them with and might even change some indexing on the primary tables and prove the performance improvement as part of my argument.
Thanks again,
Drammy
December 11, 2013 at 10:25 am
Are you talking about application performance or reporting performance.
If it is reporting, then I woudl be tempted to replicate the database and create reporting indexes. In a large system the differing demands on OLTP and OLAP can cause conflicts as lots of indexes to support reporting will slow down the insert and update of records.
December 11, 2013 at 10:50 am
aaron.reese (12/11/2013)
Are you talking about application performance or reporting performance.If it is reporting, then I would be tempted to replicate the database and create reporting indexes. In a large system the differing demands on OLTP and OLAP can cause conflicts as lots of indexes to support reporting will slow down the insert and update of records.
That is a good idea, but SQL Server replication requires defined primary keys on all articles. Therefore, if reporting requires data in close to realtime, the PK issue that the original poster raised persists.
JT
December 11, 2013 at 11:06 am
Yeah, understood - this is an application database only. No reporting requirements or indices on this database.
December 12, 2013 at 12:28 pm
Where's Celko? I thought he'd be in here administering some kind of beat down based on the title alone. 🙂
December 12, 2013 at 2:12 pm
Happy Christmas Joe 😛
Which reminds me.
Q) Why do Assembly programmers get Xmas and Halloween confused
A) Because OCT31 = DEC25
Anyway, JC wouldn't touch Progress with a barge pole - come to think of it, neither would I
JC - maybe we should be nicer to him - it might also be his birthday
December 12, 2013 at 2:14 pm
Don't we celebrate JC's birthday on the 25th regardless of when it actually is?
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply