July 13, 2010 at 6:49 am
Why is data integrity in a database system so important?
Do you really use CHECK constraints and other mechanisms like triggers for enforcing data integrity in your projects?
July 13, 2010 at 6:58 am
_ms65g_ (7/13/2010)
Why is data integrity in a database system so important?
Because users tend to get very upset when applications return garbage information.
Do you really use CHECK constraints and other mechanisms like triggers for enforcing data integrity in your projects?
Yes, and foreign key constraints
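For instance, a minimal sketch (the table and column names here are made up, not from any real system) showing a check constraint and a foreign key declared right in the table definition:
CREATE TABLE dbo.Orders
(
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL
        CONSTRAINT FK_Orders_Customers REFERENCES dbo.Customers (CustomerID), -- foreign key: every order must belong to an existing customer
    Quantity   int NOT NULL
        CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0)                    -- check constraint: no zero or negative quantities
);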
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2010 at 7:22 am
Could you recommend an article about the value of data integrity and how to enforce and implement it?
And must we implement it using DDL, or can we implement it in the front-end application?
July 13, 2010 at 8:34 am
_ms65g_ (7/13/2010)
Why is data integrity in a database system so important?
As a DBA you are responsible for database integrity, recoverability, and performance - data integrity is part of your job description 🙂
To get better acquainted with the matter, I would suggest researching Referential Integrity as well as Data Normalization - Third Normal Form, a.k.a. 3NF, would do.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 13, 2010 at 8:42 am
_ms65g_ (7/13/2010)
Could you recommend an article about the value of data integrity and how to enforce and implement it? And must we implement it using DDL, or can we implement it in the front-end application?
As per Codd's Rule #10, it has to be on the back end. Having said that, I recognize some really high-end systems require creative approaches.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 13, 2010 at 8:59 am
You can implement it at the front end or the back end. However, if you do it at the front end, then every time you make a call that touches data, it needs to check for RI.
If you can enter a customer name from 3 or 4 places in the front end, every one of them needs to check that the customer name maintains RI with the other tables it might link to.
If someone builds a web service, or another application, or connects from Access/Excel, you have to ensure they do not break RI.
Typically this is why it is implemented on the back end, often with constraints, less often with triggers.
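As a rough sketch (hypothetical table names), once the rule lives in the database it holds no matter which client connects:
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

-- this now fails (error 547) whether it comes from the application,
-- a web service, Access, Excel, or an ad hoc query window
INSERT INTO dbo.Orders (OrderID, CustomerID)
VALUES (1, 99999);   -- assuming customer 99999 does not exist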
July 13, 2010 at 4:03 pm
Steve Jones - Editor (7/13/2010)
Typically this is why it is implemented on the back end, often with constraints, less often with triggers.
Agreed.
PKs/FKs when enforcing RI in the -local- database.
Triggers when you need to enforce RI against another database.
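Something along these lines, for example - a rough sketch with made-up names, since a foreign key cannot reference a table in another database:
CREATE TRIGGER trg_Orders_CustomerExists
ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- reject rows whose CustomerID has no match in the other database
    IF EXISTS (SELECT 1
               FROM inserted AS i
               WHERE NOT EXISTS (SELECT 1
                                 FROM OtherDb.dbo.Customers AS c
                                 WHERE c.CustomerID = i.CustomerID))
    BEGIN
        RAISERROR('CustomerID does not exist in OtherDb.dbo.Customers.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END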
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 23, 2010 at 11:05 am
Because users won't follow the rules unless you force them to. They may even mean well. You may train all your users really, really well, but eventually they'll bring in new employees, and those employees will be trained by other employees, and then they might even train other new employees, until eventually your original plan is all but forgotten.
Because users aren't perfect. They forget things, they rush things, they overlook things, and they produce typos.
One of the worst culprits is almost always NULL fields. If they don't think it's relevant, they'll just leave it blank. But then what does NULL mean? Can you trust its value, or is it being used subjectively? Maybe they forgot to put something in. Forcing them to put in a value means you don't have to ask "does NULL mean 0, or did someone forget to put something in there?" 0 always means 0. NULL means a lot of things.
We have two ERPs at the moment, and one of them does not enforce any sort of referential integrity on ship states. Not only does that mean we could potentially get a shipment sent back to us, it makes it really hard to prepare a sales report by state. A great deal of time was spent mapping invalid state values to correct ones, and nothing's there to stop them from doing it again. If you just give them a free-form field, they might put in "FL", "Fla", "Florida", "Flordia", or even NULL. In our case, we ended up with a lot of zip codes in the state field. It's a mess.
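A minimal sketch of the fix (table and column names made up): a lookup table plus a foreign key, so "Fla", "Flordia", zip codes, and blanks can't land in the state column in the first place:
CREATE TABLE dbo.States
(
    StateCode char(2) NOT NULL PRIMARY KEY   -- 'FL', 'CA', ...
);

ALTER TABLE dbo.Shipments
    ALTER COLUMN ShipState char(2) NOT NULL;  -- assumes the existing data has already been cleaned up

ALTER TABLE dbo.Shipments
    ADD CONSTRAINT FK_Shipments_States
    FOREIGN KEY (ShipState) REFERENCES dbo.States (StateCode);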
Foreign key constraints can be done at the application level with drop-down lists, but not all data comes from the application. What if your SQL developer starts trying to push data into your database from some other system? Hopefully he's doing it to the test database first, but his job becomes a lot easier if the system doesn't allow him to insert invalid data. The customer ID should be a value in the customer table, right? Why not enforce it?
More than anything, referential integrity means peace of mind for the DBA. It means the data you pull from the system is consistent and can be reasonably trusted. It also makes report building much easier when you don't have to worry about what NULL means, or why you have a state called "11001".
July 23, 2010 at 11:10 am
Excellent points, jvanderberg. Often RI is there to prevent accidental mistakes, not malicious ones.
July 23, 2010 at 12:54 pm
I like to have very tight control of my data - with many constraints and triggers etc.
but, saying that, I have built systems without any of the usual constraints
because:
- it simplified things
- it increased performance levels
- the probability of a constraint failing was microscopic
and I never had reason to regret it
if your design and permissions are tight there is little point expending precious CPU and I/O performing millions of:
if (true) then do nothing
Unless you like spending money on big servers
July 23, 2010 at 1:47 pm
I'm afraid I have to disagree wholeheartedly. It doesn't take that many cycles to check a constraint. Further, those cycles don't really cost that much in the grand scope of things. Inconsistent, inaccurate data costs very real money, and sometimes a lot of it. It also costs a lot less to enforce data integrity early than to try and add it after the fact. An application that doesn't check constraints simply doesn't scale well, and almost everything I build ends up scaling to a grander scale than I ever imagined. A system meant for 5 advanced users could easily end up being used by 50 low-level users within just a few years. I wouldn't design a business application that wasn't meant to scale, because that's what businesses ultimately want to do - scale.
July 24, 2010 at 1:31 pm
Ensuring the right results from your data is more important to most business users than either simplicity or performance. So if you sacrifice accuracy in the name of simplicity or performance, you are doing those business stakeholders a disservice. There is an "if" here, of course. It may be that you have found a more efficient way to implement integrity rules (putting them into applications, for example), but as a general rule, implementing those rules in the database is the most cost-effective and most performant way of ensuring data integrity.
I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic". Presumably that's because you can test your application(s) very well and manage the deployments of those applications in a highly controlled way? If you had dozens or hundreds of applications with many of them changing frequently then ensuring each one implements exactly the same rules at the same time could become an enormous and perhaps impossible task.
July 24, 2010 at 2:58 pm
I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic"
plenty of validation at the application level
plenty more at the stored procedure level
inserts are highly unlikely to be invalid
and updates / deletes only possible through tightly defined application interfaces and stored procedures
plenty of healthy paranoia and defensive coding
no point in:
set [field] = 'STATUS'
also performing:
check ([field] in ('STATUS', 'BLAH1', 'BLAH2', 'BLAH3'))
-- or --
insert into ([username]) values ('bob')
also checking:
references dbo.Users([username])
when 'bob' was selected from dbo.Users in the same transaction
and users are only archived inside a tablockx
there is 0% probability of 'bob' not existing in the Users table ...
(unless server has run out of virtual memory then anything can happen)
July 24, 2010 at 3:38 pm
doobya (7/24/2010)
I'd be interested to know more about your scenario and why you think the probability of the constraint failing was "microscopic"
plenty of validation at the application level
plenty more at the stored procedure level
Why would checking in the app or a proc be more efficient than checking in a constraint? In certain cases, with a CHECK constraint on a small number of possible values, it might be, I suppose. It seems unlikely you could check UNIQUE or FOREIGN KEY constraints more efficiently in procedural code than with a declarative constraint, though, because you'll need extra queries or an extra round trip to do it. Even if you found a way to do it more efficiently, removing those constraints would probably be a bad idea, because those same constraints can be used to produce more efficient execution plans for queries.
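For example (hypothetical tables): with a trusted foreign key from Orders.CustomerID to Customers.CustomerID, and CustomerID declared NOT NULL, the optimizer can remove the join to Customers entirely from a query like this, because the constraint guarantees a match:
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID;   -- no columns selected from Customers, so the join can be eliminated
(If the constraint was created or re-enabled WITH NOCHECK, it is not trusted and the optimizer won't do this.)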
I would agree that in some special cases with CHECK constraints, where the quantity of data access code is quite small and tightly controlled, you might want to save a few processor cycles by doing some checks in the app or in a procedure. I'm not sure how that qualifies as a "simpler" method, however. It seems like declarative CHECK constraints are about as simple as you can get and generally involve less code than any procedural equivalent.
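To illustrate (column and status values are made up): the declarative version versus the procedural code that would otherwise have to be repeated in every proc or app that writes the column:
-- declarative: one statement, enforced for every write, from anywhere
ALTER TABLE dbo.Orders
    ADD CONSTRAINT CK_Orders_Status
    CHECK (OrderStatus IN ('OPEN', 'SHIPPED', 'CANCELLED'));

-- procedural equivalent, repeated in each stored procedure:
IF @OrderStatus NOT IN ('OPEN', 'SHIPPED', 'CANCELLED')
BEGIN
    RAISERROR('Invalid order status.', 16, 1);
    RETURN;
END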
July 24, 2010 at 4:21 pm
because the proc had to select the user account to check the balance already
the foreign key constraint was unnecessary
a single check constraint is quick but that isn't a common scenario
checking many check and foreign key constraints which will never return false is a waste of resources - if you are running servers at 80%, that extra could push you over into scaling up - in that case looking at 5 figure sums coming out of my pocket!
set @var = 1
if @var <> 1 raiserror('no point in this line existing however cheap it is', 16, 1)