March 4, 2021 at 12:28 pm
Hi,
I am working on an old database table which stores emails, including From, To, CC etc but it also stores the body of the email and the person that designed the table has used datatype nvarchar(max) to store the body.
At the moment the table is relatively small with only 97500 records but the data size for the table exceeds 2.5Gb, with the body data column taking up almost all of this space.
Is there an easy way to change how this data is stored and let me reduce the size of this data for future rows.
Also is there anyway of reducing the size of the current data by storing it in a different format?
The body of the email is searched in a program written for the user, so it is indexed too.
Any help would be much appreciated
Many thanks
Bob :o)
March 4, 2021 at 1:57 pm
In the event you don't need unicode data in the body, you could convert it to a VARCHAR(MAX) as one thought.
Alternately, if you select the max length of the body and it is 4000 or under, you could change that from NVARCHAR(MAX) to NVARCHAR(4000) (or lower). Won't be future proof that way mind you.
If possible, combining both of those approaches would save you space.
If you don't need all of the emails, you could remove old ones.
You could use your email client (outlook 365 for example has a pretty quick search) to store emails instead of SQL.
If there are any duplicate email bodies, you could pull the bodies out to a separate table and have an int that links the 2 tables (ie basically have a lookup table for the body).
As a random thought, is the index on the body actually being used? I'd check the execution plan to make sure that index is actually being used appropriately. My thoughts here are that if I was searching an email body, it is very unlikely that I remember exactly the text of the email and am actually doing a LIKE '% something %' search on it isn't going to play well with the index. I would be curious if dropping the index on the body resulted in any performance hit on your queries AND if it helped with the space used.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 4, 2021 at 2:55 pm
To save space, COMPRESS the body (and other very long columns). This process is rather slow, but it typically does save a lot of space.
And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers and the users except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT. That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer work to return the assigned $IDENTITY value, because the $id assignment is a trigger and thus is in a separate scope.
There are numerous other ways to return the $id to the code -- such as sp_set_session_context and sp_get_session_context -- but the INSERTing code must be changed to do that, of course.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 4, 2021 at 3:31 pm
To save space, ENCRYPT the body (and other very long columns). This process is rather slow, but it typically does save a lot of space.
And, using triggers, you can make this transparent to the developers and the users except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT. That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer work to return the assigned $IDENTITY value, because the $id assignment is a trigger and thus is in a separate scope.
There are numerous other ways to return the $id to the code -- such as sp_set_session_context and sp_get_session_context -- but the INSERTing code must be changed to do that, of course.
Did you mean COMPRESS instead of ENCRYPT?
March 4, 2021 at 4:05 pm
ScottPletcher wrote:To save space, ENCRYPT the body (and other very long columns). This process is rather slow, but it typically does save a lot of space.
And, using triggers, you can make this transparent to the developers and the users except that a $IDENTITY column for the table can no longer be returned directly to code after an INSERT. That is, SCOPE_IDENTITY() (or any of the other standard methods) no longer work to return the assigned $IDENTITY value, because the $id assignment is a trigger and thus is in a separate scope.
There are numerous other ways to return the $id to the code -- such as sp_set_session_context and sp_get_session_context -- but the INSERTing code must be changed to do that, of course.
Did you mean COMPRESS instead of ENCRYPT?
D'OH, yeah, COMPRESS / DECOMPRESS, not ENCRYPT. Had ENCRYPT on the brain from what I was doing here.
Edit: I've corrected the original post.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 6, 2021 at 10:57 am
Thank you so much for your replies Mr. Brian Gale, ScottPletcher and Jonathan AC Roberts, I will now try these out and see how I get on with a trial db first and if they work implement them into the live db.
Many thanks
Bob
March 6, 2021 at 4:09 pm
the table is relatively small with only 97500 records but the data size for the table exceeds 2.5Gb
That's an average of 3.5 Pages of text (assuming Nvarchar) PER message. Seems a lot to me ... of the messages are being stored with HTML and more besides - maybe Images embedded rather than linked.
My first aim would be to only store the "meat" from the messages, e.g. plain-text, because long term that will have the greatest impact on saving.
March 6, 2021 at 4:22 pm
maybe some analysis of what proportion of the Email Body are Small,. Medium and Big in size. Something like this
SELECTSUM(CASE WHEN LEN(Body) <= 1000 THEN 1 ELSE 0 END) AS T1000
, SUM(CASE WHEN LEN(Body) > 1000 AND LEN(Body) <= 2000 THEN 1 ELSE 0 END) AS T2000
, SUM(CASE WHEN LEN(BodyBody) > 2000 AND LEN(Body) <= 5000 THEN 1 ELSE 0 END) AS T5000
, SUM(CASE WHEN LEN(Body) > 5000 AND LEN(Body) <= 10000 THEN 1 ELSE 0 END) AS T10000
, SUM(CASE WHEN LEN(Body) > 10000 THEN 1 ELSE 0 END) AS T10001
, SUM(CASE WHEN LEN(Body) > TBiggest * 0.9 THEN 1 ELSE 0 END) AS T90Percent
, [TBiggest] = MAX(TBiggest)
FROMMyEmailTable
CROSS APPLY
(
SELECTMAX(LEN(Body)) AS TBiggest
FROMMyEmailTable
) AS X
Then if a relatively number of records are significantly larger than average then worth having a look at why that is.
Check the biggest ones anyway
SELECT TOP 100 Body
FROMMyEmailTable
ORDER BY LEN(Body) DESC
March 7, 2021 at 8:17 pm
And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers
Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2021 at 10:14 pm
I agree with the suggestions from Brian. Also, I was wondering Table Compression can help to save space. I've noticed up-to 80-85% of storage space savings with Table Compression. It can be given a try.
March 8, 2021 at 6:56 am
This was removed by the editor as SPAM
March 8, 2021 at 3:19 pm
ScottPletcher wrote:And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers
Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?
The DECOMPRESS is transparent using a view. Sorry, I was in a hurry and conflated the trigger and view.
The original table is renamed. You create a view that matches the original table name. That view handles the DECOMPRESS and an INSTEAD OF INSERT trigger on that view transparently handles the COMPRESS on the INSERTs.
The downside, as I noted, is that after the INSERT, any $IDENTITY value is no longer directly available to the code that did the INSERT because the INSERT is now down in a trigger, which has its own scope. If there's no $IDENTITY column, or the INSERTing code doesn't need to reference it directly after the INSERT, no problem. If the INSERTing code does need ident, you'd have to adjust that code to not rely on SCOPE_IDENTITY(), etc.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 8, 2021 at 3:31 pm
Jeff Moden wrote:ScottPletcher wrote:And, using triggers, you can make the COMPRESSion and DECOMPRESSion transparent to the developers
Interesting... How do you make DECOMPRESSion transparent to SELECTs (for example) using triggers?
The DECOMPRESS is transparent using a view. Sorry, I was in a hurry and conflated the trigger and view.
The original table is renamed. You create a view that matches the original table name. That view handles the DECOMPRESS and an INSTEAD OF INSERT trigger on that view transparently handles the COMPRESS on the INSERTs.
The downside, as I noted, is that after the INSERT, any $IDENTITY value is no longer directly available to the code that did the INSERT because the INSERT is now down in a trigger, which has its own scope. If there's no $IDENTITY column, or the INSERTing code doesn't need to reference it directly after the INSERT, no problem. If the INSERTing code does need ident, you'd have to adjust that code to not rely on SCOPE_IDENTITY(), etc.
Thanks, Scott. I've done that type of thing in the past and, you're correct... is totally transparent to the user except for the $IDENTITY thing you mentioned (haven't tried a SCOPE_IDENTITY() in an INSERT trigger but that would likely work as a hack.). I was just confused for the reason you stated... no view was mentioned and thought you found something different. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply