Large objects data type issues

  • Hi All,

    We have some questions about LOB data. One of our tables, with the schema below, has reached 500 GB.

    The application team is trying to delete some data from this table and it is taking a long time.

    We aren't sure what is really happening. The deletes complete successfully, but the row count is not going down and no space is being released.

    We are on SQL Server 2017 RTM. The application team is deleting data based on a date-range condition.

    One other thing we noticed: the database is part of an Availability Group and is in the FULL recovery model. Also, READ_COMMITTED_SNAPSHOT is ON for this database.

    create table test
    (
        c1 bigint not null,
        c2 int not null,
        c3 ntext,
        c4 int,
        c5 int,
        c6 int,
        c7 int,
        c8 int,
        c9 int,
        c10 nvarchar(255),
        c11 ntext,
        c12 int,
        c13 bigint,
        c14 ntext,
        c15 int
    )

    Questions

    ==========

    1) What are the options available to clean up LOB data?

    2) Is LOB data stored in a single 8 KB page, or somewhere else?

    3) When I select data from disk, what type of pages is the LOB data read into in SQL Server memory?

    4) Why are LOB row deletions slow?

    5) How can I calculate the row size of the above table?

    6) Even though we are deleting the data, why are the row count and space not reduced or reclaimed?

    Thanks,

    Bob

  • 7) Why are you still on 2017 RTM

  • Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.
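
    A minimal sketch of that keep-and-swap approach. The keep condition and the c_date column are hypothetical, since the posted schema doesn't show which column drives the date range, and SELECT...INTO won't copy indexes, constraints, or permissions:

    -- Copy only the rows you want to keep (hypothetical date column c_date).
    SELECT *
    INTO dbo.test_keep
    FROM dbo.test
    WHERE c_date >= '2020-01-01';

    -- Recreate indexes/constraints on dbo.test_keep here, then swap.
    DROP TABLE dbo.test;
    EXEC sp_rename 'dbo.test_keep', 'test';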


    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

    It's a POC server for the 2017 upgrade. We're waiting for a service pack to be released; I think only cumulative updates are available so far.

    This is an old database, and they now want to upgrade it from SQL 2012 to SQL 2017 as-is.


    There are no more service packs since SQL 2017, as part of the cloud-first strategy. It is currently at Cumulative Update 22. https://sqlserverbuilds.blogspot.com/

    *edit* You might consider SQL Server 2019 if your project is still a POC.

  • Michael L John wrote:

    Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.

    Sorry but that's not correct.  NText, Text, and Image are all stored out-of-row by default.  The newer MAX and XML lobs are (very unfortunately) stored in-row by default and, of course, only if they fit in-row.

    https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-tableoption-transact-sql?view=sql-server-ver15

    I totally agree on your last two points.
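
    For reference, a quick way to check how those options are currently set on a given table; this is just a sketch against the posted table name, using the columns exposed by sys.tables:

    -- 1 in large_value_types_out_of_row = MAX/XML values are forced out-of-row;
    -- a non-zero text_in_row_limit = text/ntext/image may be stored in-row.
    SELECT name,
           large_value_types_out_of_row,
           text_in_row_limit
    FROM sys.tables
    WHERE name = 'test';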

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    As per my understanding, LOB data also gets stored inside the page; however, if the row data exceeds 8,000 bytes, the data is stored off the page. Now I have a question. All data in SQL Server is stored in 8 KB pages. For LOB data, how is the data stored in memory? Are there different page types in the buffer pool for holding LOB data? How does an LOB column update work? Does it bring the data from disk into 8 KB pages in the buffer pool, make the update, and then write those dirty 8 KB pages back to disk, or how does I/O work for LOB tables?

    Updating a LOB does not remove the old value from storage; it creates a new one and then changes the pointer in the table row.

    Cleanup happens at some later stage, when the server finds sufficient spare time and resources.

    _____________
    Code for TallyGenerator

  • I must say - storing big documents in SQL Server tables is a bad idea. Storing them inline with transactional updatable data is an extremely bad idea.

    If you cannot store documents outside of the database in a file storage system (storing only links in the table), at least move them into a "LOB only" table; when you need to update a LOB, just add a new row to the LOB table and update the LOB_id in the relevant table(s). A rough sketch of that layout follows.
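
    Something along these lines; the table and column names are only illustrative:

    -- Hypothetical "LOB only" table: the documents live here and nowhere else.
    CREATE TABLE dbo.DocumentLob
    (
        lob_id   bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
        doc_body nvarchar(max) NULL
    );

    -- The transactional tables carry only lob_id.  "Updating" a document means
    -- inserting a new row here and repointing lob_id; old rows are purged later
    -- on whatever schedule suits you.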

    _____________
    Code for TallyGenerator

  • bobrooney.81 wrote:

    As per my understanding, LOB data also gets stored inside the page; however, if the row data exceeds 8,000 bytes, the data is stored off the page. Now I have a question. All data in SQL Server is stored in 8 KB pages. For LOB data, how is the data stored in memory? Are there different page types in the buffer pool for holding LOB data? How does an LOB column update work? Does it bring the data from disk into 8 KB pages in the buffer pool, make the update, and then write those dirty 8 KB pages back to disk, or how does I/O work for LOB tables?

    The key is that MS kind of screwed everyone that works with LOBs way back in 2005.  The MAX datatypes are great compared to the predecessors but the predecessors had a huge advantage... they did and still do DEFAULT to being stored out-of-row, which is a bit like some (pardon the term) document storage whereas the MAX datatypes default to in-row, if they fit.  That leads to all sorts of clustered index woes like trapped short rows and slow queries because they have to load all that in-row junk when you're just trying to read normal data.

    Whether you have in-row or out-of-row LOBs, the only way you can "compress" deletions is to REORGANIZE the indexes that contain them (Clustered Indexes and any Non-Clustered Indexes that you've made the mistake of INCLUDEing them in).

    The best thing to do is to set the table option to force them out of row and then do an "in-place" update to actually get existing LOBs to move out-of-row.  I strongly recommend that any new table you build that will contain MAX or XML datatypes have the out-of-row table option set on its birthday.

    The out-of-row stuff actually works quite well.  It will reuse space and a bunch of other cool things automatically.  The bad part is that if you do deletes because you need space, you have to run REORGANIZE, and that nasty bit of code may have to be executed many times before it does the job completely.  I did a test with some not-so-big manufactured LOBs and I had to run REORGANIZE 10 times to achieve the final result.  REBUILDing won't do much at all for you for either form of LOB.
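
    A rough sketch of that sequence against the posted table; treat it as a starting point only, since a 500 GB table needs this batched and run in a maintenance window, and the REORGANIZE step assumes the table has at least one index (the posted CREATE TABLE doesn't show one):

    -- 1) Force new MAX / XML values out of row.  This has no effect on ntext,
    --    which defaults to out-of-row unless 'text in row' was turned on.
    EXEC sp_tableoption 'dbo.test', 'large value types out of row', 1;

    -- 2) "In-place" update so existing in-row LOB values physically move.
    --    Shown against a hypothetical nvarchar(max) column named c_doc;
    --    batch it rather than running one huge transaction.
    -- UPDATE dbo.test SET c_doc = c_doc;

    -- 3) Compact the LOB space freed by the deletes; may need several passes.
    ALTER INDEX ALL ON dbo.test REORGANIZE WITH (LOB_COMPACTION = ON);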

    And, yeah... the Buffer Pool takes a hell of a beating if defaulting to in-row LOBs and a lot of them actually fit in row.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Michael L John wrote:

    Jo Pattyn wrote:

    7) Why are you still on 2017 RTM

    Same question from me.  As well as why are you still using an ntext data type?  That was deprecated with SQL 2005.

    ntext is stored in the row, which is one reason that it's slow.

    You are not going to get any space back until you re-build the clustered index.

    Depending upon what you are deleting, you may be better off selecting what you want to keep into a new table, drop the old table, and rename the new table.

    Sorry but that's not correct.  NText, Text, and Image are all stored out-of-row by default.  The newer MAX and XML lobs are (very unfortunately) stored in-row by default and, of course, only if they fit in-row.

    https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-tableoption-transact-sql?view=sql-server-ver15

    I totally agree on your last two points.

    Yeah.  My bad.  My brain and my fingers were not in sync.  Thanks for correcting me.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    Yeah.  My bad.  My brain and my fingers were not in sync.  Thanks for correcting me.

    Heh... NP.  At least it wasn't me for a change. 😀 😀 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I still don't understand what type of pages the LOB data gets stored in. Apart from the 8 KB data page, are there any other page types for LOB data in the buffer pool?

    Thanks,

    Bob

  • bobrooney.81 wrote:

    Hi Jeff,

    I still don't understand what type of pages the LOB data gets stored in. Apart from the 8 KB data page, are there any other page types for LOB data in the buffer pool?

    Thanks,

    Bob

    They are still stored in 8K pages.  By default for the MAX datatypes, if they fit "in-row" they will be (very unfortunately) stored as a part of the clustered index in the same 8K pages as the clustered index, which can trap "short rows" (and can make page density remarkably low) and makes the average row size rather huge, which in turn causes a whole lot more pages to be scanned if a clustered index scan occurs (sometimes hundreds or thousands more).

    If they go or are forced out of row (the latter being the best option, IMHO), they still live on 8K pages but the storage structure is quite different and it doesn't follow the same rules as a clustered index.  For example, the data is not compelled to live on pages designated by the clustered index keys.  It's more like a non-structured heterogeneous storage area and it can be quite efficient.  It requires virtually no maintenance unless you do a shedload of deletes but the deleted space will be reused.  It also (again) keeps all that stuff from actually being stored in either the Clustered Index or Heap, depending on what you have.
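
    If it helps to see this for yourself, here's a sketch using the standard physical-stats DMV; the allocation-unit type tells you whether pages belong to the in-row, row-overflow, or LOB structures:

    -- 'SAMPLED' keeps the scan lighter on a 500 GB table; use 'DETAILED'
    -- for exact numbers when you can afford the full scan.
    SELECT index_id,
           alloc_unit_type_desc,            -- IN_ROW_DATA, ROW_OVERFLOW_DATA, LOB_DATA
           page_count,
           avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.test'),
                                        NULL, NULL, 'SAMPLED');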

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Thank you, Jeff and everyone.
