Documents in VLDB

Question

Documents in VLDB

IkeRoss

SSC Enthusiast

Points: 146
More actions
July 31, 2007 at 6:12 am

#179923

Hello All,

I have been experimenting with SQL Server 2005 partitions. I loaded a terabyte of information into 2 tables. The first holds the document information and the second holds the actual binary document (in this case pdf). Most of the documents are about 1 megabyte in size, but the largest is 212 megabytes.

SQL Server has no problem storing the blobs. The problem occurs when I attempt to get the data.

I did some quick tests to test how fast I could pull the documents out. The largest took about 24 seconds. The 1 meg documents are sub-second.

Here is how the 212 meg doc breaks down:

Time to load datatable: 18.79 seconds
Time to load byte array: 3.84 seconds
Time to Write and open document: 0.01 seconds

If I access the file from a file server, the time is 0.04 seconds to begin showing the document.

As you can see, the longest time period is related to retrieving the data from SQL, and it is much slower that launching it from disk across the network. (note: the sql server and file server used to test are next to each other).

My question is, how can I speed up the access from SQL Server? I believe the keys are "partition aligned". Any suggestions would be appreciated.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

IkeRoss SSC Enthusiast Points: 146 More actions · Answer 1

Here are the table definitions. The table tblDocInfo has DocInfoID, which is the FK in tblDocs. The partitions are defined by this key.

Info table:

CREATE TABLE [dbo].[tblDocInfo](

[DocInfoID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,

[DocName] [nvarchar](150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[DocSize] [decimal](18, 0) NULL,

[FKID] [decimal](18, 0) NULL,

[FKTableName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[FKDBName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[FKDBServer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[OriginalLocationPath] [nvarchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[CreationDate] [datetime] NOT NULL CONSTRAINT [DF_tblDocInfo_CreationDate] DEFAULT (getdate()),

[UserID] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Computer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[LastModifiedDate] [datetime] NULL CONSTRAINT [DF_tblDocInfo_LastModifiedDate] DEFAULT (getdate()),

CONSTRAINT [PK_tblDocInfo] PRIMARY KEY CLUSTERED

(

[DocInfoID] ASC

)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

) ON [PRIMARY]

GO

ALTER TABLE [dbo].[tblDocInfo] WITH CHECK ADD CONSTRAINT [FK_tblDocInfo_tblDocInfo] FOREIGN KEY([DocInfoID])

REFERENCES [dbo].[tblDocInfo] ([DocInfoID])

GO

ALTER TABLE [dbo].[tblDocInfo] CHECK CONSTRAINT [FK_tblDocInfo_tblDocInfo]

Binary Table:

CREATE TABLE [dbo].[tblDocs](

[DocID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,

[DocInfoID] [decimal](18, 0) NOT NULL,

[Doc] [image] NULL,

CONSTRAINT [PK_tblDocs] PRIMARY KEY CLUSTERED

(

[DocInfoID] ASC

)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [partscheme]([DocInfoID])

) ON [partscheme]([DocInfoID])

GO

ALTER TABLE [dbo].[tblDocs] WITH CHECK ADD CONSTRAINT [FK_tblDocs_tblDocInfo] FOREIGN KEY([DocInfoID])

REFERENCES [dbo].[tblDocInfo] ([DocInfoID])

GO

ALTER TABLE [dbo].[tblDocs] CHECK CONSTRAINT [FK_tblDocs_tblDocInfo]

IkeRoss SSC Enthusiast Points: 146 More actions · Answer 2

Here are the filegroups:

Name FileID FileName FileGroup Size Maxsize Growth Usage

Doc_main 1 e:\mssqldata\Doc.mdf PRIMARY 116736 KB Unlimited 1024 KB data only

Doc_Log 2 f:\MSSQLDataLog\Doc_log.ndf NULL 2376256 KB 2147483648 KB 10% log only

FG_default 3 e:\mssqldata\fg.ndf FG 5120 KB Unlimited 1024 KB data only

FG1_dat 4 e:\mssqldata\fg1.ndf FG1 136236032 KB Unlimited 1024 KB data only

FG2_dat 5 e:\mssqldata\fg2.ndf FG2 110420992 KB Unlimited 1024 KB data only

FG3_dat 6 e:\mssqldata\fg3.ndf FG3 110420992 KB Unlimited 1024 KB data only

FG4_dat 7 e:\mssqldata\fg4.ndf FG4 110420992 KB Unlimited 1024 KB data only

FG5_dat 8 e:\mssqldata\fg5.ndf FG5 110420992 KB Unlimited 1024 KB data only

FG6_dat 9 e:\mssqldata\fg6.ndf FG6 110420992 KB Unlimited 1024 KB data only

FG7_dat 10 e:\mssqldata\fg7.ndf FG7 110420992 KB Unlimited 1024 KB data only

FG8_dat 11 e:\mssqldata\fg8.ndf FG8 103505920 KB Unlimited 1024 KB data only

FG9_dat 12 e:\mssqldata\fg9.ndf FG9 25600 KB Unlimited 1024 KB data only

FG10_dat 13 e:\mssqldata\fg10.ndf FG10 25600 KB Unlimited 1024 KB data only

FG11_dat 14 e:\mssqldata\fg11.ndf FG11 25600 KB Unlimited 1024 KB data only

FG12_dat 15 e:\mssqldata\fg12.ndf FG12 25600 KB Unlimited 1024 KB data only

FG13_dat 16 e:\mssqldata\fg13.ndf FG13 25600 KB Unlimited 1024 KB data only

FG14_dat 17 e:\mssqldata\fg14.ndf FG14 25600 KB Unlimited 1024 KB data only

FG15_dat 18 e:\mssqldata\fg15.ndf FG15 25600 KB Unlimited 1024 KB data only

FG16_dat 19 e:\mssqldata\fg16.ndf FG16 25600 KB Unlimited 1024 KB data only

FG17_dat 20 e:\mssqldata\fg17.ndf FG17 25600 KB Unlimited 1024 KB data only

FG18_dat 21 e:\mssqldata\fg18.ndf FG18 25600 KB Unlimited 1024 KB data only

FG19_dat 22 e:\mssqldata\fg19.ndf FG19 25600 KB Unlimited 1024 KB data only

FG20_dat 23 e:\mssqldata\fg20.ndf FG20 25600 KB Unlimited 1024 KB data only

FG21_dat 24 e:\mssqldata\fg21.ndf FG21 25600 KB Unlimited 1024 KB data only

FG22_dat 25 e:\mssqldata\fg22.ndf FG22 25600 KB Unlimited 1024 KB data only

FG23_dat 26 e:\mssqldata\fg23.ndf FG23 25600 KB Unlimited 1024 KB data only

FG24_dat 27 e:\mssqldata\fg24.ndf FG24 25600 KB Unlimited 1024 KB data only

FG25_dat 28 e:\mssqldata\fg25.ndf FG25 25600 KB Unlimited 1024 KB data only

FG26_dat 29 e:\mssqldata\fg26.ndf FG26 25600 KB Unlimited 1024 KB data only

FG27_dat 30 e:\mssqldata\fg27.ndf FG27 25600 KB Unlimited 1024 KB data only

FG28_dat 31 e:\mssqldata\fg28.ndf FG28 25600 KB Unlimited 1024 KB data only

FG29_dat 32 e:\mssqldata\fg29.ndf FG29 25600 KB Unlimited 1024 KB data only

FG30_dat 33 e:\mssqldata\fg30.ndf FG30 25600 KB Unlimited 1024 KB data only

FG31_dat 34 e:\mssqldata\fg31.ndf FG31 25600 KB Unlimited 1024 KB data only

FG32_dat 35 e:\mssqldata\fg32.ndf FG32 25600 KB Unlimited 1024 KB data only

Here is how the data is spread across the partitions (only 8 partitions in use)

Partition No Min ID Max ID Rows in Partition

1 1 100000 99997

2 100001 200000 100000

3 200001 300000 100000

4 300001 400000 100000

5 400001 500000 100000

6 500001 600000 100000

7 600001 700000 100000

8 700001 793738 93737

Crispin Proctor SSCoach Points: 18161 More actions · Answer 3

As a side note, your growth is, in my opinion, to small. You will end up with some bad fragmentation as well as SQL waiting to grow the file during inserts. Change it to a large number. Say 5-10GB.

As to your problem, I have not worked with BLOBs to the size you are talking about but it would be slower than getting it of the FS. SQL has much more to do to reassemble the blob into a stream and stream it back where as the FS just says here you go...

For the amounts you are talking, I would even go so far as to say why store them in SQL? Even more so as your table only contains the blob and method of linking it.

There have been many discussions on this subject and generally, it tends towards the FS. Not always though.

There are many reasons as to why the retrieval is slow. How's your memory? Disk bottle neck? Network? Your test with opening it from the FS, where was the file? Local to test or on the SQL server?

On out network here, it takes around 5 seconds to copy a 7MB file across to another machine but that's because we are running TCP/IP over WS (WS: Wet string)

How are you opening the document on your end? Could it be that the application waits for EOF before rendering it while opening from the FS starts to render it as it gets the stream?

Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!

IkeRoss SSC Enthusiast Points: 146 More actions · Answer 4

Thanks for the input Crispin. I was hoping I was missing something, but your comment about "reassemble the blob" was my opinion also.

I wanted to use SQL, because, In my experience with lots of files, we get many "disconnects" where users move, rename, delete the source. Most only have readonly, but we have some that have permissions by necessity. My hope was to spend effort putting them in SQL instead of building processes to detect missing files, manage permissions, ect.

As to your slow retrieval questions...the FS and SQL Server are identical machines side by side on the same subnet. In both scenarios, I used Adobe to open the file...the SQL blob is written on the client side the launched using process.start in VB. The FS is just launched using the process.start. But as you saw in my original post, most of the time was dedicated to extracting the blob from SQL server.

Thanks again...

Ike

Kurt W. Zimmerman SSCrazy Eights Points: 8676 More actions · Answer 5

Not really knowing all of the specs to your application but for what reason are you storing the image as a blob in the DB? Is it a security issue? What if you were to keep the image in a file and keep the UNC reference in the DB?

Just trying to think aloud here...

One of my web-based apps which is one of my larger databases does exactly this, keep the image (in this case a .jpg) outside the server and I keep the reference in the database.

Kurt

Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY

http://www.linkedin.com/in/kurtwzimmerman