July 31, 2007 at 6:12 am
Hello All,
I have been experimenting with SQL Server 2005 partitions. I loaded a terabyte of information into 2 tables. The first holds the document information and the second holds the actual binary document (in this case pdf). Most of the documents are about 1 megabyte in size, but the largest is 212 megabytes.
SQL Server has no problem storing the blobs. The problem occurs when I attempt to get the data.
I did some quick tests to test how fast I could pull the documents out. The largest took about 24 seconds. The 1 meg documents are sub-second.
Here is how the 212 meg doc breaks down:
Time to load datatable: 18.79 seconds
Time to load byte array: 3.84 seconds
Time to Write and open document: 0.01 seconds
If I access the file from a file server, the time is 0.04 seconds to begin showing the document.
As you can see, the longest time period is related to retrieving the data from SQL, and it is much slower that launching it from disk across the network. (note: the sql server and file server used to test are next to each other).
My question is, how can I speed up the access from SQL Server? I believe the keys are "partition aligned". Any suggestions would be appreciated.
July 31, 2007 at 6:13 am
Here are the table definitions. The table tblDocInfo has DocInfoID, which is the FK in tblDocs. The partitions are defined by this key.
Info table:
CREATE TABLE [dbo].[tblDocInfo](
[DocInfoID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,
[DocName] [nvarchar](150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DocSize] [decimal](18, 0) NULL,
[FKID] [decimal](18, 0) NULL,
[FKTableName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FKDBName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[FKDBServer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[OriginalLocationPath] [nvarchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[CreationDate] [datetime] NOT NULL CONSTRAINT [DF_tblDocInfo_CreationDate] DEFAULT (getdate()),
[UserID] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Computer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[LastModifiedDate] [datetime] NULL CONSTRAINT [DF_tblDocInfo_LastModifiedDate] DEFAULT (getdate()),
CONSTRAINT [PK_tblDocInfo] PRIMARY KEY CLUSTERED
(
[DocInfoID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblDocInfo] WITH CHECK ADD CONSTRAINT [FK_tblDocInfo_tblDocInfo] FOREIGN KEY([DocInfoID])
REFERENCES [dbo].[tblDocInfo] ([DocInfoID])
GO
ALTER TABLE [dbo].[tblDocInfo] CHECK CONSTRAINT [FK_tblDocInfo_tblDocInfo]
Binary Table:
CREATE TABLE [dbo].[tblDocs](
[DocID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,
[DocInfoID] [decimal](18, 0) NOT NULL,
[Doc] [image] NULL,
CONSTRAINT [PK_tblDocs] PRIMARY KEY CLUSTERED
(
[DocInfoID] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [partscheme]([DocInfoID])
) ON [partscheme]([DocInfoID])
GO
ALTER TABLE [dbo].[tblDocs] WITH CHECK ADD CONSTRAINT [FK_tblDocs_tblDocInfo] FOREIGN KEY([DocInfoID])
REFERENCES [dbo].[tblDocInfo] ([DocInfoID])
GO
ALTER TABLE [dbo].[tblDocs] CHECK CONSTRAINT [FK_tblDocs_tblDocInfo]
July 31, 2007 at 6:14 am
Here are the filegroups:
Name FileID FileName FileGroup Size Maxsize Growth Usage
Doc_main 1 e:\mssqldata\Doc.mdf PRIMARY 116736 KB Unlimited 1024 KB data only
Doc_Log 2 f:\MSSQLDataLog\Doc_log.ndf NULL 2376256 KB 2147483648 KB 10% log only
FG_default 3 e:\mssqldata\fg.ndf FG 5120 KB Unlimited 1024 KB data only
FG1_dat 4 e:\mssqldata\fg1.ndf FG1 136236032 KB Unlimited 1024 KB data only
FG2_dat 5 e:\mssqldata\fg2.ndf FG2 110420992 KB Unlimited 1024 KB data only
FG3_dat 6 e:\mssqldata\fg3.ndf FG3 110420992 KB Unlimited 1024 KB data only
FG4_dat 7 e:\mssqldata\fg4.ndf FG4 110420992 KB Unlimited 1024 KB data only
FG5_dat 8 e:\mssqldata\fg5.ndf FG5 110420992 KB Unlimited 1024 KB data only
FG6_dat 9 e:\mssqldata\fg6.ndf FG6 110420992 KB Unlimited 1024 KB data only
FG7_dat 10 e:\mssqldata\fg7.ndf FG7 110420992 KB Unlimited 1024 KB data only
FG8_dat 11 e:\mssqldata\fg8.ndf FG8 103505920 KB Unlimited 1024 KB data only
FG9_dat 12 e:\mssqldata\fg9.ndf FG9 25600 KB Unlimited 1024 KB data only
FG10_dat 13 e:\mssqldata\fg10.ndf FG10 25600 KB Unlimited 1024 KB data only
FG11_dat 14 e:\mssqldata\fg11.ndf FG11 25600 KB Unlimited 1024 KB data only
FG12_dat 15 e:\mssqldata\fg12.ndf FG12 25600 KB Unlimited 1024 KB data only
FG13_dat 16 e:\mssqldata\fg13.ndf FG13 25600 KB Unlimited 1024 KB data only
FG14_dat 17 e:\mssqldata\fg14.ndf FG14 25600 KB Unlimited 1024 KB data only
FG15_dat 18 e:\mssqldata\fg15.ndf FG15 25600 KB Unlimited 1024 KB data only
FG16_dat 19 e:\mssqldata\fg16.ndf FG16 25600 KB Unlimited 1024 KB data only
FG17_dat 20 e:\mssqldata\fg17.ndf FG17 25600 KB Unlimited 1024 KB data only
FG18_dat 21 e:\mssqldata\fg18.ndf FG18 25600 KB Unlimited 1024 KB data only
FG19_dat 22 e:\mssqldata\fg19.ndf FG19 25600 KB Unlimited 1024 KB data only
FG20_dat 23 e:\mssqldata\fg20.ndf FG20 25600 KB Unlimited 1024 KB data only
FG21_dat 24 e:\mssqldata\fg21.ndf FG21 25600 KB Unlimited 1024 KB data only
FG22_dat 25 e:\mssqldata\fg22.ndf FG22 25600 KB Unlimited 1024 KB data only
FG23_dat 26 e:\mssqldata\fg23.ndf FG23 25600 KB Unlimited 1024 KB data only
FG24_dat 27 e:\mssqldata\fg24.ndf FG24 25600 KB Unlimited 1024 KB data only
FG25_dat 28 e:\mssqldata\fg25.ndf FG25 25600 KB Unlimited 1024 KB data only
FG26_dat 29 e:\mssqldata\fg26.ndf FG26 25600 KB Unlimited 1024 KB data only
FG27_dat 30 e:\mssqldata\fg27.ndf FG27 25600 KB Unlimited 1024 KB data only
FG28_dat 31 e:\mssqldata\fg28.ndf FG28 25600 KB Unlimited 1024 KB data only
FG29_dat 32 e:\mssqldata\fg29.ndf FG29 25600 KB Unlimited 1024 KB data only
FG30_dat 33 e:\mssqldata\fg30.ndf FG30 25600 KB Unlimited 1024 KB data only
FG31_dat 34 e:\mssqldata\fg31.ndf FG31 25600 KB Unlimited 1024 KB data only
FG32_dat 35 e:\mssqldata\fg32.ndf FG32 25600 KB Unlimited 1024 KB data only
Here is how the data is spread across the partitions (only 8 partitions in use)
Partition No Min ID Max ID Rows in Partition
1 1 100000 99997
2 100001 200000 100000
3 200001 300000 100000
4 300001 400000 100000
5 400001 500000 100000
6 500001 600000 100000
7 600001 700000 100000
8 700001 793738 93737
July 31, 2007 at 7:29 am
As a side note, your growth is, in my opinion, to small. You will end up with some bad fragmentation as well as SQL waiting to grow the file during inserts. Change it to a large number. Say 5-10GB.
As to your problem, I have not worked with BLOBs to the size you are talking about but it would be slower than getting it of the FS. SQL has much more to do to reassemble the blob into a stream and stream it back where as the FS just says here you go...
For the amounts you are talking, I would even go so far as to say why store them in SQL? Even more so as your table only contains the blob and method of linking it.
There have been many discussions on this subject and generally, it tends towards the FS. Not always though.
There are many reasons as to why the retrieval is slow. How's your memory? Disk bottle neck? Network? Your test with opening it from the FS, where was the file? Local to test or on the SQL server?
On out network here, it takes around 5 seconds to copy a 7MB file across to another machine but that's because we are running TCP/IP over WS (WS: Wet string)
How are you opening the document on your end? Could it be that the application waits for EOF before rendering it while opening from the FS starts to render it as it gets the stream?
Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!
July 31, 2007 at 8:09 am
Thanks for the input Crispin. I was hoping I was missing something, but your comment about "reassemble the blob" was my opinion also.
I wanted to use SQL, because, In my experience with lots of files, we get many "disconnects" where users move, rename, delete the source. Most only have readonly, but we have some that have permissions by necessity. My hope was to spend effort putting them in SQL instead of building processes to detect missing files, manage permissions, ect.
As to your slow retrieval questions...the FS and SQL Server are identical machines side by side on the same subnet. In both scenarios, I used Adobe to open the file...the SQL blob is written on the client side the launched using process.start in VB. The FS is just launched using the process.start. But as you saw in my original post, most of the time was dedicated to extracting the blob from SQL server.
Thanks again...
Ike
August 2, 2007 at 9:27 am
Not really knowing all of the specs to your application but for what reason are you storing the image as a blob in the DB? Is it a security issue? What if you were to keep the image in a file and keep the UNC reference in the DB?
Just trying to think aloud here...
One of my web-based apps which is one of my larger databases does exactly this, keep the image (in this case a .jpg) outside the server and I keep the reference in the database.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply