Documents in VLDB

  • Hello All,

     

    I have been experimenting with SQL Server 2005 partitions. I loaded a terabyte of information into 2 tables. The first holds the document information and the second holds the actual binary document (in this case pdf). Most of the documents are about 1 megabyte in size, but the largest is 212 megabytes.

     

    SQL Server has no problem storing the blobs. The problem occurs when I attempt to get the data.

     

    I did some quick tests to test how fast I could pull the documents out. The largest took about 24 seconds. The 1 meg documents are sub-second.

     

    Here is how the 212 meg doc breaks down:

     

    Time to load datatable: 18.79 seconds

    Time to load byte array: 3.84 seconds

    Time to Write and open document: 0.01 seconds

     

    If I access the file from a file server, the time is 0.04 seconds to begin showing the document.

     

    As you can see, the longest time period is related to retrieving the data from SQL, and it is much slower that launching it from disk across the network. (note: the sql server and file server used to test are next to each other).

     

    My question is, how can I speed up the access from SQL Server? I believe the keys are "partition aligned". Any suggestions would be appreciated.

  • Here are the table definitions. The table tblDocInfo has DocInfoID, which is the FK in tblDocs. The partitions are defined by this key.

     

    Info table:

     

    CREATE TABLE [dbo].[tblDocInfo](

    [DocInfoID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,

    [DocName] [nvarchar](150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [DocSize] [decimal](18, 0) NULL,

    [FKID] [decimal](18, 0) NULL,

    [FKTableName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [FKDBName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [FKDBServer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [OriginalLocationPath] [nvarchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [CreationDate] [datetime] NOT NULL CONSTRAINT [DF_tblDocInfo_CreationDate] DEFAULT (getdate()),

    [UserID] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Computer] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [LastModifiedDate] [datetime] NULL CONSTRAINT [DF_tblDocInfo_LastModifiedDate] DEFAULT (getdate()),

    CONSTRAINT [PK_tblDocInfo] PRIMARY KEY CLUSTERED

    (

    [DocInfoID] ASC

    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[tblDocInfo] WITH CHECK ADD CONSTRAINT [FK_tblDocInfo_tblDocInfo] FOREIGN KEY([DocInfoID])

    REFERENCES [dbo].[tblDocInfo] ([DocInfoID])

    GO

    ALTER TABLE [dbo].[tblDocInfo] CHECK CONSTRAINT [FK_tblDocInfo_tblDocInfo]

     

     

    Binary Table:

     

    CREATE TABLE [dbo].[tblDocs](

    [DocID] [decimal](18, 0) IDENTITY(1,1) NOT NULL,

    [DocInfoID] [decimal](18, 0) NOT NULL,

    [Doc] [image] NULL,

    CONSTRAINT [PK_tblDocs] PRIMARY KEY CLUSTERED

    (

    [DocInfoID] ASC

    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [partscheme]([DocInfoID])

    ) ON [partscheme]([DocInfoID])

    GO

    ALTER TABLE [dbo].[tblDocs] WITH CHECK ADD CONSTRAINT [FK_tblDocs_tblDocInfo] FOREIGN KEY([DocInfoID])

    REFERENCES [dbo].[tblDocInfo] ([DocInfoID])

    GO

    ALTER TABLE [dbo].[tblDocs] CHECK CONSTRAINT [FK_tblDocs_tblDocInfo]

     

  • Here are the filegroups:

     

    Name FileID FileName FileGroup Size Maxsize Growth Usage

    Doc_main 1 e:\mssqldata\Doc.mdf PRIMARY 116736 KB Unlimited 1024 KB data only

    Doc_Log  2 f:\MSSQLDataLog\Doc_log.ndf NULL 2376256 KB 2147483648 KB 10% log only

    FG_default 3 e:\mssqldata\fg.ndf FG 5120 KB Unlimited 1024 KB data only

    FG1_dat  4 e:\mssqldata\fg1.ndf FG1 136236032 KB Unlimited 1024 KB data only

    FG2_dat  5 e:\mssqldata\fg2.ndf FG2 110420992 KB Unlimited 1024 KB data only

    FG3_dat  6 e:\mssqldata\fg3.ndf FG3 110420992 KB Unlimited 1024 KB data only

    FG4_dat  7 e:\mssqldata\fg4.ndf FG4 110420992 KB Unlimited 1024 KB data only

    FG5_dat  8 e:\mssqldata\fg5.ndf FG5 110420992 KB Unlimited 1024 KB data only

    FG6_dat  9 e:\mssqldata\fg6.ndf FG6 110420992 KB Unlimited 1024 KB data only

    FG7_dat  10 e:\mssqldata\fg7.ndf FG7 110420992 KB Unlimited 1024 KB data only

    FG8_dat  11 e:\mssqldata\fg8.ndf FG8 103505920 KB Unlimited 1024 KB data only

    FG9_dat  12 e:\mssqldata\fg9.ndf FG9 25600 KB Unlimited 1024 KB data only

    FG10_dat 13 e:\mssqldata\fg10.ndf FG10 25600 KB Unlimited 1024 KB data only

    FG11_dat 14 e:\mssqldata\fg11.ndf FG11 25600 KB Unlimited 1024 KB data only

    FG12_dat 15 e:\mssqldata\fg12.ndf FG12 25600 KB Unlimited 1024 KB data only

    FG13_dat 16 e:\mssqldata\fg13.ndf FG13 25600 KB Unlimited 1024 KB data only

    FG14_dat 17 e:\mssqldata\fg14.ndf FG14 25600 KB Unlimited 1024 KB data only

    FG15_dat 18 e:\mssqldata\fg15.ndf FG15 25600 KB Unlimited 1024 KB data only

    FG16_dat 19 e:\mssqldata\fg16.ndf FG16 25600 KB Unlimited 1024 KB data only

    FG17_dat 20 e:\mssqldata\fg17.ndf FG17 25600 KB Unlimited 1024 KB data only

    FG18_dat 21 e:\mssqldata\fg18.ndf FG18 25600 KB Unlimited 1024 KB data only

    FG19_dat 22 e:\mssqldata\fg19.ndf FG19 25600 KB Unlimited 1024 KB data only

    FG20_dat 23 e:\mssqldata\fg20.ndf FG20 25600 KB Unlimited 1024 KB data only

    FG21_dat 24 e:\mssqldata\fg21.ndf FG21 25600 KB Unlimited 1024 KB data only

    FG22_dat 25 e:\mssqldata\fg22.ndf FG22 25600 KB Unlimited 1024 KB data only

    FG23_dat 26 e:\mssqldata\fg23.ndf FG23 25600 KB Unlimited 1024 KB data only

    FG24_dat 27 e:\mssqldata\fg24.ndf FG24 25600 KB Unlimited 1024 KB data only

    FG25_dat 28 e:\mssqldata\fg25.ndf FG25 25600 KB Unlimited 1024 KB data only

    FG26_dat 29 e:\mssqldata\fg26.ndf FG26 25600 KB Unlimited 1024 KB data only

    FG27_dat 30 e:\mssqldata\fg27.ndf FG27 25600 KB Unlimited 1024 KB data only

    FG28_dat 31 e:\mssqldata\fg28.ndf FG28 25600 KB Unlimited 1024 KB data only

    FG29_dat 32 e:\mssqldata\fg29.ndf FG29 25600 KB Unlimited 1024 KB data only

    FG30_dat 33 e:\mssqldata\fg30.ndf FG30 25600 KB Unlimited 1024 KB data only

    FG31_dat 34 e:\mssqldata\fg31.ndf FG31 25600 KB Unlimited 1024 KB data only

    FG32_dat 35 e:\mssqldata\fg32.ndf FG32 25600 KB Unlimited 1024 KB data only

     

    Here is how the data is spread across the partitions (only 8 partitions in use)

     

    Partition No Min ID Max ID Rows in Partition

    1 1 100000 99997

    2 100001 200000 100000

    3 200001 300000 100000

    4 300001 400000 100000

    5 400001 500000 100000

    6 500001 600000 100000

    7 600001 700000 100000

    8 700001 793738 93737

     

  • As a side note, your growth is, in my opinion, to small. You will end up with some bad fragmentation as well as SQL waiting to grow the file during inserts. Change it to a large number. Say 5-10GB.

    As to your problem, I have not worked with BLOBs to the size you are talking about but it would be slower than getting it of the FS. SQL has much more to do to reassemble the blob into a stream and stream it back where as the FS just says here you go...

    For the amounts you are talking, I would even go so far as to say why store them in SQL? Even more so as your table only contains the blob and method of linking it.

    There have been many discussions on this subject and generally, it tends towards the FS. Not always though.

    There are many reasons as to why the retrieval is slow. How's your memory? Disk bottle neck? Network? Your test with opening it from the FS, where was the file? Local to test or on the SQL server?

    On out network here, it takes around 5 seconds to copy a 7MB file across to another machine but that's because we are running TCP/IP over WS (WS: Wet string)

    How are you opening the document on your end? Could it be that the application waits for EOF before rendering it while opening from the FS starts to render it as it gets the stream?

    Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!

  • Thanks for the input Crispin. I was hoping I was missing something, but your comment about "reassemble the blob" was my opinion also.

    I wanted to use SQL, because, In my experience with lots of files, we get many "disconnects" where users move, rename, delete the source. Most only have readonly, but we have some that have permissions by necessity. My hope was to spend effort putting them in SQL instead of building processes to detect missing files, manage permissions, ect.

    As to your slow retrieval questions...the FS and SQL Server are identical machines side by side on the same subnet. In both scenarios, I used Adobe to open the file...the SQL blob is written on the client side the launched using process.start in VB. The FS is just launched using the process.start. But as you saw in my original post, most of the time was dedicated to extracting the blob from SQL server.

    Thanks again...

    Ike

  • Not really knowing all of the specs to your application but for what reason are you storing the image as a blob in the DB? Is it a security issue? What if you were to keep the image in a file and keep the UNC reference in the DB?

    Just trying to think aloud here...

    One of my web-based apps which is one of my larger databases does exactly this, keep the image (in this case a .jpg) outside the server and I keep the reference in the database.

    Kurt

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply