Questions on filetables

  • Currently we have a process that dumps 20 million records into regular tables in the database every 15 minutes. The data is transitory and we only need to keep it for 24 hours. Since the data is transitory and easily recomputed, we created a database in simple recovery mode to prevent logged operations, and that's where we store this data. We also have our main database, which is in full recovery mode. We created 96 tables in the database, one for each 15-minute interval of the day, where we store the data for the specific interval. The data is written to the database by a C++ application:

    1. Truncate the appropriate table (one of 96)

    2. Drop the indexes and PK on the table

    3. Write the data from the C++ app

    4. Recreate the PK and indexes

    The data structure is fairly constant: there are roughly 2,000 widgets and about 10,000 locations. The record length for each widget/location is about 30 bytes. We generally query the data for one specific widget across all locations. Rarely do we query all widgets for a specific location. The table is written to the database in order: widget-1 for all locations, widget-2 for all locations, etc. The PK is clustered on widget, location. We also have a non-clustered index on location.

    The requirements have changed and we now need to store one year's worth of data. I have no intention of creating 35,000+ tables to store the 15-minute data for one year, and I'm looking into using filetables for this purpose. The C++ app will write the data into filetables and we will write C# CLR functions to read the data from the filetables. We will certainly know which files to write to and read from, just like we do today for the 96 tables we write to and query from. The C# CLR knows the structure of the data and simply works with offsets and number of bytes to read from the files.

    The filetables are faster to read and write, and they require much less space since I don't have to index the tables, as long as the storage sequence is guaranteed by the C++ application that writes the files and known to the C# CLR function that reads them.
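    Since the layout is fixed, the offset math the C# CLR reader would rely on can be sketched in a few lines. This is a minimal sketch in C++ (the writer's language); the constants and the `RecordOffset` name are assumptions based on the figures in the post, not code from either application:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdio>

    // Assumed fixed layout, per the post: ~2,000 widgets x ~10,000 locations,
    // 30 bytes per widget/location record, written widget-major (widget 1 for
    // all locations, then widget 2, and so on).
    constexpr std::size_t kNumWidgets   = 2000;
    constexpr std::size_t kNumLocations = 10000;
    constexpr std::size_t kRecordLen    = 30;

    // Byte offset of the record for (widgetIdx, locIdx), both zero-based.
    constexpr std::size_t RecordOffset(std::size_t widgetIdx, std::size_t locIdx) {
        return (widgetIdx * kNumLocations + locIdx) * kRecordLen;
    }

    int main() {
        assert(RecordOffset(0, 0) == 0);
        assert(RecordOffset(0, 1) == kRecordLen);                 // next location
        assert(RecordOffset(1, 0) == kNumLocations * kRecordLen); // next widget block
        // One widget across all locations is a single contiguous span:
        std::printf("widget span: %zu bytes\n", kNumLocations * kRecordLen);
        return 0;
    }
    ```

    With this layout, reading all locations for one widget is one seek plus one contiguous read of 10,000 x 30 = 300,000 bytes.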

    I have several questions regarding filetables. I know that in the end, testing, testing and more testing will dictate which way to go with the options below. However, I would appreciate it if someone could share his/her experience with situations similar to the one I'm facing:

    1. Are inserts/deletes from filetables logged operations? That is, should the filetables be in my main database (full recovery mode) or in the simple-recovery-mode database? We would still be fine doing daily full backups of the simple-mode database if that's what's needed for performance.

    2. What is the overhead of the filetables? For absolute optimal querying performance, I would write one file per widget, generating 2,000 files every 15 minutes. However, that creates roughly 70 million files in one year (2,000 x 96 x 366). They won't all be in the same folder, in case we want to look at them through Windows Explorer: we will have one folder for each day of the year, and one sub-folder for each 15-minute interval, so that each sub-folder has 2,000 files, one per widget.
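    As a sanity check on those counts, a quick sketch (366 days is the leap-year worst case):

    ```cpp
    #include <cassert>
    #include <cstdio>

    int main() {
        // Option 2: one file per widget per 15-minute interval.
        long filesPerInterval = 2000;              // ~2,000 widgets
        long intervalsPerDay  = 96;                // 24 h / 15 min
        long daysPerYear      = 366;               // leap-year worst case
        long totalFiles = filesPerInterval * intervalsPerDay * daysPerYear;
        std::printf("files per year: %ld\n", totalFiles);   // prints 70272000
        assert(totalFiles == 70272000L);
        // Folder layout: 366 day folders x 96 interval sub-folders,
        // with 2,000 files in each sub-folder.
        return 0;
    }
    ```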

    3. The other option is to write one file per 15-minute interval with 20M records, and have the C# code read from the file using offsets and record lengths. That creates 35,000+ files, and we can keep one folder for each day of the year, so that each folder has 96 files.

    Option 3 (35,000 files) is more easily manageable than Option 2 (70M files); however, Option 2 is definitely faster for reading data for all locations for a given widget, since Option 3 requires processing offsets and record lengths. I'm also quite confident that on the rare occasions where I have to read one location across all widgets, having each widget in a different file will be more time-consuming.
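    The trade-off can be put in rough numbers. A back-of-the-envelope sketch for Option 3 (one file per interval, records sorted widget-major); these are seek/byte counts derived from the post's figures, not measurements:

    ```cpp
    #include <cassert>
    #include <cstdio>

    int main() {
        const long numWidgets = 2000, numLocations = 10000, recordLen = 30;

        // Common query: one widget, all locations.
        // The widget's records are adjacent, so this is 1 seek + 1 read.
        long widgetBytes = numLocations * recordLen;        // 300,000 bytes

        // Rare query: one location, all widgets.
        // One 30-byte record per widget, each a full widget block apart,
        // i.e. a strided scan with one seek per widget.
        long locationSeeks = numWidgets;                    // 2,000 seeks
        long locationBytes = numWidgets * recordLen;        // 60,000 bytes

        std::printf("widget query:   1 seek,    %ld bytes\n", widgetBytes);
        std::printf("location query: %ld seeks, %ld bytes\n", locationSeeks, locationBytes);
        assert(widgetBytes == 300000 && locationBytes == 60000);
        return 0;
    }
    ```

    Under Option 2, the widget query becomes "read one whole ~300 KB file", while the location query becomes 2,000 file opens, which is why the rare case gets worse rather than better.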

    Is there anything else I should be concerned about that I haven't thought of?

    Thanks in advance.

  • Have you considered using the IN MEMORY option for the most recent data? You may be keeping a year's worth, but perhaps the most frequently read data should be in memory. As to filetables, I have no experience. I'm not sure that just speeding up reads/writes solves any particular problem... mostly because you only asked for insights, but didn't mention whether read/write speed is an actual problem or not. One thing you can be sure of is that 70 million files on a Windows box is not a good idea under any circumstances... Hope that helps...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • N_Muller (11/3/2016)


    ... we created a database in simple recovery mode to prevent logged operations,...

    Just a quick note about the snippet from your original post. All transactions are logged even when using the simple recovery model. Some transactions may be fully logged and some minimally logged, but still logged. The difference between the simple recovery model and the bulk logged and full recovery models is that in the simple recovery model the transaction log is truncated (not shrunk) when it is checkpointed.

  • I don't know if you've ever read the following but it might give you some hints about why you might not want to drop the Clustered Index prior to loading your tables.

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/3/2016)


    I don't know if you've ever read the following but it might give you some hints about why you might not want to drop the Clustered Index prior to loading your tables.

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    I have a related, but slightly different, problem. C++ code writes data to the database using OLE DB FastInsert and FastCommit. The process is similar:

    1. truncate table

    2. drop indexes

    3. write data (C++ FastInsert and FastCommit)

    4. recreate indexes

    I tried changing from dropping and recreating the indexes to disabling and rebuilding them. Now the FastInsert fails. Is there a requirement for FastInsert that the table have no indexes?

  • N_Muller (11/18/2016)


    Jeff Moden (11/3/2016)


    I don't know if you've ever read the following but it might give you some hints about why you might not want to drop the Clustered Index prior to loading your tables.

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    I have a related, but slightly different, problem. C++ code writes data to the database using OLE DB FastInsert and FastCommit. The process is similar:

    1. truncate table

    2. drop indexes

    3. write data (C++ FastInsert and FastCommit)

    4. recreate indexes

    I tried changing from dropping and recreating the indexes to disabling and rebuilding them. Now the FastInsert fails. Is there a requirement for FastInsert that the table have no indexes?

    If you disable the clustered index, you've disabled the table. 😉

    --Jeff Moden



  • Jeff Moden (11/18/2016)


    N_Muller (11/18/2016)


    Jeff Moden (11/3/2016)


    I don't know if you've ever read the following but it might give you some hints about why you might not want to drop the Clustered Index prior to loading your tables.

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

    I have a related, but slightly different, problem. C++ code writes data to the database using OLE DB FastInsert and FastCommit. The process is similar:

    1. truncate table

    2. drop indexes

    3. write data (C++ FastInsert and FastCommit)

    4. recreate indexes

    I tried changing from dropping and recreating the indexes to disabling and rebuilding them. Now the FastInsert fails. Is there a requirement for FastInsert that the table have no indexes?

    If you disable the clustered index, you've disabled the table. 😉

    Thanks. I just figured that out.
