March 16, 2009 at 9:03 am
As a thought experiment, I was curious about how a database server would work if all data was arranged as a flat file, and let's say for this example, all XML based. What would be required to make it efficient in any manner whatsoever?
The only two things I could think of were:
1) Make sure that all data in the flat files stays in order (have another "index" flat file maybe?); and
2) Have as much parallelism involved in said flat files.
In theory, should a flat file that's in "perfect" order be just as quick to retrieve data from as a proper RDBMS?
I know this is as Quixotic an endeavor as I can imagine, but as I said, it's only a thought experiment. Knowing what we know about RDBMS's, how could we apply those ideas to a theoretical flat file based database.
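For what it's worth, idea 1 does buy you something concrete: a sorted flat file of fixed-width records can be searched with O(log n) seeks instead of a full scan, which is essentially a clustered index without the tree. A minimal sketch in Python (the record layout and field widths here are my own assumptions, just for illustration):

```python
import io

RECORD_LEN = 16  # assumed layout: 8-byte key + 7-byte payload + newline

def find(f, key):
    """Binary-search a sorted fixed-width flat file; O(log n) seeks."""
    f.seek(0, 2)                      # seek to end to count records
    n = f.tell() // RECORD_LEN
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_LEN)      # jump straight to record `mid`
        rec = f.read(RECORD_LEN)
        k = rec[:8].decode().rstrip()
        if k == key:
            return rec[8:15].decode().rstrip()
        elif k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# A tiny sorted "flat file" in memory for demonstration.
rows = [("ant", "6legs"), ("bee", "fly"), ("cat", "meow"), ("dog", "woof")]
buf = io.BytesIO(b"".join(f"{k:<8}{v:<7}\n".encode() for k, v in rows))
print(find(buf, "cat"))  # -> meow
print(find(buf, "emu"))  # -> None
```

Of course, this only works while the file stays sorted; every insert in the middle means rewriting everything after it, which is one of the failure modes the thread is circling around.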
I look forward to your thoughts.
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
March 16, 2009 at 9:23 am
I worked at a place where an expensive outside consultant did a study of our data environment and actually proposed something like this.
The CIO just said that was the stupidest thing he had ever heard, and let him go the next day.
March 16, 2009 at 9:30 am
Michael Valentine Jones (3/16/2009)
I worked at a place where an expensive outside consultant did a study of our data environment and actually proposed something like this. The CIO just said that was the stupidest thing he had ever heard, and let him go the next day.
I'm not proposing this with any sincerity or desire to implement it in real life. It's more like a Rube Goldberg approach to data management. I'm curious what we can throw at a flat file arrangement, and see what effort is involved in trying to make an inefficient system work at all. It's a way to spell out the inherent deficiencies in this type of system. We know flat files don't work well for managing more than a tiny amount of data, so what are the exact causes of failure?
I agree, I'd laugh at anyone who proposed that kind of system with any sincerity. As I said, I'm just curious as to what it could take to get such a system working, thus justifying NOT spending the time and money to do so, and invest your resources in a proper RDBMS.
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
March 16, 2009 at 9:46 am
I think it would be hard to justify any effort to develop a flat file database, since there are fairly mature free open source databases available, and free versions of commercial databases, like SQL Server and Oracle.
Basically, any amount of development effort that took more time than downloading and installing one of the free databases would be hard to justify.
March 16, 2009 at 10:10 am
In order to answer this question, you would need a very good definition of what a "flat file" is. This is because there are many, many things that you can do with a flat file to improve its use for something like this, but they all raise the question "is this still really a flat file?"
Or, look at it this way: from the standpoint of Windows, how are MDFs and LDFs not flat files? You have to be able to answer that question very precisely before you can start to judge whether some trick or another is allowable for a "flat file".
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 16, 2009 at 10:20 am
In .NET, your DataSet has the ability to .WriteXml(pathToFile) and .ReadXml(pathToFile), so it's very easy to load tables into your DataSet from, say, SQL Server, and then write them to XML; then they can be read (say by a remote or laptop application) so they have a local copy.
So you have flat files, but once you load them back into your DataSet, they are in memory/RAM for manipulation.
I don't know if there is any advantage over connecting directly to a SQL Server compared to an XML file; I would guess XML might be a bit faster, but I couldn't say for sure.
I happen to use this technique in an app I wrote that compares schemas... so you connect to your current SQL Server database, and you can compare it to an XML schema to see if your database is compatible with an application's expected schema, say 4.01 vs 4.5 or something like that.
Easy to publish an XML file to the web for that purpose.
Also good for publishing things to XML like changes to lookup tables (zip codes, census tracts, etc.).
Dunno if that helps, but it's easy to do.
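For readers outside .NET, a rough Python analogue of the WriteXml/ReadXml round trip Lowell describes might look like this (the element names and row/column shape below are my own assumptions, not the actual DataSet XML format):

```python
import io
import xml.etree.ElementTree as ET

def write_xml(rows, dest):
    # Serialize a list of dict rows as <Table><Row><col>val</col>...</Row></Table>
    root = ET.Element("Table")
    for row in rows:
        rec = ET.SubElement(root, "Row")
        for col, val in row.items():
            ET.SubElement(rec, col).text = str(val)
    ET.ElementTree(root).write(dest, encoding="unicode")

def read_xml(src):
    # Rebuild the list of dict rows from the XML written above.
    return [{col.tag: col.text for col in rec} for rec in ET.parse(src).getroot()]

# Round-trip a small lookup table, as in the zip-code example.
rows = [{"zip": "30301", "city": "Atlanta"}, {"zip": "98101", "city": "Seattle"}]
buf = io.StringIO()           # stands in for a file on a laptop's disk
write_xml(rows, buf)
buf.seek(0)
print(read_xml(buf) == rows)  # -> True
```

The round trip works precisely because the XML carries its own "field" names, which is also why, as discussed below, XML arguably isn't a flat file at all.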
Lowell
March 16, 2009 at 10:22 am
RBarryYoung (3/16/2009)
In order to answer this question, you would need a very good definition of what a "flat file" is. This is because there are many, many things that you can do with a flat file to improve its use for something like this, but they all raise the question "is this still really a flat file?" Or, look at it this way: from the standpoint of Windows, how are MDFs and LDFs not flat files? You have to be able to answer that question very precisely before you can start to judge whether some trick or another is allowable for a "flat file".
From Wikipedia: "Strictly, a flat file database should consist of nothing but data and, if records vary in length, delimiters. More broadly, the term refers to any database which exists in a single file in the form of rows and columns, with no relationships or links between records and fields except the table structure."
I'm working with the definition that a flat file has nothing in it except data and delimiters. So no components that describe how the data should be arranged (e.g. clustered indices), just individual rows of data. The Wikipedia article goes on to mention one potential use of a flat file is that of an intermediary state of data, before it is passed to a dbms. So strictly speaking, I guess XML data is not a flat file because it has more than just the data being passed in it.
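Under that strict definition, a flat file is nothing but delimited rows, and every bit of schema knowledge (column names, types, ordering) has to live in the program reading it. A small sketch of that division of labor (the column names and types shown are hypothetical, just to make the point):

```python
import csv
import io

# The "file": nothing but data and delimiters -- no header, no types, no index.
raw = "101|Smith|2500.00\n102|Jones|3100.50\n"

# All structure lives in the consuming program, not in the file itself.
COLUMNS = ("emp_id", "name", "salary")   # hypothetical schema
CASTS = (int, str, float)

def read_flat(text):
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter="|"):
        rows.append({c: cast(v) for c, cast, v in zip(COLUMNS, CASTS, rec)})
    return rows

print(read_flat(raw)[0])  # -> {'emp_id': 101, 'name': 'Smith', 'salary': 2500.0}
```

The moment you write the schema into the file so other programs can read it too, you have started down the road toward XML, and arguably away from "flat".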
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
March 16, 2009 at 10:28 am
Lowell (3/16/2009)
In .NET, your DataSet has the ability to .WriteXml(pathToFile) and .ReadXml(pathToFile), so it's very easy to load tables into your DataSet from, say, SQL Server, and then write them to XML; then they can be read (say by a remote or laptop application) so they have a local copy. So you have flat files, but once you load them back into your DataSet, they are in memory/RAM for manipulation.
I don't know if there is any advantage over connecting directly to a SQL Server compared to an XML file; I would guess XML might be a bit faster, but I couldn't say for sure.
I happen to use this technique in an app I wrote that compares schemas... so you connect to your current SQL Server database, and you can compare it to an XML schema to see if your database is compatible with an application's expected schema, say 4.01 vs 4.5 or something like that.
Easy to publish an XML file to the web for that purpose.
Also good for publishing things to XML like changes to lookup tables (zip codes, census tracts, etc.).
Dunno if that helps, but it's easy to do.
Thanks Lowell, it does give some insight into the issue. I guess I was proposing a question akin to something like, "Now I know a unicycle is not the most efficient form of transportation out there, but what can we do to improve it and make it faster and more stable."
Maybe there is no way to answer how to make a flat file better, because it is so limited by its definition: any change means it is no longer, strictly, a flat file, if you put anything in it besides the information you need.
Gaby
________________________________________________________________
"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein
March 16, 2009 at 11:00 am
When I think "database", I think SQL Server, so that multiple connections can get to a source of data and select/insert/update.
I think of a flat file as a snapshot of data for local consumption.
I can't really envision allowing multiple connections to a flat file without duplicating the work of Access or any real RDBMS, where you load everything into memory and start controlling access via spids/connections/etc.
For a web page or local client application, yeah, I could see that an XML file could take a load off of a db server, as cached data.
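The multiple-connections point can be made concrete: the moment two processes share a flat file, you need some locking scheme, and the crudest one serializes every reader and writer. A sketch using an atomically created lock file (a deliberately naive stand-in, nothing like what a real RDBMS does with row/page locks and versioning):

```python
import os
import tempfile
import time

class FlatFileLock:
    """Crude whole-file exclusive lock via an atomically created lock file.
    Every 'connection' waits its turn -- exactly the serialization an RDBMS
    engineers around with finer-grained locking."""

    def __init__(self, path):
        self.lock_path = path + ".lock"

    def __enter__(self):
        while True:
            try:
                # O_CREAT | O_EXCL fails atomically if the lock already exists.
                fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                os.close(fd)
                return self
            except FileExistsError:
                time.sleep(0.01)  # spin until the other "connection" releases

    def __exit__(self, *exc):
        os.remove(self.lock_path)

data = os.path.join(tempfile.mkdtemp(), "table.txt")
with FlatFileLock(data):
    with open(data, "a") as f:
        f.write("row1\n")
print(open(data).read())  # -> row1
```

Scaling this up (shared read locks, deadlock detection, crash recovery when a lock file is orphaned) is precisely the "duplicating the work of a real RDBMS" that Lowell describes.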
Lowell
March 16, 2009 at 11:17 am
Right. By this definition, then, XML would not be a flat file because it has metadata in it as well (specifically, data definitions and "field" identifiers). What's left is pretty limited indeed.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 16, 2009 at 11:28 am
I think SQL is a flat file database for the engine. Extents, pages, they're laid out like a flat file. Every flat file has some level of meta-data in it, even if it's just row delimiters. SQL mixes a bunch of flat files up.
MySQL uses a structure more like this, with each table/index in a separate file.
It can work, but I'm not sure that it makes sense for an application to use it unless it's a commercial engine of some sort. There's a lot of testing/optimization that goes into making this work. I think caching on the client is a better solution for some data than a flat file in terms of the app writing a file and reading from it.
March 16, 2009 at 9:51 pm
I think that it's funny as hell that anyone would call an XML file a "flat file". 😉
Before I even knew SQL Server existed and the small company I was working for couldn't afford an RDBMS, I made my own very simple flat file database and compiled code in "Power Basic" (previously marketed as Turbo Basic). Fortunately, all of the "columns" were less than 20 characters, so I built it kind of like the old DBASE III files, but without the headers. I included my own indexing system (very stable and very predictable nasty fast sort) and the binary lookups I built were fast as hell. I even built my own date functions.
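A miniature sketch of that fixed-width-records-plus-index approach (an assumed layout, not Jeff's actual Power Basic code): one pass over the data file builds a key-to-byte-offset index, after which any lookup costs a single seek.

```python
import io

REC = 24  # assumed layout: 20-char key "column" + 3-char value + newline

def build_index(f):
    """One pass over the data file: key -> byte offset."""
    idx, pos = {}, 0
    f.seek(0)
    while True:
        rec = f.read(REC)
        if not rec:
            break
        idx[rec[:20].decode().rstrip()] = pos
        pos += REC
    return idx

def lookup(f, idx, key):
    """One seek + one read per lookup, no scanning."""
    off = idx.get(key)
    if off is None:
        return None
    f.seek(off)
    return f.read(REC)[20:23].decode().rstrip()

rows = [("alpha", "001"), ("bravo", "002"), ("charlie", "003")]
buf = io.BytesIO(b"".join(f"{k:<20}{v:<3}\n".encode() for k, v in rows))
idx = build_index(buf)
print(lookup(buf, idx, "bravo"))  # -> 002
print(lookup(buf, idx, "delta"))  # -> None
```

Keeping that index correct across inserts, deletes, and crashes is, of course, where the real work was.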
It was a lot of fun back then... glad I don't have to do that anymore. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2009 at 1:40 am
I think the old data files used with BASIC could be considered flat files. They didn't really have anything in them except data, the definitions were in the programs that read the files. Accessing the data tended to be sequential, requiring a search for whatever key you used. Worked OK with a few tens of thousands of records, but not past that.
I wrote quick and dirty Turbo Basic applications that read flat file data downloaded from mainframes, including the pain-in-the-rear overpunch character replacement for data that came out of IBM EBCDIC systems. I had to write routines to create multiple new files so the data could be imported into Lotus 1-2-3 without exceeding the limit on the number of rows per spreadsheet. This was back in the early 90's - it's a lot easier to do this today with Excel and Text to Columns.
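The overpunch fix-up mentioned above is mechanical but fiddly: in the usual IBM zoned-decimal convention, the last character of a numeric field encodes both the final digit and the sign. A sketch of the decoding (conventions vary by shop; this is the common mapping, with '{' for +0 and '}' for -0):

```python
# IBM signed overpunch: '{' = +0, 'A'..'I' = +1..+9, '}' = -0, 'J'..'R' = -1..-9.
POS = {"{": "0", **{chr(ord("A") + i): str(i + 1) for i in range(9)}}
NEG = {"}": "0", **{chr(ord("J") + i): str(i + 1) for i in range(9)}}

def decode_overpunch(field):
    """Replace the trailing overpunch character and apply the sign."""
    last = field[-1]
    if last in POS:
        return int(field[:-1] + POS[last])
    if last in NEG:
        return -int(field[:-1] + NEG[last])
    return int(field)  # plain digits: treat as unsigned positive

print(decode_overpunch("1234E"))  # -> 12345
print(decode_overpunch("1234N"))  # -> -12345
print(decode_overpunch("000{"))   # -> 0
```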
I also wrote a billing program for a law firm that took time entries and saved them in a file, then created invoices based on that time, updating the records so I knew which ones had been billed. Thinking about that now, I am glad we don't have to write our own databases any more.
March 18, 2009 at 11:45 am
Why re-invent the wheel ???
This flat-file DBMS already exists - it's called Lotus Notes !!!
Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
March 24, 2009 at 10:40 am
Bleh! I remember writing file updates for such sorted layouts in Fortran and Cobol: not much fun! But it is an excellent illustration of why you can have only one clustered index per table. I'm so glad to be dealing with set-based mechanics!
I'm currently working on a monster flat file: 130,000 records, 3478 bytes per record, 519 fields per record. Totally un-normalized, used by a mainframe Cobol app written when you didn't bother with normalization. So a record type 1 would have fields populated in group A and B, but the rest of the record would be blank.
I normalized it down to 4 tables: 130,000 records in the header table (171 fields), the other three tables have 74,000 records (202 fields), 43,000 (87 fields), and 3,300 records (59 fields). Gotta do some space calculations to see how much smaller it is.
I'll use a flat file for moving data between systems, but I hope to never have to program against one again! T-SQL and set-based solutions for the win!
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]