September 21, 2015 at 7:04 am
erichansen1836 (9/10/2015)
...If you implement a Network database file system of 500 MDB files, each acting as a partial table within the 5 billion row database, each MDB file containing 10 million rows...
And if you implement dozens or 100s of Jet Engines, i.e. 1 Jet Engine from each PC on the Network, accessing the Network Database file system...
And if you only allow SQL access from within a user-interface front-end (i.e. no SQL typed directly by end-users), which controls the ODBC connections, opening them only long enough to retrieve row(s) or update/delete/insert row(s), and which dynamically builds and executes the SQL statements on the end-user's behalf (from keyboard input or selection criteria chosen from the database)...
Then it may be possible for 1000s of end-users to randomly access the 500 *.MDB database system files on the Network without any concurrency issues arising.
If you shard 5 billion rows across 500 .MDB files on a network file server, and each of 1,000 users only needs access to one shard, then that solves the concurrency problem, but you're still pulling terabytes of data across the network. This raises the question of why you even have each user's shard on the file server.
It sounds like you're trying to re-invent the Hadoop Distributed File System, except in the case of Hadoop, each node is used for both data storage and computation. If you must go down this path, then perhaps a peer-to-peer distributed model would work better, where each user has their own shard on their own PC, but each shard is still accessible centrally by the sysadmin when needed for aggregate reporting.
Have you looked into Hadoop or Cassandra?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
September 21, 2015 at 9:15 am
Lynn Pettis (9/18/2015)
erichansen1836 (9/18/2015)
Definition of workaround: Red flag telling you you should consider a different solution
Get real. Start answering questions that you are asked that mean something to many of us.
Lynn - Don't ask Troll to not be Troll. Just Ignore Troll.
Answering one of your questions brings a closure to this that he does not want.
Finding an answer to any question is not something he created this post for.
It is almost funny how obvious the OP is being and how out of control this thread has become.
After 4-ish days and 14 pages it is now just sad.
Not dead puppy sad, more like someone who forgot to feed their fifth Betta in two weeks sad...
September 21, 2015 at 9:29 am
erichansen1836 (9/18/2015)
Definition of workaround: Red flag telling you you should consider a different solution
It was like flying with a dead elephant on the astronauts' backs (i.e. the LEM attached to the CSM), but the astronauts managed.
So you just related the solution you claim to be better than a 20-year-old established RDBMS to a dead elephant and one of the two worst disasters in NASA's history.
That is so amazing to me I have created a new phrase to describe it.
Troll Salt:
definition - special form of BS exuded by an internet troll that only makes sense in the salty world it came from.
Hitting the ignore button on this thing again. Have fun, y'all.
September 21, 2015 at 10:24 am
erichansen1836 (9/17/2015)
So now we're saying 32 bit processors can't seek past 4 gigs? Tell me more!
edit: Seriously, I don't write big files, so I haven't had to deal with this. Where's the limitation?
It is an operating system issue. Your operating system has to support Huge Files and it has to support DOUBLE INTEGERS. Windows 7 Home Premium evidently does not, as I was not able to get them to work.
This situation never had anything to do with the OS ever. It is statements like this that make me wonder where you learned what you know and if it hated you.
At most it was an issue with how the OS was optimized to work with the 386 memory architecture's physical limit of 4 GB of address space for anything.
September 21, 2015 at 10:36 am
If you shard 5 billion rows across 500 .MDB files on a network file server, and each of 1,000 users only needs access to one shard, then that solves the concurrency problem, but you're still pulling terabytes of data across the network. This raises the question of why you even have each user's shard on the file server.
It sounds like you're trying to re-invent the Hadoop Distributed File System, except in the case of Hadoop, each node is used for both data storage and computation. If you must go down this path, then perhaps a peer-to-peer distributed model would work better, where each user has their own shard on their own PC, but each shard is still accessible centrally by the sysadmin when needed for aggregate reporting.
Have you looked into Hadoop or Cassandra?
You could distribute the 500 files across multiple Windows Servers to make the system more robust.
Perhaps 1 server for each 100 MDB files (i.e. 1 billion rows) in the 500-file (i.e. 5-billion-row) system? What are servers, $3,000 each? And yes, by all means use a cheap/inexpensive, easy-to-maintain peer-to-peer network if the number of users is not very high (under 15-20). I used a Windows 2000 Professional network like this for one company, dedicating one PC to host the database file system. Although, at the time, I only used one *.MDB file for the database, as I had not yet asked myself, "Why be restricted to a 1-MDB-file database?".
If user access is sporadic rather than constant, then not much strain on the server should occur.
Also, please keep in mind that this is a CONTROLLED system whereby no direct SQL is allowed from end-users.
SQL result set size (across the network) is controlled, as well as the SQL syntax and the types of SQL statements each user is allowed to perform (built on their behalf by the user-interface dynamically generating the SQL statements).
I am not trying to reinvent anything, but instead to invent a new, FREE/NO-COST way of using MS-Access databases (Reduction Database Technology) whereby MS-Access software is completely removed from the equation and replaced with custom ODBC/GUI user interfaces that developers can design, which access not 1 but 10s or 100s of *.MDB files, each acting as a partial table, a table, or a collection of tables, and not as a database in and of itself limited to 2 GIG.
What is going to make a huge database practical and efficient in such a system depends on how well the data logically segregates, both for data storage and for data access. With ODBC, you can have as many database file or database server connections open as you want, across multiple Windows servers (or Linux or Unix servers), or you can open and close these connections one at a time within a FOR/FOREACH loop.
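A minimal Win32 Perl sketch of that loop, assuming the DBI module with DBD::ODBC and the classic Jet ODBC driver; the share path, file names, table, and column are hypothetical:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical shard files -- the names are illustrative only.
    my @shards = map { sprintf('\\\\FILESERVER\\db\\US_Census_2010_%03d.mdb', $_) } 1 .. 500;

    my $total = 0;
    foreach my $mdb (@shards) {
        # DSN-less connection through the Jet ODBC driver.
        my $dbh = DBI->connect(
            "dbi:ODBC:Driver={Microsoft Access Driver (*.mdb)};Dbq=$mdb",
            '', '', { RaiseError => 1 },
        );

        # The front end builds the SQL itself; end users never type SQL.
        my ($count) = $dbh->selectrow_array(
            'SELECT COUNT(*) FROM Persons WHERE City = ?', undef, 'Houston',
        );
        $total += $count;

        # Close the connection as soon as the result is back.
        $dbh->disconnect;
    }
    print "Total matching rows across all shards: $total\n";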
I have used Win32 Perl and Oracle ODBC to connect to dozens of Oracle/Linux database servers from my desktop, querying a common table on each Oracle server via SQL and creating a consolidated report in just seconds. My Perl script interrogated the TNSNAMES.ora Oracle server list (residing locally on my PC) to obtain server names/IP addresses.
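A rough sketch of that interrogation, assuming a simple one-alias-per-line layout; real tnsnames.ora entries can span multiple lines, so a production parser would need more care, and the file path is an assumption:

    use strict;
    use warnings;

    my $tnsnames = 'C:\\oracle\\network\\admin\\tnsnames.ora';  # assumed path
    open my $fh, '<', $tnsnames or die "Cannot open $tnsnames: $!";

    my @aliases;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;                            # skip comments
        push @aliases, uc $1 if $line =~ /^(\w[\w.]*)\s*=/;  # alias starts the line
    }
    close $fh;

    # Each alias can then drive one ODBC connection in a foreach loop,
    # exactly like the MDB shard loop sketched earlier.
    print scalar @aliases, " Oracle server aliases found\n";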
Read-only databases are always going to be more practical and efficient in this database model than read/write databases, especially for HUGE databases in the billions of rows and exceeding 1/4 terabyte or more. I doubt that anything past 5 terabytes and 25 billion rows would be practical even for read-only databases (go to SQL Server for sure). Up to 1 terabyte and 5 billion rows should not be that hard to maintain in this DB model (i.e. 500 MDB files).
And please consider my suggestion to NOT ADD or DELETE rows across the database file system directly, but indirectly, then ADD AND DELETE them directly just before a periodic (say monthly) JetComp.exe Utility file reorganization/optimization of the entire 500 file system.
My example of US_Census_2010_TX_A.mdb is not going to be a practical and efficient method of data storage and retrieval if your goal is to randomly or sequentially perform lookups to the records based on something other than (State=Texas and LastName begins with "A").
If your goal is to look up people this way to contact individuals for a phone interview or something, then this segregation is fine and fast/efficient. You can have row indexes on {last name, city (within Texas or another state)}, then proportion out a list of folks to call for each of your phone interviewers.
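Such an index could be created once per shard through the same ODBC channel; this is a minimal sketch, and the table, column, and file names are assumptions:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'dbi:ODBC:Driver={Microsoft Access Driver (*.mdb)};Dbq=C:\\data\\US_Census_2010_TX_A.mdb',
        '', '', { RaiseError => 1 },
    );

    # Jet SQL supports CREATE INDEX; one composite index per shard
    # covers the last-name/city lookup pattern described above.
    $dbh->do('CREATE INDEX idx_last_city ON Persons (LastName, City)');
    $dbh->disconnect;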
But if you want to use the database for police suspect identification, or murder victim identification, or missing-persons identification, you might want to segregate the data by {State, (Region or County or City), Sex, Race, Age Range, and even Height Range, Weight Range, eye color, hair color, if available}. I'm not sure of all the info gathered during a census.
September 21, 2015 at 11:04 am
This situation never had anything to do with the OS ever. It is statements like this that make me wonder where you learned what you know and if it hated you.
At most it was an issue with how the OS was optimized to work with the 386 memory architecture's physical limit of 4 GB of address space for anything.
Wow! You haven't been keeping up with the conversation. That was addressed many posts ago by more than just myself.
To recap, it was suggested that Perl 5.6.1 may not have been compiled and distributed with file I/O SEEK/TELL capabilities for files larger than 2 GIG.
I used both portable Perl syntax (SysOpen, SysRead, SysWrite, SysSeek, SysTell, Close) as well as native Windows file I/O. I was not able to get SEEK or TELL to work past the 2-GIG integer limit. Double-integer support likely exists on Windows 7 Home Premium; however, ActiveState Win32 Perl version 5.6.1 binary build 638 does not support double-integer SEEK values. I got this version in 2002 and have a compatible Perl application compiler for it (PL to EXE). I have not wanted to spend the money to upgrade.
I made a workaround so that I may continue using this version/build of Perl and still have random access to records in text files up to 4 GIG.
I SEEK forward up to 2 GIG from top-of-file, or I SEEK backwards up to 2 GIG from end-of-file.
i.e. Move the file pointer.
Once the file pointer is positioned, a READ or WRITE statement can read or write record data randomly.
This is very rapid random access to fixed-length text file database records.
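A minimal Perl sketch of that workaround, assuming a fixed record length and a known record count; the file name, record length, and record numbers are illustrative values:

    use strict;
    use warnings;
    use Fcntl qw(O_RDONLY SEEK_SET SEEK_END);

    my $RECLEN = 200;    # fixed record length in bytes (assumed)
    sysopen(my $fh, 'records.txt', O_RDONLY) or die "open: $!";

    # For offsets under 2 GIG, seek forward from top-of-file; past that,
    # seek backward from end-of-file so the value stays within a signed
    # 32-bit integer. Together the two directions reach 4 GIG.
    sub read_record {
        my ($recno, $total_recs) = @_;
        my $offset = $recno * $RECLEN;
        if ($offset < 2**31) {
            sysseek($fh, $offset, SEEK_SET) or die "seek: $!";
        } else {
            my $back = ($total_recs - $recno) * $RECLEN;  # distance back from EOF
            sysseek($fh, -$back, SEEK_END) or die "seek: $!";
        }
        sysread($fh, my $buf, $RECLEN) == $RECLEN or die "short read";
        return $buf;
    }

    print read_record(15_000_000, 20_000_000), "\n";  # hypothetical record numbers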
FYI, we are talking about a methodology of database implementation (Joint Database Technology) that uses fixed-length-record text files for huge data record storage, and uses Perl SDBM database files (key/value pairs tied to Perl program hash tables) as PERSISTENT random-access indexes to those text file records. The KEY is one or more fields (or partial fields) from the text file records, and the VALUE is the record offset in bytes to SEEK to randomly - a positive or negative integer to (+) or (-) 2 GIG.
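A sketch of that arrangement, with the sign of the stored offset deciding which end of the file to seek from; the key, file names, offset, and record length are assumptions for illustration:

    use strict;
    use warnings;
    use Fcntl qw(O_RDONLY O_RDWR O_CREAT SEEK_SET SEEK_END);
    use SDBM_File;

    my $RECLEN = 200;   # fixed record length in bytes (assumed)

    # Persistent index: key/value pairs tied to an in-memory hash.
    tie my %index, 'SDBM_File', 'census_idx', O_RDWR | O_CREAT, 0666
        or die "tie: $!";

    # KEY is built from record fields; VALUE is a signed byte offset.
    $index{'HANSEN|DALLAS'} = -123_600;    # hypothetical entry near EOF

    sysopen(my $fh, 'US_Census_2010_TX.txt', O_RDONLY) or die "open: $!";

    my $off = $index{'HANSEN|DALLAS'};
    sysseek($fh, $off, $off >= 0 ? SEEK_SET : SEEK_END) or die "seek: $!";
    sysread($fh, my $record, $RECLEN) == $RECLEN or die "short read";
    print "$record\n";

    untie %index;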
September 21, 2015 at 11:54 am
So you just related the solution you claim to be better than a 20-year-old established RDBMS to a dead elephant and one of the two worst disasters in NASA's history.
That is so amazing to me I have created a new phrase to describe it.
Troll Salt:
definition - special form of BS exuded by an internet troll that only makes sense in the salty world it came from.
Hitting the ignore button on this thing again. Have fun, y'all.
MS-Access-driver, ODBC-enabled databases exceeding the 2-GIG limit afforded by MS-Access software are not exactly like flying with a dead elephant on your back. And I knew someone would suggest such a thing.
It is more like driving a GOLF CART, and not limited to a golf course.
Golf Carts were designed for persons to ride around on a golf course during play.
But Country clubs also use them for other things. I know, I worked in Tennis for 6 years.
The Tennis Dept used them too.
Who else uses Golf Carts for other than their intended purpose?
Who else, besides Gene Kranz (NASA), has said: "I don't care what anything was designed to do, I care about what it can do"?
Security Companies use them for their patrol officers to patrol the massive parking lots of retail malls and shopping centers.
Apartment Communities use them for their leasing office staff to take prospects out on property to view Apartment homes.
Resorts and Time Shares use them.
Amusement parks use them for their staff to get around the huge premises.
Individuals use them in their neighborhoods.
etc.
September 21, 2015 at 12:54 pm
erichansen1836 (9/21/2015)
If you shard 5 billion rows across 500 .MDB files on a network file server, and each of 1,000 users only needs access to one shard, then that solves the concurrency problem, but you're still pulling terabytes of data across the network. This raises the question of why you even have each user's shard on the file server.
It sounds like you're trying to re-invent the Hadoop Distributed File System, except in the case of Hadoop, each node is used for both data storage and computation. If you must go down this path, then perhaps a peer-to-peer distributed model would work better, where each user has their own shard on their own PC, but each shard is still accessible centrally by the sysadmin when needed for aggregate reporting.
Have you looked into Hadoop or Cassandra?
You could distribute the 500 files across multiple Windows Servers to make the system more robust.
Perhaps 1 server for each 100 MDB files (i.e. 1 billion rows) in the 500-file (i.e. 5-billion-row) system? What are servers, $3,000 each? And yes, by all means use a cheap/inexpensive, easy-to-maintain peer-to-peer network if the number of users is not very high (under 15-20). I used a Windows 2000 Professional network like this for one company, dedicating one PC to host the database file system. Although, at the time, I only used one *.MDB file for the database, as I had not yet asked myself, "Why be restricted to a 1-MDB-file database?".
If user access is sporadic rather than constant, then not much strain on the server should occur.
Also, please keep in mind that this is a CONTROLLED system whereby no direct SQL is allowed from end-users.
SQL result set size (across the network) is controlled, as well as the SQL syntax and the types of SQL statements each user is allowed to perform (built on their behalf by the user-interface dynamically generating the SQL statements).
I am not trying to reinvent anything, but instead to invent a new, FREE/NO-COST way of using MS-Access databases (Reduction Database Technology) whereby MS-Access software is completely removed from the equation and replaced with custom ODBC/GUI user interfaces that developers can design, which access not 1 but 10s or 100s of *.MDB files, each acting as a partial table, a table, or a collection of tables, and not as a database in and of itself limited to 2 GIG.
What is going to make a huge database practical and efficient in such a system depends on how well the data logically segregates, both for data storage and for data access. With ODBC, you can have as many database file or database server connections open as you want, across multiple Windows servers (or Linux or Unix servers), or you can open and close these connections one at a time within a FOR/FOREACH loop.
I have used Win32 Perl and Oracle ODBC to connect to dozens of Oracle/Linux database servers from my desktop, querying a common table on each Oracle server via SQL and creating a consolidated report in just seconds. My Perl script interrogated the TNSNAMES.ora Oracle server list (residing locally on my PC) to obtain server names/IP addresses.
Read-only databases are always going to be more practical and efficient in this database model than read/write databases, especially for HUGE databases in the billions of rows and exceeding 1/4 terabyte or more. I doubt that anything past 5 terabytes and 25 billion rows would be practical even for read-only databases (go to SQL Server for sure). Up to 1 terabyte and 5 billion rows should not be that hard to maintain in this DB model (i.e. 500 MDB files).
And please consider my suggestion to NOT ADD or DELETE rows across the database file system directly, but indirectly, then ADD AND DELETE them directly just before a periodic (say monthly) JetComp.exe Utility file reorganization/optimization of the entire 500 file system.
My example of US_Census_2010_TX_A.mdb is not going to be a practical and efficient method of data storage and retrieval if your goal is to randomly or sequentially perform lookups to the records based on something other than (State=Texas and LastName begins with "A").
If your goal is to look up people this way to contact individuals for a phone interview or something, then this segregation is fine and fast/efficient. You can have row indexes on {last name, city (within Texas or another state)}, then proportion out a list of folks to call for each of your phone interviewers.
But if you want to use the database for police suspect identification, or murder victim identification, or missing-persons identification, you might want to segregate the data by {State, (Region or County or City), Sex, Race, Age Range, and even Height Range, Weight Range, eye color, hair color, if available}. I'm not sure of all the info gathered during a census.
And you steadfastly and arrogantly REFUSE to answer ANY questions with regard to backups and restores, point-in-time recovery of databases, RPO/RTO (Recovery Point Objectives and Recovery Time Objectives), and HA/DR (high availability and disaster recovery). These are extremely important issues that MUST be addressed before using your system for any kind of business need.
You wanted this discussion, yet you WON'T ANSWER any of these questions. You don't WANT to answer these questions.
At this point your solution is nothing more than half-baked; if a business relies on it, it could find itself in serious trouble at some point in the future.
We are database professionals. Our core responsibility is to protect the data. Data is any business's lifeblood, and we hold the keys to it.
September 21, 2015 at 12:56 pm
erichansen1836 (9/21/2015)
So you just related the solution you claim to be better than a 20-year-old established RDBMS to a dead elephant and one of the two worst disasters in NASA's history.
That is so amazing to me I have created a new phrase to describe it.
Troll Salt:
definition - special form of BS exuded by an internet troll that only makes sense in the salty world it came from.
Hitting the ignore button on this thing again. Have fun, y'all.
MS-Access-driver, ODBC-enabled databases exceeding the 2-GIG limit afforded by MS-Access software are not exactly like flying with a dead elephant on your back. And I knew someone would suggest such a thing.
It is more like driving a GOLF CART, and not limited to a golf course.
Golf Carts were designed for persons to ride around on a golf course during play.
But Country clubs also use them for other things. I know, I worked in Tennis for 6 years.
The Tennis Dept used them too.
Who else uses Golf Carts for other than their intended purpose?
Who else, besides Gene Kranz (NASA), has said: "I don't care what anything was designed to do, I care about what it can do"?
Security Companies use them for their patrol officers to patrol the massive parking lots of retail malls and shopping centers.
Apartment Communities use them for their leasing office staff to take prospects out on property to view Apartment homes.
Resorts and Time Shares use them.
Amusement parks use them for their staff to get around the huge premises.
Individuals use them in their neighborhoods.
etc.
And this has anything to do with what you wanted to discuss how?
I have asked several questions that you still have not answered. Do you EVER intend to address ANY of them?
September 21, 2015 at 1:03 pm
If you shard 5 billion rows across 500 .MDB files on a network file server, and each of 1,000 users only needs access to one shard, then that solves the concurrency problem, but you're still pulling terabytes of data across the network. This raises the question of why you even have each user's shard on the file server.
The intention is not to have each user access just 1 *.MDB file out of the 500.
They can access any of the 500.
I've demonstrated on my laptop the ability for 510 open ODBC connections to a single MDB file, all performing independent SQL processing concurrently, with reliable SQL report output and reliable SQL updates (i.e. 66 concurrent update SQL processes). This was to simulate 1 MDB file being hit with high traffic at any given moment in time, which is unlikely but theoretically possible.
If you have 1000 users connecting randomly to any one of 500 MDB files, and those connections are only open for a second or two to retrieve a row or write a row, then concurrency and network traffic issues may not become a problem. This is, of course, yet to be proven; I say this in theory only.
And please remember that I earlier stated that I would be using my own custom record-locking strategy.
Not that that is necessary. My example above with 510 ODBC connections running concurrent SQL processing and updates allowed the MS-Jet Engine to perform its own record-locking strategy and to shuffle all the concurrent requests.
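The custom record-locking strategy is never spelled out in this thread; one approach that fits the connect-briefly model is optimistic locking with a version column, sketched here purely as an assumption (the table, column, and file names are hypothetical):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect(
        'dbi:ODBC:Driver={Microsoft Access Driver (*.mdb)};Dbq=C:\\data\\shard_042.mdb',
        '', '', { RaiseError => 1 },
    );

    # Read the row together with its version stamp.
    my ($last, $ver) = $dbh->selectrow_array(
        'SELECT LastName, RowVersion FROM Persons WHERE PersonID = ?',
        undef, 12345,
    );

    # Write back only if no other connection changed the row meanwhile;
    # a zero row count means the update lost the race and should retry.
    my $rows = $dbh->do(
        'UPDATE Persons SET LastName = ?, RowVersion = RowVersion + 1
         WHERE PersonID = ? AND RowVersion = ?',
        undef, 'HANSEN', 12345, $ver,
    );
    print "Row changed underneath us; retry needed\n" if $rows == 0;
    $dbh->disconnect;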
September 21, 2015 at 1:08 pm
erichansen1836 (9/21/2015)
If you shard 5 billion rows across 500 .MDB files on a network file server, and each of 1,000 users only needs access to one shard, then that solves the concurrency problem, but you're still pulling terabytes of data across the network. This raises the question of why you even have each user's shard on the file server.
The intention is not to have each user access just 1 *.MDB file out of the 500.
They can access any of the 500.
I've demonstrated on my laptop the ability for 510 open ODBC connections to a single MDB file, all performing independent SQL processing concurrently, with reliable SQL report output and reliable SQL updates (i.e. 66 concurrent update SQL processes). This was to simulate 1 MDB file being hit with high traffic at any given moment in time, which is unlikely but theoretically possible.
If you have 1000 users connecting randomly to any one of 500 MDB files, and those connections are only open for a second or two to retrieve a row or write a row, then concurrency and network traffic issues may not become a problem. This is, of course, yet to be proven; I say this in theory only.
All I am hearing anymore from erichansen1836 is "wha wha wha wha" just like the adults in a Charlie Brown cartoon.
September 21, 2015 at 1:09 pm
erichansen1836 (9/21/2015)
This situation never had anything to do with the OS ever. It is statements like this that make me wonder where you learned what you know and if it hated you.
At most it was an issue with how the OS was optimized to work with the 386 memory architecture's physical limit of 4 GB of address space for anything.
Wow! You haven't been keeping up with the conversation. That was addressed many posts ago by more than just myself.
<continued to type a lot of useless text>
Wow! You have not been keeping up at all, period, or you never would have posted something so wrong in the first place.
Just stop being an intentional pain in the butt now and go away....
Tired of clicking on links that land me back in this thread.
September 21, 2015 at 1:10 pm
All I am hearing anymore from erichansen1836 is "wha wha wha wha" just like the adults in a Charlie Brown cartoon.
I have to say Lynn, you get the AWARD for the most instructive comments on this Thread.
Where would we be without your insightful feedback?
September 21, 2015 at 1:15 pm
erichansen1836 (9/21/2015)
All I am hearing anymore from erichansen1836 is "wha wha wha wha" just like the adults in a Charlie Brown cartoon.
I have to say Lynn, you get the AWARD for the most instructive comments on this Thread.
Where would we be without your insightful feedback?
Quite interesting. You will respond to comments I make, like the one you quoted, but you totally refuse to answer any questions of substance.
Try answering the real questions I asked, or just end this supposedly open discussion you started, because it is painfully obvious to many of us that you have absolutely no interest in a real discussion and are nothing more than a troll seeking support for your half-baked and dangerous database solution.
When you can talk about backup/recovery, point-in-time recovery, RPO/RTO, and HA/DR, then maybe there can be a real discussion.
September 21, 2015 at 1:24 pm
erichansen1836 (9/21/2015)
All I am hearing anymore from erichansen1836 is "wha wha wha wha" just like the adults in a Charlie Brown cartoon.
I have to say Lynn, you get the AWARD for the most instructive comments on this Thread.
Where would we be without your insightful feedback?
Answer: We would be reading his positive and insightful information about M$ SQL server on the web site created to share that information.
Instead we get to read the fictitious, hateful, uneducated troll trash that you and your catfish keep posting to this thread.
You are not endearing yourself to anyone by disrespecting someone you have been giving backhanded insults to for the entire post.
Sorry, everyone, for the off-topic posts. Although I am not sure there was a valid question seeking a true answer anyway.