September 5, 2013 at 10:40 pm
I don't know the best way to structure my database.
I have a data feed that comes in CSV format; it's financial data on about 65,000 different mutual funds with daily prices. In each CSV I get, each mutual fund is listed once with its price for the current day.
What I do first is import the CSV. But then what I want is a table for each mutual fund, with prices and dates inside each table. This means I have to create 65,000 tables in a separate database, then every day read from the imported CSV and write out to those tables. The problem is this AJAX query takes almost an hour each time (for each imported CSV, in other words each day).
The logic behind this is that when I want to forecast data on an individual fund later, I can just go to that fund's table, query the data out, and do what I want with the prices for that fund over a range of time, which is the whole point.
The alternative approach, I think, is to skip creating the 65,000 tables, just keep the imported CSV tables, and draw data from them on demand. To do that, though, I would have to join together all my imported CSVs on each query before grabbing fund data, because I'd want data over the course of many months, even years, for a particular fund.
Which approach is the most efficient at runtime? The way I have it now, with the 65,000 tables, I have an index on the date column in each one. I'm thinking that if I did the other option and just queried the imports, I'd need some sort of dynamic way to index them once they're joined so I can use the date column effectively. I'd need them all indexed by fund name and then by date in order to efficiently grab data from a result that, at 65,000 rows a day, would run to tens of millions of rows a year. Is there a dynamic way to do indexes when you join tables?
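For example, pulling one fund's history out of the daily import tables would mean a query along these lines (table and column names here are just for illustration):

SELECT FundName, SomeDate, Price
FROM (
    SELECT FundName, SomeDate, Price FROM Import_20130903
    UNION ALL
    SELECT FundName, SomeDate, Price FROM Import_20130904
    UNION ALL
    SELECT FundName, SomeDate, Price FROM Import_20130905
    -- ...and one branch for every other day's import...
) AS AllImports
WHERE FundName = 'SOMEFUND'
ORDER BY SomeDate;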
Your thoughts are much appreciated. Thanks.
September 5, 2013 at 11:28 pm
If all you're looking at is fund prices over time, wouldn't that structure be more like this:
CREATE TABLE TickerPrice (
    TickerID VARCHAR(10) NOT NULL, -- fund identifier
    SomeDate DATE NOT NULL,        -- price date
    Price MONEY NOT NULL           -- that day's price
);
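With that shape, one composite index covers the "one fund over a date range" lookups you describe. Just a sketch:

-- Keeps each fund's price history stored together, in date order
CREATE CLUSTERED INDEX IX_TickerPrice_TickerID_SomeDate
    ON TickerPrice (TickerID, SomeDate);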
What are you doing with the data where you need a zillion tables? Something here just seems awry.
September 6, 2013 at 4:19 am
Remember, it's relational storage. So create a table that defines funds. That single table will have 65,000 rows, each with a unique column that identifies the fund. Then you import each day's CSV into a single price table that has a foreign key relationship with your funds table, so that the data within that table is identified by fund. Two tables with lots of rows, not 65,000 tables with only a few rows.
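Something like this, roughly (all names here are placeholders):

-- One row per fund
CREATE TABLE Fund (
    FundID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FundName VARCHAR(100) NOT NULL UNIQUE
);

-- One row per fund per day, tied back to the funds table
CREATE TABLE FundPrice (
    FundID    INT NOT NULL REFERENCES Fund (FundID),
    PriceDate DATE NOT NULL,
    Price     MONEY NOT NULL,
    PRIMARY KEY (FundID, PriceDate) -- one price per fund per day
);

Then a forecast query is just a filter on one table:

SELECT PriceDate, Price
FROM FundPrice
WHERE FundID = 42
  AND PriceDate BETWEEN '2012-01-01' AND '2013-09-05'
ORDER BY PriceDate;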
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning