October 4, 2005 at 7:20 am
Hi,
I need some help in coming up with the DB architecture for the following scenario.
Files of around 1.5 GB arrive in a file share every 2 hours, and these files need to be loaded into a table in the database. While the table is being loaded, users are querying it; their queries get blocked and their response time is very slow.
The file data will be roughly 150 GB per day, and the data must stay available to the users for 60 days. It is a huge volume of data, and currently there are no indexes on the table.
Can anyone please suggest:
1. What is the fastest way to load the file data into the DB?
2. How should I avoid locking?
3. How can I improve user query response time?
Any guidance on these is appreciated.
Regards
J
October 4, 2005 at 8:30 am
How are you loading the data at the moment?
1.5 GB every two hours is a lot less than 150 GB a day - I don't understand!
Does the data need to be available to the end users immediately? Could an overnight process be employed?
I always tend to take imported data into a SQL Server 'import' table and then shunt it wherever it needs to go in SQL Server from there. I would expect this to be faster than a direct import and should therefore help with your locking problems (a bit). It will need more headroom in terms of space, though.
Have you decided not to index the table so that imports run as fast as possible? Obviously queries on huge tables without indexes are always going to be slow and there's a trade-off here.
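A minimal sketch of that 'import' table approach, assuming hypothetical dbo.ImportStaging and dbo.BigTable tables with matching layouts, an assumed file path, and BULK INSERT as just one way of filling the staging table:

-- Hypothetical table names (dbo.ImportStaging, dbo.BigTable) and file path.
-- The staging table has no indexes, so the file load itself is fast.
TRUNCATE TABLE dbo.ImportStaging;

BULK INSERT dbo.ImportStaging
FROM '\\fileshare\incoming\feed.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- Shunt the rows into the queried table in one set-based statement,
-- so the live table is only touched for the duration of this insert.
INSERT INTO dbo.BigTable
SELECT * FROM dbo.ImportStaging;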
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 4, 2005 at 9:01 am
How are you loading the data into your tables? You should probably be using the BCP utility or BULK INSERT.
There are also additional bulk insert performance enhancements worth looking into.
Also, using Phil's suggestion to import to a separate table and then move the rows may help, but only testing will tell for sure. In one instance, the best way to reduce contention was actually to create another table, copy the data from the current table into the copy, import into the copy, then drop the original and rename the copy. The only contention was during the drop and rename, which took only nanoseconds. Of course, we had to make sure the database file size could handle having two instances of the large table at the same time.
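A minimal sketch of that copy-and-swap variant, with hypothetical names (dbo.BigTable for the live table, dbo.BigTable_New for the copy) and an assumed file path and format:

-- Hypothetical names and file path; terminators would need to match the real feed.
-- 1. Build a copy of the live table and seed it with the existing rows.
SELECT *
INTO dbo.BigTable_New
FROM dbo.BigTable;

-- 2. Bulk load the new file into the copy while users still query dbo.BigTable.
BULK INSERT dbo.BigTable_New
FROM '\\fileshare\incoming\feed.dat'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- 3. Swap: the only contention is during this drop and rename.
DROP TABLE dbo.BigTable;
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';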
Hope this helps
Mark
October 4, 2005 at 9:08 am
A 1.5 GB file arrives every two hours, and from multiple sources. Say there are n sources; then n * 1.5 GB of files come into the file share.
The end users want to see the data at a latency of 6-7 hours. An overnight process cannot be used here, as it would make the data a day old.
We did not index the table because it makes the import slow.
October 5, 2005 at 12:25 am
Are you running enterprise SQL?
If so, look up "Distributed Partitioned Views" in SQL Server Books Online. You would set up a table for each 2-hour period and import the data into it, then update the view to add that table to the large UNION ALL select statement, and also remove the oldest table (older than 60 days) from the view and drop it from the database. Using that method there shouldn't be any locking contention, as the data won't be visible to the users until after the inserts for the prior 2 hours have been completed...
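A minimal sketch of that approach, with hypothetical table, column and view names; the CHECK constraints on the load-date column are an assumption, added so the optimizer can eliminate partitions:

-- Hypothetical names: one table per 2-hour load window, plus a view the users query.
CREATE TABLE dbo.Load_20051004_0700 (
    LoadDate  DATETIME NOT NULL
        CHECK (LoadDate >= '20051004 07:00' AND LoadDate < '20051004 09:00'),
    SomeValue INT NOT NULL   -- placeholder for the real feed columns
);
GO
CREATE TABLE dbo.Load_20051004_0900 (
    LoadDate  DATETIME NOT NULL
        CHECK (LoadDate >= '20051004 09:00' AND LoadDate < '20051004 11:00'),
    SomeValue INT NOT NULL
);
GO
-- Bulk load the newest table while users query the current view, then
-- republish the view so the new data becomes visible in one step.
CREATE VIEW dbo.AllLoads
AS
SELECT LoadDate, SomeValue FROM dbo.Load_20051004_0700
UNION ALL
SELECT LoadDate, SomeValue FROM dbo.Load_20051004_0900;
GO
-- When a table falls outside the 60-day window, remove its branch from the
-- view with ALTER VIEW and then drop the table.

Dropping the oldest table is then effectively a metadata operation, so the 60-day purge never has to delete rows from one huge table.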
October 5, 2005 at 12:29 am
Thank you all for your suggestions
October 6, 2005 at 3:23 pm
SET WORSHIP_BILL_GATES_MODE OFF
GO
I realize that this is an MS SQL Server site, but it seems that your volumes are approaching the scalability/reasonability limits of the present SQL Server. I've worked with Sybase IQ (the Sybase data warehouse product) in the past with daily load volumes comparable to yours, on a DW that averaged 2 TB in size. Its load speed puts everything I've seen in my career to shame, and indexing is almost non-existent. We could query a VIEW with a UNION ALL which consisted of horizontally partitioned tables while loading and deleting data, with no performance degradation (the queries were complex JOINs on tables with hundreds of millions of rows, the largest with 1.8 billion rows !!!). Normally when you load data into SQL Server, the database size grows. A really positive point about this DW is that when you load your source data, the resulting DW tables (all indexes included !!!) are actually smaller than the source data due to compression (anywhere from 30% to 50% smaller).
SET WORSHIP_BILL_GATES_MODE ON
GO
Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."