October 13, 2009 at 7:39 am
You say you have a database. You don't. You have a file, and using SQL Server as a file system and SQL as a data parser isn't going to be fast.
The way to improve performance is to keep the data in a flat file on disk and write a short program in C that uses fgets() for efficient buffered reads and pointers to parse the records.
No doubt the content and purpose of the data can be analysed, and it could be used as the source to populate a real database. But that's not something to learn from a web posting; it's a craft to study over time.
Sorry.
Roger L Reid
October 13, 2009 at 8:23 am
Buy a server with 256GB of RAM and 16+ cores of CPU and your table scans should run quite quickly, since you won't have to touch the disk at all! 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 13, 2009 at 9:31 am
GilaMonster (10/10/2009)
Why on earth do you have a table with 400+ columns?
You know, I ask myself the same thing when I look at some of the stuff developers did before I came on the scene...
I've got 6 tables in one database with 200+ columns in each of them. Three are auditing tables; the other three are the tables being audited (Customers, Products, and Licensing info).
Yes, I need to normalize them. Yes, I want to. I've also got about 60 in-house applications that depend on them and don't use sprocs or even views to get at the data. And our in-house development team is made up of 6 developers and me. And we keep having to create newer apps and never get to do maintenance.
So as for why he's got 400+ columns in a single table, it could be something like that.
October 13, 2009 at 12:23 pm
What is your disk configuration?
What additional indexes do you have?
How long does it take to do:
Select Count(id) from yourtable
I think that beyond normalization, an exercise in redefining the data dictionary might be useful (e.g. persistent use of varchar(1) doesn't necessarily match the data or datatype for all of the fields where varchar(1) is defined).
As for the delay in importing this flat-file, knowing your process would be helpful. Is this a complete overwrite, or is it an update?
Have you considered partitioning your table and placing partitions on different RAIDs or LUNs?
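If partitioning does turn out to be an option (it needs Enterprise Edition), a minimal sketch of what that could look like is below. The partition function, scheme, filegroups, and table/column names are all hypothetical, not your actual schema:
-- Hypothetical filegroups FG1..FG4 and a hypothetical staging table, partitioned by state code.
CREATE PARTITION FUNCTION pfStateCode (char(2))
AS RANGE RIGHT FOR VALUES ('G', 'N', 'T');

CREATE PARTITION SCHEME psStateCode
AS PARTITION pfStateCode TO (FG1, FG2, FG3, FG4);

CREATE TABLE dbo.CensusStaging
(
    ID        int IDENTITY(1,1) NOT NULL,
    StateCode char(2)     NOT NULL,
    County    varchar(50) NOT NULL,
    Income    money       NULL
) ON psStateCode (StateCode);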
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
October 13, 2009 at 10:05 pm
Is this locally attached disk? Partitioning the data into separate filegroups and separate data files would do almost nothing for performance with SQL Server 2005, even if they are on separate LUNs. It also should be asked: is this Enterprise Edition?
But very true, the import should not take that long if it is just an import. Are you having a large number of database growth events, or are you seeing IO stalls in the log file? What is the growth rate of the data and log files, and are they on separate disks?
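If it helps, here is one quick way to look at cumulative IO stalls per data and log file (this assumes SQL Server 2005 or later; swap in your own database name):
-- Cumulative IO stalls per file since the last restart ('YourDatabase' is a placeholder).
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS logical_file_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID('YourDatabase'), NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;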
October 15, 2009 at 1:16 am
I've used SSIS to import. I also had to split the big file into smaller files, 5GB each. That works better, and I just want to make sure it's OK when the data is committed. One file took me 1 hour 20 minutes to complete. I don't know what the best solution for this is.
Note that my flat file does not have a primary key, so I have to create one during the import. And I'm using a secondary 1.5TB disk on my server.
Now I've received a lot of requests from customers to query complex data (for example: get all records in Maricopa County, AZ with income > $50,000, ...). Each query takes almost 2 hours to complete. That is really bad.
October 15, 2009 at 2:10 am
Try the count with a NOLOCK hint. It will be a lot faster.
If you regularly import data, you'll probably never update the SQL data anyhow; it is just storage and queries.
Select Count(ID) from TableName WITH (NOLOCK)
Do not buy anything before you have thought long and hard about your design. There is a lot of redundant information in your table, and an import job could reduce the database size dramatically.
I get information in flat format regularly. After import, I store the information in a well-designed database and delete the import table.
---- Example ----
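-- Add only the distinct position descriptions that aren't already in the lookup table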
INSERT INTO T_Rim_FunctionProfile(fnpdescription)
SELECT DISTINCT Position_description
FROM ImportedTable
WHERE Position_description not in (select Isnull(fnpdescription,'') from T_Rim_FunctionProfile)
---- End Example ----
October 15, 2009 at 7:06 am
You really need to break the information down into a better design. Import it into a staging table, then run SSIS packages that move it into more normalized tables.
For the example you gave, you would need an index on the column(s) that contain the county, state and income level.
Once you get the data into normalized tables, index the columns that you need to search on. The NOLOCK hint will help, but you really need indexes on your search columns.
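For instance, something along these lines might cover the Maricopa County example. The table and column names here are only guesses, since we haven't seen the actual schema:
-- Hypothetical table and column names; adjust to the real schema.
CREATE NONCLUSTERED INDEX IX_Demographics_State_County_Income
    ON dbo.Demographics (StateCode, County, Income)
    INCLUDE (RecordID);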
October 15, 2009 at 5:27 pm
The best idea for reporting on this data is to create another dimensional-model database with a star schema that is populated from the large database, then generate a cube off of it with SSAS. That would give you the ability to generate reports quickly. Even better, create SSRS reports that connect to the cube and the dimensional model for their datasets.
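Just to give a rough idea (these names are made up, not your actual design), a minimal star-schema shape for the county/income style of question might be:
-- Hypothetical dimension and fact tables; real designs would have more attributes and keys.
CREATE TABLE dbo.DimGeography
(
    GeographyKey int IDENTITY(1,1) PRIMARY KEY,
    County       varchar(50) NOT NULL,
    StateCode    char(2)     NOT NULL
);

CREATE TABLE dbo.FactHousehold
(
    GeographyKey int   NOT NULL REFERENCES dbo.DimGeography (GeographyKey),
    HouseholdKey int   NOT NULL,
    Income       money NOT NULL
);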
October 16, 2009 at 6:40 am
He is having difficulty doing what he is doing now. I seriously doubt that he knows what a star schema is or how to implement it as either ROLAP or MOLAP. Just let him get it somewhat normalized and indexed properly first. Then he can get further educated on how to do the data warehousing stuff with it.