April 8, 2011 at 8:18 am
I am very pleased with myself today.
I was tasked with reconciling data in a database with a csv file.
The CSV file is 53MB in size, containing 42471 rows of data.
I needed to:
My options were (those that sprang to mind, at least):
I went for the compiled binary option because I wanted it to be fast.
I have written a C++ program which loads the ENTIRE CSV file into a single char* array in RAM.
It then loops through the buffer processing the data (I handled the date fields by converting them to an integer Julian date using nothing but pure maths!).
I was a little dubious of the insert/exec, thinking that might slow it down a bit, but...
The entire process, including pulling the results into SQL Server variables, takes 250 ms. Yes, 250 milliseconds exactly!
this makes me happy 🙂
Ben
^ That's me!
----------------------------------------
01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
----------------------------------------
April 8, 2011 at 9:38 am
Well done.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 10, 2011 at 2:16 pm
Lordy.... I can't imagine taking the trouble to write a dedicated C++ program just to do all of those specific aggregations on a raw CSV file.
Don't get me wrong, I love C++/C# and it is fast for doing low-level processes quickly and repeatedly. But I'm confident it would take no more than a few minutes to import a CSV file to a table (just using the file import applet in SSMS) and run a few queries on it to produce repeatable and reliable results. I'm certain it would take me far less time to do that than it would take to write a C++/C# program to do the same thing on the raw file data.
The execution time might be 250 ms, but how many hours did it take to write the program?
The probability of survival is inversely proportional to the angle of arrival.
May 11, 2011 at 1:58 am
Heh heh, about 4 hours I think it took me.
The idea behind it was that the fewer steps have to take place, the less can fall over when processing in batch mode.
In my estimation a well-written 15KB exe with a few simple functions is less likely to fall over than some overly complicated SSIS package and a bunch of aggregate queries.
That, and it was more fun to do it this way 🙂
Ben
^ That's me!
May 11, 2011 at 6:04 am
BenWard (5/11/2011)
heh heh about 4 hours I think it took me.... that and it was more fun to do it this way 🙂
Seems like you cost your employer/customer 3+ hours of productivity, all because it "was more fun to do it this way". Having fun is great. I have fun at work all the time. It's a natural part of doing a good job.
Does your employer pay you for correct answers, or how much fun you had getting them? If it's the latter, are they hiring?
[font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
Connect to me on LinkedIn
May 11, 2011 at 6:17 am
Lol, it's a program that will be run regularly as part of a big process. The fast execution time going forward is definitely worth 3 hours of initial development cost. 😉 And unfortunately no, we aren't hiring!
Ben
^ That's me!
May 11, 2011 at 6:26 am
If the process occurs every day on the same file, in the same format, I'd agree there's a valid argument for doing it in a dedicated process, and that the time taken to develop and test it will be amortized over a long period of time... given that the file format and/or data types never change.
The probability of survival is inversely proportional to the angle of arrival.
May 11, 2011 at 7:49 am
A simple bulk insert into SQL Server and then a SQL query would probably have been my approach, but hey, sounds like you did it in a way that was fun for you.
May 11, 2011 at 7:58 am
I'm just happy you're happy. :smooooth: