Wow that was fast!

  • I am very pleased with myself today.

    I was tasked with reconciling data in a database with a CSV file.

    The CSV file is 53 MB in size and contains 42,471 rows of data.

    I needed to:

  • do a record count
  • sum up a field containing monetary values
  • sum up another field containing monetary values
  • find the earliest date in the 'ledger date' column
  • find the latest date in the 'ledger date' column
  • output a record count grouped by a char(2) column
  • find the minimum sequence number
  • find the maximum sequence number
  • find the sum of the sequence numbers
  • My options were (what sprang to mind, at least):

  • bring the file in via SSIS and process it with SQL
  • query the file directly using OPENQUERY
  • process the CSV with a PHP script and pipe the data into a table
  • write a compiled binary program to do all the counts etc., output them to the command line, and capture them with insert/exec via xp_cmdshell
  • I went for the compiled binary option because I wanted it to be fast.

    I have written a C++ program which loads the ENTIRE CSV file into a single char* array in RAM.

    It then loops through the file, processing the data. (I handled the date fields by converting them into an integer Julian date using nothing but pure maths!)
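    For anyone curious what the single-pass approach might look like, here's a minimal sketch. This is not Ben's actual code: the column layout (amount1,amount2,yyyy-mm-dd,code,seq) and all the names are assumptions; the "pure maths" date conversion shown is the Fliegel & Van Flandern Julian Day formula, one common way to do it with integer arithmetic only.

```cpp
#include <climits>
#include <cstdio>
#include <cstring>
#include <map>
#include <string>

// Date -> integer Julian Day Number using nothing but integer maths
// (Fliegel & Van Flandern), so dates min/max as plain longs.
long jdn(int y, int m, int d) {
    long a = (14 - m) / 12;
    long yy = y + 4800L - a;
    long mm = m + 12L * a - 3;
    return d + (153 * mm + 2) / 5 + 365 * yy + yy / 4 - yy / 100 + yy / 400 - 32045;
}

struct Stats {
    long count = 0, minJdn = LONG_MAX, maxJdn = LONG_MIN;
    long long minSeq = LLONG_MAX, maxSeq = LLONG_MIN, sumSeq = 0;
    double sum1 = 0, sum2 = 0;
    std::map<std::string, long> byCode;  // record count grouped by the char(2) column
};

// One pass over the whole file, already loaded into a single writable
// char buffer; strtok carves it into lines in place.
// Assumed row layout: amount1,amount2,yyyy-mm-dd,code,seq
Stats aggregate(char *buf) {
    Stats s;
    for (char *line = std::strtok(buf, "\r\n"); line; line = std::strtok(nullptr, "\r\n")) {
        double a1, a2;
        int y, m, d;
        char code[3];
        long long seq;
        if (std::sscanf(line, "%lf,%lf,%d-%d-%d,%2[^,],%lld",
                        &a1, &a2, &y, &m, &d, code, &seq) != 7)
            continue;  // header or malformed row
        ++s.count;
        s.sum1 += a1;
        s.sum2 += a2;
        long j = jdn(y, m, d);
        if (j < s.minJdn) s.minJdn = j;
        if (j > s.maxJdn) s.maxJdn = j;
        if (seq < s.minSeq) s.minSeq = seq;
        if (seq > s.maxSeq) s.maxSeq = seq;
        s.sumSeq += seq;
        ++s.byCode[code];
    }
    return s;
}
```

    Because every aggregate falls out of one loop over a buffer that's already in RAM, the 250 ms figure is believable for ~42k rows.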

    I was a little dubious about the insert/exec, thinking that might slow it down a bit, but...

    The entire process, including pulling the results into SQL Server variables, takes 250 ms. Yes, 250 milliseconds exactly!
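    The insert/exec step just captures the program's stdout into a table. A hedged sketch of the output side (the exe name, path, delimiter, and column order are my assumptions, not from the thread):

```cpp
#include <cstdio>
#include <string>

// On the SQL Server side, the capture would look roughly like:
//   CREATE TABLE #out (line varchar(4000));
//   INSERT #out EXEC master..xp_cmdshell 'C:\tools\csvstats.exe data.csv';
//   -- then split #out.line into @count, @sum1, ... variables
//
// Keeping every aggregate on one pipe-delimited line keeps the
// INSERT/EXEC result set to a single usable row.
std::string format_row(long count, double sum1, double sum2,
                       long minJdn, long maxJdn,
                       long long minSeq, long long maxSeq, long long sumSeq) {
    char buf[160];
    std::snprintf(buf, sizeof buf, "%ld|%.2f|%.2f|%ld|%ld|%lld|%lld|%lld",
                  count, sum1, sum2, minJdn, maxJdn, minSeq, maxSeq, sumSeq);
    return std::string(buf);
}
```

    The program would simply printf this line, and the T-SQL side parses it once after the INSERT/EXEC returns.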

    This makes me happy 🙂

    Ben

    ^ That's me!

    ----------------------------------------
    01010111011010000110000101110100 01100001 0110001101101111011011010111000001101100011001010111010001100101 01110100011010010110110101100101 011101110110000101110011011101000110010101110010
    ----------------------------------------

  • Well done.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lordy... I can't imagine taking the trouble to write a dedicated C++ program just to do all of those specific aggregations on a raw CSV file.

    Don't get me wrong, I love C++/C#, and it's fast for doing low-level processes quickly and repeatedly. But I'm confident it would take no more than a few minutes to import a CSV file into a table (just using the file import wizard in SSMS) and run a few queries on it to produce repeatable and reliable results. I'm certain it would take me far less time to do that than it would to write a C++/C# program to do the same thing on the raw file data.

    The execution time might be 250 ms, but how many hours did it take to write the program?

    The probability of survival is inversely proportional to the angle of arrival.

  • Heh heh, about 4 hours I think it took me.

    The idea behind it was that the fewer steps that had to take place, the less could fall over when processing in batch mode.

    In my estimation, a well-written 15 KB exe with a few simple functions is less likely to fall over than some overly complicated SSIS package and a bunch of aggregate queries.

    That, and it was more fun to do it this way 🙂

    Ben


  • BenWard (5/11/2011)


    heh heh about 4 hours I think it took me.

    ... that and it was more fun to do it this way 🙂

    Seems like you cost your employer/customer 3+ hours of your productivity, all because it "was more fun to do it this way". Having fun is great. I have fun at work all the time. It's a natural part of doing a good job.

    Does your employer pay you for correct answers, or for how much fun you had getting them? If it's the latter, are they hiring?

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • lol, it's a program that will be run regularly as part of a big process. The fast execution time going forward is definitely worth 3 hours' initial development cost. 😉 And unfortunately no, we aren't hiring!

    Ben


  • If the process occurs every day on the same file, in the same format, I'd agree there's a valid argument for doing it in a dedicated process, and that the time taken to develop and test it will be amortized over a long period of time... given that the file format and/or data types never change.


  • A simple bulk insert into SQL Server and then a SQL query would probably have been my approach, but hey, sounds like you did it in a way that was fun for you.

  • I'm just happy you're happy. :smooooth:
