December 26, 2016 at 7:38 am
Merry Christmas and happy new year!
From time to time my work needs to deal with lots of excel sheets from clients, there is no united format to those excel files and sheets, each client and each project will be different to each other.
That means so far we have to manually import the data, here is what have been doing long before I joined the team:
1. Create a database for each project
2. Create a table for each sheet in each excel file
All columns were initially created as varchar, and 255 characters unless max is needed. No integer, no datetime, sometimes we manually update the types but most of the times that's all.
The import incurs lots of human cost. I believe this is indeed quite common and there should be some matured solution already?
I am thinking:
1. can document-based MongoDB handle this?
2. is there ETL tool capable to handle this?
3. can SSIS handle this in an acceptable manner? I mean, it sounds unrealistic if we are to create SSIS for each sheet.
Thank you for your input.
December 26, 2016 at 8:54 am
One option might be to use the bastard child of SQL Server - Access... (bear with me for a minute)....
Create a table in Access of (source column name, destination column name, destination data type), and then use it to match source columns to destination columns. The good thing is that you would only have to do it once for each variation of a name. Then you could write a query (probably using VBA to make it dynamic) and then append the changed data (proper types and sizes) to a SQL Server table.
Other than that, I don't know...
December 26, 2016 at 2:06 pm
Thanks, still very tedious, actually adding one step here because data is eventually to be imported into SQL (or NoSQL), right?
December 26, 2016 at 2:34 pm
halifaxdal (12/26/2016)
Thanks, still very tedious, actually adding one step here because data is eventually to be imported into SQL (or NoSQL), right?
It'll be much less tedious to brow beat the clients into submission and have them follow a standard. More fun, as well. π
--Jeff Moden
Change is inevitable... Change for the better is not.
December 26, 2016 at 2:51 pm
Thank you Jeff,
I completely agree with you, however you know often the time you will have to follow your client, not asking client to follow you.
It might be possible to enforce some requirement to your client, I bet your boss would ask you not to do so.
When you are talking to clients, situations could end up like a excel file with dozens of sheets and each sheet has arbitrary names with special characters and some sheet might have hundreds of columns with possible special characters in some of them.
So, no, this is not a job for DBAs or for internal developer team to deal with, I am rather looking for some kind of commercial solution as I believe there should be a mature solution for this because this is a common scenario as far as I can see.
December 26, 2016 at 5:52 pm
halifaxdal (12/26/2016)
It might be possible to enforce some requirement to your client, I bet your boss would ask you not to do so.
Fortunately, I have some good input with my bosses and both of them understand the value of consistent data.
When you are talking to clients, situations could end up like a excel file with dozens of sheets and each sheet has arbitrary names with special characters and some sheet might have hundreds of columns with possible special characters in some of them.
You've just mentioned the most valuable aspect of it all and that's talking with the clients. If you let them know how much more accurately and quickly you can do things for them if the data/spreadsheets are thoughtfully structured, they're usually eager to comply. To spring an old parable on this, the answer is always "No" unless you ask. Make it a point to help the customer see the value in it because it will be much better for both the customer and your company.
Just remember... If you don't have a plan, don't expect your clients to have one. π
So, no, this is not a job for DBAs or for internal developer team to deal with, I am rather looking for some kind of commercial solution as I believe there should be a mature solution for this because this is a common scenario as far as I can see.
Unfortunately, no one has come up with the AI solution necessary to handle every possible bit of dribble someone may drool onto a spreadsheet. You say that it's "not a job for DBAs or internal developer team" to deal with but who would know the data better than they do? They probably know it even better than the originators of the spreadsheets.
Shifting gears, do what I did. Think about how to import the first several rows of data (using the "ACE" drivers, which is a bit of that "mature" solution you're looking for) from the spreadsheet with headers set to "NO" and IMEX set to "ON". Import those rows using the standard {fx} column names and read the column names so that you can use them to create the necessary target table. If you have (for example) monthly repeated column names, use that information to create an "unpivot" of the data once you've loaded the data into a staging table. Of course, it will require dynamic SQL but, if you're clever, you can make a couple of variables to identify when you have multi-row column headers and deal with them appropriately. What I end up with is a nice little EAV that's easy to query and index. Of course, horizontal tables as part of a report, you also need to learn how to do dynamic CROSS TABs.
The reason why I don't post my solution is because it's a little too complex to post on a forum thread without an in-depth explanation and the explanation of the techniques used makes for a 30 or so page article. It also requires the use of xp_CmdShell and that alone requires a 15 page article to convince people that it can be used safely and another 10 pages to explain how to set it up to be used safely.
My point though is, if this lowly, non-certified, non-degreed, mostly self-trained, doesn't-believe-in-portable code, SQL hack can do it, so can someone else. It just takes a bit of study, testing, and dedication. To paraphrase a famous quote:
"The difficult we do immediately... the impossible takes slightly longer" and that's backed up with the most important principle for DBAs, Developers, and any walk of life, "Essayons!" π
Step one would be for you to Google how to download and install the "ACE" drivers. I'll save you the time for this one though. The link you need is https://www.microsoft.com/en-us/download/details.aspx?id=39358 and it needs to be installed on each server that you wish to do such things with.
Now... once you've downloaded the 64 bit version, be advised that the Website you downloaded from has a set of instructions that will work but ONLY if there are no 32 bit applications on your machine. DO NOT UNLOAD YOUR APPLICATIONS! ITβS NOT NECESSARY!!!
Instead, find where you the downloaded the AccessDatabaseEngine_x64.exe file to, open a DOS window, change to that directory, type the following command, and press enter.AccessDatabaseEngine_x64.exe /quiet
The command above will install the driver/provider whether or not any 32 bit programs exist on the server.
Be sure to allow any changes when prompted. You should have administrator privs on the box to do this installation.
Youβll need to restart SSMS (not the server) for the changes to take effect.
"Every great journey begins with the first step." π
Once you've properly downloaded and installed the "ACE" drivers, then you need to configure them. You can do so with the following code once you're restarted SSMS after the installation.
USE [master]
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NestedQueries', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'NonTransactedUpdates', 1
GO
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'SqlServerLIKE', 1
GO
--Jeff Moden
Change is inevitable... Change for the better is not.
December 26, 2016 at 6:35 pm
With the understanding that some of the human dialog is missing and once you've taken the first step above and you don't mind it not being quite as detailed as an article will be, you can go to the following URL, search for "EXCEL HELL" on the page, and download my complete presentation on the subject, which includes the PPTX presentation, example spreadsheets to play with, and all the necessary code. Do keep in mind that its POP code (Proof-of-Principle) that may need a tweak or two.
Here's the link...
http://detroit.sqlpass.org/MeetingArchive/2015.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2016 at 5:51 am
Jeff,
I am shocked when I woke up and saw you have such a long reply to my question, I think that's why I love this forum, thank you very much from bottom of my heart
December 27, 2016 at 9:14 am
halifaxdal (12/27/2016)
Jeff,I am shocked when I woke up and saw you have such a long reply to my question, I think that's why I love this forum, thank you very much from bottom of my heart
Thanks for the feedback. I aim to please. I sometimes miss but I'm always aiming. π
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2016 at 10:10 am
Jeff Moden (12/26/2016)
halifaxdal (12/26/2016)
Thanks, still very tedious, actually adding one step here because data is eventually to be imported into SQL (or NoSQL), right?It'll be much less tedious to brow beat the clients into submission and have them follow a standard. More fun, as well. π
ROTFL! I would buy tickets to see that!
December 31, 2016 at 7:04 pm
kevaburg (12/31/2016)
Jeff Moden (12/26/2016)
halifaxdal (12/26/2016)
Thanks, still very tedious, actually adding one step here because data is eventually to be imported into SQL (or NoSQL), right?It'll be much less tedious to brow beat the clients into submission and have them follow a standard. More fun, as well. π
ROTFL! I would buy tickets to see that!
Me too. I've experienced lots of failure in that area due to "management directives" and the like. At least Jeff's Excel Hell solution works. If you do a lot of importing, spend the time to learn the techniques - they worked great for me.
December 31, 2016 at 10:28 pm
Thanks for the endorsement, Ed. And Happy New Year to everyone!
--Jeff Moden
Change is inevitable... Change for the better is not.
January 1, 2017 at 8:16 am
Excel is a terrible format for automated data exchange. Aside from all technical issues involved in installing and configuring and Excel driver there is the following:
When the user hides rows or columns, they are ignored by Excel provider as if they were deleted.
When the user strikes-out rows, they think that means the rows are deleted, but that's just formatting.
Instead, insist that import files be submitted in TAB delimited format using a documented column layout.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 1, 2017 at 8:21 am
Happy New Years too! π
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply