December 5, 2016 at 8:09 am
I have a situation where I need to import data from a Hadoop file that is in Parquet format. Preferably, I'd like to do this from within a stored procedure that can truncate the staging table, import the data, and then process the data as needed. I know the PolyBase features are in 2016, but our server will be running 2014. Any suggestions?
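To make it concrete, here's the shape of what I'm after (object and file names below are just placeholders; step 2 is the part I can't solve on 2014):

CREATE PROCEDURE dbo.LoadParquetData
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. Clear the staging table.
    TRUNCATE TABLE dbo.ParquetStaging;

    -- 2. Import the data. This is the open question: there's no native
    --    Parquet reader in SQL Server 2014, so something upstream would
    --    have to convert the file into a format BULK INSERT understands.
    -- BULK INSERT dbo.ParquetStaging FROM '\\hadoop\export\data.txt' WITH (...);

    -- 3. Process the staged data as needed.
    EXEC dbo.ProcessStagedData;
END;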
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
December 5, 2016 at 8:51 am
Hi Aaron,
I have no idea what the "parquet" format is. I also don't know what the source is. Would it be from a text file?
If it's from a text file and you can explain what the "parquet" format is and maybe even attach such a file (no proprietary or PII info, please), I could take a whack at it.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2016 at 8:58 am
Heh... I just looked up "Parquet Format" online. You would probably be better off writing a magic decoder ring for this in Java to expand the data into a CSV file and import that with SQL.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2016 at 9:17 am
That's the thing I like about you, Jeff. You're always full of optimism and hope! 😛
However, I've not given up hope yet. Who knows? Support for XML is here and JSON is nearly so, so maybe Parquet will follow. I've also considered a Linked Server approach, although I generally don't favor those due to performance issues.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
December 5, 2016 at 9:23 am
Just trying to use the right tool for the job. As with most things, shredding the Parquet format in SQL Server could be done but, as with even the built-in features for XML and JSON, SQL Server probably isn't the right place to do it.
Can't Hadoop do the data expansion into a nice, neat, high-performance TAB-delimited file? I'd be disappointed if it couldn't.
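If it can, the import side becomes trivial. Something like this would do it (file path and table name made up, of course):

BULK INSERT dbo.ParquetStaging
FROM '\\hadoopserver\export\data.tsv'
WITH (
    FIELDTERMINATOR = '\t',   -- TAB between columns
    ROWTERMINATOR   = '\n',
    TABLOCK                   -- allows minimal logging if the recovery model permits
);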
--Jeff Moden
Change is inevitable... Change for the better is not.
December 5, 2016 at 9:31 am
The team did identify a method of pushing the data to SQL Server, but this approach requires coordinating the stored procedure execution on SQL Server to occur AFTER the data push. I was hoping to avoid such a scheduling/dependency nightmare by having it all coordinated within the SQL Server stored procedure, but I may have to take a different approach. Of course, if this were 2016, we'd have PolyBase at our disposal...
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
December 5, 2016 at 3:45 pm
Aaron N. Cutshall (12/5/2016)
The team did identify a method of pushing the data to SQL Server, but this approach requires coordinating the stored procedure execution on SQL Server to occur AFTER the data push. I was hoping to avoid such a scheduling/dependency nightmare by having it all coordinated within the SQL Server stored procedure, but I may have to take a different approach. Of course, if this were 2016, we'd have PolyBase at our disposal...
Why wouldn't a trigger do it for you? For that matter, why couldn't Hadoop call a batch file that fires off SQLCMD?
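For example (control table and proc names are made up), have the push finish by writing one row to a signal table and let a trigger take it from there:

CREATE TRIGGER dbo.trg_LoadControl_Process
ON dbo.LoadControl        -- hypothetical "load complete" signal table
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Kick off the downstream processing as soon as the push signals "done".
    EXEC dbo.ProcessStagedData;
END;

Just remember the proc then runs inside the inserting session's transaction, so if the processing is long, have the trigger start an Agent job (msdb.dbo.sp_start_job) instead. Or skip the trigger entirely and have Hadoop's batch file run something like sqlcmd -S YourServer -d YourDB -Q "EXEC dbo.ProcessStagedData;" as its last step (server and database names made up, of course).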
--Jeff Moden
Change is inevitable... Change for the better is not.