June 7, 2022 at 12:58 am
Hi fellow cloud people
We are looking at moving from the land (on-premises) to the cloud.
We have hundreds of files and SQL tables to load into Data Lake Gen2, then process through the staging and presentation zones (I know, the names), and then on to the SQL data warehouse.
We are looking for the best way to move and process these files using Synapse so that we don't have to write code for each individual file going into the raw zone.
Then I guess we would have to start coding for each file/table as we progress through the zones and into SQL.
Does Synapse have built-in functionality for this, or do we need a tool? If so, can you recommend or suggest one?
We are currently an SSIS shop
Thanks in advance and keep your head out of the cloud as it may crash 🙂
Kevin
June 8, 2022 at 8:44 am
I'll have a go at answering but I'm sure others will have more experience of this than me!
Synapse is integrated with Data Factory, which is kind of like SSIS in that it can transfer data from source to destination and transform it if necessary. You can use it to iterate through files to load them into the database, and I think it can generate the destination schema as part of the process.
For the SQL tables, why not just load them straight into the database rather than going via the data lake?
If you put your files (guessing they are CSV or something) into a data lake, you can also use external tables and PolyBase to read them. If the format of the files is consistent and the layout of the data lake is thought about (perhaps have folders based on week/month, for example), then you can read them all in one go or in batches.
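For what it's worth, here's a rough sketch of that "read a whole folder in one go" idea, done from a Synapse Spark notebook in PySpark rather than through PolyBase (same wildcard/folder-layout principle though). The storage account, container and folder names are just made-up placeholders:

```python
# Read every CSV for one month in a single pass using a wildcard path.
# Assumes the raw zone is laid out as raw/<dataset>/<year>/<month>/*.csv
# and that all files in the folder share the same column layout.
raw_df = (
    spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("abfss://raw@yourstorageaccount.dfs.core.windows.net/sales/2022/06/*.csv")
)

print(raw_df.count())  # quick sanity check that the wildcard picked up all the files
```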
If the data needs cleansing or transforming, or isn't suitable for PolyBase in some way, then you can use Python in Synapse to code the transformations (or use Data Factory).
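Continuing from the raw_df read above, here's a minimal example of that "use Python in Synapse for the transformations" part, with made-up column names, doing a light cleanse before landing the result in the staging zone as Parquet:

```python
from pyspark.sql import functions as F

# raw_df is the DataFrame read in the previous snippet.
# Light cleanse: trim a text column, standardise a date, drop obvious duplicates.
staged_df = (
    raw_df
    .withColumn("customer_name", F.trim(F.col("customer_name")))
    .withColumn("order_date", F.to_date(F.col("order_date"), "yyyy-MM-dd"))
    .dropDuplicates(["order_id"])
)

# Write to the staging zone as Parquet so later loads into the warehouse are cheap to read.
(staged_df.write
          .mode("overwrite")
          .parquet("abfss://staging@yourstorageaccount.dfs.core.windows.net/sales/"))
```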
June 9, 2022 at 12:53 am
Hi SSCrazy Eights
Thank you for your reply to my post!
I think you have kind of answered my question.
It's my fault, the way I wrote the original post.
We commissioned a consultant to build out our first environment, and they suggested using ADF: https://github.com/microsoft/azure-data-services-go-fast-codebase
to load the hundreds (more like 900) of files on a daily basis into the RAW zone. They said this tool would load these files without us having to do anything more, then process them into the next stage, for example STAGING and PRES, for the files that need to go into those zones, etc. My concern is:
if ADF can do this for me, then fine, we don't need the tool; otherwise, what other tools are out there to manage the processing and coding required to transform the data at the later stages?
Kevin