July 5, 2019 at 3:11 pm
So, a new employee is promoting the idea of migrating from T-SQL to Python with Pandas. The existing code base is a large set of calculations (hundreds) done using stock market data. Each new day's market data is stored in a SQL database; a scheduled SQL job then runs the calculations and stores those results back into a SQL database.
None of these calculations is complex to the point of needing specialised libraries. They're all written in T_SQL.
Does anyone have experience with a situation like this? Can you share it and any advice you might have?
Many thanks.
July 5, 2019 at 3:33 pm
What's the benefit to moving to Python? I might ask that. Python is really good at handling some complex string things, and with SQL 2017+, you could do the Python calculations on data sets without import/export. That might make sense, but I would need some future benefit to why this is better in Python. Maybe there are future enhancements that would be better in Python, but those need to be spelled out.
This sounds like someone wants to use new technology, more for the sake of newness or comfort than actual value. There might be some, but you'd have to prove to me that spending resources changing something that already works is worth the cost. Time costs money here, and this isn't trivial. Dev work, testing work, risk of issues, do enough people on call have Python skills to deal with it, or does said employee want to be on call for this?
In general, I am not usually interested in rewriting things that work without a good reason. If there is technical debt and we are already working here, some refactoring might make sense. If this has limitations we'll hit, then maybe. If it's "just better" or "Python is cool/easier to read/works better", I am usually disinclined to spend the time here.
I would ask for a business case written on why.
July 5, 2019 at 7:49 pm
+1 for everything Steve wrote.
If it works, why change it? There need to be some serious benefits in order to justify this.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
July 5, 2019 at 10:11 pm
So, a new employee is promoting the idea of migrating from T-SQL to Python with Pandas. The existing code base is a large set of calculations (hundreds) done using stock market data. Each new day's market data is stored in a SQL database; a scheduled SQL job then runs the calculations and stores those results back into a SQL database. None of these calculations is complex to the point of needing specialised libraries. They're all written in T_SQL. Does anyone have experience with a situation like this? Can you share it and any advice you might have? Many thanks.
I run into such things a lot. Ask them to prove the ROI with code and, yes, there must be ROI. Doing something different just because you can isn't worth a hoot especially when something is already in place and doing things correctly.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 11, 2019 at 12:22 pm
There are some potential advantages of Python:
HTH
July 12, 2019 at 2:54 pm
It would be a resounding no from me.
I have had to support some custom Python solutions, and keeping them working is an absolute nightmare. Keeping all the dependencies working together can be awesomely time consuming and when you go to look for solutions, you get arguments from the forums on whatever module you are asking about on why you should or shouldn't be using a particular module, for a solution you neither developed or wanted in your environment.
Have also had problems where supporting python in data would require adding control characters into your existing jobs that you later come back to and wonder what they were for.
If they are willing to pay for training time on every one of the modules they plan to use for the entire DBA team, and training scripting python, then go for it if you want to. Otherwise I can easily see situations getting held hostage by your developers whenever something goes wrong, and you may not be able to do anything to troubleshoot. I have had entire SQL servers taken down by the mistakes of Python developers. The most notable was a 200 something gigabyte, single transaction that filled up the remaining space in the log drive for an extremely important reporting database which took almost a day to rollback
If you need string manipulation - there probably is nothing better than python that is still widely used. (Perl is better but is starting to become a rare skill) I would question whether what is needed can't be done in SSIS which could be easily supported by anyone with just a little bit of time to tinker with.
If you need complex math, use .net or Java. Either of them are going to be self-contained solutions without a random array of outside dependencies you will have to support.
and +1 on what Jeff Moden and Steve Jones said.
July 22, 2019 at 6:25 pm
Thank you all for responding.
In our shop there's now a lot of hand waving about relative efficiencies, but so far no proof! We don't have a problem with throughput. Who wants to solve a non-existent problem?
I do hope an analyst will find a useful calculation that requires very complex machinations from a Python library that we can implement via SQL 2017 + Python. That would make a solid business case for that calculation (not the pre-existing ones). Sign me up for that one.
But re-doing years of work without a business case? Total waste of time and money.
July 22, 2019 at 6:30 pm
Thanks to [nova] for your answer that included "Opens the door to a lot of other powerful APIs, like Spark".
This idea is the one that could get me on board i.e. if we want to make very complex calculations that need special (Python) libraries.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply