October 28, 2020 at 12:07 pm
Hello. I just rejoined SQL Server Central. I'm here because I'm a solo reports developer for a small business. I use SSRS, Tableau and QlikSense to develop reports and dashboards. I also write complex stored procedures to produce the datasets that I need. These Forums are the only places where I can ask questions or find solutions to problems. Here are my questions about Data Warehouses. Are Data Warehouses an older technology? Is an ETL (SSIS) process only used with a Data Warehouse? Is SSIS difficult to learn?
The small business at which I work has a time entry system for contracted therapists who treat children with special needs. One of our clients has it's own time entry system. Therapists who work for this client must enter their time in the client's system AND our system. I help our billing specialist compare the data from the 2 systems to make sure they match. The billing entries must match in order for the client to accept our invoice. I build a detailed MS excel spreadsheet for the billing specialist to compare the 2 sets of data. I do this for her since I am a more skilled Excel user. One of the problems in comparing the data from the 2 systems is matching up the student names and the therapist names. I know there has to be a better way to do this; sometimes the spreadsheet as to be built many times in the month. Would a data warehouse solution help with this situation? We have at least 1 other client with a similar situation. Can anyone offer any suggestions?
I would also like to improve my skills as a Reports Developer (I think my type of position is referred to as Business Intelligence Developer) on job sites.
Thanks in advance for reading my post.
Roxy
October 28, 2020 at 1:38 pm
Hello, and welcome to the party!
There's a lot to unpack there, but I'll try to help a little. These are the questions I spotted.
At this point in time, yeah, the concept is more than 30 years old. However, that doesn't mean you can't use a data warehouse to solve certain kinds of problems. In fact, it means that the data warehouse solves certain kinds of problems, directly around data reporting & analysis, extremely well. It's possible that a warehouse is needed in your case, but let's talk about that a little more farther down.
2. Is ETL, specifically SSIS, only used with data warehouses?
Nope. I'm not an ETL expert, but I've worked with SSIS and other processes over the years in support of those who are. No, ETL is about data movement. Yes, that might be data into a data warehouse, but the key is data movement. I've also seen SSIS & ETL processes used for moving data out of warehouses and into OLTP systems and data marts. It's just about the movement, not the source or the destination.
3. Is SSIS difficult to learn?
Again, not an SSIS expert. However, the basics, yeah, they're pretty easy to pick up. Then it gets hard. At least in my opinion. Others may disagree.
4. Can a data warehouse help with disparate data issues?
And here things get interesting. So, spreadsheets are good for consuming data, data analysis, stuff like that. The problem comes when spreadsheets are used for data generation. They're just not great. If we're looking at a situation where people are literally inputting information, names you use as an example, into a spreadsheet, it can be very problematic. A better interface for data entry can be built. It can use lookups against existing data to ensure that the names being input are names that are in the system, or, conversely, that the name isn't in the system and needs to be added. This part of your question is a HUGE topic and could involve a lot of discussion.
Personally, I'd say, no, a data warehouse is probably not the right solution here (although, I'm more than willing to listen to why I'm wrong on this one). It seems more like you need a front-end database with well designed input screens to ensure that you have referential integrity on your data.
I hope some of this helps. Welcome again to SSC.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 28, 2020 at 1:42 pm
Oh, and if you have specific questions for follow up, don't hesitate to open another thread if necessary. More people will see the question that way. However, as much as possible, we try to keep discussion on one topic on one thread. However, it's Liberty Hall, so have fun.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 28, 2020 at 2:15 pm
Therapists who work for this client must enter their time in the client's system AND our system. I help our billing specialist compare the data from the 2 systems to make sure they match.
Herein lies a serious problem with data integrity, as you're finding out.
I guess my question would be... if you have enough access to the client data to build a comparison between the entries made in both systems, why not make it easier on everyone including the folks having to make double entries?
In other words, stop requiring the people making time entries to make identical time entries on the two systems. Have them make their entries on the required client system and import/distribute the data to your system. That would eliminate a whole bunch of work for everyone involved and make the process much less prone to error.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 28, 2020 at 3:53 pm
Grant, Thank You for your thoughtful reply. This will help me formulate better questions on my next post. My goal is to find better solutions and learn new techniques at the same time.
to Jeff Moden: Thank you. I've tried to contact someone on the IT staff for the other "billing system" and they didn't seem to want to help me. In the beginning just wanted them to query the data and send it to me in a spreadsheet, but they said No.
Your idea makes complete sense and I think it is do-able, but it is not in my power to get the cooperation.
What I do now, is download one of their reports to a .csv file and save only the fields I need. I then have to do subtotaling etc. - it is time consuming.
Since I want so much to improve my skills (and my resume), I'm trying to think of ways to do this that would force me to learn something new; that is why I brought up the data warehouse.
October 28, 2020 at 7:12 pm
You're approaching it all in the perfect manner. 1) find solutions to the problems at hand 2) use those solutions as a way to improve your skills.
Keep going. Don't hesitate to reach out here. We'll try to help.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
November 3, 2020 at 1:20 pm
A data warehouse is used to enforce business rules. It also normalizes the data.
Like Jeff points out, entry into one system, then populating the other, insures it is both done and consistent.
If you have access directly to the data sources, and then are manually manipulating data in Excel, you may want to look at automating. At a bare minimum, the first thing I see is an audit query daily to ensure you have entries in both systems. VBA or PowerBI and M might be worth a look.
If names are an issue, any way to control names are chosen from the same lists would be helpful. What appears to be the challenge is who owns each system. Both have needs of their own, but both also benefit from systems being in synch. Your leverage to get change lies in selling the benefits.
November 4, 2020 at 6:41 pm
Ah, yes. You don’t have any control over the front-end data collection or the third-party data feeds you’re getting. You must work with whatever is sent to you.
Rather than to build a data warehouse, it sounds like you’re looking to create an ETL process and data manipulation that consolidates and compares the data in a reporting database.
Here are some ideas for handling this:
1. Create a database for receiving, manipulating, consolidating and reporting the data from the source systems.
2. Build an SSIS process to import the data from each data source to its associated staging table, but don’t do extensive transformations or data manipulation in SSIS.
3. Use T-SQL for data manipulation after the data is staged in the database.
4. Design for your desired output.
5. Decide if you need to archive.
November 4, 2020 at 8:38 pm
Each system likely has it’s own unique key for a Therapist and Student. Your system likely has a Client key.
If a Therapist is entering a record into your system, is the record tagged with their ID? Or is someone else entering the record at times, and can only enter transactions for this client?
Surrogate keys are a way we normalized pulling together data into a corporate database.
It sounds also like there is a bit more than Therapist and Student being matched up.
And I imagine it is not only matching those, but identifying a ‘missing’ entry.
November 5, 2020 at 12:46 pm
Thank you ALL for these responses! This is great. I'm going to carefully read through all of this and then return with more questions.
RL
November 5, 2020 at 2:48 pm
The client time system - if they had a trigger to write new time records to a file, that you had access to, would simplify a lot.
We used triggers to capture new or changed records to build our data warehouse nightly. Triggers also isolated us from changes to the ERP system during upgrades. This was on an AS400 and was similar to the CDC functionality of SQL server, but much more targeted.
We tended to keep things simple, but gave a lot of thought to changes. Data needs change over time, and the more data they can get easily, the more they tend to want.
One of the main differences between data warehousing and applications is applications are concerned with transactions and performance, while data warehouses are designed to normalize and structure data for reporting purposes. Good design is key to scalable performance.
Also the memory needs show quite a different pattern. Transactional tends to be steady, while data warehousing peaks very high during the build, and you need to be very aware of this peak need. I remember the network admin trying to average our usage over 24 hours. I told him to try to log into our server during our nightly process, then get back to me. The next day, I said should there be an issue where we got a late start - either from something on the ERP side or ours, our window was gone and everyone from the shop floor to the CEO would have to wait 2 or 3 hours for or build to finish. He then asked me if I wanted more server than I was requesting.
If you have Enterprise Edition of SQL, SSRS allows you to create parameterized reports to email out on a schedule. We used this to email custom reports to over 100 sales offices in about 1 minute. Only issue I had with this was when we changed to a remote data center. Reports failed randomly, and it took me several days to prove to the Outlook Admin we were being throttled. No, the volume we produced was not spam, and the load was indeed intense.
You seem bright, inquisitive, and willing to learn. Great qualities to have in this field, it will get you far.
November 6, 2020 at 7:15 pm
Grant, Thank You for your thoughtful reply. This will help me formulate better questions on my next post. My goal is to find better solutions and learn new techniques at the same time.
to Jeff Moden: Thank you. I've tried to contact someone on the IT staff for the other "billing system" and they didn't seem to want to help me. In the beginning just wanted them to query the data and send it to me in a spreadsheet, but they said No.
Your idea makes complete sense and I think it is do-able, but it is not in my power to get the cooperation.
What I do now, is download one of their reports to a .csv file and save only the fields I need. I then have to do subtotaling etc. - it is time consuming.
Since I want so much to improve my skills (and my resume), I'm trying to think of ways to do this that would force me to learn something new; that is why I brought up the data warehouse.
You don't need a "data warehouse" for this. Just a couple of tables to do a comparison.
You don't need SSIS for this. It can all be done using T-SQL.
You may not even need to manually download the file. With a little T-SQL prestidigitation, we can have SQL Server do the download for you.
And, you certainly don't need to do all that "sub-totaling" manually. It can all be done in T-SQL using BULK INSERT to load the data into a staging table and then have T-SQL do all of the necessary "sub-totaling" and comparisons with the other source, which would also go through a similar process.
Of course, some of that is dependent on the .CSV file. We can do some serious magic to handle just about anything you need to but it's a whole lot easier if it's consistent in format each time it's downloaded and the rows are consistent in have the same number of "fields".
Interested?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply