Interesting solution needed

  • I am hoping my issue doesn't spark too much debate about how I am doing things. I know they aren't efficient, and I am working through many design changes as well as implementing SSAS and SSRS for the problem. My issue needs a quick fix to free up my time so I can make things work better.

    I have a very wide table (120+ columns) that is also very large (3M+ records).

    I am using this table for reporting; the end result is an Excel pivot with an ODBC connection to the table. The pivot does calculations to derive metrics. My daily task is to refresh the pivot as the new data comes in and post it to an internal SharePoint site. Sounds easy enough, sure. The real issue is that there are over 100 "flavors" of the same pivot, each in a different workbook.

    That's right: 100 workbooks with 100 pivots/ODBC connections, all querying the same table to provide different cuts of the data. Some workbooks are pivots, some are aggregations presented as a simple table. Either way, they all need to be refreshed and loaded daily. Total folder size is approx. 1.5GB. I am doing this alone.

    One solution was to provide access to everyone that would need it, but I would like to avoid that; I have about 1,500 users accessing the data, and I don't want to bog down the server more than it already is. Nothing is normalized (flat table).

    Again, as I said, I am working through those issues.

    So I need to find a way to update 100+ reports totaling about 1.5GB and load them to SharePoint.

    I have some VBA that I can use to update the workbooks. It opens Excel, refreshes the ODBC connection, then saves the file. The problem is that it cycles through the files one at a time; my workday is over by the time it's finished.

    I have tried making multiple scripts and running them in parallel, and while it appears to work (open and save), it doesn't refresh the data on every report.

    Here is the code I am using:

    strPath = "C:\reports"

    Set objExcel = CreateObject("Excel.Application")
    objExcel.Visible = False
    objExcel.DisplayAlerts = False

    Set objFso = CreateObject("Scripting.FileSystemObject")
    Set objFolder = objFso.GetFolder(strPath)

    For Each objFile In objFolder.Files
        ' Error trapping.
        On Error Resume Next
        If objFso.GetExtensionName(objFile.Path) = "xlsx" Then
            'Wscript.Echo objFile.Path
            Set objWorkbook = objExcel.Workbooks.Open(objFile.Path)
            ' Refresh all ODBC connections/pivots, then save and close.
            objWorkbook.RefreshAll
            objWorkbook.Save
            objWorkbook.Close
        End If
    Next

    ' Clean up.
    objExcel.Quit
    Wscript.Echo "refresh Complete"

    So while I like this approach, it isn't working properly. One thing I do when I update these reports manually is open multiple instances of Excel, drag a report into each instance, and cycle through refreshing them. That lets me update about 10 reports at a time, and I can get the whole job done in about 2 hours.

    Has anyone had this type of problem and come up with some awesomely weird solution?

  • I would suggest recreating the reports in SSRS. That would handle the "updating" when the reports are called. Sure, it will take some time to create the reports, but from there you can export as Excel or other formats...

    ______________________________________________________________________

    Personal Motto: Why push the envelope when you can just open it?

    If you follow the direction given HERE you'll likely increase the number and quality of responses you get to your question.

    Jason L. Selburg
  • I am working on that as part of the DB restructure. If I do this now, with the current structure, it will bog down the DB even more than it already is.

    Some of the files are used for analysis and deep dives on results, meaning I am bringing in all of the table's columns so people can research based on the metrics they see.

    Logically I would create SSRS reports for the smaller, more aggregated reports, but making the transition in steps may hurt the server if I have a ton of people querying a report as opposed to one person (me).

  • Would it be possible to automate the refreshes so that they happen off-hours?

    Replacing them with SSAS+SSRS is definitely the right path to head down. As you mentioned, you'll never get there if you spend all day every day just manually updating the current spreadsheets. The usual answer to that kind of thing would be to have a server (or at least a workstation) schedule the refreshes to run unattended during the early hours of the morning or some such time.

    Alternatively, are the individual Excel files actually different in some meaningful way, or just multiple copies of the same file? Line-of-business people may think they're different, but that may be a perception created by the fact that they will look different depending on which columns were configured in the pivot tables the last time the file was opened, without actual differences. If it's just the pivot selections, then you might be able to update once and then distribute the files, and let end users define their own pivots. That one might require some sales work to convince people it's okay to do it that way.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Would it be possible to automate the refreshes so that they happen off-hours?

    That would be ideal, but I don't receive the data until 8:00 AM CST, so I'm working on a time crunch as it stands; the business wants all the reports done by noon CST (of course, with the amount of reporting "needed"/wanted, that isn't happening).

    The usual answer to that kind of thing would be to have a server (or at least a workstation) schedule the refreshes to run unattended during the early hours of the morning or some such time.

    That is the intention. I have two workstations available to me: the one I work from and another I can use to schedule jobs or run the refreshes. The issue is that I have attempted to load-balance using the above code in a VB script set to run via an SSIS package, but while it attempts to open, refresh, and save, it doesn't always refresh every file. Is there a way to run the code, or something similar, in tandem? I.e., use multiple instances of Excel much like I do manually?

    Alternatively, are the individual Excel files actually different in some meaningful way, or just multiple copies of the same file? Line-of-business people may think they're different, but that may be a perception created by the fact that they will look different depending on which columns were configured in the pivot tables the last time the file was opened, without actual differences. If it's just the pivot selections, then you might be able to update once and then distribute the files, and let end users define their own pivots. That one might require some sales work to convince people it's okay to do it that way.

    Same table, different data sets: some reports have all the columns, the filters on data points differ between reports, some are aggregations, etc. I had to break the reporting up from one massive report into many smaller reports for different lines of business because of the size of the pivot cache and of the end result.

  • In a programming language you could use a thread pool and spawn multiple asynchronous commands, like the code you are using to refresh each Excel spreadsheet.

    You'd have to convert from VBA to .NET, but a single program could refresh them all for you. I don't know if you'd still have the "sheet not refreshed even though it ran" issue, though.

    Here's a question on multiple threads on Stack Overflow, for example:

    http://stackoverflow.com/questions/2356130/starting-multiple-threads-and-keeping-track-of-them-from-my-net-application
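
    Purely as an untested sketch of that pattern (RefreshWorkbook is a stand-in for a DoStuff-style routine like the one in the next post, and the cap of five concurrent Excel instances is an arbitrary choice), the parent routine could look something like this:

    Imports System.Collections.Generic
    Imports System.Threading

    Module RefreshRunner

        ' Allow at most 5 workbooks to refresh at the same time.
        Private ReadOnly throttle As New Semaphore(5, 5)

        Public Sub RefreshAllFiles(ByVal files As IEnumerable(Of String))
            Dim workers As New List(Of Thread)

            For Each f As String In files
                Dim path As String = f   ' copy the loop variable for the closure
                Dim t As New Thread(
                    Sub()
                        throttle.WaitOne()
                        Try
                            RefreshWorkbook(path)
                        Finally
                            throttle.Release()
                        End Try
                    End Sub)
                t.SetApartmentState(ApartmentState.STA) ' Office COM prefers STA threads
                t.Start()
                workers.Add(t)
            Next

            ' Wait for every workbook to finish before reporting completion.
            For Each t As Thread In workers
                t.Join()
            Next
        End Sub

        ' Placeholder: in practice this would be a DoStuff-style routine that
        ' opens the workbook, refreshes it, saves, and quits Excel.
        Private Sub RefreshWorkbook(ByVal path As String)
            Console.WriteLine("refreshing " & path)
        End Sub

    End Module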

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • If refreshing the data is a significant part of the time being consumed by this, I would suggest breaking up the data into the sets actually needed by each set of reports. Indexed views might be a good way to do that, so you only have to run one ETL process into your base-table, and then do any needed rebuilds on the indexes, and then load the Excel files from the views instead of from the base table.

    That would allow you to pre-build the data in a way that would probably speed up refreshing the Excel files. Won't save you any work on telling them to update, but might save considerable time on the actual refresh step.

    Building a multi-thread service that will refresh the Excel files without actually opening them on the desktop will almost certainly be faster. You can call the Excel application object without opening Excel. That is much faster, because it avoids GDI resources and loads a much smaller footprint into the active system space (CPU and RAM), than actually opening multiple Excel files does. At least, it did the last time I did Excel automation. Admittedly, that was about 5 years ago, and I'm a little rusty on how to get it done, but I do remember it being much, much faster than opening the files so they could actually be viewed on the screen.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Here's the VB.NET code to do what you were doing in VBA.

    If you grab a code sample from my previous link, you can see how to spawn, say, 10 threads at a time, each thread handling one Excel spreadsheet.

    That's how I would tackle Gus's suggestion.

    Imports Excel = Microsoft.Office.Interop.Excel

    Module ExcelRefresher   ' wrapped in a module so the Declare and the Subs compile

        ' Win32 call used to tell a stubborn Excel instance to quit.
        Declare Function PostMessage Lib "user32" Alias "PostMessageA" (ByVal hwnd As Int32, ByVal wMsg As Int32, ByVal wParam As Int32, ByVal lParam As Int32) As Int32

        Private Const WM_QUIT = &H12

        Private Sub DoStuff(ByVal FileToProcess As String)
            Dim xlApp As Excel.Application
            Dim xlWorkBook As Excel.Workbook
            Dim xlWorkSheet As Excel.Worksheet
            Dim range As Excel.Range

            xlApp = New Excel.Application
            xlWorkBook = xlApp.Workbooks.Open(FileToProcess)
            xlWorkSheet = xlWorkBook.Worksheets(1) ' not sure of the name! 1-based index, not zero
            range = xlWorkSheet.UsedRange

            xlWorkBook.RefreshAll()
            xlWorkBook.Save()
            xlWorkBook.Close()
            xlApp.Quit()

            ' Excel can be tough to quit from or kill. This is the code that tries
            ' nicely, then hits it with a hammer:
            PostMessage(xlApp.Hwnd, WM_QUIT, 0, 0)

            ' Release the COM objects in reverse order of creation.
            releaseObject(xlWorkSheet)
            releaseObject(xlWorkBook)
            releaseObject(xlApp)

            ' This hammer kills ALL Excel processes:
            'Dim proc As System.Diagnostics.Process
            'For Each proc In System.Diagnostics.Process.GetProcessesByName("EXCEL")
            '    proc.Kill()
            'Next
        End Sub

        Private Sub releaseObject(ByVal obj As Object)
            Try
                System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
                obj = Nothing
            Catch ex As Exception
                obj = Nothing
            Finally
                GC.Collect()
            End Try
        End Sub

    End Module
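
    One hedged guess at the "sheet not refreshed even though it ran" issue: RefreshAll only kicks off background ODBC queries, so Save can run before the data has actually landed. If that is the cause, a helper like the sketch below (assuming Excel 2007+ workbook connections and the same interop alias as above) could sit in the same module and be called from DoStuff right after Workbooks.Open. Another option, if the installed Excel exposes it, is calling xlApp.CalculateUntilAsyncQueriesDone() after RefreshAll.

        ' Sketch only: force every ODBC/OLE DB connection in the workbook to
        ' refresh synchronously, so RefreshAll does not return until the
        ' queries have finished and Save writes refreshed data.
        Private Sub ForceSynchronousRefresh(ByVal wb As Excel.Workbook)
            For Each conn As Excel.WorkbookConnection In wb.Connections
                Select Case conn.Type
                    Case Excel.XlConnectionType.xlConnectionTypeODBC
                        conn.ODBCConnection.BackgroundQuery = False
                    Case Excel.XlConnectionType.xlConnectionTypeOLEDB
                        conn.OLEDBConnection.BackgroundQuery = False
                End Select
            Next
        End Sub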

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I would suggest breaking up the data into the sets actually needed by each set of reports. Indexed views might be a good way to do that, so you only have to run one ETL process into your base-table, and then do any needed rebuilds on the indexes, and then load the Excel files from the views instead of from the base table.

    That would allow you to pre-build the data in a way that would probably speed up refreshing the Excel files. Won't save you any work on telling them to update, but might save considerable time on the actual refresh step.

    Yup, part of that is done; now to automate the refreshing. To accomplish what you mentioned, I have a procedure that performs a lot of the calculations and logic on the new data and inserts it into the final table. That keeps the reports from having to recalculate against the whole data set when you refresh them.

    Building a multi-thread service that will refresh the Excel files without actually opening them on the desktop will almost certainly be faster. You can call the Excel application object without opening Excel. That is much faster, because it avoids GDI resources and loads a much smaller footprint into the active system space (CPU and RAM), than actually opening multiple Excel files does. At least, it did the last time I did Excel automation. Admittedly, that was about 5 years ago, and I'm a little rusty on how to get it done, but I do remember it being much, much faster than opening the files so they could actually be viewed on the screen.

    This is what the script does, and yes, it is much faster. I agree on the multi-thread piece; thanks to Lowell for that mention as well. The problem is that I don't know how to do that. I posted something on a VB forum linking back here; hopefully I can get something programmed. I know diddly about VB and even less about .NET.

    If anyone has hints about building an app to do just this, or where to go to learn, let me know.

    Here is a link to the post I made on the VB forum:

    http://www.vbforums.com/showthread.php?t=684020

    Thanks to everyone for all the help already; this has been a great session for pointing me to some new solutions.

  • I really hate to sound needy here; your post seems like it will take care of what I have going on.

    And of course, here are my stupid questions.

    What do I do with the code? Is this something I would save as a certain file type in a text editor and run, or something I need a Visual Studio project to run?

    Also, how would I edit this to point at either the folder where the Excel files reside or an individual file itself?

    I gather I need to edit something here, but I am not familiar enough with the syntax.

    xlApp = New Excel.Application
    xlWorkBook = xlApp.Workbooks.Open(FileToProcess)
    xlWorkSheet = xlWorkBook.Worksheets(1) ' not sure of the name! 1-based index, not zero
    range = xlWorkSheet.UsedRange
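
    For what it's worth, a minimal untested sketch of the wiring: it assumes a VB.NET console project in Visual Studio with a reference to Microsoft.Office.Interop.Excel, and that this Sub Main sits in the same module as Lowell's DoStuff (or that DoStuff is made Public). The folder path is just a placeholder; SearchOption.AllDirectories makes it walk subfolders too.

        ' Drop this Sub into the same module as DoStuff (or make DoStuff Public)
        ' and set the project's startup object to Sub Main.
        Sub Main()
            Dim reportRoot As String = "C:\reports"   ' placeholder root folder

            ' Find every .xlsx under the root, including subfolders, and refresh
            ' each one (or hand each path off to a worker thread instead).
            For Each reportFile As String In System.IO.Directory.GetFiles(reportRoot, "*.xlsx", System.IO.SearchOption.AllDirectories)
                DoStuff(reportFile)
            Next

            Console.WriteLine("refresh complete")
        End Sub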

  • I was still looking at this, as it kind of caught my interest.

    I'm trying to get a working multi-threaded example going; each thread would call my DoStuff function with a different Excel spreadsheet, so my idea is a parent function that creates a worker thread for each Excel document.

    I had to create a worksheet that queried a database first, so I might have a prototype for you soon.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Do all of the reports actually get used, and how much? For the ones with less usage, why don't you set the external data to refresh on open? That way you can concentrate on refreshing the high-usage ones. I'd also consolidate as much as possible; can you add page filters on the pivots and have the end users filter what they want to see?
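
    If it helps, here is a rough, untested interop sketch of that "refresh on open" idea (ODBC connections only, using the same Excel interop alias as Lowell's code); it would be run once per low-usage workbook, with the file saved afterwards:

        ' Sketch only: mark each ODBC connection so the workbook refreshes
        ' itself whenever a user opens the file.
        Private Sub SetRefreshOnOpen(ByVal wb As Excel.Workbook)
            For Each conn As Excel.WorkbookConnection In wb.Connections
                If conn.Type = Excel.XlConnectionType.xlConnectionTypeODBC Then
                    conn.ODBCConnection.RefreshOnFileOpen = True
                End If
            Next
        End Sub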

    Ultimately you NEED to move to a new framework, as you've already hit capacity. You could start by building out the tables to support your SSAS cube, build the cube, and then hook the Excel workbooks into the cube. Then move to creating your SSRS reports from the cube to replace the workbooks.

    MWise

  • I would execute the .NET code that Lowell provided in an SSIS script task. In a prior position I used to do work like this via an MS Access db using VBA, spinning out 1,700+ Excel workbooks: basically creating individual workbooks with a cut of the same base data for each branch/area/group/corp in the org. We had someone kick it off in the morning on a standalone PC, and it would create and post them by 2 PM that day. An insanely stupid waste of resources, as only about 30% of the books were being used on a daily basis. We should have stopped at the area level and made the branch users use filters to pinpoint their data.

    MWise

  • The funny thing is that, yes, they are all being used, and they average 45MB each. I have tried to make them all as small as possible, and am immediately met with pushback.

    I was really blindsided by this project. The beginning request was 4 or 5 trending reports, nothing big, as there is already a dashboard interface running. The business just refuses to use it, and of course any changes to it take forever, so why not get a monkey like me who is making reports? The problem is that they have abused it over the last few months, and I'm at a breaking point.

    I need to move this stuff to the Teradata servers and have it backed by IT, but none of the orgs want to pay for it, although they all want the data.

    I have about 12k individual users on my SharePoint site.

    Today alone I have had 1k hits, and each report is getting usage. The daily email distro is about 1,500 people.

    We submitted a request to have a contractor signed on to back me up so I could spend more time redesigning the database to handle the load and move to SSRS and SSAS as a solution, but that was thrown out, as budgets are a no-go.

    Lowell, THANKS A TON! You may just be my life saver.

    P.S. The 100-ish reports are all spread out in subfolders; the old code basically looked into each folder (one script file per folder), and each script would cycle through the folder it was assigned to.

  • Mhlewis (7/12/2012)


    I would execute the .NET code that Lowell provided in an SSIS script task. In a prior position I used to do work like this via an MS Access db using VBA, spinning out 1,700+ Excel workbooks: basically creating individual workbooks with a cut of the same base data for each branch/area/group/corp in the org. We had someone kick it off in the morning on a standalone PC, and it would create and post them by 2 PM that day. An insanely stupid waste of resources, as only about 30% of the books were being used on a daily basis. We should have stopped at the area level and made the branch users use filters to pinpoint their data.

    MWise

    This is pretty much exactly what I am doing; I stopped at what we call a functional level, although I could definitely do more granular filters than that. I am driving this with MS SQL, so I have the option of SSRS and SSAS; like I said, I'm working through that, but a lot of time is spent on updating the reporting.

    And yes, it is my own fault for not seeing the potential needs, but what can I do? They already had a dashboard, and my objective was a simple one at first.
