options for querying external data sources from SQL Server

  • is there a good introductory article somewhere that covers the various options for querying data outside of SQL Server, like OPENROWSET(), BULK IMPORT, etc and when to use each one? (Ideally a tutorial, but I'll take what I can get!).

    I did search, but it's harder to find stuff if you don't know exactly what you're looking for or the keywords to search for. =)

    (Kinda wish MSFT would do something like the PowerBI import/link kind of UI for SQL Server... but that's wishful thinking on my part, I'm sure!)

    This is just the OPENROWSET() example: OPENROWSET (Transact-SQL) - SQL Server | Microsoft Learn

    Here's a pretty good article: Different Options for Importing Data into SQL Server (mssqltips.com)

    Is there a document that gives the overview of querying external sources and explains when it's appropriate to use each one?

    • This topic was modified 1 year, 10 months ago by  pietlinden.
    • This topic was modified 1 year, 10 months ago by  pietlinden.
    • This topic was modified 1 year, 10 months ago by  pietlinden.
  • pietlinden wrote:

    (Kinda wish MSFT would do something like the PowerBI import/link kind of UI for SQL Server... but that's wishful thinking on my part, I'm sure!)

    Most people would tell you they did such a thing a very long time ago and its initials are SSIS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • oh, right. That's the part I was going to try to avoid: SSIS.

    Also, I wanted to maybe summarize them first, and then query... basically like if I were querying a CTE (yeah, I know, kind of a dirty word).  Something like this:

    Also, I wanted to maybe summarize them first, and then query... basically like if I were querying a CTE (yeah, I know, kind of a dirty word).  Something like this:

    WITH summarized_data (AggColumn1, AggColumn2, AvgValue)
    AS 
    (SELECT AggColumn1, Aggcolumn2, AVG(AggValue)
    FROM TableFromOtherDB
    WHERE...
    GROUP BY Aggcolumn1, AggColumn2)

    Gross simplification, but you get the idea I h9pe. (Maybe I'm just tripping and I'd need to create something like a materialized view in the source and then query that, but I'm not sure that's an option).

    I guess I could do it in SSIS, on account of it stores credentials there (well, if you set it up right), and then just dump the query results to a local table, and index it.

    • This reply was modified 1 year, 10 months ago by  pietlinden.
    • This reply was modified 1 year, 10 months ago by  pietlinden.
  • pietlinden wrote:

    oh, right. That's the part I was going to try to avoid: SSIS.

    Totally understood and agreed but, YOU asked! 😀 😀 😀

    First, and to set the record straight, the word "CTE" doesn't come close to being a "dirty" word to me.  I rely heavily on CTEs in a whole lot of what I do.  They're one of the keys to successful "top down programming" in SQL, especially when you know how and when it inject a blocking operator.  Even a recursive CTE can have its place, although too many people resort to those because they simply can't think of a set-based solution.

    I know of no really decent summary of the nature that you ask.  I'll also state that any summary that doesn't include the ACE Drivers is a seriously inadequate summary.  I'll further state that the like reason why no decent summary and the intro's are short is because the subject is huge (you already know that).  It's kind of like asking for everything that you could do with a SELECT (what a helluva interview question THAT would make!).

    My recommendation would be like the advice Granny gave me a long time ago... "You can only peel one potato at a time".  I'd concentrate only on  the more important external sources (multiple forms of text based files, spreadsheets, perhaps XML and JSON) and let Mother Nature dictate what to study next including some pretty handy drivers (providers) like the "ACE" drivers and the IBM Power Systems drivers.

    Not the answer that you were looking for, though.  I've just not seen such a thing before and most of the attempts that Ive seen are pretty bad.

     

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My exposure to data virtualization limited but it seems to be a buzzword. So something for you to consider. The tool I used had drivers for various sources and allowed you to write sql-like queries against multiple sources.

    Decent writeup.

    https://www.altexsoft.com/blog/data-virtualization/

    • This reply was modified 1 year, 10 months ago by  Chrissy321.
  • If I were doing a lot of external data, I'd absolutely be looking at Polybase. It's a way to query the data in place that's more efficient than some of the old mechanisms. I'm not saying it's a panacea for your situation, but it is something I'd test out & understand as a potential solution, especially since Microsoft is giving it simply tons of love these days.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Oh, super cool!!! I guess I'll look around for tutorials on it. You wouldn't happen to know any, would you? (Check learn.microsoft etc?)

  • Be careful of "old" tutorials on this one.  They HAVE changed it, especially starting in 2019 (according to the MS documentation) and some of the "old" methods may no longer be functional as previously written.  The capabilities are still there but the methods to use those capabilities may have changed or have been deprecated and maybe changed according to 2022 documentation.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • pietlinden wrote:

    Oh, super cool!!! I guess I'll look around for tutorials on it. You wouldn't happen to know any, would you? (Check learn.microsoft etc?)

    I don't have one I can recommend, no. Look for anything that Bob Ward did. I'm sure he has some Youtube videos out there. Not exactly classes, but they should get you started.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Oh okay. Thanks, Grant. At least it's a place to start!

    And no DAX required! =)

  • This was removed by the editor as SPAM

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply