How to search in database for a specific data... Sometime you are given a screen shot of something and you don't know where it is in the data.

  • DECLARE 
    @value NVARCHAR(500) = 'Place Text Here',
    @loop INT = 0,
    @loopmax INT,
    @table NVARCHAR(500),
    @schemaName NVARCHAR(500),
    @column NVARCHAR(500),
    @rowcount INT,

    @database NVARCHAR(500) = quotename(db_name())

    IF OBJECT_ID('tempdb..#tmp_Table') IS NOT NULL

    DROP TABLE #tmp_Table

    CREATE TABLE #tmp_Table (
    pkey BIGINT identity(1, 1) PRIMARY KEY NOT NULL,
    TableName VARCHAR(500),
    schemaName VARCHAR(500),
    ColumnName VARCHAR(500)

    )

    INSERT INTO #tmp_Table (
    TableName,
    schemaName,
    ColumnName

    )

    SELECT t.NAME tableName,
     
    s.NAME schemaName,
    c.NAME ColumnName
    FROM [sys].[tables] t
    LEFT JOIN [sys].[columns] c
    ON t.object_id = c.object_id
    LEFT JOIN [sys].[schemas] s
    ON t.schema_id = s.schema_id

    SELECT @loopmax = max(pkey)

    FROM #tmp_Table
    WHILE @loop <= @loopmax

    BEGIN

    SET @loop = @loop + 1

    SELECT @table = quotename(TableName),
     @schemaName = quotename(schemaName),
     @column = quotename(ColumnName)

    FROM #tmp_Table

    WHERE @loop = pkey

    DECLARE @sql NVARCHAR(max)

    SET @sql = N'select @rowcountOUT = count(1) from ' + @database + '.' + @schemaName + '.' + @table +
          where patindex(''%' + @value + '%'', ' + @column + ') > 0'

    EXEC sp_executesql @sql,
     N'@rowcountOUT int OUTPUT',
     @rowcountOUT = @rowcount OUTPUT
    IF isnull(@rowcount, 0) > 0

    SELECT 
      db_name() as [Database],
      @schemaName as [Schema],
      @table as [Table],
      @column as [Column]
    END

  • This script you've posted is missing a description and comments, but what it appears to be doing is a SELECT COUNT(*) across *every* table and column in the database, even regardless of column name and data type. This brute force method is going to hit your server hard for an extended period of time. So if there are 100 tables in the database, each containing an average of 10 columns, then that's 1,000 full table scans!

    What I'd suggest first is using a tool like RedGate SQL Search (or retrofit your script above) to only query object meta-data for references to something like a column name. Like, if the screenshot is showing a list of customers and account balances, then attempt to locate all columns containing the keyword '%sale%' or '%amt%'.
    https://www.red-gate.com/products/sql-development/sql-search/

    Once you've narrowed down a handful of likely candidates, then query only those tables and columns.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell - Friday, August 24, 2018 8:26 AM

    This script you've posted is missing a description and comments, but what it appears to be doing is a SELECT COUNT(*) across *every* table and column in the database, even regardless of column name and data type. This brute force method is going to hit your server hard for an extended period of time. So if there are 100 tables in the database, each containing an average of 10 columns, then that's 1,000 full table scans!

    What I'd suggest first is using a tool like RedGate SQL Search (or retrofit your script above) to only query object meta-data for references to something like a column name. Like, if the screenshot is showing a list of customers and account balances, then attempt to locate all columns containing the keyword '%sale%' or '%amt%'.
    https://www.red-gate.com/products/sql-development/sql-search/

    Once you've narrowed down a handful of likely candidates, then query only those tables and columns.

    Sorry to say what I post is look at the data in a database not the columns data elements. Please look very carefully at the SQL before you assume what it is doing. What this does is not taxing on a server it is design to say the location of data like a descriptor in a data lake and you need to find where it is. SQL search does not do data in the databases.

  • That is some funky code, you've also recreated a cursor in an exceptionally awkward way.... and yes it is scanning every column in every table the way it's written as well as doing implicit conversions since you aren't filtering by type.
  • There are scripts here on the site to search for every table and column for data. If I kick this off, what I going to do is scan every row in every table. That is going to flush my buffer pool completely, unless the entire database is in memory.

    However, what I do here is I need some domain knowledge.  What is the screen shot of the data? What's in the database? Normally I can narrow this to searching a few tables rather than everything. I might also try to search for either dates, numerics, or strings first, so I'd limit the query to those column data types.

  • wolfsvein - Friday, August 24, 2018 8:44 AM

    Sorry to say what I post is look at the data in a database not the columns data elements. Please look very carefully at the SQL before you assume what it is doing. What this does is not taxing on a server it is design to say the location of data like a descriptor in a data lake and you need to find where it is. SQL search does not do data in the databases.

    I did run the script on a sample database, commenting out the sp_executesql command and instead printing the sql statements that would be executed. It is actually attempting to SELECT COUNT on every table and column. The script above is a good start, looking at sys.tables and sys.columns, but it should instead search for likely column names not scanning across the table for a specific value. It's just that I felt a need to warn users before anyone attempts to run this script as is.

    We shouldn't treat SQL Server like a data lake. In a relational database, we locate the tables and columns we need by exploring the meta-data catalogs. The problem is that when looking for an occurrence of the product description 'Brand X - 8 ct paper towels', it's going to waste time scanning the Customer.FirstName column. Fortunately, attempts to scan a datatype other than VARCHAR will simply throw a datatype conversion error before skipping to the next statement.

    Another way to reverse engineer the data source for an application is to run a SQL Profiler or Extended Event trace to see what SQL statements are being executed while the data forms in question are loading.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • If you look closely, it does do a count, but it qualifies the count in a way that results in scanning every row.

  • My first step would be to ask the people that created the screen where the data is.  From what I've been made to understand, it's not that difficult to couple the fields on the screen to the tables and columns by going through the front end code especially if it's well written code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply