Dictionary Lookup

  • Hi

    I have hundred million ID, NAME pairs stored in SQL Server as a table that is accessed by C# code, which has a responsibility to lookup NAMEs corresponding to million IDs at a time.    C# code has these million IDs stored in an array, before invoking SQL module.

    Columns of the SQL table:

    ID VARCHAR (500)  -- this is indexed on

    NAME VARCHAR (500)

    Could you please share your thoughts on what is the most performant method to do these million lookups?

    The approach I am thinking of, is - through C# sqlDataClient feature, create a #temp table in SQL consisting of million IDs ( the "input")  Then, JOIN the temp table with these million inputs to main table to create a result temp table.  And, fetch this final result table holding NAMEs (output) corresponding to the IDs passed (input) out of SQL into C#,  by appending NAMES into an array of results.

    Is there a more performant approach?  Alternatively, does SQL Server support Dictionary Lookup sort of built-in feature, to avoid JOINs?

    thank you

  • Good idea would be to leave "the lookup" to the SQL server, then pass the results to C# code. Something like:

    SELECT ID, NAme FROM DirectoryTable 
    WHERE ID = '12345' -- you can supply any value that works for you

    The problem is what i just typed works for somebody who sits in front of SSMS (SQL editor). The real thing would be to write a stored procedure, or a user define function with parameter(s). Then C# must provide code to pass given ID to the procedure/function, and to catch the response of the procedure/function. That way, most of the work is done by SQL Server, and C# code just communicates with SQL, receives results and displays them for the user to see.

    C# is not my area of expertise, someone else will have to help you with that part. However, SQL part is "my area of expertise" and I can  help with that part. However, you need to provide table design and some test data before that. You know, CREATE TABLE... then INSERT INTO...

    Cheers

    Zidar

    Zidar's Theorem: The best code is no code at all...

  • Do you really have 500-character IDs?  Such a long ID seems unusual and non-ideal.  At 500 bytes, only a few rows will fit in an 8K database page. If so, at least make sure these don't get treated as nvarchar -- type conversions and/or storage could further exacerbate the problem.

    A bigint can hold 18 quintillion different values, and the number of possible 128-bit guids (32 characters) is almost incomprehensibly larger. A compact indexable ID is going to be your friend here.

    Are these ID/Name pairs semi-static/slowly-changing?  i.e., assuming this is a repeated process and not one-time-only, would you query the same sets of IDs each time (allowing for additions/deletions)? Or do they vary from run-to-run? If they are consistent, you might be able to add something like a BatchID to facilitate processing w/o having to push the ID's the application is looking for back to SQL and then SQL Server having to join to a temp table to locate those IDs.

     

     

     

     

     

  • (1) Cluster the main table on ID: a nonclus index is worthless here.

    (2) Cluster the lookup table on ID.  The join of the two tables should now become a MERGE join, which will be most efficient here.

    (3) Evaluate page-compressing the main table.  That is, exec sys.sp_estimate_data_compression_savings ..., 'PAGE', and if it yields good results, then page compress the main table.

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • You don't need to create a temp table and join to it. You can have the query in a stored procedure that has a parameter which is a "user defined table type". The C# code can populate this parameter from the array of values you want to query. The stored procedure would do a join between this table type parameter and the SQL table that you use for lookups.

    In SQL;

    -- Create the user-defined table type.
    create type [dbo].[udt_MyTableType] as table (
    [ID] [varchar](500) not null
    )

    -- Create the stored procedure
    create procedure [dbo].[MyProcedure] @InputTable dbo.udt_MyTableType readonly
    as
    begin
    begin try
    set nocount on;

    select lt.[ID], lt.[Value]
    from LookupTable lt
    inner join @InputTable i on i.[ID] = lt.[ID]

    return 0;
    end try

    begin catch
    declare @ErrStr varchar(5000), @ErrMsg varchar(4000), @ErrSeverity int, @ErrNo int;
    select @ErrMsg = error_message(), @ErrSeverity = error_severity(), @ErrNo = error_number();
    set @ErrStr = error_procedure()
    + '. Error = ' + rtrim(ltrim(str(isnull(@ErrNo, 0), 10, 0)))
    + ': ' + isnull(@ErrMsg, 'Msg is null');
    print @ErrStr;
    raiserror(@ErrStr, @ErrSeverity, 1);
    return -1;
    end catch
    end

    In C#;

    DataTable lookupTable = new DataTable();
    lookupTable.Columns.Add("ID", typeof(string));

    // Loop through your list of values doing the following

    DataRow lookupRow = lookupTable.NewRow();
    lookupRow["ID"] = currentID;
    lookupTable.Rows.Add(lookupRow);

    // End loop

    var sqlConnection = new SqlConnection(<your connection details here>)
    var sqlCmd = new SqlCommand();
    sqlConnection.Open();
    SqlTransaction tran = sqlConnection.BeginTransaction(IsolationLevel.ReadCommitted);
    sqlCmd.Connection = sqlConnection;
    sqlCmd.CommandType = CommandType.StoredProcedure;
    sqlCmd.CommandText = "MyProcedure";
    sqlCmd.Parameters.Add(new SqlParameter("@InputTable", lookupTable));
    sqlCmd.Transaction = tran;
    SqlDataAdapter adapter = new SqlDataAdapter();
    adapter.SelectCommand = sqlCmd;
    adapter.Fill(<locale information>);
    tran.Commit();

    • This reply was modified 3 years, 3 months ago by  Chris Wooding.
    • This reply was modified 3 years, 3 months ago by  Chris Wooding.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply