August 31, 2021 at 7:46 pm
Hi
I have hundred million ID, NAME pairs stored in SQL Server as a table that is accessed by C# code, which has a responsibility to lookup NAMEs corresponding to million IDs at a time. C# code has these million IDs stored in an array, before invoking SQL module.
Columns of the SQL table:
ID VARCHAR (500) -- this is indexed on
NAME VARCHAR (500)
Could you please share your thoughts on what is the most performant method to do these million lookups?
The approach I am thinking of, is - through C# sqlDataClient feature, create a #temp table in SQL consisting of million IDs ( the "input") Then, JOIN the temp table with these million inputs to main table to create a result temp table. And, fetch this final result table holding NAMEs (output) corresponding to the IDs passed (input) out of SQL into C#, by appending NAMES into an array of results.
Is there a more performant approach? Alternatively, does SQL Server support Dictionary Lookup sort of built-in feature, to avoid JOINs?
thank you
August 31, 2021 at 8:11 pm
Good idea would be to leave "the lookup" to the SQL server, then pass the results to C# code. Something like:
SELECT ID, NAme FROM DirectoryTable
WHERE ID = '12345' -- you can supply any value that works for you
The problem is what i just typed works for somebody who sits in front of SSMS (SQL editor). The real thing would be to write a stored procedure, or a user define function with parameter(s). Then C# must provide code to pass given ID to the procedure/function, and to catch the response of the procedure/function. That way, most of the work is done by SQL Server, and C# code just communicates with SQL, receives results and displays them for the user to see.
C# is not my area of expertise, someone else will have to help you with that part. However, SQL part is "my area of expertise" and I can help with that part. However, you need to provide table design and some test data before that. You know, CREATE TABLE... then INSERT INTO...
Cheers
Zidar
Zidar's Theorem: The best code is no code at all...
August 31, 2021 at 8:16 pm
Do you really have 500-character IDs? Such a long ID seems unusual and non-ideal. At 500 bytes, only a few rows will fit in an 8K database page. If so, at least make sure these don't get treated as nvarchar -- type conversions and/or storage could further exacerbate the problem.
A bigint can hold 18 quintillion different values, and the number of possible 128-bit guids (32 characters) is almost incomprehensibly larger. A compact indexable ID is going to be your friend here.
Are these ID/Name pairs semi-static/slowly-changing? i.e., assuming this is a repeated process and not one-time-only, would you query the same sets of IDs each time (allowing for additions/deletions)? Or do they vary from run-to-run? If they are consistent, you might be able to add something like a BatchID to facilitate processing w/o having to push the ID's the application is looking for back to SQL and then SQL Server having to join to a temp table to locate those IDs.
August 31, 2021 at 8:35 pm
(1) Cluster the main table on ID: a nonclus index is worthless here.
(2) Cluster the lookup table on ID. The join of the two tables should now become a MERGE join, which will be most efficient here.
(3) Evaluate page-compressing the main table. That is, exec sys.sp_estimate_data_compression_savings ..., 'PAGE', and if it yields good results, then page compress the main table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 3, 2021 at 10:05 am
You don't need to create a temp table and join to it. You can have the query in a stored procedure that has a parameter which is a "user defined table type". The C# code can populate this parameter from the array of values you want to query. The stored procedure would do a join between this table type parameter and the SQL table that you use for lookups.
In SQL;
-- Create the user-defined table type.
create type [dbo].[udt_MyTableType] as table (
[ID] [varchar](500) not null
)
-- Create the stored procedure
create procedure [dbo].[MyProcedure] @InputTable dbo.udt_MyTableType readonly
as
begin
begin try
set nocount on;
select lt.[ID], lt.[Value]
from LookupTable lt
inner join @InputTable i on i.[ID] = lt.[ID]
return 0;
end try
begin catch
declare @ErrStr varchar(5000), @ErrMsg varchar(4000), @ErrSeverity int, @ErrNo int;
select @ErrMsg = error_message(), @ErrSeverity = error_severity(), @ErrNo = error_number();
set @ErrStr = error_procedure()
+ '. Error = ' + rtrim(ltrim(str(isnull(@ErrNo, 0), 10, 0)))
+ ': ' + isnull(@ErrMsg, 'Msg is null');
print @ErrStr;
raiserror(@ErrStr, @ErrSeverity, 1);
return -1;
end catch
end
In C#;
DataTable lookupTable = new DataTable();
lookupTable.Columns.Add("ID", typeof(string));
// Loop through your list of values doing the following
DataRow lookupRow = lookupTable.NewRow();
lookupRow["ID"] = currentID;
lookupTable.Rows.Add(lookupRow);
// End loop
var sqlConnection = new SqlConnection(<your connection details here>)
var sqlCmd = new SqlCommand();
sqlConnection.Open();
SqlTransaction tran = sqlConnection.BeginTransaction(IsolationLevel.ReadCommitted);
sqlCmd.Connection = sqlConnection;
sqlCmd.CommandType = CommandType.StoredProcedure;
sqlCmd.CommandText = "MyProcedure";
sqlCmd.Parameters.Add(new SqlParameter("@InputTable", lookupTable));
sqlCmd.Transaction = tran;
SqlDataAdapter adapter = new SqlDataAdapter();
adapter.SelectCommand = sqlCmd;
adapter.Fill(<locale information>);
tran.Commit();
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply