SQLite is a lightweight, but ACID-compliant, RDBMS implemented as a C library. It is used in a number of very significant software packages, including Firefox, Chrome, and Skype, and it has proven itself effective for internal data storage. Because SQLite is implemented as a library, it becomes fully integrated into the application using it, rather than having the database management system run separately, as is common with other RDBMSs like MS SQL Server.
Clearly, SQLite is not a good choice for the kind of multi-connection, multi-user, heavyweight databases that MS SQL Server is called upon to handle. SQLite is often a fine option for local applications which need a relational data store, and it avoids the need to install or manage a major piece of software like SQL Server. On the other hand, SQL Server Express is redistributable, relatively lightweight, and occasionally used as a relational data store for single-user applications.
There are certain use cases where either SQLite or SQL Server, especially SQL Server Express, might make sense. So, I thought it would be useful to have an informal look at the relative performance of the two systems, especially since I could not readily find a broad benchmark comparison of the two of them.
This is meant to be a general performance comparison of some common commands, providing a rough idea of how the two systems compare rather than an in-depth benchmark. All tests are run through Python 2.7, simply because that is the language I am most familiar with outside of T-SQL itself. To keep the comparison fair, I am using a local instance of SQL Server rather than connecting to an outside machine.
So, first I timed creating a table and inserting a few thousand rows:
import time     #used to time the execution
                #chose time over timeit since timeit is built for short code segments
import pyodbc   #connects to SQL Server
import sqlite3  #the sqlite code, not just a connection to the server

numReps = 5000

#make the connections
sqlConnStr = ('DRIVER={SQL Server};Server=(local);Database=Test;' +
              'Trusted_Connection=YES')
sqliteFile = 'testdb.lit'
msCon = pyodbc.connect(sqlConnStr)
liteCon = sqlite3.connect(sqliteFile)

start_time = time.time()
curs = msCon.cursor()
sql = """
    if OBJECT_ID('dbo.testTable') is not null
        drop table dbo.testTable
    create table dbo.testTable (col1 int, col2 varchar(50))"""
curs.execute(sql)
for x in range(numReps):
    sql = """
        insert into dbo.testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
    msCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQL Server (commit each)".format(
    time.time() - start_time, numReps)

start_time = time.time()
curs = liteCon.cursor()
#functionally equivalent, but different syntax from SQL Server
#Specifying column types in SQLite is optional
sql = "drop table if exists testTable;"
curs.execute(sql)
sql = 'create table testTable (col1 int, col2 text)'
curs.execute(sql)
liteCon.commit()
#effectively the same code as for SQL Server
for x in range(numReps):
    sql = """
        insert into testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
    liteCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQLite (commit each)".format(
    time.time() - start_time, numReps)
We get results like:
- 2.012 secs for 5000 inserts SQL Server (commit each)
- 46.909 secs for 5000 inserts SQLite (commit each)
Now, I am keeping this informal rather than attempting precise benchmarking, but I have repeated all of these tests multiple times across three different machines to ensure that the results are representative rather than a quirk of one machine or of a background process that happened to run at the same time. The precise measurements varied substantially as I moved between systems[1], but the general trends remained the same. I also tried a variety of repetition counts; that also substantially affected the actual run times, but the general trends remained consistent.
Here, SQL Server is obviously faster, and by a wide margin. But the default SQLite commit process is inherently expensive, as discussed in their FAQ (http://www.sqlite.org/faq.html#q19). So, I decided to try putting all 5000 inserts into a single transaction:
#Now commit only after all inserts
start_time = time.time()
curs = msCon.cursor()
sql = """
    if OBJECT_ID('dbo.testTable') is not null
        drop table dbo.testTable
    create table dbo.testTable (col1 int, col2 varchar(50))"""
curs.execute(sql)
for x in range(numReps):
    sql = """
        insert into dbo.testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
msCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQL Server (commit once)".format(
    time.time() - start_time, numReps)

start_time = time.time()
curs = liteCon.cursor()
#functionally equivalent, but different syntax from SQL Server
#Specifying column types in SQLite is optional
sql = "drop table if exists testTable;"
curs.execute(sql)
sql = 'create table testTable (col1 int, col2 text)'
curs.execute(sql)
liteCon.commit()
#effectively the same code as for SQL Server
for x in range(numReps):
    sql = """
        insert into testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
liteCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQLite (commit once)".format(
    time.time() - start_time, numReps)
The results:
- 0.718 secs for 5000 inserts SQL Server (commit once)
- 0.047 secs for 5000 inserts SQLite (commit once)
That makes a dramatic difference. First, even SQL Server got noticeably faster by holding the transaction open instead of committing after every insert.[2] But the change in SQLite's performance is remarkable: with large transactions and infrequent commits, it goes from being substantially slower than SQL Server to slightly faster.
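As an aside, the batching does not have to be written as an explicit Python loop. Both pyodbc and sqlite3 provide executemany(), which takes the whole parameter list in one call and still lets you commit once at the end. Here is a minimal sketch, reusing the msCon and liteCon connections from above and assuming the tables have just been recreated empty; this was not part of the timed runs.

#Sketch only: batch the inserts with executemany() instead of a Python loop.
#Assumes dbo.testTable / testTable exist and are empty, as created above.
params = [(x, str(x)) for x in range(numReps)]

curs = msCon.cursor()
curs.executemany("insert into dbo.testTable (col1, col2) values (?, ?)", params)
msCon.commit()   #still one commit for the whole batch
curs.close()

curs = liteCon.cursor()
curs.executemany("insert into testTable (col1, col2) values (?, ?)", params)
liteCon.commit()
curs.close()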
Then I turned to look at the read performance:
#Now test the reads, SQL Server first.
start_time = time.time()
curs = msCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQL Server".format(
    time.time() - start_time, numReps)

#Now SQLite
start_time = time.time()
curs = liteCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQLite".format(
    time.time() - start_time, numReps)
Which returned:
- 2.059 secs for 5000 reads SQL Server
- 2.527 secs for 5000 reads SQLite
Here SQL Server comes out slightly faster, but these reads are against two unindexed heaps. So, I created indexes and repeated the queries:
start_time = time.time()
curs = msCon.cursor()
sql = 'Create nonclustered index col1Idx on dbo.testTable (col1)'
curs.execute(sql)
msCon.commit()
print "{0:.3f} secs to create index SQLServer".format(
    time.time() - start_time)

start_time = time.time()
curs = liteCon.cursor()
sql = 'Create index col1Idx on testTable (col1)'
curs.execute(sql)
liteCon.commit()
print "{0:.3f} secs to create index SQLite".format(time.time() - start_time)

#Now test the reads, SQL Server first.
start_time = time.time()
curs = msCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQL Server".format(
    time.time() - start_time, numReps)

#Now SQLite
start_time = time.time()
curs = liteCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQLite".format(
    time.time() - start_time, numReps)
The results from this test:
- 0.016 secs to create index SQLServer
- 0.015 secs to create index SQLite
- 0.765 secs for 5000 reads SQL Server
- 0.312 secs for 5000 reads SQLite
We can see that SQLite comes out slightly faster. One other thing I noticed immediately was that the combined time to create the index and perform the reads was less than the time it took to perform the reads alone against the heaps. This reinforces the significance of indexes and is a good reminder that it is often worth indexing even temp tables if they are somewhat large and expected to support numerous read operations.
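To make that concrete, here is a rough sketch of what indexing a temp table looks like through the same pyodbc connection. The #workTable name and the single lookup at the end are hypothetical, purely for illustration, and none of this was part of the timings.

curs = msCon.cursor()
#hypothetical temp table populated from the test table, for illustration only
curs.execute("""
    create table #workTable (col1 int, col2 varchar(50))
    insert into #workTable (col1, col2)
    select col1, col2 from dbo.testTable""")
#index the temp table before it has to support repeated reads
curs.execute("create nonclustered index workIdx on #workTable (col1)")
curs.execute("select col2 from #workTable where col1 = ?", (42,))
rows = curs.fetchall()
msCon.commit()
curs.close()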
Finally, I wanted to look at tables with a proper primary key instead of heaps:
#redo with PKs
#Now commit only after all inserts
#now create the tables and insert 5000 rows
start_time = time.time()
curs = msCon.cursor()
sql = """
    if OBJECT_ID('dbo.testTable') is not null
        drop table dbo.testTable
    create table dbo.testTable (col1 int primary key, col2 varchar(50))"""
curs.execute(sql)
for x in range(numReps):
    sql = """
        insert into dbo.testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
msCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQL Server (commit once)".format(
    time.time() - start_time, numReps)

start_time = time.time()
curs = liteCon.cursor()
#functionally equivalent, but different syntax from SQL Server
#Specifying column types in SQLite is optional
sql = "drop table if exists testTable;"
curs.execute(sql)
sql = 'create table testTable (col1 int primary key, col2 text)'
curs.execute(sql)
liteCon.commit()
#effectively the same code as for SQL Server
for x in range(numReps):
    sql = """
        insert into testTable (col1, col2)
        values (?, ?)"""
    curs.execute(sql, (x, str(x)))
liteCon.commit()
curs.close()
print "{0:.3f} secs for {1} inserts SQLite (commit once)".format(
    time.time() - start_time, numReps)

#############################################################################
#Now test the reads, SQL Server first.
start_time = time.time()
curs = msCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQL Server".format(
    time.time() - start_time, numReps)

#Now SQLite
start_time = time.time()
curs = liteCon.cursor()
for x in range(numReps):
    sql = 'select col1, col2 from testTable where col1 = ?'
    curs.execute(sql, (x,))
    row = curs.fetchall()
curs.close()
print "{0:.3f} secs for {1} reads SQLite".format(
    time.time() - start_time, numReps)
The results:
- 0.686 secs for 5000 inserts SQL Server (commit once)
- 0.047 secs for 5000 inserts SQLite (commit once)
- 0.717 secs for 5000 reads SQL Server
- 0.3312 secs for 5000 reads SQLite
This is where the results surprised me. The insert operations were slightly faster for SQL Server with the clustered index than without. Granted, this is a small, ever-increasing primary key, but I had still expected the clustered index to have either no impact (since the rows were being presented in the same order as the primary key) or a slight negative impact on performance. Instead it improved the write performance. After some searching, I found an answer on Stack Overflow (http://stackoverflow.com/questions/7264820/removing-a-primary-key-clustered-index-to-increase-insert-performance) by marc_s which led me to an article by Kimberly Tripp (http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx). Her article explains that inserting into a clustered table can be faster than inserting into a heap when the clustered index key is ever increasing, since it avoids the need to do a lookup in the IAM/PFS[3].
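Incidentally, if you want to confirm which structure a table actually ended up with, SQL Server exposes it in sys.indexes: a heap shows up with index_id 0 and a clustered index with index_id 1. A quick sketch, not part of any of the timings:

curs = msCon.cursor()
curs.execute("""
    select index_id, type_desc
    from sys.indexes
    where object_id = OBJECT_ID('dbo.testTable')""")
for row in curs.fetchall():
    #expect 0/HEAP for the earlier tests and 1/CLUSTERED once the PK exists
    print row.index_id, row.type_desc
curs.close()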
Aside from that surprise, SQLite came out slightly faster, just as it did with the nonclustered index before.
Summary and Conclusions
This was kept deliberately informal and is not meant to be decisive or complete. After having worked with both systems on numerous projects, I wanted to get a somewhat better feel for their performance where their use cases could overlap. Along the way, I was reminded of the importance of indexes and saw yet another example of a situation where creating an index and then running the necessary queries is faster than running the same queries against an unindexed heap. I also discovered that a clustered index can improve write performance, under the right circumstances.
Generally, I found that SQLite can outperform SQL Server in certain scenarios. For a single-user application that needs a relatively small database, it can avoid the need to install and maintain a separate RDBMS while providing excellent performance.
However, SQLite is much more sensitive to proper optimization than SQL Server. While both benefit from batching work into fewer transactions rather than committing frequently, SQL Server is much more tolerant of frequent commits. And while both benefit from proper indexing, the difference is more significant for SQLite.
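For the SQLite side in particular, the sqlite3 module makes the batching easy: the connection object works as a context manager that commits once on success and rolls back on an exception. The FAQ entry cited earlier also mentions PRAGMA synchronous, which trades durability for speed. A hedged sketch of both, assuming an empty testTable like the ones created above; I did not use the PRAGMA for any of the timings here.

#One transaction for the whole batch; the context manager commits at the end.
#Assumes testTable exists and is empty, as in the earlier scripts.
with liteCon:
    liteCon.executemany("insert into testTable (col1, col2) values (?, ?)",
                        [(x, str(x)) for x in range(numReps)])

#Optional and riskier: stop SQLite from syncing to disk on every commit.
#This sacrifices crash safety for speed, so it was not used in the tests above.
liteCon.execute("PRAGMA synchronous = OFF")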
Although SQLite performs brilliantly in some use cases, SQL Server shines in all of the areas for which SQLite is clearly unsuitable. SQL Server is a more robust, full-featured database system, with a vastly superior ability to handle concurrency in multi-user environments and a wider feature set, including more options for enforcing integrity within the database. It also readily handles databases that are too large for SQLite. There is a vast array of tasks beyond providing a database for a single-user application which SQL Server handles easily and SQLite cannot.
In short, they are both useful tools within their own respective spheres.
Additional References
SQLite Wikipedia page - http://en.wikipedia.org/wiki/Sqlite
SQLite FAQ - http://www.sqlite.org/faq.html
SQLite FAQ about insert speed - http://www.sqlite.org/faq.html#q19
Appropriate Uses For SQLite - http://www.sqlite.org/whentouse.html
Main pyodbc page by mkleehammer - http://code.google.com/p/pyodbc/
Python Homepage - http://python.org/
[1] The 5000 inserts for SQLite with a commit on each insert had by far the most variability of all the results.
[2] An insert in SQL Server takes exclusive locks on the rows it touches (and those locks can escalate to a table lock), so in a project where multiple users interact with the table, it may make sense to commit relatively frequently even though it comes at something of a performance cost. Here, I am mostly looking at database support for single-user applications.
[3] Index Allocation Map/Page Free Space. Essentially, SQL Server is trying to optimize for space on a heap and so searches for free space before the insert. With the clustered index, it focuses on maintaining the index, so with an ever increasing index it just adds the new row at the end.