MySQL is "The world's most popular open source database" and it is widely-used to store and access your data. In fact, there are many cloud and on-premise databases (like MemSQL and Google Cloud SQL) that use the MySQL interface. Several native, open-source drivers are available for connecting to your MySQL data from other applications.
In this article, we will compare the read and write performance of the native MySQL Connectors (JDBC and ODBC)1 and the CData Software JDBC and ODBC Drivers for MySQL2 when working with larger datasets. We show how the CData Driver is able to read large data sets approximately 20% faster than the native driver and write 1 million rows nearly 40% faster than the native drivers.
The Data
In order to provide a reproducible comparison for reading data, we copied the US Amazon book reviews from the Amazon Customer Reviews dataset3 into a MySQL database. The details for the amazon_book_reviews table are below:
Table Size : 9.7 GB
Table Number of Rows : 10,534,179
Number of Columns : 15
# JDBC Driver Read Performance
The main goal of this investigation was to compare the related performance of the JDBC drivers. We did this by running the same set of queries with each JDBC driver. We used LIMIT clauses to change the size of the dataset returned in each query, but we requested the same columns for each query.
Base Query
SELECT
marketplace,
customer_id,
review_id,
product_id,
product_parent,
product_title,
product_category,
star_rating,
helpful_votes,
total_votes,
vine,
verified_purchase,
review_headline,
review_body,
review_date,
FROM
cdata.amazon_book_reviews;
Limit by Query
- LIMIT 1000000
- Full Dataset
To test the drivers, we connected to MySQL using a basic Java application and executed the above queries repeatedly. The results were read and stored in a new variable (based on the datatype) for each column in each row.
Query Times by Driver (in seconds)
Query | CData JDBC Driver | MySQL Connector/J
1 (1,000,000 rows) 14.02 (+26%) 17.67
2 (~10,000,000 rows) 171.09 (+20.7%) 206.42
As can be seen in the results, the CData Driver regularly outperformed the native MySQL Driver, largely due to the way the CData JDBC Driver uses client-side resources.
JDBC Driver Resource Usage
While testing the read performance of the JDBC drivers, we also measured client-side resource usage, looking specifically at memory. The charts below were found by running a sample Java program and using Java VisualVM to capture the memory usage. We used Java OpenJDK version 12.0.2 with a maximum heap size of 8 Gigabytes.
For this comparison, we ran the query for the full dataset.
CData Driver
Native Driver*
- Note the change in scale for the Heap graph.
The CData Driver averages approximately 150 MB of heap usage for the duration of the run. However, the native driver continues to use more and more client resources for the duration of the run. In fact, we had to set the max Heap size to 8 GB in order for the native MySQL Driver to complete its execution. Despite the fact that the native driver is using significantly more resources, it still takes longer to read the data than it takes the CData JDBC Driver.
ODBC Driver Read Performance
The main goal of this investigation was to compare the related performance of the ODBC Drivers. We did this by running the same set of queries with each ODBC Driver. We used LIMIT clauses to change the size of the dataset returned in each query, but we requested the same columns for each query.
Base Query
SELECT
marketplace,
customer_id,
review_id,
product_id,
product_parent,
product_title,
product_category,
star_rating,
helpful_votes,
total_votes,
vine,
verified_purchase,
review_headline,
review_body,
review_date,
FROM
cdata.amazon_book_reviews;
Limit by Query
- LIMIT 1000000
- Full Dataset
To test the drivers, we connected to MySQL using a basic C++ application and executed the above queries repeatedly. The results were read and stored in a new variable (based on the datatype) for each column in each row.
Query Times by Driver (in seconds)
Query | CData ODBC Driver | MySQL Connector/ODBC
1 (1,000,000 rows) 14.95 (+96.7%) 29.40
2 (~10,000,000 rows) 177.51 (+72.3%) 305.77
As can be seen in the results, the CData ODBC Driver regularly outperformed the MySQL Connector/ODBC, largely due to better use of available client resources.
ODBC Driver Resource Usage
While testing the read performance of the ODBC drivers, we also measured client-side resource usage, looking specifically at processing capacity and network bandwidth. The charts below were found by executing the simple C++ application and using the Windows Resource Monitor.
For this comparison, we ran the query for the full dataset.
CData Driver
Native Driver
Both drivers appear to use 20 - 25% of the available processing capacity, with CData using slightly more. The larger difference is in the network bandwidth used, where the CData Driver uses around 250Mbps compared to the 100Mbps used by the native driver.
Million Row Challenge
In addition to measuring read performance, we also want to compare the write performance of the drivers. In short, the CData MySQL JDBC Driver is able to write 1 million rows nearly 40% faster than the native MySQL Connector.
We used a simple Java program to add the rows to a copy of the amazon_book_reviews table referenced above.3. For our testing, we inserted the data in 100 batches of 10,000 rows.
Sample Code
//one batch
Connection connection = DriverManager.getConnection("jdbc:mysql:server=" + myServer + ";port=" + myPort + ";user=" + myUser + ";password=" + myPassword + ";database=" + myDatabse + ";");
String cmd = "INSERT INTO cdata.amazon_book_reviews_insert (marketplace, customer_id, review_id, product_id, product_parent, product_title, product_category, star_rating, helpful_votes, total_votes, vine, verified_purchase, review_headline, review_body, review_date) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
PreparedStatement pstmt = connection.prepareStatement(cmd);
for (int row = 0; row < rows.length; row++){
pstmt.setString(1, marketplace);
pstmt.setLong(2, customer_id);
pstmt.setString(3, review_id);
pstmt.setLong(4, product_id);
pstmt.setLong(5, product_parent);
pstmt.setString(6, product_title);
pstmt.setString(7, product_category);
pstmt.setLong(8, star_rating);
pstmt.setLong(9, helpful_votes);
pstmt.setLong(10, total_votes);
pstmt.setString(11, vine);
pstmt.setString(12, verified_purchase);
pstmt.setString(13, review_headline);
pstmt.setString(14, review_body);
pstmt.setDate(15, review_date);
pstmt.addBatch();
}
int[] affected = pstmt.executeBatch();
Results
Time (in Seconds) to Insert 1m Rows
CData JDBC Driver | MySQL Connector/J
77.6 127.9
Conclusion
Most technologies that emerge are all about being more efficient and providing more functionality in a smaller package. If you can meet your data management needs with fewer tools, then it’s a win-win for cost-effectiveness, efficiency, and ease of use.
The CData Driver offers better querying of large datasets over the native connector, processing the largest dataset at least 20% faster by making better use of the available client resources.
When it comes to inserting data, the CData Driver stands apart in its ability to rapidly insert large sets of data. The CData JDBC Driver is able to insert 1 million rows in under 80 seconds, nearly 40% faster than the native connector.