Introduction
Loading small sets of data into SQL Server has always been easy on Windows, and if you like GUI-based tools, there are plenty of options: SSMS, Azure Data Studio, Visual Studio (SSDT), SSIS, Toad for SQL Server, and RapidSQL, to name a few. If you are interested in learning how to use Azure Data Studio, I recommend you take a look at my previous article.
Now, let's take a step back and imagine for a moment that you have no access to any of these GUI-based tools. To make things more complex, one of your clients is asking you to export/import data from/into a SQL Server instance running on Linux (VM or container).
Luckily for us, the BCP (Bulk Copy Program) utility is available for Linux as part of the command line tools package called mssql-tools.
Installing mssql-tools
The mssql-tools package comes installed by default in the SQL Server Docker container image; unfortunately, that is not the case when SQL Server is manually installed on a Linux VM or bare metal machine. In that scenario, we have to install mssql-tools separately.
I will be using Ubuntu for this article. This is important to note because the installation steps are a little bit different if you use Red Hat.
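For reference, on Red Hat-based distributions the equivalent steps typically go through yum instead of apt. Here is a rough sketch only; adjust the release number in the repository URL to your own RHEL version:

[dba mastery] $ sudo curl -o /etc/yum.repos.d/msprod.repo https://packages.microsoft.com/config/rhel/7/prod.repo
[dba mastery] $ sudo yum install mssql-tools unixODBC-devel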
Back on Ubuntu, the first step is to register the public repository key used by apt to authenticate the packages, and then register Microsoft's official repository for Ubuntu. Here are the commands:
[dba mastery] $ curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
[dba mastery] $ curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
Now that we have the repository ready, we have to update the source list and then proceed with the installation of the mssql-tools package as follows:
[dba mastery] $ sudo apt-get update
[dba mastery] $ sudo apt-get install mssql-tools
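Depending on the package version, the install step may prompt you to accept the license terms interactively. If you want a non-interactive install (for example, inside a script), you can pre-accept the EULA through an environment variable:

[dba mastery] $ sudo ACCEPT_EULA=Y apt-get install mssql-tools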
Once the installation is complete, we have sqlcmd and bcp ready to use. Both command-line utilities are installed under /opt/mssql-tools/bin/. As an additional step, you can add this folder to your $PATH environment variable so bcp and sqlcmd can be called from anywhere, which is what the examples later in this article assume.
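A quick way to do that, assuming you use bash as your shell, is to append the export to your ~/.bashrc and reload it:

[dba mastery] $ echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
[dba mastery] $ source ~/.bashrc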
Exploring data
There are multiple sources all over the internet to get CSV files with sample data; for this example I will use a file named "hw_25000.csv". The file in question contains biometric information for 25,000 individuals organized into three columns: ID, height, and weight.
Because I will be using command line based tools, I want to make sure my data is clean, so the very first thing I want to do is learn more about my dataset. Checking the first and last ten rows of the file is the easiest way to do a quick sanity check; I will use the head and tail commands to make that happen.
I will check the first ten rows using the "head -10" command followed by the filename:
[dba mastery] $ head -10 hw_25000.csv
1, 65.78331, 112.9925
2, 71.51521, 136.4873
3, 69.39874, 153.0269
4, 68.2166, 142.3354
5, 67.78781, 144.2971
6, 68.69784, 123.3024
7, 69.80204, 141.4947
8, 70.01472, 136.4623
9, 67.90265, 112.3723
10, 66.78236, 120.6672
At a glance, the first column corresponds to an ID or some kind of sequential number, the second column holds the height data, and the third column holds the weight data. I can confirm there is no dirty data in the first ten rows.
Let's check the last ten rows of the file now. This works pretty much like the "head" command, specifying the number of rows we want to see, but in reverse: we get the last ten rows of the file:
[dba mastery] $ tail -10 hw_25000.csv
24991, 69.97767, 125.3672
24992, 71.91656, 128.284
24993, 70.96218, 146.1936
24994, 66.19462, 118.7974
24995, 67.21126, 127.6603
24996, 69.50215, 118.0312
24997, 64.54826, 120.1932
24998, 64.69855, 118.2655
24999, 67.52918, 132.2682
25000, 68.87761, 124.8742
This confirms that the first and last ten rows of my file contain accurate and clean data. Most important, I was able to confirm that the ID, height, and weight structure looks good.
Let's move on to import some data, using the "hw_25000.csv" source file with BCP.
Pre-requisites
- Have a database created before the import process
- Have a target table created before the import process (a minimal sketch of both is shown below)
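In case you need a starting point, here is a minimal sketch of how both objects could be created with sqlcmd. The column names and data types are an assumption based on the query output shown later in this article, so adjust them to match your own file:

[dba mastery] $ sqlcmd -S localhost -U sa -P MyP@ssw0rd# -Q "CREATE DATABASE Biometrics;"
[dba mastery] $ sqlcmd -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -Q "CREATE TABLE HW (Id INT, Height DECIMAL(10,5), Weight DECIMAL(10,5));"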
Importing data with BCP
I will show you a basic example of using BCP to import data on Linux. There are many more complex scenarios, so I strongly recommend checking the BCP utility documentation at Microsoft Docs. The BCP utility requires a few arguments when importing data. Let's take a look at each one of them:
- -S: The server name or IP address to connect to
- -U: The SQL Server user name (the login we will use to connect)
- -P: The password of the SQL Server login used for the connection
- -d: The target database
- -c: Performs the operation using character data types (optional)
- -t: Specifies the field terminator; it can be a comma or a tab
- in: Specifies that we are importing data into a table
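Besides the arguments above, two optional flags are worth keeping in mind when working with CSV files. They are not needed for this example, but I mention them as a hint: -F sets the first row to load (useful to skip a header row) and -e writes rejected rows to an error file. A sketch of what that could look like (the error file name is just an illustration):

bcp HW in hw_25000.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ',' -F 2 -e bcp_errors.log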
Here is what the BCP command looks like when importing data into a table called "HW" that belongs to the "Biometrics" database, using the comma-separated CSV file hw_25000.csv:
bcp HW in hw_25000.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','
Right after executing the command I can see the output on screen:
Starting copy...
1000 rows sent to SQL Server. Total sent: 1000
1000 rows sent to SQL Server. Total sent: 2000
1000 rows sent to SQL Server. Total sent: 3000
1000 rows sent to SQL Server. Total sent: 4000
1000 rows sent to SQL Server. Total sent: 5000
1000 rows sent to SQL Server. Total sent: 6000
1000 rows sent to SQL Server. Total sent: 7000
1000 rows sent to SQL Server. Total sent: 8000
1000 rows sent to SQL Server. Total sent: 9000
1000 rows sent to SQL Server. Total sent: 10000
1000 rows sent to SQL Server. Total sent: 11000
1000 rows sent to SQL Server. Total sent: 12000
1000 rows sent to SQL Server. Total sent: 13000
1000 rows sent to SQL Server. Total sent: 14000
1000 rows sent to SQL Server. Total sent: 15000
1000 rows sent to SQL Server. Total sent: 16000
1000 rows sent to SQL Server. Total sent: 17000
1000 rows sent to SQL Server. Total sent: 18000
1000 rows sent to SQL Server. Total sent: 19000
1000 rows sent to SQL Server. Total sent: 20000
1000 rows sent to SQL Server. Total sent: 21000
1000 rows sent to SQL Server. Total sent: 22000
1000 rows sent to SQL Server. Total sent: 23000
1000 rows sent to SQL Server. Total sent: 24000
1000 rows sent to SQL Server. Total sent: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 693 Average : (36075.0 rows per sec.)
It took 693 milliseconds to import 25,000 records, not bad at all. I'm using a SQL Server 2017 container with one CPU and two gigabytes of RAM.
We can see from the output that there were no errors; however, I would like to verify the data by running a simple query to check the first ten rows of the HW table:
sqlcmd -S localhost -d Biometrics -U sa -P MyP@ssw0rd# -I -Q "SELECT TOP 10 * FROM HW;"

Id          Height           Weight
----------- ---------------- ----------------
1           65.78331         112.9925
2           71.51521         136.4873
3           69.39874         153.0269
4           68.2166          142.3354
5           67.78781         144.2971
6           68.69784         123.3024
7           69.80204         141.4947
8           70.01472         136.4623
9           67.90265         112.3723
10          66.78236         120.6672

(10 rows affected)
These first ten rows match perfectly with the ones I checked using the "head -10" command, which is a really good indicator that my data looks as expected.
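As an extra check, you could also confirm that the total number of rows loaded matches the 25,000 rows of the source file; a quick example:

sqlcmd -S localhost -d Biometrics -U sa -P MyP@ssw0rd# -Q "SELECT COUNT(*) FROM HW;"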
Exporting data with BCP
Exporting data is pretty straightforward. In this example, I will "dump" all the data from the table created in the previous section. The arguments are basically the same, with one slight change: we are exporting data (out):
- out: Specifies that we are exporting data from a table to a file
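As a side note, if you only need a subset of the table, BCP also supports a queryout mode where you pass a query instead of a table name. A minimal sketch (the WHERE clause and output file name are just an illustration):

bcp "SELECT Id, Height, Weight FROM HW WHERE Id <= 100" queryout hw_sample.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','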
Once again, I will use the same database and table, this time exporting the data into a comma-separated file called hw_bcp_out.csv. Here is how the BCP command looks:
bcp HW out hw_bcp_out.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','
Right after executing the command I can see the output on screen:
Starting copy...
1000 rows successfully bulk-copied to host-file. Total received: 1000
1000 rows successfully bulk-copied to host-file. Total received: 2000
1000 rows successfully bulk-copied to host-file. Total received: 3000
1000 rows successfully bulk-copied to host-file. Total received: 4000
1000 rows successfully bulk-copied to host-file. Total received: 5000
1000 rows successfully bulk-copied to host-file. Total received: 6000
1000 rows successfully bulk-copied to host-file. Total received: 7000
1000 rows successfully bulk-copied to host-file. Total received: 8000
1000 rows successfully bulk-copied to host-file. Total received: 9000
1000 rows successfully bulk-copied to host-file. Total received: 10000
1000 rows successfully bulk-copied to host-file. Total received: 11000
1000 rows successfully bulk-copied to host-file. Total received: 12000
1000 rows successfully bulk-copied to host-file. Total received: 13000
1000 rows successfully bulk-copied to host-file. Total received: 14000
1000 rows successfully bulk-copied to host-file. Total received: 15000
1000 rows successfully bulk-copied to host-file. Total received: 16000
1000 rows successfully bulk-copied to host-file. Total received: 17000
1000 rows successfully bulk-copied to host-file. Total received: 18000
1000 rows successfully bulk-copied to host-file. Total received: 19000
1000 rows successfully bulk-copied to host-file. Total received: 20000
1000 rows successfully bulk-copied to host-file. Total received: 21000
1000 rows successfully bulk-copied to host-file. Total received: 22000
1000 rows successfully bulk-copied to host-file. Total received: 23000
1000 rows successfully bulk-copied to host-file. Total received: 24000
1000 rows successfully bulk-copied to host-file. Total received: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 123 Average : (203252.0 rows per sec.)
Nice! The 25,000 rows were copied into a comma-separated file that contains all the data from the HW table. Let's check my current folder to make sure the hw_bcp_out.csv file exists, using the list command with details (ls -l):
[dba mastery] $ ls -l
total 1212
-rw-r--r-- 1 root root 608306 Dec 13 04:38 hw_25000.csv
-rw-r--r-- 1 root root 608307 Dec 13 05:37 hw_bcp_out.csv
I can see two files: the one I used during the import example (hw_25000.csv) and the one created by the BCP out operation (hw_bcp_out.csv). In case you want to go the extra mile, you can check the first and last ten rows using the "head" and "tail" commands as before, but I will skip that for now.
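Another quick sanity check that costs nothing is counting the lines of the exported file; it should report 25,000 lines, one per row:

[dba mastery] $ wc -l hw_bcp_out.csv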
Conclusion
The BCP utility is a very powerful cross-platform command-line tool we can use to import/export data into/from a SQL Server instance running on any environment (Windows, Linux, Docker containers).