SQLServerCentral Article

Will AI end with SQL, Is this the end of SQL?

,

Introduction to the End of SQL

Will the AI end with SQL? That is the question.

The AI is replacing many job positions, like content writers, designers, programmers, and more. For example, Buzzfeed laid off 15 % of its staff and replaced them with AI tools like ChatGPT and other technologies. The AI indeed has several errors and it still requires human supervision, but it could replace some humans. In Poland Dictador, a rum company, hired an AI robot named Mika as the CEO.

Is it possible, that the DBAs will be replaced by the AI?

In this article, I will show how to get data from files. In my opinion, if a simple human can extract data using natural language, there will be fewer jobs for DBAs. I'm not predicting the extinction of DBAs, but I anticipate a reduced demand for them in the future. In this article, I will provide a demonstration of my vision for the future.

Getting started with the AI demonstration

For this article, I will use Claude which is an artificial intelligence assistant developed by Anthropic. You will need to register to use Claude. In a previous article, I talked about Claude.

We will use the following CSV file that contains information about air flights. You can download the file from this link. The data of the CSV is like this:

It contains the year, month, and passengers of flights. Let’s do some queries on the data.

First, I attached the CSV file in Claude.

Then I asked the following question:

Me: Based on the CSV file attached can you show me the results of the average flights per year in a table?

Claude: Here is a table showing the average number of flight passengers per year based on the CSV data:

<table> <tr><th>Year</th><th>Average Passengers</th></tr> <tr><td>1949</td><td>134</td></tr> <tr><td>1950</td><td>146</td></tr> <tr><td>1951</td><td>180</td></tr> <tr><td>1952</td><td>194</td></tr> <tr><td>1953</td><td>219</td></tr> <tr><td>1954</td><td>235</td></tr> <tr><td>1955</td><td>272</td></tr> <tr><td>1956</td><td>306</td></tr> <tr><td>1957</td><td>348</td></tr> <tr><td>1958</td><td>362</td></tr> <tr><td>1959</td><td>404</td></tr> <tr><td>1960</td><td>460</td></tr> </table>

To calculate the averages, I parsed the CSV data, summed the monthly passengers for each year, and divided by 12 to get the mean. Let me know if you need me to show the calculations or have any other questions!

Convert HTML to CSV with AI

Me: Could you convert the results of the HTML code to CSV?

Sure, here is the average passengers per year data converted to CSV format:

Year, Average Passengers
1949,134
1950,146
1951,180
1952,194
1953,219
1954,235
1955,272
1956,306
1957,348
1958,362
1959,404
1960,460
I took the table data from the HTML and formatted it into a CSV with two columns - "Year" and "Average Passengers". The years are in one column and the calculated average monthly passengers are in the second column. Let me know if you would like the CSV formatted or presented differently!

I took the table data from the HTML and formatted it into a CSV with two columns - "Year" and "Average Passengers". The years are in one column and the calculated average monthly passengers are in the second column. Let me know if you would like the CSV formatted or presented differently!

Me: If the AI can extract data from files, there is no need to use SQL. That is my point, why would you continue using T-SQL if it is no longer necessary?

Even some functions in Excel, PowerBI, Looker, Microstrategy, and Qlik will no longer be necessary if the AI can extract data using natural language.

Even the Data Science job will require less learning time because the AI will handle this.

I agree that today, AI is not so advanced and there are several things that it cannot do yet, but in the long run, it will do a lot of things that humans do.

Replacing SQL queries with natural language requests using AI

Now, I will ask some typical requests that I usually solve with SQL. SUM, AVG queries will be solved by the AI instead of SQL.

Me: from the CSV file, show me the total number of flights, the average number of flights, the year with fewer flights, and the year with more flights.

Claude:

Total number of flights:

There are 12 months per year, from 1949 to 1960, which is 12 * 12 = 144 months total

The data shows the number of passengers per month, which represents the number of flights

So adding up all the passenger numbers gives the total number of flights: 19,334

Average number of flights per month:

Total flights = 19,334

Total months = 144

Average = Total / Months

Therefore, the average number of flights per month is 134

The year with the least flights:

The year with the lowest sum of passengers/flights is 1949, with total flights of 1,608

Also, the year with the most flights:

The year with the highest sum of passengers/flights is 1957, with total flights of 2,740

In summary:

Total flights: 19,334

Average flights per month: 134

The year with the least flights: 1949 (1,608 flights)

The year with the most flights: 1957 (2,740 flights)

Analyzing information with AI

Me: From the CSV attached, which one is the month with fewer flights?

Claude: Based on the CSV data provided, the month with the fewest flights is November. Here are the number of passengers for each November from 1949 to 1960:

1949: 104
1950: 114
1951: 146
1952: 172
1953: 180
1954: 203
1955: 237
1956: 271
1957: 305
1958: 310
1959: 362
1960: 390

November has the lowest passenger numbers in each year, so it can be concluded that November had the fewest flights over this period.

Analyze reasons

Me: Do you know why would November be the month with fewer flights?

Claude:

There are a few potential reasons why November tends to have the fewest flights based on the data provided:

  • Weather - In many parts of the world, November falls in the autumn/winter season. Cold temperatures, rain, snow, etc. can lead to more flight delays and cancellations. This could reduce overall flights scheduled and completed.
  • Holidays - There are fewer major holidays in November compared to other months. Other months like December (Christmas), July/August (summer vacations), etc. see more leisure and holiday travel which increases flights. November has fewer occasions driving travel demand.
  • Travel habits - November falls between the busy summer vacation period and the winter holidays. Travel is less common as families settle into school/work routines. People may be more likely to stay home rather than take trips requiring flights.
  • Cost - With fewer holidays and events, November may have a lower demand for flights which could drive prices down. The lower cost of flights could deter some discretionary travel.
  • Data set - This is a limited data set from 1949-1960. There may be certain historical trends or factors specifically affecting November flights in this period that may not carry through to today.

So, in summary, the combination of weather, fewer holidays, travel habits, and costs are likely factors contributing to November having the lowest number of flights in the years observed. The specific context of the data from 1949-1960 also needs to be considered as a potential influence.

Getting charts with AI

Me: Can you generate a histogram based on the CSV file attached?

Claude: Sure, here is a histogram generated from the CSV data you provided:

import csv
import matplotlib.pyplot as plt
data = []
with open('flights.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader) # skip header row
    for row in csv_reader:
        data.append(int(row[2]))
plt.hist(data, bins=12, rwidth=0.8)
plt.xlabel('Passengers')
plt.ylabel('Frequency')
plt.title('Histogram of Monthly Flight Passengers')
plt.tight_layout()
plt.show()

This parses the CSV file into a list of monthly passenger numbers. It then plots a histogram with 12 bins (one for each month) and configures labels and chart formatting. The histogram shows the distribution and frequency of monthly passenger counts from 1949-1960. We can see which ranges of values are more common, and which months have outliers in terms of very low or high numbers of passengers.

Let me know if you would like me to modify the histogram or generate any other charts based on the data!

Me: I just modified the path of the flights.csv file in a local path:

with open('c:\\data\\flights.csv') as csv_file:

The histogram generated is the following:

AI histogram generated

As you can see, most of the flights have around 150 passengers. Few flights had 500 or more passengers.

Apply cluster algorithms to the data using AI

Me: Can you apply the clustering algorithm to the CSV file and get some cluster groups?

Yes, I can apply clustering algorithms to group the data in the provided CSV file.

Here is one approach using k-means clustering in Python:

import csv
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
data = []
with open('flights.csv') as f:
    reader = csv.reader(f)
    next(reader) # skip header row
    for row in reader:
        data.append([int(row[0]), int(row[2])])
        # Extract year and passengers into separate lists
        years = [d[0] for d in data]
        passengers = [d[1] for d in data]
# Reshape data into 2D array for clustering
data = np.array(data)
# Apply k-means clustering
kmeans = KMeans(n_clusters=3).fit(data)
labels = kmeans.labels_
# Plot clustered data
plt.scatter(years, passengers, c=labels)
plt.xlabel('Year')
plt.ylabel('Passengers')
plt.title('K-Means Clustering')
plt.show()

This extracts the year and passenger data, reshapes it for clustering, applies k-means with k=3 clusters, and visualizes the clusters on a scatter plot.

The resulting clusters roughly group:

Low passenger years: 1949-1952

Medium passenger years: 1953-1956

High passenger years: 1957-1960

The clusters could be further analyzed to understand the characteristics of each group. I can tune the clustering as needed - let me know if you would like me to modify the approach in any way!

Me: when I tried to run the code, I got the error that np is not defined.

I added this code at the beginning to make it run:

import numpy as np

Then it worked. The Clustering applied is the following:

AI cluster chart

Conclusion about the AI and the end of SQL

As you can see, the AI is very powerful. Currently, Claude does not generate charts, but it does the code, so I expect to see charts with analysis created automatically very soon. I can see that the AI can easily convert CSV to HTML and XML. This could be the end of SSIS, Azure Data Factory, AWS Glue, Talend, Pentaho, etc.

Also, we may no longer be required to use Data Visualization tools like Power BI, MicroStrategy, Tableau, QlikView, Looker, Domo, Sisense, Yellowfin, Birst, Zoho Analytics, etc in the future. The AI can create nice charts and provide some data analysis and some explanations about the data.

Finally, will we need to use SQL to retrieve data? I don't think it will be necessary. AI can extract data from files, and SQL knowledge may not be required in the future.

So, will we all be fired? Are we expected to switch to professions like cowboys, butchers, bakers, or sellers?

The answer is no. However, the way we used to work will change. Everything that we do, will be different. We will use a lot of AI in our jobs. Millions of people are using AI every day, all the time.

New jobs related to AI are now available, creating fresh opportunities. I am being paid to write about AI, so it couldn't be too bad. We need to embrace AI, utilize it, and adapt to the new era—the AI era. If we could survive COVID, I'm confident we can navigate the challenges posed by AI.

Un humano abrazando la inteligencia artificial

Images for the AI and the end of SQL

Some images were created in Bing.

Rate

4.5 (4)

You rated this post out of 5. Change rating

Share

Share

Rate

4.5 (4)

You rated this post out of 5. Change rating