
Introduction to OpenAI and LLMs – Part 2


My previous blog post on this topic, Introduction to OpenAI and LLMs, covered the “what” (what OpenAI and LLMs are); this post covers the “how” (how to use OpenAI on your own unstructured data via products like Azure OpenAI On Your Data). But first, a review of the previous post with some additional clarifications and definitions to help you better understand what OpenAI and LLMs are.

First, a few definitions:

  • Structured data: data organized into rows and columns, such as relational databases
  • Semi-structured data: files and logs in CSV, XML, or JSON formats
  • Unstructured data: emails, documents, and PDFs
  • Binary data: images, audio, and video

The “generative” in generative AI refers to the ability of these systems to create or generate new content (text, images, code), often in response to a prompt entered by a user, rather than simply analyzing or responding to existing content. This involves producing output that is original and often unpredictable, based on the patterns, rules, and knowledge the system has learned during its training phase. Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer) are a type of generative AI that allows users to type questions or instructions into an input field (such as a chatbot), upon which the model generates a human-like response.

Machine learning (ML) is a broad field encompassing various algorithms and models that learn from data to make predictions or decisions. Large Language Models (LLMs) like GPT are a specific type of ML model focused on processing and generating text based on the patterns they’ve learned from large datasets. The main difference between LLMs and the broader category of ML is that ML includes a wide range of algorithms for different tasks, such as image recognition, data analysis, and predictive modeling, whereas LLMs are specialized for understanding and generating human language.

Generative AI uses a computing process known as deep learning to analyze patterns in large sets of data and then replicates this to create new data that appears human-generated. It does this by employing neural networks, a type of machine learning process that is loosely inspired by the way the human brain processes, interprets and learns from information over time. LLMs are based on a specific type of neural network architecture known as the Transformer, which uses vectors and weights (vectors convert words into numerical data that the model interprets, while weights adjust how these vectors influence each other, enabling the model to identify patterns and generate text that reflects the meaning and context of words).
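
To make the vector idea concrete, here is a minimal Python sketch (using NumPy) of how dot products between word vectors can act as attention-style weights. The vectors are hand-picked toy values for illustration; a real model learns its embeddings and weights during training:

```python
import numpy as np

# Toy vocabulary: each word is mapped to a small vector (a real model
# learns these embeddings; here they are hand-picked for illustration).
embeddings = {
    "bank":  np.array([0.9, 0.1]),
    "river": np.array([0.2, 0.8]),
    "money": np.array([0.8, 0.3]),
}

def attention_weights(query_word, context_words):
    """Score each context word against the query via a dot product,
    then normalize with softmax -- a stripped-down attention step."""
    q = embeddings[query_word]
    scores = np.array([q @ embeddings[w] for w in context_words])
    weights = np.exp(scores) / np.exp(scores).sum()
    return dict(zip(context_words, weights.round(3)))

# "bank" attends more strongly to "money" than to "river" because their
# vectors point in similar directions.
print(attention_weights("bank", ["river", "money"]))
```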

To give an example, if you were to feed lots of fiction writing into a generative AI model, it would eventually gain the ability to craft stories or story elements based on the literature it’s been trained on. This is because the machine learning algorithms that power generative AI models learn from the information they’re fed — in the case of fiction, this would include elements like plot structure, characters, themes and other narrative devices. Generative AI models get more sophisticated over time — the more data a model is trained on and generates, the more convincing and human-like its outputs become.

Training an LLM is about teaching a computer to understand and use language effectively and answer like a human. First, we collect a lot of written text and prepare it so that it’s easy for the computer to process. Then, we use this text to train the neural network, which learns by trying to predict what word comes next in a sentence. We constantly adjust the model to help it learn better and check its progress by testing it with new text it hasn’t seen before. If needed, we can further train it on specific types of text to improve its skills in certain areas. This whole process requires powerful computers and the knowledge of how to train and adjust these complex models.
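
To illustrate that training loop, here is a minimal sketch using PyTorch. The vocabulary, text, and two-layer model are toy stand-ins; a real LLM trains a Transformer with billions of parameters on vast datasets, but the predict-compare-adjust cycle is the same:

```python
import torch
import torch.nn as nn

# Toy vocabulary and a deliberately tiny model: embed each token, then
# project back to vocabulary logits. Real LLMs stack many Transformer
# layers between these two steps.
vocab = ["the", "cat", "sat", "on", "mat"]
stoi = {w: i for i, w in enumerate(vocab)}

model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Training pairs: given each word, predict the word that follows it.
text = ["the", "cat", "sat", "on", "the", "mat"]
inputs = torch.tensor([stoi[w] for w in text[:-1]])
targets = torch.tensor([stoi[w] for w in text[1:]])

for step in range(100):
    logits = model(inputs)           # predicted scores for each next word
    loss = loss_fn(logits, targets)  # how wrong those predictions were
    optimizer.zero_grad()
    loss.backward()                  # compute how to adjust the weights
    optimizer.step()                 # apply the adjustment

# After training, the model should predict "on" as the word after "sat".
probs = torch.softmax(model(torch.tensor([stoi["sat"]])), dim=-1)
print(vocab[int(probs.argmax())])
```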

RAG (Retrieval-Augmented Generation), as explained in detail in my prior blog post, does not rewire or fundamentally alter the neural network of the LLM itself. The underlying neural architecture of the LLM component remains the same. What changes is the input process: RAG methods combine the input query with additional context from the retrieval system before processing it through the neural network. This allows the LLM to use both the original input and the external information effectively, improving the relevance and quality of the outputs by altering the input the model works with. In short, it makes the questions a person asks smarter.
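
A minimal sketch of that flow, assuming the OpenAI Python SDK and a hypothetical retrieve_context function standing in for a real search index:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def retrieve_context(question: str) -> str:
    """Hypothetical retrieval step: a real system would query a vector
    index or search service for document chunks relevant to the question."""
    return "Q3 travel budget: $12,000 allocated, $7,500 spent to date."

def rag_answer(question: str) -> str:
    # The model's weights are untouched; only the prompt is enriched
    # with retrieved context before it reaches the neural network.
    context = retrieve_context(question)
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any chat-capable model works
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(rag_answer("How much travel budget is left this quarter?"))
```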

GPT-4 is reported to have approximately 1.8 trillion parameters spread across 120 layers, a significant increase from GPT-3.5’s 175 billion parameters and 96 layers. Layers in a neural network are levels of neurons, where each layer processes inputs from the previous layer and passes its output to the next. Parameters are the internal variables that the model adjusts during training to improve its predictions. This process helps the model understand how words, phrases, and sentences are typically used and relate to each other. The larger number of parameters and layers enables GPT-4 to perform more complex processing and deeper analysis of language than previous models, giving it a deeper understanding of language nuances and the ability to generate more complex responses.

As part of the fully managed Azure OpenAI Service, the GPT-3 models analyze and generate natural language, Codex models analyze and generate code and plain text code commentary, and the GPT-4 models can understand and generate both natural language and code. These models use an autoregressive architecture, meaning they use data from prior observations to predict the most probable next word. This process is then repeated by appending the newly generated content to the original text to produce the complete generated response. Because the response is conditioned on the input text, these models can be applied to various tasks simply by changing the input text.
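
That append-and-repeat loop is easy to show in miniature. In this toy Python sketch, ToyModel and its lookup table stand in for the real neural network, but the autoregressive mechanism (predict one word, append it, feed the longer text back in) is the same:

```python
import random

class ToyModel:
    """Stand-in for a trained network: picks the next word from a fixed
    lookup table. A real LLM computes these probabilities with its layers."""
    table = {"the": ["cat", "mat"], "cat": ["sat"], "sat": ["on"],
             "on": ["the"], "mat": ["<end>"]}

    def next_token(self, tokens):
        return random.choice(self.table.get(tokens[-1], ["<end>"]))

def generate(model, prompt, max_new_tokens=10):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_token = model.next_token(tokens)  # predict the next word
        if next_token == "<end>":
            break
        tokens.append(next_token)  # append it; the next step is conditioned on it
    return " ".join(tokens)

print(generate(ToyModel(), "the cat"))  # e.g. "the cat sat on the mat"
```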

The GPT-3 series of models are pretrained on a wide body of publicly available free text data. This data is sourced from a combination of web crawling (specifically, a filtered version of Common Crawl, which includes a broad range of text from the internet, comprises 60 percent of the weighted pretraining dataset, and is filtered to improve data quality) and higher-quality datasets, including an expanded version of the WebText dataset (which includes a broad range of text, including Reddit), two internet-based books corpora (Books1 and Books2), and English-language Wikipedia (for more info on these data sources, check out AI Training Datasets: the Books1+Books2 that Big AI eats for breakfast). The models were then fine-tuned using reinforcement learning with human feedback (RLHF). Note that OpenAI has not publicly disclosed the full details of the specific datasets used to train GPT-4.

It’s accurate to describe GPT as a sophisticated autocomplete system (autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing). GPT-4, like other advanced language models, predicts the next word or sequence of words based on the input it receives, similar to how autocomplete functions. However, it goes beyond simple prediction by understanding context, managing complex conversations, generating coherent and diverse content, and adapting to a wide range of tasks and prompts. This level of sophistication and adaptability is what sets it apart from standard autocomplete features.

Learn more about the training and modeling techniques in OpenAI’s GPT-3, GPT-4, and Codex research papers.

A point of confusion with an LLM is whether it “stores” text or whether the LLM “remembers” prior interactions and learns from the questions. To clarify, when you ask a question or make a request, the LLM uses only the text or documents you provide as input to generate an answer. It does not access or retrieve information from external sources in real time or pull from a specific database of stored facts. Instead, it generates responses based on patterns and information it learned during its training phase. This means the model’s responses are constructed by predicting what text is most likely to be relevant or appropriate, based on the input it receives and the training it underwent.

The model does not “remember” previous interactions in the traditional sense. Each response is independently generated based on the current input it receives, without any retained knowledge from past interactions unless those interactions are part of the current session or explicitly included in the conversation. So it is not learning along the way, and the RAG method is not training your model.
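
To make the statelessness concrete, here is a minimal sketch using the OpenAI Python SDK (the model name is a placeholder); the application, not the model, keeps the conversation history and resends it with every call:

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    """The model has no memory between calls: every request must carry
    the full conversation so far, or earlier turns are simply unknown."""
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-4o",    # placeholder: substitute your model
        messages=history,  # the entire transcript, on every single call
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ask("My name is James.")
print(ask("What is my name?"))  # works only because history was resent
```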

The accuracy and relevance of the model’s answers depend on how well its training data covered the topic in question and how effectively it learned from that data. Therefore, while it can provide information that feels quite informed and accurate, it can also make errors or produce information that is outdated as of its last training update.

An LLM doesn’t “store” text in the sense of storing it directly as a database does. Instead, it learns patterns, relationships, and information from the text it was trained on and encodes this knowledge into a complex neural network of weights and biases within its architecture. During training, the model adjusts these weights and biases based on the input data. The weights determine the strength of the connection between neurons, while biases adjust the output of the neurons.
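
A single artificial neuron makes those roles concrete. In this toy NumPy example (all numbers are made up), the weights scale each input’s influence, the bias shifts the result, and an activation function squashes it into a usable range:

```python
import numpy as np

inputs  = np.array([0.5, -1.0, 2.0])   # outputs from the previous layer
weights = np.array([0.8,  0.2, -0.5])  # learned connection strengths
bias    = 0.1                          # learned offset

z = np.dot(weights, inputs) + bias     # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
output = 1 / (1 + np.exp(-z))          # sigmoid activation, about 0.33
print(output)
```

Training nudges millions (or trillions) of such weights and biases until the network’s predictions improve.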

When you ask a question, the model uses these learned patterns to generate text that it predicts would be a plausible continuation or response based on the input you provide. It’s not recalling specific texts or copying them verbatim from its training data but rather generating responses based on the statistical properties and linguistic structures it learned during training.

So, the model doesn’t contain the text itself but has learned from a vast amount of text how to generate relevant and coherent language outputs. This process is more akin to a skilled writer recalling knowledge, ideas, and linguistic structures they’ve learned over time to compose something new, rather than pulling exact entries from a reference book.

Grounded models, such as those using the RAG method, reference or integrate external data sources or specific pieces of information in real time, drawn from the user prompt, while generating responses. Ungrounded models, on the other hand, generate responses based solely on the data they were trained on, without any real-time access to external information (the GPT-4 model in its base form is an ungrounded model).

Now that you have a good understanding of OpenAI and its LLMs, let’s discuss how you can leverage these technologies with your own unstructured data (text in documents). When you interact with ChatGPT, you’re engaging with a model trained on a diverse range of internet text. However, OpenAI now offers the capability to upload your own documents (.txt, .pdf, .docx, .xlsx) directly into ChatGPT (Bing Copilot supports document uploads too, and many more file types). This allows the model to reference your specific documents when answering questions, enhancing the relevance and accuracy of responses. This feature is an application of RAG techniques.

Azure OpenAI Service On Your Data is a Microsoft product that helps you build a solution to upload documents (unstructured enterprise data) and ask questions of them. Azure OpenAI On Your Data enables you to run advanced AI models such as GPT-35-Turbo and GPT-4 on your own enterprise data without needing to train or fine-tune models. You can chat on top of and analyze your data with greater accuracy. You can specify sources such as company databases, internal document repositories, cloud storage systems like Azure Blob Storage, SharePoint, or other designated data sources that contain the latest information to support the responses. You can access Azure OpenAI On Your Data using a REST API, via the SDK, or through the web-based interface in Azure OpenAI Studio. Azure OpenAI On Your Data supports the following file types for uploading: .txt, .md, .html, .docx, .pptx, and .pdf. You can also create a web app that connects to your data to enable an enhanced chat solution, or deploy it directly as a Copilot using Microsoft Copilot Studio.
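
As a sketch, here is one way to call Azure OpenAI On Your Data from the Python SDK, assuming a chat model deployment and an Azure AI Search index built over your uploaded documents (every endpoint, key, deployment, and index name below is a placeholder for your own values):

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-API-KEY",
    api_version="2024-02-15-preview",  # assumption: use your service's version
)

response = client.chat.completions.create(
    model="YOUR-GPT4-DEPLOYMENT",
    messages=[{"role": "user",
               "content": "What does our travel policy say about airfare?"}],
    # On Your Data: point the request at the search index holding your docs
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": "https://YOUR-SEARCH.search.windows.net",
                "index_name": "YOUR-INDEX",
                "authentication": {"type": "api_key", "key": "YOUR-SEARCH-KEY"},
            },
        }]
    },
)
print(response.choices[0].message.content)
```

The model then grounds its answer in the retrieved document chunks rather than relying solely on its training data.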

Microsoft Copilot Studio is a low-code conversational AI platform that enables you to extend and customize Copilot for Microsoft 365 with plugins, as well as build your own copilots. Plugins are reusable building blocks that allow Copilot to access data from other systems of record, such as CRM, ERP, HRM, and line-of-business apps, using 1,200+ standard and premium connectors. You can also use plugins to incorporate your unique business processes into Copilot, such as expense management, HR onboarding, or IT support. And you can use plugins to control how Copilot responds to specific questions on topics like compliance, HR policies, and more.

Imagine you want to know how much of your team’s travel budget is left for the rest of the quarter. You ask Copilot in the chat, but it can’t answer because the data you’re looking for resides in your SAP system. With Copilot Studio, you can customize Copilot to connect to your SAP system and retrieve the information you need, then ask questions like “How many deals did I close this quarter?” or “What are the top opportunities in my pipeline?” You can also orchestrate workflows with Copilot using Power Automate, such as booking a meeting, sending an email, or creating a document.

Copilot Studio also allows you to build custom copilots for generative AI experiences outside of Microsoft 365. With a separate Copilot Studio license, you can create conversational copilots for customers or employees and publish them on various channels, including websites, SharePoint, and social media. This flexibility enables organizations to design unique AI experiences, whether for enhancing customer interactions, streamlining internal functions, or developing innovative solutions. For instance, you can create a copilot for your website to help customers check in-stock items, provide quotes, or book services, or for your SharePoint site to assist employees with HR or IT requests.

Lastly, there is an accelerator called Information Assistant that creates an out-of-the-box solution in your Azure environment, including a chatbot and the ability to upload your own documents so you can get answers to your questions using RAG techniques. Check out the code at https://aka.ms/fia and this video on how it works: Information Assistant, built with Azure OpenAI Service (youtube.com).

More info:

Transparency Note for Azure OpenAI Service

Will AI end with SQL, Is this the end of SQL?

Using OpenAI with Structured Data: A Beginner’s Guide | by Margaux Vander Plaetsen | Medium

How to use Azure Open AI to Enhance Your Data Analysis in Power BI (microsoft.com)

Querying structured data with Azure OpenAI | by Valentina Alto | Microsoft Azure | Medium

Using your data with Azure OpenAI Service – Azure OpenAI | Microsoft Learn

Microsoft Copilot or Copilot for Microsoft 365 (M365) or ChatGPT

Generative AI Defined: How it Works, Benefits and Dangers
