
8.05.2024 | Tips & Tricks

How to create a private ChatGPT with your own data

Learn the architecture and data requirements needed to create your own Q&A engine with ChatGPT/LLMs

With the rise of Large Language Models (LLMs) like ChatGPT and GPT-4, many organizations are asking whether it is possible to train a private ChatGPT on their corporate data. But is this feasible, and can such language models offer these capabilities?

In this article, I will discuss the architecture and data requirements needed to create “your private ChatGPT” that leverages your own data. We will explore the advantages of this technology and how you can overcome its current limitations.

1. Disadvantages of fine-tuning an LLM with your own data

People often point to fine-tuning (further training) as a way to add your own data on top of a pretrained language model. However, this approach has drawbacks, such as the risk of hallucinations mentioned during the GPT-4 announcement. In addition, GPT-4 has only been trained on data up to September 2021.

Common drawbacks of fine-tuning an LLM:

  • Factual correctness and traceability: it is unclear where an answer comes from
  • Access control: it is impossible to limit certain documents to specific users or groups
  • Costs: new documents require retraining of the model, and the model must be hosted

These drawbacks make fine-tuning extremely hard, close to impossible, to use for Question Answering (QA). How can we overcome these limitations and still benefit from these LLMs?

2. Separate your knowledge from your language model

To ensure that users receive accurate answers, we need to separate our language model from our knowledge base. This allows us to leverage the semantic understanding of our language model while also providing our users with the most relevant information. All of this happens in real-time, and no model training is required.

It might seem like a good idea to feed all documents to the model at run-time, but this isn’t feasible due to the context limit (measured in tokens) that can be processed at once. For example, GPT-3 supports up to 4K tokens, and GPT-4 up to 8K or 32K tokens. Since pricing is per 1,000 tokens, using fewer tokens also helps to save costs.
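As a rough illustration, you can estimate whether a prompt fits the context window and what it would cost. The 4-characters-per-token heuristic and the price used below are illustrative assumptions, not official OpenAI figures:

```python
# Rough estimate of prompt size and cost. The ~4-chars-per-token rule of
# thumb and the price per 1K tokens are illustrative assumptions only.

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def estimate_cost(num_tokens: int, price_per_1k: float) -> float:
    """Cost in dollars for a given token count."""
    return num_tokens / 1000 * price_per_1k

prompt = "Answer the question using only the sources below. " * 20
tokens = estimate_tokens(prompt)
fits_gpt3 = tokens <= 4000  # GPT-3 context window (4K tokens)
cost = estimate_cost(tokens, price_per_1k=0.002)  # assumed price

print(tokens, fits_gpt3, round(cost, 5))
```

A real application would use a proper tokenizer for the model in question rather than a character heuristic, but the budgeting logic stays the same.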

The approach for this would be as follows:

  1. User asks a question
  2. Application finds the most relevant text that (most likely) contains the answer
  3. A concise prompt with relevant document text is sent to the LLM
  4. User will receive an answer or ‘No answer found’ response

This approach is often referred to as grounding the model, or Retrieval Augmented Generation (RAG). The application provides additional context to the language model so it can answer the question based on relevant resources.
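The four steps above can be sketched in a few lines of Python. The `search` and `complete` callables are stand-ins for whatever search index and LLM client you choose; only the control flow is the point here:

```python
# Minimal sketch of the grounding / RAG flow described above.
# `search` and `complete` are stand-ins for your own search index
# and LLM client; they are assumptions, not a real API.

def answer_question(question: str, search, complete) -> str:
    # 2. Find the most relevant text for the question
    relevant_chunks = search(question, top=3)
    if not relevant_chunks:
        return "No answer found"
    # 3. Build a concise prompt that includes only the retrieved context
    context = "\n".join(relevant_chunks)
    prompt = (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say 'No answer found'.\n"
        f"Sources:\n{context}\nQuestion: {question}\nAnswer:"
    )
    # 4. Let the language model generate the grounded answer
    return complete(prompt)

# Toy stand-ins to show the flow end to end:
docs = ["The office is open from 9 to 5.", "Parking is free for employees."]
fake_search = lambda q, top=3: [d for d in docs
                                if "parking" in q.lower() and "Parking" in d]
fake_llm = lambda prompt: "Parking is free for employees."
print(answer_question("Is parking free?", fake_search, fake_llm))
```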

Now that you understand the high-level architecture required for such a scenario, it is time to dive into the technicalities.

3. Retrieve the most relevant data

Context is key. To ensure the language model has the right information to work with, we need to build a knowledge base that can be used to find the most relevant documents through semantic search. This will enable us to provide the language model with the right context, allowing it to generate the right answer.

3.1 Chunk and split your data

Since the answering prompt has a token limit, we need to cut our documents into smaller chunks. Depending on your chunk size, you could also share multiple relevant sections and generate an answer across multiple documents.

We can start by simply splitting the document per page, or by using a text splitter that splits on a set token length. When we have our documents in a more accessible format, it is time to create a search index that can be queried by providing it with a user question.

Alongside these chunks, you should add metadata to your index. Store the original source and page number to link answers back to the original document, and store additional metadata that can be used for access control and filtering.
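A minimal sketch of splitting a document into fixed-size chunks and attaching source metadata. The field names (`source`, `page`, `group`) are just an example, not a required schema, and whitespace-separated words stand in for real tokenizer tokens:

```python
# Split a document into chunks of roughly `max_tokens` whitespace tokens
# and attach metadata for citations and filtering.
# Field names (source, page, group) are illustrative, not a fixed schema.

def chunk_document(text: str, source: str, page: int,
                   max_tokens: int = 100, group: str = "all") -> list[dict]:
    words = text.split()
    chunks = []
    for i in range(0, len(words), max_tokens):
        chunks.append({
            "content": " ".join(words[i:i + max_tokens]),
            "source": source,   # for the [source] citation in answers
            "page": page,       # to link back to the original document
            "group": group,     # e.g. for access-control filtering
        })
    return chunks

chunks = chunk_document("word " * 250, "handbook.pdf", page=3, max_tokens=100)
print(len(chunks))  # 250 words at 100 per chunk -> 3 chunks
```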

Option 1: Use a search product

The easiest way to build a semantic search index is to leverage an existing Search as a Service platform. On Azure, you can for example use Cognitive Search, which offers a managed document ingestion pipeline and semantic ranking leveraging the language models behind Bing.

Option 2: Use embeddings to build your own semantic search

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness. [1]

If you want to leverage the latest semantic models and have more control over your search index, you could use the text embedding models from OpenAI. For all your sections you will need to precompute embeddings and store them.

On Azure you can store these embeddings in a managed vector store like Cognitive Search with Vector Search (preview) or Azure Cache for Redis (RediSearch), or in a dedicated vector database like Weaviate (open source) or Pinecone. At application run-time, you first turn the user question into an embedding, so you can compare the cosine similarity of the question embedding with the document embeddings generated earlier. Advanced search products like Cognitive Search can perform a hybrid search that combines the best of keyword search and vector search.
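A sketch of the embedding comparison at query time. The `embed` function here is a toy bag-of-words stand-in for a real embedding model (such as OpenAI's text embedding endpoint), so that the cosine-similarity logic is runnable on its own:

```python
import math

# Cosine similarity between two embedding vectors.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy stand-in for a real embedding model: a bag-of-words vector over a
# tiny fixed vocabulary. Illustrative only; a real system would call an
# embedding API and store the vectors in a vector database.
VOCAB = ["deductible", "parking", "plan", "employee", "office"]
def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

# Precompute document embeddings once, then compare at query time.
docs = ["The employee plan deductible is $500.", "Parking is free at the office."]
doc_embeddings = [embed(d) for d in docs]

question = "What is the deductible for the employee plan?"
q_emb = embed(question)
scores = [cosine_similarity(q_emb, e) for e in doc_embeddings]
best = docs[scores.index(max(scores))]
print(best)
```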

3.2 Improve relevancy with different chunking strategies

To find the most relevant information, it is important to understand your data and the potential user queries. What kind of data is needed to answer the question? This determines how your data can best be split.

Common patterns that might improve relevancy are:

  • Use a sliding window: chunking per page or per token can have the unwanted effect of losing context. Overlapping content between chunks increases the chance that the most relevant information is fully contained in a single chunk.
  • Provide more context: a highly structured document with deeply nested sections (e.g. section 1.3.3.7) can benefit from extra context such as the chapter and section titles. You could parse these sections and add this context to every chunk.
  • Summarize: create chunks that contain a summary of a larger document section. This captures the most essential text and brings it together in one chunk.
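The sliding-window pattern can be sketched as follows. Chunk size and overlap are tuning parameters, and whitespace-separated words stand in for real tokenizer tokens:

```python
# Sliding-window chunking: consecutive chunks overlap so that a sentence
# split at a chunk boundary still appears whole in at least one chunk.
# Sizes are in whitespace-separated words here; a real implementation
# would count model tokens instead.

def sliding_window_chunks(text: str, size: int = 100,
                          overlap: int = 20) -> list[str]:
    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

chunks = sliding_window_chunks("w " * 200, size=100, overlap=20)
# 200 words with step 80: windows start at 0, 80, 160 -> 3 chunks
print(len(chunks))
```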

4. Write a concise prompt to avoid hallucination

Designing your prompt is how you “program” the model, usually by providing some instructions or a few examples. [2]

Your prompt is an essential part of your implementation for preventing unwanted responses. Prompt engineering is now often described as a skill in its own right, and new samples are shared every week.

In your prompt, make clear that the model should be concise and only use data from the provided context. When it cannot answer the question, it should give a predefined ‘no answer’ response. The output should include footnotes (citations) to the original documents, allowing the user to verify factual accuracy by checking the source.

An example of such a prompt:

"You are an intelligent assistant helping Contoso Inc employees with their healthcare plan questions and employee handbook questions. " + \
"Use 'you' to refer to the individual asking the questions even if they ask with 'I'. " + \
"Answer the following question using only the data provided in the sources below. " + \
"For tabular information return it as an html table. Do not return markdown format. "  + \
"Each source has a name followed by colon and the actual information, always include the source name for each fact you use in the response. " + \
"If you cannot answer using the sources below, say you don't know. " + \
"""
###
Question: 'What is the deductible for the employee plan for a visit to Overlake in Bellevue?'
Sources:
info1.txt: deductibles depend on whether you are in-network or out-of-network. In-network deductibles are $500 for employee and $1000 for family. Out-of-network deductibles are $1000 for employee and $2000 for family.
info2.pdf: Overlake is in-network for the employee plan.
info3.pdf: Overlake is the name of the area that includes a park and ride near Bellevue.
info4.pdf: In-network institutions include Overlake, Swedish and others in the region
Answer:
In-network deductibles are $500 for employee and $1000 for family [info1.txt] and Overlake is in-network for the employee plan [info2.pdf][info4.pdf].
###
Question: '{q}'?
Sources:
{retrieved}
Answer:
"""

Source: prompt used in azure-search-openai-demo (MIT license)

One-shot learning is used to enhance the response: we provide an example of how a user question should be handled, with sources identified by a unique name and an example answer composed of text from multiple sources. At run-time, {q} is populated with the user question and {retrieved} with the relevant sections from your knowledge base to form the final prompt.

Don’t forget to set a low temperature parameter if you want more repetitive and deterministic responses; increasing the temperature results in more unexpected or creative responses.

This prompt is eventually used to generate a response via the (Azure) OpenAI API. If you use the gpt-35-turbo model (ChatGPT) you can pass the conversation history in every turn to be able to ask clarifying questions or use other reasoning tasks (e.g. summarization). A great resource to learn more about prompt engineering is dair-ai/Prompt-Engineering-Guide on GitHub.
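A sketch of how a chat request with conversation history might be assembled. The `role`/`content` message shape follows the OpenAI chat format; the model name and the commented-out API call below are illustrative placeholders, not a definitive implementation:

```python
# Assemble a chat request with conversation history and a grounding
# question. The role/content shape follows the OpenAI chat format; the
# helper name and the example call below are illustrative placeholders.

def build_messages(system_prompt: str, history: list[tuple[str, str]],
                   question: str, retrieved: str) -> list[dict]:
    messages = [{"role": "system", "content": system_prompt}]
    # Replay earlier turns so the model can resolve clarifying questions.
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    # The final user turn carries the question plus the retrieved sources.
    messages.append({
        "role": "user",
        "content": f"Question: '{question}'\nSources:\n{retrieved}\nAnswer:",
    })
    return messages

messages = build_messages(
    "Answer using only the provided sources.",
    history=[("What is the deductible?",
              "In-network deductibles are $500 [info1.txt].")],
    question="Is Overlake in-network?",
    retrieved="info2.pdf: Overlake is in-network for the employee plan.",
)
# In a real application these messages would be sent to the (Azure)
# OpenAI chat endpoint with a low temperature for deterministic answers.
print(len(messages))  # system + 2 history turns + new question = 4
```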

The video describes the high-level architecture of Bing Chat and Microsoft 365 Copilot, which use a similar approach.

5. Next steps

In this article, I discussed the architecture and design patterns needed to build such an implementation, without delving into the specifics of the code. These patterns are now commonly used, and the following projects and notebooks can serve as inspiration to help you start building such a solution.

  • Azure OpenAI Service — On Your Data, new feature that allows you to combine OpenAI models, such as ChatGPT and GPT-4, with your own data in a fully managed way. No complex infrastructure or code required.
  • ChatGPT Retrieval Plugin, let ChatGPT access up-to-date information. For now, this only supports the public ChatGPT, but hopefully the capability to add plugins will be added to the ChatGPT API (OpenAI + Azure) in the future.
  • LangChain, a popular library to combine LLMs with other sources of computation or knowledge
  • Azure Cognitive Search + OpenAI accelerator, ChatGPT-like experience over your own data, ready to deploy
  • OpenAI Cookbook, an example of how to leverage OpenAI embeddings for Q&A in a Jupyter notebook (no infrastructure required)
  • Semantic Kernel, new library to mix conventional programming languages with LLMs (prompt templating, chaining, and planning capabilities)

Eventually, you can look into extending ‘your own ChatGPT’ by linking it to more systems and capabilities via tools like LangChain or Semantic Kernel. The possibilities are endless.

Conclusion

In conclusion, relying solely on a language model to generate factual text is a mistake. Fine-tuning won’t help either: it doesn’t give the model new knowledge, and it doesn’t provide a way to verify its responses. To build a Q&A engine on top of an LLM, separate your knowledge base from the large language model, and generate answers only from the provided context.

