An immediate real-world application of AI is the ability to interact with documents and proprietary information. Using artificial intelligence, we can make natural-language inquiries, summarize texts, or chat with a knowledge base through Retrieval Augmented Generation.

This technique not only improves efficiency but also personalizes the user experience, as AI can generate responses or content based directly on specific customer information.

Practical Applications of AI with Retrieval Augmented Generation in Various Sectors:

  1. Education: It can provide personalized educational material, condensing information on complex subjects and generating practical exercises. This not only complements traditional teaching methods but also supports educators in addressing individual students' needs.
  2. Human Resources: Employee onboarding is critical for any organization. AI facilitates this process, allowing new staff to interact with intelligent systems to obtain relevant information and address their concerns immediately.
  3. Sales and Marketing: In a highly competitive environment, personalization is key. AI allows sales teams to create custom communications, combining product data, preferences, and customer contact information, resulting in more meaningful interactions and higher conversion.
  4. Customer Service: Implementing AI in customer service improves efficiency by providing quick and accurate responses to common inquiries. This not only enhances customer satisfaction but also allows staff to focus on more complex issues that require specialized intervention.

There's nothing better than seeing an example, so at Lostium we've developed a prototype that turns GPT into an expert teacher in fermentation and pickling. Users can talk to her to resolve doubts about the lessons in our application, FermentApp.

In this video, you can see our assistant in action.

In this case, we have implemented a chat with our own interface, but the solution could be perfectly integrated into existing messaging tools, like WhatsApp or Telegram, or in specific applications of our clients.

Remember, if you want to implement a Generative Artificial Intelligence-based solution like this, contact us, and we will be happy to assist you.

What is Retrieval Augmented Generation?

To implement these AI-based solutions, we use an approach called 'Retrieval Augmented Generation', better known by its acronym, RAG.

RAG is a strategy designed to maximize the accuracy and relevance of responses generated by AIs. This technique allows Large Language Models (LLMs) to use customer information without the need for retraining.

Within the RAG approach, knowledge repositories are created that index client-specific data. A significant advantage is that these repositories are easily updatable, allowing the generative AI to provide responses that are not only accurate but also tailored to the client's context.

Once this knowledge base is generated, the operation is very simple:

```mermaid
flowchart TD
    user("fa:fa-user User") -- 1. Question --> app("fa:fa-mobile-alt Application")
    app -- 2. Search --> bbdd[("\nfa:fa-database Knowledge\nbase")]
    bbdd -- 3. Relevant\nDocuments --> app
    app -- 4. Prompt with\nuser\nquestion\nand\ndocuments --> LLM("GPT-3.5-Instructor")
    LLM -- 5. Answer --> app
    style user fill:#f9620f,stroke:#CA4A05,color:#fff
    style app fill:#61d2de,stroke:#28A8B3
    style bbdd fill:#fcd948,stroke:#AB8C0E
    style LLM fill:#fff,stroke:#333
```

1. The user poses a question.

2. The application, based on this query, searches for relevant documents or snippets within the client's knowledge repository.

3. The search returns a set of documents that may be relevant to the question. These snippets are assembled into a context for the model.

4. Using the user's question and the retrieved context, a prompt is composed, applying prompt engineering techniques, and sent to GPT or any other LLM. This produces an appropriate response in natural language, grounded in the client's own information.
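As an illustration of step 4, a prompt for the LLM could be composed from the retrieved snippets like this. The template and snippet numbering are a hypothetical example, not our production prompt:

```typescript
// Minimal sketch of RAG prompt composition: the retrieved snippets become
// the context, and the model is instructed to answer only from that context.
function buildRagPrompt(question: string, snippets: string[]): string {
  const context = snippets
    .map((s, i) => `[${i + 1}] ${s}`) // number each snippet for traceability
    .join("\n");
  return [
    "Answer the question using only the context below.",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}

const ragPrompt = buildRagPrompt(
  "How long should sauerkraut ferment?",
  ["Sauerkraut typically ferments for 1 to 4 weeks at room temperature."]
);
console.log(ragPrompt);
```

The "say you don't know" instruction is a common guard against the model answering from its general training data instead of the client's documents.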

It is essential to emphasize that the quality of the responses provided by artificial intelligence is directly linked to the richness and depth of the information it is fed. If it only has a brief document, the AI's response capabilities will be limited, offering possibly superficial or insufficient information in relation to the user's query. However, if there is a series of documents that extensively cover the spectrum of possible user questions, the quality and accuracy of the responses generated by artificial intelligence will be significantly higher.

Technical Implementation of Retrieval Augmented Generation-based Solutions

When implementing solutions of this kind, we have the advantage of not depending on intermediary services. We can also develop solutions tailored exclusively to each client's needs.

We use programming tools like LangChain, a framework that allows interaction with LLMs in a standardized way, independent of the artificial intelligence provider we use. This framework also supports Retrieval Augmented Generation, which greatly facilitates our work.

Creation of the Knowledge Database

To carry out this task, it is necessary to compile all the available client information. This can be varied, covering formats such as PDFs, web pages, plain text, JSON files, Markdown, and even video or audio content.

The process of creating the knowledge database is as follows:

```mermaid
flowchart TD
    subgraph LLM
        LLM("Embedding\nGeneration")
    end
    subgraph node ["Database Creation"]
        load("fa:fa-file-upload Document upload")
        pre("fa:fa-spinner Content processing")
        split("fa:fa-cut Chunking phase")
        store("fa:fa-save Storage")
    end
    load --> pre
    pre --> split
    split --> LLM
    LLM --> store
    store --> bbdd[("\nfa:fa-bars Vector Base\n")]
    style load fill:#fff,stroke:#000
    style pre fill:#f9620f,stroke:#CA4A05,color:#fff
    style split fill:#000,stroke:#000,color:#fff
    style LLM fill:#fff,stroke:#333
    style bbdd fill:#fcd948,stroke:#AB8C0E
```

To carry out this task, we create a custom process with TypeScript and LangChain. First, we load the information provided by the client into memory.

Sometimes it is necessary to process the documents. For example, if it's HTML, it might be useful to clean the tags so we are left with the text only. If, on the other hand, it's a YouTube video, it will be necessary to transcribe the audio to text so that it can be interpreted by the AI.
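As a rough sketch of that cleaning step for HTML, a tag-stripping function might look like this. A regex cleaner is only illustrative; a real pipeline would use a proper HTML parser:

```typescript
// Illustrative preprocessing: remove HTML markup so only the text is indexed.
function stripHtml(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "") // drop embedded scripts entirely
    .replace(/<[^>]+>/g, " ")                   // replace remaining tags with spaces
    .replace(/\s+/g, " ")                       // collapse runs of whitespace
    .trim();
}

console.log(stripHtml("<h1>Fermentation</h1><p>Salt at <b>2%</b> by weight.</p>"));
// → "Fermentation Salt at 2% by weight."
```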

The chunking phase is very important, as LLM models have a limitation: it is not possible to exceed the maximum size of the context window. That is, if we have very large documents, we cannot provide them in full to the model; it will only receive a series of snippets relevant to the user's query.
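A minimal chunking function might look like this; the fixed chunk size and overlap are illustrative values, not the ones we use in production. The overlap ensures a sentence cut at a chunk boundary still appears whole in at least one chunk:

```typescript
// Sketch of the chunking phase: fixed-size chunks with a small overlap.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
    start += chunkSize - overlap;                // step back by the overlap
  }
  return chunks;
}

const chunks = chunkText("a".repeat(1200), 500, 50);
console.log(chunks.length);    // → 3
console.log(chunks[2].length); // → 300
```

In practice, frameworks like LangChain also offer splitters that try to cut on sentence or paragraph boundaries rather than at a fixed character count.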

Once we have the set of documents, resulting from processing and chunking the original information, we need to convert them into embeddings.

And what is an embedding? It is a vector representation of an object. Without delving into the technical aspects, these embeddings are generated using an AI model that is responsible for transforming text into numerical vectors. The vectors are stored in a vector store, which is a database optimized for storing and retrieving information in this format.
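To illustrate how a vector store ranks results, here is a toy cosine-similarity search over hand-made 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions, and production vector stores use approximate-nearest-neighbor indexes rather than a full sort:

```typescript
// Toy vector retrieval: rank stored vectors by cosine similarity to the query.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

interface StoredDoc { text: string; vector: number[]; }

function topK(query: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return [...docs]
    .sort((d1, d2) =>
      cosineSimilarity(query, d2.vector) - cosineSimilarity(query, d1.vector))
    .slice(0, k);
}

const docs: StoredDoc[] = [
  { text: "Brine ratios for pickling", vector: [0.9, 0.1, 0.0] },
  { text: "Kombucha second fermentation", vector: [0.1, 0.8, 0.3] },
];
console.log(topK([0.85, 0.2, 0.0], docs, 1)[0].text);
// → "Brine ratios for pickling"
```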

LangChain has a set of tools to carry out this process efficiently. In our case, we have also relied on OpenAI's infrastructure to generate the embeddings.

How are user queries processed?

Using the example we showed you in the video, here is a slightly more detailed diagram of how it works:

```mermaid
sequenceDiagram
    actor User
    box Application
        participant App
        participant Vector Store
    end
    box OpenAI
        participant Whisper
        participant GPT
    end
    box Google Cloud
        participant Text-to-Speech AI
    end
    User->>App: Question
    App->>Whisper: User's audio
    Whisper->>App: Transcribed question in text
    App->>Vector Store: Search with the question
    Vector Store->>App: Relevant results
    App->>GPT: Prompt + context + user question
    GPT->>App: AI generative text response
    App->>Text-to-Speech AI: Response
    Text-to-Speech AI->>User: Audio response
```
  • We use a Speech to Text system to transcribe the audio generated by the client. In our case, we use OpenAI's Whisper.
  • We convert the question into a vector and search for related documents within the vector store to generate a context of information.
  • With the relevant documents, we compose a request to the LLM, in our case gpt-3.5-instructor, with the user's question and the context resulting from the search in the vector base.
  • We process the response. Essentially, we obtain the answer in text format and, using Google Cloud's Text-to-Speech API, convert it into voice.
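The four steps above can be sketched as a single pipeline. Here each external service (speech-to-text, vector store, LLM, text-to-speech) is injected as a function and exercised with stand-in stubs; none of these signatures correspond to the real provider APIs, and the real calls would be asynchronous network requests:

```typescript
// Hedged sketch of the voice-question flow with injectable services.
interface Services {
  transcribe: (audio: string) => string;   // speech to text (Whisper in our case)
  search: (question: string) => string[];  // vector-store lookup
  complete: (prompt: string) => string;    // LLM call
  synthesize: (text: string) => string;    // text to speech
}

function answerVoiceQuestion(audio: string, svc: Services): string {
  const question = svc.transcribe(audio);                         // 1. transcribe
  const context = svc.search(question).join("\n");                // 2. retrieve
  const prompt = `Context:\n${context}\n\nQuestion: ${question}`; // 3. compose
  const answer = svc.complete(prompt);                            //    and query the LLM
  return svc.synthesize(answer);                                  // 4. back to audio
}

// Stand-in stubs, only to show the flow end to end.
const stubs: Services = {
  transcribe: () => "How much salt for sauerkraut?",
  search: () => ["Use about 2% salt by weight."],
  complete: (prompt) =>
    prompt.includes("2%") ? "About 2% salt by weight." : "I don't know.",
  synthesize: (text) => `<audio:${text}>`,
};
console.log(answerVoiceQuestion("<audio blob>", stubs));
// → "<audio:About 2% salt by weight.>"
```

Injecting the services as functions also makes the pipeline easy to test and to swap between providers.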

And finally, we remind you that if you are interested in applying AI to your projects and need a customized solution, contact us and we will see how we can assist you.