Blog August 15th, 2023 by Ken Tilk, Head of Data&AI; Hannes Rõõs, AI Architect; Ulan Yisaev, AI Architect
Smart and state of the art – Intelligent Search makes life easier
With Intelligent Search you can ask questions just as you would in a conversation and get answers just as effortlessly. Dive deep into your own data, use the language that suits you best, and enjoy the peace of mind that secure operations powered by large language models (LLMs) and cutting-edge technologies bring.
Navigating today’s maze of information can feel daunting: jumping between documents, hassling your workmates, or knocking on unfamiliar doors. Intelligent Search does away with the hassle by bringing everything together in one solution where the operating environment, the format (text, audio, or video), and the language of the content don’t matter.
What can Intelligent Search do?
Imagine a single digital hub that effortlessly blends SharePoint files, Confluence pages, Teams meeting recordings, emails, and even your talent planning tools. Now imagine asking that same hub something as simple as “Show me projects completed in Finland’s industrial sector” or “Which Java developers are available from September to December?”
But it’s not just about asking; it’s about understanding too. Your team can ask questions and get answers in their own language, even if the source content is in a different language. And if you’re someone who likes to sweat the details, Intelligent Search can highlight the exact snippet the answer is taken from.
Feel like a conversation? Switch to chat mode, dig deeper, ask questions, and uncover more. If you only need insights from a select few documents, try the “Document Chatting” feature and rest easy knowing that these standalone files won’t mingle with your core dataset.
The best part? We sculpt Intelligent Search to sit perfectly wherever you need it – whether on a webpage, in Microsoft Teams, or on your preferred platform.
How does it work?
Intelligent Search leverages state-of-the-art technologies. At its core is Retrieval Augmented Generation (RAG) for question answering (QA). This approach combines the best of retrieval-based and generative methods, so answers are precise and grounded in relevant context. RAG gives the large language model access to an up-to-date knowledge base at query time, which is generally considered more efficient in terms of computational resources and time than training a model from scratch or fine-tuning one. However, depending on the specific needs of a particular project and customer, we can also train a model from scratch or fine-tune an existing one.
Our demo environment is built on the Haystack open-source LLM orchestration framework, but this is just one of the many options available in our highly customizable solution. Thanks to Haystack’s flexibility we can integrate the latest natural language processing (NLP) models, particularly Transformer-based models, which have revolutionized semantic document search, question answering, and summarization. The framework is particularly adept at implementing Long-Form Question Answering (LFQA) systems, which use a large corpus of documents as a knowledge base. LFQA is a two-step process: retrieval, where relevant documents or passages are identified, and generation, where a response is crafted based on the retrieved documents.
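To make the two steps concrete, here is a minimal sketch in plain Python. It does not use Haystack’s API: the function names, the word-overlap scoring, the prompt wording, and the sample passages are all assumptions made for illustration, and the generator is passed in as a placeholder so any LLM call could take its place.

```python
import re
from typing import Callable, List


def words(text: str) -> set:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: List[str], top_k: int = 2) -> List[str]:
    """Step 1 (retrieval): rank passages by words shared with the question."""
    q_words = words(question)
    scored = [(len(q_words & words(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the retrieved passages in front of the question."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{numbered}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(question: str, passages: List[str], generate: Callable[[str], str]) -> str:
    """Step 2 (generation): hand the grounded prompt to a text generator."""
    return generate(build_prompt(question, retrieve(question, passages)))


# Made-up passages, purely for illustration.
passages = [
    "The Helsinki plant automation project was completed in 2022.",
    "Our cafeteria menu changes every Monday.",
    "The Tampere factory project was completed in 2021.",
]
# Placeholder generator: echoes the prompt so the flow can be inspected.
print(answer("Which project was completed in 2022?", passages, lambda prompt: prompt))
```

A production system would replace the word-overlap scoring with a semantic retriever and the placeholder with a real model call, but the shape of the flow stays the same.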
Our solution’s flexibility means it can be tailored to run on the latest open-source models and be deployed either on-premise or with any cloud provider, including local ones. Instead of using OpenAI’s models directly, our system integrates with the LLMs offered through the Azure OpenAI Service, so we have access to the most advanced language models while also benefiting from Azure’s robust security and enterprise-grade features.
The journey of data: From acquisition to indexed embeddings
The graph below shows how raw data is transformed into a structured and searchable format, highlighting the steps involved in preparing the data for efficient retrieval.
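As a rough illustration of this kind of preparation, the sketch below splits documents into chunks, turns each chunk into a vector, and stores the result as index records. Everything here is an assumption for demonstration purposes: the chunk size, the record layout, the file names, and above all the `embed` function, which is a hashed word-count stand-in rather than a real embedding model.

```python
import hashlib
import math
import re
from typing import Dict, List

DIM = 64  # Toy vector size; real embedding models typically use hundreds of dimensions or more.


def chunk(text: str, max_words: int = 40) -> List[str]:
    """Split a document into chunks of at most max_words words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_words]) for i in range(0, len(tokens), max_words)]


def embed(text: str) -> List[float]:
    """Stand-in embedding: hash each word into a bucket, then L2-normalise."""
    vec = [0.0] * DIM
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(documents: Dict[str, str]) -> List[dict]:
    """Turn raw documents into records of source, chunk number, text, and vector."""
    index = []
    for source, text in documents.items():
        for n, piece in enumerate(chunk(text)):
            index.append({"source": source, "chunk": n, "text": piece, "vector": embed(piece)})
    return index


# Hypothetical documents: one long (90 words), one short.
docs = {"handbook.txt": "word " * 90, "notes.txt": "Short meeting note."}
index = build_index(docs)
print(len(index), "chunks indexed")  # 4 chunks indexed
```

Keeping the source and chunk number next to each vector is what later allows an answer to point back to the exact snippet it came from.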
Decoding user queries: From simple questions to intelligent responses
The graph below shows the process of understanding and responding to user queries, emphasizing the sophistication and intelligence behind generating relevant answers.
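A simplified version of the query side might look like the sketch below: the question is turned into a vector, compared with every indexed snippet by cosine similarity, and the best matches are returned together with their source so the interface can show where the answer came from. The word-count vectors stand in for a real embedding model, and the file names and snippets are invented for the example.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def vectorise(text: str) -> Counter:
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, index: List[Tuple[str, str]], top_k: int = 2) -> List[Tuple[float, str, str]]:
    """Return up to top_k (score, source, snippet) hits, best first."""
    q = vectorise(query)
    hits = [(cosine(q, vectorise(text)), source, text) for source, text in index]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [h for h in hits[:top_k] if h[0] > 0]


# Hypothetical index entries: (source, snippet).
index = [
    ("staffing.xlsx", "Java developers available from September to December"),
    ("menu.docx", "Lunch menu for the week"),
    ("projects.pptx", "Python developers booked until October"),
]
for score, source, snippet in search("Which Java developers are available?", index):
    print(f"{score:.2f}  {source}: {snippet}")
```

The returned snippets would then be passed to the language model as context, and the source names can be shown next to the generated answer.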
Robust security you can count on
Security is the number-one priority in the digital age, so Intelligent Search packs several robust measures.
It can be tailored to use fine-grained access control so that only users with permission to view the underlying documents can see search results. This is crucial if your organization handles sensitive data or needs information to be strictly compartmentalized.
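One simple way to picture this is a permission filter applied to search hits before they reach the user or the language model. The sketch below assumes each hit carries the set of groups allowed to read its source document; the `Hit` class, the group names, and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hit:
    source: str
    snippet: str
    allowed_groups: FrozenSet[str]  # Groups permitted to read the source document.


def visible_hits(hits: List[Hit], user_groups: FrozenSet[str]) -> List[Hit]:
    """Keep only hits whose source document the user is allowed to view."""
    return [h for h in hits if h.allowed_groups & user_groups]


hits = [
    Hit("salaries.xlsx", "Salary bands for 2023", frozenset({"hr"})),
    Hit("handbook.pdf", "Remote work policy", frozenset({"hr", "staff"})),
]
print([h.source for h in visible_hits(hits, frozenset({"staff"}))])  # ['handbook.pdf']
```

Filtering before the generation step matters: text the user may not read should never end up in the prompt, otherwise it could leak through the generated answer.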
All data is encrypted, both at rest and in transit, using advanced encryption standards. This not only protects against external threats but ensures that readable data isn’t exposed if there’s a breach.
Because our demo environment uses the Azure OpenAI Service, it benefits from Microsoft Azure’s robust infrastructure. This includes data encryption both at rest and in transit using FIPS 140-2 compliant 256-bit AES encryption. Azure also includes abuse monitoring to detect and mitigate potential misuse.
If Azure isn’t your preferred choice, our solution can be adapted to prioritize security considerations inherent to open-source models and on-premise hardware. This flexibility ensures that, regardless of your chosen implementation, data integrity and security remain uncompromised.
Flexible, versatile, and model-agnostic
Besides generative LLMs, there are also models for many other related use cases, such as semantic search. The best open-source text similarity models perform comparably to OpenAI’s offerings, their inference speed is high, and their hardware requirements are relatively low. A semantic search model can be combined seamlessly with a generative LLM to cover the entire Intelligent Search pipeline: the search model finds the most relevant documents, and the LLM generates a thorough answer based on them. It is also possible to use LLMs to develop chatbots that consider previous discussions when answering users’ questions.
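For the chatbot case, “considering previous discussions” usually comes down to sending recent turns along with the new question. The sketch below shows one possible way to do that with a fixed-size window; the `ChatMemory` class, the window size, the prompt layout, and the sample dialogue are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Tuple


class ChatMemory:
    """Keeps the most recent question/answer turns for follow-up questions."""

    def __init__(self, max_turns: int = 3) -> None:
        # A bounded deque drops the oldest turn automatically when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def prompt_for(self, question: str) -> str:
        """Prefix the new question with earlier turns so the model sees them."""
        lines = []
        for q, a in self.turns:
            lines.append(f"User: {q}")
            lines.append(f"Assistant: {a}")
        lines.append(f"User: {question}")
        lines.append("Assistant:")
        return "\n".join(lines)


memory = ChatMemory(max_turns=2)
memory.add("Which Java developers are available?", "Two are free from September.")
print(memory.prompt_for("And from October?"))
```

With the earlier turn in the prompt, a follow-up like “And from October?” can be resolved against the previous question instead of being treated in isolation.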
General-purpose LLMs can handle tasks as diverse as translation, summarization, and question answering on structured data, but there are also specialized open-source models for these and related tasks, as well as models that convert speech to text and vice versa if needed. This diversity of capabilities allows our product to cater to a wide range of tasks, ensuring versatility and utility across various applications.
What do I need to host Intelligent Search?
The hardware you need will depend on whether you’re hosting the model yourself or prefer to use a third-party API. If you’re hosting, then you need to decide whether this will be done on a cloud or on-premise platform. We have hands-on experience deploying our system on both Azure and AWS cloud platforms as well as on-premise. Each platform offers unique advantages, from scalability and performance to specific security features and regional data storage options.
If you’re not using an LLM as a service from a third-party provider like OpenAI, you’ll need to bear in mind that LLMs are generally trained on very large corpora and have billions of trainable parameters, which means running them takes a lot of computing power. To run most of them, access to GPU or TPU power is needed, as running on CPUs will be prohibitively slow in most cases.
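A quick back-of-the-envelope calculation shows why. The sketch below estimates the memory needed just to hold a model’s weights at 16-bit precision; the parameter counts are example sizes chosen for illustration, and real deployments need additional memory for activations and other runtime state.

```python
def weight_memory_gb(parameters: float, bits_per_weight: int = 16) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


# Example model sizes (assumed for illustration).
for name, params in [("7 billion", 7e9), ("70 billion", 70e9)]:
    print(f"{name} parameters at 16-bit: {weight_memory_gb(params):.0f} GB")
```

At 16 bits each weight takes two bytes, so a model with 7 billion parameters needs about 14 GB for its weights and one with 70 billion about 140 GB, which is beyond what a typical laptop or a single consumer GPU offers.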
Because of the considerable hardware requirements associated with LLMs, efforts have been made to reduce their demands without compromising their performance. One approach is to use quantized versions of these models. Quantization here means reducing numerical precision: floating-point values are approximated using fewer bits. This lowers the computational load and memory requirements of LLMs, but it comes with a trade-off in precision. Quantization may lose subtle details, potentially affecting the quality and accuracy of the generated text. Moving from 16-bit to 8-bit precision typically doesn’t result in a noticeable drop in output quality, while moving to a 4-bit version has a more substantial impact. The decision to deploy quantized LLMs should therefore be weighed carefully against the specific use case’s requirements for precision and performance.
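The precision trade-off can be seen on a handful of numbers. The sketch below applies simple symmetric uniform quantization, one of several possible schemes and not necessarily the one used by any particular model, to a few made-up weight values and measures how far the restored values drift from the originals at 8 and 4 bits.

```python
from typing import List


def quantize_roundtrip(values: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantisation to `bits` bits, then back to floats."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / levels or 1.0  # Avoid a zero scale.
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values: List[float], bits: int) -> float:
    """Average distance between the original and restored values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


weights = [0.82, -0.41, 0.07, -0.93, 0.25, 0.58, -0.12, 0.36]  # Made-up values.
for bits in (8, 4):
    print(f"{bits}-bit mean abs error: {mean_abs_error(weights, bits):.4f}")
```

The 4-bit error comes out clearly larger than the 8-bit one because only 15 distinct levels are available instead of 255. How much such per-weight error affects the generated text depends on the model and the quantization method, so this is an intuition aid rather than a quality benchmark.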
Even though some smaller LLMs can be evaluated for personal use on more powerful laptops, for industrial use the deployment of LLMs predominantly follows two primary pathways: on-premise deployment or cloud-based solutions. On-premise implementation involves configuring and hosting the models within your organization’s own infrastructure. This approach gives you greater control over data security, privacy, and compliance while also allowing for customization in line with your business requirements. Alternatively, cloud solutions involve leveraging the computational resources and infrastructure of third-party cloud providers. Cloud-based deployment offers scalability and flexibility, enabling you to swiftly adapt to varying workloads without having to invest heavily in hardware. It also relieves the burden of hardware management and maintenance, allowing your teams to focus on their core tasks. The choice between on-premise and cloud deployment hinges on factors like your security needs, operational preferences, and the available technological infrastructure.
Where can I try Intelligent Search?
Schedule a demo with us and we’ll guide you through the solution, tailored with your sample data if needed.