At the core of Neum AI is the ability to search. Once we go through the process of extracting, processing, and ingesting data into vector databases, we can use that data to power search capabilities. This could be a search bar on a website or Retrieval Augmented Generation for an LLM chatbot. In this blog, we will explore the end-to-end capabilities provided by Neum AI for search.
A primer on Semantic Search
Today, keyword-based and full-text search are not enough. The software industry is moving beyond keywords and fuzzy matching towards semantic search. Many vector databases have sprung up recently because they make it easy to store information and then retrieve it with natural-language queries, returning the results that are closest in meaning (semantically similar) to the query.
An example from Elastic:
“Consider “chocolate milk.” A semantic search engine will distinguish between “chocolate milk” and “milk chocolate.” Though the keywords in the query are the same, the order in which they are written affects the meaning. As humans, we understand that milk chocolate refers to a variety of chocolate, whereas chocolate milk is chocolate-flavored milk.”
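To see why keyword matching alone cannot make this distinction, here is a minimal Python sketch. It is an illustration only and is not how Neum AI or Elastic work internally; the `keyword_bag` helper is a hypothetical name introduced for this example. It reduces each phrase to a bag of keywords, which discards word order:

```python
from collections import Counter


def keyword_bag(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded, which is what a plain keyword match sees.
    """
    return Counter(text.lower().split())


chocolate_milk = keyword_bag("chocolate milk")
milk_chocolate = keyword_bag("milk chocolate")

# Both phrases contain exactly the same keywords, so a keyword-only
# comparison cannot tell them apart.
same_keywords = chocolate_milk == milk_chocolate
print(same_keywords)
```

A semantic search engine instead compares embeddings produced by a model that takes word order and context into account, so the two phrases end up as different vectors.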
In this blog post, we will show how to create embedding and indexing pipelines with Neum AI so that you can take advantage of semantic search over your data, whether for Retrieval Augmented Generation (think of a chatbot that needs constant context) or for search functionality within an application.
Getting started
We will start by installing the neumai Python SDK:
pip install neumai
For this example, we will use OpenAI embeddings and the Weaviate vector database as part of our pipeline configuration:
- OpenAI embeddings model, for which you will need an OpenAI API key. To get an API key, visit OpenAI. Make sure you have configured billing for the account.
- Weaviate vector database, for which you will need a Weaviate Cloud Service URL and API key. To get a URL and API key, visit Weaviate Cloud Service.
Configure a simple pipeline
We will start with a pipeline configured using the Neum AI framework. The pipeline will extract data from a website, process it and drop it into a vector database. We will configure it using the OpenAI and Weaviate credentials. You can further customize and configure your desired pipeline using our components.
Once we have the pipeline, we will run it for the first time to populate our vector database. The pipeline can be triggered again whenever the data needs to be updated (for example, when the website changes or, if using document stores, when new documents are added).
We now have a populated vector database and a pipeline configuration. Let's now use the pipeline configuration to search that vector database and retrieve data.
Search data
Neum AI has built-in search methods on the pipeline object. They abstract the process of taking a text-based query, translating it into a vector, and running that vector as a query against the underlying vector database.
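To make that flow concrete, here is a minimal, self-contained sketch of the three steps (embed the query, compare it with the stored vectors, return the top matches). It does not use the Neum AI SDK: `toy_embed`, `ToyVectorStore`, and `ToyResult` are hypothetical stand-ins, and the "embedding" is just a word count over a tiny fixed vocabulary, whereas a real pipeline would call an embedding model and a vector database.

```python
import math
from dataclasses import dataclass, field

# Hypothetical fixed vocabulary; a real embedding model needs nothing like this.
VOCAB = ["pipeline", "vector", "search", "chocolate", "milk"]


@dataclass
class ToyResult:
    id: str
    score: float
    metadata: dict = field(default_factory=dict)


def toy_embed(text):
    """Stand-in for an embedding model: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # list of (id, vector, metadata) tuples

    def add(self, doc_id, text):
        self._rows.append((doc_id, toy_embed(text), {"text": text}))

    def search(self, query, number_of_results=3):
        # 1. translate the text query into a vector
        query_vector = toy_embed(query)
        # 2. score every stored vector against the query vector
        scored = [
            ToyResult(id=doc_id, score=cosine(query_vector, vector), metadata=meta)
            for doc_id, vector, meta in self._rows
        ]
        # 3. return the closest matches first
        scored.sort(key=lambda result: result.score, reverse=True)
        return scored[:number_of_results]


store = ToyVectorStore()
store.add("doc-1", "vector search over a pipeline")
store.add("doc-2", "chocolate milk recipe")
top = store.search("how does vector search work", number_of_results=1)
print(top[0].id, round(top[0].score, 3))
```

Production vector databases typically replace the exhaustive scan in step 2 with an approximate nearest-neighbor index, but the shape of the request and response stays the same.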
A search returns its results as NeumSearchResult objects. Each NeumSearchResult contains the id of the retrieved vector, the similarity score, and the metadata, which includes the content that was embedded to generate the vector as well as any other metadata that was stored with it.
Output:
These results can be used to generate context for an LLM application or be presented to the user as search results. For example, they can be processed into a context string:
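One way to do this is sketched below. It is a minimal illustration under stated assumptions: `SearchResultStandIn` is a hypothetical class that mirrors only the three fields described above (id, score, metadata), and the `"text"` metadata key is an assumption, so check your own results for the key that actually holds the embedded content.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResultStandIn:
    """Hypothetical stand-in with the fields described above."""
    id: str
    score: float
    metadata: dict = field(default_factory=dict)


def build_context(results, text_key="text", min_score=0.0):
    """Join the retrieved passages into one context string, best match first.

    Results below min_score or without the text key are skipped.
    """
    ordered = sorted(results, key=lambda r: r.score, reverse=True)
    passages = [
        r.metadata[text_key]
        for r in ordered
        if r.score >= min_score and text_key in r.metadata
    ]
    return "\n\n".join(passages)


results = [
    SearchResultStandIn("a", 0.71, {"text": "Pipelines sync data into a vector store."}),
    SearchResultStandIn("b", 0.93, {"text": "Search embeds the query and finds close vectors."}),
    SearchResultStandIn("c", 0.12, {"url": "https://example.com"}),  # no text, skipped
]

context = build_context(results, min_score=0.5)
prompt = f"Answer using only the context below.\n\n{context}\n\nQuestion: How does search work?"
print(prompt)
```

Filtering on a minimum score keeps weak matches out of the prompt; the right threshold depends on the embedding model and distance metric in use.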
Next, we will explore the additional search capabilities provided when we take a local pipeline and deploy it to Neum AI Cloud.
Deploy to Neum AI
Using the NeumClient, we can take the same pipeline configuration we used locally and deploy it to the managed cloud. To deploy it, you will need a Neum AI API key, which you can get by signing up for Neum AI at dashboard.neum.ai and going to the settings page.
Then, using the NeumClient, we can deploy the pipeline with the create_pipeline method. Simply pass it the Pipeline object.
Save the pipeline_id that is returned, as you will need it to interact with the pipeline moving forward.
Once deployed, you can go to dashboard.neum.ai/mypipelines/<pipeline_id> to check the status or use the get_pipeline() method in the SDK.
Search Pipeline
Once the pipeline is deployed, you can query it directly with the search_pipeline() method on the NeumClient.
Alternatively, you can query pipelines from the UX or through the REST APIs.
You can leverage the search results in the same way as shown earlier for either search bar capabilities or as context for an LLM application.
One key capability to highlight is the track property. It is currently available only through the Neum AI Cloud and lets you track the queries and retrieved responses that flow through your pipeline. This means you can see what your users are querying for and what data is being returned to them. You can get a dump of all the captured retrievals through the REST APIs.
Provide feedback on retrieval quality
One handy feature the Neum AI Cloud provides out of the box is the ability to give feedback (good / bad) on retrieved data. This feedback can be captured either through the Neum Dashboard or through the REST APIs. You can export this data to use for fine-tuning as well as to improve the pipeline.
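As a rough idea of what working with exported feedback could look like, the sketch below turns positively rated retrievals into JSON Lines that a fine-tuning job might consume. The record shape (`query`, `retrieved_text`, `feedback`) and the `to_training_pairs` helper are assumptions made for illustration; they are not the actual export format of Neum AI Cloud.

```python
import json

# Hypothetical exported feedback records; the real export format may differ.
feedback_records = [
    {"query": "How do I deploy a pipeline?",
     "retrieved_text": "Use create_pipeline on the NeumClient.",
     "feedback": "good"},
    {"query": "How do I deploy a pipeline?",
     "retrieved_text": "Weaviate is a vector database.",
     "feedback": "bad"},
]


def to_training_pairs(records):
    """Keep only retrievals rated 'good' as (query, positive passage) pairs."""
    return [
        {"query": r["query"], "positive": r["retrieved_text"]}
        for r in records
        if r["feedback"] == "good"
    ]


pairs = to_training_pairs(feedback_records)
# One JSON object per line, a common input format for fine-tuning jobs.
jsonl = "\n".join(json.dumps(pair) for pair in pairs)
print(jsonl)
```

Retrievals rated "bad" are dropped here, but they could just as well be kept as negative examples or used to spot chunks that should be re-processed in the pipeline.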