Objective

In this hands-on workshop, we will learn how to connect ChatGPT to proprietary data stores using Elasticsearch and build question/answer capabilities for your data. In the demo, we will quickly convert your website, FAQ, or any documentation into a chat prompt where your users can directly ask questions about your data.

Flow

ChatGPT with Elasticsearch

Prerequisites

  1. You have used ChatGPT :)

  2. Good to have an understanding of Elasticsearch (not mandatory; an introduction will be covered).

  3. A system with an internet connection.

  4. An OpenAI account with an API key. Create a new one from https://platform.openai.com/account/api-keys . Make sure it has free credits available.

Without local setup

  1. A Google account to use Google Colab.

  2. A Render account.

Local setup

  1. Git - Install it from https://git-scm.com/downloads

  2. Docker - Good to have. Install it from https://docs.docker.com/engine/install/ .

  3. Basic Python knowledge will be helpful.

For the workshop, we are going to follow the path without local setup.

1. Setup cluster

  1. Visit cloud.elastic.co and sign up.

  2. Click on Create deployment. In the pop-up, you can change the settings or leave the defaults.

  3. We need to add a machine learning instance. For that, simply click on “Advanced settings”.

  4. Go to “Machine Learning instances” -> click on “Add Capacity” and select at least 4 GB of RAM capacity.

  5. Finally, click on “Create deployment”.

  6. Download / copy the deployment credentials.

  7. Once the deployment is ready, click on “Continue” (or click on Open Kibana). It will redirect you to the Kibana dashboard.
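Optionally, you can verify the credentials before moving on. Below is a minimal sketch using the official elasticsearch Python client; the Cloud ID is shown on the deployment’s “Manage” page, and the password comes from the credentials you just saved.

# pip install elasticsearch
from elasticsearch import Elasticsearch

# Placeholders: use the Cloud ID and password saved in step 6.
client = Elasticsearch(
    cloud_id="<cloud_id>",
    basic_auth=("elastic", "<elastic_cloud_password>"),
)

# Prints the cluster name and version if the credentials are correct.
print(client.info())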

2. Deploy Model

Go to the Kibana panel. Navigate to Menu -> Machine Learning (in the Analytics section). In the left menu, click on Trained Models (in the Model Management section).

  1. ELSER can be found in the list of trained models.
  2. Click the Download model button under Actions.
  3. After the download is finished, start the deployment by clicking the Start deployment button.
  4. Provide a deployment ID, select the priority, and set the number of allocations and threads per allocation values.
  5. Click Start.
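Steps 2-5 above can equivalently be done through the trained models API. Here is a minimal sketch with the Python client, reusing the client connection from the sketch in step 1; the parameter values mirror the UI fields, and wait_for="started" simply blocks until the deployment is up.

# Trigger the ELSER download (same as the Download model button).
client.ml.put_trained_model(
    model_id=".elser_model_1",
    input={"field_names": ["text_field"]},
)

# Start the deployment (same as the Start deployment button).
client.ml.start_trained_model_deployment(
    model_id=".elser_model_1",
    number_of_allocations=1,
    threads_per_allocation=1,
    wait_for="started",
)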

Third party model

We are going to use the all-distilroberta-v1 model hosted on Hugging Face. Let’s import it into the Elastic cluster using eland.

Get your credentials ready

  • cloud_id: Visit “cloud.elastic.co ” -> Navigate to your deployment and click on “manage”. Simply copy the Cloud ID and save it.
  • cloud_user: elastic
  • cloud_password: You will get it from step 1.6. If you forgot to save it, simply click on “Actions” -> “Reset password”. (The username will remain elastic.)
  • hf_model_id: sentence-transformers/all-distilroberta-v1 (go to the model page on Hugging Face and copy the ID sentence-transformers/all-distilroberta-v1)

Now there are two ways to upload the model: using Google Colab or using Docker.

For Colab, simply click on the link below. It will open a ready-made notebook. You just need to click on the play button to run the notebook.

Open In Colab

Using Docker

  1. We’re going to use Docker to import the model into the Elastic cluster.

    1. Clone eland and enter the folder:

       git clone https://github.com/elastic/eland.git
       cd eland

    2. Build the Docker image:

       docker build -t elastic/eland .

    3. Import the model into your cluster:

       docker run -it --rm elastic/eland eland_import_hub_model \
           --cloud-id <cloud_id> \
           -u elastic -p <elastic_cloud_password> \
           --hub-model-id sentence-transformers/all-distilroberta-v1 \
           --task-type text_embedding \
           --start

    4. Wait till the model gets uploaded without any errors.

    5. Exit the eland folder:

       cd ..

Verify uploaded model

Go to the Kibana panel. Navigate to Menu -> Machine Learning (in the Analytics section). In the left menu, click on Trained Models (in the Model Management section). You should see your model here in the “Started” state.

If a warning message is displayed at the top of the page saying “ML job and trained model synchronization required”, follow the link to synchronize your jobs and trained models, then click Synchronize.
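You can also check deployment states programmatically. A minimal sketch, reusing client from the connection example in step 1:

# Print each trained model's deployment state; the imported model
# should show up as "started".
stats = client.ml.get_trained_models_stats()
for model in stats["trained_model_stats"]:
    deployment = model.get("deployment_stats", {})
    print(model["model_id"], deployment.get("state", "not deployed"))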

3. Crawling private data

  1. Click on Menu -> Enterprise Search -> the “Create an Elasticsearch index” button.
  2. Click on Web crawler.
  3. Add an index name (the prefix search- will be added automatically) and hit “Create index”. In my case the index name is search-ashish.one.
  4. Go to “Pipelines” to create a pipeline.
  5. Click “Copy and customize” in the Ingest Pipeline box.
  6. Click “Add Inference Pipeline” in the Machine Learning Inference Pipelines box.
  7. Give the pipeline a unique name, e.g. “ml-inference-ashish-one”.
  8. Select a trained ML model from the dropdown: “sentence-transformers__all-distilroberta-v1” (for ELSER, choose “.elser_model_1”).
  9. Select “title” as the source field and set “title-vector” as the destination. You can specify your own destination field name. (In the case of ELSER, just select the source field, e.g. title or body_content.)
  10. Click on “Continue” and move to the Test (optional) tab. Click on “Continue” again.
  11. At the Review stage, click on “Create pipeline”.
  12. (Skip this for ELSER.) Go to Menu -> Management -> Dev Tools and create a mapping:
POST <index_name>/_mapping
{
  "properties": {
    "<vector_field_name>": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "dot_product"
    }
  }
}

In my case, the mapping will be:

POST search-ashish.one/_mapping
{
  "properties": {
    "title-vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "dot_product"
    }
  }
}

Paste the above query in the console and hit the play button.
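The dims value must match the embedding size the model produces, which is 768 for all-distilroberta-v1. To confirm it yourself, here is a quick sketch against the inference API, reusing client from step 1:

# Run a test inference; the length of the returned vector is the
# value to use for "dims" in the mapping above.
response = client.ml.infer_trained_model(
    model_id="sentence-transformers__all-distilroberta-v1",
    docs=[{"text_field": "hello world"}],
)
print(len(response["inference_results"][0]["predicted_value"]))  # 768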

  13. Go to Enterprise Search -> Indices -> your_index_name -> Manage Domains. Enter the domain to crawl (e.g. https://ashish.one ; you can add your own domain) and hit “Validate Domain”.
  14. If everything is fine, simply click on “Add domain” and start crawling by clicking on Crawl -> Crawl all domains on this index.
  15. Go to Enterprise Search -> Indices. You should see your index name.
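Once the crawl finishes, the inference pipeline embeds each page title into title-vector, and you can query it with a kNN search. A minimal sketch, reusing client from step 1 and the index/field names from this section (the query text is illustrative):

# Embed the question with the same deployed model and return the
# closest pages by vector similarity.
results = client.search(
    index="search-ashish.one",
    knn={
        "field": "title-vector",
        "k": 5,
        "num_candidates": 50,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "sentence-transformers__all-distilroberta-v1",
                "model_text": "How do I deploy a model?",
            }
        },
    },
    fields=["title"],
)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["fields"]["title"][0])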

4. Setup Interface

Get your credentials ready

  1. cloud_id: Visit “cloud.elastic.co ” -> Navigate to your deployment and click on “manage”. Simply copy the Cloud ID and save it.
  2. cloud_user: elastic
  3. cloud_password: You will get it from step 1.6. If you forgot to save it, simply click on “Actions” -> “Reset password”. (The username will remain elastic.)
  4. openai_api: Create an OpenAI API key from https://platform.openai.com/account/api-keys .
  5. es_index: The index name we created in step 3.3 (search-ashish.one).
  6. vector_field: The field we set as the destination in step 3.9, i.e. title-vector.
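Under the hood, the interface ties these pieces together: it runs the kNN retrieval from step 3 and passes the best match to the OpenAI chat API as context. Below is a rough sketch of that flow, not the exact repo code; client is the Elasticsearch connection from step 1, and the prompt wording and model name are illustrative.

# pip install openai
from openai import OpenAI

openai_client = OpenAI(api_key="<open_api_key>")

def ask(question: str) -> str:
    # 1) Retrieve the best-matching page from the crawled index.
    results = client.search(
        index="search-ashish.one",
        knn={
            "field": "title-vector",
            "k": 1,
            "num_candidates": 20,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": "sentence-transformers__all-distilroberta-v1",
                    "model_text": question,
                }
            },
        },
        fields=["title", "body_content"],
    )
    top = results["hits"]["hits"][0]["fields"]

    # 2) Ask ChatGPT to answer using only the retrieved page as context.
    prompt = (
        "Answer the question using only this context:\n"
        f"{top['body_content'][0]}\n\nQuestion: {question}"
    )
    chat = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat.choices[0].message.content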

Setup locally with Docker

  1. Clone the repo:

git clone https://github.com/ashishtiwari1993/elasticsearch-chatgpt.git
cd elasticsearch-chatgpt

  2. Replace the credentials in the Dockerfile

Open the Dockerfile and change the creds below:

ENV openai_api="<open_api_key>"
ENV cloud_id="<elastic cloud id>"
ENV cloud_user="elastic"
ENV cloud_pass="<elastic_cloud_password>"
ENV es_index="<elasticsearch_index_name>"
ENV chat_title="<Any title for your page e.g. ashish.one GPT>"
ENV vector_field="<vector field where embeddings are saved, e.g. title-vector>"

  3. Build:

docker build -t es-gpt .

  4. Run:

docker run -p 8501:8501 es-gpt

Simply visit localhost:8501.

Setup on Render with Docker

  1. Sign up on https://render.com .

  2. Create a Web Service.

  3. Go to the Public Git repository section and add the repo URL below:

https://github.com/ashishtiwari1993/elasticsearch-chatgpt

Hit Continue.

  4. Add a Name and select the Free instance type.

  5. Click on Advanced and add the environment variables:

openai_api="<open_api_key>"
cloud_id="<elastic cloud id>"
cloud_user="elastic"
cloud_pass="<elastic_cloud_password>"
es_index="<elasticsearch_index_name>"
chat_title="<Any title for your page e.g. ashish.one GPT>"
vector_field="<vector field where embeddings are saved, e.g. title-vector>"

  6. Finally, click on Create Web Service.

Output

ashish.one ChatGPT

Reference

Blog - ChatGPT and Elasticsearch: OpenAI meets private data