Quick Start

Let's see how we can quickly use citrus to index texts and perform semantic search!

We'll focus only on the core citrusdb functions. In production, you'd likely run the same calls on a server; tutorials for that are coming soon!

Install the library

# Install via pip
pip install --upgrade citrusdb

Create an index

Before we can insert vectors, we need to create an index.

main.py
import citrusdb

client = citrusdb.Client(persist_directory="citrus")

client.create_index(
    name="my_index",                # index name
    dimensions=1536,                 # vector dimension
    max_elements=1000,
    allow_replace_deleted=True
)

This creates a client for interacting with citrusdb, then creates an index named my_index. Check out the API reference later to see which other options you can tune.

We're also choosing to persist our data inside the citrus directory. This keeps the data saved on disk, so citrusdb can load the index from there in the future.

Insert elements

ids = [0, 1, 2]
documents = [
  "Your time is limited, so don't waste it living someone else's life",
  "I'd rather be optimistic and wrong than pessimistic and right.",
  "Running a start-up is like chewing glass and staring into the abyss."
]

client.add(
  index="my_index",
  ids=ids,
  documents=documents
)

If you pass a list of strings, as we did here, make sure OPENAI_API_KEY is set in your environment: by default, citrusdb uses OpenAI to generate the embeddings. Please reach out if you're looking for support for a different provider (or submit a PR 😉)!
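
Since string documents are embedded through OpenAI, a quick check before the first insert can surface a missing key early. This is a minimal sketch; the helper name require_openai_key is hypothetical and not part of citrusdb.

```python
import os


def require_openai_key(env=os.environ):
    """Return the OpenAI API key, or raise a clear error if it is not set.

    Hypothetical helper, not part of citrusdb.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before adding documents as strings."
        )
    return key
```

Calling require_openai_key() once at startup, before the first client.add with documents, is one way to use it.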

You can also pass vector embeddings directly, for example if you're using a different embedding model. Make sure the index was created with the same dimension as your embeddings.

# vectors: one embedding per ID, each with the same length as the index dimension
client.add(
    index="my_index",
    ids=ids,
    embeddings=vectors
)
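
A mismatch between the vector length and the index dimension is easy to introduce, so it can help to validate before inserting. This is a minimal sketch in plain Python. check_embeddings is a hypothetical helper (not part of citrusdb), and the random vectors are placeholders, not real embeddings.

```python
import random


def check_embeddings(ids, vectors, dimensions):
    """Raise ValueError unless there is one vector of length `dimensions` per ID.

    Hypothetical helper, not part of citrusdb.
    """
    if len(ids) != len(vectors):
        raise ValueError(f"got {len(ids)} ids but {len(vectors)} vectors")
    for id_, vector in zip(ids, vectors):
        if len(vector) != dimensions:
            raise ValueError(
                f"vector for id {id_} has length {len(vector)}, expected {dimensions}"
            )


# Placeholder vectors standing in for real embeddings (illustration only).
ids = [0, 1, 2]
vectors = [[random.random() for _ in range(1536)] for _ in ids]

check_embeddings(ids, vectors, dimensions=1536)  # passes silently
```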

After inserting elements, we can run a semantic search to retrieve similar documents:

query_document = "What is it like to launch a startup"
results = client.query(
    index="my_index",
    documents=[query_document],
    k=1,
    include=["document"]
)

for quotes_list in results:
    for quote in quotes_list:
        print(quote["document"])

# Output:
# Running a start-up is like chewing glass and staring into the abyss.

Use the include argument to choose whether the associated text document is returned. By default, only the IDs are returned.
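
The nested loop above implies that results contains one list per query document, with one dict per match. Assuming that shape, this small sketch flattens the matches into plain lists of document texts. sample_results is hand-written for illustration and is not real citrusdb output; the "id" key in it is an assumption.

```python
def documents_per_query(results):
    """Return, for each query, the list of matched document texts.

    Assumes `results` is a list (one entry per query) of lists of dicts,
    where a dict carries a "document" key only if it was requested.
    """
    return [
        [match["document"] for match in matches if "document" in match]
        for matches in results
    ]


# Hand-written stand-in for a query response (illustration only).
sample_results = [
    [{"id": 2, "document": "Running a start-up is like chewing glass and staring into the abyss."}]
]

print(documents_per_query(sample_results))
```

Matches without a "document" key are skipped, so the helper also tolerates responses where only IDs were returned.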

Sweet! Now it's time for you to build your thing.
