Motivation

This post describes my final project for the Full Stack Deep Learning 2021 Course organized and taught by Sergey Karayev and Josh Tobin.

Problem Statement

Given a dataset such as the Form Understanding in Noisy Scanned Documents (FUNSD) dataset, which provides 199 fully annotated document images, would it be possible to train a machine learning model to generate similar images together with their corresponding annotations? In other words, can we automatically expand this dataset from 199 annotated images to several thousand samples without human intervention?

In many cases, the training dataset needed to build models for document understanding must include every piece of text in the image, its surrounding bounding box coordinates, its relationship with other texts in the image, and other similar metadata. This detailed metadata is usually produced by human annotators, which is costly and slow, as there are usually many layouts and document types to work through.

By generating synthetic document images with annotations, we can significantly reduce the time and effort needed to annotate documents for training document understanding models.

The idea is to provide synthetic document generation as a service, via a REST API which receives a few key input parameters, such as the number of samples to be generated, and returns a serialized representation of the annotations (JSON) and images (base64-encoded).

Machine Learning Approach

The gist of the project is to use a language model to generate JSON annotation payloads, which can then be rendered into document images, giving us fully annotated new documents. The ideal model would capture the layout and text of the training set encoded in the JSON annotations, and then generate multiple instances of similar documents as well as variations of document layouts. Rendering the document image from the generated JSON annotations requires no ML techniques, as it only involves placing the corresponding text at the given locations, according to the synthetic annotation metadata.
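
For instance, the rendering step can be a simple loop over the generated blocks. Here is a minimal sketch using Pillow (the helper name and the image size are my own; the block format follows the FUNSD annotations):

from PIL import Image, ImageDraw

def render_document(blocks, width=800, height=1000):
    # Render annotation blocks into a document image; no ML involved.
    image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(image)
    for block in blocks:
        left, top, right, bottom = block["box"]
        # Place the block text at the top-left corner of its bounding box.
        draw.text((left, top), block["text"], fill="black")
    return image

blocks = [
    {"text": "DATE:", "box": [50, 40, 120, 60], "linking": [], "label": "question", "id": 0},
    {"text": "03/21/92", "box": [130, 40, 220, 60], "linking": [[0, 1]], "label": "answer", "id": 1},
]
render_document(blocks).save("synthetic_doc.png")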

The core model architecture and training loop are an adaptation of the code for Unconditioned Surname Generation from the book Natural Language Processing with PyTorch by Delip Rao and Brian McMahan. (Highly recommended read!)

The basic architecture is defined as:

  • One embedding layer (embedding dimension 100)
  • One Gated Recurrent Unit (GRU) layer with a hidden dimension of 100
  • One fully connected layer with dropout of 0.5
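
As a rough PyTorch sketch of this architecture (the class name and vocabulary handling are illustrative, not the exact project code):

import torch.nn as nn

class AnnotationLanguageModel(nn.Module):
    # Character-level LM: embedding -> GRU -> dropout -> linear head.
    def __init__(self, vocab_size, embedding_dim=100, hidden_dim=100, dropout_p=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout_p)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) integer token indices.
        output, hidden = self.gru(self.embedding(x), hidden)
        # Predict a distribution over the next character at each position.
        return self.fc(self.dropout(output)), hidden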

This architecture only represents the annotation language model. In order to provide the final end-to-end service which produces annotations and images, a model wrapper was needed to sample the language model, validate the generated JSON annotations, and render and serialize the final document images.

[Figure: model wrapper]

MLOps Approach

Given the focus of the course, the end-to-end operational aspects of the project are equally important. For this reason, there is a strong emphasis on:

  • Traceability: Being able to trace model versions, datasets, and code artifacts precisely, so they can be recovered as needed.
  • Repeatability: Being able to bring up the whole environment and rewind to a given point in time, including infrastructure, code, data, and models; repeat experiments and build on top of previous ones.
  • Automation: Embrace everything-as-code, avoid manual intervention and console access, and aim to script each step of the ML workflow.
  • Rapid Development: With automation around all aspects of the workflow, repetitive steps become easier and faster.
  • CPU and GPU: Definitely not required for this small problem, but my goal was to be able to seamlessly train models both locally (CPU) and remotely (AWS GPU instances).

The technology stack to achieve these goals is:

  • Terraform: Lets us define the infrastructure, including but not limited to network setup, training instances, databases, and security groups, as a declarative manifest which effectively behaves like code, giving us the chance to tear down and reconstruct the infrastructure as needed.
  • Ansible: Lets us run typical Unix commands on a provisioned AWS EC2 instance to configure it properly, clone our code, and run our training tasks.
  • DVC: Used in this project only to source-control datasets and model artifacts via GitHub, using AWS S3 as the artifact store.
  • MLflow: Used in this project only for model tracking and model deployment. It lets us keep track of model training runs, capturing key hyper-parameters and learning curves for traceability, and package our models to run them locally and on SageMaker.
  • PyTorch: The DL library used to implement the core ML model, the training loop, and the supporting Dataset classes.
  • SageMaker: The service runtime hosting our models for inference.
  • AWS S3: Used to store datasets and model artifacts.

Data Preprocessing

The original FUNSD dataset comes with pairs of document images and annotations, as shown here:

[Figure: a FUNSD document image and its raw annotations]

The initial FUNSD dataset was processed in the following way:

  • The annotations are naturally broken into block units, each containing one piece of text, its bounding box, an id, and links to other texts.
  • In order to reduce the length of the training sequences, these blocks were pre-processed to remove redundant properties such as the words collection.
  • The blocks were flattened into single-line strings so the recurrent neural network can process them as sequences (see the sketch after this list).

[Figure: pre-processed, flattened annotations]

  • The whole dataset was split into train, validation, and test sets and saved into a single CSV file:
      Train   Validation   Test
      5096    2315         2332
  • The raw and pre-processed datasets were added to DVC for tracking and pushed to external storage (S3).
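
A minimal sketch of these preprocessing steps, assuming the raw FUNSD block format (the words collection is one of the dropped properties):

import json

def flatten_block(block):
    # Keep only the essential properties and flatten into a single-line string.
    kept = {key: block[key] for key in ("text", "box", "linking", "label", "id")}
    return json.dumps(kept)

raw_block = {
    "text": "Sample:", "box": [84, 109, 136, 119], "label": "question", "id": 0,
    "linking": [], "words": [{"text": "Sample:", "box": [84, 109, 136, 119]}],
}
print(flatten_block(raw_block))
# {"text": "Sample:", "box": [84, 109, 136, 119], "linking": [], "label": "question", "id": 0}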

Model Training

The model was trained on the whole pre-processed FUNSD dataset. Only the JSON annotations were used as features to train the model. Here I describe a few of the key hyper-parameters and training aspects:

  • Each single-line JSON annotation or “sentence” was wrapped with start and end tokens.
  • Sentences were batched into groups of 32, using a masking token for padding (see the sketch after this list).
  • The sentences were not shuffled, to preserve the original sequence of the text blocks and the initial page layout.
  • Training was stopped early after five consecutive epochs without a drop in the validation error.
  • A maximum of 300 epochs was used.
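
As a sketch of the first two points, each sentence is framed with special tokens and padded to a common length (the token names are illustrative):

def vectorize(sentence, char_to_idx, max_len):
    # Wrap a single-line JSON "sentence" with start/end tokens, pad with the mask token.
    indices = [char_to_idx["<BEGIN>"]]
    indices.extend(char_to_idx[ch] for ch in sentence)
    indices.append(char_to_idx["<END>"])
    indices.extend([char_to_idx["<MASK>"]] * (max_len - len(indices)))
    return indices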

As recommended in the course, the model training was performed in incremental steps: first locally with a small dataset, which provided the opportunity to debug code issues and check the general trajectory of the optimization process; later, remotely on an AWS EC2 GPU instance. Given the goal of automating everything, this step took significant effort, as I decided to automate the full setup of the remote machine, the execution of the training modules, and the capture and storage of the trained model artifacts. To run the training remotely, the following command was used:

ansible-playbook train_gru_remote.yaml --ask-vault-pass

This Ansible playbook allowed me to automate the following tasks on the remote machine:

  • Configure AWS and GitHub credentials
  • Clone the project
  • Run the training script
  • Push the model artifacts via DVC

I could literally terminate the GPU instance one day, come back the next day, reprovision it using Terraform, and retrain the model with a few commands. This reduced cost by keeping resources around only while needed, while adding no delay to the iterative process of training and improving the model.

MLflow provided a good way to track all the training runs, for example:

[Figure: MLflow list of runs]

It also allows us to drill down into specific runs and check training parameters and metrics:

[Figures: MLflow run parameters and metrics]

We can also check the learning curves:

[Figures: MLflow learning curves]

As well as compare different runs:

[Figure: MLflow run comparison]
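
Capturing these runs takes only a few calls to the MLflow tracking API inside the training script; a minimal sketch (the parameter and metric names are mine, and run_one_epoch is a hypothetical stand-in for the real training step):

import mlflow

def run_one_epoch():
    # Hypothetical stand-in for one pass of training plus validation.
    return 0.68, 0.78

with mlflow.start_run():
    mlflow.log_param("embedding_dim", 100)
    mlflow.log_param("hidden_dim", 100)
    for epoch in range(300):
        train_loss, val_loss = run_one_epoch()
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)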

Model Evaluation

The trained model was evaluated on the accuracy of predicting the next character of the JSON annotation, which can be summarized as:

              Train    Validation    Test
Loss          0.681    0.777         0.819
Accuracy (%)  77.72    76.07         74.82

Overall, the model can still be improved significantly, but at the same time it does not seem to suffer from under- or over-fitting, and it shows a relatively good ability to generalize.

The model was also evaluated by inspecting generated JSON annotations, which seemed to mirror the structure of the training dataset. Here are a few generated JSON annotations:

{
   "text": "Incrur:", "box": [255, 737, 356, 868],
   "linking": [], "label": "question", "id": 41
}
{
   "box": [175, 717, 223, 740], "text": "Comer",
   "label": "question", "linking": [[4, 7]], "id": 3
}
{
   "text": "IN TOBIC", "box": [587, 241, 653, 285],
   "linking": [], "label": "header", "id": 7
}
...

Model Deployment & Serving

Regarding model packaging, deployment, and serving, MLflow took care of a lot of the plumbing, among other things:

  • The packaging of both the annotation language model and the model wrapper, as well as the local and remote deployment, exposing a REST endpoint for easy consumption:
mlflow models serve -p 5001 \
   -m model_artifacts/saved_model_baseline_wrapper
  • It also took care of the standard model input/output formats, serialization and deserialization, and input validation.

  • It allowed me to create an AWS SageMaker-compatible Docker image from the trained model and register it in AWS ECR:

mlflow sagemaker build-and-push-container
  • It allowed me to test the AWS SageMaker-compatible Docker image locally as a sanity check. This is super useful for finding issues in the dependencies and the inference code path:
mlflow sagemaker run-local \
   -m model_artifacts/saved_model_baseline_wrapper
  • It also allowed me to deploy the model image to AWS SageMaker:
mlflow sagemaker deploy -a fsdl-synth-docs \
   -m model_artifacts/saved_model_local_wrapper -e arn:aws:iam::<rest-of-execution-role>

In this regard, MLflow removed a lot of the operational overhead, and turned out to be very portable from local to remote deployments.
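
Once the model is served, it can be queried over plain HTTP. As an illustration against the local endpoint above, assuming the wrapper takes the number of samples as its only input (the payload schema is hypothetical; the format below is the MLflow 1.x pandas-split convention):

import requests

payload = {"columns": ["num_samples"], "data": [[5]]}  # hypothetical input schema
response = requests.post(
    "http://127.0.0.1:5001/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(response.json())  # JSON annotations and base64-encoded images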

Analysis

Here are a few examples of the outputs of the model:

   {
      "box": [677, 639, 749, 669], "text": ".44", 
      "label": "answer", "linking": [[94, 16]], "id": 35
   }, 
   {
      "box": [533, 197, 617, 256], "text": "(P2/ 95", 
      "label": "answer", "linking": [[31, 3]], "id": 3
   }, 
[Figures: synthetic image, and synthetic image with blocks]
   {
      "text": "Code Speciop", "box": [49, 975, 160, 799], 
      "linking": [], "label": "question", "id": 28
   }, 
   {
      "box": [121, 59, 252, 25], "text": "02328620", 
      "label": "question", "linking": [[5, 47]], "id": 5
   }
[Figures: synthetic image, and synthetic image with blocks]

Looking more closely at the synthetic annotations and images, I noticed:

  • It generates many instances of well-formed JSON annotations. This indicates the LM was able to learn key structures like:
    • Opening and closing brackets: “{, }, [, ]”
    • Quoted strings, e.g. “text”
    • Collections of four integers, e.g. “[12, 34, 23, 78]”
    • Even unicode text was generated in some instances.
  • The model was also able to learn the basic JSON properties of the annotations: box, text, id, linking, label.
  • The final synthetic JSON annotations were required not only to be well-formed JSON, but also to include the key properties (box, text, id) of a valid annotation.
  • After applying all validations, between 30% and 35% of the synthetic annotations were fully valid.
  • By using only the valid generated annotations and discarding the rest, I was able to avoid any post-processing heuristics to complete or fix them (see the sketch after this list).
  • It generates completely random JSON annotations, which is perfect for generating synthetic data.
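
The validation step itself can be as simple as trying to parse each generated line and checking for the required properties; a minimal sketch, assuming one generated annotation per line:

import json

REQUIRED_KEYS = {"box", "text", "id"}

def is_valid_annotation(line):
    # Keep only lines that parse as JSON objects and carry the key properties.
    try:
        block = json.loads(line)
    except json.JSONDecodeError:
        return False
    return isinstance(block, dict) and REQUIRED_KEYS <= block.keys()

samples = ['{"text": "IN TOBIC", "box": [587, 241, 653, 285], "id": 7}', '{"text": "broken']
valid = [s for s in samples if is_valid_annotation(s)]  # keeps only the first sample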

Other aspects on which the model did not perform well:

  • Generates poor bounding boxes; in most cases:
    • Too wide for the corresponding text.
    • Too tall for the corresponding text.
    • The sequence of coordinates left, top, right, bottom was not respected, causing the bounding box to flip.
  • After inspecting the ground-truth dataset, I noticed the presence of overly wide or tall bounding boxes there as well.
  • Text is sometimes placed on top of other text, and does not follow a good document layout.
  • The actual generated document text is not valid English.

Learning & Conclusions

The project was useful as it allowed me to:

  • Take a high-level idea and see it all the way through to a deployed API.
  • Experience first-hand that end-to-end ML-based services involve many more components and aspects than the core ML model.
  • Experience how some of the tooling can greatly simplify the amount of work (MLFlow rocks!).
  • Have the opportunity to learn and refresh a useful set of tools to automate the ML workflow.

In case anyone feels like checking some of the scripts or pieces of code as study or reference material, here is the source code link.

Further Work

This just scratches the surface of what might be possible when generating synthetic annotated documents.

There are different ways to improve the whole service:

  • Do hyper-parameter tuning to improve the performance of the current architecture.
  • Try other, more powerful models (Transformers, Graph Networks, GANs, etc.).
  • Explore ideas so the model captures the linking among different blocks of text.
  • Explore ideas so the model can use both JSON annotations and images to generate and capture a richer representation of the documents.
  • Explore ideas so the model supports conditioning, that is, when generating synthetic documents, condition the model to generate certain types of layouts, etc.
  • Ultimately, the synthetic data should be tried on a downstream task, to find out whether it actually allows models to be trained effectively.
  • Add realistic image noise using GANs or other image filters.
  • Add font types and style variations when rendering the image, for more realistic images.
  • Update the model wrapper to work with larger batches and generate them into AWS S3, instead of returning them in the HTTP response.

TL;DR If a meeting is needed, have a clear goal in mind and invite the fewest people who can provide input on the topic. Start the meeting by communicating the context and the goal; drive the meeting to keep it on track and productive and avoid tangential conversations. Follow up with a summary of the meeting, highlighting key ideas, agreements, decisions, and next action items with owners.

Introduction

I have not yet heard anyone say: “I’m really glad I have 5 meetings today”. Everyone dislikes meetings, but meetings are always there and, to be fair, they are sometimes needed. Meetings are typically seen as a waste of time and a distraction, but they don’t have to be: meetings are a great opportunity for collaboration. Here are some of the habits that I’ve found useful for having productive meetings.

Before the meeting - Prepare

  • Have a goal in mind: ask yourself what you are trying to achieve with the meeting. Consider these leading questions: Is the purpose to be informed or to inform others? Are you looking for feedback on a proposal? Are you looking to choose between competing options? Are you trying to brainstorm about an issue or a new product? Are you trying to “sell” an idea or project? Are you trying to clarify and ask questions about a given topic? Keep in mind that having too many, or too ambitious, goals might prevent you from making progress in any direction. It is better to divide and conquer, and make sure the goal of your meeting is something manageable.
  • Use the right tool: sometimes we don’t need a meeting, and we can get the job done with a quick instant message (Slack, MS Teams, etc.), an email, or a quick phone call. A meeting allows us to discuss and go back and forth about a relatively complex subject. A meeting might be overkill when the interaction is mostly unidirectional, the topic is not complex, or it is unlikely to require questions and different perspectives. Also, aim for shorter meetings; force yourself to cover the content in 30 minutes whenever possible.
  • Have an audience in mind: if the meeting is needed, keep the audience as small as possible. Make sure each of the attendees has some interest in or contribution to the topic. Consider these leading questions: Who is needed and why? What is each invited person expected to provide: approval, feedback, support, answers? What is the nature of the content: technical, business, strategic, tactical? Is everyone equipped to follow along?
  • Prepare the flow of the meeting: gather key facts and materials, and recall previously agreed decisions. Think a bit about how to present your ideas, the sequence, and the level of abstraction.

During the meeting - Drive

  • Context & intro: state where the topic of the conversation stands, what has been done, what’s pending, and what’s the issue at hand. Reduce the ideas to their essential aspects; this should be a quick section just to bring everyone up to the current state, ready to provide the input they can offer.
  • State your goal: let people know what you expect from them and from the meeting in general. If the audience knows what’s expected, you have a better chance of getting the input you need.
  • Have a flow: present ideas in a natural and logical flow. Go from high-level to low-level details. Only give details that are relevant, at the right time, not too early. Keep in mind that the attention span of your audience is short, working memory is limited, and it is easy to lose your audience. Avoid giving them irrelevant information, or simply too much information before they are ready to digest it.
  • Compare options: when trying to decide between options, make sure you present them in the same frame of reference, with the same attributes: time, effort, complexity, flexibility, etc. This way the options can be matched and compared; otherwise the team will have a hard time contrasting them and providing input.
  • Questions: make sure your questions are well posed and framed, and don’t ask questions of people who cannot answer them; for example, a question with a heavy technical aspect might not be well suited for a non-technical person. Each team member plays a role, with certain responsibilities and knowledge; think about whether a question can actually be answered by a given person.
  • Drive: you have a goal in mind, so drive towards it, keep the conversation on track, avoid people jumping randomly off topic, and stay on the relevant aspects of the purpose of the meeting. Again, the more people invited to the meeting, the harder it is to stay productive and on topic.
  • Postpone: identify issues that merit a whole conversation of their own and postpone them; don’t tackle everything in a single meeting unless it is really necessary. Postpone issues, aspects, and details which are not relevant to achieving the goal of the meeting. Again, divide and conquer, and tackle one thing at a time, so it is possible to make progress.

After the meeting - Summarize

  • Reply to all the attendees with a recording of the meeting, if possible.
  • Reply to all the attendees with a summary of the meeting, highlighting key ideas, agreements, decisions, and next action items with owners. If a follow-up meeting is likely needed, this summary will help to recap and start from where you left off.
  • Of course, saying thanks to everyone who contributed to the meeting goes a long way.

Why is this important?

  • Time: it is our most valuable resource. Discussions that could take 15 minutes should not take an hour.
  • Respect: coming prepared to the meeting shows respect for other people’s time and the value of their input.
  • Convergence: bring the conversation to the point of interest, make the decision, sell your idea, etc.
  • Rework: avoid discussions about issues and topics previously decided; avoid going in circles.
  • Outcome: you have a better chance of achieving the goal of the meeting. This allows teams to collaborate effectively, get on the same page, make decisions, and move on to get things done.

Key Takeaways

  1. Prepare: Avoid meetings if possible, identify the audience, define the goal, gather supporting materials.
  2. Drive: Recap the context and highlight the goal, stay on topic, postpone auxiliary aspects, ask good questions.
  3. Summarize: State the agreements and solved aspects, identify outstanding aspects, list next action items with owners.

Dynamic Programming

This is not a tutorial on Dynamic Programming, but rather a practice session where we take a few variants of the Coin Change problem and, step by step, break down the analysis into the fundamental insights that lead us to a solution. If you are looking to brush up on DP or simply get started with it, two of my favorite resources are Tim Roughgarden’s lectures and Eric Vigoda’s lectures.

The way we’ll break down the analysis is based on the following questions or steps:

  • How can we define the sub-problems?

    DP is all about solving smaller sub-problems, so this is where we start.

  • What preliminary results need to be computed?

    Sub-problem solutions need to be calculated and stored so we can use them later to build higher level solutions.

  • What are the base cases and how do we initialize them?

    Our base cases give us the starting point to incrementally build our solutions.

  • How to define the solution of a problem given the solutions of the sub-problems?

    This is where the magic happens, here we define how a given solution is expressed on top of smaller problem solutions.

NOTE: Even though some of these problems can be solved using recursion and other techniques, I’ve tried to use a consistent solution structure, so we can compare the different variants.

The Coin Change Problem

Variant 1:

Given an unlimited supply of coins of denominations d1, d2, … , dn, we wish to make change for a value v; that is, we wish to find a set of coins whose total value is v. This might not be possible: for instance, if the denominations are 5 and 10 then we can make change for 15 but not for 12.

How can we define the sub-problems?

  • We can have sub-problems for either smaller amounts or smaller number of coins.
  • Since we have unlimited coins per denomination, let’s define our sub-problem by subtracting a given coin denomination and getting a smaller amount.
  • We can change a given amount \(a\), if the smaller amount \(a- d_i\) (after subtracting denomination \(d_i\)) can be changed.
  • For example: given an initial amount of 3 and denominations [1,2], the question “is there a coin change for the amount 3?” can be translated into “can we change 3-1=2?”, which then translates into “can we change 2-2=0?”, which turns out to be possible.

What preliminary results need to be computed?

  • In DP problems, solutions to problems are computed by combining solutions to sub-problems. This usually requires the storage of some preliminary results.
  • In many cases, the final answer comes directly from the stored results once we’ve computed all the sub-problems.
  • This suggests the idea of storing true/false values for our sub-problems.
  • Let’s define results[a] as the true/false value, indicating if it is possible to do the coin change for the amount \(a\).
  • Since we have unlimited coins, results[a] does not depend on which coins or how many coins we have used so far.

What are the base cases and how do we initialize them?

  • If we keep subtracting coin denominations from the current amount, we eventually reach results[0]. Of course, we must avoid endless subtraction of coins into negative amounts.
  • Reaching 0 indicates that we were able to do the coin change; otherwise we would have ended with an amount different from 0 for which there are no coins. For example: the remaining amount is 2, and the available denominations are [3,5].
  • We then get to our base case results[0] = True.
  • Since not all amounts can be changed for a given set of denominations, let’s use False as the default value for all our results.

How to define the solution of a problem given the solutions of the sub-problems?

  • We can say that a given amount \(a\) can be changed if the amount after subtracting any one of the given denominations can also be changed.
  • Let \(a\) be a given amount, and \(d_i\) the \(i\)-th denomination; then the recurrence expression is:
\[results[a] = results[a-d_1] \vee results[a-d_2] \vee \dots \vee results[a-d_n], \quad \text{for } d_i \le a\]

Solution

One way to implement this solution in Python is:

def coin_change(denominations, amount):
    # results[a] is True if the amount a can be changed with the given denominations.
    results = [False] * (amount + 1)
    results[0] = True  # base case: a zero amount can always be changed
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a:
                results[a] = results[a] or results[a - d]
    return results[amount]
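
Trying it on the example from the problem statement:

print(coin_change([5, 10], 15))  # True: 5 + 10
print(coin_change([5, 10], 12))  # False: 12 is not reachable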

Comments:

  • It is common in DP problems for the results array to be padded with an extra slot for the base case; in this problem we use amount+1 array elements to include amount = 0.
  • The final result comes nicely as results[amount]. This is not always the case, but it helps to assume this as the starting point.

Variant 2

Given a supply of one coin per denomination d1, d2, … , dn, we wish to make change for a value v; that is, we wish to find a set of coins whose total value is v. This might not be possible: for instance, if the denominations are 5 and 10 then we can make change for 15 but not for 12.

How can we define the sub-problems?

  • Since each coin denomination can be used at most once, the sub-problems need to be expressed as a function of the available coins and a smaller amount.
  • One smaller sub-problem can be defined as changing the same amount with fewer coins.
  • For example: given an initial amount of 3 and denominations [1,2,5], the question “can we change 3 using the coins [1,2,5]?” translates into “can we change 3 using only the coins [1,2]?”, which we already know can be done. The reason is that we cannot use the coin 5 to change the smaller amount of 3.
  • Another sub-problem can be defined by subtracting one of the coin denominations, this time using the set of coins except the one we just used.
  • For example: given an initial amount of 3 and denominations [1,2], the question “can we change 3 using the coins [1,2]?” translates into “can we change 3-2=1 using only [1]?”, which then translates into “can we change 1-1=0?”, indicating it is possible.

What preliminary results need to be computed?

  • Since we need to keep track of both the smaller amount, as well as the available coins, we need a 2D array to represent our sub-problems.
  • Let’s define results[a][j] as the true or false value indicating whether it is possible to do the coin change for the amount \(a\) using the first j coin denominations (including the j-th).

What are the base cases and how do we initialize them?

  • The first case is a zero amount with some remaining coins, results[0][j], which is True, as a zero amount can be changed with anything.
  • The second case is a non-zero amount with no coins, results[a][0], which is False, as it is not possible to change a non-zero amount with no coins.
  • We can initialize everything as False, and then set the first row of results to True.

How to define the solution of a problem given the solutions of the sub-problems?

  • Let \(a\) be a given amount, and \(j\) the index of the j-th denomination; then the recurrence expression is:

    \[results[a][j] = \begin{cases} results[a][j-1] & \text{if } d_j > a \\ results[a-d_j][j-1] \vee results[a][j-1] & \text{if } d_j \le a \end{cases}\]

  • The first case is clear: since we cannot subtract d_j from a, we just ignore the j-th coin.
  • The first part of the second case (results[a-d_j][j-1]) is also clear: we subtract the j-th coin and continue with the smaller amount and the remaining coins.
  • The second part of the second case (results[a][j-1]) raises the question: do we really need to consider it?
  • The answer is yes. Consider the case a=2 and denominations [2,1]: we can certainly subtract either 1 or 2 from 2, but 2-1=1 gives us no answer, as we cannot use the denomination 1 a second time. This forces us to explore the sub-problem of trying the same amount 2 with the single denomination [2].

Solution

One way to implement this solution in Python is:

def coin_change(denominations, amount):
    num_denom = len(denominations)
    # results[a][j]: can amount a be changed using only the first j denominations?
    results = [[False] * (num_denom + 1) for _ in range(amount + 1)]
    results[0] = [True] * (num_denom + 1)  # a zero amount is always changeable
    for a in range(1, amount + 1):
        for j in range(1, num_denom + 1):
            d = denominations[j - 1]
            if d > a:
                # The j-th coin cannot be used; ignore it.
                results[a][j] = results[a][j - 1]
            else:
                # Either use the j-th coin, or skip it.
                results[a][j] = results[a - d][j - 1] or results[a][j - 1]
    return results[amount][num_denom]
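
Trying it out; each coin can now be used at most once:

print(coin_change([5, 10], 15))   # True: both coins used once
print(coin_change([1, 2, 5], 4))  # False: no subset of {1, 2, 5} sums to 4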

Comments:

  • Compare how results turned out to be a 2D array in this variant, versus 1D in the first variant.
  • Suggestion: think about how many things you need to keep track of. For example, in variant #1 keeping track of the smaller amount was enough, but in the second variant it was not. These things to track can be seen as the “degrees of freedom” of the problem, and they usually determine the number of dimensions of the data structure used to solve it.
  • Notice how the recurrence expression was broken into different cases (d_j > a vs d_j <= a). Also notice how there are two cases, in which the j-th coin is either part of the solution or not.
  • Again, the final result comes nicely as results[amount][num_denom].

Takeaways

  • Keep an eye on 1D vs 2D arrays for your results; think about what needs to be remembered for your sub-problems, and start simple.
  • Solutions are built bottom-up, starting with the base cases.
  • Break the recurrence expression into cases as needed.
  • Start by defining your cached results such that they give you the final answer directly.
  • The interplay of the base cases, the default values, and the recurrence expression is where the magic of DP happens. Minor changes to any of these make the solution work for a different problem.


Motivation

Design principles are powerful tools that guide our decisions and choices while building software systems. When applied properly, things just seem to fall into place, but when principles are applied to the wrong situation, they work against us and lead to awkward solutions.

More often than not, multiple principles can be applied to a given problem, and in some cases some of these principles might be in conflict, making it a little bit tricky to apply them.

In this post, we are interested in contrasting the Fail Fast and Self Healing [1] principles. Specifically, how should a service A handle the case when it tries to contact a service B to retrieve critical information during its bootstrap phase, and service B is unavailable? Do we make service A fail fast and stop, or do we allow it to come up and keep retrying until service B is eventually available, and self heal?

Fail Fast

In this context, this means service A stops running, and any external service will see it as unavailable. This will likely happen in conjunction with a critical notification, and will require some external entity (human or automated) to correct the issue and restart service A after service B has recovered.

To better understand the implications of this approach, let’s see what would happen if we had N services with M interdependencies that evolve over time, e.g. new services come up, a dependency from service i to service j is established, others are removed, etc.

Having an external script that automatically restarts the failing services still leaves the door wide open for a situation where restarted services see one or more of their dependencies as unavailable, creating a race condition with no guarantee of converging to all services up and running.

This makes the system fragile, as a given failing service can cause transitive failures in other services, even ones that don’t depend on it directly. It also creates the problem of having to control the order in which the services are started. We could ask: can we come up with solutions to this problem? Surely we could try to solve it by keeping a very detailed representation of the dependencies and building automation around it, but in my opinion this is a problem that should not have existed in the first place.

Self Heal

A better alternative is to follow the automate-everything mantra and build an internal retry mechanism that allows service A to eventually catch up and retrieve the needed information once service B has been restarted. In this approach, no other service sees service A as unavailable, so the ripple effects of one failing service are gone.
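
As a minimal sketch of such a mechanism, a retry loop with exponential backoff around the bootstrap call to service B (the function names are illustrative):

import time

def fetch_with_retries(fetch, max_delay=60):
    # Keep retrying the call to service B, backing off exponentially between attempts.
    delay = 1
    while True:
        try:
            return fetch()
        except ConnectionError:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff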

It is worth noticing that automation is used in both approaches: with Fail Fast the dependent service fails and an automated script can restart it, whereas with Self Healing the automation is built into the service itself.

Alerts and notifications are also needed in this approach, but the way the operator or automated service management reacts is different. The problem is now much simpler: just restart the failing service or services, and rest assured that the services that depend on them will eventually self heal.

There is still the question of what the behavior of service A should be during this self-healing phase, and, for that matter, what the behavior of the entire system should be under these conditions. The reality is that in distributed systems failures are expected, not only during the bootstrap sequence but at any given point in time. This is a challenging problem, with no silver-bullet solution, but it is definitely not introduced by the application of the Self Healing principle. This is also a good reason to equip each service with self-healing mechanisms [2].

Among other options, we could embrace the graceful degradation principle and make service A operate with limited functionality if possible, or even fail (fast) all inbound requests. Other upstream services should behave in the same way, providing some limited functionality at the macro level.

Wrapping up

Of course, this is not about blindly choosing Self Heal over Fail Fast on every occasion. It is about remembering that there are different ways to apply a given principle, and that when multiple ones are available, one must understand their implications in trivial and non-trivial cases. It is about determining what is more valuable or critical in a given situation, and then choosing the principle that helps the most to achieve it.

It is also about remembering that, more often than not, a combination of self healing + graceful degradation + arbitrary bootstrap/deployment order leads to more robust and resilient distributed systems.


  1. Part of a broader principle called Automate Everything, as pointed out in “Scalability Lessons from Google, eBay, and Real Time Games” by Randy Shoup.

  2. This type of common concern, shared by all the services, should be provided by shared libraries, e.g. an HTTP client which provides retry strategies, circuit breakers, bulkheads, etc.

In the Review of Consistency Models - Part I we looked at:

  • Quiescent Consistency.

  • Strict Consistency.

  • Sequential Consistency.

This post covers Linearizability, Serializability, Strict Serializability and Causal Consistency.

Linearizability

Any execution is perceived in the same way by all the processes, as if all the read and write operations were executed in real-time order and respecting the program order of each one of them.

In a Linearizable data store:

  1. The write operations appear to occur instantaneously.
  2. Once the most recent value has been read, any subsequent read returns the same or an even more recent value.

The following history is Linearizable: [history diagram]

As expected, once P2:W(x)2 takes place, any process sees the most recent value in any subsequent read operation.

On the other hand, the following history is not Linearizable: [history diagram]

Because, in real-time ordering, the operation P2:W(x)2 takes place after P1:W(x)1, any subsequent read operation must return the value 2.

Linearizability is composable, which means that the combination of individually Linearizable data items is also Linearizable.

In order to develop some intuition about this property, let’s consider these two independently Linearizable histories for two different data items:

[history diagrams]

If we combine them, the composed history is also Linearizable, and there is no way of making it non-Linearizable while keeping the individual histories Linearizable: [history diagram]

Remember: Linearizability allows single operations on single objects ordered in real-time.


Serializability

Any execution is perceived in the same way by all the processes, as if groups of read and write operations were executed in some sequential order (arbitrary interleaving), respecting the program order of each one of them.

As you can see, Serializability is similar to Sequential Consistency, but at the level of groups of operations (think transactions). In this case, groups of operations take place atomically, in arbitrary interleaving.

Assuming that operations within parentheses occur as a group, the following history is Serializable: [history diagram]

As it can be rearranged into the following sequence: [history diagram]

Which is consistent with the fact that P3:W(y)1 and P3:R(x)1 happened before P2:R(y)1 and P2:W(x)2. Notice how the groups of operations were ordered arbitrarily, without respecting the real-time order of their invocations.

On the other hand, the following history would not be Serializable: [history diagram]

As P2:R(y)1 implies that P3’s group of operations happened before P2’s group, but P3:R(x)2 implies the opposite.

Remember: Serializability allows groups of operations on multiple objects ordered in arbitrary interleaving.


Strict Serializability

Any execution is perceived in the same way by all the processes, as if groups of read and write operations were executed in real-time order and respecting the program order of each one of them.

In other words, Strict Serializability borrows the real-time ordering from Linearizability and applies it to groups of operations as indicated by the Serializability definition.

The first example from Serializability is not Strict Serializable: [history diagram]

As it does not respect real-time ordering. But just by making a little adjustment, it becomes Strict Serializable: [history diagram]

As this respects the fact that P2’s group of operations happened before P3’s.

Remember: Strict Serializability allows groups of operations on multiple objects ordered in real-time.


Causal Consistency

Write operations that are potentially causally related are perceived in the same way by all the processes, as if they were executed in real-time order and respecting the program order of all the processes.

In other words, Causal Consistency is similar to Linearizability, but instead of requiring that all operations be ordered in real-time, the requirement is limited to causally related operations.

Two operations are causally related if the execution of one could affect the result of the other one. Consider the following examples:

  • A read operation on x, followed by a write operation on y are potentially causally related as the write value of y might be derived by the value of x.

  • A write operation on x, followed by a read operation on x are potentially causally related as the read value of x is returning the previously written value.

  • Two simultaneous write operations on different or same variables are not causally related.

  • Two operations not causally related are called concurrent.

The following history is Causally Consistent: [history diagram]

As P1:W(x)1 and P2:W(x)2 are considered concurrent and other processes can see their writes in different order.

Just by introducing a small variation, the following history is not Causally Consistent: [history diagram]

Since P2:R(x)1 was generated by P1:W(x)1 and might affect the result of P2:W(x)2, the operations P1:W(x)1 and P2:W(x)2 are causally related and must be seen by all the processes in the same order.

Yet another small variation makes this history Causally Consistent: [history diagram]

According to real-time ordering, P1:W(x)1 occurred after P2:R(x)0 and cannot affect it. Therefore, P1:W(x)1 and P2:W(x)2 are not causally related.

Remember: Causal Consistency allows single operations on single objects, but only causally related operations must be ordered in real-time.

