Motivation

This post describes my final project for the Full Stack Deep Learning 2021 Course organized and taught by Sergey Karayev and Josh Tobin.

Problem Statement

Given a dataset such as the Form Understanding in Noisy Scanned Documents (FUNSD) dataset, which provides 199 fully annotated document images, would it be possible to train a machine learning model to generate similar images together with their corresponding annotations? In other words, can we automatically expand this dataset from 199 annotated images to several thousand samples without human intervention?

In many cases, the training dataset needed to build models for document understanding must include every piece of text in the image, its surrounding bounding box coordinates, its relationship with other texts in the image, and other similar metadata. This detailed metadata is usually produced by human annotators, which is costly and slow, as there are usually many layouts and document types to work through.

By generating synthetic document images with annotations, we can significantly reduce the time and effort needed to annotate documents for training document understanding models.

The idea is to provide synthetic document generation as a service, via a REST API which receives a few key input parameters, such as the number of samples to be generated, and returns a serialized representation of the annotations (JSON) and images (base64-encoded).

Machine Learning Approach

The gist of the project is to use a language model to generate JSON annotation payloads, which can then be rendered into document images, giving us fully annotated new documents. The ideal model would capture the layout and text of the training set encoded in the JSON annotations, and then generate multiple instances of similar documents as well as variations of document layouts. Rendering the document image from the generated JSON annotations requires no ML techniques, as it only involves placing the corresponding text at the given locations, according to the synthetic annotation metadata.
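
For instance, the rendering step can be a simple loop over the generated blocks. Here is a minimal sketch using Pillow (the helper name and the image size are my own; the block format follows the FUNSD annotations):

from PIL import Image, ImageDraw

def render_document(blocks, width=800, height=1000):
    # Render annotation blocks into a document image; no ML involved.
    image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(image)
    for block in blocks:
        left, top, right, bottom = block["box"]
        # Place the block text at the top-left corner of its bounding box.
        draw.text((left, top), block["text"], fill="black")
    return image

blocks = [
    {"text": "DATE:", "box": [50, 40, 120, 60], "linking": [], "label": "question", "id": 0},
    {"text": "03/21/92", "box": [130, 40, 220, 60], "linking": [[0, 1]], "label": "answer", "id": 1},
]
render_document(blocks).save("synthetic_doc.png")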

The core model architecture and training loop are an adaptation of the code for Unconditioned Surname Generation from the book Natural Language Processing with PyTorch by Delip Rao and Brian McMahan. (Highly recommended read!)

The basic architecture is defined as:

  • One embedding layer (embedding dimension 100)
  • One Gated Recurrent Unit (GRU) layer with a hidden dimension of 100
  • One fully connected layer with dropout of 0.5
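
As a rough PyTorch sketch of this architecture (the class name and vocabulary handling are illustrative, not the exact project code):

import torch.nn as nn

class AnnotationLanguageModel(nn.Module):
    # Character-level LM: embedding -> GRU -> dropout -> linear head.
    def __init__(self, vocab_size, embedding_dim=100, hidden_dim=100, dropout_p=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout_p)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) integer token indices.
        output, hidden = self.gru(self.embedding(x), hidden)
        # Predict a distribution over the next character at each position.
        return self.fc(self.dropout(output)), hidden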

This architecture only represents the annotation language model. In order to provide the final end-to-end service which produces annotations and images, a model wrapper was needed to sample the language model, validate the generated JSON annotations, and render and serialize the final document images.

[Figure: model wrapper]

MLOps Approach

Given the focus of the course, the end-to-end operational aspects of the project are equally important. For this reason, there is a strong emphasis on:

  • Traceability: Being able to trace model versions, datasets, and code artifacts precisely, so they can be recovered as needed.
  • Repeatability: Being able to bring up the whole environment and rewind to a given point in time, including infrastructure, code, data, and models; repeat experiments and build on top of previous ones.
  • Automation: Embrace everything-as-code, avoid manual intervention and console access, and aim to script each step of the ML workflow.
  • Rapid Development: With automation around all aspects of the workflow, repetitive steps become easier and faster.
  • CPU and GPU: Definitely not required for this small problem, but my goal was to be able to seamlessly train models both locally (CPU) and remotely (AWS GPU instances).

The technology stack to achieve these goals is:

  • Terraform: Lets us define the infrastructure, including but not limited to network setup, training instances, databases, and security groups, as a declarative manifest which effectively behaves like code, giving us the chance to tear down and reconstruct the infrastructure as needed.
  • Ansible: Lets us run typical Unix commands on a provisioned AWS EC2 instance to configure it properly, clone our code, and run our training tasks.
  • DVC: Used in this project only to source-control datasets and model artifacts via GitHub, using AWS S3 as the artifact store.
  • MLflow: Used in this project only for model tracking and model deployment. It lets us keep track of model training runs, capturing key hyper-parameters and learning curves for traceability, and package our models to run them locally and on SageMaker.
  • PyTorch: The DL library used to implement the core ML model, the training loop, and the supporting Dataset classes.
  • SageMaker: The service runtime hosting our models for inference.
  • AWS S3: Used to store datasets and model artifacts.

Data Preprocessing

The original FUNSD dataset comes with pairs of document images and annotations, as shown here:

[Figure: a FUNSD document image and its raw annotations]

The initial FUNSD dataset was processed in the following way:

  • The annotations are naturally broken into block units, each containing one piece of text, its bounding box, an id, and links to other texts.
  • In order to reduce the length of the training sequences, these blocks were pre-processed to remove redundant properties such as the words collection.
  • The blocks were flattened into single-line strings so the recurrent neural network can process them as sequences (see the sketch after this list).

[Figure: pre-processed, flattened annotations]

  • The whole dataset was split into train, validation, and test sets and saved into a single CSV file:
      Train   Validation   Test
      5096    2315         2332
  • The raw and pre-processed datasets were added to DVC for tracking and pushed to external storage (S3).
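
A minimal sketch of these preprocessing steps, assuming the raw FUNSD block format (the words collection is one of the dropped properties):

import json

def flatten_block(block):
    # Keep only the essential properties and flatten into a single-line string.
    kept = {key: block[key] for key in ("text", "box", "linking", "label", "id")}
    return json.dumps(kept)

raw_block = {
    "text": "Sample:", "box": [84, 109, 136, 119], "label": "question", "id": 0,
    "linking": [], "words": [{"text": "Sample:", "box": [84, 109, 136, 119]}],
}
print(flatten_block(raw_block))
# {"text": "Sample:", "box": [84, 109, 136, 119], "linking": [], "label": "question", "id": 0}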

Model Training

The model was trained on the whole pre-processed FUNSD dataset. Only the JSON annotations were used as features to train the model. Here I describe a few of the key hyper-parameters and training aspects:

  • Each single-line JSON annotation or “sentence” was wrapped with start and end tokens.
  • Sentences were batched into groups of 32, using a masking token for padding (see the sketch after this list).
  • The sentences were not shuffled, to preserve the original sequence of the text blocks and the initial page layout.
  • Training was stopped early after five consecutive epochs without a drop in the validation error.
  • A maximum of 300 epochs was used.
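
As a sketch of the first two points, each sentence is framed with special tokens and padded to a common length (the token names are illustrative):

def vectorize(sentence, char_to_idx, max_len):
    # Wrap a single-line JSON "sentence" with start/end tokens, pad with the mask token.
    indices = [char_to_idx["<BEGIN>"]]
    indices.extend(char_to_idx[ch] for ch in sentence)
    indices.append(char_to_idx["<END>"])
    indices.extend([char_to_idx["<MASK>"]] * (max_len - len(indices)))
    return indices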

As recommended in the course, the model training was performed in incremental steps: first locally with a small dataset, which provided the opportunity to debug code issues and check the general trajectory of the optimization process; later, remotely on an AWS EC2 GPU instance. Given the goal of automating everything, this step took significant effort, as I decided to automate the full setup of the remote machine, the execution of the training modules, and the capture and storage of the trained model artifacts. To run the training remotely, the following command was used:

ansible-playbook train_gru_remote.yaml --ask-vault-pass

This Ansible playbook allowed me to automate the following tasks on the remote machine:

  • Configure AWS and GitHub credentials
  • Clone the project
  • Run the training script
  • Push the model artifacts via DVC

I could literally terminate the GPU instance one day, come back the next day, reprovision it using Terraform, and retrain the model with a few commands. This reduced cost by keeping resources around only while needed, while adding no delay to the iterative process of training and improving the model.

MLflow provided a good way to track all the training runs, for example:

[Figure: MLflow list of runs]

It also allows us to drill down into specific runs and check training parameters and metrics:

[Figures: MLflow run parameters and metrics]

We can also check the learning curves:

[Figures: MLflow learning curves]

As well as compare different runs:

[Figure: MLflow run comparison]
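
Capturing these runs takes only a few calls to the MLflow tracking API inside the training script; a minimal sketch (the parameter and metric names are mine, and run_one_epoch is a hypothetical stand-in for the real training step):

import mlflow

def run_one_epoch():
    # Hypothetical stand-in for one pass of training plus validation.
    return 0.68, 0.78

with mlflow.start_run():
    mlflow.log_param("embedding_dim", 100)
    mlflow.log_param("hidden_dim", 100)
    for epoch in range(300):
        train_loss, val_loss = run_one_epoch()
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)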

Model Evaluation

The trained model was evaluated on the accuracy of predicting the next character of the JSON annotation, which can be summarized as:

              Train    Validation    Test
Loss          0.681    0.777         0.819
Accuracy (%)  77.72    76.07         74.82

Overall, the model can still be improved significantly, but at the same time it does not seem to suffer from under- or over-fitting, and it shows a relatively good ability to generalize.

The model was also evaluated by inspecting generated JSON annotations, which seemed to mirror the structure of the training dataset. Here are a few generated JSON annotations:

{
   "text": "Incrur:", "box": [255, 737, 356, 868],
   "linking": [], "label": "question", "id": 41
}
{
   "box": [175, 717, 223, 740], "text": "Comer",
   "label": "question", "linking": [[4, 7]], "id": 3
}
{
   "text": "IN TOBIC", "box": [587, 241, 653, 285],
   "linking": [], "label": "header", "id": 7
}
...

Model Deployment & Serving

Regarding model packaging, deployment, and serving, MLflow took care of a lot of the plumbing, among other things:

  • The packaging of both the annotation language model and the model wrapper, as well as the local and remote deployment, exposing a REST endpoint for easy consumption:
mlflow models serve -p 5001 \
   -m model_artifacts/saved_model_baseline_wrapper
  • It also took care of the standard model input/output formats, serialization and deserialization, and input validation.

  • It allowed me to create an AWS SageMaker-compatible Docker image from the trained model and register it in AWS ECR:

mlflow sagemaker build-and-push-container
  • It allowed me to test the AWS SageMaker-compatible Docker image locally as a sanity check. This is super useful for finding issues in the dependencies and the inference code path:
mlflow sagemaker run-local \
   -m model_artifacts/saved_model_baseline_wrapper
  • It also allowed me to deploy the model image to AWS SageMaker:
mlflow sagemaker deploy -a fsdl-synth-docs \
   -m model_artifacts/saved_model_local_wrapper -e arn:aws:iam::<rest-of-execution-role>

In this regard, MLflow removed a lot of the operational overhead, and turned out to be very portable from local to remote deployments.
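
Once the model is served, it can be queried over plain HTTP. As an illustration against the local endpoint above, assuming the wrapper takes the number of samples as its only input (the payload schema is hypothetical; the format below is the MLflow 1.x pandas-split convention):

import requests

payload = {"columns": ["num_samples"], "data": [[5]]}  # hypothetical input schema
response = requests.post(
    "http://127.0.0.1:5001/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(response.json())  # JSON annotations and base64-encoded images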

Analysis

Here are a few examples of the outputs of the model:

   {
      "box": [677, 639, 749, 669], "text": ".44", 
      "label": "answer", "linking": [[94, 16]], "id": 35
   }, 
   {
      "box": [533, 197, 617, 256], "text": "(P2/ 95", 
      "label": "answer", "linking": [[31, 3]], "id": 3
   }, 
[Figures: synthetic image, and synthetic image with blocks]
   {
      "text": "Code Speciop", "box": [49, 975, 160, 799], 
      "linking": [], "label": "question", "id": 28
   }, 
   {
      "box": [121, 59, 252, 25], "text": "02328620", 
      "label": "question", "linking": [[5, 47]], "id": 5
   }
[Figures: synthetic image, and synthetic image with blocks]

Looking more closely at the synthetic annotations and images, I noticed:

  • It generates many instances of well-formed JSON annotations. This indicates the LM was able to learn key structures like:
    • Opening and closing brackets: “{, }, [, ]”
    • Quoted strings, e.g. “text”
    • Collections of four integers, e.g. “[12, 34, 23, 78]”
    • Even unicode text was generated in some instances.
  • The model was also able to learn the basic JSON properties of the annotations: box, text, id, linking, label.
  • The final synthetic JSON annotations were required not only to be well-formed JSON, but also to include the key properties (box, text, id) of a valid annotation.
  • After applying all validations, between 30% and 35% of the synthetic annotations were fully valid.
  • By using only the valid generated annotations and discarding the rest, I was able to avoid any post-processing heuristics to complete or fix them (see the sketch after this list).
  • It generates completely random JSON annotations, which is perfect for generating synthetic data.
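
The validation step itself can be as simple as trying to parse each generated line and checking for the required properties; a minimal sketch, assuming one generated annotation per line:

import json

REQUIRED_KEYS = {"box", "text", "id"}

def is_valid_annotation(line):
    # Keep only lines that parse as JSON objects and carry the key properties.
    try:
        block = json.loads(line)
    except json.JSONDecodeError:
        return False
    return isinstance(block, dict) and REQUIRED_KEYS <= block.keys()

samples = ['{"text": "IN TOBIC", "box": [587, 241, 653, 285], "id": 7}', '{"text": "broken']
valid = [s for s in samples if is_valid_annotation(s)]  # keeps only the first sample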

Other aspects on which the model did not perform well:

  • Generates poor bounding boxes; in most cases:
    • Too wide for the corresponding text.
    • Too tall for the corresponding text.
    • The sequence of coordinates left, top, right, bottom was not respected, causing the bounding box to flip.
  • After inspecting the ground-truth dataset, I noticed the presence of overly wide or tall bounding boxes there as well.
  • Text is sometimes placed on top of other text, and does not follow a good document layout.
  • The actual generated document text is not valid English.

Learning & Conclusions

The project was useful as it allowed me to:

  • Take a high-level idea and see it all the way through to a deployed API.
  • Experience first-hand that end-to-end ML-based services involve many more components and aspects than the core ML model.
  • Experience how some of the tooling can greatly simplify the amount of work (MLFlow rocks!).
  • Have the opportunity to learn and refresh a useful set of tools to automate the ML workflow.

In case anyone feels like checking some of the scripts or pieces of code as study or reference material, here is the source code link.

Further Work

This just scratches the surface of what might be possible when generating synthetic annotated documents.

There are different ways to improve the whole service:

  • Do hyper-parameter tuning to improve the performance of the current architecture.
  • Try other, more powerful models (Transformers, Graph Networks, GANs, etc.).
  • Explore ideas so the model captures the linking among different blocks of text.
  • Explore ideas so the model can use both JSON annotations and images to generate and capture a richer representation of the documents.
  • Explore ideas so the model supports conditioning, that is, when generating synthetic documents, condition the model to generate certain types of layouts, etc.
  • Ultimately, the synthetic data should be tried on a downstream task, to find out whether it actually allows models to be trained effectively.
  • Add realistic image noise using GANs or other image filters.
  • Add font types and style variations when rendering the image, for more realistic images.
  • Update the model wrapper to work with larger batches and generate them into AWS S3, instead of returning them in the HTTP response.

TL;DR If a meeting is needed, have a clear goal in mind and invite the fewest people who can provide input on the topic. Start the meeting by communicating the context and the goal; drive the meeting to keep it on track and productive and avoid tangential conversations. Follow up with a summary of the meeting, highlighting key ideas, agreements, decisions, and next action items with owners.

Introduction

I have not yet heard anyone say: “I’m really glad I have 5 meetings today”. Everyone dislikes meetings, but meetings are always there and, to be fair, they are sometimes needed. Meetings are typically seen as a waste of time and a distraction, but they don’t have to be: meetings are a great opportunity for collaboration. Here are some of the habits that I’ve found useful for having productive meetings.

Before the meeting - Prepare

  • Have a goal in mind: ask yourself what you are trying to achieve with the meeting. Consider these leading questions: Is the purpose to be informed or to inform others? Are you looking for feedback on a proposal? Are you looking to choose between competing options? Are you trying to brainstorm about an issue or a new product? Are you trying to “sell” an idea or project? Are you trying to clarify and ask questions about a given topic? Keep in mind that having too many, or too ambitious, goals might prevent you from making progress in any direction. It is better to divide and conquer, and make sure the goal of your meeting is something manageable.
  • Use the right tool: sometimes we don’t need a meeting, and we can get the job done with a quick instant message (Slack, MS Teams, etc.), an email, or a quick phone call. A meeting allows us to discuss and go back and forth about a relatively complex subject. A meeting might be overkill when the interaction is mostly unidirectional, the topic is not complex, or it is unlikely to require questions and different perspectives. Also, aim for shorter meetings; force yourself to cover the content in 30 minutes whenever possible.
  • Have an audience in mind: if the meeting is needed, keep the audience as small as possible. Make sure each of the attendees has some interest in or contribution to the topic. Consider these leading questions: Who is needed and why? What is each invited person expected to provide: approval, feedback, support, answers? What is the nature of the content: technical, business, strategic, tactical? Is everyone equipped to follow along?
  • Prepare the flow of the meeting: gather key facts and materials, and recall previously agreed decisions. Think a bit about how to present your ideas, the sequence, and the level of abstraction.

During the meeting - Drive

  • Context & intro: state where the topic of the conversation stands, what has been done, what’s pending, and what’s the issue at hand. Reduce the ideas to their essential aspects; this should be a quick section just to bring everyone up to the current state, ready to provide the input they can offer.
  • State your goal: let people know what you expect from them and from the meeting in general. If the audience knows what’s expected, you have a better chance of getting the input you need.
  • Have a flow: present ideas in a natural and logical flow. Go from high-level to low-level details. Only give details that are relevant, at the right time, not too early. Keep in mind that the attention span of your audience is short, working memory is limited, and it is easy to lose your audience. Avoid giving them irrelevant information, or simply too much information before they are ready to digest it.
  • Compare options: when trying to decide between options, make sure you present them in the same frame of reference, with the same attributes: time, effort, complexity, flexibility, etc. This way the options can be matched and compared; otherwise the team will have a hard time contrasting them and providing input.
  • Questions: make sure your questions are well posed and framed, and don’t ask questions of people who cannot answer them; for example, a question with a heavy technical aspect might not be well suited for a non-technical person. Each team member plays a role, with certain responsibilities and knowledge; think about whether a question can actually be answered by a given person.
  • Drive: you have a goal in mind, so drive towards it, keep the conversation on track, avoid people jumping randomly off topic, and stay on the relevant aspects of the purpose of the meeting. Again, the more people invited to the meeting, the harder it is to stay productive and on topic.
  • Postpone: identify issues that merit a whole conversation of their own and postpone them; don’t tackle everything in a single meeting unless it is really necessary. Postpone issues, aspects, and details which are not relevant to achieving the goal of the meeting. Again, divide and conquer, and tackle one thing at a time, so it is possible to make progress.

After the meeting - Summarize

  • Reply to all the attendees with a recording of the meeting, if possible.
  • Reply to all the attendees with a summary of the meeting, highlighting key ideas, agreements, decisions, and next action items with owners. If a follow-up meeting is likely needed, this summary will help to recap and start from where you left off.
  • Of course, saying thanks to everyone who contributed to the meeting goes a long way.

Why is this important?

  • Time: it is our most valuable resource. Discussions that could take 15 minutes should not take an hour.
  • Respect: coming prepared to the meeting shows respect for other people’s time and the value of their input.
  • Convergence: bring the conversation to the point of interest, make the decision, sell your idea, etc.
  • Rework: avoid discussions about issues and topics previously decided; avoid going in circles.
  • Outcome: you have a better chance of achieving the goal of the meeting. This allows teams to collaborate effectively, get on the same page, make decisions, and move on to get things done.

Key Takeaways

  1. Prepare: Avoid meetings if possible, identify the audience, define the goal, gather supporting materials.
  2. Drive: Recap the context and highlight the goal, stay on topic, postpone auxiliary aspects, ask good questions.
  3. Summarize: State the agreements and solved aspects, identify outstanding aspects, list next action items with owners.

Dynamic Programming

This is not a tutorial on Dynamic Programming, but rather a practice session where we take a few variants of the Coin Change problem and, step by step, break down the analysis into the fundamental insights that lead us to a solution. If you are looking to brush up on DP or simply get started with it, two of my favorite resources are Tim Roughgarden’s lectures and Eric Vigoda’s lectures.

The way we’ll break down the analysis is based on the following questions or steps:

  • How can we define the sub-problems?

    DP is all about solving smaller sub-problems, so this is where we start.

  • What preliminary results need to be computed?

    Sub-problem solutions need to be calculated and stored so we can use them later to build higher level solutions.

  • What are the base cases and how do we initialize them?

    Our base cases give us the starting point to incrementally build our solutions.

  • How to define the solution of a problem given the solutions of the sub-problems?

    This is where the magic happens, here we define how a given solution is expressed on top of smaller problem solutions.

NOTE: Even though some of these problems can be solved using recursion and other techniques, I’ve tried to use a consistent solution structure, so we can compare the different variants.

The Coin Change Problem

Variant 1:

Given an unlimited supply of coins of denominations d1, d2, … , dn, we wish to make change for a value v; that is, we wish to find a set of coins whose total value is v. This might not be possible: for instance, if the denominations are 5 and 10 then we can make change for 15 but not for 12.

How can we define the sub-problems?

  • We can have sub-problems for either smaller amounts or smaller number of coins.
  • Since we have unlimited coins per denomination, let’s define our sub-problem by subtracting a given coin denomination and getting a smaller amount.
  • We can change a given amount \(a\), if the smaller amount \(a- d_i\) (after subtracting denomination \(d_i\)) can be changed.
  • For example: given an initial amount of 3 and denominations [1,2], the question “is there a coin change for the amount 3?” can be translated into “can we change 3-1=2?”, which then translates into “can we change 2-2=0?”, which turns out to be possible.

What preliminary results need to be computed?

  • In DP problems, solutions to problems are computed by combining solutions to sub-problems. This usually requires the storage of some preliminary results.
  • In many cases, the final answer comes directly from the stored results once we’ve computed all the sub-problems.
  • This suggests the idea of storing true/false values for our sub-problems.
  • Let’s define results[a] as the true/false value, indicating if it is possible to do the coin change for the amount \(a\).
  • Since we have unlimited coins, results[a] does not depend on which coins or how many coins we have used so far.

What are the base cases and how do we initialize them?

  • If we keep subtracting coin denominations from the current amount, we eventually reach results[0]. Of course, we must avoid endless subtraction of coins into negative amounts.
  • Reaching 0 indicates that we were able to do the coin change; otherwise we would have ended with an amount different from 0 for which there are no coins. For example: the remaining amount is 2, and the available denominations are [3,5].
  • We then get to our base case results[0] = True.
  • Since not all amounts can be changed for a given set of denominations, let’s use False as the default value for all our results.

How to define the solution of a problem given the solutions of the sub-problems?

  • We can say that a given amount \(a\) can be changed if the amount after subtracting any one of the given denominations can also be changed.
  • Let \(a\) be a given amount, and \(d_i\) the \(i\)-th denomination; then the recurrence expression is:
\[results[a] = results[a-d_1] \vee results[a-d_2] \vee \dots \vee results[a-d_n], \quad \text{for } d_i \le a\]

Solution

One way to implement this solution in Python is:

def coin_change(denominations, amount):
    # results[a] is True if the amount a can be changed with the given denominations.
    results = [False] * (amount + 1)
    results[0] = True  # base case: a zero amount can always be changed
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a:
                results[a] = results[a] or results[a - d]
    return results[amount]
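
Trying it on the example from the problem statement:

print(coin_change([5, 10], 15))  # True: 5 + 10
print(coin_change([5, 10], 12))  # False: 12 is not reachable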

Comments:

  • It is common in DP problems for the results array to be padded with an extra slot for the base case; in this problem we use amount+1 array elements to include amount = 0.
  • The final result comes nicely as results[amount]. This is not always the case, but it helps to assume this as the starting point.

Variant 2

Given a supply of one coin per denomination d1, d2, … , dn, we wish to make change for a value v; that is, we wish to find a set of coins whose total value is v. This might not be possible: for instance, if the denominations are 5 and 10 then we can make change for 15 but not for 12.

How can we define the sub-problems?

  • Since each coin denomination can be used at most once, the sub-problems need to be expressed as a function of the available coins and a smaller amount.
  • One smaller sub-problem can be defined as changing the same amount with fewer coins.
  • For example: given an initial amount of 3 and denominations [1,2,5], the question “can we change 3 using the coins [1,2,5]?” translates into “can we change 3 using only the coins [1,2]?”, which we already know can be done. The reason is that we cannot use the coin 5 to change the smaller amount of 3.
  • Another sub-problem can be defined by subtracting one of the coin denominations, this time using the set of coins except the one we just used.
  • For example: given an initial amount of 3 and denominations [1,2], the question “can we change 3 using the coins [1,2]?” translates into “can we change 3-2=1 using only [1]?”, which then translates into “can we change 1-1=0?”, indicating it is possible.

What preliminary results need to be computed?

  • Since we need to keep track of both the smaller amount, as well as the available coins, we need a 2D array to represent our sub-problems.
  • Let’s define results[a][j] as the true or false value indicating whether it is possible to do the coin change for the amount \(a\) using the first j coin denominations (including the j-th).

What are the base cases and how do we initialize them?

  • The first case is a zero amount with some remaining coins, results[0][j], which is True, as a zero amount can be changed with anything.
  • The second case is a non-zero amount with no coins, results[a][0], which is False, as it is not possible to change a non-zero amount with no coins.
  • We can initialize everything as False, and then set the first row of results to True.

How to define the solution of a problem given the solutions of the sub-problems?

  • Let \(a\) be a given amount, and \(j\) the index of the j-th denomination; then the recurrence expression is:

    \[results[a][j] = \begin{cases} results[a][j-1] & \text{if } d_j > a \\ results[a-d_j][j-1] \vee results[a][j-1] & \text{if } d_j \le a \end{cases}\]

  • The first case is clear: since we cannot subtract d_j from a, we just ignore the j-th coin.
  • The first part of the second case (results[a-d_j][j-1]) is also clear: we subtract the j-th coin and continue with the smaller amount and the remaining coins.
  • The second part of the second case (results[a][j-1]) raises the question: do we really need to consider it?
  • The answer is yes. Consider the case a=2 and denominations [2,1]: we can certainly subtract either 1 or 2 from 2, but 2-1=1 gives us no answer, as we cannot use the denomination 1 a second time. This forces us to explore the sub-problem of trying the same amount 2 with the single denomination [2].

Solution

One way to implement this solution in Python is:

def coin_change(denominations, amount):
    num_denom = len(denominations)
    # results[a][j]: can amount a be changed using only the first j denominations?
    results = [[False] * (num_denom + 1) for _ in range(amount + 1)]
    results[0] = [True] * (num_denom + 1)  # a zero amount is always changeable
    for a in range(1, amount + 1):
        for j in range(1, num_denom + 1):
            d = denominations[j - 1]
            if d > a:
                # The j-th coin cannot be used; ignore it.
                results[a][j] = results[a][j - 1]
            else:
                # Either use the j-th coin, or skip it.
                results[a][j] = results[a - d][j - 1] or results[a][j - 1]
    return results[amount][num_denom]
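
Trying it out; each coin can now be used at most once:

print(coin_change([5, 10], 15))   # True: both coins used once
print(coin_change([1, 2, 5], 4))  # False: no subset of {1, 2, 5} sums to 4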

Comments:

  • Compare how results turned out to be a 2D array in this variant, versus 1D in the first variant.
  • Suggestion: think about how many things you need to keep track of. For example, in variant #1 keeping track of the smaller amount was enough, but in the second variant it was not. These things to track can be seen as the “degrees of freedom” of the problem, and they usually determine the number of dimensions of the data structure used to solve it.
  • Notice how the recurrence expression was broken into different cases (d_j > a vs d_j <= a). Also notice how there are two cases, in which the j-th coin is either part of the solution or not.
  • Again, the final result comes nicely as results[amount][num_denom].

Takeaways

  • Keep an eye on 1D vs 2D arrays for your results; think about what needs to be remembered for your sub-problems, and start simple.
  • Solutions are built bottom-up, starting with the base cases.
  • Break the recurrence expression into cases as needed.
  • Start by defining your cached results such that they give you the final answer directly.
  • The interplay of the base cases, the default values, and the recurrence expression is where the magic of DP happens. Minor changes to any of these make the solution work for a different problem.


Motivation

Design principles are powerful tools that guide our decisions and choices while building software systems. When applied properly, things just seem to fall into place, but when principles are applied to the wrong situation, they work against us and lead to awkward solutions.

More often than not, multiple principles can be applied to a given problem, and in some cases some of these principles might be in conflict, making it a little bit tricky to apply them.

In this post, we are interested in contrasting the Fail Fast and Self Healing [1] principles. Specifically, how should a service A handle the case when it tries to contact a service B to retrieve critical information during its bootstrap phase, and service B is unavailable? Do we make service A fail fast and stop, or do we allow it to come up and keep retrying until service B is eventually available, and self heal?

Fail Fast

In this context, this means service A stops running, and any external service will see it as unavailable. This will likely happen in conjunction with a critical notification, and will require some external entity (human or automated) to correct the issue and restart service A after service B has recovered.

To better understand the implications of this approach, let’s see what would happen if we had N services with M interdependencies that evolve over time, e.g. new services come up, a dependency from service i to service j is established, others are removed, etc.

Having an external script that automatically restarts the failing services still leaves the door wide open for a situation where restarted services see one or more of their dependencies as unavailable, creating a race condition with no guarantee of converging to all services up and running.

This makes the system fragile, as a given failing service can cause transitive failures in other services, even ones that don’t depend on it directly. It also creates the problem of having to control the order in which the services are started. We could ask: can we come up with solutions to this problem? Surely we could try to solve it by keeping a very detailed representation of the dependencies and building automation around it, but in my opinion this is a problem that should not have existed in the first place.

Self Heal

A better alternative is to follow the automate-everything mantra and build an internal retry mechanism that allows service A to eventually catch up and retrieve the needed information once service B has been restarted. In this approach, no other service sees service A as unavailable, so the ripple effects of one failing service are gone.
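
As a minimal sketch of such a mechanism, a retry loop with exponential backoff around the bootstrap call to service B (the function names are illustrative):

import time

def fetch_with_retries(fetch, max_delay=60):
    # Keep retrying the call to service B, backing off exponentially between attempts.
    delay = 1
    while True:
        try:
            return fetch()
        except ConnectionError:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff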

It is worth noticing that automation is used in both approaches: with Fail Fast the dependent service fails and an automated script can restart it, whereas with Self Healing the automation is built into the service itself.

Alerts and notifications are also needed in this approach, but the way the operator or automated service management reacts is different. The problem is now much simpler: just restart the failing service or services, and rest assured that the services that depend on them will eventually self heal.

There is still the question of what the behavior of service A should be during this self-healing phase, and, for that matter, what the behavior of the entire system should be under these conditions. The reality is that in distributed systems failures are expected, not only during the bootstrap sequence but at any given point in time. This is a challenging problem, with no silver-bullet solution, but it is definitely not introduced by the application of the Self Healing principle. This is also a good reason to equip each service with self-healing mechanisms [2].

Among other options, we could embrace the graceful degradation principle and make service A operate with limited functionality if possible, or even fail (fast) all inbound requests. Other upstream services should behave in the same way, providing some limited functionality at the macro level.

Wrapping up

Of course, this is not about blindly choosing Self Heal over Fail Fast on every occasion. It is about remembering that there are different ways to apply a given principle, and that when multiple ones are available, one must understand their implications in trivial and non-trivial cases. It is about determining what is more valuable or critical in a given situation, and then choosing the principle that helps the most to achieve it.

It is also about remembering that, more often than not, a combination of self healing + graceful degradation + arbitrary bootstrap/deployment order leads to more robust and resilient distributed systems.


  1. Part of a broader principle called Automate Everything, as pointed out in “Scalability Lessons from Google, eBay, and Real Time Games” by Randy Shoup.

  2. This type of common concern, shared by all the services, should be provided by shared libraries, e.g. an HTTP client which provides retry strategies, circuit breakers, bulkheads, etc.

In the Review of Consistency Models - Part I we looked at:

  • Quiescent Consistency.

  • Strict Consistency.

  • Sequential Consistency.

This post covers Linearizability, Serializability, Strict Serializability and Causal Consistency.

Linearizability

Any execution is perceived in the same way by all the processes, as if all the read and write operations were executed in real-time order and respecting the program order of each one of them.

In a Linearizable data store:

  1. The write operations appear to occur instantaneously.
  2. Once the most recent value has been read, any subsequent read returns the same or an even more recent value.

The following history is Linearizable: [history diagram]

As expected, once P2:W(x)2 takes place, any process sees the most recent value in any subsequent read operation.

On the other hand, the following history is not Linearizable: [history diagram]

Because, in real-time ordering, the operation P2:W(x)2 takes place after P1:W(x)1, any subsequent read operation must return the value 2.

Linearizability is composable, which means that the combination of individually Linearizable data items is also Linearizable.

In order to develop some intuition about this property, let’s consider these two independently Linearizable histories for two different data items:

[history diagrams]

If we combine them, the composed history is also Linearizable, and there is no way of making it non-Linearizable while keeping the individual histories Linearizable: [history diagram]

Remember: Linearizability allows single operations on single objects ordered in real-time.


Serializability

Any execution is perceived in the same way by all the processes, as if groups of read and write operations were executed in some sequential order (arbitrary interleaving), respecting the program order of each one of them.

As you can see, Serializability is similar to Sequential Consistency, but at the level of groups of operations (think transactions). In this case, groups of operations take place atomically, in arbitrary interleaving.

Assuming that operations within parentheses occur as a group, the following history is Serializable: [history diagram]

As it can be rearranged into the following sequence: [history diagram]

Which is consistent with the fact that P3:W(y)1 and P3:R(x)1 happened before P2:R(y)1 and P2:W(x)2. Notice how the groups of operations were ordered arbitrarily, without respecting the real-time order of their invocations.

On the other hand, the following history would not be Serializable: [history diagram]

As P2:R(y)1 implies that P3’s group of operations happened before P2’s group, but P3:R(x)2 implies the opposite.

Remember: Serializability allows groups of operations on multiple objects ordered in arbitrary interleaving.


Strict Serializability

Any execution is perceived in the same way by all the processes, as if groups of read and write operations were executed in real-time order and respecting the program order of each one of them.

In other words, Strict Serializability borrows the real-time ordering from Linearizability and applies it to groups of operations as indicated by the Serializability definition.

The first example from Serializability is not Strict Serializable: [history diagram]

As it does not respect real-time ordering. But just by making a little adjustment, it becomes Strict Serializable: [history diagram]

As this respects the fact that P2’s group of operations happened before P3’s.

Remember: Strict Serializability allows groups of operations on multiple objects ordered in real-time.


Causal Consistency

Write operations that are potentially causally related are perceived in the same way by all the processes, as if they were executed in real-time order and respecting the program order of all the processes.

In other words, Causal Consistency is similar to Linearizability, but instead of requiring that all operations be ordered in real-time, the requirement is limited to causally related operations.

Two operations are causally related if the execution of one could affect the result of the other one. Consider the following examples:

  • A read operation on x, followed by a write operation on y are potentially causally related as the write value of y might be derived by the value of x.

  • A write operation on x, followed by a read operation on x are potentially causally related as the read value of x is returning the previously written value.

  • Two simultaneous write operations on different or same variables are not causally related.

  • Two operations not causally related are called concurrent.

The following history is Causally Consistent: [history diagram]

As P1:W(x)1 and P2:W(x)2 are considered concurrent and other processes can see their writes in different order.

Just by introducing a small variation, the following history is not Causally Consistent: [history diagram]

Since P2:R(x)1 was generated by P1:W(x)1 and might affect the result of P2:W(x)2, the operations P1:W(x)1 and P2:W(x)2 are causally related and must be seen by all the processes in the same order.

Yet another small variation makes this history Causally Consistent: [history diagram]

According to real-time ordering, P1:W(x)1 occurred after P2:R(x)0 and cannot affect it. Therefore, P1:W(x)1 and P2:W(x)2 are not causally related.

Remember: Causal Consistency allows single operations on single objects, but only causally related operations must be ordered in real-time.

