IPA: invariant-preserving applications for weakly consistent replicated databases

IPA: invariant-preserving applications for weakly consistent replicated databases Balegas et al., VLDB’19

IPA for developers, happy days!

Last we week looked at automating checks for invariant confluence, and extending the set of cases where we can show that an object is indeed invariant confluent. I’m not going to re-cover that background in this write-up, so I suggest you head over there for a quick catch-up before reading on if you missed it first time around.

Today’s paper is very much in same spirit, building on the same foundation of invariant confluence (I-Confluence), and also on Indigo which introduced an annotation model for application invariants, a invariant violation avoidance mechanism using lock reservations and escrows, and limited support for repairing violations that do happen.

With Invariant-Preserving Applications (IPAs), Balegas et al. introduce new mechanisms for avoiding invariant violations and for repairing them when detected, based on CRDTs. There’s also a very nice looking developer workflow to help ensure you’ve got all the bases covered. At the end of the day, you get the dual benefit of higher throughput and lower latency (as compared to coordination-based approaches) coupled with knowing that there isn’t some nasty invariant-violating concurrency bug waiting to bite you (so long as you specified your invariants and operation effects correctly of course!).

Having your cake and eating it too

… it remains difficult to develop applications under weak consistency. Several studies show that, in many applications, concurrent executions lead to the violation of application invariants, resulting in inconsistent states.

At this point you have a few choices:

  • The common path: turn a blind eye to the possibility and get bitten later on by strange inconsistencies in your data
  • The belt-and-braces approach: broadly constrain concurrency (i.e. coordinate heavily) in order to avoid the possibility of invariant violations – thus reducing availability and latency
  • The considered (and time-consuming in development) approach: take time to understand your invariants, figure out which parts of your system can be coordination free, and which parts still need it, and design accordingly. For bonus points, revisit your application design so that more parts of it can be coordination free.

IPA takes the third way of course, and gives you new options for tweaking your application design, so that you can have your cake and eat it too:

This paper proposes a novel approach for preserving application invariants under weak consistency that does not impact the availability and latency of applications… To help programmers adopt our approach, we propose a methodology for modifying applications. The key element of the methodology is our invariant-preservation analysis (IPA) and static analysis tool that relies on information about the application, including invariants and operations, to identify which operations might lead to invariant violations and to suggest modifications to the operations to prevent those violations from occurring.

Coordination avoidance

To avoid coordination overheads we need to ensure that all operations are invariant preserving under merge. One of IPA’s neat tricks is allowing invariant violation within a transaction, by also including the logic to repair such a violation if it does occur.

Our insight is that in many situations the effects to restore the database integrity can be applied preventively alongside the original operations, repairing the invariant violation automatically in a conflicting execution.

An example makes all this much clearer. Consider an e-games platform with players that can take part in tournaments. Players can enroll and unenroll for tournaments, and admins can create and delete tournaments. But players shouldn’t be able to enroll in tournaments that don’t exist, tournaments may have a maximum capacity for players, and it shouldn’t be possible to delete a tournament that has actively enrolled players.

What happens if we have concurrent operations where a player enrols for a tournament while an admin is deleting it? As-is, the resulting state will violate our invariant. But we can ensure any violation will be automatically repaired by adding a new operation and updating merge semantics:

…restoring a tournament to its previous state can be achieved by executing a touch operation in the tournament when executing the enroll, and adopting a conflict resolution policy where the touch wins over a concurrent delete. The touch operation has no observable effect, only updating the metadata to guarantee that the concurrent execution is detected and solved according to the defined conflict resolution policy.

That’s a pretty neat idea! Although there are still some open application design questions here for me: for example, how does the admin, who thought they just deleted a tournament, find out that in fact it still exists? Shouldn’t they be notified so that they can take some action? I.e., there are application reasons why we might want to make compensating actions when a repair occurs. So I’d probably want an event I could subscribe to to tell me when such a conflict-repairing merge occurs.

Compensations / apologies

What about cases where a silent repair isn’t possible or appropriate? For that we have compensations.

With compensations, the idea is to check that the precondition holds when executing the operation in the initial replica, and to check that the invariants hold when operations are integrated remotely or when the state is read. Implementations of compensation mechanisms typically require re-executing operations multiple times, or using a leader to order operations, to ensure that replicas converge after applying a compensation. We implement compensations without any of these limitations by relying on CRDT convergence rules.

Applying repairs and compensations

IPA relies on your application being built on top of CRDTs, with operations executed in causal order. Additional updates in an operation have to executed atomically with the rest of the operation to ensure no inconsistencies can be observed. “In our prototype, we achieve this by relying on highly available transactions…”

Integrated developer workflow

Onto one of my favourite parts of the paper – the developer workflow!

It all starts with the developer annotating their object model to specify the application invariants and operation post-conditions, like this:

Then the IPA tool performs a developer-in-the-loop iterative analysis. In each iteration the tool identifies a pair of conflicting operations that might break an invariant when executed concurrently, and then it proposes a set of modifications to the application that will prevent this from happening. The developer chooses his or her preferred resolution, and the process repeats until no more conflicting operation pairs remain.

The analysis returns a new specification of the application, which contains the selected modifications, comprising both the use fo appropriate conflict resolution policies for each object and the modification to operations to avoid invariant violations… Fully patched applications can then execute in any replicated system that provides causal consistency, highly available transactions, and the necessary type-specific conflict resolution policies.

Here’s the main algorithm used in the tool loop:

Section 5 in the paper goes into a lot more detail on this, and I wish I had the space here to do it justice. It’s definitely worth checking out the full paper here if this work interests you.

Behind the scenes: new CRDTs

To support the CRDT resolutions proposed by IPA, the authors developed new extensions to existing CRDTs:

  • An extension to Add-wins sets to support the touch operation
  • An extension to the Rem-wins set to support wildcard values in the remove operation (to support the application model equivalent of cascading delete)
  • New compensation CRDTs, for example, Limited Size Set CRDTs that allow a programmer to specify a constraint that must be maintained, and the compensation to execute when the constraint is violated.

Evaluation

The following table highlights the types of invariants covered by IPA, and their usage in four applications used in the evaluation.

IPA can make unique identfier, aggregation inclusion, referential integrity and disjunction constraints I-Confluent through repair updates, and can additionally handle numerical and aggregation constraint invariants through compensations.

Our IPA analysis and tool assist the programmer via static analysis to identify which operations might lead to an invariant violation, when executed concurrently, and by suggesting modifications to the operations. Our experimental evaluation shows that the static analysis can handle large applications in reasonable time for an offline process, and that the modified applications have similar performance to their unmodified counterparts that do not preserve invariants.