Please take a few minutes to complete this short survey on service testing.

Creating Distributed Locks with DynamoDB

  • 2015-08-27

Locks are very important in distributed systems. Sometimes we want to make sure that only one job runs at a time, but we want to have the system highly available (e.g highly available cron server). Of course there are many other uses cases for distributed locks, which I'm not going to talk about. In this post I am going to show you an example of how to implement distributed locks on DynamoDB.

Why DynamoDB?

To acquire a lock you need to make sure that you can perform an atomic write operation and fail if the item already exists. DynamoDB provides conditional put/update/delete operations which are ideal for implementing locks. Since conditional operations in DynamoDB happen require a quorum we are guaranteed that conditional write will fail if lock is already taken by somebody else.

There are other ways to build locks. Probably the most popular way is to use Zookeeper for coordination. However, if you don't have Zookeeper infrastructure setup and you can relax requirements a little bit, we can use DynamoDB and not worry about maintaining additional infrastructure.

Locks

DynamoDB will serve as a state coordinator. We will need a table to store our locks. Each lock in the dynamo table will have the following attributes:

  • name name of the lock
  • expires expiration timestamp
  • uuid lock owner id

We are going to implement simple mutex on top DynamoDB. I am going to use snippets of Go/Pseudocode to illustrate these examples.

TryLock method tries to acquire a lock.

func TryLock(mutex *Mutex, created, Timestamp, expires Timestamp) error {
    key := MakeKey(mutex)

    attributes := []dynamodb.Attribute{
        *dynamodb.NewNumericAttribute("Expires", expires.String()),
        *dynamodb.NewStringAttribute("UUID", mutex.UUID),
    }

    condition := &dynamodb.Expression{
        Text: "#name <> :name OR (#name = :name AND #exp < :exp)",
        AttributeNames: map[string]string{
            "#name": "Name",
            "#exp":  "Expires",
        },
        AttributeValues: []dynamodb.Attribute{
            *dynamodb.NewStringAttribute(":name", mutex.Name),
            *dynamodb.NewNumericAttribute(":exp", created.String()),
        },
    }

    _, err := table.ConditionExpressionPutItem(key.HashKey, key.RangeKey, attributes, condition)
    return err
}

This code snippet uses goamz to talk to DynamoDB. We are doing a conditional write and the write will only succeed if expiration time in the table is lower than the time mutex was created or if names don't match. We need expiration time in case the process who holds the lock dies and never releases it. This way we can prevent the dead lock.

For long running tasks we must periodically update Expiration time so that it doesn't expire until we finish the task. We can achieve this by running a goroutine that updates expiration time only if lock UUID match.

func UpdateExpiration(mutex *Mutex, expires Timestamp) error {
    key := MakeKey(mutex)

    attributes := []dynamodb.Attribute{
        *dynamodb.NewNumericAttribute("Expires", expires.String()),
    }

    condition := &dynamodb.Expression{
        Text: "#name = :name AND #uuid = :uuid",
        AttributeNames: map[string]string{
            "#name": "Name",
            "#uuid": "UUID",
        },
        AttributeValues: []dynamodb.Attribute{
            *dynamodb.NewStringAttribute(":name", mutex.Name),
            *dynamodb.NewStringAttribute(":uuid", mutex.UUID),
        },
    }

    _, err := l.table.ConditionExpressionUpdateAttributes(key, attributes, condition)
    return err
}

Unlock operation is simply a conditional delete where name and uuid match.

To implement a mutex we do something like this.

func (m *Mutex) Lock() {
    for {
        err := TryLock(m, ...)
        if err == nil {
            return
        }

        time.Sleep(...)
    }
}

func (m *Mutex) Unlock() {
    DeleteLock(m)
}

To use the lock

func Test() {
    m := Mutex{...}
    m.Lock()
    defer m.Unlock()
    critical code
}

This code is going to work because DynamoDB performs a strongly consistent read on conditional writes. Which means that two simultaneous writes are never going to succeed if conditional attributes change. In our case it's expiration timestamp. In this particular case timestamp is in nanoseconds which makes it almost impossible for two timestamps be the same.

There a catch, you need to maintain expiration time if your critical code takes more time to run than the initial expiration time; thus we need to periodically update expiration attribute, say every half lock TTL.

Conditional writes are really awesome, we can quickly build distributed locks and only pay a few bucks a month instead of running entire Zookeeper cluster.