allenap.me – `flock`(2) behaviour on macOS and Linux

I’ve been using flock(2) — a system call for advisory locking — in a library I’ve been porting to Rust. The original library ran only on Linux but I wanted to support macOS too, so I set about checking my assumptions.

Ubuntu 17.04’s man page for flock(2) says:

A process may hold only one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.

macOS Sierra’s man page for flock(2) says:

A shared lock may be upgraded to an exclusive lock, and vice versa, simply by specifying the appropriate lock type; this results in the previous lock being released and the new lock applied (possibly after other processes have gained and released the lock).

To me this reads that, on Linux, we can switch between shared and exclusive locks without losing out to another process¹ that’s trying to grab an exclusive lock.

But on macOS it reads like other processes will be able to wriggle in.

This rang little alarm bells, so I investigated some more. In the end it made sense and was consistent between the platforms, but the behaviour is more nuanced than the man pages hint at.

I also had the feeling that I’ve investigated this before, so this time I’m writing it down.

I put together the following test code (in Rust) so that I could drive an experiment or two:

extern crate getch;
extern crate nix;

use std::fs::File;
use std::os::unix::io::AsRawFd;
use nix::fcntl::{flock,FlockArg};

fn main() {
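    // Each process opens LOCK itself, so each gets its own open file
    // description and the flock(2) locks genuinely contend.
    let file = File::create("LOCK").unwrap();
    let fd = file.as_raw_fd();
    // getch reads single keypresses without waiting for Enter.
    let getter = getch::Getch::new().unwrap();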
    loop {
        match getter.getch().unwrap() as char {
            's' => {
                println!("Trying to obtain shared lock...");
                flock(fd, FlockArg::LockShared).unwrap();
                println!("Obtained shared lock.");
            },
            'x' => {
                println!("Trying to obtain exclusive lock...");
                flock(fd, FlockArg::LockExclusive).unwrap();
                println!("Obtained exclusive lock.");
            },
            'u' => {
                println!("Trying to unlock...");
                flock(fd, FlockArg::Unlock).unwrap();
                println!("Unlocked.");
            },
            'q' => {
                println!("Bye.");
                break;
            },
            chr => {
                println!(concat!(
                    "Sorry, {:?} means nothing to me. Try s to acquire a ",
                    "shared lock, x to acquire an exclusive lock, u to ",
                    "unlock, or q to quit."), chr);
            },
        }
    }
}

I compiled and ran it with cargo:

cargo init --bin lockfun
cd lockfun

# Depend on the nix crate for a safe flock(2) wrapper.
echo 'nix = "~0.8"' >> Cargo.toml
# Depend on the getch crate for, er, getch.
echo 'getch = "~0.1"' >> Cargo.toml

# Replace main.rs with the source above.
$EDITOR src/main.rs

# Build and run.
cargo run

I started a second lockfun process in another terminal window so I could see how it would interact with the first. Let’s call those processes A and B. The steps I took were something like:

Take an exclusive lock in A (by pressing x) then attempt to take a shared (press s) or exclusive lock in B ⟹ B blocks.

So far, so good, but also not particularly enlightening. Starting again:

Take a shared lock in A then take a shared lock in B ⟹ both A and B obtain shared locks.
Now, attempt to take an exclusive lock in A ⟹ A blocks.
Now, attempt to take an exclusive lock in B ⟹ B gets the exclusive lock! A now has no lock at all, but that’s not immediately obvious: it’s still blocking to get that exclusive lock.
Downgrade B’s exclusive lock to a shared lock (press s) ⟹ B now has a shared lock, and A is still blocking.
Upgrade B’s shared lock to an exclusive lock (press x) ⟹ B now has regained that exclusive lock, and A is still out in the cold.
Release B’s lock ⟹ A finally gets its exclusive lock.

That’s more interesting.
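The non-blocking parts of these steps can be replayed inside a single process, because two handles from separate open(2) calls contend with each other just as two processes would. Below is a minimal sketch of that, assuming a direct FFI binding to flock(2) instead of the nix crate so that it builds with nothing but the standard library. The helpers `try_flock` and `compatibility` and the lock file name are hypothetical, made up for illustration. It passes LOCK_NB, so a request that would have blocked in the experiment is reported rather than waited on; the steps where A sits blocked are deliberately not reproduced.

```rust
use std::ffi::c_int;
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// flock(2) operation flags; these values are the same on Linux and macOS.
const LOCK_SH: c_int = 1;
const LOCK_EX: c_int = 2;
const LOCK_NB: c_int = 4;
const LOCK_UN: c_int = 8;

// Bind flock(2) straight from the C library (`unsafe extern` needs Rust 1.82+).
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Hypothetical helper: try an operation without blocking.
/// Ok(true) means granted, Ok(false) means it would have blocked.
fn try_flock(file: &File, operation: c_int) -> io::Result<bool> {
    if unsafe { flock(file.as_raw_fd(), operation | LOCK_NB) } == 0 {
        return Ok(true);
    }
    let err = io::Error::last_os_error();
    if err.kind() == io::ErrorKind::WouldBlock {
        Ok(false)
    } else {
        Err(err)
    }
}

/// Hypothetical helper: replay the first steps with two handles, A and B.
fn compatibility(path: &Path) -> io::Result<Vec<(&'static str, bool)>> {
    // Two separate open(2) calls, so A and B contend like two processes.
    let a = File::create(path)?;
    let b = File::create(path)?;
    let mut seen = Vec::new();
    seen.push(("A takes exclusive", try_flock(&a, LOCK_EX)?));
    seen.push(("B takes shared while A holds exclusive", try_flock(&b, LOCK_SH)?));
    seen.push(("B takes exclusive while A holds exclusive", try_flock(&b, LOCK_EX)?));
    // Start again from a clean slate, as in the second experiment.
    try_flock(&a, LOCK_UN)?;
    seen.push(("A takes shared", try_flock(&a, LOCK_SH)?));
    seen.push(("B takes shared alongside A", try_flock(&b, LOCK_SH)?));
    seen.push(("A upgrades to exclusive while B holds shared", try_flock(&a, LOCK_EX)?));
    Ok(seen)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("lockfun-compat.lock");
    for (step, granted) in compatibility(&path)? {
        println!("{}: {}", step, if granted { "granted" } else { "would block" });
    }
    Ok(())
}
```

The sketch stops at A's refused upgrade on purpose. What lock, if any, A still holds after that refusal is exactly the kind of detail worth checking per platform rather than assuming.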

I also tried with three lockfun processes: it’s much the same, but still fun to try out, and it’s useful to see first hand the order in which a trio of exclusive-lock wannabes eventually get their prize.

I wrote another program that looped tightly, taking first an exclusive lock, converting it to a shared lock, then converting it back again, over and over forever. Michael Kerrisk, in The Linux Programming Interface, says:

A lock conversion is not guaranteed to be atomic. During conversion, the existing lock is first removed, and then a new lock is established.

Running two instances of this program concurrently I would therefore expect the lock to occasionally change hands, but I didn’t see this happen. That doesn’t prove much, unfortunately, except that it’s rare or that I wrote a duff program. The Ubuntu man page backs up Kerrisk’s words, and the macOS man page, in the snippet already quoted at the start, implies the same thing. My advice to myself: code defensively.
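For anyone wanting to repeat this, here is a rough sketch of what such a conversion loop could look like. It is a reconstruction under assumptions, not the program described above: the detection scheme (each instance stamps its process ID into the lock file while it holds the exclusive lock, then checks the stamp after every upgrade) is invented for illustration, as are the names `convert_loop`, `write_owner` and `read_owner` and the lock file name. It binds flock(2) via FFI so that it needs only the standard library.

```rust
use std::ffi::c_int;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// flock(2) operation flags; these values are the same on Linux and macOS.
const LOCK_SH: c_int = 1;
const LOCK_EX: c_int = 2;
const LOCK_UN: c_int = 8;

// Bind flock(2) straight from the C library (`unsafe extern` needs Rust 1.82+).
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Blocking flock(2) call, with failure turned into an io::Error.
fn lock(file: &File, operation: c_int) -> io::Result<()> {
    if unsafe { flock(file.as_raw_fd(), operation) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

/// Replace the lock file's contents with `owner`.
fn write_owner(file: &mut File, owner: &str) -> io::Result<()> {
    file.set_len(0)?;
    file.seek(SeekFrom::Start(0))?;
    file.write_all(owner.as_bytes())
}

/// Read back whatever stamp is currently in the lock file.
fn read_owner(file: &mut File) -> io::Result<String> {
    let mut owner = String::new();
    file.seek(SeekFrom::Start(0))?;
    file.read_to_string(&mut owner)?;
    Ok(owner)
}

/// Convert exclusive -> shared -> exclusive `rounds` times. Returns how often
/// another instance turned out to have held the exclusive lock in between.
fn convert_loop(path: &Path, rounds: u64) -> io::Result<u64> {
    let mut file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    let me = std::process::id().to_string();
    let mut handovers = 0;
    lock(&file, LOCK_EX)?;
    write_owner(&mut file, &me)?;
    for _ in 0..rounds {
        lock(&file, LOCK_SH)?; // downgrade
        lock(&file, LOCK_EX)?; // upgrade
        if read_owner(&mut file)? != me {
            // Someone else stamped the file, so they held the exclusive lock.
            handovers += 1;
            write_owner(&mut file, &me)?;
        }
    }
    lock(&file, LOCK_UN)?;
    Ok(handovers)
}

fn main() -> io::Result<()> {
    // Optional first argument: number of rounds (defaults to 100000).
    let rounds = std::env::args().nth(1).and_then(|s| s.parse().ok()).unwrap_or(100_000);
    let path = std::env::temp_dir().join("lockfun-convert.lock");
    let handovers = convert_loop(&path, rounds)?;
    println!("{} rounds, lock changed hands {} times", rounds, handovers);
    Ok(())
}
```

Run two or more instances at once, with a large round count, to look for handovers. A count of zero is as inconclusive here as it was in the experiment above: it shows only that no handover was observed, not that conversions are atomic.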

Things I have learned so far:

flock(2) behaves much the same on macOS and Linux.
There’s a deadlock when all processes with shared locks try to acquire exclusive locks. The kernel breaks this deadlock by giving the exclusive lock to the last process to request it; all other processes continue to block, but they (obviously, necessarily) lose their shared lock.
In my experiments the behaviour was consistent and predictable, but the man pages and Kerrisk all warn that lock conversion is not guaranteed to be atomic. Assuming otherwise may lead to bugs.

It’s not clear yet how this is going to affect the port I’m working on, but being informed is obviously better than being ignorant, and having a tool to play around with and demonstrate flock(2)’s behaviour is invaluable.

¹ Or thread, locking a file descriptor opened via a separate invocation of open(2), i.e. not a file descriptor created from the other via dup(2) or similar.
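The distinction in this footnote is easy to check. flock(2) locks belong to the open file description, so a duplicated descriptor refers to the same lock, while a descriptor from a separate open(2) competes for it. A small sketch, assuming that `File::try_clone` duplicates the descriptor in the manner of dup(2) on Unix; the helper names `flock_ok` and `dup_vs_open` and the lock file name are hypothetical.

```rust
use std::ffi::c_int;
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// flock(2) operation flags; these values are the same on Linux and macOS.
const LOCK_EX: c_int = 2;
const LOCK_NB: c_int = 4;

// Bind flock(2) straight from the C library (`unsafe extern` needs Rust 1.82+).
unsafe extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

/// Hypothetical helper: Ok(true) if the operation succeeded, Ok(false) if a
/// LOCK_NB request was refused because it would have blocked.
fn flock_ok(file: &File, operation: c_int) -> io::Result<bool> {
    if unsafe { flock(file.as_raw_fd(), operation) } == 0 {
        return Ok(true);
    }
    let err = io::Error::last_os_error();
    if err.kind() == io::ErrorKind::WouldBlock {
        Ok(false)
    } else {
        Err(err)
    }
}

/// Lock via one handle, then ask for the same lock via a duplicate of that
/// handle and via a handle from a separate open(2).
/// Returns (granted via duplicate, granted via separate open).
fn dup_vs_open(path: &Path) -> io::Result<(bool, bool)> {
    let original = File::create(path)?;
    let duplicate = original.try_clone()?; // same open file description
    let separate = File::create(path)?; // a second, independent open(2)
    flock_ok(&original, LOCK_EX)?; // blocking: wait until we hold the lock
    let via_duplicate = flock_ok(&duplicate, LOCK_EX | LOCK_NB)?;
    let via_separate = flock_ok(&separate, LOCK_EX | LOCK_NB)?;
    Ok((via_duplicate, via_separate))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("lockfun-dup.lock");
    let (via_duplicate, via_separate) = dup_vs_open(&path)?;
    println!("exclusive via duplicate: {}", via_duplicate);
    println!("exclusive via separate open: {}", via_separate);
    Ok(())
}
```

The duplicate is granted the lock because it is already the holder, and the separately opened handle is refused. That is why the experiments above need each participant to call open(2) itself.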