Threadsafe File Consistency in Ruby
A large part of the work in the 0.7.0 release of Acts As Indexed went into guaranteeing the consistency of the index files, which may be written to by many processes. I shall split this into two halves: atomic writes and locking writes.
I talk mostly of processes here, since most Rails hosting implementations at the moment employ multiple processes, though the same methodologies can be applied to threads.
Atomic Writes
An example: Say we have one process which writes to a file, and many processes which may be reading from that same file. If we do a simple write, it is possible that one of the reading processes may see a half-written file. While digging through the Rails source I discovered a monkey-patch on the Ruby File class which added a method called atomic_write.
The basic operation of this is as follows:
- Write to a temporary file.
- Move that temporary file to be the actual file we want to write to.
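Since we are delegating the move operation to a system call, we can almost guarantee that any process reading the file will only see a fully written one, since all that is being changed during the move is a pointer to the file’s physical location on disk. This holds only when the temporary file lives on the same filesystem as the target; across filesystems, FileUtils.mv falls back to copying the file, which is not atomic. A simple implementation would be: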
require 'fileutils'

def atomic_write(path, temp_path, content)
  # Write the full content to the temporary file first.
  File.open(temp_path, 'w+') do |f|
    f.write(content)
  end
  # Move the finished file into place.
  FileUtils.mv(temp_path, path)
end
The Rails implementation goes a lot further than this, creating a tempfile in the OS mandated location, and making sure the newly written file has the same permissions as the original file.
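To illustrate the permissions point, here is a minimal sketch of a variant that picks its own temporary name and copies the original file's mode onto the new file. It is not the Rails source: the method name atomic_write_preserving_mode is hypothetical, and it deliberately creates the temporary file next to the target (rather than in the OS temp directory) so that the final rename stays on one filesystem.

```ruby
require 'securerandom'

# Hypothetical helper, not the Rails implementation.
def atomic_write_preserving_mode(path, content)
  # Remember the permissions of the file we are about to replace, if any.
  old_mode = File.exist?(path) ? File.stat(path).mode & 0o7777 : nil

  # Unique temp name in the same directory as the target (an assumption
  # made here so that the rename never crosses a filesystem boundary).
  temp_path = File.join(
    File.dirname(path),
    ".#{File.basename(path)}.#{Process.pid}.#{SecureRandom.hex(4)}.tmp"
  )

  begin
    # EXCL makes the open fail rather than reuse somebody else's temp file.
    File.open(temp_path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
      f.write(content)
    end
    # Give the new file the same permissions as the one it replaces.
    File.chmod(old_mode, temp_path) if old_mode
    # Swap the finished file into place.
    File.rename(temp_path, path)
  ensure
    # Clean up the temp file if the rename never happened.
    File.unlink(temp_path) if File.exist?(temp_path)
  end
end

# Usage (illustrative path):
#   atomic_write_preserving_mode('index/segment_0', 'Hello, World!')
```

Callers no longer have to supply a temporary path, which removes one way for two writers to collide on the same temp file.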
Locking Writes
Another example: We have many processes, all of which can write to the same file. Our processes first read the file, and then make some change to it. A race condition for this looks as follows.
- Process A reads the file.
- Process B reads the file.
- A makes changes and writes these.
- B makes changes and writes these.
In this example, changes made by A are lost. The solution to this is to use locks, which are provided by the Ruby File class via the flock method.
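def lock(path)
  # We need to check the file exists before we lock it.
  # Keep the handle: the lock belongs to this open file, so unlocking
  # through a freshly opened handle would not release it.
  file = File.open(path) if File.exist?(path)
  file.flock(File::LOCK_EX) if file
  # Carry out the operations.
  yield
ensure
  # Unlock the file by closing the handle that holds the lock.
  file.close if file
end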
We can combine this with the atomic_write method as follows:
lock('my_file') do
  atomic_write('my_file', 'my_file.tmp', 'Hello, World!')
end
Rails’ file store has a great implementation of this pattern, which automatically unlocks the file again in case of an exception while the lock is applied.
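As a rough sketch of that exception-safe pattern (not the Rails code itself), the helper below releases the lock in an ensure clause. The names with_file_lock and increment_counter are hypothetical, and so is the choice to lock a separate ".lock" file: it is made here because an atomic write replaces the data file, so a lock taken on the data file itself would stay attached to the old, replaced file.

```ruby
# Hypothetical helper: holds an exclusive lock on a dedicated lock file
# for the duration of the block.
def with_file_lock(lock_path)
  # Open (creating if needed) a file whose only job is to carry the lock.
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock_file|
    # Blocks until no other open handle holds the lock.
    lock_file.flock(File::LOCK_EX)
    begin
      yield
    ensure
      # Runs even if the block raises, so the lock is never left held.
      lock_file.flock(File::LOCK_UN)
    end
  end
end

# Illustrative read-modify-write: the read and the write both happen
# inside the lock, so a concurrent writer cannot slip in between them.
def increment_counter(path)
  with_file_lock("#{path}.lock") do
    count = File.exist?(path) ? File.read(path).to_i : 0
    File.write(path, (count + 1).to_s)
  end
end
```

The important detail is that the read happens inside the lock as well as the write; locking only the write would still allow the lost update described earlier.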