The filesystem on ACID
AcidFS allows interaction with the filesystem using transactions with ACID semantics. Git is used as a back end, and AcidFS integrates with the transaction package allowing use of multiple databases in a single transaction.
The motivation for this package is that, for certain very simple problems, it is often convenient to simply write and read data from a filesystem. Often, though, a database of some sort winds up being used simply because of the power and safety available with a system which uses transactions and ACID semantics. For example, you wouldn’t want a web application with any amount of concurrency at all to be writing directly to the filesystem, since it would be easy for two threads or processes to both attempt to write to the same file at the same time, with the result that one change is clobbered by another or, even worse, the application is left in an inconsistent, corrupted state. After thinking about various ways to attack this problem and looking at Git’s datastore and plumbing commands, it was determined that Git was a very good fit, allowing a graceful solution to this problem.
In a nutshell:
All of the above limitations are a result of the locking used to synchronize commits. For the most part, during a transaction, nothing special needs to be done to manage concurrency since Git’s storage model makes management of multiple, parallel trees trivially easy. At commit time, however, any new data has to be merged with the current head, which may have changed since the transaction began. This last step should be synchronized so that only one instance of AcidFS attempts it at a time. Currently this is done with the fcntl module, which takes advantage of an advisory locking mechanism available in Unix kernels.
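The general idea of serializing the commit step with an advisory lock can be sketched as follows. This is only an illustration of the technique using Python's standard fcntl module, not AcidFS's actual code: the helper names commit_lock and merge_and_commit and the lock file name are assumptions made for the example.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def commit_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    # Hypothetical helper: the lock file name and location are assumptions.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until no other process holds the lock on this file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def merge_and_commit(log):
    # Stand-in for the real "merge with the current head" step.
    log.append("merged")


workdir = tempfile.mkdtemp()
log = []
with commit_lock(os.path.join(workdir, "commit.lock")):
    merge_and_commit(log)
```

Because the lock is advisory, it only protects against other processes that take the same lock before touching the repository.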
AcidFS is easy to use. Just create an instance of acidfs.AcidFS and start using the filesystem:
import acidfs
fs = acidfs.AcidFS('path/to/my/repo')
fs.mkdir('foo')
with fs.open('/foo/bar', 'w') as f:
    f.write('Hello!\n')
If there is not already a Git repository at the path specified, one is created. An instance of AcidFS is not thread safe. The same AcidFS instance should not be shared across threads or greenlets, etc.
The transaction package is used to commit and abort transactions:
import transaction
transaction.commit()
# If no exception has been thrown, then changes are saved! Yeah!
Note
If you’re using Pyramid, you should use pyramid_tm. For other WSGI frameworks there is also repoze.tm2.
The transaction package has built-in support for providing metadata about a particular transaction. This metadata is used to set the commit data for the underlying Git commit for a transaction. Use of these hooks is optional but recommended to provide meaningful audit information in the history of your repository. An example is the best illustration:
import transaction
current = transaction.get()
current.note('Added blog entry: "Bedrock Bro Culture: Yabba Dabba Dude!"')
current.setUser('Fred Flintstone')
current.setExtendedInfo('email', 'fred@bed.rock')
A user’s name may also be set by using the setExtendedInfo method:
current.setExtendedInfo('user', 'Fred Flintstone')
The transaction might look something like this in the git log:
commit 3aa61073ea755f2c642ef7e258abe77215fe54a2
Author: Fred Flintstone <fred@bed.rock>
Date:   Sun Sep 16 22:08:08 2012 -0400

    Added blog entry: "Bedrock Bro Culture: Yabba Dabba Dude!"
An instance of AcidFS exposes a transactional filesystem view of a Git repository. Instances of AcidFS are not threadsafe and should not be shared across threads, greenlets, etc.
Paths
Many methods take a path as an argument. All paths use forward slash / as a separator, regardless of the path separator of the underlying operating system. The path / represents the root folder of the repository. Paths may be relative or absolute: paths beginning with a / are absolute with respect to the repository root, paths not beginning with a / are relative to the current working directory. The current working directory always starts at the root of the repository. The current working directory can be changed using the chdir() and cd() methods.
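The path rules above can be illustrated with a small sketch. The resolve function below is hypothetical and is not part of AcidFS; it only mimics the described behavior using the standard posixpath module, which always uses / as the separator.

```python
import posixpath


def resolve(cwd, path):
    """Resolve path against cwd; the result is absolute within the repository."""
    # posixpath.join discards cwd when path already begins with '/'.
    return posixpath.normpath(posixpath.join(cwd, path))


# The working directory starts at the repository root.
from_root = resolve("/", "foo/bar")
# A relative path is interpreted against the current working directory.
relative = resolve("/some/folder", "a/file")
# An absolute path ignores the current working directory.
absolute = resolve("/some/folder", "/another/file")
```

Here from_root is '/foo/bar', relative is '/some/folder/a/file', and absolute is '/another/file'.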
Constructor Arguments
repo
The path to the repository in the real, local filesystem.
head
The name of a branch to use as the head for this transaction. Changes made using this instance will be merged to the given head. The default, if omitted, is to use the repository’s current head.
create
If there is not a Git repository in the indicated directory, should one be created? The default is True.
bare
If the Git repository is to be created, create it as a bare repository. If the repository is already created or create is False, this argument has no effect.
name
Name to be used as a sort key when ordering the various databases (datamanagers in the parlance of the transaction package) during a commit. It is exceedingly rare that you would need to use anything other than the default here.
Open a file for reading or writing.
Implements the semantics of the open function in Python’s io module, which is the default implementation in Python 3. Opening a file in text mode will return a file-like object which reads or writes unicode strings, while opening a file in binary mode will return a file-like object which reads or writes raw bytes.
Because the underlying implementation uses a pipe to a Git plumbing command, opening for update (read and write) is not supported, nor is seeking.
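To see why a pipe-backed file cannot support seeking, and how text mode differs from binary mode, consider the sketch below. It uses a plain operating system pipe from the standard library rather than a Git plumbing command, so it illustrates only the general behavior of pipes and the io module, not AcidFS internals.

```python
import io
import os

# An OS-level pipe stands in for the pipe to a Git plumbing command.
read_fd, write_fd = os.pipe()

# Binary mode: the writer accepts raw bytes.
with os.fdopen(write_fd, "wb") as writer:
    writer.write("Hello!\n".encode("utf-8"))

raw_reader = os.fdopen(read_fd, "rb")

# Pipes have no file position, so the stream reports that it cannot seek.
pipe_seekable = raw_reader.seekable()

# Text mode: wrapping the binary stream decodes bytes into unicode strings.
text_reader = io.TextIOWrapper(raw_reader, encoding="utf-8")
text = text_reader.read()
text_reader.close()
```

After this runs, pipe_seekable is False and text is the str 'Hello!\n'.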
A context manager that changes the current working directory only in the scope of the ‘with’ context. For example:
import acidfs
fs = acidfs.AcidFS('myrepo')
with fs.cd('some/folder'):
    fs.open('a/file') # relative to /some/folder
fs.open('another/file') # relative to /
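A scoped directory change of this kind is commonly built with contextlib. The sketch below shows one way it could work; the WorkingDirectory class is a hypothetical stand-in that tracks only a current working directory and is not AcidFS's implementation.

```python
import posixpath
from contextlib import contextmanager


class WorkingDirectory:
    """Hypothetical stand-in that tracks only a current working directory."""

    def __init__(self):
        self.cwd = "/"  # always starts at the repository root

    @contextmanager
    def cd(self, path):
        previous = self.cwd
        self.cwd = posixpath.normpath(posixpath.join(previous, path))
        try:
            yield
        finally:
            # Restore the old directory even if the block raises.
            self.cwd = previous


wd = WorkingDirectory()
with wd.cd("some/folder"):
    inside = wd.cwd
outside = wd.cwd
```

Inside the block the working directory is '/some/folder'; once the block exits it is back to '/'.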
Return list of files in indicated directory. If path is omitted, the current working directory is used.
Create a new directory, including any ancestors which need to be created in order to create the directory with the given path.
Returns boolean indicating whether a file or directory exists at the given path.