Getting Started with Depot

Configuring DepotManager

The DepotManager is the entity in charge of configuring and handling file storages inside your application. To start saving files, the first required step is to configure a file storage through the DepotManager.

This can be done using DepotManager.configure(), which accepts a storage name (used to identify the storage when multiple storages are configured) and a set of configuration options:

DepotManager.configure('default', {
    'depot.storage_path': './files'
})

By default a depot.io.local.LocalFileStorage storage is configured. LocalFileStorage saves files on disk at the path given by depot.storage_path. You can switch to one of the other available storages through the depot.backend option. To store data on GridFS you would use:

DepotManager.configure('my_gridfs', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

Every option apart from depot.backend is passed to the storage as a constructor argument. You can even use your own storage by setting depot.backend to the full Python path of the class you want to use.
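As a sketch, assuming a hypothetical mypackage.storages.MyStorage class whose constructor takes a base_path argument, the configuration could look like this (the depot. prefix is stripped before the option is forwarded, just like storage_path and mongouri above):

DepotManager.configure('custom', {
    # hypothetical custom storage class, purely illustrative
    'depot.backend': 'mypackage.storages.MyStorage',
    # forwarded to MyStorage(base_path='/srv/files')
    'depot.base_path': '/srv/files'
})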

By default the first configured storage is the default one, and it will be used whenever no explicit storage is specified. To change the default storage you can use DepotManager.set_default() with the name of the storage you want to make the default one.
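For example, to make the GridFS storage configured above the default one:

DepotManager.set_default('my_gridfs')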

Getting a Storage

Once you have configured at least one storage, you can get it back using the DepotManager.get() method. If you pass a specific storage name it will retrieve the storage configured for that name:

depot = DepotManager.get('my_gridfs')

Otherwise the default storage can be retrieved by omitting the name argument:

depot = DepotManager.get()

Save and Manage Files

Saving and Retrieving Files

Once you have a working storage, saving files is as easy as calling the FileStorage.create() method passing the file (or the bytes object) you want to store:

depot = DepotManager.get()
fileid = depot.create(open('/tmp/file.png', 'rb'))

The returned fileid will be necessary when you want to get the stored file back. By default the name, content_type and all the other properties available through the StoredFile object are automatically detected from the file object passed as argument.

If you want to explicitly set filename and content type they can be passed as arguments to the create method:

fileid = depot.create(open('/tmp/file.png', 'rb'), 'thumbnail.png', 'image/png')

Getting the file back can be done using FileStorage.get() from the storage itself:

stored_file = depot.get(fileid)

Getting back the file will only retrieve the file metadata and will return a StoredFile object. This object can be used like a normal Python file object, so if you actually want to read the file content you should call its read() method:

stored_file.content_type  # This will be 'image/png'
image = stored_file.read()

If you don’t have the depot instance available, you can use the DepotManager.get_file() method which takes the path of the stored file. Paths are in the form depot_name/fileid:

stored_file = DepotManager.get_file('my_gridfs/%s' % fileid)

Replacing and Deleting Files

If you don’t need a file anymore it can easily be deleted using the FileStorage.delete() method with the file id:

depot.delete(fileid)

The delete method is guaranteed to be idempotent, so calling it multiple times will not lead to errors.
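For example, a second call on the same id is simply a no-op:

depot.delete(fileid)
depot.delete(fileid)  # the file is already gone, no error is raised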

The storage can also be used to replace existing files. Replacing the content of a file will also replace the file metadata:

depot.replace(fileid, open('/tmp/another_image.jpg', 'rb'),
              'thumbnail.jpg', 'image/jpeg')

This has the same effect as deleting the old file and storing a new one, but instead of generating a new id it reuses the existing one. As with the create call, the filename and content type arguments can be omitted and will be detected from the file itself when available.

Storing data as files

Whenever you do not have a real file (which is often the case with web uploaded content), you might not be able to retrieve the name and the content type from the file itself, or those values might be wrong.

In such cases depot.io.utils.FileIntent can be provided to DEPOT instead of the actual file. FileIntent can be used to explicitly tell DEPOT which filename and content_type to use when storing the file. Raw bytes can also be wrapped in a FileIntent to store non-file data:

from depot.io.utils import FileIntent

# Works with file objects
file_id = depot.create(
    FileIntent(open('/tmp/file', 'rb'), 'file.txt', 'text/plain')
)

# Works also with bytes
file_id = depot.create(
    FileIntent(b'HELLO WORLD', 'file.txt', 'text/plain')
)

f = depot.get(file_id)
assert f.content_type == 'text/plain'
assert f.filename == 'file.txt'
assert f.read() == b'HELLO WORLD'

Depot for the Web

File Metadata

As Depot has been explicitly designed for web application development, it provides all the file metadata that is commonly needed on the web, such as the values required for HTTP headers when serving files.

This is provided by the StoredFile you retrieve from the file storage and includes:

  • filename -> Original name of the file, if you need to serve it to the user for download.
  • content_type -> File content type, used as the response content type when serving the file back to the browser.
  • last_modified -> Can be used to implement caching and the Last-Modified header in HTTP.
  • content_length -> Size of the file, usually the content length of the HTTP response when serving the file back.
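As a minimal sketch of how that metadata maps to response headers (the header names are standard HTTP, but how you actually emit them depends on your web framework):

stored_file = depot.get(fileid)

# Sketch of response headers built from the metadata listed above.
headers = {
    'Content-Type': stored_file.content_type,
    'Content-Length': str(stored_file.content_length),
    'Content-Disposition': 'attachment; filename="%s"' % stored_file.filename,
}
# stored_file.last_modified can be used for a Last-Modified header or caching checks.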

Serving Files on HTTP

In case of storages that directly support serving files on HTTP (like depot.io.awss3.S3Storage, depot.io.boto3.S3Storage and depot.io.gcs.GCSStorage) the stored file itself can be retrieved at the url provided by StoredFile.public_url. In case the public_url is None it means that the storage doesn't provide direct HTTP access.
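A minimal sketch of how this can be used, assuming the default /depot mountpoint described below for the fallback case:

stored_file = depot.get(fileid)
if stored_file.public_url is not None:
    # The storage (e.g. S3 or GCS) serves the file directly over HTTP.
    url = stored_file.public_url
else:
    # Fall back to the path served by DepotMiddleware (see below).
    url = '/depot/my_gridfs/%s' % fileid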

When direct HTTP access is not available, files can be served using the DepotMiddleware WSGI middleware. The DepotMiddleware supports serving files from any backend, supports ETag caching and, in case of storages directly supporting HTTP, it will just redirect the user to the storage itself.

Unless you need to achieve maximum performance, it is usually a good approach to just use the WSGI middleware and let it serve all your files for you:

app = DepotManager.make_middleware(app)

By default the Depot middleware will serve the files at the /depot URL using their path (the same path passed to the DepotManager.get_file() method). So to retrieve a file with id 3774a1a0-0879-11e4-b658-0800277ee230 stored in the my_gridfs depot, the URL will be /depot/my_gridfs/3774a1a0-0879-11e4-b658-0800277ee230.

Changing the base URL and caching can be done through the DepotManager.make_middleware() options; any option passed to make_middleware will be forwarded to DepotMiddleware.
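For example, assuming your installed version of DepotMiddleware accepts mountpoint and cache_max_age options (check its signature, these option names are an assumption here):

app = DepotManager.make_middleware(
    app,
    mountpoint='/attachments',   # assumed option: serve files under /attachments instead of /depot
    cache_max_age=3600 * 24      # assumed option: cache served files for one day
)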

Handling Multiple Storages

Using Multiple Storages

Multiple storages can be used inside the same application. Most common operations require the storage itself or the full file path, so you can use multiple storages without risk of collisions.

To start using multiple storages just call DepotManager.configure() multiple times, giving each storage a unique name. You will then be able to retrieve the correct storage by name.
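As a sketch, the two storages configured earlier can be used side by side; each file path is prefixed with its storage name, so ids never collide (the filename here is purely illustrative):

local = DepotManager.get('default')
gridfs = DepotManager.get('my_gridfs')

local_id = local.create(b'some data', 'notes.txt', 'text/plain')
gridfs_id = gridfs.create(b'some data', 'notes.txt', 'text/plain')

# Full paths include the storage name, e.g. 'default/<id>' and 'my_gridfs/<id>'
local_file = DepotManager.get_file('default/%s' % local_id)
gridfs_file = DepotManager.get_file('my_gridfs/%s' % gridfs_id)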

Switching Default Storage

Once you have started uploading files to a storage, it is best to avoid configuring another storage with the same name. Doing so will probably break all the previously uploaded files and cause confusion.

If you want to switch to a different storage for saving your files, just configure two storages, giving the new storage a unique name, and switch the default storage using the DepotManager.set_default() function.
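For example (the storage names here are illustrative), switching new uploads to GridFS while keeping the old local storage configured:

DepotManager.configure('local_files', {'depot.storage_path': './files'})
DepotManager.configure('gridfs_files', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})

# New files saved through the default storage now go to GridFS,
# while 'local_files' remains configured so old files can still be retrieved.
DepotManager.set_default('gridfs_files')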

Replacing a Storage through Aliases

Originally DEPOT only permitted switching the default storage; that way you could replace the storage in use whenever you needed and keep the old files around, as the previous storage was still available. This was however only permitted for the default storage. Since version 0.0.7 the DepotManager.alias() feature is provided, which permits assigning alternative names to a storage.

If you only rely on the alternative name and never use the real storage name, you will be able to switch the alias to whatever new storage you want while the files previously uploaded to the old storage keep working. For example, if you are storing all your user avatars locally you might have a configuration like:

DepotManager.configure('local_avatars', {
    'depot.storage_path': '/var/www/lfs'
})
DepotManager.alias('avatar', 'local_avatars')

storage = DepotManager.get('avatar')
fileid = storage.create(open('/tmp/file.png', 'rb'), 'thumbnail.png', 'image/png')

Then when switching your avatars to GridFS you might switch your configuration to something like:

DepotManager.configure('local_avatars', {
    'depot.storage_path': '/var/www/lfs'
})
DepotManager.configure('gridfs_avatars', {
    'depot.backend': 'depot.io.gridfs.GridFSStorage',
    'depot.mongouri': 'mongodb://localhost/db'
})
DepotManager.alias('avatar', 'gridfs_avatars')

storage = DepotManager.get('avatar')
fileid = storage.create(open('/tmp/file.png', 'rb'), 'thumbnail.png', 'image/png')

Note

While you can keep using the avatar name for the storage when saving files, it's important that the local_avatars storage continues to be configured, as all the previously uploaded avatars will continue to be served from there.

Performing Backups between Storages

When you need to perform a backup between two storages, the best practice is to rely on the backend-specific tools, which are usually faster than trying to copy each file one by one in Python.

In case you need to perform backups through the DEPOT API itself, you can configure a second FileStorage and copy all the files into it using the second storage's FileStorage.replace() method:

DepotManager.configure('local_avatars', {
    'depot.storage_path': '/var/www/lfs'
})
DepotManager.configure('backup_avatars', {
    'depot.storage_path': '/var/www/lfs_backup'
})

storage = DepotManager.get('local_avatars')
backup = DepotManager.get('backup_avatars')

for fileid in storage.list():
    f = storage.get(fileid)
    # replace() reuses the existing file id, so the backup keeps the same ids
    backup.replace(f, f)

Note

This backup method will be very slow compared to the native backup tools of the storage in use, as it has to download each file locally and reupload it to the backup storage.