One of the big requests whilst we were replacing httplib with the requests package in 2.0 was why didn't we use a HTTP library that supports asynchronous API calls.

The intention for 2.0 and replacing the HTTP backend classes was to improve the usability of the project, by making SSL certificates easier to manage, improving the maintainability of our source code by using an active 3rd party package and also improving performance and stability.

Apache Libcloud already has documentation on threaded libraries like gevent and callback-based libraries like Twisted, see using libcloud in multithreaded environments for examples.

PEP 492, implemented in Python 3.5 provides a new coroutine protocol using methods, __await__ for classes, a coroutine method wrapper, or a method that returns a coroutine object. Also async iterators and context managers have been introduced.

We would like to take advantage of the new language features by offering APIs in Apache Libcloud without breaking backward compatibility and compatibility for users of <Python 3.5.

Use cases for this would be:

  • Being able to fetch Node or StorageObjects from multiple geographies or drivers simultaneously.
  • Being able to quickly upload or download storage objects by parallelizing operations on the StorageDriver.
  • Being able to call a long-running API method (e.g. generate report), whilst running other code.

Design 1 - async context managers PR 1016

This design would allow drivers to operate in 2 modes, the first is for synchronous method calls, they return list or object data as per usual. The second mode, API methods like NodeDriver.list_nodes would return a coroutine object and could be awaited or gathered using an event loop.

import asyncio

from integration.driver.test import TestNodeDriver
from libcloud.async_util import AsyncSession

driver = TestNodeDriver('apache', 'libcloud')

async def run():
    # regular API call
    nodes = driver.list_nodes()

    async with AsyncSession(driver) as async_instance:
        nodes = await async_instance.list_nodes()

    assert len(nodes) == 2

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()

Design 2 - Additional methods in each driver for coroutines PR 1027

This is the second design concept for async support in Libcloud.

The concept here is to have Asynchronous Mixins, LibcloudConnection uses requests and LibcloudAsyncConnection uses aiohttp for async transport see

The LibcloudAsyncConnection is an implementation detail of AsyncConnection, which is the API for the drivers to consume see

The drivers then use this mixin for their custom connection classes, e.g.

class GoogleStorageConnection(ConnectionUserAndKey, AsyncConnection):
    ...

They then inherit from libcloud.storage.base.StorageAsyncDriver, which uses a new set of base methods, e.g. iterate_containers_async and can be implemented like this:

        async def iterate_containers_async(self):
            response = await self.connection.request_async('/')
            if response.status == httplib.OK:
                containers = self._to_containers(obj=response.object,
                                                 xpath='Buckets/Bucket')
                return containers

            raise LibcloudError('Unexpected status code: %s' % (response.status),
                                driver=self)

Now the consumer can more or less do this:

from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

import asyncio

GoogleStorageDriver = get_driver(Provider.GOOGLE_STORAGE)
driver = GoogleStorageDriver(key=KEY, secret=SECRET)

def do_stuff_with_object(obj):
    print(obj)

async def run():
    tasks = []
    async for container in driver.iterate_containers_async():
        async for obj in driver.iterate_container_objects_async(container):
            tasks.append(asyncio.ensure_future(do_stuff_with_object(obj)))
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()

Design 3 - Initializer with "async" mode

This option is similar to 2, except that if a driver is instantiated with "async=True", then all driver class methods would return coroutine objects. Internally, it would patch the Connection class with the AsyncConnection class.

The downside of this is that all method calls to a driver would need to be awaited or used by an event loop.

from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

import asyncio

GoogleStorageDriver = get_driver(Provider.GOOGLE_STORAGE)
driver = GoogleStorageDriver(key=KEY, secret=SECRET, async=True)

def do_stuff_with_object(obj):
    print(obj)

async def run():
    tasks = []
    async for container in driver.iterate_containers():
        async for obj in driver.iterate_container_objects(container):
            tasks.append(asyncio.ensure_future(do_stuff_with_object(obj)))
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()

Give us feedback

Got a better idea? Have an API or design, the question we're asking is "if you wanted to use Libcloud for an async application, what would the code look like?" This helps us design the API and the implementation details can follow.

Feel free to comment on the mailing list or on the pull requests, or raise your own pull-request with an API design.