Have your say - async support in Apache Libcloud
One of the big requests whilst we were replacing httplib with the requests package in 2.0 was: why didn’t we use an HTTP library that supports asynchronous API calls?
The intention for 2.0 and replacing the HTTP backend classes was to improve the usability of the project by making SSL certificates easier to manage, to improve the maintainability of our source code by using an active third-party package, and to improve performance and stability.
Apache Libcloud already has documentation on threaded libraries like gevent and callback-based libraries like Twisted; see “using Libcloud in multithreaded environments” for examples.
PEP 492, implemented in Python 3.5, provides a new coroutine protocol: coroutine methods declared with async def, an __await__ method for classes that want to be awaitable, and regular methods that return a coroutine object.
It also introduced asynchronous iterators and asynchronous context managers.
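As a rough illustration of the PEP 492 features mentioned above, here is a minimal sketch using only the standard library. The class names (Countdown, Session, Deferred) are made up for this example and have nothing to do with Libcloud.

```python
import asyncio


class Countdown:
    """Asynchronous iterator: yields n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.n <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # give other tasks a chance to run
        self.n -= 1
        return self.n + 1


class Session:
    """Asynchronous context manager that records enter and exit."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not swallow exceptions


class Deferred:
    """Awaitable class: __await__ must return an iterator."""

    def __init__(self, value):
        self.value = value

    def __await__(self):
        async def _resolve():
            await asyncio.sleep(0)
            return self.value

        # Delegate to the coroutine's own __await__ iterator.
        return _resolve().__await__()


async def demo():
    session = Session()
    values = []
    async with session:                   # async context manager
        async for value in Countdown(3):  # async iterator
            values.append(value)
    result = await Deferred(42)           # awaitable object
    return values, session.events, result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The sketch uses asyncio.run, which needs Python 3.7 or later; on 3.5 the same coroutine can be driven with loop.run_until_complete.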
We would like to take advantage of the new language features by offering async APIs in Apache Libcloud without breaking backward compatibility, including compatibility for users of Python versions older than 3.5.
Use cases for this would be:
- Being able to fetch Node or StorageObject instances from multiple geographies or drivers simultaneously.
- Being able to quickly upload or download storage objects by parallelizing operations on the StorageDriver.
- Being able to call a long-running API method (e.g. generate report), whilst running other code.
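To make the first use case concrete, here is a minimal sketch of fetching nodes from several regions at once. FakeAsyncDriver is a hypothetical stand-in, not a Libcloud class, and the sleep only simulates network latency.

```python
import asyncio


class FakeAsyncDriver:
    """Hypothetical stand-in for a driver; not part of Libcloud."""

    def __init__(self, region, delay):
        self.region = region
        self.delay = delay

    async def list_nodes(self):
        await asyncio.sleep(self.delay)  # simulates network latency
        return ["%s-node-%d" % (self.region, i) for i in range(2)]


async def list_all_nodes(drivers):
    # gather() runs the coroutines concurrently and returns the
    # results in the same order as the inputs.
    results = await asyncio.gather(*(d.list_nodes() for d in drivers))
    return [node for nodes in results for node in nodes]


async def main():
    drivers = [
        FakeAsyncDriver("us-east", 0.05),
        FakeAsyncDriver("eu-west", 0.05),
        FakeAsyncDriver("ap-south", 0.05),
    ]
    return await list_all_nodes(drivers)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the three calls overlap, the total wait is roughly that of the slowest call, not the sum of all three.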
Design 1 - async context managers PR 1016
This design would allow drivers to operate in two modes. The first is for synchronous method calls, which return list or object data as usual. In the second mode, API methods like NodeDriver.list_nodes would return a coroutine object that could be awaited or gathered using an event loop.
import asyncio

from integration.driver.test import TestNodeDriver
from libcloud.async_util import AsyncSession

driver = TestNodeDriver('apache', 'libcloud')

async def run():
    # regular API call
    nodes = driver.list_nodes()

    async with AsyncSession(driver) as async_instance:
        nodes = await async_instance.list_nodes()

    assert len(nodes) == 2

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()
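One way such a session could work internally is to wrap the blocking driver and run each call in a thread pool. The sketch below is an assumption made for illustration, not the implementation in PR 1016; SyncDriver, _AsyncProxy and ThreadedAsyncSession are hypothetical names.

```python
import asyncio
import functools


class SyncDriver:
    """Hypothetical blocking driver, standing in for a NodeDriver."""

    def list_nodes(self):
        return ["node-1", "node-2"]


class _AsyncProxy:
    """Exposes each driver method as a coroutine function."""

    def __init__(self, driver):
        self._driver = driver

    def __getattr__(self, name):
        method = getattr(self._driver, name)
        if not callable(method):
            return method

        async def call(*args, **kwargs):
            loop = asyncio.get_running_loop()
            # Run the blocking call in the default thread pool so the
            # event loop stays free for other tasks.
            return await loop.run_in_executor(
                None, functools.partial(method, *args, **kwargs))

        return call


class ThreadedAsyncSession:
    """Hypothetical async context manager around a sync driver."""

    def __init__(self, driver):
        self._driver = driver

    async def __aenter__(self):
        return _AsyncProxy(self._driver)

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions


async def run():
    driver = SyncDriver()
    sync_nodes = driver.list_nodes()  # regular API call
    async with ThreadedAsyncSession(driver) as async_driver:
        async_nodes = await async_driver.list_nodes()
    return sync_nodes, async_nodes


if __name__ == "__main__":
    print(asyncio.run(run()))
```

A thread-pool wrapper keeps existing drivers unchanged, but it does not give true non-blocking I/O; an async HTTP transport would be needed for that.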
Design 2 - Additional methods in each driver for coroutines PR 1027
This is the second design concept for async support in Libcloud.
The concept here is to have asynchronous mixins: LibcloudConnection uses requests, and LibcloudAsyncConnection uses aiohttp for async transport.
The LibcloudAsyncConnection is an implementation detail of AsyncConnection, which is the API for the drivers to consume.
The drivers then use this mixin for their custom connection classes, e.g.
class GoogleStorageConnection(ConnectionUserAndKey, AsyncConnection):
    ...
They then inherit from libcloud.storage.base.StorageAsyncDriver, which uses a new set of base methods, e.g. iterate_containers_async, and can be implemented like this:
async def iterate_containers_async(self):
    response = await self.connection.request_async('/')
    if response.status == httplib.OK:
        containers = self._to_containers(obj=response.object,
                                         xpath='Buckets/Bucket')
        return containers

    raise LibcloudError('Unexpected status code: %s' % (response.status),
                        driver=self)
Now the consumer can more or less do this:
import asyncio

from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

GoogleStorageDriver = get_driver(Provider.GOOGLE_STORAGE)
driver = GoogleStorageDriver(key=KEY, secret=SECRET)

# Must be a coroutine function so that ensure_future() can schedule it.
async def do_stuff_with_object(obj):
    print(obj)

async def run():
    tasks = []
    async for container in driver.iterate_containers_async():
        async for obj in driver.iterate_container_objects_async(container):
            tasks.append(asyncio.ensure_future(do_stuff_with_object(obj)))
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()
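For the nested async for loops in the consumer code to work, the *_async methods have to return asynchronous iterators. The sketch below shows one way to get that shape with async generators, assuming Python 3.6 or later. FakeAsyncStorageDriver and its data are hypothetical; only the method names follow the proposal.

```python
import asyncio


class FakeAsyncStorageDriver:
    """Hypothetical driver; container and object names are made up."""

    _data = {"logs": ["a.log", "b.log"], "images": ["cat.png"]}

    def iterate_containers(self):
        # The existing synchronous API stays untouched.
        return iter(self._data)

    async def iterate_containers_async(self):
        for name in self._data:
            await asyncio.sleep(0)  # placeholder for an awaited request
            yield name

    async def iterate_container_objects_async(self, container):
        for obj in self._data[container]:
            await asyncio.sleep(0)  # placeholder for an awaited request
            yield obj


async def process(obj):
    await asyncio.sleep(0)
    return obj.upper()


async def run():
    driver = FakeAsyncStorageDriver()
    tasks = []
    async for container in driver.iterate_containers_async():
        async for obj in driver.iterate_container_objects_async(container):
            tasks.append(asyncio.ensure_future(process(obj)))
    # gather() returns results in the order the tasks were created.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run()))
```

Keeping the async methods under separate names means synchronous callers see no change, at the cost of a larger driver surface.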
Design 3 - Initializer with “async” mode
This option is similar to Design 2, except that if a driver is instantiated with “async=True”, then all driver class methods would return coroutine objects. Internally, it would patch the Connection class with the AsyncConnection class.
The downside of this is that all method calls to a driver would need to be awaited or run by an event loop. Note also that async is a reserved keyword from Python 3.7 onward, so the flag would need a different name on those versions.
import asyncio

from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

GoogleStorageDriver = get_driver(Provider.GOOGLE_STORAGE)
driver = GoogleStorageDriver(key=KEY, secret=SECRET, async=True)

# Must be a coroutine function so that ensure_future() can schedule it.
async def do_stuff_with_object(obj):
    print(obj)

async def run():
    tasks = []
    async for container in driver.iterate_containers():
        async for obj in driver.iterate_container_objects(container):
            tasks.append(asyncio.ensure_future(do_stuff_with_object(obj)))
    await asyncio.gather(*tasks)

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()
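The mode switch could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: all classes are hypothetical stand-ins, and the flag is called async_mode here because async cannot be used as an argument name on Python 3.7 and later.

```python
import asyncio


class FakeSyncConnection:
    """Hypothetical blocking connection."""

    def request(self, path):
        return {"path": path, "nodes": ["node-1", "node-2"]}


class FakeAsyncConnection:
    """Hypothetical non-blocking connection."""

    async def request(self, path):
        await asyncio.sleep(0)  # placeholder for a non-blocking HTTP call
        return {"path": path, "nodes": ["node-1", "node-2"]}


class FakeNodeDriver:
    def __init__(self, async_mode=False):
        # Swap the connection implementation at construction time.
        self.async_mode = async_mode
        if async_mode:
            self.connection = FakeAsyncConnection()
        else:
            self.connection = FakeSyncConnection()

    def list_nodes(self):
        if self.async_mode:
            # Returns a coroutine object; the caller must await it.
            return self._list_nodes_async()
        return self.connection.request("/nodes")["nodes"]

    async def _list_nodes_async(self):
        response = await self.connection.request("/nodes")
        return response["nodes"]


async def run():
    driver = FakeNodeDriver(async_mode=True)
    return await driver.list_nodes()


if __name__ == "__main__":
    print(FakeNodeDriver().list_nodes())
    print(asyncio.run(run()))
```

The same method name returns either data or a coroutine depending on how the driver was built, which keeps the API small but makes the return type depend on runtime state.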
Give us feedback
Got a better idea, or an API or design of your own? The question we’re asking is: “if you wanted to use Libcloud for an async application, what would the code look like?” This helps us design the API; the implementation details can follow.
Feel free to comment on the mailing list or on the pull requests, or raise your own pull request with an API design.