Traversable attributes in Pykka

In Pykka 0.13–which was released almost two weeks ago–traversing the attributes of an actor is about 8.3 times faster than it used to be. To paraphrase Apple: “8.3X faster. That’s amazing!” (Update: This was written a couple of hours before the news of Jobs’ passing arrived. May he continue to inspire us.)

So, what is “traversable attributes”? Let’s take a few steps back.

If we were a conservative actor adhering strictly to the actor model, we surely wouldn’t share our attributes with anybody else. We would expect other actors to send serializable messages to us, asking nicely to get the value of the attribute, or maybe asking for something else. Of course, the other actors wouldn’t even know the attribute existed unless we told them, and even then they wouldn’t ever dream of requesting a reference to our attribute or altering the attribute directly. It would be indecent. It would break the rules of the actor model. It would be unsafe.

When using Pykka, you can keep to the traditional way of passing messages back and forth between the actors. You start the actor by calling the Actor.start() class method, which returns an ActorRef object. This object can safely be passed around and even shared between threads. The ActorRef object got two methods for sending messages to the actor, called send_one_way() and send_request_reply(). This is nice enough by itself, and it gives you a way to build concurrent applications which is easier to reason about–just as promised by advocates of the actor model–than when you do the thread and lock management dance. You can quickly hack together a simple actor implementation like this from scratch for each and every application you make using e.g. Thread and Queue. I’ve done this a couple of times, and it works.

But, I wanted a bit more, so I created Pykka.

First, I wanted to get rid of verbose dict messages all over my code base. I just wanted to call regular methods and access regular attributes on regular objects. Pykka provides a safe way of doing this, called ActorProxy. An ActorProxy is nothing more than a wrapper around an ActorRef. It does all it’s magic by sending messages to the actor, just like you used to do yourself.

from pykka.actor import ThreadingActor

class X(ThreadingActor):
    y = 1
>>> x = X.start().proxy()
>>> x.y.get()
1
>>> x.stop()

Second, I wanted to be able to organize actors like regular code, e.g. by splitting them into multiple classes. Imagine a running actor a which have the attribute b. The “subobject” b have the method c().

from pykka.actor import ThreadingActor

class B(object):
    def c(self):
        return 1 + 1

class A(ThreadingActor):
    b = B()

If you call a.b.c(), the following happens:

  1. We send a message to actor a requesting attribute b, and immediately get a future object back which is our handle to the result which will be available in the future.

  2. Actor a gets the message, looks up attribute b, and returns a copy of the object referenced by the b attribute.

  3. We call c() on the future, but the Future class doesn’t have an attribute called c, so it fails. Alternatively, we use the future correctly and call get() on the future to get the real result, a copy of b. Then we call c() on the copy of b. The method c() is now running, but it is running in the caller’s thread, and not in the actor a like I wanted it to do.

>>> a = A.start().proxy()
>>> a.b.c()
AttributeError: 'ThreadingFuture' object has no attribute 'c'
>>> a.b.get().c()
2   # Result calculated in the caller's thread
>>> a.stop()

The simple attribute access that the ActorProxy provides isn’t enough to make this work.

To make the a.b.c() method call be executed in the actor a instead of the caller’s thread, we need to traverse attribute b without having it returned to us, so that we can get to c() while still inside the actor a, and call its method c(). We need what we in Pykka call traversable attributes.

To make an attribute traversable, the only thing we need to do is to mark it as such by adding the attribute pykka_traversable to the traversable attribute:

from pykka.actor import ThreadingActor

class B(object):
    pykka_traversable = True
    def c(self):
        return 1 + 1

class A(ThreadingActor):
    b = B()
>>> a = A.start().proxy()
>>> a.b.c().get()
2   # Result calculated by the actor `a`
>>> a.stop()

When you access a regular attribute of a Pykka actor, you just get a future object, which, when you call get() on it, will return a copy of the attribute. When you access a traversable attribute of a Pykka actor, you get a brand new ActorProxy which wraps the same ActorRef, but method calls and attribute accesses on the new proxy object will work on the actor’s attribute instead of the actor itself.

Speeding up access to traversible attributes

If you’re still following, you’re maybe wondering how we sped up access to traversable attributes with a factor of 8.3. The answer is a few lines up: “you get a brand new ActorProxy.”

So, why should that matter?

If you split your actor into multiple classes using traversable attributes, you’re probably going to use each traversable attribute more than once. Maybe really often. Turns out, creating brand new ActorProxy objects for the same attribute over and over again is kind of wasteful.

How did you find out?

John Bäckstrand was irritated by Mopidy being almost unusable on his slow system, and attacked the problem in the scientific way: by measuring where the bottleneck was. John quickly pointed out that access to second-level attributes, which required the traversal of a traversable attribute, was five times slower than access to first-level attributes, which didn’t involve traversable attributes. This observation made it obvious that the creation of new ActorProxy objects whenever we accessed traversable attributes–even though the proxy objects didn’t contain any state and was fully reusable–probably needed refinement.

To be sure we fixed the issue, we started by writing a performance test which compared attribute access with and without the traversal of a traversable attribute.

# Using Pykka 0.12.4
test_direct_plain_attribute_access took 0.958s
test_direct_callable_attribute_access took 0.977s
test_traversible_plain_attribute_access took 8.259s
test_traversible_callable_attribute_access took 8.344s

Then, the fix was short and easy: Cache and reuse ActorProxy objects.

diff --git a/pykka/proxy.py b/pykka/proxy.py
index 27c075b..4c6b908 100644
--- a/pykka/proxy.py
+++ b/pykka/proxy.py
@@ -58,6 +58,7 @@ class ActorProxy(object):
         self.actor_ref = actor_ref
         self._attr_path = attr_path or tuple()
         self._known_attrs = None
+        self._actor_proxies = {}
 
     def _update_attrs(self):
         self._known_attrs = self.actor_ref.send_request_reply(
@@ -88,7 +89,10 @@ class ActorProxy(object):
         if attr_info['callable']:
             return _CallableProxy(self.actor_ref, attr_path)
         elif attr_info['traversable']:
-            return ActorProxy(self.actor_ref, attr_path)
+            if attr_path not in self._actor_proxies:
+                self._actor_proxies[attr_path] = ActorProxy(
+                    self.actor_ref, attr_path)
+            return self._actor_proxies[attr_path]
         else:
             message = {
                 'command': 'pykka_getattr',

The result was immediate: The performance test for traversable attribute access showed an 8.3X improvement.

# Using Pykka 0.13
test_direct_plain_attribute_access took 0.953s
test_direct_callable_attribute_access took 0.988s
test_traversible_plain_attribute_access took 0.984s
test_traversible_callable_attribute_access took 1.006s

Mopidy use Pykka’s traversable attributes heavily to organize its backend code. Obviously, we try to avoid wiring up lots of actors in Mopidy’s unit tests, but we’ve been lazy and use some actors in the tests. These five lines of code inserted at the right place in a dependency made Mopidy’s test suite run 20% faster, and made John’s use case run 166% faster.

We could use more of five-line patches like that :-)