Proxying Python file-like objects for fun and profit

As part of a project I’m working on, I wanted to be able to do some “side processing” while writing to a file-like object. The processing is basically checksumming on-the-fly. I’m essentially doing something like:

what I’d like is to be able to also get the data read from source and use hashlib’s update mechanism to get a checksum of the object. The easiest way to do it would be using temporary storage (an actual file or a StringIO), but I’d prefer to avoid that since the files can be quite large. The second way to do it is to read the source twice. But since that may come from a network, it makes no sense to read it twice just to get the checksum. A third way would be to have destination be a file-like derivative that updates an internal hash with each read block from source, and then provides a way to retrieve the hash.

Instead of creating my own file-like where I’d mostly be “passing through” all the calls to the underlying destination object (which incidentally also writes to a network resource), I decided to use padme which already should do most of what I need. I just needed to unproxy a couple of methods, add a new method to retrieve the checksum at the end, and presto.

A first implementation looks like this:

This however doesn’t work for reasons I was unable to fathom on my own:

This is clearly because super(sha256file, self) refers to the *class* and I need the *instance* which is the one with the write method. So Zygmunt helped me get a working version ready:

here’s the explanation of what was wrong:

– first of all the exception tells you that the super-object (which is a relative of base_proxy) has no write method. This is correct. A proxy is not a subclass of the proxied object’s class (some classes cannot be subclasses). The solution is to call the real write method. This can be accomplished with type(self).__proxiee__.write()

– second of all we need to be able to hold state, namely the hash attribute (I’ve renamed it to _hash but it’s irrelevant to the problem at hand). Proxy objects can store state, it’s just not terribly easy to do. The proxied object (here a file) may or may not be able to store state (here it cannot). The solution is to make it possible to access some of the state via standard means. The new (small) satateful_proxy class implements __setattr__ and __delattr__ in the same way __getattribute__ was always implemented. That is, those methods look at the __unproxied__ set to know if access should be routed to the original or to the proxy.
– the last problem is that __unproxied__ is only collected by the proxy_meta meta-class. It’s extremely hard to change that meta-class (because padme.proxy is not the real class that you ever use, it’s all a big fake to make proxy() both a function-like and class-like object.)

The really cool thing about all this is not so much that my code is now working, but that those ideas and features will make it into an upcoming version of Padme 🙂 So down the line the code should become a bit simpler.

Leave a Reply