Python-Safethread

Layering black boxes for code isolation

Registered by Adam Olsen on 2008-10-25

Currently, Python's implementation is split into two segments: the part written in C and the part written in Python. Although the C part has full access to Python methods and attributes, it rarely touches them on subclasses, and Python code has little access to internal C details. I believe this is a grossly undervalued feature, as it can serve as the basis for compiler optimizations, secure (restricted) sandboxing, and forcibly killed threads.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

My current intent is that black boxes be layered from the bottom up. If a type is defined in a lower layer it will explicitly list attributes and methods to be exposed to higher layers. The higher layer will see those attributes and methods as normal, but anything not exported is completely invisible (no name conflicts and no way to access it). In contrast, a type defined in a higher layer cannot export anything to the lower layers (they ignore exporting), but the lower layer can access anything desired if done so explicitly.

Types defined in sibling layers have no higher-lower relationship with one another, and as such are completely opaque to each other. Any interaction they desire must be through an interface defined by a common lower layer.
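The visibility rules above can be sketched in plain Python. This is only a toy model, not an implementation: the `Box` class, the `can_access` function, and the box names are all hypothetical, and the export list is passed in as a plain set.

```python
class Box:
    """A hypothetical black box; `parent` is the next lower layer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def is_below(self, other):
        """Return True if self is a strictly lower layer than other."""
        box = other.parent
        while box is not None:
            if box is self:
                return True
            box = box.parent
        return False


def can_access(viewer, owner, attr, exports):
    """May code in `viewer` use `attr` of a type defined in `owner`?

    `exports` is the set of names the owning box chose to export.
    """
    if viewer is owner:
        return True             # a box sees its own types in full
    if owner.is_below(viewer):
        return attr in exports  # higher layers see only exported names
    if viewer.is_below(owner):
        return True             # lower layers may reach in explicitly
    return False                # siblings are opaque to each other


core = Box("core")
app = Box("app", parent=core)
plugin = Box("plugin", parent=core)
core_exports = {"read"}
```

Here `app` can use `read` on a type from `core` but nothing else, `core` can reach any attribute of an `app` type, and `app` and `plugin` see nothing of each other's types.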

Benefits:
* static code analysis, i.e. compiling. The application, expected to be dynamic, can be completely separated from the static language implementation. This allows the language implementation to be compiled statically. Likewise, if the app is static but loads plugins, the plugins can be isolated.
* secure (restricted) sandboxes. Because the lower layers can completely control what attributes are exposed they can build proxy objects that expose nothing. This can be done by the app, rather than building the proxy into the language implementation. A capability system could be built on this.
* forcibly killable threads. If critical code for resource management is moved into a lower layer then the higher layer need not be relied on to clean up. Any active functions of the higher layer can have their stack forcibly unwound, bypassing try/finally blocks, so long as any stack segments involving a lower layer (such as via a with-statement) are gracefully exited with an appropriate exception. This assumes the lower layer only blocks in cancellable functions, which becomes a requirement for a robust implementation.
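As a rough illustration of the sandboxing point, here is what an app-built proxy might look like in today's Python. The names `make_proxy` and `FileLike` are made up for this sketch. Note that this is not actually secure in current Python (the target can still be reached through introspection such as `__closure__`); real black boxes would be what makes such a proxy airtight.

```python
def make_proxy(target, exported):
    """Return an object that forwards only the `exported` names to `target`."""
    exported = frozenset(exported)

    class Proxy:
        def __getattr__(self, name):
            # Only called when normal lookup fails, which is always the
            # case here because Proxy defines no attributes of its own.
            if name in exported:
                return getattr(target, name)
            raise AttributeError(name)

    return Proxy()


class FileLike:
    """Hypothetical lower-layer type with one public method."""

    def __init__(self):
        self.secret_path = "/hypothetical/path"

    def read(self):
        return "data"


proxy = make_proxy(FileLike(), {"read"})
```

Calling `proxy.read()` is forwarded to the target, while `proxy.secret_path` raises `AttributeError`, as if the attribute did not exist.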

Problems:
This may be too strict. Imagine every module in the stdlib being its own black box: which black box does the app build on? What if some modules use each other? Or they're not intended to use each other, but the app passes objects between them? There needs to be a way to access anybody's exported __eq__. While the core may want to ensure type safety, most of the stdlib probably doesn't care. So what if you pass your own type to something expecting a datetime?

Another problem: which box does getattr run in? It seems that every box should get its own getattr, and likewise for similar builtins and operators. That might need to be extensible, though, which is a big issue: it implies being able to implicitly declare functions as being per-box. Note that you can't simply use the caller's box: the function may get passed to a lower-level box which, although it trusts the function itself, doesn't trust the code that gave it the function.

An alternative approach would be to make single-underscore-prefixed names box-private, while normal attributes would be public. If you want to avoid using untrusted attributes you need to always use private names, and perhaps have a property to map them together. This still needs per-box getattr, but most other functions use only public names and thus need no modification. It also implies private names go into a separate __dict__, but that's likely implied anyway.
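A minimal sketch of the underscore approach, assuming each box gets its own getattr/setattr pair and private names live in a separate per-box dict on the object. The helper names (`make_box_accessors`, `is_box_private`) and the `__box_private__` storage slot are inventions for illustration only.

```python
def is_box_private(name):
    """Single leading underscore means box-private; dunder names stay public."""
    return name.startswith("_") and not name.startswith("__")


def make_box_accessors(box):
    """Build the getattr/setattr pair that code running in `box` would use."""

    def box_getattr(obj, name):
        if is_box_private(name):
            try:
                # Private names come from this box's own dict on the object.
                return obj.__box_private__[box][name]
            except (AttributeError, KeyError):
                raise AttributeError(name) from None
        return getattr(obj, name)  # public names behave as usual

    def box_setattr(obj, name, value):
        if is_box_private(name):
            store = obj.__dict__.setdefault("__box_private__", {})
            store.setdefault(box, {})[name] = value
        else:
            setattr(obj, name, value)

    return box_getattr, box_setattr


class Thing:
    pass


thing = Thing()
get_a, set_a = make_box_accessors("box_a")
get_b, set_b = make_box_accessors("box_b")
set_a(thing, "_token", 1)         # visible only to box_a
set_b(thing, "_token", 2)         # same name, no conflict with box_a
set_a(thing, "label", "shared")   # public, visible to every box
```

Both boxes store a `_token` on the same object without colliding, while `label` is shared. A third box asking for `_token` gets an `AttributeError`.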

[Augh, this thing has no sense of changes over time! Nor does it have a history if I delete anything. Alas..]

You could impose memory limits by attributing all created objects to the thread they're created in. The penalty would only be activated when the nested box itself tries to allocate, though; a low-level (outer) box should be able to keep going, over-allocating the nested box's limit. This could be extended to include CPU limits or possibly to switch between "contexts" in a single thread. Multiple nestings could be handled by chaining: the consumption of a box is not only its own objects, but also those of any box nested inside it.
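One possible reading of the chained accounting, as a toy sketch. The `BoxAccount` class and its `by_outer` flag are hypothetical; in particular, letting an outer box skip only the nested box's own limit is my interpretation of "over-allocating", not something settled above.

```python
class BoxAccount:
    """Hypothetical memory account for one box; `outer` is the enclosing box."""

    def __init__(self, name, limit=None, outer=None):
        self.name = name
        self.limit = limit  # None means unlimited
        self.outer = outer
        self.used = 0

    def allocate(self, nbytes, by_outer=False):
        """Charge `nbytes` to this box and to every box enclosing it.

        When `by_outer` is true, the allocation is made by outer-box code
        on this box's behalf, so this box's own limit is not enforced.
        """
        box = self.outer if by_outer else self
        while box is not None:  # check limits before charging anything
            if box.limit is not None and box.used + nbytes > box.limit:
                raise MemoryError(f"box {box.name!r} would exceed its limit")
            box = box.outer
        box = self
        while box is not None:  # chaining: enclosing boxes are charged too
            box.used += nbytes
            box = box.outer


outer = BoxAccount("outer", limit=100)
inner = BoxAccount("inner", limit=10, outer=outer)
inner.allocate(8)                 # fine: 8 <= 10
inner.allocate(5, by_outer=True)  # allowed: only the outer limit is checked
```

After these two calls both accounts show 13 bytes used, so `inner` is over its limit of 10 and any further allocation it attempts on its own raises `MemoryError`.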

The one caveat is that if you create a string, which gets interned, then other threads intern the same thing (reusing your instance), you'll get charged for it even though you never directly exported it. Also, it's not clear what would result from extending an existing list attributed to another box.
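The interning caveat is easy to see with `sys.intern`, which returns the one shared instance of a string. In the sketch below the `owner_of` table and `intern_for` helper are made up; they stand in for whatever would record which box an object is attributed to.

```python
import sys

owner_of = {}  # id(string) -> box charged for it (hypothetical bookkeeping)


def intern_for(box, text):
    """Intern `text`, charging `box` only if nobody was charged before."""
    shared = sys.intern(text)
    owner_of.setdefault(id(shared), box)
    return shared


# Build the strings at runtime so each box starts with its own copy.
first = intern_for("box_a", "".join(["shared", "_key"]))
second = intern_for("box_b", "".join(["shared", "_key"]))
```

Both calls hand back the same object, and the charge stays with `box_a` even though `box_b` now holds a reference to it too.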

This assumes some sort of proxying is used as part of a broader security model, to prevent sibling boxes from abusing each other's limits.
