Abstracting out state/data storage, so that other stores than Redis can be used

Registered by justinsb

Currently Redis dependencies are spread throughout the code. It is not clear that Redis will be faster or more scalable; it also seems that enterprises will probably want more traditional storage options (SQL databases) for reporting purposes.

This blueprint is for making the data storage pluggable, with Redis and SQL as the first two implementations.

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: justinsb
Definition: New
Series goal: None
Implementation: Implemented
Milestone target: None
Started by: Soren Hansen
Completed by: Soren Hansen

Whiteboard

I'm looking at abstracting out the data storage layer so that people who don't want to use Redis don't have to. For example, I believe that supporting traditional SQL databases will boost uptake for private clouds.

We should stay away from religious wars here (SQL vs NoSQL)! We all know what the right answer is, we just don't all agree :-)

The main issues:

Supporting secondary indexes:
Redis is a key-value store, so secondary indexes (indexes on a column other than the primary key, in relational terms) have to be implemented using separate key/value pairs. Currently this logic is spread throughout the codebase. I propose moving it into the data abstraction layer. Incidentally, the index updates are not currently done atomically in Redis, so fixing that might be more practical once the logic lives in the data storage layer.
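A minimal sketch of what moving index maintenance into the storage layer could look like. Everything here is hypothetical: `IndexedStore`, its method names, and the `project_id` field are illustrative, and an in-memory dict with a lock stands in for Redis. In a real Redis backend the same grouping would presumably map onto a MULTI/EXEC transaction.

```python
import threading


class IndexedStore:
    """In-memory stand-in for a key-value backend (hypothetical, not Nova code)."""

    def __init__(self, indexed_fields):
        self._indexed_fields = tuple(indexed_fields)
        self._records = {}   # primary key -> dict of fields
        self._indexes = {}   # (field, value) -> set of primary keys
        self._lock = threading.Lock()

    def save(self, key, record):
        """Write a record and update its secondary indexes as one step."""
        with self._lock:
            old = self._records.get(key)
            if old is not None:
                self._unindex(key, old)
            self._records[key] = dict(record)
            for field in self._indexed_fields:
                if field in record:
                    bucket = self._indexes.setdefault((field, record[field]), set())
                    bucket.add(key)

    def delete(self, key):
        """Remove a record together with its index entries."""
        with self._lock:
            old = self._records.pop(key, None)
            if old is not None:
                self._unindex(key, old)

    def get(self, key):
        with self._lock:
            record = self._records.get(key)
            return dict(record) if record is not None else None

    def find_by(self, field, value):
        """Return the sorted primary keys whose indexed field equals value."""
        if field not in self._indexed_fields:
            raise KeyError(f"{field!r} is not an indexed field")
        with self._lock:
            return sorted(self._indexes.get((field, value), set()))

    def _unindex(self, key, record):
        # Caller must hold the lock.
        for field in self._indexed_fields:
            if field in record:
                index_key = (field, record[field])
                bucket = self._indexes.get(index_key)
                if bucket is not None:
                    bucket.discard(key)
                    if not bucket:
                        del self._indexes[index_key]


store = IndexedStore(indexed_fields=["project_id"])
store.save("i-1", {"project_id": "alpha", "state": "running"})
store.save("i-2", {"project_id": "alpha", "state": "stopped"})
store.save("i-1", {"project_id": "beta", "state": "running"})  # moves i-1 to beta
print(store.find_by("project_id", "alpha"))  # prints ['i-2']
```

Callers never touch the index keys directly, so a stale index entry can only come from a bug in one place rather than from any of the call sites.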

Rigorous column definitions:
While Redis doesn't require fixed column definitions (schemas?), it is clear that we can't rename columns once users are running this in production, nor can we just introduce new columns without thinking about existing objects/rows. It's also easy to make mistakes where you get the column name wrong (e.g. leaving off an _id suffix). I propose that we declare the fields at the class level, in the same way as traditional ORMs.
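As a rough illustration of class-level field declarations, here is a sketch using Python descriptors. `Field`, `Model`, and the example `Instance` class are made-up names for this sketch, not an existing ORM's API; the point is only that a misspelled column name fails immediately instead of silently creating a new key.

```python
class Field:
    """Declares one persisted column on a model (hypothetical helper)."""

    def __init__(self, default=None):
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically when the owning class body is evaluated.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._values.get(self.name, self.default)

    def __set__(self, obj, value):
        obj._values[self.name] = value


class Model:
    """Base class that only accepts attributes declared as Field."""

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ check for the internal value dict.
        object.__setattr__(self, "_values", {})
        for name, value in kwargs.items():
            setattr(self, name, value)

    @classmethod
    def fields(cls):
        """Return declared field names in definition order."""
        names = []
        for klass in reversed(cls.__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, Field) and name not in names:
                    names.append(name)
        return names

    def __setattr__(self, name, value):
        if name not in self.fields():
            raise AttributeError(
                f"{type(self).__name__} has no declared field {name!r}"
            )
        super().__setattr__(name, value)

    def to_dict(self):
        return {name: getattr(self, name) for name in self.fields()}


class Instance(Model):
    # Illustrative fields only; not the real schema.
    instance_id = Field()
    project_id = Field()
    state = Field(default="pending")


inst = Instance(instance_id="i-1", project_id="alpha")
print(inst.to_dict())

try:
    Instance(project="alpha")  # missing the _id suffix
except AttributeError as exc:
    print("rejected:", exc)
```

The declared field list also gives a backend everything it needs to build a table definition or a set of Redis keys without inspecting live objects.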

Use a standard ORM for the SQL case:
Rather than reinvent the wheel, it makes sense to use an existing Python ORM for the SQL interface. The Redis work would then be either to write a new backend for that ORM, or perhaps a limited reimplementation using the same metadata. SQLObject (http://www.sqlobject.org/) looks simple and good, as does SQLAlchemy (http://www.sqlalchemy.org/).
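To make the "same metadata, different backend" idea concrete, here is a small sketch in which two stores are built from one list of field names. It deliberately does not use SQLObject or SQLAlchemy: the SQL side uses only the standard-library sqlite3 module, and a plain dict stands in for Redis. `DataStore`, `DictStore`, and `SqliteStore` are hypothetical names, and all columns are stored as text for simplicity.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Interface the rest of the code would program against (hypothetical)."""

    @abstractmethod
    def put(self, key, record):
        ...

    @abstractmethod
    def get(self, key):
        ...


class DictStore(DataStore):
    """Key-value backend; an in-memory dict stands in for Redis here."""

    def __init__(self, fields):
        self._fields = tuple(fields)
        self._data = {}

    def put(self, key, record):
        self._data[key] = {name: record.get(name) for name in self._fields}

    def get(self, key):
        row = self._data.get(key)
        return dict(row) if row is not None else None


class SqliteStore(DataStore):
    """SQL backend: one real column per declared field, so reports can query it."""

    def __init__(self, table, fields, path=":memory:"):
        if not fields:
            raise ValueError("at least one field is required")
        for name in (table, *fields):
            if not name.isidentifier():
                raise ValueError(f"unsafe identifier: {name!r}")
        self._table = f'"{table}"'
        self._fields = tuple(fields)
        self._columns = ", ".join(f'"{name}"' for name in self._fields)
        self._conn = sqlite3.connect(path)
        column_defs = ", ".join(f'"{name}" TEXT' for name in self._fields)
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self._table} "
            f"(pk TEXT PRIMARY KEY, {column_defs})"
        )

    def put(self, key, record):
        marks = ", ".join("?" for _ in self._fields)
        values = [record.get(name) for name in self._fields]
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                f"INSERT OR REPLACE INTO {self._table} (pk, {self._columns}) "
                f"VALUES (?, {marks})",
                [key, *values],
            )

    def get(self, key):
        row = self._conn.execute(
            f"SELECT {self._columns} FROM {self._table} WHERE pk = ?", (key,)
        ).fetchone()
        return dict(zip(self._fields, row)) if row else None


FIELDS = ["project_id", "state"]  # illustrative field names
for store in (DictStore(FIELDS), SqliteStore("instances", FIELDS)):
    store.put("i-1", {"project_id": "alpha", "state": "running"})
    print(type(store).__name__, store.get("i-1"))
```

A real ORM would replace the hand-written SQL here; the sketch only shows that calling code can stay identical across the two backends.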

Vish: Would be great to have a standard ORM. I don't like the idea of duplicating something that many people have done and then having to support it. Hopefully there is something simple that fits our needs.

One very important piece which will relate to the data store: we need an abstraction for resource pools. These pools can be quite simple, but operations on them need to be atomic. This applies to a lot of network-related things like:
VLANs
IP addresses
ports
So a generic way of adding and removing items from the pool, and then associating the used items with other data objects, would be fantastic. We are doing this in various ways already (see Vpn in the AuthManager), but sets in Redis fit this very well.
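A generic pool along these lines might look like the sketch below. `ResourcePool`, `PoolExhausted`, and the method names are assumptions made for illustration, and a lock around a Python set stands in for the backend; with Redis, popping a member from a set (SPOP) is already a single atomic command, which is why sets are a natural fit.

```python
import threading


class PoolExhausted(Exception):
    """Raised when no free items remain in the pool."""


class ResourcePool:
    """Hands out items (VLANs, IPs, ports) so each has at most one owner."""

    def __init__(self, items):
        self._free = set(items)
        self._owners = {}  # allocated item -> owner identifier
        self._lock = threading.Lock()

    def allocate(self, owner):
        """Atomically take one free item and associate it with owner."""
        with self._lock:
            if not self._free:
                raise PoolExhausted("no free items left")
            item = self._free.pop()
            self._owners[item] = owner
            return item

    def release(self, item):
        """Atomically return an allocated item to the pool."""
        with self._lock:
            if item not in self._owners:
                raise KeyError(f"{item!r} is not allocated")
            del self._owners[item]
            self._free.add(item)

    def owner_of(self, item):
        with self._lock:
            return self._owners.get(item)

    def free_count(self):
        with self._lock:
            return len(self._free)


vlans = ResourcePool(range(100, 104))  # illustrative VLAN ids
vlan = vlans.allocate(owner="project-alpha")
print(vlans.owner_of(vlan), vlans.free_count())
vlans.release(vlan)
```

Because the free set and the owner mapping change under the same lock, two concurrent callers cannot be handed the same item.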

Another outstanding question is whether Users/Projects/Roles should actually be part of the generic data model that we decide on, with some kind of adaptor to coax them into LDAP or other authentication stores.


Work Items
