Friday, August 14, 2009

Object-oriented accessors considered (sometimes) harmful

One of the big principles of object-oriented (OO) programming is encapsulation, which says that data inside an object should be hidden from the outside world, and all accesses to the data should go through methods. That's where OO accessor/mutator methods come from.

Encapsulation brings a number of advantages: the internal representation of an object can change, any attempt to change state can be validated, the difference between stored values and computed values is blurred, object storage cannot be used for action at a distance, etc. All of this has been discussed for many years in numerous books and papers, and is wonderfully implemented in Moose (see Moose::Manual::Attributes).

However, encapsulation also brings a number of disadvantages, especially in Perl, which, unlike Java or Eiffel, is a multi-paradigm language. Below I discuss those disadvantages, not with the intention of throwing encapsulation away, but rather to issue a warning: don't blindly adopt encapsulation everywhere; think, and weigh where it is a gain and where it is a hindrance.

Good Perl idioms are no longer available


The canonical example in OO literature is the "point object" (even in the Moose synopsis). Unlike objects such as a device driver or a database handle, which have lots of internal data to hide, a point object contains nothing but public information (its x and y coordinates). With direct access to the stored values, we can use a number of convenient Perl idioms, whereas doing the same through getter/setter methods is much more cumbersome. Consider the following examples:

# string interpolation
print "point is at coordinates $point->{x} / $point->{y}\n";

# symmetry transform
($point->{x}, $point->{y}) = ($point->{y}, $point->{x});

# zoom
$point->{$_} *= $zoom_factor for qw/x y/;

# temporary push aside
{ local $point->{x} = $point->{x} + $far_away; # not "+=": local resets to undef first
  do_something_without_that_point_in_the_way();
} # point automatically comes back to its previous location

# nested data structures and possibly autovivification
push @{$point->{styles}}, qw/square big/;
$point->{menu}{options}{color} = 'green';


Translating such idioms to getter/setter methods is left as an exercise to the reader...
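For comparison, here is a sketch of how three of these idioms might look through accessor methods. The Point class below is a hypothetical, hand-written stand-in with combined getter/setter methods x and y, so that the example runs without Moose; a Moose attribute declared with is => 'rw' would generate a similar combined accessor.

```perl
use strict;
use warnings;

# Hypothetical Point class with hand-written combined getter/setter
# accessors, so that the example needs no CPAN module.
package Point;

sub new {
    my ($class, %args) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

# $p->x returns the value; $p->x($new_value) sets it (same for y).
sub x { my $self = shift; $self->{x} = shift if @_; return $self->{x} }
sub y { my $self = shift; $self->{y} = shift if @_; return $self->{y} }

package main;

my $point       = Point->new(x => 3, y => 4);
my $zoom_factor = 2;

# string interpolation: method calls do not interpolate, so we must
# leave the string (or resort to the "@{[ $point->x ]}" trick)
print "point is at coordinates ", $point->x, " / ", $point->y, "\n";

# symmetry transform: read both values first, then write them back
my ($old_x, $old_y) = ($point->x, $point->y);
$point->x($old_y);
$point->y($old_x);

# zoom: one get and one set per coordinate, via dynamic method names
$point->$_($point->$_() * $zoom_factor) for qw/x y/;

print "after swap and zoom: ", $point->x, " / ", $point->y, "\n";
```

The local push-aside idiom has no direct accessor equivalent: the old value would have to be saved and restored by hand, with extra care to restore it if an exception is thrown.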

No obvious distinction between "setter" methods and other methods

When I read unfamiliar code and find something like
  $obj->{foo} = $x;
  $obj->bar($y);
I immediately know that the state of the object changes at line 1, while I have no clue at line 2: I must consult the documentation of method bar (or, even worse, its source code) to know whether it prints something out, modifies a database record, or just sets a bar attribute in memory inside object $obj.

Hard to debug

Sometimes the best way to find a bug is to step through the code in the debugger and examine which values sit in memory. However, if we use chained accessors to traverse a data structure, the number of steps to go through becomes a nightmare. For example, consider the following hypothetical line in a Catalyst controller:
  if ($c->request->body->length < $self->min_body) {
If I come across that line while debugging, I know that the bug might well sit in the min_body method, so I want to step into that method; but before getting there I have to step through the accessor methods for request, body and length, which are totally uninteresting for my debugging purpose.

Non-scalar attributes must either copy or reinvent an API

If an object contains an arrayref attribute that is directly accessible and publicly documented as such, then every client can happily push, shift, reverse, grep, etc. on the arrayref using common Perl syntax.

If, on the contrary, that arrayref is considered private, and clients must use getter/setter methods to access it, then clients have no choice but to copy the entire array back and forth between the object and the local handling code. Needless to say, this has an impact on performance. To avoid it, the object could also implement additional methods to insert an item into the array, remove the last one, and so on; but this is reinventing a list API, and it will never be as rich and flexible as the collection of builtin Perl constructs for dealing with lists!

The same reasoning of course applies to hashref attributes.
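A minimal sketch of the arrayref case, using a hypothetical Basket class whose items arrayref is private: a getter that returns a copy, a setter that replaces the whole array, and two hand-written list methods that merely wrap push and pop.

```perl
use strict;
use warnings;

# Hypothetical class that keeps its list of items private.
package Basket;

sub new { my ($class, @items) = @_; return bless { items => [@items] }, $class }

# Getter: returns a *copy* of the items, so callers cannot touch the internal array.
sub items { my $self = shift; return @{ $self->{items} } }

# Setter: replaces the internal array with a copy of its arguments.
sub set_items { my ($self, @items) = @_; $self->{items} = [@items]; return }

# Hand-written fragments of a list API, each one merely wrapping a builtin.
sub push_item { my ($self, @items) = @_; push @{ $self->{items} }, @items; return }
sub pop_item  { my $self = shift; return pop @{ $self->{items} } }

package main;

my $basket = Basket->new(qw/apple pear/);

# Without a dedicated method, reversing means copying out and back in.
$basket->set_items(reverse $basket->items);

# The reinvented API covers push and pop, but grep, splice, sort and
# friends would each need yet another method.
$basket->push_item('plum');
my $last = $basket->pop_item;

print join(',', $basket->items), "\n";
```

If items were instead a documented public arrayref, a client could apply grep, splice or sort directly to @{ $basket->{items} }, with no copying and no extra methods.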

No generic data traversal modules

Data traversal modules like Data::Visitor, or import/export modules like JSON or YAML, cannot make any decisions when they reach an opaque object that one is not supposed to peek into. This means that generic traversal tools are simply inapplicable, or that every object has to implement some support for traversal (like a visit_value method); but writing such methods for every new class is rather tedious.

Meta-information as managed by Moose could perhaps help traversal algorithms (ask the metaclass which methods link to "subobjects" of the current one), but I'm not aware of anybody having already worked on that.
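By contrast, when objects are open hashrefs, a generic tool needs nothing special from them. The sketch below is a minimal hand-written walker (not Data::Visitor's API) that collects leaf values; it relies on reftype from the core Scalar::Util module, which reports the underlying reference type even for blessed objects, so a hashref-based object is traversed like any other hash.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Minimal generic walker (hand-written, not Data::Visitor's API): collects
# every leaf value of a nested structure into @$out. It looks at reftype(),
# the underlying reference type, so blessed hashrefs are walked like hashes.
sub collect_leaves {
    my ($data, $out) = @_;
    my $type = reftype($data);
    if (!defined $type) {                    # not a reference: a leaf
        push @$out, $data;
    }
    elsif ($type eq 'HASH') {                # sorted keys for stable output
        collect_leaves($data->{$_}, $out) for sort keys %$data;
    }
    elsif ($type eq 'ARRAY') {
        collect_leaves($_, $out) for @$data;
    }
    else {                                   # CODE, SCALAR, GLOB...: opaque
        push @$out, "<opaque $type>";
    }
    return $out;
}

# A hashref-based object of a hypothetical Point class.
my $point = bless { x => 1, y => 2, styles => [qw/square big/] }, 'Point';

my $leaves = collect_leaves({ label => 'A', shape => $point }, []);
print "@$leaves\n";
```

Had the class hidden its data, for instance behind a closure or in inside-out storage keyed by a blessed scalar ref, the walker would only see CODE or SCALAR and could go no further. Likewise, JSON::PP and JSON::XS refuse blessed objects by default unless allow_blessed or convert_blessed is enabled, the latter calling a TO_JSON method that each class must provide.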

Methods are slower than direct access to attributes

Not much to expand on here: method calls obviously have a cost, especially in Perl, which is more dynamic than other OO languages, so the compiler has less information with which to optimize method calls. So when traversing deep data structures through nested accessor calls, there is a definite impact on performance.
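A quick way to measure that cost on a given machine is the core Benchmark module. The sketch below compares a hypothetical hand-written getter with direct hash access; no figures are quoted here, since results depend on the hardware and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical minimal class with a hand-written getter.
package Point;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub x   { return $_[0]{x} }

package main;

my $point = Point->new(x => 42, y => 7);

# Each snippet runs for at least 1 CPU second (negative count), then a
# comparison table is printed; the figures depend on machine and perl.
cmpthese(-1, {
    accessor => sub { for (1 .. 100) { my $v = $point->x } },
    direct   => sub { for (1 .. 100) { my $v = $point->{x} } },
});
```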

Conclusion

In Perl, fully encapsulated objects are sometimes the best solution and sometimes not; weigh these considerations before making strong design decisions.

An interesting design is that of DBI: objects (handles) are totally encapsulated, yet they exploit the power of tie to expose their attributes through a conventional hashref API instead of OO getter and setter methods. This is a very clever compromise.
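The tie technique itself can be sketched in a few lines. The hypothetical ValidatedAttrs class below inherits plain storage behaviour from Tie::StdHash (defined in the core Tie::Hash module) and overrides only STORE, so clients use ordinary hash syntax while every write is still checked, here against a made-up whitelist. This only illustrates the idea; it is not how DBI itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical attribute holder: storage is inherited from Tie::StdHash
# (core module Tie::Hash); only STORE is overridden, to validate writes.
package ValidatedAttrs;
use Tie::Hash;
our @ISA = ('Tie::StdHash');

# Made-up whitelist, borrowing two DBI attribute names for illustration.
my %allowed = map { $_ => 1 } qw(AutoCommit RaiseError);

sub STORE {
    my ($self, $key, $value) = @_;
    die "unknown attribute '$key'\n" unless $allowed{$key};
    $self->{$key} = $value;
}

package main;

tie my %handle, 'ValidatedAttrs';

$handle{AutoCommit} = 1;                         # goes through STORE
my $ok = eval { $handle{Autocommit} = 0; 1 };    # typo: STORE dies
print "AutoCommit is $handle{AutoCommit}\n";     # goes through FETCH
print "rejected: $@" unless $ok;
```

Because the tied hash is all that clients see, the class keeps full control over representation and validation, while clients keep the hash idioms discussed above.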

As far as I am concerned, I deliberately designed DBIx::DataModel to fully exploit the dual nature of Perl objects: an OO API for executing methods, and a hashref API for accessing column values. I wouldn't necessarily do that everywhere, but for database row objects, which are very open in nature, it just seemed an appropriate solution.
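A minimal sketch of such a dual-natured object, with a hypothetical My::Row class (this is not DBIx::DataModel's actual API): column values live in the blessed hash and are accessed directly, while behaviour is provided by methods.

```perl
use strict;
use warnings;

# Hypothetical row class (not DBIx::DataModel's API): column values are
# stored directly in the blessed hash, behaviour is provided by methods.
package My::Row;

sub new { my ($class, %columns) = @_; return bless {%columns}, $class }

sub full_name {
    my $self = shift;
    return "$self->{first_name} $self->{last_name}";
}

package main;

my $row = My::Row->new(first_name => 'Ada', last_name => 'Lovelace', born => 1815);

# Hash API: interpolation, slices and plain assignment all work directly.
print "$row->{first_name} was born in $row->{born}\n";
my @names = @{$row}{qw/first_name last_name/};
$row->{first_name} = 'Augusta Ada';

# OO API: behaviour goes through ordinary method calls.
print "Full name: ", $row->full_name, "\n";
```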
