How Not to Kill Your Testability Using Statics

I have meant to write about this topic for a while and actually had a whole article about statics in object oriented programming already lined up, yet never got around to publishing it because what I wrote didn't sit entirely right with me. That first version was prompted by Miško Hevery's "Static Methods are Death to Testability", which I was trying to respond to. Somehow this never worked out into something worthwhile. Recently I have come across proponents of what can only be called "Class Oriented Programming". This rekindled my interest in the topic of statics in an Object Oriented environment and helped me approach this article from a more useful angle.

"Class Oriented Programming" is what people do when they write classes which are all static methods and properties and are never once instantiated. I'll try to explain why this adds virtually nothing vis-à-vis procedural programming, what these people are really missing out on by ignoring objects and why, against all odds, statics don't automatically kill testability. While this article focuses on PHP, the concepts apply equally across many languages.

Couples

The core issue this all comes down to really is coupling of code. Code couples to other code in many ways. Take this line for example:

$foo = substr($bar, 42);

This line depends on both the variable $bar and the function substr. Hopefully $bar is just a local variable defined a little further up in the same file and scope, so this coupling is trivial. The function substr though needs to be defined and callable at this time. substr is a core PHP function and has been there since forever. It's also a trivial function which you hardly want to replace by anything else, so this is quite a no-brainer coupling. If you couldn't even depend on substr, you may as well stop with this project right this instant.

Now, what about this:

$foo = normalizer_normalize($bar);

normalizer_normalize is a function of the intl extension, which has only been bundled with PHP since 5.3 and even then is not necessarily installed and enabled. This raises two questions: do you want to set the minimum requirement for your project to 5.3 or do you need to support older versions, and can you depend on the intl extension being installed? This coupling of code to a specific other piece of code is not as trivial and has certain implications.
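One way to live with such a dependency is to check for the function at runtime and degrade gracefully. The following is only a minimal sketch: the wrapper name normalize_text is made up for illustration, and returning the unnormalized string as a fallback is an assumption that may not suit every project.

```php
<?php

// Hypothetical wrapper: uses intl's normalizer_normalize() when it is
// available and otherwise hands the input back unchanged.
function normalize_text($text) {
    if (function_exists('normalizer_normalize')) {
        $normalized = normalizer_normalize($text);

        // normalizer_normalize() returns false if an error occurred.
        return $normalized === false ? $text : $normalized;
    }

    // Assumption: unnormalized text is an acceptable fallback.
    return $text;
}

$foo = normalize_text('example');
```

The coupling has not disappeared, it has merely been confined to one place where the decision about the fallback is visible.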

Now, what about this:

class Foo {

    public static function bar() {
        return Database::fetchAll("SELECT * FROM `foo` WHERE `bar` = 'baz'");
    }

}

This is a very typical example of Class Oriented Programming. This creates a hardcoded coupling of the Foo class to the Database class. And not only does it couple the code, it also assumes that the Database class has previously been initialized; it must have established a database connection sometime before. So the Foo class is expected to be used like this:

Database::connect('localhost', 'user', 'password');
$bar = Foo::bar();

The Foo::bar method is irreversibly coupled to the Database class and implicitly depends on that class being both available and initialized. You cannot use the Foo class without the Database class, and the Database class supposedly requires a database connection. How can you ensure that the database connection is working whenever anybody calls Database::fetchAll? One technique people tend to resort to is something like this:

class Database {

    protected static $connection;

    public static function connect() {
        if (!self::$connection) {
            $credentials = include 'config/database.php';
            self::$connection = some_database_adapter($credentials['host'], $credentials['user'], $credentials['password']);
        }
    }

    public static function fetchAll($query) {
        self::connect();

        // use self::$connection...
        // here be dragons...

        return $data;
    }

}

In other words, if Database::fetchAll is called, it tries to make sure that a connection exists by calling the connect method, which loads the credentials from a configuration file if no connection has been established yet. This in turn means that Database is coupled to the file config/database.php. If that file doesn't exist, it can't function. Worse though, the Database class is then basically coupled to one database. If you wanted to supply alternative database credentials, you're in a real mess.

Whichever way you turn it, in the above example you have implicit dependencies galore. Foo not only depends on Database, it also depends on Database being ready to be used. Database in turn depends on a specific file in a certain directory. By implication, Foo depends on a specific file in a certain directory, even though that is not apparent at all in any of its code. Overall, you have global state galore. Each piece depends on another piece being set to a certain state just so, and none of this is apparent or formally specified in any way.
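The "one database only" problem can be made visible with a small sketch. The class StaticDatabase and its describe method are hypothetical, and a plain string stands in for a real connection so that the example runs without a database.

```php
<?php

// Hypothetical stand-in for a statically managed connection.
class StaticDatabase {

    protected static $connection;

    public static function connect(array $credentials) {
        // Only the first call has any effect; later calls are silently ignored.
        if (!self::$connection) {
            self::$connection = 'connected to ' . $credentials['host'];
        }
    }

    public static function describe() {
        return (string) self::$connection;
    }

}

StaticDatabase::connect(array('host' => 'primary.example'));
StaticDatabase::connect(array('host' => 'replica.example'));

$state = StaticDatabase::describe(); // still the first connection
```

Whoever calls connect first decides which database everybody else talks to, and nothing in the calling code reveals that.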

Seems familiar...

Doesn't that sound a lot like regular procedural code? Let's rewrite the above example as procedural code:

function database_connect() {
    global $database_connection;
    if (!$database_connection) {
        $credentials = include 'config/database.php';
        $database_connection = some_database_adapter($credentials['host'], $credentials['user'], $credentials['password']);
    }
}

function database_fetch_all($query) {
    global $database_connection;
    database_connect();

    // use $database_connection...
    // ...

    return $data;
}

function foo_bar() {
    return database_fetch_all("SELECT * FROM `foo` WHERE `bar` = 'baz'");
}

Try to spot the difference to the above class oriented code.

Hint: The only difference is in the visibility of Database::$connection/global $database_connection. In the class oriented example, the connection is only visible to the Database class itself, while in the procedural code above it's a global variable visible to everything. The code still has all the same dependencies, couplings, problems and functionality as before. And visibility is really not a very crucial factor in improving dependency management. There's no real difference whether a variable is called global $database_connection or Database::$connection; it's just different syntax for the same thing, and both are simply global state. A tiny sliver of namespacing through the use of classes is certainly better than nothing, but it doesn't change anything fundamental.

Obligatory car analogy:

Class Oriented Programming is like buying a car to let it sit in the driveway, opening and closing its doors repeatedly, jumping around on the seats, occasionally turning on the windshield wipers, yet never once turning the ignition key and taking it for a drive. It is missing the point entirely.

Let's turn the ignition key

Now, let's try some real object oriented code, starting with the implementation of Foo:

class Foo {

    protected $database;

    public function __construct(Database $database) {
        $this->database = $database;
    }

    public function bar() {
        return $this->database->fetchAll("SELECT * FROM `foo` WHERE `bar` = 'baz'");
    }

}

The Foo class is now decoupled from the specific Database class. All it needs is a Database, not the Database. This is a big difference. When instantiating Foo, some object which has the characteristics of Database needs to be passed. This can be an instance of the Database class itself or any descendant of it. This means we can supply an alternative implementation of the database class which gets its data from somewhere else. Or which adds a caching layer. Or which is simply a mock object used for testing, not an actual database connection (let this one sink in properly, it's huge). Since the database object needs to be instantiated now, this also means we can have several different database connections to different databases with different credentials, instead of just one at a time. Well, let's implement Database:

class Database {

    protected $connection;

    public function __construct($host, $user, $password) {
        $this->connection = some_database_adapter($host, $user, $password);
        if (!$this->connection) {
            throw new Exception("Couldn't connect to database");
        }
    }

    public function fetchAll($query) {
        // use $this->connection ...
        // ...
        return $data;
    }

}

First of all, notice how much simpler the implementation is. Database::fetchAll does not need to check for the state of the connection. In order to call Database::fetchAll, the class first needs to be instantiated. In order to instantiate it, you need to pass some credentials to the constructor. If the credentials aren't valid or the database connection could not be established for whatever reason, an exception is thrown and the Database object will not be instantiated. This all means that if and when you call Database::fetchAll, you are guaranteed to have a working database connection. This means that the Foo class can simply specify in its constructor that it needs a Database $database and can rest assured that it will have a valid database connection.

Without an instance of Foo, you cannot call Foo::bar. Without a valid instance of Database, you cannot instantiate Foo. Without valid database credentials you cannot instantiate Database.

Let this sink in properly: you cannot even use code if certain preconditions aren't satisfied. Not just in some abstract logical sense that something will go wrong if the preconditions aren't met, but it is actually impossible to execute code whose preconditions aren't met.
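To see this guarantee in action, here is a minimal runnable sketch. The function some_database_adapter is a made-up stub that only accepts the password 'secret', and RuntimeException is used instead of the plain Exception above purely for illustration.

```php
<?php

// Hypothetical stub standing in for a real adapter: "connects" only
// when the password is 'secret', otherwise returns false.
function some_database_adapter($host, $user, $password) {
    return $password === 'secret' ? new stdClass : false;
}

class Database {

    protected $connection;

    public function __construct($host, $user, $password) {
        $this->connection = some_database_adapter($host, $user, $password);
        if (!$this->connection) {
            throw new RuntimeException("Couldn't connect to database");
        }
    }

}

$db    = null;
$error = null;

try {
    $db = new Database('localhost', 'user', 'wrong');
} catch (RuntimeException $e) {
    // The assignment to $db never happened: no half-initialized object exists.
    $error = $e->getMessage();
}
```

After the failed attempt $db is still null, so there is simply no object on which fetchAll could be called.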

Compare this to the class oriented code: You can call Foo::bar at any time, but it will fail in some unspecified way if the Database isn't ready. Database::fetchAll can be called anytime, but will fail in some unspecified way if there's some problem with the config/database.php file. Database::connect establishes global state, which all the other operations depend on, yet this dependency is not enforced by anything.

Shooting up

Let's look at this from the perspective of the code that is actually using Foo. Here's the procedural example:

$bar = foo_bar();

It is perfectly possible to write this line anywhere at any time and it will be executed. How it will behave very much depends on what the global state of the database connection is. First of all, who'd've thunk this? How does this code in any way signify that it depends on a global database state? So, you need to add some error handling here:

$bar = foo_bar();
if (!$bar) {
    // something's wrong with $bar, abort!
} else {
    // all ok, let's proceed
}

Anything may go wrong with this at any time, and you won't even really know what went wrong where in which implicit dependency of foo_bar.

In contrast, here is the class oriented implementation:

$bar = Foo::bar();
if (!$bar) {
    // something's wrong with $bar, abort!
} else {
    // all ok, let's proceed
}

Well, guess what, there's no difference. You will still need the same error handling afterwards and it's still hard to tell what went wrong if something went wrong. That's because a static method call is just a function call, which doesn't differ from any other kind of function call.

Now, the object oriented code:

$foo = new Foo;
$bar = $foo->bar();

PHP will complain with a fatal error right when it hits the statement new Foo. You specified that Foo needs a Database instance, yet did not supply one. This code won't even run. Which is exactly correct, because we promised to initialize some database, yet failed to do so.

$db  = new Database;
$foo = new Foo($db);
$bar = $foo->bar();

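PHP will again complain, because we did not pass any credentials to the database, which we specified as obligatory in Database::__construct. (Since PHP 7.1 a missing argument throws an ArgumentCountError; older versions only raised a warning here.) Oops.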

$db  = new Database('localhost', 'user', 'password');
$foo = new Foo($db);
$bar = $foo->bar();

OK, now we have satisfied all the dependencies we promised we would, so this is ready to run. But let's assume the database credentials are invalid or the database has some problem and the connection could not be established. In this case, an exception will be thrown when executing new Database(...). The following lines won't even be executed. So, there's no need for error checking after calling $foo->bar() (well, you may want to check what exactly you got back, but it almost certainly isn't a database connection error). If something was wrong with any of its dependencies, that line would not even have been executed in the first place. And if an exception is thrown at any point, it is much easier to figure out what went wrong, since exceptions carry a message and a stack trace showing exactly where they were thrown.

The object oriented approach may appear more complex. In the above example, the procedural/class oriented code is only a single line to execute Foo::bar/foo_bar, whereas the object oriented approach takes three lines. But that's missing the point. We did not initialize the database in the procedural code above, which is what we need to do anyway. The procedural approach also requires error handling after the fact and at every point in the process. Its error handling is also messy, since it's terrifically hard to track down which of the implicit dependencies caused an error. It is also hiding dependencies by hardcoding them. Not only is it not explicit what went wrong if something goes wrong, it's not even explicit what other code your code depends on in order to function.

The object oriented approach makes all dependencies explicit and obvious. Foo needs a Database and Database needs credentials. This fact doesn't change for either approach, the object oriented code simply makes this explicit, obvious, in your face; because it specifies its requirements right in the source code and PHP will enforce them.
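How PHP enforces these requirements can be sketched as follows. This assumes PHP 7.1 or later, where a missing argument throws an ArgumentCountError and an argument of the wrong type throws a TypeError; the empty Database class is only a placeholder.

```php
<?php

class Database {
    // Placeholder; the real class would hold a connection.
}

class Foo {

    protected $database;

    public function __construct(Database $database) {
        $this->database = $database;
    }

}

$errors = array();

try {
    new Foo; // no Database supplied at all
} catch (ArgumentCountError $e) {
    $errors[] = get_class($e);
}

try {
    new Foo('not a database'); // wrong type supplied
} catch (TypeError $e) {
    $errors[] = get_class($e);
}

$foo = new Foo(new Database); // the only form that is allowed to run
```

In both failing cases the constructor body is never entered, so no Foo without a Database can come into existence.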

The procedural code places responsibilities on each function. If you call Foo::bar, that function better make damn sure it does its job correctly. Foo::bar now has to worry about giving you back whatever it is you wanted. It will delegate to Database::fetchAll. Now that method has to worry about giving Foo::bar whatever it is it wanted. So it frantically spins around trying to create some database connection, then returning some data. And if anything goes wrong at any point... who knows what you're going to get back and from where.

The object oriented approach places more responsibility, but also power, on the caller. Oh, you want to call Foo::bar? Well, you better give it a database connection, because boy oh boy does it ever need one. What database connection? Doesn't really matter, as long as it's some sort of Database instance. This is the power of Dependency Injection. It is made explicit and obligatory what the dependencies are, but it is left entirely up to the caller how these dependencies are fulfilled.
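As a sketch of what "left entirely up to the caller" can mean, the following wires the same Foo class to two different kinds of database. Everything here is simplified for illustration: Database just echoes the query back instead of talking to a server, and CachingDatabase is a hypothetical subclass adding the caching layer mentioned earlier.

```php
<?php

class Database {

    protected $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function fetchAll($query) {
        // Stand-in for a real query: reports which database was asked.
        return array($this->name . ': ' . $query);
    }

}

// Hypothetical descendant that remembers results per query string.
class CachingDatabase extends Database {

    protected $cache  = array();
    protected $misses = 0;

    public function fetchAll($query) {
        if (!array_key_exists($query, $this->cache)) {
            $this->misses++;
            $this->cache[$query] = parent::fetchAll($query);
        }
        return $this->cache[$query];
    }

    public function misses() {
        return $this->misses;
    }

}

class Foo {

    protected $database;

    public function __construct(Database $database) {
        $this->database = $database;
    }

    public function bar() {
        return $this->database->fetchAll("SELECT * FROM `foo` WHERE `bar` = 'baz'");
    }

}

$cachingDb = new CachingDatabase('replica');

$plainFoo  = new Foo(new Database('primary'));
$cachedFoo = new Foo($cachingDb);

$cachedFoo->bar();
$cachedFoo->bar(); // served from the cache, the parent is not asked again
```

Foo is identical in both cases; only the caller knows, and decides, which database sits behind it.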

In procedural code, you are establishing many hardcoded dependencies and couplings between various parts of your code to the point that everything rigidly depends on everything else. You are creating one monolithic piece of software. That's not to say this can't work, it's to say that it is a very rigid machine, which is hard to take apart. For small applications, this may work okay. For large applications, it devolves into a big ball of mud which is impossible to test, extend or debug:

[Figure: static-couples]

In dependency injected, object oriented code, you are creating many small pieces which are each self-sufficient. They have a clearly defined interface for other pieces to use, and they clearly specify what they need from other pieces in order to function. No piece binds itself explicitly to other pieces though, this happens later. In the procedural/class oriented code you couple Foo to Database while you are writing the code, then you run the code. In object oriented code, you specify that Foo needs some sort of Database, but leave a lot of wiggle room for what that can be. You then tie a specific instance of Foo to a specific instance of Database at the time you want to use Foo:

[Figure: object-oriented-couples]

The class oriented approach is deceptively simple at the time of calling functions, but nails everything down with cross-dependencies all over the place. The object oriented approach leaves everything flexible and isolated until the time of use, at which point it may look more complex, but is much more manageable.

And Finally: Statics

So the warranted question is what static is good for in object oriented programming. Static class properties are useful for static data. That is, data that a class instance depends on, but which never changes and is possibly large. Totally hypothetical example:

class Database {

    protected static $types = array(
        'int'    => array('internalType' => 'Integer', 'precision' => 0,      /* ... */),
        'string' => array('internalType' => 'String',  'encoding'  => 'utf-8', /* ... */),
        // ...
    );

}

Let's assume this database class needs to do something with different types of data coming from a database and maps database types to internal types. To do this it needs a type map. This map is always the same for each instance of Database and is used by several methods of Database. Well, why not make it a static property, which is never modified but only ever read from? It saves a little bit of memory, because the data is shared between all instances of Database. Since the data is only ever accessed from inside the class itself, this doesn't create any external dependencies. Static properties should never be made publicly accessible, since then they're just global variables. And we have seen where that goes...
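A runnable version of this idea might look like the sketch below. The two map entries and the internalType method are illustrative assumptions, and the constructor is left out to keep the example short.

```php
<?php

class Database {

    // Shared, read-only lookup table; never modified after definition
    // and not accessible from outside the class.
    private static $types = array(
        'int'    => 'Integer',
        'string' => 'String',
    );

    // Hypothetical instance method that reads from the shared map.
    public function internalType($databaseType) {
        if (!isset(self::$types[$databaseType])) {
            throw new InvalidArgumentException("Unknown type: $databaseType");
        }
        return self::$types[$databaseType];
    }

}

$db   = new Database;
$type = $db->internalType('int');
```

Because the map is private and only ever read, no code outside the class can come to depend on it.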

Static properties may also be useful to cache data which, once computed, is identical across all instances of a class and can easily be shared between them as an optimization. Static properties are mostly an optimization technique; they should not be viewed as a programming philosophy.
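A minimal sketch of such a cache follows. The Slugger class and its slug rule are invented for this example; the point is only that the result is computed once and then shared by every instance.

```php
<?php

// Hypothetical class: turns titles into URL slugs and caches the
// results in a private static property shared by all instances.
class Slugger {

    private static $cache    = array();
    private static $computed = 0;

    public function slug($title) {
        if (!isset(self::$cache[$title])) {
            self::$computed++;
            // Replace runs of non-alphanumeric characters with a dash.
            $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $title);
            self::$cache[$title] = strtolower(trim($slug, '-'));
        }
        return self::$cache[$title];
    }

    // Exposed only so the sketch can show how often work was done.
    public static function computedCount() {
        return self::$computed;
    }

}

$a = new Slugger;
$b = new Slugger;

$first  = $a->slug('Hello, World!');
$second = $b->slug('Hello, World!'); // different instance, cached result
```

The cache changes how fast an answer arrives, not what the answer is, which is what keeps this kind of static state harmless.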

Static methods are useful as utility methods, the biggest of which are alternative constructors. The big issue of static methods is that they create a hard coupling, a dependency. When you call Foo::bar(), that line of code is coupled to a very specific class Foo. This may or may not be a problem, it requires very deliberate consideration. This is not a problem under the following circumstances:

  1. The dependency is guaranteed to exist. This is the case if the call is internal, or if the dependency is part of the runtime environment anyway. For example:

    class Database {
    
        ...
    
        public function __construct($host, $user, $password) {
            $this->connection = new PDO(...);
        }
    
        ...
    
    }
    

    This couples the Database class to the PDO class. But PDO is an integral part of the underlying platform, it's the database connector provided by PHP. If that's not guaranteed to be available, then what is? You're also unlikely to want to substitute that class for something else, since you have to decide to use a database connector API at some point. If you want to use PDO as your database connector, you will have to use the PDO class somewhere.

  2. The call is mainly for internal use. An example from a Bloom filter implementation:

    class BloomFilter {
    
        ...
    
        public function __construct($m, $k) {
            ...
        }
    
        public static function getK($m, $n) {
            return ceil(($m / $n) * log(2));
        }
    
        ...
    
    }
    

    This tiny utility function simply provides a wrapper for a specific algorithm, which helps calculate a good number for the $k argument used in the constructor. Since it must be called before the class is instantiated, it must be static. This algorithm has no external dependencies and is unlikely to ever be substituted. It is used like this:

    $m = 10000;
    $n = 2000;
    $b = new BloomFilter($m, BloomFilter::getK($m, $n));
    

    This use does not introduce any particular dependency on BloomFilter which wouldn't already be there to begin with.

  3. You're coupling the code anyway, which is the case for alternative constructors. The classic built-in example in PHP is the DateTime class. It can be instantiated in two different ways:

    $date = new DateTime('2012-11-04');
    $date = DateTime::createFromFormat('d-m-Y', '04-11-2012');
    

    Both forms result in an instance of DateTime and both of these lines are coupled to the DateTime class either way. The DateTime::createFromFormat static method is an alternative object constructor, resulting in the same thing as new DateTime, but with slightly different functionality. It has no impact on coupling or dependencies beyond what new DateTime already has. Anywhere you write new Class, you may as well write Class::method(); it makes no difference.
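The same pattern works for your own classes. The following sketch uses a hypothetical Credentials class with an alternative constructor fromArray; neither name comes from the examples above.

```php
<?php

// Hypothetical value object with a regular and an alternative constructor.
class Credentials {

    private $host;
    private $user;
    private $password;

    public function __construct($host, $user, $password) {
        $this->host     = $host;
        $this->user     = $user;
        $this->password = $password;
    }

    // Alternative constructor: same result as "new Credentials(...)",
    // but built from a configuration array.
    public static function fromArray(array $config) {
        return new self($config['host'], $config['user'], $config['password']);
    }

    public function host() {
        return $this->host;
    }

}

$a = new Credentials('localhost', 'user', 'password');
$b = Credentials::fromArray(array(
    'host'     => 'localhost',
    'user'     => 'user',
    'password' => 'password',
));
```

Both lines are coupled to Credentials to exactly the same degree, so the static call costs nothing extra.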

Pretty much any other use of static methods has an impact on coupling and implicit dependencies. Whether this is acceptable needs to be judged for each case. You should definitely try to avoid such impactful static method calls. Writing OOP means you are writing many classes which are entirely self-sufficient, encapsulated, isolated and independent of each other. Then, at some very specific points, you write code which instantiates those classes and passes dependencies around; these are the points at which code gets coupled to other code. Try to keep those coupling points few and at very deliberately chosen locations in the application.

A word about abstraction

Why do you want to do all of this managing of dependencies of self-sufficient objects? Because it allows you to abstract. In large, complex applications, there's always complexity (duh). The problem is how you manage this complexity. To make up an example, you have a class Application which represents your entire application. It talks to a class User, which represents a user. This gets data from a Database. The Database needs a DatabaseDriver. The DatabaseDriver needs credentials. And so on and so on. If you just call Application::start() statically, which calls User::getData() statically, which calls the database statically and so on and you expect each layer to sort out its own dependencies internally, you're in for a real mess if something goes wrong. You cannot assert that a call to Application::start() will or won't work, because you have no idea whether the hidden dependencies will be able to sort themselves out. Worse yet, you have no recourse to influence what exactly Application::start() does besides altering its source code, and the code of classes it makes calls to, and the code of classes they make calls to etc. ad infinitum.
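For contrast, the same made-up chain wired through constructors could look like the sketch below. The class bodies are placeholders invented for illustration; in particular DatabaseDriver::query returns a canned row instead of talking to a server.

```php
<?php

class DatabaseDriver {

    private $credentials;

    public function __construct(array $credentials) {
        $this->credentials = $credentials;
    }

    public function query($sql) {
        // Stand-in for a real driver: always returns one canned row.
        return array(array('name' => 'alice'));
    }

}

class Database {

    private $driver;

    public function __construct(DatabaseDriver $driver) {
        $this->driver = $driver;
    }

    public function fetchAll($sql) {
        return $this->driver->query($sql);
    }

}

class User {

    private $database;

    public function __construct(Database $database) {
        $this->database = $database;
    }

    public function getData() {
        return $this->database->fetchAll('SELECT * FROM `users`');
    }

}

class Application {

    private $user;

    public function __construct(User $user) {
        $this->user = $user;
    }

    public function start() {
        return $this->user->getData();
    }

}

// The single place where the pieces get coupled to each other.
$driver = new DatabaseDriver(array('host' => 'localhost', 'user' => 'user', 'password' => 'password'));
$app    = new Application(new User(new Database($driver)));
$data   = $app->start();
```

Each class names only its direct dependency, and swapping any layer means changing the wiring at the bottom, not the source of the classes.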

In order to create complex applications which have many moving parts, you need to create individual parts which you can depend upon, then forget about. You need to be able to write objects which you can make certain assertions of. Like the above mentioned assertion that "if I have an instance of Database, I can query the database". You cannot make this assertion when writing Database::fetchAll(...) somewhere in your code, because you have no idea what the global state of the Database class is.

You can make that assertion inside of this function though:

function (Database $database) {
    ...
}

If any of the code inside this function is ever executed, that means an instance of Database was passed successfully, which means the $database object was instantiated successfully. If you have coded your Database class properly, you can assert that because you have an instance of it, you can use it to make database queries. If this precondition was not fulfilled, none of the code inside that function that wants to use $database would ever be executed in the first place. Which means you do not need to do any error checking whether something may be going wrong with $database or any of its dependencies. It allows you to forget its dependencies even exist.

Without being able to ignore and forget about all the dependencies of your dependencies, you cannot write any meaningful complex applications. Database may be a tiny wrapper class or a giant multi-layered beast with dependencies galore, it may start out as a small wrapper and morph into a giant beast over time, you may subclass the Database class and pass some instance of some child class to the function; none of this matters to your function (Database $database) as long as the public interface of Database does not change. If your classes are properly decoupled from other parts of your application through dependency injection, you can test each one in isolation by mocking its dependencies. When you have tested a class enough to be satisfied it is working as it should, you can close that mental box in your head, simply assert that "to make database calls, I use an instance of the Database class" and move on to the next, more complex step.
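Testing in isolation can be sketched without any test framework. FakeDatabase is a hand-written, hypothetical test double, and the Database constructor here throws on purpose to show that the real connection code is never touched.

```php
<?php

class Database {

    public function __construct($host, $user, $password) {
        // In this sketch the real class can never connect.
        throw new RuntimeException('No real database available in this sketch');
    }

    public function fetchAll($query) {
        return array();
    }

}

class Foo {

    protected $database;

    public function __construct(Database $database) {
        $this->database = $database;
    }

    public function bar() {
        return $this->database->fetchAll("SELECT * FROM `foo` WHERE `bar` = 'baz'");
    }

}

// Hypothetical test double: returns canned rows and records the queries.
class FakeDatabase extends Database {

    public $queries = array();
    private $rows;

    // Deliberately does not call the parent constructor.
    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function fetchAll($query) {
        $this->queries[] = $query;
        return $this->rows;
    }

}

$fake   = new FakeDatabase(array(array('id' => 1, 'bar' => 'baz')));
$foo    = new Foo($fake);
$result = $foo->bar();
```

Foo cannot tell the difference, which is exactly why its behavior can be verified without a database server, a configuration file or any global state.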

And that is why you use classes and objects and why "class oriented programming" is missing the point entirely.

About the author

David C. Zentgraf is a web developer working partly in Japan and Europe and is a regular on Stack Overflow. If you have feedback, criticism or additions, please feel free to try @deceze on Twitter, take an educated guess at his email address or look it up using time-honored methods. This article was published on kunststube.net. And no, there is no dirty word in "Kunststube".