Design Pattern: Entity-Attribute-Value

2-D directed graph, Sparse Matrix. from Pajek network: Graph Drawing contest 1996

The entity-attribute-value pattern[PPR, en.WP] is one of those programming tools which is a hammer: once you know it, everything looks like a nail. And maybe, if you squint hard, you really can hammer just about any data problem with it. Which does not make it the right tool. Actually, it is almost never the right tool.

Which of course is why I am seriously considering using it. But let me write about the pattern to see if I know it well enough to explain it.

Most relational database systems are horizontal. Each row has columns to represent all the values for a given entity, like an employee’s name, phone number, date of hire, and an identification number. The number of variables to be tracked about each person is known, and the process of collecting, validating, storing, retrieving, etc. is pretty straight-forward.

But when the things known about someone can be variable, we have to keep adding columns. If management wants to store an employee’s nicknames, for example, well, how many nicknames do you track? one? three? How about children’s names? is there an arbitrary cut-off that employees may only have 6 kids? Should the “significant other” be tracked? Only one at a time? what about previous S.O.s? track domestic partnerships, marriages, divorces, in-laws?

That is when the entity-attribute-value model (EAV) might be usefully implemented: when the number of possible attributes could be really large but in practice is actually quite limited. Using a single central table with a minimum number of columns to represent each entity, one then separates out attributes for that entity, and metadata regarding those attributes, and finally the actual values of the attributes.

Attributes might usefully be imagined as additional columns in a relational database. Rather than having a large number of columns left empty, each ‘row’ (entity) only has ‘columns’ which are filled in. New columns can be created on-the-fly as needed (in theory – this is one of the potential pitfalls of this model.) Attributes have a name, a description, and a ‘type’ which can be quite complex – using only primitive types for values, integer, string, etc., is code smell for this pattern being overly complicated for the application. Additional attribute columns might be a regular expression which can be used to validate input, minimum/maximum values, etc.

This is also where EAVs usually become hybrids; few applications need ‘pure’ EAV. While attributes may be heterogeneous across the entities, a given class of entity may have very homogenous character. An example might be postal addresses within a given country or region; an addressee, an optional organizational address, a street address (consisting of a numeric building identifier, a street name, and optionally a building unit identifier), a regional identifier, a postal code, and a country code. This could easily be a sub-schema table in normal relational format.

Values need to match the attribute. An hourly wage would require a pretty precise set of metadata: a wage paid in Indonesian rupiah can be stored as an integer (the 1/100 sen being obsoleted by inflation), while wages in US dollars can be stored in single-precision floating point and Bitcoin should be in double-precision floating point numbers. Then values would need to be stored in a simple key = value pair in an appropriate table like

CREATE TABLE `doublePrecisionValues`. ( \
    `id` BINARY(16) NOT NULL , \
    `entity_id` BINARY(16) NOT NULL , \
    `attribute_id` BINARY(16) NOT NULL , \
    `value` DOUBLE NOT NULL );

This connects the entity, attribute, and value and allows for multiple indexing. The simple, short rows in tables can result in unusually large numbers of rows to store the same amount of data, which is why this model is sometimes described as vertically-oriented. Combining the three elements to make a single query, however, could be freakishly convoluted:

SELECT VALUE FROM doublePrecisionValues AS VALUES
    JOIN objects AS employee
    ON employee.id = VALUES.entity_id
    JOIN attributes AS employee_attribute
    ON employee_attribute.id = VALUES.attribute_id
    WHERE VALUES.entity_id = '$employeeID' AND
    employee_attribute.attribute_name = 'wage';

Each group of entities, like this employee group, would probably require custom coding. Not necessarily a bad thing, but if it happens often it would be prone to error. And I do not see an easy or obvious route to automating query composition. Yet.

On another hand, the Design Patterns PHP for EAV looks soooo attractive! Each entity class would require a custom initialization, and design would need to go into self-initializing attributes from the database. If one is just looking at a CRUD shimmed onto a hybrid EAV, it could work for my use!

But I will do more research on the Data Mapper pattern[F] before I make any commitments.