Content

Updated by Andreas Pfohl 10 months ago

# Data architecture for custom fields of type "**hierarchy"**

## Context and Problem Statement

What is the data architecture for serving a hierarchy of labels with associated metadata to an OpenProject custom field implementation?

## Decision Drivers

* The data architecture needs to structure labels in a hierarchical way (like a tree), where each label has associated metadata.

* The structure can change at any point in time.

* ~~The structure must be recreated at any historic point in time.~~

* apparently not wanted/needed

* effort too high for gained benefit

* Changes to the structure need to be recorded throughout the life-time.

* The data architecture must be capable to be used for filtering based on given labels.

* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).

* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.

* Changes to the structure must be auditable.

* Labels of the same level in the structure must maintain an order, which is manually changeable.

## Considered Options

* Single Table with always extending tree (with soft-deletes)

* Single Table with out ID pointer

* Single Table with ID pointer

* ~~ltree in PostgreSQL~~

* ~~Real graph database~~

* Event Sourcing

## Decision Outcome

Chosen option: "Single Table with always extending tree (with soft-deletes)", because it's the simplest option satisfying all (updated) decision drivers to a point where it makes sense.

### Consequences

* Good, because the data structure will be able to describe the hierarchical type of the new custom field value.

* Good, because CRUD is easy to provide for the data structure.

* Good, because current filter approach is supporting reference based filtering.

* Good, because updates are reflected on custom values automatically.

* Good, because changes to the data structure are comprehensible throughout the history.

* Good, because data structure is sortable.

* Neutral, because specific workflows need to be adopted by users and admins to maintain data consistency.

* Neutral, because change tracking has to be implemented separately.

* Bad, because historic revision of the data structure can not be recreated and used for introspection and filtering.

### Confirmation

Begin of the insertionThe decision was confirmed by the Team Titan in an urge to start working on it.End of the insertion Begin of the deletion ~~TBD~~End of the deletion

## Pros and Cons of the Options

### Single Table with always extending tree (with soft-deletes)

If a user deletes a node, the node and it's children are marked as `deprecated`/`deleted` and not deleted.

If a user updates a node (change label and short, change parent, change order), custom values keep references.

Changes are recorded in a logging table.

* Good, because structure is relatively simple to create. (with gems)

* Good, because historic changes are comprehensible.

* Neutral, logging must be implemented manually. (papertrail doesn't work for whole tables)

* Neutral, read performance of large trees can be poor.

* Bad, because tree grows with every change made.

### Single Table with out ID pointer

Table only shows current tree.

Whenever a custom value is set on a work package, a distinctive string is set as it's value.

* Good, because "old" assigned labels are not changed when the tree is updated.

* Good, because performance is adequate.

* Bad, because filtering on old trees not easily doable.

* Bad, because losing ability to update custom values on tree changes.

### Single Table with ID pointer

`id` | `name` | `short` | `parent_id` | (`child_ids`)

Table only shows current tree.

Using a single table to hold the hierarchical structures. (closure tree gem).

* Good, because simple implementation (Work packages and Project do this already).

* Good, because speed is not a big concern.

* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))

* Bad, because custom values can not have persistent values (see current list custom field implementation).

### ~~ltree in PostgreSQL~~

`~~ltree~~` ~~is a method to have some tooling in PostgresSQL to query hierarchical structures:~~ [~~https://www.postgresql.org/docs/current/ltree.html~~](https://www.postgresql.org/docs/current/ltree.html)

`~~root.parent.child.*~~`

* ~~Good, because query language already there~~

* ~~Good, becuase speed is not a concern~~

* ~~Bad, because metadata like~~ `~~short~~` ~~needs to be encoded into the labels~~

* ~~Bad, because no historic data per default~~

### ~~Real graph database~~

~~Using a real graph database would give us most the flexibilities needed: querying, metadata~~

* ~~Good, because it fits the tree as graph representation naturally~~

* ~~Good, because performance~~

* ~~Bad, because we would need another running database just for this~~

* ~~Bad, because no historic data per default (maybe with snapshots)~~

### Event sourced structure

With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe the changes made to a tree.

In PostgresSQL we would have a table having a structure like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.

From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.

* Good, because it's the most flexible concept that covers all decision drivers.

* Good, because it has historic data built-in by default.

* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models.

* Bad, because it's very complex to implement.

## More Information

{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}

Back

Top Menu

Side Menu

Content