Content
Updated by Andreas Pfohl 4 months ago
# Data architecture for custom fields of type "**hierarchy"** hierarchical attributes
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels tag with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure labels tags in a hierarchical way (like a tree), where each label tag has associated metadata.
* The structure can change at any point in time.
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels. tags.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ltree in PostgreSQL
* Real graph database
* Event Sourcing
## Decision Outcome
Chosen option: "{title of option 1}", because {justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* …
### Confirmation
{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test. Although we classify this element as optional, it is included in most ADRs.}
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
Whenever things are changed in the tree, the tree is appended and the now out-of-date pate is marked "deleted"/"deprecated".
* Good, because historic trees can still be obtained.
* Bad, because size and performance of large trees needs to be performant (custom indexing or lookup tables might help to reduce tree)
### Single Table with out ID pointer
Whenever a custom field is set on a work package, a distinctive string is set as it's value.
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Using a single table to hold the hierarchical structures. (closure tree gem)
* Good, because simple implementation (Work packages and Project do this already)
* Good, because speed is not a big concern
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
### ltree in PostgreSQL
`ltree` is a method to have some tooling in PostgresSQL to query hierarchical structures: [https://www.postgresql.org/docs/current/ltree.html](https://www.postgresql.org/docs/current/ltree.html)
`root.parent.child.*`
* Good, because query language already there
* Good, becuase speed is not a concern
* Bad, because metadata like `short` needs to be encoded into the labels
* Bad, because no historic data per default
### Real graph database
Using a real graph database would give us most the flexibilities needed: querying, metadata
* Good, because it fits the tree as graph representation naturally
* Good, because performance
* Bad, because we would need another running database just for this
* Bad, because no historic data per default (maybe with snapshots)
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that discribe the changes made to a tree.
In PostgresSQL we would have a table having a strcuture like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, becuase it's the most flexible concept
* Good, becuase it has historic data build it by default
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models
* Bad, because it's very complex to implement
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels
## Decision Drivers
* The data architecture needs to structure labels
* The structure can change at any point in time.
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ltree in PostgreSQL
* Real graph database
* Event Sourcing
## Decision Outcome
Chosen option: "{title of option 1}", because {justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* …
### Confirmation
{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test. Although we classify this element as optional, it is included in most ADRs.}
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
Whenever things are changed in the tree, the tree is appended and the now out-of-date pate is marked "deleted"/"deprecated".
* Good, because historic trees can still be obtained.
* Bad, because size and performance of large trees needs to be performant (custom indexing or lookup tables might help to reduce tree)
### Single Table with out ID pointer
Whenever a custom field is set on a work package, a distinctive string is set as it's value.
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Using a single table to hold the hierarchical structures. (closure tree gem)
* Good, because simple implementation (Work packages and Project do this already)
* Good, because speed is not a big concern
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
### ltree in PostgreSQL
`ltree` is a method to have some tooling in PostgresSQL to query hierarchical structures: [https://www.postgresql.org/docs/current/ltree.html](https://www.postgresql.org/docs/current/ltree.html)
`root.parent.child.*`
* Good, because query language already there
* Good, becuase speed is not a concern
* Bad, because metadata like `short` needs to be encoded into the labels
* Bad, because no historic data per default
### Real graph database
Using a real graph database would give us most the flexibilities needed: querying, metadata
* Good, because it fits the tree as graph representation naturally
* Good, because performance
* Bad, because we would need another running database just for this
* Bad, because no historic data per default (maybe with snapshots)
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that discribe the changes made to a tree.
In PostgresSQL we would have a table having a strcuture like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, becuase it's the most flexible concept
* Good, becuase it has historic data build it by default
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models
* Bad, because it's very complex to implement
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}