Content
Updated by Andreas Pfohl 4 months ago
# Data architecture for custom fields of type "**hierarchy"**
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure labels in a hierarchical way (like a tree), where each label has associated metadata.
* The structure can change at any point in time.
* ~~The structure must be recreated at any historic point in time.~~
* apparently not wanted/needed
* effort too high for gained benefit
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
* Labels of the same level in the structure must maintain an order, which is manually changeable. changable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ~~ltree ltree in PostgreSQL~~ PostgreSQL
* ~~Real Real graph database~~ database
* Event Sourcing
## Decision Outcome
Chosen option: "{title of option 1}", because {justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* …
### Confirmation
{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test. Although we classify this element as optional, it is included in most ADRs.}
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
If a user deletes a node, Whenever things are changed in the node tree, the tree is appended and it's children are the now out-of-date pate is marked as `deprecated`/`deleted` and not deleted. "deleted"/"deprecated".
If a user updates a node (change label and short, change parent, change order), custom values keep references.
Changes are recorded in a logging table.
* Good, because structure is relatively simple to create. (with gems)
* Good, because historic changes are comprehensible.
* Neutral, logging must trees can still be implemented manually. (papertrail doesn't work for whole tables) obtained.
* Neutral, read Bad, because size and performance of large trees can needs to be poor.
* Bad, because tree grows with every change made. performant (custom indexing or lookup tables might help to reduce tree)
### Single Table with out ID pointer
Table only shows current tree.
Whenever a custom value field is set on a work package, a distinctive string is set as it's value.
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
* Bad, because losing ability to update custom values on tree changes.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Table only shows current tree.
Using a single table to hold the hierarchical structures. (closure tree gem). gem)
* Good, because simple implementation (Work packages and Project do this already). already)
* Good, because speed is not a big concern. concern
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
* Bad, because custom values can not have persistent values (see current list custom field implementation).
### ~~ltree ltree in PostgreSQL~~ PostgreSQL
`~~ltree~~` ~~is `ltree` is a method to have some tooling in PostgresSQL to query hierarchical structures:~~ [~~https://www.postgresql.org/docs/current/ltree.html~~](https://www.postgresql.org/docs/current/ltree.html) structures: [https://www.postgresql.org/docs/current/ltree.html](https://www.postgresql.org/docs/current/ltree.html)
`~~root.parent.child.*~~` `root.parent.child.*`
* ~~Good, Good, because query language already there~~ there
* ~~Good, Good, becuase speed is not a concern~~ concern
* ~~Bad, Bad, because metadata like~~ `~~short~~` ~~needs like `short` needs to be encoded into the labels~~ labels
* ~~Bad, Bad, because no historic data per default~~ default
### ~~Real Real graph database~~ database
~~Using Using a real graph database would give us most the flexibilities needed: querying, metadata~~ metadata
* ~~Good, Good, because it fits the tree as graph representation naturally~~ naturally
* ~~Good, Good, because performance~~ performance
* ~~Bad, Bad, because we would need another running database just for this~~ this
* ~~Bad, Bad, because no historic data per default (maybe with snapshots)~~ snapshots)
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe discribe the changes made to a tree.
In PostgresSQL we would have a table having a structure strcuture like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, because becuase it's the most flexible concept that covers all decision drivers.
* Good, because becuase it has historic data built-in build it by default. default
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models. models
* Bad, because it's very complex to implement. implement
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure labels in a hierarchical way (like a tree), where each label has associated metadata.
* The structure can change at any point in time.
* ~~The structure must be recreated at any historic point in time.~~
* apparently not wanted/needed
* effort too high for gained benefit
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
* Labels of the same level in the structure must maintain an order, which is manually changeable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ~~ltree
* ~~Real
* Event Sourcing
## Decision Outcome
Chosen option: "{title of option 1}", because {justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* …
### Confirmation
{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test. Although we classify this element as optional, it is included in most ADRs.}
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
If a user deletes a node,
If a user updates a node (change label and short, change parent, change order), custom values keep references.
Changes are recorded in a logging table.
* Good, because structure is relatively simple to create. (with gems)
* Good, because historic changes are comprehensible.
* Neutral, logging must
* Neutral, read
* Bad, because tree grows with every change made.
### Single Table with out ID pointer
Table only shows current tree.
Whenever a custom value
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
* Bad, because losing ability to update custom values on tree changes.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Table only shows current tree.
Using a single table to hold the hierarchical structures. (closure tree gem).
* Good, because simple implementation (Work packages and Project do this already).
* Good, because speed is not a big concern.
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
* Bad, because custom values can not have persistent values (see current list custom field implementation).
### ~~ltree
`~~ltree~~` ~~is
`~~root.parent.child.*~~`
* ~~Good,
* ~~Good,
* ~~Bad,
* ~~Bad,
### ~~Real
~~Using
* ~~Good,
* ~~Good,
* ~~Bad,
* ~~Bad,
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe
In PostgresSQL we would have a table having a structure
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, because
* Good, because
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models.
* Bad, because it's very complex to implement.
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}