Content
Updated by Andreas Pfohl 4 months ago
# Data architecture for custom fields of type "**hierarchy"**
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure labels in a hierarchical way (like a tree), where each label has associated metadata.
* The structure can change at any point in time.
* ~~The structure must be recreated at any historic point in time.~~
* apparently not wanted/needed
* effort too high for gained benefit
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
* Labels of the same level in the structure must maintain an order, which is manually changeable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ~~ltree in PostgreSQL~~
* ~~Real graph database~~
* Event Sourcing
## Decision Outcome
Chosen option: "Single Table with always extending tree (with soft-deletes)", because it's the simplest option satisfying all (updated) decision drivers to a point where it makes sense.
### Consequences
* Good, because the data structure will be able to describe the hierarchical type of the new custom field value.
* Good, because CRUD is easy to provide for the data structure.
* Good, because current filter approach is supporting reference based filtering.
* Good, because updates are reflected on custom values automatically.
* Good, because changes to the data structure are comprehensible throughout the history.
* Good, because data structure is sortable.
* Neutral, because specific workflows need to be adopted by users and admins to maintain data consistency.
* Neutral, because change tracking has to be implemented separately.
* Bad, because historic revision of the data structure can not be recreated and used for introspection and filtering.
### Confirmation
The decision was confirmed by the Team Titan in an urge to start working on it. TBD
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
If a user deletes a node, the node and it's children are marked as `deprecated`/`deleted` and not deleted.
If a user updates a node (change label and short, change parent, change order), custom values keep references.
Changes are recorded in a logging table.
* Good, because structure is relatively simple to create. (with gems)
* Good, because historic changes are comprehensible.
* Neutral, logging must be implemented manually. (papertrail doesn't work for whole tables)
* Neutral, read performance of large trees can be poor.
* Bad, because tree grows with every change made.
### Single Table with out ID pointer
Table only shows current tree.
Whenever a custom value is set on a work package, a distinctive string is set as it's value.
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
* Bad, because losing ability to update custom values on tree changes.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Table only shows current tree.
Using a single table to hold the hierarchical structures. (closure tree gem).
* Good, because simple implementation (Work packages and Project do this already).
* Good, because speed is not a big concern.
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
* Bad, because custom values can not have persistent values (see current list custom field implementation).
### ~~ltree in PostgreSQL~~
`~~ltree~~` ~~is a method to have some tooling in PostgresSQL to query hierarchical structures:~~ [~~https://www.postgresql.org/docs/current/ltree.html~~](https://www.postgresql.org/docs/current/ltree.html)
`~~root.parent.child.*~~`
* ~~Good, because query language already there~~
* ~~Good, becuase speed is not a concern~~
* ~~Bad, because metadata like~~ `~~short~~` ~~needs to be encoded into the labels~~
* ~~Bad, because no historic data per default~~
### ~~Real graph database~~
~~Using a real graph database would give us most the flexibilities needed: querying, metadata~~
* ~~Good, because it fits the tree as graph representation naturally~~
* ~~Good, because performance~~
* ~~Bad, because we would need another running database just for this~~
* ~~Bad, because no historic data per default (maybe with snapshots)~~
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe the changes made to a tree.
In PostgresSQL we would have a table having a structure like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, because it's the most flexible concept that covers all decision drivers.
* Good, because it has historic data built-in by default.
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models.
* Bad, because it's very complex to implement.
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}
## Context and Problem Statement
What is the data architecture for serving a hierarchy of labels with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure labels in a hierarchical way (like a tree), where each label has associated metadata.
* The structure can change at any point in time.
* ~~The structure must be recreated at any historic point in time.~~
* apparently not wanted/needed
* effort too high for gained benefit
* Changes to the structure need to be recorded throughout the life-time.
* The data architecture must be capable to be used for filtering based on given labels.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
* Labels of the same level in the structure must maintain an order, which is manually changeable.
## Considered Options
* Single Table with always extending tree (with soft-deletes)
* Single Table with out ID pointer
* Single Table with ID pointer
* ~~ltree in PostgreSQL~~
* ~~Real graph database~~
* Event Sourcing
## Decision Outcome
Chosen option: "Single Table with always extending tree (with soft-deletes)", because it's the simplest option satisfying all (updated) decision drivers to a point where it makes sense.
### Consequences
* Good, because the data structure will be able to describe the hierarchical type of the new custom field value.
* Good, because CRUD is easy to provide for the data structure.
* Good, because current filter approach is supporting reference based filtering.
* Good, because updates are reflected on custom values automatically.
* Good, because changes to the data structure are comprehensible throughout the history.
* Good, because data structure is sortable.
* Neutral, because specific workflows need to be adopted by users and admins to maintain data consistency.
* Neutral, because change tracking has to be implemented separately.
* Bad, because historic revision of the data structure can not be recreated and used for introspection and filtering.
### Confirmation
The decision was confirmed by the Team Titan in an urge to start working on it.
## Pros and Cons of the Options
### Single Table with always extending tree (with soft-deletes)
If a user deletes a node, the node and it's children are marked as `deprecated`/`deleted` and not deleted.
If a user updates a node (change label and short, change parent, change order), custom values keep references.
Changes are recorded in a logging table.
* Good, because structure is relatively simple to create. (with gems)
* Good, because historic changes are comprehensible.
* Neutral, logging must be implemented manually. (papertrail doesn't work for whole tables)
* Neutral, read performance of large trees can be poor.
* Bad, because tree grows with every change made.
### Single Table with out ID pointer
Table only shows current tree.
Whenever a custom value is set on a work package, a distinctive string is set as it's value.
* Good, because "old" assigned labels are not changed when the tree is updated.
* Good, because performance is adequate.
* Bad, because filtering on old trees not easily doable.
* Bad, because losing ability to update custom values on tree changes.
### Single Table with ID pointer
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Table only shows current tree.
Using a single table to hold the hierarchical structures. (closure tree gem).
* Good, because simple implementation (Work packages and Project do this already).
* Good, because speed is not a big concern.
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal\_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
* Bad, because custom values can not have persistent values (see current list custom field implementation).
### ~~ltree in PostgreSQL~~
`~~ltree~~` ~~is a method to have some tooling in PostgresSQL to query hierarchical structures:~~ [~~https://www.postgresql.org/docs/current/ltree.html~~](https://www.postgresql.org/docs/current/ltree.html)
`~~root.parent.child.*~~`
* ~~Good, because query language already there~~
* ~~Good, becuase speed is not a concern~~
* ~~Bad, because metadata like~~ `~~short~~` ~~needs to be encoded into the labels~~
* ~~Bad, because no historic data per default~~
### ~~Real graph database~~
~~Using a real graph database would give us most the flexibilities needed: querying, metadata~~
* ~~Good, because it fits the tree as graph representation naturally~~
* ~~Good, because performance~~
* ~~Bad, because we would need another running database just for this~~
* ~~Bad, because no historic data per default (maybe with snapshots)~~
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe the changes made to a tree.
In PostgresSQL we would have a table having a structure like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, because it's the most flexible concept that covers all decision drivers.
* Good, because it has historic data built-in by default.
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models.
* Bad, because it's very complex to implement.
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision the decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}