Updated by Andreas Pfohl 5 months ago
# Data architecture for hierarchical attributes
## Context and Problem Statement
What is the data architecture for serving a hierarchy of tags with associated metadata to an OpenProject custom field implementation?
## Decision Drivers
* The data architecture needs to structure tags in a hierarchical way (like a tree), where each tag has associated metadata.
* The structure can change at any point in time.
* Changes to the structure need to be recorded throughout its lifetime.
* The data architecture must support filtering based on given tags.
* When the hierarchical structure changes, it must be possible to update pointers to it (the custom field).
* When the hierarchical structure changes, it must be possible to let pointers point to "older" versions of the structure.
* Changes to the structure must be auditable.
## Considered Options
* Single Table
* ltree in PostgreSQL
* Real graph database
* Event Sourcing
## Decision Outcome
Chosen option: "{title of option 1}", because {justification. e.g., only option, which meets k.o. criterion decision driver | which resolves force {force} | … | comes out best (see below)}.
### Consequences
* Good, because {positive consequence, e.g., improvement of one or more desired qualities, …}
* Bad, because {negative consequence, e.g., compromising one or more desired qualities, …}
* …
### Confirmation
{Describe how the implementation of/compliance with the ADR is confirmed. E.g., by a review or an ArchUnit test. Although we classify this element as optional, it is included in most ADRs.}
## Pros and Cons of the Options
### Single Table
`id` | `name` | `short` | `parent_id` | (`child_ids`)
Using a single table to hold the hierarchical structures.
* Good, because simple implementation (work packages and projects do this already)
* Good, because speed is not a big concern
* Bad, because having historical hierarchies is very hard to do (maybe copies of whole table parts, or: [https://wiki.postgresql.org/wiki/Temporal_Extensions](https://wiki.postgresql.org/wiki/Temporal_Extensions))
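The single-table option can be illustrated with a small sketch. It is not the OpenProject implementation: an in-memory SQLite database stands in for PostgreSQL, the table name `hierarchy_items` and the sample rows are hypothetical, and only the column names are taken from the list above. It shows that reading a subtree is straightforward with a recursive query, while nothing in the table records earlier states of the tree.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; `hierarchy_items` is a
# hypothetical table name, the columns follow the list in this section.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE hierarchy_items (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        short     TEXT,
        parent_id INTEGER REFERENCES hierarchy_items(id)
    )
    """
)
conn.executemany(
    "INSERT INTO hierarchy_items (id, name, short, parent_id) VALUES (?, ?, ?, ?)",
    [
        (1, "root", "R", None),
        (2, "parent", "P", 1),
        (3, "child", "C", 2),
        (4, "sibling", "S", 1),
    ],
)


def descendant_names(connection, item_id):
    """Return the names of all nodes below `item_id`, using a recursive CTE."""
    rows = connection.execute(
        """
        WITH RECURSIVE subtree(id, name) AS (
            SELECT id, name FROM hierarchy_items WHERE parent_id = ?
            UNION ALL
            SELECT h.id, h.name
            FROM hierarchy_items AS h
            JOIN subtree AS s ON h.parent_id = s.id
        )
        SELECT name FROM subtree ORDER BY id
        """,
        (item_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `descendant_names(conn, 1)` walks the whole tree below the root. An `UPDATE` of `parent_id` overwrites the previous position of a node, which is why historical hierarchies would need extra machinery in this option.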
### ltree in PostgreSQL
`ltree` is a PostgreSQL extension that provides tooling to query hierarchical structures: [https://www.postgresql.org/docs/current/ltree.html](https://www.postgresql.org/docs/current/ltree.html)
`root.parent.child.*`
* Good, because the query language is already there
* Good, because speed is not a concern
* Bad, because metadata like `short` needs to be encoded into the labels
* Bad, because no historic data per default
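To make the path notation concrete, here is a rough emulation in plain Python. It does not use `ltree` itself; the function `matches_subtree` and the sample paths are hypothetical, and it only covers the one pattern shape shown above, assuming that a trailing `.*` matches the node itself as well as everything below it.

```python
def matches_subtree(path: str, pattern: str) -> bool:
    """Emulate a small subset of ltree-style matching on dotted label paths.

    Supported patterns: `a.b.c` (exact match) and `a.b.c.*`
    (the node `a.b.c` itself and every path below it).
    """
    labels = path.split(".")
    if pattern.endswith(".*"):
        prefix = pattern[:-2].split(".")
        # Compare whole labels, so `root.parent.children` does not match
        # the prefix `root.parent.child`.
        return labels[: len(prefix)] == prefix
    return labels == pattern.split(".")


paths = [
    "root",
    "root.parent",
    "root.parent.child",
    "root.parent.child.grandchild",
    "root.other",
]

selected = [p for p in paths if matches_subtree(p, "root.parent.child.*")]
```

Because a path carries only labels, any metadata such as `short` would have to live in the labels or in a separate structure, which is the drawback named in the list above.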
### Real graph database
Using a real graph database would give us most of the flexibility needed: querying, metadata, paper trails.
* Good, because it fits the tree-as-graph representation naturally
* Good, because performance
* Bad, because we would need another running database just for this
* Bad, because no historic data per default (maybe with snapshots)
### Event sourced structure
With Event Sourcing we wouldn't store complete trees in a table but rather record events that describe the changes made to a tree.
In PostgreSQL we would have a table with a structure like: `id` | `tree_id` | `event_type` | `sequence_number` | `timestamp` | `data`.
From that table we could recreate any historical tree at any point in time. To speed things up, we would need to introduce certain read models.
* Good, because it's the most flexible concept
* Good, because it has historic data built in by default
* Neutral, because performance might be a concern, but can be mitigated with the use of read and write models
* Bad, because it's very complex to implement
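The replay idea can be sketched as follows. This is an illustration under assumptions, not a proposed implementation: the field names mirror the table structure described above, while the event types (`node_added`, `node_moved`, `node_removed`), the shape of `data`, and the sample events are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    id: int
    tree_id: int
    event_type: str  # hypothetical: "node_added", "node_moved", "node_removed"
    sequence_number: int
    timestamp: str
    data: Dict[str, Any]


def replay(
    events: List[Event], tree_id: int, up_to: Optional[int] = None
) -> Dict[int, Dict[str, Any]]:
    """Rebuild {node_id: {"name", "parent_id"}} for one tree from its events.

    `up_to` limits the replay to events with sequence_number <= up_to,
    which yields the tree as it looked at that point in its history.
    """
    nodes: Dict[int, Dict[str, Any]] = {}
    relevant = sorted(
        (e for e in events if e.tree_id == tree_id),
        key=lambda e: e.sequence_number,
    )
    for event in relevant:
        if up_to is not None and event.sequence_number > up_to:
            break
        node_id = event.data["node_id"]
        if event.event_type == "node_added":
            nodes[node_id] = {
                "name": event.data["name"],
                "parent_id": event.data["parent_id"],
            }
        elif event.event_type == "node_moved":
            nodes[node_id]["parent_id"] = event.data["parent_id"]
        elif event.event_type == "node_removed":
            # Simplification: only the node itself is dropped, not its children.
            nodes.pop(node_id, None)
    return nodes


events = [
    Event(1, 1, "node_added", 1, "2024-01-01T10:00:00Z", {"node_id": 1, "name": "root", "parent_id": None}),
    Event(2, 1, "node_added", 2, "2024-01-01T10:01:00Z", {"node_id": 2, "name": "parent", "parent_id": 1}),
    Event(3, 1, "node_added", 3, "2024-01-01T10:02:00Z", {"node_id": 3, "name": "child", "parent_id": 2}),
    Event(4, 1, "node_moved", 4, "2024-01-02T09:00:00Z", {"node_id": 3, "parent_id": 1}),
    Event(5, 1, "node_removed", 5, "2024-01-02T09:05:00Z", {"node_id": 2}),
]

historic_tree = replay(events, tree_id=1, up_to=3)
current_tree = replay(events, tree_id=1)
```

A custom field could keep pointing at an older version by storing the `sequence_number` it was created against. The result of `replay` is also the kind of structure a read model could cache, so that the full event log does not have to be replayed on every query.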
## More Information
{You might want to provide additional evidence/confidence for the decision outcome here and/or document the team agreement on the decision and/or define when/how this decision should be realized and if/when it should be re-visited. Links to other decisions and resources might appear here as well.}
### ~~Requirements~~
* ~~The structure must be able to represent a tree, where every node has metadata, too.~~
* ~~Historical data~~
* ~~If the hierarchy is changed, it will result in conflict with already assigned values~~
* ~~We must be able to preserve historical value assignments, which also need access to the historical hierarchy~~
* ~~We must be able to update conflicting value assignments -> Important: auditability (journals)~~
* ~~Filtering must be able to find values based on historical hierarchies (filter query language needed?)~~
### ~~IMPORTANT~~
* ~~we need to document all decisions, as there will be some heavy lifters~~