BroadaxProof

Please verify my entries by your thinking like broad ax.

Fertile Forest Model (1/n) Hierarchical Data in a RDB [1]

Roles Of Model

The role of the model to store hierarchical data in RDB are summarized in two paragraphs as:

  1. To store the hierarchical data in RDB table.
  2. To write the query to find the relevant nodes efficiently for any nodes.

(1) To Store The Hierarchical Data In RDB Table.

We can define the criteria whether the model has function to store hierarchical data, as:

Can reproduce the original hierarchical structure by the saved all nodes.

If can not reproduce hierarchical data by all records in RDB, we can not say that the model has the function. It is too basic thing that needless to say. Therefore, I think that nobody can disagree it.

(2) To Write The Query To Find The Relevant Nodes Efficiently.

In general, we save data into a database for searching them. We think naturally as: If to save hierarchical data in a database, various relationship node of any nodes (such as "subtree", "parent node", "child nodes", "ancestor nodes", "descendant nodes" and etc.) will be searchable. It is the important role of a model to find related nodes of any nodes efficiently.

In the figure below, [A], [B], [C], ... represents a node.

      [A]
       |
   +---+---+
   |       |
  [B]     [C]
   |       |
 +-+-+   +-+-+
 |   |   |   |
[D] [E] [F] [G]
             |
           +-+-+
           |   |
          [H] [I]

When a human search the sub-tree of [C] visually, pick up all the nodes by tracing the branches extending downward from [C]. In this example, set of partial tree of the [C] is as:。

  [C]
   |
 +-+-+
 |   |
[F] [G]
     |
   +-+-+
   |   |
  [H] [I]

To trace a branches is regularity routine. Therefore, a lot of people think as "If to store hierarchical data in a database, we can find related nodes of any nodes easily".

In the hierarchical data, use frequently SQL to find a related nodes sets. Describe with the example in previous figure as:

Basic Node Relation Set of Nodes
[B] Parent [A]
[B] Children [D] [E]
[H] Ancestors [A] [C] [G]
[C] Descendants [F] [G] [H] [I]
[C] Subtree [C] [F] [G] [H] [I]

The word of "Efficient" is most important phrase in this items. If efficiency is unnecessary, to devise search query is also unnecessary. Firstly get all nodes, and them examining whether or not to match with conditions by programming language.

However, it is not realistic to get all nodes everytime in a table that has record of millions. When use the way to get all nodes against huge table, need execution time and a large amount of memory. If the table has serveral hundreds of million records and match nodes are a few, it is natural to design the system to be acquired at a high speed by specifying conditions.

Focus Of Problem

If the reason for unwieldy a hierarchical data in RDB is only "To store the hierarchical data in RDB table", does not exist problems that plague the database engineers around the world. Because. "Adjacency List Model" was demonstrated that it can not be described efficient query to find subtree. However, Adjacency List Model can store hierarchical data in RDB.

There are several reasons why it is difficult to write an efficient find query.

  1. For being searchable, it is necessary to store all the ancestor nodes of each node.
  2. The number of nodes relationship to store, is increased according exponentially to the depth of the tree.
  3. The depth of hierarchy is not fixed with depending on contents of data.
  4. It is difficult to impart index to hierarchical data in a database. Because tree structure having the two-dimensional spread, but index is one-dimensional structure.
  5. etc.

Consider to find a set of descendant nodes from specified node. It is a simple story If can determine for each node by tracing branch lines to ancestor direction. However, SQL is a "declarative programming". Therefore, can not implement for determining relationship for each node, like imperative programming in a separate process.

In declarative programming, how can we express the relationship of the subtree and ancestor nodes in the search queries?

This is the essence of a challenge that all database engineers around the world given. The problem of hierarchical data in RDB attaches at one point that "describes the search queries to efficiently retrieve related node from any node".