BroadaxProof

Please verify my entries by your thinking like broad ax.

Fertile Forest Model (2/n) Hierarchical Data in a RDB [2]

Additional Columns

Conventional database engineers had studied four models for solving the problem to store hierarchical data in a database. They are:

  1. Adjacency List Model
  2. Path Enumeration Model
  3. Nested Set Model
    Nested Intervals Model
  4. Closure Table Model

Common thing about each model, is to add some columns into RDB table for storing hierarchical data. Columns to be added are classified in two types by its roles. These are as:

  1. Hierarchical Structure Columns
  2. Search Efficiency Columns

(1) Hierarchical Structure Columns

Necessary to store the hierarchical data in RDB table. Also used as a column to specify a search conditions in a finding query.

(2) Search Efficiency Columns

To be added for increasing the search efficiency. These are added when can not write a efficient finding query by hierarchical structure columns. If the purpose is only to save hierarchical data in a database, it is not necessary. Depending on the model, there is a case that does not use "search efficiency column".

Criteria Of The Model

Superiority of the model to handle hierarchical data can be determined by the two criteria as:

  1. Has the execution speed of the search query in a range that can withstand practical use.
  2. Number of total column to save tree structure data is more fewer.

(1) Has The Execution Speed Of The Search Query In A Range That Can Withstand Practical Use.

When there is a website that takes two minutes until to see the next page, nobody thinks that the website has enough speed to withstand practical use. In the current Japanese Internet environment, even if five seconds will be considered as slow. Because of a database has applications to be used in the dynamic display of web page, we can think the criteria whether human feel slowness by turning over a web page.

When the search query can refer to the index, it keeps the execution speed that can withstand practical use, even if contains several hundreds million records in the table. When the query can not be referenced index, even if the records are only the several hundreds of million records, can not get enough execution speed depending on the conditions. The first criterion is able to be said to be the same as "whether can write the search query to be refered the index".

When find relative nodes in table that contains hierarchical data, the most often used purpose is for sub-tree and child nodes. Therefore, "write a practical search query in a tree structure data model" is said as:

Can write a search query to find a subtree and child nodes of an arbitrary node with refering the index.

When the search query for finding sub-tree and child nodes can refer index, even if a table has a billion number of nodes and a hundred hierarchy of depth, it can keep search speed of withstand practical use.

(2) Number Of Total Column To Save Tree Structure Data Is More Fewer.

Number of total column is the words that represented in consideration of the number of records to store hierarchical data in RDB. We consider about some models. The model has n-number of hierarchical structure column, and a rule that each node must be saved by m-number of records. In a case of that model, the number of total column is (n * m). Since the number of records to save is m = 1 in most of the model. Therefore, we can consider that total number of columns equals hierarchical structure the number of columns.

A model has the total number of records for saving nodes as small as possible, we can say that it is an excelent and economical model design. Because a few amount of data reduces capacity of DB server, and impact of finding query on the search speed.

The second criterion is no longer so importance now. Small capacity of the hard disk was an important issue in past, but now, the evolution of the hardware makes to reduce the problems related to capacity.

However, nobody says to disagree about the terms "it is an excellent model that contains columns for storing hierarchical data in RDB as few as possible". I think so.

The Points

Contents of this chapter contains an important point in understanding the FF model. Please check the following three points, and then proceed to the next chapters.

  1. A model to store hierarchical data in a database contains two kinds of column. One is "column for storing hierarchical data", and Two is "column for efficiency search".
  2. A criteria of excellent model for storing hierarchical data in a database, is that it can write a search query to find a subtree and child nodes of an arbitrary node with reference the index.
  3. An second criteria is that total number of hierarchical structure column and search efficiency columns is fewer.