2018-05-19

Dispatcher Pattern (the resolving for Expression problem of Visitor pattern by Java).

Introduction

Some software engineers considers that Visitor Pattern is the most difficult in design pattern. I got understood Visitor Pattern on beginning of May 2018. The understanding is as follow:

Visitor Pattern is used when we want to switch the expression for a collection that contains different elements without branching by if statement.

Expression Problem

I learned that Visitor Pattern has an unresolved problem known as Expression Problem by the article linked below.

Rethink of Visitor Pattern (in Japanese language)

I understood that the main point of Expression Problem is the following points.

When an element is added, the existing class has to be rewritten, in order to add expression.

I thought "It would be great if I could design Visitor Pattern that can add elements and expressions without rewritten the existing class". After little thinking, I found a design. It is the Dispatcher Pattern introduced in this article.

Design of Dispatcher Pattern

The protagonist of the Dispatcher Pattern is the Dispatcher class. All expression classes implements the Expression interface and is delegated to the Dispatcher class. The Dispatcher class dispatch all the elements passed from the client using "polymorphism by delegation".

Class and Interface

Item class

These are any object.

class ItemA
{
    String getAnimalName() {
        return "cat";
    }
}

class ItemB
{
    String getColorName() {
        return "purple";
    }
}

class ItemC
{

}

class ItemD
{

}

class ItemE
{

}

Expression class

All expression classes used in the Dispatcher pattern implement the Expression interface. Common methods of all expression classes are defined on the interface that inherits Expression. Exp is defined in this example.

interface Expression<T>
{

}

interface Exp<T> extends Expression<T>
{
    void print(T item);
}

class ExpA implements Exp<ItemA>
{
    @Override
    void print(ItemA item) {
        Log.d(
                "DEMO",
                "I am ExpressionA, treat ItemA only. Animal name is " + item.getAnimalName()
        );
    }
}

class ExpB implements Exp<ItemB>
{
    @Override
    void print(ItemB item) {
        Log.d(
                "DEMO",
                "I am ExpressionB, treat ItemB only. Color name is " + item.getColorName()
        );
    }
}

class ExpX<T> implements Exp<T>
{
    @Override
    void print(T item) {
        Log.d(
                "DEMO",
                "I am ExpressionX, treated " + item.getClass().getSimpleName() + "."
        );
    }
}

Dispatcher class

AbstractDispatcher is generic superclass in Dispatcher pattern. Dispatcher inheriting AbstractDispatcher implements Exp inheriting Expression.

abstract class AbstractDispatcher<X extends Expression>
{
    private Map<Class<?>, X> expressions = new HashMap<>();

    @SuppressWarnings("unchecked")
    public <T> void set(Class<T> clazz, Expression<T> expression) {
        expressions.put(clazz, (X) expression);
    }

    @Nullable
    protected final X dispatch(@NonNull Object item) {
        return expressions.get(item.getClass());
    }
}

class Dispatcher<I, X extends Exp<I>>
        extends AbstractDispatcher<X>
        implements Exp<I>
{
    @Override
    void print(I item) {
        X exp = dispatch(item);
        if (exp == null) {
            Log.d(
                "DEMO",
                "Unknown item: " + item.getClass().getSimpleName()
            );

            return;
        }

        exp.print(item);
    }
}

Example and Logging

// setup expressions.
Dispatcher<Object, Exp<Object>> dispatcher = new Dispatcher<>();
{
    dispatcher.set(ItemA.class, new ExpA());
    dispatcher.set(ItemB.class, new ExpB());
    dispatcher.set(ItemC.class, new ExpX<>());
    dispatcher.set(ItemD.class, new ExpX<>());

    // dispatcher.set(ItemA.class, new ExpB());     // error
}

// setup elements.
List<Object> list = new ArrayList<>();
{
    list.add(new ItemB());
    list.add(new ItemB());
    list.add(new ItemC());
    list.add(new ItemA());
    list.add(new ItemB());
    list.add(new ItemC());
    list.add(new ItemA());
    list.add(new ItemD());
    list.add(new ItemE());
}

// execute.
for (Object it : list) {
    dispatcher.print(it);
}

I am ExpressionB, treat ItemB only. Color name is purple
I am ExpressionB, treat ItemB only. Color name is purple
I am ExpressionX, treated ItemC.
I am ExpressionA, treat ItemA only. Animal name is cat
I am ExpressionB, treat ItemB only. Color name is purple
I am ExpressionX, treated ItemC.
I am ExpressionA, treat ItemA only. Animal name is cat
I am ExpressionX, treated ItemD.
Unknown item: ItemE

Example of Adding elements and expressions

In Dispatcher Pattern, can optionally add elements and expressions to elements as in the example below. We can see that there is no need to rewrite the existing code.

class ItemF
{

}

class ExpF implements Exp<ItemF>
{
    @Override
    void print(ItemF item) {
        Log.d(
                "DEMO",
                "I am new ExpressionF, treat ItemF only."
        );
    }
}

// setup expressions.
Dispatcher<Object, Exp<Object>> dispatcher = new Dispatcher<>();
{
    ...
+++    dispatcher.set(ItemF.class, new ExpF());
}

// setup elements.
List<Object> list = new ArrayList<>();
{
    ...
+++    list.add(new ItemF());
}

Feature of Dispatcher Pattern

non-instusive (non-invasice)

The accept method used in Visitor Pattern is unnecessary. Therefore, We can use any classes as an element class, even for classes that can not be changed, as used in libraries.

Sharing Expression.

Element classes and expression classes are not a one-to-one relationship, and it is possible for a single expression class to handle multiple elements. Of course, it has not to rewrite existing classes. See ExpX in the above example. If the element classes in charge of ExpX inherit a common superclass, expression can be aggregated into ExpX.

Polymorphism by Delegation

Dispatcher Pattern correspond element and expression with Map. Set the key as class of element and the value as instance of expression class in Map instance, can dispatch expresion for each element. I named it "Polymorphism by Delegation".

Comparison with Visitor Pattern

Checking	Visitor Pattern	Dispatcher Pattern
accept method of element class	necessary	unnecessary (non-intusive)
polymorphism	Polymorphism by method overloading	Polymorphism by Delegation
dispatching	Double	Single (Haha)

Multiphase return value

The return value obtained from Dispatcher can not be multiphase. However, if we consider the return value as a new element group, can apply Dispatcher Pattern to the return value.

Future Verification

I thought that Dispatcher Pattern was resolved to type safety, but possible that an advanced programmer will find some holes. Even if this design was failure, I believe that Dispatcher Pattern is wider than Viditor Pattern, because it can handle any elements such as classes included in the library.

If this new pattern is established in type safety, Dispatcher Pattern will be written instead of Visitor Pattern in future design pattern related books. At that time, I hope that my name "Stew Eucen" will be introduced in the books. :)

2015-12-26

Fertile Forest Model (6/n) To Find Nodes (Kinships)

Fertile Forest SQL

To Find Nodes (Kinships)

I am going to express query finding kinship nodes by the following tree structure data. [A], [B], ... indicated in figure means nodes.

DEPTH|
   0 | [A]--+-----------+
     |      |           |
   1 |     [B]--+---+  [C]--+---+
     |          |   |       |   |
   2 |         [D] [E]     [F] [G]--+---+
     |                              |   |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

To Find Sibling Nodes

On Sun, 18 Oct 2015 14:30:00 +0000UTC, I discovered an once query to find sibling nodes of any node. The considering procedure is as:

[X] means base node.

The node in conditions as follows is as Qh (Queue head).

Has QUEUE lesser than [X].

Has DEPTH lesser than [X].

Has greatest QUEUE in above conditions.

The node in conditions as follows is as Qt (Queue tail).

Has QUEUE greater than [X].

Has DEPTH lesser than [X].

Has least QUEUE in above conditions.

Sibling nodes meet conditions as follow:

The node has QUEUE between Qh and Qt.
(Qh < QUEUE < Qt)

Has same DEPTH of [X].

I am going to create a query to find for sibling nodes of [F]. Fq means QUEUE of [F], and Fd means DEPTH of [F]. The subquery to find Qh and Qt of [F] as follow.

QhSBQ (subquery to find Qh):
  SELECT MAX(ff_queue)
    FROM ff_table
    WHERE ff_depth < Fd
      AND ff_queue < Fq

QtSBQ (subquery to find Qt):
  SELECT MIN(ff_queue)
    FROM ff_table
    WHERE ff_depth < Fd
      AND ff_queue > Fq

The constitution of main query that includes two subqueries is as:

SELECT * FROM ff_table
  WHERE ff_depth = Fd
    AND ff_queue > (QhSBQ)
    AND ff_queue < (QtSBQ)
;

In summary, we got the query to find sibling nodes as follow.

SELECT * FROM ff_table
  WHERE ff_depth = Fd
    AND ff_queue > (
      SELECT MAX(ff_queue) FROM ff_table
        WHERE ff_depth < Fd AND ff_queue < Fq
    )
    AND ff_queue < (
      SELECT MIN(ff_queue) FROM ff_table
        WHERE ff_depth < Fd AND ff_queue > Fq
    )
;

This query contains a bug that causes a runtime error, as well as the query to retrieve the descendant node. Because of, when can not find Qh or Qt, the subquery result is NULL.

To avoid the bug by using the COALESCE() as:

SELECT * FROM ff_table
  WHERE ff_depth = Fd
    AND ff_queue >=
      COALESCE(
        (
          SELECT MAX(ff_queue) + 1 FROM ff_table
            WHERE ff_depth < Fd
              AND ff_queue < Fq
        ),
        0
      )
    AND ff_queue <=
      COALESCE(
        (
          SELECT MIN(ff_queue) - 1 FROM ff_table
            WHERE ff_depth < Fd
              AND ff_queue > Fq
        ),
        0x7fffffff
      )
;

To Find Any Kind Of Kinship Nodes.

On Sun, 18 Oct 2015 14:30:00 +0000UTC, I have just discovered an single query for finding any kinship nodes, by applying query to find sibling nodes.

Consider the generalization to find kinship nodes on the basis of the search of the brother node.

We can define sibling node as:

Set of child nodes having parent node of the target node.

Replace "parent node" to "the node which has one upper depth" the parent node.

Set of child nodes having "the node which has one upper depth" of the target node.

In addition, replace "child node" to "node which has one downer depth".

Set of "node which has one downer depth" having "the node which has one upper depth" of the target node.

Now, preparation of generalization is over.

In the figure below, all cousin nodes of [D] is ([D], [E], [F], [G]).

DEPTH|
   0 | [A]--+-----------+
     |      |           |
   1 |     [B]--+---+  [C]--+---+
     |          |   |       |   |
   2 |         [D] [E]     [F] [G]--+---+
     |                              |   |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

When to express the cousin nodes of [D] by using the definition of sibling node, the expression is as follows.

Set of "node which has two downer depth" having "the node which has two upper depth" of [D].

The "one" has changed to the "two" in the definition of cousin nodes of [D].

The relatives other than one own direct line of ancestors and descendants is said "collateral". Below is a list of collateral level.

Collateral	Level
Sibling	1
Cousin	2
Second Cousin	3
Third Cousin	4
...	...

Replace the changed word between the definition of sibling nodes and cousin node, to N.

Set of "node which has N downer depth" having "the node which has N upper depth" of target node.

When rewritten definition of the query to find sibling nodes for generalization, it is as follows. [X] means base node, Xq means QUEUE of [X], Xd means DEPTH of [X] and NN means "n".

SELECT * FROM ff_table
  WHERE ff_depth = Xd
    AND ff_queue >=
      COALESCE(
        (
          SELECT MAX(ff_queue) + 1 FROM ff_table
            WHERE ff_depth <= Xd - NN
              AND ff_queue < Xq
        ),
        0
      )
    AND ff_queue <=
      COALESCE(
        (
          SELECT MIN(ff_queue) - 1 FROM ff_table
            WHERE ff_depth <= Xd - NN
              AND ff_queue > Xq
        ),
        0x7fffffff
      )
;

We can find collateral nodes of the same depth as the base node by using query above. However, we can not get right result in the following cases.

DEPTH of Base Node < Collateral Level

The above query get extra nodes from over the root node of target node on next tree.

We can avoid the bug easily by adding "AND Xd < NN" into WHERE clause of the query. However, we know the condition before creating the query. Therefore, it would be reasonable to use "if statement" for branching in actual library of FF model.

Perfect Generalization To Find Kinship Nodes

I am going to expand the generalization further, for finding piblings (uncles and aunts) and niblings (nephews and nieces).

Rewrite the definition as follows. It is changed the upper and lower depth to "Nup" and "Ndown".

Set of "node which has Ndown downer depth" having "the node which has Nup upper depth" of target node.

Nup and Ndown will change by relatives.

Collateral	Nup	Ndown
Sibling	1	1
Cousin	2	2
Second Cousin	3	3
Thiird Cousin	4	4
Pibling	2	1
Grand Pibling	3	1
Great-Grand Pibling	4	1
Nibling	1	2
first Cousin Once Removed	2	3

I am going to generalize query to find kinship nodes by using Nup and Ndown.

SELECT * FROM ff_table
  WHERE ff_depth = Xd - Nup + Ndown
    AND ff_queue >=
      COALESCE(
        (
          SELECT MAX(ff_queue) + 1 FROM ff_table
            WHERE ff_depth <= Xd - Nup
              AND ff_queue < Xq
        ),
        0
      )
    AND ff_queue <=
      COALESCE(
        (
          SELECT MIN(ff_queue) - 1 FROM ff_table
            WHERE ff_depth <= Xd - Nup
              AND ff_queue > Xq
        ),
        0x7fffffff
      )
;

I named the query above "The Formula Query" of Fertile Forest Model.

FF model can find any relatives nodes! This is the result of FF Model, I found.

When I found the formula query, I got exhilarating feeling as same as a per billion of evolving special theory of relativity into the general theory of relativity.

Collateral Level

We got generalizing query to find any kinship nodes. However, the generalizing query was not considered about "collateral discrete level". When to use the query for finding cousin nodes, the results includes sibling node of base node. Therefore, we can not say that cousin() method has correctly represent name.

When getting data from the hierarchical data, the same level nodes are treated on the same level. Therefore, even though the sibling node is mixed on the results to find cousin nodes, we can insist "It is specification".

However, when others ask us "can you identify the cousin nodes and sibling nodes?", answer is "yes", if we have a spirit of software engineer. To solve this problem, we consider the following method.

In the library of the FF model, it includes a "discrete degree column of collateral" into the query to find kinship nodes.

As a result, we can use "processing with the exception of sibling nodes from the search results cousin nodes".

Because it takes time to explain the details, I implemented in my libraries. Please check on the following versions.

2015-12-26

Fertile Forest Model (5/n) To Find Nodes (Descendants)

Fertile Forest SQL

To Find Nodes (Descendants)

I am going to express query finding subtree nodes by the following tree structure data. [A], [B], … indicated in figure means nodes.

DEPTH|
   0 | [A]--+-----------+
     |      |           |
   1 |     [B]--+---+  [C]--+---+
     |          |   |       |   |
   2 |         [D] [E]     [F] [G]--+---+
     |                              |   |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

To Find Descendant Nodes

I explain the concept of a query to find descendant nodes of the node [B].

DEPTH|
     |
   0 |  [A]---+--------------+
     |  [ ]   |              |
   1 |  [ ]  [B]---+----+   [C]---+----+
     |  [ ]  [ ]   |    |   [ ]   |    |
   2 |  [ ]  [ ]  [D]  [E]  [ ]  [F]  [G]---+----+
     |  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]   |    |
   3 |  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [H]  [I]
     |  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]
-----+----------------------------------------------------
     |   0    1    2    3    4    5    6    7    8   QUEUE

The above figure has been depicted as a bar graph. Please consider a bar of each node as building. Can not see buildings behind building that has same or more height. For example, can not see [F] from the [B].

When ceate query to find descendant nodes of node [B], consider as follows.

On the right side of [B], look for the building of more than the height of the [B].

The nodes between “found node” and [B] are descendant nodes.

The building that can be seen from [B] is the [C]. Thus, descendant nodes of [B] are between [B] and [C]. These are [C] [D] and [E].

Consider the actual query to find descendant nodes. Target to find descendant nodes is [B]. Query to find with conditions same or less depth from [B] on right side of [B] has next WHERE clause.

WHERE ff_queue > 1 AND ff_depth <= 1

Can determine the scope of the descendant nodes by “QUEUE of node on the far left” in that condition.

SELECT MIN(ff_queue)
  FROM ff_tables
  WHERE ff_queue > 1 AND ff_depth <= 1
;

When the query in above is defined as SBQ, the query to find descendant nodes of [B] is as follows.

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue < (SBQ)
;

To summarize the above, we get the query to find the descendant nodes of [B].

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue <
      (
        SELECT MIN(ff_queue)
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
      )
;

The value of MIN(ff_queue) obtained in the subquery is four. Thus, the query to find the descendant nodes of [B] is equivalent as:

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue < 4
;

The basic structure of the query to find the descendants node is in above. However, a bug remains in the query for actual using. Because there are some cases that subquery returns NULL. For example, when find descendant nodes of node [G], there is no building which has same or more height of [G] on right side of [G]. Since the subquery result is NULL, the entire query will result in an error.

To avoid this error, rewrite the query as follows. This new query gives us compensation results without having to be aware of the QUEUE position.

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
        ),
        0x7fffffff
      )
;

To Find Subtree

Subtree is a set of nodes that include base node and descendant nodes of base node. To create query to find subtree is so easy. It is a little bit change from query of descendant nodes.

SELECT *
  FROM ff_tables
  WHERE ff_queue >= 1
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
        ),
        0x7fffffff
      )
;

To Find Child Nodes

To create query to find child nodes is easy. It is to add one condition into WHERE clause in query to descendant nodes. 1 + 1 of the added condition means (DEPTH of [B]) + 1.

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
        ),
        0x7fffffff
      )
    AND ff_depth = 1 + 1
;

To Find Grandchild Nodes

It is easy as same as query to find child nodes. You Know that 1 + 2 means (DEPTH of [B]) + 2.

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
        ),
        0x7fffffff
      )
    AND ff_depth = 1 + 2
;

To Find Great-Grandchild Nodes And Great-Great-Grandchild Nodes

Can generalize the query to find descendant nodes as same as ancestor nodes. When QUEUE of the base node is QQ and the DEPTH is DD, the query to find the descendent nodes of down to n-generations is generalized as follows. In case of great-grandchild: n = 3, In case of great-great-grandchild: n = 4. It is so neat.

SELECT *
  FROM ff_tables
  WHERE ff_queue > QQ
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > QQ AND ff_depth <= DD
        ),
        0x7fffffff
      )
    AND ff_depth = DD + n
;

To Find Descendant Nodes By Range Specification

When add a condition into the WHERE clause of a query to find all descendant nodes, can create the query to find an descendant nodes of the range that specify. The query to find descendant nodes of [B] down to n-generations is as follows.

SELECT *
  FROM ff_tables
  WHERE ff_queue > 1
    AND ff_depth >= 1 + n
    AND ff_queue <=
      COALEPSE(
        (SELECT MIN(ff_queue) - 1
          FROM ff_tables
          WHERE ff_queue > 1 AND ff_depth <= 1
        ),
        0x7fffffff
      )
;

Even if the tree data has several hundred depths, the query to find the descendant nodes is so simple. Because it only specifies the depth difference between the descendant node numerically.

2015-12-26

Fertile Forest Model (4/n) To Find Nodes (Ancestors)

Fertile Forest SQL

To Find Nodes (Ancestors)

I am going to express query finding ancestor nodes by the following tree structure data. [A], [B], ... indicated in figure means nodes.

DEPTH|
   0 | [A]--+-----------+
     |      |           |
   1 |     [B]--+---+  [C]--+---+
     |          |   |       |   |
   2 |         [D] [E]     [F] [G]--+---+
     |                              |   |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

To Find Ancestor Nodes

Consider a query to find the ancestor node of [F]. Those satisfying the following conditions will be the ancestor node of [F].

Has the above DEPTH than [F]. (ff_depth < 2)

Nodes to the left than [F]. (ff_queue < 5)

The node at the far right for each DEPTH.
(MAX(ff_queue), GROUP BY ff_depth)

Conditions of "Has the above DEPTH than [F]" and "Nodes to the left than [F]" are specified as the WHERE clause such as the following.

WHERE ff_queue < 5 AND ff_depth < 2

Procedure to find "Node at the far right for each depth" as:

Group by ff_depth, and select maximum number of ff_queue.

Get the node with the ff_queue that matches the maximum number of ff_queue.

To summarize the above, it is as:

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < 5 AND ff_depth < 2
      GROUP BY ff_depth
    )
;

To Find Parent Node

To create a query to find parent node is so easy. It is a little bit to modify WHERE clause in the finding query of ancestor node.

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < 5 AND ff_depth = (2 - 1)
      GROUP BY ff_depth
    )
;

ff_depth < 2 has changed ff_depth = (2 - 1) in the subquery. (2 - 1) means that (DEPTH of [F] - 1).

To Find Root Node

To create query to find root node of base node is like the parent node. By just slightly modified search query ancestor node.

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < 5 AND ff_depth = 0
      GROUP BY ff_depth
    )
;

To Find Grandparent Node

In example of the parent node and root node, the query was created by simply changing a little bit from query of ancestor nodes. The query to retrieve a grandparent node and great-grandparents nodes can be created by the same idea.

Grandparent node of [F]

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < 5
        AND ff_depth = 2 - 2
      GROUP BY ff_depth
    )
;

Great-grandparent node of [H]

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
  (SELECT MAX(ff_queue)
    FROM ff_tables
    WHERE ff_queue < 7
      AND ff_depth = 3 - 3
    GROUP BY ff_depth
  )
;

To Generalize The Query To Find Ancestor Node Before N-Generations

When base node has QUEUE = QQ, DEPTH = DD, can generalize the query to find ancestor nodes before n-generations as:

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < QQ AND ff_depth = DD - n
      GROUP BY ff_depth
    )
;

To Find Ancestor Nodes By Range Specification

When add a condition into the WHERE clause of a query to find all ancestor nodes, can create the query to find an ancestor nodes of the range that specify.

Query to find the ancestor nodes of [H] up to 2 generations.

SELECT *
  FROM ff_tables
  WHERE ff_queue IN
    (SELECT MAX(ff_queue)
      FROM ff_tables
      WHERE ff_queue < 7 AND ff_depth < 3
        AND ff_depth >= 3 - 2
      GROUP BY ff_depth
    )
;

2015-12-26

Fertile Forest Model (3/n) Model Design

Fertile Forest SQL

I am going to write core design of Fertile Forest Model. This content is was reprinted with permission from the official site of the Fertile Forest Model.

Model Design

FF model is designed by new idea.

Matrix Of Hierarchical Structure

I am going to explain the FF-Model visually by tree data as:

      [A]
       |
   +---+---+
   |       |
  [B]     [C]
   |       |
 +-+-+   +-+-+
 |   |   |   |
[D] [E] [F] [G]
             |
           +-+-+
           |   |
          [H] [I]

(1) Modifies form of entire tree as:

[A]
 |
 +-------+
 |       |
[B]     [C]
 |       |
 +---+   +---+
 |   |   |   |
[D] [E] [F] [G]
             |
             +---+
             |   |
            [H] [I]

(2) Further modifies for each node as only one in vertical positions.

[A]--+-----------+
     |           |
    [B]--+---+  [C]--+---+
         |   |       |   |
        [D] [E]     [F] [G]--+---+
                             |   |
                            [H] [I]

Give numbers of XY axis position for each node in likening to the grid plain.

0 | [A]--+-----------+
  |      |           |
1 |     [B]--+---+  [C]--+---+
  |          |   |       |   |
2 |         [D] [E]     [F] [G]--+---+
  |                              |   |
3 |                             [H] [I]
--+-------------------------------------
  |  0   1   2   3   4   5   6   7   8

Each node has a unique XY coordinates in this figure. To be able to identify the hierarchical data position of the node by the XY coordinates. This is the gist of FF model.

I defined the XY coordinates as follows.

X-axis: QUEUE
Y-axis: DEPTH

A table of FF-Model can have some trees. When in the case, QUEUE of 2nd (or more) root nodes are depending on size of trees before.

DEPTH|
   0 | [A]--+-----------+                  [J]--+---------------+
     |      |           |                       |               |
   1 |     [B]--+---+  [C]--+---+              [K]--+          [L]--+
     |          |   |       |   |                   |               |
   2 |         [D] [E]     [F] [G]--+---+          [M]--+---+      [N]
     |                              |   |               |   |
   3 |                             [H] [I]             [O] [P]
   --+----------------------------------------------------------------------
     |  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  QUEUE

RDB Table Design

Add two columns for containing QUEUE and DEPTH into RDB table as tree-storing column.

id	name	ff_queue	ff_depth
1	A	0	0
2	B	1	1
3	C	4	1
4	D	2	2
5	E	3	2
6	F	5	2
7	G	6	2
8	H	7	3
9	I	8	3

CREATE TABLE `ff_tables` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(225) DEFAULT NULL,
  `ff_queue` int(11) NOT NULL,
  `ff_depth` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

INSERT INTO `ff_tables`
    (`id`, `name`, `ff_queue`, `ff_depth`)
  VALUES
    (1, 'A', 0, 0), (2, 'B', 1, 1), (3, 'C', 4, 1),
    (4, 'D', 2, 2), (5, 'E', 3, 2), (6, 'F', 5, 2),
    (7, 'G', 6, 2), (8, 'H', 7, 3), (9, 'I', 8, 3)
;

ff_queue must be UNIQUE for all nodes in a table. But can not use UNIQUE KEY for ff_queue, because it makes error when to graft nodes.

QUEUE is likened as a serial number by certain rules to all nodes. From this fact, FF model is also referred to as a "Serialized Tree Model".

Index To Be Applied To A Table.

All basic queries can be referred to index to find nodes in FF-Model. Grant the index in tables as:

CREATE TABLE `ff_tables` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `ff_depth` int(11) NOT NULL,
  `ff_queue` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ff_queue_index` (`ff_queue`),
  KEY `ff_depth_index` (`ff_depth`,`ff_queue`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

ff_queue_index is used in the search in general. It is the most important index of the FF model.

ff_depth_index is a composite index. When to find descendant nodes, subquery finds the end of node in subtree. At that time, the index is used for the search speed.

To Reproduce The Tree Structure Data From Stored Data

A model has two roles for storing hierarchical data in RDB. (First chapter first section)

To store the hierarchical data in RDB table.

To write the query to find the relevant nodes (such as "sub-tree", "parent node", "children nodes", "ancestor nodes" and "descendant nodes") efficiently for any nodes.

I verify the steps to reproduce a tree structure from data stored in RDB table of the FF model.

(1) To map all nodes saved in RDB table onto the grid plain by QUEUE and DEPTH columns.

DEPTH|
     |
   0 | [A]
     |
   1 |     [B]         [C]
     |
   2 |         [D] [E]     [F] [G]
     |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

(2) When take to the left on "upper one depth" from each node, "first found node" is parent node.

DEPTH|
     |
   0 | [A]--+-----------+
     |      |           |
   1 |     [B]--+---+  [C]--+---+
     |          |   |       |   |
   2 |         [D] [E]     [F] [G]--+---+
     |                              |   |
   3 |                             [H] [I]
-----+-------------------------------------------
     |  0   1   2   3   4   5   6   7   8   QUEUE

(3) By modifying the above figure, you get the figure below.

      [A]
       |
   +---+---+
   |       |
  [B]     [C]
   |       |
 +-+-+   +-+-+
 |   |   |   |
[D] [E] [F] [G]
             |
           +-+-+
           |   |
          [H] [I]

We have just seen that FF model clears the condition of second role of model to store hierarchical data in RDB (To store the hierarchical data in RDB table).

In the next section "To Find Nodes", I will verify the second item of model condition (To write the query to find the relevant nodes efficiently for any nodes).

2015-12-26

Fertile Forest Model (2/n) Hierarchical Data in a RDB [2]

Fertile Forest SQL

Additional Columns

Conventional database engineers had studied four models for solving the problem to store hierarchical data in a database. They are:

Adjacency List Model

Path Enumeration Model

Nested Set Model
Nested Intervals Model

Closure Table Model

Common thing about each model, is to add some columns into RDB table for storing hierarchical data. Columns to be added are classified in two types by its roles. These are as:

Hierarchical Structure Columns

Search Efficiency Columns

(1) Hierarchical Structure Columns

Necessary to store the hierarchical data in RDB table. Also used as a column to specify a search conditions in a finding query.

(2) Search Efficiency Columns

To be added for increasing the search efficiency. These are added when can not write a efficient finding query by hierarchical structure columns. If the purpose is only to save hierarchical data in a database, it is not necessary. Depending on the model, there is a case that does not use "search efficiency column".

Criteria Of The Model

Superiority of the model to handle hierarchical data can be determined by the two criteria as:

Has the execution speed of the search query in a range that can withstand practical use.

Number of total column to save tree structure data is more fewer.

(1) Has The Execution Speed Of The Search Query In A Range That Can Withstand Practical Use.

When there is a website that takes two minutes until to see the next page, nobody thinks that the website has enough speed to withstand practical use. In the current Japanese Internet environment, even if five seconds will be considered as slow. Because of a database has applications to be used in the dynamic display of web page, we can think the criteria whether human feel slowness by turning over a web page.

When the search query can refer to the index, it keeps the execution speed that can withstand practical use, even if contains several hundreds million records in the table. When the query can not be referenced index, even if the records are only the several hundreds of million records, can not get enough execution speed depending on the conditions. The first criterion is able to be said to be the same as "whether can write the search query to be refered the index".

When find relative nodes in table that contains hierarchical data, the most often used purpose is for sub-tree and child nodes. Therefore, "write a practical search query in a tree structure data model" is said as:

Can write a search query to find a subtree and child nodes of an arbitrary node with refering the index.

When the search query for finding sub-tree and child nodes can refer index, even if a table has a billion number of nodes and a hundred hierarchy of depth, it can keep search speed of withstand practical use.

(2) Number Of Total Column To Save Tree Structure Data Is More Fewer.

Number of total column is the words that represented in consideration of the number of records to store hierarchical data in RDB. We consider about some models. The model has n-number of hierarchical structure column, and a rule that each node must be saved by m-number of records. In a case of that model, the number of total column is (n * m). Since the number of records to save is m = 1 in most of the model. Therefore, we can consider that total number of columns equals hierarchical structure the number of columns.

A model has the total number of records for saving nodes as small as possible, we can say that it is an excelent and economical model design. Because a few amount of data reduces capacity of DB server, and impact of finding query on the search speed.

The second criterion is no longer so importance now. Small capacity of the hard disk was an important issue in past, but now, the evolution of the hardware makes to reduce the problems related to capacity.

However, nobody says to disagree about the terms "it is an excellent model that contains columns for storing hierarchical data in RDB as few as possible". I think so.

The Points

Contents of this chapter contains an important point in understanding the FF model. Please check the following three points, and then proceed to the next chapters.

A model to store hierarchical data in a database contains two kinds of column. One is "column for storing hierarchical data", and Two is "column for efficiency search".

A criteria of excellent model for storing hierarchical data in a database, is that it can write a search query to find a subtree and child nodes of an arbitrary node with reference the index.

An second criteria is that total number of hierarchical structure column and search efficiency columns is fewer.

2015-12-26

Fertile Forest Model (1/n) Hierarchical Data in a RDB [1]

Fertile Forest SQL

Roles Of Model

The role of the model to store hierarchical data in RDB are summarized in two paragraphs as:

To store the hierarchical data in RDB table.

To write the query to find the relevant nodes efficiently for any nodes.

(1) To Store The Hierarchical Data In RDB Table.

We can define the criteria whether the model has function to store hierarchical data, as:

Can reproduce the original hierarchical structure by the saved all nodes.

If can not reproduce hierarchical data by all records in RDB, we can not say that the model has the function. It is too basic thing that needless to say. Therefore, I think that nobody can disagree it.

(2) To Write The Query To Find The Relevant Nodes Efficiently.

In general, we save data into a database for searching them. We think naturally as: If to save hierarchical data in a database, various relationship node of any nodes (such as "subtree", "parent node", "child nodes", "ancestor nodes", "descendant nodes" and etc.) will be searchable. It is the important role of a model to find related nodes of any nodes efficiently.

In the figure below, [A], [B], [C], ... represents a node.

      [A]
       |
   +---+---+
   |       |
  [B]     [C]
   |       |
 +-+-+   +-+-+
 |   |   |   |
[D] [E] [F] [G]
             |
           +-+-+
           |   |
          [H] [I]

When a human search the sub-tree of [C] visually, pick up all the nodes by tracing the branches extending downward from [C]. In this example, set of partial tree of the [C] is as:。

  [C]
   |
 +-+-+
 |   |
[F] [G]
     |
   +-+-+
   |   |
  [H] [I]

To trace a branches is regularity routine. Therefore, a lot of people think as "If to store hierarchical data in a database, we can find related nodes of any nodes easily".

In the hierarchical data, use frequently SQL to find a related nodes sets. Describe with the example in previous figure as:

Basic Node	Relation	Set of Nodes
[B]	Parent	[A]
[B]	Children	[D] [E]
[H]	Ancestors	[A] [C] [G]
[C]	Descendants	[F] [G] [H] [I]
[C]	Subtree	[C] [F] [G] [H] [I]

The word of "Efficient" is most important phrase in this items. If efficiency is unnecessary, to devise search query is also unnecessary. Firstly get all nodes, and them examining whether or not to match with conditions by programming language.

However, it is not realistic to get all nodes everytime in a table that has record of millions. When use the way to get all nodes against huge table, need execution time and a large amount of memory. If the table has serveral hundreds of million records and match nodes are a few, it is natural to design the system to be acquired at a high speed by specifying conditions.

Focus Of Problem

If the reason for unwieldy a hierarchical data in RDB is only "To store the hierarchical data in RDB table", does not exist problems that plague the database engineers around the world. Because. "Adjacency List Model" was demonstrated that it can not be described efficient query to find subtree. However, Adjacency List Model can store hierarchical data in RDB.

There are several reasons why it is difficult to write an efficient find query.

For being searchable, it is necessary to store all the ancestor nodes of each node.
The number of nodes relationship to store, is increased according exponentially to the depth of the tree.
The depth of hierarchy is not fixed with depending on contents of data.
It is difficult to impart index to hierarchical data in a database. Because tree structure having the two-dimensional spread, but index is one-dimensional structure.
etc.

Consider to find a set of descendant nodes from specified node. It is a simple story If can determine for each node by tracing branch lines to ancestor direction. However, SQL is a "declarative programming". Therefore, can not implement for determining relationship for each node, like imperative programming in a separate process.

In declarative programming, how can we express the relationship of the subtree and ancestor nodes in the search queries?

This is the essence of a challenge that all database engineers around the world given. The problem of hierarchical data in RDB attaches at one point that "describes the search queries to efficiently retrieve related node from any node".