Zemucan: A Syntax Assistant for DB2 (Source Code)

--- Internal description --- Andrés Gómez Casanova --- 2010-03-20 ---

Internal description

This section describes the core of the application. The application first loads the grammar and creates a graph that represents it. It can then analyze the commands written by the user by performing a lexical analysis followed by a syntactic analysis.

Graph creation

This process has to be completed before the user can be assisted. The GrammarReaderController invokes the corresponding class, a GrammarReader implementation, which creates the graph. The grammar can be an XML file in which the whole grammar is described. It can also be a plain Java class that has the grammar defined in it, or eventually a class that reads files describing the grammar in BNF format.
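As an illustration of how a controller could pick a reader implementation at run time, here is a minimal sketch. It is not the actual Zemucan code: the `GrammarReader` interface shown here is a simplified, hypothetical stand-in (it returns a label instead of a graph node), and `InMemoryGrammarReader` and `ReaderLoaderSketch` are names invented for the example.

```java
import java.lang.reflect.Constructor;

// Hypothetical, simplified stand-in for the real GrammarReader contract.
interface GrammarReader {
    String readStartingNode();
}

// Hypothetical implementation with the grammar "defined in the class itself".
final class InMemoryGrammarReader implements GrammarReader {
    public InMemoryGrammarReader() { }

    @Override
    public String readStartingNode() {
        return "StartingNode";
    }
}

final class ReaderLoaderSketch {
    // Retrieves the no-argument constructor by reflection and instantiates the reader.
    static GrammarReader load(String className) {
        try {
            Class<? extends GrammarReader> readerClass =
                    Class.forName(className).asSubclass(GrammarReader.class);
            Constructor<? extends GrammarReader> constructor =
                    readerClass.getDeclaredConstructor();
            return constructor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create grammar reader " + className, e);
        }
    }
}
```

With this approach the class name can come from a configuration value, so an XML, Java, or BNF reader can be swapped without changing the controller.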

The graph creation can be a little complex to understand. The process has many internal steps that ease the grammar definition, but understanding all of them is somewhat difficult. For example, each command can be described in a separate file, and a graph is created for each file; these graphs then have to be merged into a single graph and simplified. Simplification is needed because different nodes could represent the same token, so the graph has to be modified in order to have just one node for it.

In summary, the graph creation has these steps:

The GrammarReaderController retrieves the grammar reader constructor by reflection.
For each grammar file:
- Read the file and create an associated graph.
Take all the graphs and merge them into a single graph. This step could generate several nodes for the same word (token).
- Simplify the graph so that it does not have several nodes for the same word. For more explanation, see the video merging.avi in the doc/video/merging directory of the repository, or on YouTube at http://www.youtube.com/watch?v=uhMZ5kMlhKs.
- Add extra nodes if necessary (Help node and License node).
- Validate the graph: each node must have a path that comes from StartingNode and a path that goes to EndingNode. If not, an exception is raised indicating that the grammar is invalid. For a detailed explanation, see the video checking.avi in the doc/video/checking directory of the repository, or on YouTube at http://www.youtube.com/watch?v=mA2AQilJ8m0.
Once the graph is valid, return the StartingNode to the syntactic part.
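The validation rule above (every node reachable from StartingNode and able to reach EndingNode) can be sketched with two graph searches, one forward and one over reversed edges. This is only an illustration under assumptions: nodes are identified by plain strings, the graph is an adjacency map, and `GraphValidationSketch` is a hypothetical name, not a class of the project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class GraphValidationSketch {
    // Returns the nodes that break the rule; an empty set means the graph is valid.
    static Set<String> invalidNodes(Map<String, List<String>> children, String start, String end) {
        Map<String, List<String>> parents = new HashMap<>();
        Set<String> all = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : children.entrySet()) {
            all.add(entry.getKey());
            for (String child : entry.getValue()) {
                all.add(child);
                // Reverse edge, used to find the nodes that can reach the end.
                parents.computeIfAbsent(child, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Set<String> fromStart = reachable(children, start);
        Set<String> toEnd = reachable(parents, end);
        Set<String> invalid = new TreeSet<>();
        for (String node : all) {
            if (!fromStart.contains(node) || !toEnd.contains(node)) {
                invalid.add(node);
            }
        }
        return invalid;
    }

    // Mirrors the "raise an exception" behavior described in the text.
    static void requireValid(Map<String, List<String>> children, String start, String end) {
        Set<String> invalid = invalidNodes(children, start, end);
        if (!invalid.isEmpty()) {
            throw new IllegalStateException("Invalid grammar, unconnected nodes: " + invalid);
        }
    }

    // Iterative search that collects every node reachable from the origin.
    private static Set<String> reachable(Map<String, List<String>> edges, String origin) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        seen.add(origin);
        pending.push(origin);
        while (!pending.isEmpty()) {
            String node = pending.pop();
            for (String next : edges.getOrDefault(node, Collections.<String>emptyList())) {
                if (seen.add(next)) {
                    pending.push(next);
                }
            }
        }
        return seen;
    }
}
```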

The only part of the graph that matters for the analysis is the StartingNode, because any node can be reached from it. The analysis part does not need to know the whole graph.

Console

The first thing the console does when it is executed is to create a second thread that will read the grammar. This second thread is created and suspended. Then the first thread loads the console components, such as jLine, and shows the prompt to the user. Once the prompt is displayed and ready to receive commands from the user, the second thread (which was waiting for the first thread to finish the loading phase) is awakened and the grammar is loaded in the background.
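The start-up sequence can be sketched as follows. The synchronization mechanism actually used by the console is not stated here, so this example assumes a `CountDownLatch`; `ConsoleStartupSketch` and the recorded event strings are hypothetical placeholders for the real loading work.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

final class ConsoleStartupSketch {
    // Returns the order in which the start-up events happened.
    static List<String> start() {
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch promptReady = new CountDownLatch(1);

        // Second thread: created first, then parked until the prompt is ready.
        Thread grammarLoader = new Thread(() -> {
            try {
                promptReady.await();
                events.add("grammar loaded"); // placeholder for reading the grammar
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "grammar-loader");
        grammarLoader.start();

        // First thread: load the console components and show the prompt.
        events.add("console components loaded"); // placeholder for the jLine setup
        events.add("prompt shown");
        promptReady.countDown(); // wake up the grammar loader

        try {
            // Joined only to make the sketch deterministic; a real console
            // would keep reading user input while the grammar loads.
            grammarLoader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

The point of the ordering is that the user sees the prompt immediately, while the comparatively slow grammar loading happens in the background.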

XML grammar

When using the XML grammar, the graph will be created following this process:

Read all the nodes from the grammar file and create a node for each one. Each node is stored in a hash.
Mark the first phase as done.
Retrieve the associations from the configuration file, looking up both members of each association in the hash, and create a two-way parent-child relation between them.
Mark the second phase as done, and put the extra tokens (License, Help) in the graph.
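The two phases above can be sketched like this. The XML parsing itself is left out because the file format is not described here; the sketch assumes the node names and the associations have already been read into lists. `TwoPhaseGraphSketch` and its members are hypothetical names, and the placement of the extra tokens (License, Help) is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPhaseGraphSketch {
    static final class Node {
        final String name;
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    private final Map<String, Node> nodesByName = new HashMap<>();
    private boolean firstPhaseDone;
    private boolean secondPhaseDone;

    // Phase 1: create one node per name and register it in the hash.
    void createNodes(List<String> names) {
        for (String name : names) {
            nodesByName.put(name, new Node(name));
        }
        firstPhaseDone = true;
    }

    // Phase 2: each association is a {parent, child} pair of node names.
    void linkNodes(List<String[]> associations) {
        if (!firstPhaseDone) {
            throw new IllegalStateException("Nodes must be created before linking them");
        }
        for (String[] pair : associations) {
            Node parent = node(pair[0]);
            Node child = node(pair[1]);
            parent.children.add(child); // parent -> child
            child.parents.add(parent);  // child -> parent (double relation)
        }
        secondPhaseDone = true;
    }

    Node node(String name) {
        Node found = nodesByName.get(name);
        if (found == null) {
            throw new IllegalArgumentException("Unknown node: " + name);
        }
        return found;
    }

    boolean isComplete() {
        return firstPhaseDone && secondPhaseDone;
    }
}
```

Creating all nodes before any link is resolved means an association can refer to a node that appears later in the file.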

Assisting user

Assisting the user is the most important function of the application. A user writes part of a phrase that represents a command, and then calls the application in order to receive assistance.

The application takes the user's phrase and divides it into tokens. This process is called lexical analysis. Each token can be:

A word.
A parenthesis.
A comma.
A quote.

Words are separated by spaces, commas, quotes, or other symbols.

All these elements are defined in the delimiters section of the grammar file (if the grammar file is an XML file).

Then, the list of tokens is passed to the syntactic analysis, and each token is compared to a node in order to find a possible path in the graph.

Finally, the set of possible options at the end of the path is returned to the user.

This graph is just a simplified example of the real graph created for assistance.

In this graph, we can see that each node represents a token in the grammar. The connections between nodes represent the order of the tokens when writing a command.

Lexical analysis

This part takes the phrase written by the user and converts it into tokens. Tokens are separated according to the set of delimiters defined in the grammar file. Spaces are delimiters, but they cannot be tokens. All other delimiters can be both delimiters and tokens. Examples are the quote ( " ), the comma ( , ), and the parentheses ( ( ) ).

Once the phrase is converted, the resulting list of tokens is sent to the syntactic analysis.

db2 create table t1(c1 int, c2 char(10))

CONVERSION PROCESS

| db2 | create | table | t1 | ( | c1 | int | , | c2 | char | ( | 10 | ) | ) |
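The conversion above can be reproduced with a small tokenizer. This is a sketch, not the project's lexer: the delimiter set is hard-coded here as an assumption (in the application it comes from the grammar file), and `LexerSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class LexerSketch {
    // Assumed delimiters that are also tokens: quote, comma, parentheses.
    private static final String TOKEN_DELIMITERS = "\",()";

    static List<String> tokenize(String phrase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < phrase.length(); i++) {
            char c = phrase.charAt(i);
            boolean isSpace = Character.isWhitespace(c);
            boolean isTokenDelimiter = TOKEN_DELIMITERS.indexOf(c) >= 0;
            if (isSpace || isTokenDelimiter) {
                // Any delimiter closes the word being accumulated.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                // Spaces only separate; the other delimiters are tokens too.
                if (isTokenDelimiter) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }
}
```

Calling `LexerSketch.tokenize("db2 create table t1(c1 int, c2 char(10))")` yields the same 14 tokens listed in the table above. The sketch discards trailing spaces, so it does not distinguish 'db2 create table' from 'db2 create table '.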

Syntactic analysis

The syntactic analysis receives the list of tokens as its parameter and internally has access to a graph that represents the grammar defined in the grammar files. This graph was built when the application started, as described before.

The process consists of taking the first element of the list of tokens and checking whether one of the paths from the first node corresponds to that token. The first node is called the Starting node, and one of its options is the db2 command.

If a matching option is found, the position in the graph moves to that node. The process is then repeated for each remaining member of the list of tokens.

The process stops when the list has no more elements, in which case the possible options of the current node are returned to the user.

The process also stops if there is no match between the list and the graph, which means that the user is typing an invalid command (or the grammar does not have all the possible options), or if the navigation in the graph has arrived at the Ending node.

If the user typed an invalid command, no options are returned. If the process has arrived at the Ending node, there are no options either, and the command can be executed. If the list of tokens ends in the middle of the graph, the possible paths from that point are returned to the user as options.

1) db2

StartingNode
  |--db2 <--<
  |--db2ilist


2) create

db2
  |--alter
  |--create <--<
  |--drop


3) table

create
  |--index
  |--table <--<
  |--tablespace


4) t1

table
  |--<tableName> <--<


5) (

<tableName>
  |--(

...
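The walk illustrated above can be sketched as a loop over the tokens. Several simplifications are assumed here: nodes are identified by their label in an adjacency map (the real graph uses node objects), a label between angle brackets such as `<tableName>` is treated as a variable that matches any word, and the Ending node is not modeled. `SyntacticSketch` is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class SyntacticSketch {
    // Follows the tokens from StartingNode and returns the options of the last node.
    static List<String> options(Map<String, List<String>> graph, List<String> tokens) {
        String current = "StartingNode";
        for (String token : tokens) {
            String next = null;
            for (String candidate : graph.getOrDefault(current, Collections.<String>emptyList())) {
                if (matches(candidate, token)) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                // No path for this token: invalid command, nothing to offer.
                return Collections.emptyList();
            }
            current = next;
        }
        return graph.getOrDefault(current, Collections.<String>emptyList());
    }

    // Assumption: <name> nodes are variables that accept any word.
    private static boolean matches(String node, String token) {
        boolean isVariable = node.startsWith("<") && node.endsWith(">");
        return isVariable || node.equals(token);
    }
}
```

With the graph fragment shown in the steps above, the tokens `db2`, `create` return `index`, `table`, `tablespace`, and the tokens `db2`, `create`, `table`, `t1` return `(`.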

Assistance cases

As part of the assistance, the application evaluates 9 different cases. Before describing the cases, it is useful to explain two concepts: phrases and options.

Phrases: the last word typed so far is a prefix of one or more possible tokens. For example, in the command 'db2 create ta' the word/token 'ta' is a prefix of 'table' and 'tablespace'.
Options: after the current token of the typed command, there can be several possible next tokens. For example, after the command 'db2 alter' there are options like 'table', 'index', 'bufferpool'.

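With these two concepts, the cases can be explained. In each example, the first line under the command lists the phrases and the second line lists the options.

Case 1: There are multiple phrases and multiple options, so all of them are presented to the user.
```
'db2'
|- db2ilist, db2auto, db2fm, db2fmcu
|- create, catalog, attach
```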
Case 2: There are multiple phrases but just one option, so all of them are presented.
```
TODO I don't have an example
```
Case 3: There are multiple phrases but no options. Then, calculate the common prefix of all the phrases and replace the last token of the command with it.
```
'db2 c'
|- create, catalog
|- [EMPTY]
```
Case 4: There is one phrase and multiple options. Then, all of them are presented to the user.
```
'db2fm'
|- db2fmcu
|- -i, -t
```
Case 5: There is one phrase and one option. Then, all of them are presented to the user.
```
'db2 create table'
|- tablespace
|- <tableName>
```
Case 6: There is just one phrase and no options. Then, replace the last token of the command with the phrase.
```
'db2 create tables'
|- tablespace
|- [EMPTY]
```
Case 7: There are multiple options and no phrases. Then, add a final space to the command and, if the options have a common prefix, append it to the command.
```
'db2 create'
|- [EMPTY]
|- table, tablespace
```
Case 8: There is one option and no phrases. Then, add a final space to the command and append the option if it is a reserved word.
```
'db2 create table '
|- [EMPTY]
|- <tableName>
```
Case 9: There are neither options nor phrases. Then, nothing is shown.
```
'db2 create table '
|- [EMPTY]
|- [EMPTY]
```
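The nine cases form a 3 by 3 grid: the number of phrases (multiple, one, none) combined with the number of options (multiple, one, none). The sketch below derives the case number from the two counts and also shows the common-prefix calculation mentioned in cases 3 and 7. It illustrates the classification only; `AssistanceCaseSketch` and its methods are hypothetical names, not the application's API.

```java
import java.util.List;

final class AssistanceCaseSketch {
    // Maps the number of phrases and options to the case number (1 to 9).
    static int caseNumber(int phraseCount, int optionCount) {
        return 3 * bucket(phraseCount) + bucket(optionCount) + 1;
    }

    // 0 = multiple, 1 = exactly one, 2 = none
    private static int bucket(int count) {
        if (count < 0) {
            throw new IllegalArgumentException("Negative count: " + count);
        }
        if (count > 1) {
            return 0;
        }
        return count == 1 ? 1 : 2;
    }

    // Longest prefix shared by all the words; empty if there is none.
    static String commonPrefix(List<String> words) {
        if (words.isEmpty()) {
            return "";
        }
        String prefix = words.get(0);
        for (String word : words) {
            int i = 0;
            while (i < prefix.length() && i < word.length()
                    && prefix.charAt(i) == word.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }
}
```

For the case 7 example, the options `table` and `tablespace` share the prefix `table`, so 'db2 create' would be completed to 'db2 create table'.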
