                         ######  ######            ##

                         # ## #   ##  ##           ##

                           ##     ##  ##  ####   ######

                           ##     #####  ##  ##    ##

                           ##     ##  ## ##  ##    ##

                           ##     ##  ## ##  ##    ## ##

                          ####   ######   ####      ###

 

                         Technical Reference

                                 Version 1.00

                     By P.E.Colla (LU7DID) (c) 2000,2001

 

 

DISCLAIMER

 

AIML is (c) the AI Foundation 2000. This document uses the AIML 1.0 standard; for a full description of the tagset, refer to the original AIML specification.

 

The experimental tags implemented by TBot are not supported by the AI Foundation, and there is no guarantee that they will ever be supported; use them at your own risk.

 

 

 

 

 

 

DISCLAIMER

Abstract

License Information

Demo Bots

TBOT – Console Bot

BOTSERV – HTTP/Telnet Server

VERAIML – AIML Files Verifier

Architecture

Object Architecture

TAIMLBase

TTopicList

TTopic

TPattern

TTemplateList

TTemplate

TEventList

TEvent

Inference Engine

MegaHAL

Learning

Evaluation

Service Objects

Internal Objects

Third Party Objects

External Packages

GraphMaster Architecture

Functional Architecture

Pattern Matching Cycle

Functional Description

AIML Usage

AIML Event Model

Events

Timers

Learning

Assisted Learning

Remote Loading

Bot Query

Cache Management

A.L.I.C.E. Compatibility

Abstract

 

This document summarizes technical information on the architecture and internal organization of the collection of objects used by TBot.

 

TBot should be regarded as a development tool rather than a fully functional bot package: it implements the objects that make up a fully functional, AIML 1.0 compliant bot, but it needs a “host” program to instantiate them.

 

Two sample programs are provided with the distribution:

 

TBot, a console-oriented, single-user implementation.

BotServ, an HTTP/Telnet server oriented to multiple users.

 

TBot was implemented for use in research related to AIML and Artificial Intelligence; as such, it is not designed to meet the performance and stability requirements of a “production” bot, and its use in such scenarios is strongly discouraged.

 

 

License Information

 

TBot is free for radio amateur and experimental use; commercial use requires written consent from me.

 

I could be really sophisticated in legal terms, but in a nutshell: use it at your own risk and responsibility. The author bears no liability of any kind arising from the use of this program.

 

Public information was used to understand the requirements and the tools used to develop the program; however, all the components featured in it come from my own development, with the exception of:

 

·         RegExpr Regular Expression Parser (c) 1999 Andrey V. Sorokin

·         EZ Delphi Structures Library (c) 1999 Julian M. Bucknall

·         LibXMLParser (c) 2000 Stefan Heymann

·         ICS TCP/IP Suite for Delphi5 (c) F.Piette 1999,2000

 

 

Even where some of their sources are included, the copyright still belongs to the authors of those excellent packages; anybody interested in compiling the programs from source should first get the most recent version of those packages.

 

The AIML files included in the distribution were taken from the standard distribution created by the ALICE AI Foundation, which retains the copyright over them; by the time this program is downloaded they could be quite outdated, so getting a fresher copy is recommended.

 

If you need support, have suggestions, have encountered a bug or just want to provide feedback, feel free to drop a note to:

 

AX.25:   LU7DID@LU7DID.#ADR.BA.ARG.SOAM

Inet:    [email protected]

 

 

 

 

 

Demo Bots

 

 

The following demo bots are included in the package:

 

TBOT – Console Bot

 

TBot is a simple, console-oriented implementation of the TBot object suite; its nature makes it ideal for quick evaluations.

 

From the programmer's standpoint it also shows:

 

 

This program operates with a single session.

 

BOTSERV – HTTP/Telnet Server

 

BotServ is an implementation that provides HTTP (Web) and Telnet interfaces to the TBot object suite, allowing it to serve many users simultaneously.

 

From the programmer's standpoint it also shows:

 

 

BotServ is not really optimized for high performance under load; it is actually a single-threaded program, so the maximum number of concurrent users is limited.

VERAIML – AIML Files Verifier

 

 

VerAIML is a very specific implementation of the TBot suite that shows how to configure it as a tool, in this case an AIML verification tool.

 

In essence, VerAIML is a bot that scans itself and highlights different problems with the AIML it is executing. It was originally written to aid in the clean-up of the AIML standard set by detecting loops and idle categories.

 


Architecture

 

 

TBot is implemented through a number of objects that operate in a hierarchy; the two at the top of that hierarchy, TAIMLBase and TAIMLBot, are the ones a programmer would normally see.

 

 

Object Architecture

 

Detailed, programming-oriented information on the main objects of the suite is provided separately; the information in this document should be treated as an overview only.

TAIMLBase

 

Main object that wraps the knowledge structures (Graphmaster and Inference Engine) and dispatches related services.

 

TTopicList

 

Root object for all topics stored in the Graphmaster.

TTopic

 

Root object for all patterns that belong to a given topic.

TPattern

 

Object to encapsulate the Graphmaster tree.

TTemplateList

 

Root object for all templates.

TTemplate

 

Object to encapsulate a given template (both structure and logic).

TEventList

 

Root object for all events.

TEvent

 

Object to encapsulate a given event.

 

Inference Engine

 

The inference engine is a secondary knowledge object, used when the Graphmaster cannot map the input to any valid template (response).

 

It is closely modelled on Jason Hutchens's MegaHAL program in terms of the algorithms used (although the code itself was developed from scratch).

 

The inference engine is composed of three objects:

 

 

The overall architecture of the inference engine is as follows:

[Figure: overall architecture of the inference engine]

MegaHAL

 

Optionally, the engine of the MegaHAL program (3rd order Markov chain) can also be plugged in; the MegaHAL.c file from the MegaHAL distribution was wrapped in a DLL and is included in the distribution.

 

Even though MegaHAL works differently from MkDict internally (MkDict uses a 4th order Markov chain), a common interface layer has been built, so that, seen from the bot objects, the methods, properties and events implemented by both are similar.

 

Learning

 

The learning process involves training the inference engine with a body of sentences consisting of:

 

 

The initial knowledge of the inference engine comes from parsing the training body, detecting all the keywords in it and building a 4th order Markov chain in which the relative frequencies of all the preceding and following keywords are computed. From that information the inference engine can build likely sentences (not necessarily the same ones it learned).
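As a rough illustration of the idea (not the MkDict code, which is not reproduced in this document), the frequency tables can be sketched in Python. The function names, the upper-casing of words and the treatment of every word as a keyword are assumptions made for the example only.

```python
from collections import Counter, defaultdict

ORDER = 4  # context length; the text describes a 4th order chain


def train(sentences, order=ORDER):
    """Count which word follows and which word precedes each context.

    Hypothetical helper: a context is a tuple of up to `order` words.
    """
    forward = defaultdict(Counter)   # preceding words -> Counter of next word
    backward = defaultdict(Counter)  # following words -> Counter of previous word
    for sentence in sentences:
        words = sentence.upper().split()  # assumption: every word is a keyword
        for i, word in enumerate(words):
            before = tuple(words[max(0, i - order):i])
            after = tuple(words[i + 1:i + 1 + order])
            forward[before][word] += 1
            backward[after][word] += 1
    return forward, backward


def relative_frequencies(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {word: count / total for word, count in counter.items()}
```

For instance, after training on "the cat sat" and "the cat ran", the context ("THE", "CAT") is followed by "SAT" and "RAN" with a relative frequency of 0.5 each, which is the kind of information a generator needs to build a likely, but not necessarily learned, sentence.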

 

The structure is shown in the following picture:

[Figure: structure of the Markov chain built during learning]

The initial learning is performed from the configured *.trn file; additional learning can be performed through appropriate use of the ad-hoc AIML tags.

 

Evaluation

 

When the inference engine evaluates an input it performs the following logical steps:

 

 

Once all relevant random sentences have been formed, a selection is made based on the probability of the sentence (maximum probability in normal mode, minimum probability in surprise mode) and the number of keywords it contains.

 

If more than one sentence fits the criteria, a random selection is finally made.
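The selection step can be sketched as follows. The text does not say how probability and keyword count are combined, so this example assumes that probability is compared first and the keyword count breaks ties; the function name and the (sentence, probability) candidate format are also assumptions.

```python
import random


def select_reply(candidates, keywords, surprise=False, rng=random):
    """Pick one reply from a list of (sentence, probability) candidates."""

    def keyword_count(sentence):
        return sum(1 for word in sentence.split() if word in keywords)

    def score(candidate):
        sentence, probability = candidate
        # Normal mode prefers the most probable sentence, surprise mode the
        # least probable one; keyword count is an assumed tie-breaker.
        ranked = -probability if surprise else probability
        return (ranked, keyword_count(sentence))

    best = max(score(candidate) for candidate in candidates)
    finalists = [c[0] for c in candidates if score(c) == best]
    return rng.choice(finalists)  # random pick among equally good sentences
```

With the candidates ("A CAT", 0.5), ("A DOG", 0.2) and ("CAT CAT", 0.5) and the keyword set {"CAT"}, normal mode returns "CAT CAT" and surprise mode returns "A DOG".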

 

The inference engine is inferior to the Graphmaster approach at keeping a conversation coherent and at reacting to vague inputs; its main advantage is that it can produce a (most of the time) coherent answer to inputs for which the Graphmaster could not resolve a concrete matching pattern.

 

It is also a good tool for handling encyclopedic data and for helping in the learning process of the Graphmaster.

 

 

Service Objects

 

 

Internal Objects

 

Several additional objects are used:

 

 

Third Party Objects

 

The following third party objects were used:

 

 

External Packages

 

The following external packages are used:

 

 

 

 

 


GraphMaster Architecture

The Graphmaster, as implemented in the TBot objects, is built from several cooperating objects.

 

The TTopicList object implements the top level of the topic collection. There is a single TTopicList per Graphmaster; it encapsulates the logic to manage topics (add, remove, get, list, etc.) and operates as a “topic factory” at load time.

 

A TTopic object is created for each topic; there must be at least one per Graphmaster, and no maximum number is set. The TTopic object encapsulates the logic to insert the first level of patterns (TPattern objects) into the Graphmaster.

 

The TTopic object provides two ways to reach its first-level TPattern objects: sequential access to all dependent objects (implemented through a singly linked list) and a hash table over the keywords of the dependent TPattern objects. The TTopic object also operates as the pattern factory for the first level of pattern atoms during load.

 

The hash table is used to resolve words and empty tags, while the linked list is used to resolve XML tags.
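A minimal Python sketch of this double entry is given below. It is not the Delphi code: the class and method names are hypothetical, a Python list stands in for the singly linked list, and only plain words are placed in the hash table to keep the example short.

```python
class Pattern:
    """One pattern atom; hypothetical stand-in for a TPattern object."""

    def __init__(self, keyword, is_tag=False):
        self.keyword = keyword
        self.is_tag = is_tag


class Topic:
    """Hypothetical stand-in for a TTopic object."""

    def __init__(self, name):
        self.name = name
        self.sequence = []  # sequential access (stands in for the linked list)
        self.index = {}     # hash table: keyword -> Pattern

    def add_first_level(self, keyword, is_tag=False):
        """Pattern factory for first-level atoms."""
        if not is_tag and keyword in self.index:
            return self.index[keyword]  # a word keyword is unique per topic
        pattern = Pattern(keyword, is_tag)
        self.sequence.append(pattern)
        if not is_tag:
            self.index[keyword] = pattern
        return pattern

    def find_word(self, word):
        """Direct lookup through the hash table."""
        return self.index.get(word)

    def tags(self):
        """Tag atoms can only be found by scanning the sequence."""
        return [p for p in self.sequence if p.is_tag]
```

Adding the word "HELLO" twice returns the same object, while two tag atoms with the same keyword stay distinct and are reachable only through the sequential scan.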

 

The most complex object of the Graphmaster is TPattern, which holds a pattern “atom” (a word, an empty AIML tag or an AIML tag).

 

Each TPattern object can refer back to the TTopic object it belongs to; its keyword is unique within the topic for words and empty AIML tags, and may be repeated for AIML tags.

 

The TPattern object manages several references to other objects:

 

 

Each TPattern object of level N acts as the object Factory for the TPattern objects of level N+1.

 

Each TPattern object can reference only one uncontexted template, but an unlimited number of contexted templates; a contexted template depends on a THAT verification.

 

Each context (pattern keyword + THAT) points to a single TTemplate, but a single TTemplate can be referenced by many contexts in the same or in different TPattern objects.
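The relation between a pattern, its contexts and its templates can be sketched as follows. The names are hypothetical, templates are represented by plain strings, and the THAT verification is reduced to an exact comparison with the previous bot reply, which is a simplification made for the example.

```python
class PatternNode:
    """Hypothetical stand-in for a TPattern object and its template references."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.uncontexted = None  # at most one uncontexted template
        self.contexted = {}      # THAT text -> template, unlimited entries

    def attach(self, template, that=None):
        if that is None:
            self.uncontexted = template
        else:
            self.contexted[that] = template

    def select(self, last_reply):
        # Contexted templates are tried first; the uncontexted template
        # is the fallback when no THAT matches.
        if last_reply in self.contexted:
            return self.contexted[last_reply]
        return self.uncontexted
```

The same template object can be attached under several THAT values, which mirrors the statement that one TTemplate may be shared by many contexts.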

 

Each unique template is stored as an object of the class TTemplate, which stores the full AIML tree needed to resolve it in terms of AIML predicates (including externally defined tags and macros).

 

For management purposes (mostly destruction), all template objects are chained in a singly linked list owned by a TTemplateList object, which acts as a “template factory” at load time; no sequential scanning of template objects is performed during evaluation.

 

A separate structure made of TEvent objects represents the discrete events defined in the AIML set; each TEvent object can reference an unlimited number of TTemplate objects, which are executed when the event occurs.

 

A TTemplate object can exist because it is referenced by a TPattern object, by a TEvent object, or by both at the same time. Also, nothing prevents a single TTemplate object from being referenced by more than one TPattern object or more than one TEvent object, in any combination.

 

 

 

Functional Architecture

The complexities of the GraphMaster are kept hidden from the host application by encapsulating all the required objects in two interface objects: TAIMLBase, which contains all the objects implementing the GraphMaster (one per running instance of the bot), and TAIMLBot, which contains the pattern matching engine and all the resources that provide a running environment for each session (one per session held by the bot).

Each actual implementation of a bot must implement a “listener” object, which operates as the front end for the transport technology selected to implement the bot and manages the creation and instantiation of the two top-level objects and the proper use of their methods, events and properties.

 

The TAIMLBase object is created before the AIML set is loaded and contains all the methods to verify the AIML set and load it into the Graphmaster, automatically creating all the cross-references it contains.

 

The TAIMLBot object is created whenever a bot session needs to be run and contains all the methods to capture, normalize and process inputs and to format the responses.

 

Both objects build all the internal cross-references needed to communicate with each other at creation time.

 

Each instance of TAIMLBot creates a “private” environment in which the session has its own resources (stacks, status, variables and execution instance), and it can coexist with many other instances at the same time; no status related to the pattern matching process is stored in the TAIMLBase object, so it is fully re-entrant and reusable.
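The split between shared knowledge and per-session state can be illustrated with a small Python sketch. The classes below are hypothetical stand-ins for TAIMLBase and TAIMLBot, not their real interfaces, and the knowledge base is reduced to a dictionary of exact inputs plus a "*" fallback.

```python
class AIMLBase:
    """Shared, read-only knowledge (stand-in for TAIMLBase)."""

    def __init__(self, categories):
        self._categories = dict(categories)

    def lookup(self, text):
        # No session state is kept here, so any number of sessions can call
        # this method against the same instance.
        return self._categories.get(text, self._categories.get("*", ""))


class AIMLBot:
    """One session (stand-in for TAIMLBot); owns all mutable state."""

    def __init__(self, base):
        self.base = base     # reference to the shared knowledge
        self.variables = {}  # private to this session
        self.history = []    # private to this session

    def respond(self, text):
        reply = self.base.lookup(text.strip().upper())
        self.history.append((text, reply))
        return reply
```

A host program would create one AIMLBase and then one AIMLBot per user; setting a variable or answering an input in one session leaves every other session untouched.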

 

Pattern Matching Cycle

 

When a session is created by instantiating a TAIMLBot object, a reference to the existing TAIMLBase object needs to be passed to it.

 

Based on that reference, all the required event handlers are initialized between the two objects, to be used during the pattern matching process. A local configuration is also performed, based on the content of the supplied initialization file, which contains both parameters and optional bot properties.

 

An external input supplied to the TAIMLBot object causes a minimal “pre-normalization” to be performed internally, followed by the firing of the “OnStart” event; the AIML logic should catch that event and execute all the detailed normalization and input conditioning based on AIML definitions, returning as its result the actual input to be subjected to the pattern matching process.

 

That input is automatically fed to the parsing engine contained in the TAIMLBot object.

 

Parsing is attempted first with the last used topic.

 

First, a sequential scan is performed over all the AIML tags stored under the selected topic; if a hit is detected and a template is associated with it, control is passed to the referenced TTemplate object in order to provide an answer.

 

A TTemplate object can recurse into the pattern matching process as needed through internal events; it can also obtain resources that belong to the session instance (such as variable values) through the same mechanism.

 

If no AIML tag results in a hit, an attempt is made with the “_” token and then with the first word of the input, trying to match a keyword in one of the dependent TPattern objects; if a match is found, the process recurses down the Graphmaster levels with the subsequent words, and if not, an attempt is made with the “*” token.
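The priority order described above ("_" first, then the exact word, then "*") can be sketched in Python. This is a generic Graphmaster walk over nested dictionaries, not the TBot code: the function names and the TEMPLATE marker are hypothetical, each wildcard is assumed to absorb one or more words, and the scan of AIML tags, topics and THAT contexts is left out.

```python
TEMPLATE = "<template>"  # hypothetical key marking a node that holds a template


def add_pattern(root, pattern, template):
    """Insert a pattern such as 'HELLO *' into the nested-dict tree."""
    node = root
    for atom in pattern.split():
        node = node.setdefault(atom, {})
    node[TEMPLATE] = template


def _wildcard(node, key, words):
    """Let the wildcard `key` absorb one or more words, shortest first."""
    child = node.get(key)
    if child is None:
        return None
    for taken in range(1, len(words) + 1):
        found = match(child, words[taken:])
        if found is not None:
            return found
    return None


def match(node, words):
    """Return the template for `words`, or None when nothing matches."""
    if not words:
        return node.get(TEMPLATE)
    found = _wildcard(node, "_", words)            # 1. the "_" token
    if found is None and words[0] in node:         # 2. the exact word
        found = match(node[words[0]], words[1:])
    if found is None:
        found = _wildcard(node, "*", words)        # 3. the "*" token
    return found
```

With the patterns "HELLO THERE", "HELLO *", "_ PLEASE" and "*" loaded, the input "HELLO THERE PLEASE" is answered by the "_ PLEASE" template, "HELLO THERE" by the exact pattern, "HELLO WORLD" by "HELLO *", and anything else by the full wildcard.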

 

The overall algorithm is a reasonably accurate implementation of a conventional GraphMaster.

 

For the TPattern object that contains the keyword of the last word, if a TTemplate object is referenced, its methods are invoked to produce a result (same as before).

 

All contexts are checked first; contexted references have higher priority than the uncontexted reference in terms of template selection.

 

If the word cannot be matched at some level of the recursion, a sequential match is attempted against all the TPattern objects containing empty AIML tags (if any); each one is evaluated and compared with the word being pursued. Any hit leads either to the execution of the associated templates (contexted or uncontexted, if any), if it is the last segment of the input, or to the continuation of the recursion down the Graphmaster for the remaining unmatched words.

 

When the matching is not successful on the initial topic, a sequential search over all stored topics is started; it is assumed that there is at least one full wildcard pattern in the entire Graphmaster, in order to guarantee an answer.

 

As said, the evaluation always starts with the previously used topic, except when that topic is the last one in the list; in that case the matching begins sequentially with the first topic of the list. The last topic of the list is assumed to hold the least specific (most general) patterns, so starting there would cause every input to be matched against them.
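The resulting search order can be expressed as a small Python function. The function name is hypothetical, and skipping the initial topic during the later sequential pass is an assumption; the text does not say whether it is tried a second time.

```python
def topic_search_order(topics, last_used=None):
    """Return the order in which topics would be tried for one input."""
    if not topics:
        return []
    # No previous topic, an unknown topic, or the last (most general) topic:
    # search the list sequentially from the start.
    if last_used is None or last_used not in topics or last_used == topics[-1]:
        return list(topics)
    # Otherwise try the previously used topic first, then all the others.
    return [last_used] + [t for t in topics if t != last_used]
```

For the list CARS, MUSIC, DEFAULT, a session whose last topic was MUSIC tries MUSIC, CARS, DEFAULT, while a session whose last topic was DEFAULT starts again from CARS.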

 

 

Functional Description

 

 

AIML Usage

 

The bot implementation should be able to use standard AIML 1.0 files; extensions using the experimental tagset can also be made.

 

 

 

AIML Event Model

 

Two types of events are supported: immediate events (events) and deferred events (timers).

 

Events

 

Events can be fired at any moment with the <event/> tag; attaching a value to it is optional.

 

Events are caught by the <onevent/> tag, which is equivalent to a <pattern></pattern> structure and is associated with a given template structure.

 

When an event is fired, all categories with a matching <onevent/> tag are executed; the combined result of executing all the templates is returned to the category structure that fired the event.
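The dispatch behaviour can be sketched in Python as follows. The class and method names are hypothetical, templates are modelled as plain callables that receive the optional event value, and joining the individual results with a space is an assumption about how the combined result is built.

```python
from collections import defaultdict


class EventTable:
    """Hypothetical stand-in for the list of events and their templates."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> list of templates

    def on_event(self, name, template):
        """Register a template under an event name (the <onevent/> side)."""
        self._handlers[name].append(template)

    def fire(self, name, value=None):
        """Run every template registered for `name` (the <event/> side)."""
        results = [template(value) for template in self._handlers.get(name, [])]
        return " ".join(results)  # combined result handed back to the caller
```

Firing an event with no registered template simply returns an empty result, so the category that fired it can continue normally.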

 

Some events are fired automatically by the engine as a part of the pattern matching cycle:

 

 

Timers

 

Timers are events whose execution is deferred.

 

Activation is performed with the <timer></timer> tag; the pattern matching is performed against regular patterns, since a timer is essentially a delayed <srai> operation.

 

The event is fired once the configured “interval” has elapsed after the timer is armed.

 

Timers fire once per activation; if repeated firing is required, the timer has to be re-armed once it expires.

 

A timer can be disarmed at any time by setting its interval to 0.
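The arming rules above can be sketched with a simulated clock. The class is hypothetical, and identifying timers by name and passing the current time explicitly are assumptions made to keep the example deterministic; the text does not describe how TBot identifies or schedules timers internally.

```python
class TimerTable:
    """One-shot timers that deliver a delayed input when they expire."""

    def __init__(self):
        self._armed = {}  # timer name -> (due time, input text)

    def arm(self, name, interval, text, now):
        if interval <= 0:
            self._armed.pop(name, None)  # an interval of 0 disarms the timer
        else:
            self._armed[name] = (now + interval, text)

    def poll(self, now):
        """Return the inputs of all expired timers, earliest first."""
        due = sorted(
            (when, name) for name, (when, _) in self._armed.items() if when <= now
        )
        # Popping removes the timer, so it fires only once unless re-armed.
        return [self._armed.pop(name)[1] for _, name in due]
```

Each returned text would then be fed to the pattern matcher like a delayed <srai>; a template that wants a periodic timer has to arm it again when it fires.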

 

 

Learning

 

 

TBot has been used as a research tool to explore a restricted domain of learning; some experimental features have been implemented to support that effort.

 

Assisted Learning

 

This is the editing of the AIML files to incorporate new categories that fit the user input.

Remote Loading

 

Loading of AIML files from a remote bot. Implemented through both HTTP GET requests and XML/SOAP messages.

Bot Query

 

Recursion of patterns into a remote bot is supported through an experimental extension of the <srai> tag.

Cache Management

 

AIML incorporated through remote acquisition, or created dynamically through remote queries, can be managed with experimental extensions of the <save>, <forget> and <learn> tags.

 

 

A.L.I.C.E. Compatibility

 

TBot is compatible with the ALICE implementation (ProgramD) to the extent that it implements standard AIML 1.0 facilities.

 

Limited or no support is provided for the experimental AIML extensions implemented in ALICE.