Knowledge Engineering with Topic Maps

Topic Maps compete with RDF in the Semantic Web arena. Like RDF, TMs are a vehicle to represent highly irregular data in a consistent form. Both RDF and TMs were designed to provide notations for metadata and shallow knowledge. Example facts are "the author of the document is John Doe" or "according to a forecast, at the end of 2004 France Telecom will own at least 40% of the Scandinavian telephone market".

Paradoxically, it is the higher complexity of Topic Maps that made us prefer them over RDF. While RDF statements always take the form of a triple (subject, predicate and object), TM statements (associations) are more heavyweight but also more flexible. The higher expressivity for the human author is paid for with a much higher complexity of the model. Probably more important, though, is that Topic Maps take a more considered approach to addressing subjects: while both TMs and RDF use URIs to address subjects, TM authors can decide whether the thing denoted by the URI is the subject itself or whether the URI only indicates the identity of the subject.

Map Authoring

For machine-to-machine interchange, an XML notation for Topic Maps (XTM[XTM]) was defined. For human authoring purposes, though, XTM has proven too verbose, so several shorthand notations have been developed over the years. We use AsTMa=[AsTMa], which is almost as expressive as XTM while being much more concise. With it we developed our topic maps, most of them containing course-specific material.
The map with the theme 'Apache' contains, for example:

    # example topic here
    apache-software-foundation is-a foundation
    bn: ASF, Apache Software Foundation
    oc: http://www.apache.org/foundation/
    in: provides support for the Apache community
    in (mission): HTTP reference implementation should remain in public
    in: ASF members -- board of directors -- ASF Licence
    oc (member-list): http://www.apache.org/foundation/members.html

    # two associations there
    (is-a-member)
    member: roy-fielding
    group : apache-software-foundation

    (holds-position-at)
    position: chairman
    holder: roy-fielding
    organisation: apache-software-foundation

The first text block represents a topic in the map. The identifier of this topic is apache-software-foundation; that identifier can be used later to refer to the topic inside the map. As the Apache Software Foundation is an instance of the abstract concept foundation, we encode this after the keyword is-a. We do not say anything else about foundations here, so that topic is created implicitly by an AsTMa= processor.

Every topic can have names: official ones, abbreviated ones or others. These so-called basenames are listed in lines starting with the keyword bn. Another property of topics are occurrences; these are URLs (or URIs, for that matter) and provide connections between the topic map and the outside world. If text content should reside directly inside the topic map, we use the keyword in (for inlined text) to represent what the TM standards call resource data occurrences. Both URL occurrences and resource data occurrences can have a type, as in oc (list) or in (history). This helps to qualify the relationship between the topic and the occurrence.

While topics are like bookmarks on steroids, they are actually not the central concept of TMs: the main means of expressing knowledge are the relationships between topics. These are expressed as associations. Associations have a similar rationale to RDF triples in that they connect concepts in a particular way. Associations are themselves of a particular type (is-a-member and holds-position-at above). They relate two or more concepts, such as roy-fielding and apache-software-foundation, in a rather explicit way: Roy Fielding plays the role of member and the ASF plays the role of group in that particular association (Fig. 1).
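The structure described above (topics with basenames and typed occurrences, plus typed associations whose members play named roles) can be sketched as a small in-memory model. This is only an illustration, not how an AsTMa= processor works internally: the class names `Topic`, `Occurrence`, `Association` and `TopicMap` are hypothetical, reading `bn: ASF, Apache Software Foundation` as two basenames is an assumption, and so is the implicit creation of role players.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Occurrence:
    value: str                  # a URL for "oc", literal text for "in"
    inline: bool = False        # True for "in:" (resource data occurrence)
    type: Optional[str] = None  # optional qualifier, e.g. "member-list"


@dataclass
class Topic:
    ident: str
    types: List[str] = field(default_factory=list)
    basenames: List[str] = field(default_factory=list)
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    type: str
    roles: Dict[str, str]       # role name -> identifier of the playing topic


class TopicMap:
    def __init__(self) -> None:
        self.topics: Dict[str, Topic] = {}
        self.associations: List[Association] = []

    def topic(self, ident: str) -> Topic:
        """Return the topic with this identifier, creating it implicitly."""
        if ident not in self.topics:
            self.topics[ident] = Topic(ident)
        return self.topics[ident]

    def add_topic(self, ident, types=(), basenames=(), occurrences=()):
        t = self.topic(ident)
        for type_id in types:
            self.topic(type_id)  # "foundation" comes into existence here
            t.types.append(type_id)
        t.basenames.extend(basenames)
        t.occurrences.extend(occurrences)
        return t

    def associate(self, type_, roles):
        for player in roles.values():
            self.topic(player)   # assumption: players are created implicitly too
        assoc = Association(type_, dict(roles))
        self.associations.append(assoc)
        return assoc


tm = TopicMap()
tm.add_topic(
    "apache-software-foundation",
    types=["foundation"],
    basenames=["ASF", "Apache Software Foundation"],  # assumed: two names
    occurrences=[
        Occurrence("http://www.apache.org/foundation/"),
        Occurrence("http://www.apache.org/foundation/members.html",
                   type="member-list"),
        Occurrence("HTTP reference implementation should remain in public",
                   inline=True, type="mission"),
    ],
)
tm.associate("is-a-member",
             {"member": "roy-fielding", "group": "apache-software-foundation"})
tm.associate("holds-position-at",
             {"position": "chairman", "holder": "roy-fielding",
              "organisation": "apache-software-foundation"})
```

Keeping roles as a mapping from role name to topic identifier mirrors the point made in the text: an association states not only that two topics are related, but also in which capacity each of them takes part.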
Topic maps built this way with AsTMa= are plain text files, which is quite convenient in a UNIX environment, especially for tracking changes with standard tools like diff and revision control systems. To load them into a TM processing system (such as a server), an AsTMa= processor parses these files.

Ontology Engineering

TMs themselves can capture a specific fraction of the domain knowledge, namely taxonomies (type systems)[LMGThess]. To continue with the above example, we could define a tiny type system by stating that foundation and industry-consortium are both specializations of organization:

    foundation is-subclass-of organization
    industry-consortium is-subclass-of organization

TMs, though, are not expressive enough to fully encode ontological constraints. An ontology not only organizes the concepts of an application domain (such as 'Apache' or 'Internet Security') into a class tree or, more generally, into a directed acyclic graph. Fully fledged ontologies also include rules which mandate (or suggest) what instances of these concepts have to look like and in which ways they may or may not be related. For this non-trivial task we developed a language (AsTMa![AsTMa]). It enables us to exert some control over the vocabulary our maps use and over how our maps are formed. To require, for example, that all associations of type is-a-member connect exactly two things, namely one which must be an instance of the concept person (as Roy Fielding is) and another which must be an instance of organization, we can write:
    forall $a [ (is-a-member) ]
       => exists $a ] (is-a-member)
                      member : $p
                      group  : $o [
          AND
          exists [ $p is-a person ]
          AND
          exists [ $o is-a organization ]

Without going into the language details, the forall clause binds all associations of type is-a-member to a loop variable $a. The rest of the constraint is evaluated for each such value. In the above rule, we expect that each such association exists in a form that has a member and a group component. If that can be verified, the corresponding topics are bound to new, local variables $p and $o, and we further expect that $p is bound to something which is an instance of person and $o to something which is an instance of organization. To indicate whether a pattern has to appear exactly as specified or whether it expresses only a minimal expectation, we use the square brackets ][ for the former and [] for the latter.

Tightly connected with a constraint language is another language which can be used to query content inside topic maps. Here, too, a certain degree of sophistication is necessary. Consider, for instance, a query for all organizations which are mentioned in a map. Even if it was never explicitly stated that the ASF is an organization, the query processor should be able to respect the underlying ontology to draw its conclusion:
    forall [ $o is-a organization ]
    return
       ($o, $o/bn)

The above query would return a list of organization information: first the internal identifier, then all basenames of that very topic.
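To make the semantics of the constraint and of the query above more tangible, here is a minimal sketch of both in Python. It is not an AsTMa! implementation: the dictionaries `IS_A`, `SUBCLASS_OF` and `BASENAMES`, the list `ASSOCIATIONS` and all function names are hypothetical, and each role is assumed to have a single player. The point it illustrates is that both the check and the query have to follow is-subclass-of links, so that the ASF counts as an organization although the map only says it is a foundation.

```python
# Hypothetical in-memory map; these structures are assumptions for
# illustration, not the data model of any AsTMa processor.
IS_A = {                       # instance -> direct types
    "apache-software-foundation": {"foundation"},
    "roy-fielding": {"person"},
}
SUBCLASS_OF = {                # class -> direct superclasses
    "foundation": {"organization"},
    "industry-consortium": {"organization"},
}
BASENAMES = {
    "apache-software-foundation": ["ASF", "Apache Software Foundation"],
}
ASSOCIATIONS = [
    ("is-a-member",
     {"member": "roy-fielding", "group": "apache-software-foundation"}),
]


def superclasses(cls):
    """Return cls together with every class reachable via is-subclass-of."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance(topic, cls):
    """True if topic is an instance of cls, directly or through a subclass."""
    return any(cls in superclasses(t) for t in IS_A.get(topic, ()))


def check_is_a_member(associations):
    """Return all is-a-member associations that violate the constraint."""
    violations = []
    for type_, roles in associations:
        if type_ != "is-a-member":
            continue
        # Exact pattern (the ] ... [ brackets): member and group, nothing else.
        exact = set(roles) == {"member", "group"}
        ok = (exact
              and is_instance(roles["member"], "person")
              and is_instance(roles["group"], "organization"))
        if not ok:
            violations.append((type_, roles))
    return violations


def organizations():
    """Rough analogue of: forall [ $o is-a organization ] return ($o, $o/bn)."""
    return [(topic, BASENAMES.get(topic, []))
            for topic in sorted(IS_A)
            if is_instance(topic, "organization")]
```

Calling `check_is_a_member(ASSOCIATIONS)` yields an empty list for the sample data, while an association with swapped roles would be reported as a violation. `organizations()` returns the ASF with its basenames, found through the subclass link from foundation to organization.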
Such a query language can be used not only to extract topic map content from a backend. It can also be used to generate content directly in the form of XML and, for advanced applications, in the form of topic maps themselves. Sophisticated applications can thus transform maps from one vocabulary into another.

Topic Map Views

We soon realized that browsing through a plain map can be rather confusing and distracting for users. All a user sees in a generic user interface are topics and outgoing links to related topics. The more connections there are between these topics, the higher the semantic value of the map, but the more disorienting the topic itself becomes, as there is no ranking of any kind for these connections to other topics. It proved difficult for humans (the presenter as well as students) not to lose the plot in this multi-dimensional structure.

So we introduced an additional concept, that of topic map views[BaZaRevisit]. Realizing that not all content may be relevant in every context, our views filter what is to be shown for a particular topic and what is to be suppressed. All aspects of a topic, including the sequencing of topic characteristics and rendering information, can be controlled via such a view. The filter associated with the view also contains sequencing information for the otherwise unsorted set of topics in the map. This mechanism can be used to define linear paths through the map.

Naturally, we allowed one map to have several views attached to it. This allows us to keep the knowledge itself in the maps and to maintain different views for different audiences. This decoupling of content and presentation had an effect on how we prepare course material ourselves. The process has been split into two phases: first we freely organize information we already have or actively seek into maps. Only in a second step do we shop in the topic map store and consider what we are willing to talk about on a particular occasion.
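The idea of a view as a filter plus sequencing information can be sketched as follows. The text does not describe how views are actually represented, so everything here is an assumption made for illustration: the `View` class, its `path` and `show` fields, the two sample views and the basename given to roy-fielding are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical map content: topic identifier -> characteristics by keyword.
TOPICS = {
    "apache-software-foundation": {
        "bn": ["ASF", "Apache Software Foundation"],
        "oc": ["http://www.apache.org/foundation/"],
        "in": ["provides support for the Apache community"],
    },
    "roy-fielding": {
        "bn": ["Roy Fielding"],
    },
}


@dataclass
class View:
    name: str
    path: List[str]             # which topics to show, in presentation order
    show: Dict[str, List[str]]  # per topic: characteristics to show, in order

    def render(self, topics) -> List[Tuple[str, str, str]]:
        """Return (topic, keyword, value) triples in the order of this view."""
        out = []
        for ident in self.path:
            for keyword in self.show.get(ident, []):
                for value in topics[ident].get(keyword, []):
                    out.append((ident, keyword, value))
        return out


# Two views over the same map, aimed at different audiences.
overview = View(
    name="overview",
    path=["apache-software-foundation"],
    show={"apache-software-foundation": ["bn"]},
)
lecture = View(
    name="lecture",
    path=["roy-fielding", "apache-software-foundation"],
    show={
        "roy-fielding": ["bn"],
        "apache-software-foundation": ["in", "oc"],
    },
)
```

Because the views only reference topic identifiers, the map itself stays untouched: `overview.render(TOPICS)` and `lecture.render(TOPICS)` produce different linear paths through the same content.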