TMCL, TMQL and Topic Map Merging
Introduction

When topic maps are merged, the subject uniqueness assertion requires that every covered subject is represented by at most one topic. The detailed conditions under which two topics are identified with each other may be controversial and may depend on the use case. According to [XTM] and [SAM], Topic Map processors should detect the following situations to trigger the merging process:
In the following we show how application-specific conditions can be expressed using AsTMa!, a TM constraint language. In section 3 we use this mechanism to also specify the generic merging conditions mandated by the standards. Having found a declarative way to specify the conditions for merging, we try in section 4 to implement merging itself using AsTMa?, a proposed TM transformation language. Finally, we discuss this experiment and its potential consequences.

Application Specific Merging

All TM processors are expected to detect the generic merging conditions and to perform merging whenever a map is loaded or modified. Applications are still free to merge more aggressively if they are convinced that topics are about the same thing. The canonical example (provided by Steve Pepper) is an application which identifies persons based on their email addresses. In this example persons are represented by topics, and their email addresses might be stored in specially typed occurrences. Using AsTMa= [AsTMa=] we can define this as:

john (person)
bn: John Johnson
oc (email): mailto:john@example.com

mr-johnson (person)
bn: Mr. Johnson
oc (email): mailto:john@example.com

If the value of an occurrence of, say, type email is chosen as the identifying property, then it is up to the application to merge these two topics into one. Whether this is hard-coded or can be configured in some flexible manner will depend on the implementation.

One may now wonder whether, in contrast to a direct implementation, the triggering of merging can be controlled more declaratively. One option is to use a Topic Map ontology, which serves as vocabulary and as taxonomy but, more importantly here, as a vehicle to express constraints on a topic map. Given the AsTMa! [AsTMa!] expression

forall [ $a (person)
         oc (email) : $email ]
=> not exists [ $b (person)
                oc (email) : $email ]
we have constrained maps to use emails uniquely, i.e. only one topic may have a particular email value within a given map. Informally, the above constraint means that for every topic which is a person there must not exist another topic (with a different id) which has the same value for an occurrence of type email. A TM engineer may now feed this constraint into a TM processor: if the constraint is violated, then the map is insufficiently merged.
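
To make the constraint concrete, the following minimal Python sketch checks the same uniqueness condition outside of AsTMa!. The dictionary-based topic representation, the extra topic jane and the function name find_email_violations are assumptions made for illustration; they are not part of AsTMa! or of any TM processor.

```python
from collections import defaultdict

# Hypothetical in-memory representation: one dict per topic.
topics = [
    {"id": "john", "types": {"person"}, "names": ["John Johnson"],
     "occurrences": [("email", "mailto:john@example.com")]},
    {"id": "mr-johnson", "types": {"person"}, "names": ["Mr. Johnson"],
     "occurrences": [("email", "mailto:john@example.com")]},
    {"id": "jane", "types": {"person"}, "names": ["Jane Doe"],
     "occurrences": [("email", "mailto:jane@example.com")]},
]


def find_email_violations(topics):
    """Return {email: sorted topic ids} for emails used by more than one person."""
    by_email = defaultdict(set)
    for topic in topics:
        if "person" not in topic["types"]:
            continue  # the constraint only ranges over persons
        for occ_type, value in topic["occurrences"]:
            if occ_type == "email":
                by_email[value].add(topic["id"])
    # An email shared by two or more distinct topics violates the constraint.
    return {email: sorted(ids) for email, ids in by_email.items() if len(ids) > 1}


violations = find_email_violations(topics)
print(violations)
```

A non-empty result corresponds to a violated constraint, i.e. to a map that is insufficiently merged with respect to email addresses.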
The interesting observation is that a uniqueness constraint can now be arbitrarily complex. If we assume that a constraint language can cover all possible constellations of topics and associations within a map, then uniqueness need not be induced only by the fact that two values of a particular topic characteristic are equal. As a zeitgeist example, consider an application which tentatively merges suspects on the condition that they are probably involved in some illegal activity:

forall [ $a (suspect)
         (sells-heroin)
           seller : $a
           amount : $v1
         $v1 (monetary-value)
           in : $money ]
=> not (
     exists [ $b (suspect)
              (owns-account)
                account : $acc
                owner : $b
              (transaction)
                account : $acc
                amount : $v2
              $v2 (monetary-value)
                in : $money ]
   )

Accordingly, if there is one observation that one suspect sells heroin for the same amount as is observed to have been entered into another suspect's account, then for this particular application this is sufficient to identify the two (maybe unknown) suspects as one.

Generic Merging

Once we understand how a constraint language can declaratively define when topics have to merge, we might also try to use this language to express the merging rules in XTM and SAM. Merging based on a common subject indicator is trivially induced by

forall [ $a
         sin : $sin ]
=> not exists [ $b
                sin : $sin ] is-reified-by gm-by-subject-indicator
Again, we used the same scheme to single out two different
topics having the same value for a subject indicator.
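
The same scheme can be sketched in Python for subject indicators. Unlike the email case, a topic may carry several subject indicators, so the sketch below reports pairs of topics that share at least one. The data layout, the example URIs and the function name pairs_sharing_indicator are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: topic id -> set of subject indicator URIs.
subject_indicators = {
    "puccini": {"http://example.org/psi/puccini"},
    "g-puccini": {"http://example.org/psi/puccini",
                  "http://example.org/psi/composer-1858"},
    "verdi": {"http://example.org/psi/verdi"},
}


def pairs_sharing_indicator(subject_indicators):
    """Return sorted pairs of distinct topic ids with a common subject indicator."""
    by_indicator = defaultdict(set)
    for topic_id, indicators in subject_indicators.items():
        for uri in indicators:
            by_indicator[uri].add(topic_id)
    pairs = set()
    for ids in by_indicator.values():
        # Every pair within one group violates the uniqueness constraint.
        pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)


merge_candidates = pairs_sharing_indicator(subject_indicators)
print(merge_candidates)
```

Each reported pair is a candidate for merging under the subject indicator rule.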
By the same token, we can also cover the other cases:

forall [ $a reifies $r ]
=> not exists [ $b reifies $r ] is-reified-by gm-by-reification

forall [ $a
         bn @ $scope : $bn ]
=> not exists [ $b
                bn @ $scope : $bn ] is-reified-by gm-by-scoped-name

forall [ $a
         bn ($type) : $bn ]
=> not exists [ $b
                bn ($type) : $bn ] is-reified-by gm-by-typed-name

Merging as TM Transformation

Now that we have managed to capture uniqueness as a normal topic map constraint, the obvious question is whether the merging process itself can be generalized. This would be the task of a language which takes a map with not yet merged topics and returns another, fully merged map. Transformations are specialized query operations, so we should expect AsTMa? [AsTMa?] to be able to express such a transformation. Ideally, we would want a function in AsTMa? which performs exactly such a transformation:

map $merged := merge ('tm:opera')

Here we declare a variable $merged which will hold a merged version of our map. The function merge takes one URL of a map as parameter. The function has to detect not-yet-merged topics according to our preferences. Let us return to our original example, in which persons with the same email address should be identified:

define function merge (map $m) returns map {
   in $m
      exists [ $a (person)
               oc (email) : $email
               @c ]
      and
      exists [ $b (person)
               oc (email) : $email
               @d ]
      .....
   return
      # build new topic and also take care of the associations
   or
   return
      $m
}

The exists clauses are used to find two different topics of type person which have a common occurrence of type email with the same value. Once we have these, a topic has to be generated into which all characteristics are copied. This can be done via

{$b} (person)
oc (email) : {$email}
{@c}
{@d}
where we use the already bound variables to construct a new topic. Note that we have simply put all characteristics of the first topic (bound to @c) and all of the second topic (bound to @d) into it. We also reused $b as the topic id; this is rather arbitrary, and we could have used a new id as well.
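
The construction of the merged topic can be illustrated with a small Python sketch. The dictionary layout and the function merge_topics are hypothetical; in particular, dropping duplicate characteristics is a design choice of this sketch and not something the AsTMa? template above prescribes.

```python
def merge_topics(first, second, new_id=None):
    """Build one topic carrying the characteristics of both inputs.

    Duplicate characteristics are kept only once; the order of first
    appearance is preserved.
    """
    def union(a, b):
        merged = []
        for item in list(a) + list(b):
            if item not in merged:
                merged.append(item)
        return merged

    return {
        # Reuse the id of the second topic (as with $b) unless a new id is given.
        "id": new_id or second["id"],
        "types": set(first["types"]) | set(second["types"]),
        "names": union(first["names"], second["names"]),
        "occurrences": union(first["occurrences"], second["occurrences"]),
    }


john = {"id": "john", "types": {"person"}, "names": ["John Johnson"],
        "occurrences": [("email", "mailto:john@example.com")]}
mr_johnson = {"id": "mr-johnson", "types": {"person"}, "names": ["Mr. Johnson"],
              "occurrences": [("email", "mailto:john@example.com")]}

merged = merge_topics(john, mr_johnson)
print(merged)
```

Passing new_id corresponds to the alternative mentioned above of giving the merged topic a fresh id instead of reusing an existing one.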
If no two topics with a common email address can be identified, the branch after the or is evaluated; in that case the map is passed back unchanged, since it is known to be merged. To take care of the associations and to consistently replace $a with $b in them, we hand over the newly generated map to another function, which handles the replacement of $a wherever it occurs as association type:

define function merge (map $m) returns map {
   in $m
      exists [ $a (person)
               oc (email) : $email
               @c ]
      and
      exists [ $b (person)
               oc (email) : $email
               @d ]
      and
      exists @rest_topics [ $c ]+   # $c must not be same as $a or $b
      and
      exists @rest_assocs [ (*) ]+  # all associations
   return
      merge_assoc_type (            # call function to clean $a from associations
         $a,                        # this topic in associations should be substituted with
         $b,                        # the one which survived
         # new topic with merged characteristics
         {$b} (person)
         oc (email) : {$email}
         {@c}
         {@d}
         +                          # plus
         {@rest_topics}             # the remainder of the map
         +
         {@rest_assocs}
      )
   or
   return
      $m
}

To capture the rest of the map (everything except our two topics) we used exists @rest_topics [ $c ]+ and exists @rest_assocs [ (*) ]+ to bind all other topics and all associations via greedy matching. All parts (the newly merged topic, all other topics and the associations) are then passed into the function merge_assoc_type as one map. To let this function know which topics originally triggered the merge and which topic id to replace, we pass them as parameters as well.

The new function merge_assoc_type works quite similarly: again we use an exists clause to identify what we are interested in; here it is any association of type $a. Again we match the rest (the association pattern matches topics as well, since those are involved in sum-ergo-sum associations):

define function merge_assoc_type (topic $a, topic $b, map $m) returns map {
   in $m
      exists [ ($a)
               @x ]
      and
      exists @rest [ ($c) ]+   # matches everything else
   return
      merge_assoc_type (       # recursively do it for the new map
         $a,
         $b,
         # create new association, but with other id as type
         ({$b})
         {@x}
         +                     # plus
         {@rest}               # the remainder again
      )
   or
   return
      merge_assoc_role ($a, $b, $m)
}

Once all associations where $a is the type have been recursively processed, the or clause calls merge_assoc_role. Its task is to continue the replacement of $a with $b wherever $a is a role in an association:

define function merge_assoc_role (topic $a, topic $b, map $m) returns map {
   in $m
      exists [ ($t)
               $a : @p
               @x ]
      and
      exists @rest [ ($t) ]+   # matches everything else
   return
      merge_assoc_role (       # recursively do it for the new map
         $a,
         $b,
         # create new association, but with other id as role
         ({$t})
         {$b} : {@p}
         {@x}
         +                     # plus
         {@rest}               # the remainder again
      )
   or
   return
      merge_assoc_player ($a, $b, $m)
}

As you may have guessed, the last function, merge_assoc_player, replaces $a with $b wherever $a is the player in an association:

define function merge_assoc_player (topic $a, topic $b, map $m) returns map {
   in $m
      exists [ ($t)
               $r : $a
               @x ]
      and
      exists @rest [ ($s) ]+   # matches everything else
   return
      merge_assoc_player (     # recursively do it for the new map
         $a,
         $b,
         # create new association, but with other id as player
         ({$t})
         {$r} : {$b}
         {@x}
         +                     # plus
         {@rest}               # the remainder again
      )
   or
   return
      merge ($m)
}

You may notice that we call our original function merge again on the current version of the map once all associations are cleaned of $a. This means that we have completed one merge cycle for two topics. Of course, other topics may have to be merged as well, or, potentially, our recent merge has created a new situation which may trigger a further merge.

Summary

So what does this actually show us? There are probably two answers. Firstly, if the AsTMa* languages to constrain and process topic maps are to be tested for whether they are capable of expressing moderately complex operations, then the above can be seen as an indication of their potential.

The slightly more remarkable second point is that, if a constraint language and a query language are sufficiently expressive to (a) define an application-sensitive concept of uniqueness and (b) implement it, then, theoretically, we do not need to maintain a uniqueness facility directly within the Topic Map paradigm itself. We could define a simplistic operation to connect two topic maps and perhaps define as intrinsic merging conditions only sharing a subject indicator and explicitly having the same subject. All other rules could be left to the application designer and ontology engineer.

One reason why it may still be justified to incorporate specific merging methods in standards documents is probably performance: a dedicated TM processor with hard-wired rules will be orders of magnitude faster than any future TMCL processor. And, at the time of writing, there is no such processor.

The consequences for a formal model for Topic Map engineering are significant, though. In such a model merging could simply be seen as one possible map-to-map transformation, nothing more.

References