
Searching

Before we can search, we need a simple topic map. Let us assume that we capture a popular Australian theme in AsTMa= notation:

VB (beer-brand)
bn: Victoria Bitter
oc: http://www.fosters.com.au/beer/about/brands/beer/vic_bitter.asp
in (marketing): Australia's favorite full strength beer brand

BudWeiser (beer-brand)
bn: BudWeiser
oc: http://www.budweiser.com/

(is-owned-by)
owner    : anheuser-busch
property : BudWeiser
and that we have stored this in a file beers-r-us.atm. We can then easily read it in using the CPAN module XTM::base:
use XTM;
use XTM::AsTMa;
my $tm = new XTM (tie => new XTM::AsTMa (file => 'beers-r-us.atm'));
After that, the $tm object contains our topic map.
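To show how the notation breaks down into topics and associations, here is a deliberately simplified reader for exactly the AsTMa= subset used above. It is a hypothetical sketch, not XTM::AsTMa: the function parse_astma_subset and its hash layout are assumptions made for illustration, and everything else the notation offers (scopes, variants, multiple files) is ignored.

```perl
use strict;
use warnings;

# The sample map from above, embedded so that the sketch is self-contained.
my $sample = <<'ATM';
VB (beer-brand)
bn: Victoria Bitter
oc: http://www.fosters.com.au/beer/about/brands/beer/vic_bitter.asp
in (marketing): Australia's favorite full strength beer brand

BudWeiser (beer-brand)
bn: BudWeiser
oc: http://www.budweiser.com/

(is-owned-by)
owner    : anheuser-busch
property : BudWeiser
ATM

# Hypothetical, simplified reader for the subset above (NOT XTM::AsTMa).
# Returns references to a list of topic hashes and a list of association hashes.
sub parse_astma_subset {
    my ($text) = @_;
    my (@topics, @assocs);
    for my $block (split /\n[ \t]*\n/, $text) {        # blank lines separate blocks
        my @lines = grep { /\S/ } split /\n/, $block;
        next unless @lines;
        my $head = shift @lines;
        if ($head =~ /^\(([^)]+)\)\s*$/) {               # "(type)" starts an association
            my %assoc = (type => $1, roles => {});
            for my $line (@lines) {
                # "role : player" lines
                $assoc{roles}{$1} = $2 if $line =~ /^\s*(\S+)\s*:\s*(\S+)\s*$/;
            }
            push @assocs, \%assoc;
        }
        elsif ($head =~ /^(\S+)\s*(?:\(([^)]*)\))?\s*$/) {   # "id (types)" starts a topic
            my ($id, $types) = ($1, defined $2 ? $2 : '');
            my %topic = (id => $id, types => [split ' ', $types],
                         basenames => [], occurrences => [], inline => []);
            for my $line (@lines) {
                if    ($line =~ /^bn:\s*(.+)$/)        { push @{ $topic{basenames} },   $1 }
                elsif ($line =~ /^oc:\s*(\S+)/)        { push @{ $topic{occurrences} }, $1 }
                elsif ($line =~ /^in\b[^:]*:\s*(.+)$/) { push @{ $topic{inline} },      $1 }
            }
            push @topics, \%topic;
        }
    }
    return (\@topics, \@assocs);
}

my ($topics, $assocs) = parse_astma_subset($sample);
for my $t (@$topics) {
    printf "%s (%s): %s\n", $t->{id}, join(' ', @{ $t->{types} }), $t->{basenames}[0];
}
printf "%s: %s\n", $_, $assocs->[0]{roles}{$_} for sort keys %{ $assocs->[0]{roles} };
```

The real module builds XTM objects rather than plain hashes; the sketch only makes visible which pieces of information each AsTMa= line carries.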

To use XTM::Path, you need to import the module and instantiate an object from it. This object can then be used to search within maps or within map components such as topics and associations.

use XTM::Path;
my $xtmp = new XTM::Path (default => $tm);
Our map $tm is used as the default search context if nothing else is specified later.

First, let us find all topics whose instance type is beer-brand, using the method find:

my @beers = $xtmp->find(
              'topic[instanceOf/topicRef/@href = "#beer-brand"]');
find returns a list of object references, each pointing to an XTM::topic object. Since we have not specified a context, the search context defaults to our map.

The XTMPath expression itself will look familiar to XPath users: first the XTMPath processor finds all topic nodes within the map, and then it keeps only those for which the predicate instanceOf/topicRef/@href = "#beer-brand" evaluates to true.

The path instanceOf/topicRef/@href might look a bit confusing at first, but it follows directly from the way a topic is represented in XTM. The BudWeiser topic in our map would be written as:

<topic id="BudWeiser">
  <instanceOf>
    <topicRef xlink:href="#beer-brand"/>
  </instanceOf>
  <baseName>
    <baseNameString>BudWeiser</baseNameString>
  </baseName>
  <occurrence>
    <resourceRef xlink:href="http://www.budweiser.com/"/>
  </occurrence>
</topic>
As in XPath, the @ character addresses an attribute, while the rest of the path simply traverses XTM elements.
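To make the two evaluation steps concrete, here is a plain-Perl sketch that mirrors the element nesting shown above with nested hashes and then applies the predicate with grep. The hash layout, the has_type helper and the extra some-brewery topic are hypothetical; real XTM::topic objects are not plain hashes, and this is not a description of how XTM::Path is implemented.

```perl
use strict;
use warnings;

# Topics modelled as plain nested hashes that mirror the XTM element
# structure. Illustration only: real XTM::topic objects are not plain hashes.
my @topics = (
    { id => 'VB',        instanceOf => [ { topicRef => { href => '#beer-brand' } } ] },
    { id => 'BudWeiser', instanceOf => [ { topicRef => { href => '#beer-brand' } } ] },
    # hypothetical extra topic of another type, added for contrast
    { id => 'some-brewery', instanceOf => [ { topicRef => { href => '#brewery' } } ] },
);

# Predicate instanceOf/topicRef/@href = $href. As in XPath, comparing a set
# of values with a string is true if ANY of the values is equal to it.
sub has_type {
    my ($topic, $href) = @_;
    my @hrefs = map { $_->{topicRef}{href} } @{ $topic->{instanceOf} };
    return scalar grep { $_ eq $href } @hrefs;
}

# Step 1: select the topic nodes; step 2: keep those satisfying the predicate.
my @beers = grep { has_type($_, '#beer-brand') } @topics;
print join(', ', map { $_->{id} } @beers), "\n";   # prints: VB, BudWeiser
```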

What is probably hardest for a beginner is identifying the correct path to the desired element, especially for someone used to working only with AsTMa=. The best way to overcome this is to keep a sample topic and a sample association in XTM (XML) form at hand and to read the path off from there. The XTM standard itself can be handy as well.
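One way to follow that advice is to write the findings down once as a small lookup table. The sketch below is a hypothetical personal cheat sheet, not part of XTM::Path; the paths are derived from the XTM 1.0 element structure, the reading of `in:` as an inline occurrence stored in resourceData is an assumption about AsTMa=, and applying text() to resourceData is assumed by analogy with baseNameString/text().

```perl
use strict;
use warnings;

# Hypothetical cheat sheet (not part of XTM::Path): where the AsTMa=
# shorthands used in beers-r-us.atm end up in XTM 1.0, written as
# XTMPath expressions relative to a topic.
my %path_for = (
    type => 'instanceOf/topicRef/@href',        # VB (beer-brand)
    bn   => 'baseName/baseNameString/text()',   # bn: Victoria Bitter
    oc   => 'occurrence/resourceRef/@href',     # oc: http://www.budweiser.com/
    in   => 'occurrence/resourceData/text()',   # in (marketing): ... (assumed mapping)
);

# Look up the path for a shorthand, failing loudly for unknown ones.
sub xtmpath_for {
    my ($shorthand) = @_;
    exists $path_for{$shorthand}
        or die "no path recorded for AsTMa= shorthand '$shorthand'\n";
    return $path_for{$shorthand};
}

printf "%-4s => %s\n", $_, xtmpath_for($_) for sort keys %path_for;
```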

If we further want to iterate over these topics and extract the base names and occurrences of each, we have to provide the current topic as the search context:

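foreach my $topic (@beers) {
  # both paths are evaluated relative to $topic, the second argument to find
  my @basenames   =  $xtmp->find('baseNameString/text()',        $topic);
  my @occurrences =  $xtmp->find('occurrence/resourceRef/@href', $topic);
  # only the first base name and the first occurrence are used
  print qq{<a href="$occurrences[0]">$basenames[0]</a>\n};
}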
Note that asking for text() or for an attribute value like @href yields plain Perl string scalars, not objects as a DOM programmer might expect. The result is still a list, though, from which we select only the first value for output.
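Because the results are plain lists of strings, ordinary Perl idioms apply to them. The hypothetical helper below (not part of XTM::Path) turns the two lists collected for one topic, such as @basenames and @occurrences in the loop above, into a link. It takes the first element of each list by list assignment and, as an assumed policy, skips topics without an occurrence instead of printing an empty href.

```perl
use strict;
use warnings;

# Hypothetical helper: build an HTML link from the base-name and occurrence
# lists found for one topic. Uses only the first element of each list.
sub topic_link {
    my ($basenames, $occurrences) = @_;
    my ($name) = @$basenames;      # list assignment picks the first element
    my ($url)  = @$occurrences;
    return unless defined $url;    # no occurrence: nothing to link to
    $name = $url unless defined $name;
    return qq{<a href="$url">$name</a>};
}

print topic_link(['BudWeiser'], ['http://www.budweiser.com/']), "\n";
```

Inside the loop, the print statement would then become something like `print topic_link(\@basenames, \@occurrences), "\n";` guarded by a definedness check.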

That a path expression is interpreted relative to a context is nothing new to XPath developers. There are, however, considerable differences in how XTMPath treats axes. First, the current implementation has a limitation: it supports only the child and descendant axes, using the '/' syntax. More fundamentally, and in contrast to generic XPath, the XTMPath processor is aware of the underlying DTD. It can therefore determine immediately whether an expression like topic/member can ever return a value.
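A minimal sketch of such a DTD-driven check follows. It assumes a simplified, hand-written excerpt of the XTM 1.0 parent/child relations (scope, variants and mergeMap omitted), and the helpers reachable and can_match are hypothetical, not part of XTM::Path. Each step is treated as "somewhere below the previous step", so the decision is made from the element structure alone, without looking at any map data.

```perl
use strict;
use warnings;

# Simplified excerpt of XTM 1.0 parent -> child element relations
# (scope, variants and mergeMap are omitted).
my %children = (
    topicMap        => [qw(topic association)],
    topic           => [qw(instanceOf subjectIdentity baseName occurrence)],
    instanceOf      => [qw(topicRef subjectIndicatorRef)],
    subjectIdentity => [qw(resourceRef topicRef subjectIndicatorRef)],
    baseName        => [qw(baseNameString)],
    occurrence      => [qw(instanceOf resourceRef resourceData)],
    association     => [qw(instanceOf member)],
    member          => [qw(roleSpec topicRef resourceRef subjectIndicatorRef)],
    roleSpec        => [qw(topicRef)],
);

# Can element $to appear anywhere below element $from? (The graph is acyclic.)
sub reachable {
    my ($from, $to) = @_;
    for my $child (@{ $children{$from} || [] }) {
        return 1 if $child eq $to || reachable($child, $to);
    }
    return 0;
}

# Decide statically whether a path evaluated against a map can ever match.
sub can_match {
    my ($path) = @_;
    my $current = 'topicMap';
    for my $step (grep { length } split m{/+}, $path) {
        return 0 unless reachable($current, $step);
        $current = $step;
    }
    return 1;
}

for my $path ('topic/member', 'association/member', 'topic/resourceData') {
    printf "%-20s %s\n", $path, can_match($path) ? 'may match' : 'never matches';
}
```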

This knowledge can also be used to implement some DWIM (Do What I Mean) functionality, which Perl programmers are so comfortable with: instead of explicitly spelling out the path to the parts we are interested in, we can leave the details of the route to the processor. For example, the following XTMPath expressions are equivalent:

/topic/occurrence/resourceData
/topic//resourceData
topic/resourceData
In all cases the processor first looks for a topic data structure in the current context (a map object). Since it knows that a resourceData component must sit inside an occurrence component, there is no need to spell out the intermediate steps or to use the descendant axis '//'. The processor flags an error only when there is no unique path. In this light, there is no difference between using '//' and '/', or, for that matter, between starting an expression with '/' or not.
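The following sketch imitates that route-finding under explicit assumptions: a simplified, hand-written excerpt of the XTM 1.0 parent/child relations (scope, variants and mergeMap omitted, so some routes a real processor must consider are missing), a hypothetical expand function, and the rule that a step is accepted only when exactly one route leads to it. It covers element steps only, not attributes or text(), and it does not reproduce XTM::Path's internals.

```perl
use strict;
use warnings;

# Simplified excerpt of XTM 1.0 parent -> child element relations.
my %children = (
    topicMap        => [qw(topic association)],
    topic           => [qw(instanceOf subjectIdentity baseName occurrence)],
    instanceOf      => [qw(topicRef subjectIndicatorRef)],
    subjectIdentity => [qw(resourceRef topicRef subjectIndicatorRef)],
    baseName        => [qw(baseNameString)],
    occurrence      => [qw(instanceOf resourceRef resourceData)],
    association     => [qw(instanceOf member)],
    member          => [qw(roleSpec topicRef resourceRef subjectIndicatorRef)],
    roleSpec        => [qw(topicRef)],
);

# All element routes leading from $from down to $to (the graph is acyclic).
sub routes {
    my ($from, $to) = @_;
    my @found;
    for my $child (@{ $children{$from} || [] }) {
        push @found, [$child] if $child eq $to;
        push @found, map { [$child, @$_] } routes($child, $to);
    }
    return @found;
}

# Expand a shorthand path by filling in intermediate elements. '//' is
# treated like '/', and a leading '/' is ignored. Dies if a step has no
# route or more than one route from the previous step.
sub expand {
    my ($path, $context) = @_;
    $context //= 'topicMap';
    my (@full, $current);
    $current = $context;
    for my $step (grep { length } split m{/+}, $path) {
        my @r = routes($current, $step);
        die "no route from $current to $step\n"        unless @r;
        die "ambiguous route from $current to $step\n" if @r > 1;
        push @full, @{ $r[0] };
        $current = $step;
    }
    return join '/', @full;
}

print expand($_), "\n"
    for '/topic/occurrence/resourceData', '/topic//resourceData', 'topic/resourceData';
```

All three expressions expand to topic/occurrence/resourceData, whereas a shorthand such as topic/topicRef is rejected because, in this excerpt, a topicRef can be reached from a topic via instanceOf, subjectIdentity or occurrence.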

Other limitations stem from the fact that XTMPath is used within a programming environment. For instance, predicates of the form position() = 2 are not implemented, since the host programming language is assumed to provide list processing features already.
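In Perl, the usual positional predicates translate into array indexing and slicing on the list that find returns; with the documented list return, something like `($xtmp->find('topic'))[1]` would presumably play the role of topic[position() = 2]. The sketch below uses a stand-in list (the two occurrence URLs from beers-r-us.atm) so that it runs without XTM::Path; keep in mind that XPath positions start at 1 while Perl indices start at 0.

```perl
use strict;
use warnings;

# Stand-in for a list returned by find(): the occurrence URLs of the two
# beer brands in beers-r-us.atm.
my @urls = (
    'http://www.fosters.com.au/beer/about/brands/beer/vic_bitter.asp',
    'http://www.budweiser.com/',
);

my $second = $urls[1];                                   # [position() = 2]
my @tail   = @urls[1 .. $#urls];                         # [position() > 1]
my $last   = $urls[-1];                                  # [last()]
my @odd    = @urls[grep { $_ % 2 == 0 } 0 .. $#urls];    # [position() mod 2 = 1]

print "$second\n";
```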

