Technologies

When it came to selecting a web application server infrastructure, we (unsurprisingly) relied on environments based on the Apache/mod_perl combination. Specifically, we used Mason[Mason] for those parts of the server that require extensive and rather complex pre-processing before any content is displayed. For the more document-centric publishing scenario we used AxKit[AxKit], as all our documents were in some XML format, if not DocBook[DocBook] itself. For the topic map (TM) processing itself, we developed our own packages. In the following we briefly present these technologies to give an idea of how they interact at our site.

Mason

Mason is a Perl implementation of the popular "ASP/PHP/JSP style", in which the developer can freely mix programming code with HTML code. Mason is reportedly used at large commercial sites and has a long-standing developer community. In contrast to more corporate-style infrastructures (.NET, JavaBeans), which mandate a stricter separation of business logic (retrieval and modification of backend data) and interface logic (what should be shown to the user at a particular time), Mason is not particularly religious about this issue; it does, however, suggest such a separation within a single component.

Mason's name alludes to the fact that an application is built from components. Some of these produce content which is conveyed directly to the users; others serve internal purposes and behave like subroutines in a programming language. These only create content which is returned to the calling component.

A Mason component can be organized in sections, e.g. one covering the documentation and another listing the arguments which can be passed into the component. The init section is executed first and mostly covers the business logic, whereas the final (HTML) output is produced by the rest of the component:
<%doc>
Optional documentation here
</%doc>
<%args>
$tmpath => 'default-value'
</%args>
<%init>
my ($tmurl, $vurl) = split /@/, $tmpath;
...
# lots of backend processing
# store the results to be shown in a Perl data structure
</%init>
<!-- the Perl structure created above is rendered into HTML;
Perl statements start with % or are enclosed in <% %> pairs -->
<table>
% foreach my $i (keys %{$info->{peers}}) {
<tr><td><% $info->{dict} %></td></tr>
% }
</table>
<& _navigation.mc &>

Stored as the file test.mc in the server's document root, such a component is executed whenever a user invokes the URL /test.mc or, with explicit arguments, a URL like /test.mc?tmpath=something-else.

To factor out common code, several options exist. As in the example above, a component can invoke a subcomponent during its execution (_navigation.mc). The named component then generates content which is inserted at the place of invocation.

Mason also supports the concept of a catch-all component (called an autohandler) within one level of the document hierarchy. As every request is first handled by that component, it is the perfect place to keep the layout information in a single spot:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html xmlns:html="http://www.w3.org/1999/xhtml">
<!-- this is the place where web designers live -->
<% $m->call_next %>
</html>

If a request addresses /test.mc, the autohandler generates the layout. Before the content is sent to the user, however, the output of test.mc is inserted via the call_next invocation. The Mason request object stored in $m keeps track of which component is next.

AxKit

AxKit, another brainchild of Matt Sergeant, is the Perl counterpart to XML application servers like Cocoon[Cocoon] or the Tamino XML server. Like most XML infrastructures, it mandates a much more disciplined approach to content management. In its simplest form, content is maintained in XML documents stored directly in the file system, which is what we do for specifications and articles maintained in DocBook (.dbk).
One option is to use file extensions to trigger the AxKit Perl handler to take control for specific files:

<Files ~ "\.dbk$">
  PerlHandler AxKit
  AxAddStyleMap text/xsl Apache::AxKit::Language::LibXSLT
  AxAddProcessor text/xsl /stylesheets/docbook.xslt
</Files>

The two AxKit-specific directives above tell the AxKit processor two things: first, which technology is to be used for XSLT transformations (we use libxslt[libxslt]) and, second, which particular XSLT stylesheet should be used for this class of documents. Whenever a request for a .dbk document arrives at the server, AxKit takes over, reads the requested DocBook document from the file system and subjects it to an XSLT transformation with that stylesheet. The result of this process is then sent via Apache to the client.

You not only have control over the XSLT technology used (other alternatives might be Xerces or Sablotron), you can also choose a more script-like technology which Matt called XPathScript:

<Files ~ "\.dbk$">
  PerlHandler AxKit
  AxAddStyleMap application/x-xpathscript Apache::AxKit::Language::XPathScript
  AxAddProcessor application/x-xpathscript /stylesheets/frame.xps
</Files>

Such .xps scripts may contain Perl code like this
<%
$t->{'news:title'}{pre} = "<h2>";
$t->{'news:title'}{post} = "</h2>";
%>
to the effect that whenever the AxKit processor traverses an XML document and encounters an XML tag title with the namespace prefix news, it takes the enclosed text content, wraps it in <h2> and </h2> tags and outputs the result. The fact that this is all Perl gives a flexibility in processing that is difficult to achieve in a pure XSLT solution.
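The pre/post mechanism described above can be illustrated in plain Perl. The following is only a sketch under stated assumptions, not AxKit's actual implementation: the nested hash standing in for the parsed document and the helper apply_templates are hypothetical, and elements without a template entry simply contribute their text content, which is a simplification of what a real XPathScript processor does.

```perl
use strict;
use warnings;

# Template table in the style of the .xps snippet above.
my $t = {
    'news:title' => { pre => '<h2>', post => '</h2>' },
};

# Hypothetical stand-in for a parsed XML document: a node is either a
# plain string (text) or a hash with a qualified name and its children.
my $doc = {
    name     => 'news:item',
    children => [
        { name => 'news:title', children => ['Portal relaunched'] },
        { name => 'news:body',  children => ['More topic maps online.'] },
    ],
};

# Recursively render a node: emit 'pre', then the children, then 'post'.
# Elements without a template entry contribute only their content.
sub apply_templates {
    my ($node, $templates) = @_;
    return $node unless ref $node;    # text node: return as is
    my $rule  = $templates->{ $node->{name} } || {};
    my $inner = join '',
        map { apply_templates($_, $templates) } @{ $node->{children} };
    return ($rule->{pre} // '') . $inner . ($rule->{post} // '');
}

print apply_templates($doc, $t), "\n";
```

Keeping the templates in an ordinary hash is what makes the approach flexible: any Perl code can add, change or compute entries before the document is traversed.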
With the prolific use of XML languages, a single transformation is often not enough to render the content completely. Using DocBook, for instance, you may want to transform the DocBook tags with one XSLT stylesheet. Then, depending on whether the user has requested an HTML page for reading on the screen or one to be printed on paper, different procedures can be used:

AxAddStyleMap application/x-xsp Apache::AxKit::Language::XSP
AxAddStyleMap application/x-xpathscript Apache::AxKit::Language::XPathScript
<AxMediaType screen>
  <AxStyleName "#default">
    AxAddProcessor application/x-xsp .
    AxAddProcessor application/x-xpathscript /stylesheets/frame.xps
  </AxStyleName>
  <AxStyleName printable>
    AxAddProcessor application/x-xsp .
    AxAddProcessor application/x-xpathscript /stylesheets/docbook_print.xps
  </AxStyleName>
  <AxStyleName nullable>
  </AxStyleName>
</AxMediaType>

A request to some_docbook.dbk?style=printable will select the second option, which uses docbook_print.xps in its final stage. The example above also shows that one processing path may consist not only of a single transformation but of several chained together (the output of the previous stage is the input of the next stage). This pipelining is the XML manifestation of the powerful UNIX design pattern of filters and pipes[RayUNIX].

The example above also makes use of another AxKit feature, XML server pages (XSP). This mechanism allows programming code to be mixed with XML content. Adding programming code to an XML document, though, contradicts XML's philosophy of dumb data. For this reason we prefer to use so-called taglibs. They associate meaning, as conveyed by a subroutine in a Perl package, with a particular XML namespace. Whenever a taglib-aware XML processor encounters a tag in such a namespace, it will try to invoke an appropriate subroutine within the package. To make use of such functionality we simply embed an appropriate tag into an XSP document:
<xsp:page xmlns:xsp="http://www.apache.org/1999/XSP/Core"
xmlns:b="http://ns.it.bond.edu.au/xsp/blog/v1">
...
<b:blog
src = "dbi:mysql:database=topicmaps;user=nobody;table=blogs"
limit = "20"/>
...
</xsp:page>
The AxKit XSP processor traverses the requested document at every request. It recognizes the namespace http://ns.it.bond.edu.au/xsp/blog/v1 and matches it to one of the registered taglib packages. It then invokes the subroutine and replaces <b:blog ..../> with the content the subroutine returns.
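The lookup from namespace to package to subroutine can be sketched as follows. This is a minimal illustration assuming a simple in-memory registry; the package My::Taglib::Blog, the %registry hash and the expand_tag helper are hypothetical names and do not reflect how AxKit resolves taglibs internally. The database access is replaced by a canned return value.

```perl
use strict;
use warnings;

# Hypothetical taglib package; the namespace URI is the one from the text.
package My::Taglib::Blog;
our $NS = 'http://ns.it.bond.edu.au/xsp/blog/v1';

# Stand-in for the real database lookup: returns canned XML.
sub blog {
    my (%attr) = @_;
    return qq{<entries limit="$attr{limit}"/>};
}

package main;

# Registry: namespace URI => package name, filled at "configuration time".
my %registry = ( $My::Taglib::Blog::NS => 'My::Taglib::Blog' );

# Resolve a tag (namespace + local name) to a subroutine and call it.
sub expand_tag {
    my ($ns, $local, %attr) = @_;
    my $pkg  = $registry{$ns} or return;    # not a taglib namespace
    my $code = $pkg->can($local)
        or die "no handler '$local' in $pkg";
    return $code->(%attr);
}

print expand_tag('http://ns.it.bond.edu.au/xsp/blog/v1', 'blog', limit => 20), "\n";
```

The document itself only names a namespace and a tag; which code runs is decided entirely by the registry, so the implementation can change without touching the content.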
To indicate the use of such a library we first register it in the Apache configuration:

AxAddXSPTaglib AxKit::XSP::Blog

This particular taglib offers one subroutine, blog. It takes the address of a MySQL database table as a parameter and returns, in XML form, the latest entries of the addressed blog.

package AxKit::XSP::Blog;
use Apache::AxKit::Language::XSP::SimpleTaglib;
$NS = 'http://ns.it.bond.edu.au/xsp/blog/v1';

Since writing taglibs can be tricky, we made use of a wrapper, SimpleTaglib, as shown above. Note that in this process we specified the XML namespace within the Perl package. In the same package we implement the function blog as
sub blog : attribOrChild(youngest) ..... {
# fetch the latest blog entries from the database
# build a hash structure which mirrors the result XML structure
# return this, SimpleTaglib will create the XML data for us
}

The gain from this seemingly cumbersome procedure is that content structure and content semantics are cleanly separated. This drastically reduces the cost of evolving a site over a prolonged period of time.

Apache mod_perl

According to the Apache request cycle, requests pass through particular phases, some of which are handled by specific, installed Apache modules, one of them being mod_perl. mod_perl in turn allows Perl-based request handlers to be installed, which are usually activated in the content generation phase. Perl-based infrastructures such as Mason and AxKit use exactly this mechanism. To ensure that AxKit handles requests for XSP documents, you would include a set of directives like these in the configuration file:

<Files *.xsp>
  PerlHandler AxKit
  ...
</Files>

In other parts of the server we preferred to use HTML::Mason only. In some cases, though, we found it most convenient to first let our Mason components deal with the backend data handling. Only when all content has been collected do we generate the final output using AxKit. To stack both systems we simply use the Perl module Apache::Filter:

PerlModule Apache::Filter
<Files *.xsp>
  PerlHandler HTML::Mason AxKit
  ...
</Files>

Perl TM Package

To parse and access TM information we have been building appropriate Perl packages (Perl XTM, available on CPAN). The current, official implementation is quite dated and closely follows the data structure suggested by the XML syntax of Topic Maps (XTM). This has led to a bulky implementation and is also largely responsible for its sluggish performance.
To load a topic map, it can be 'tied' to an external resource which carries the content:

# reading a topic map description from an XML file
use XTM;
use XTM::XML;
my $tm = new XTM (tie => new XTM::XML (file => 'mymap.xtm'));

From then on, methods of the $tm object can be used to extract topics and associations based on their identifiers, using a simplified query language (like "where basename matches the regular expression"), or via high-level methods which find the complete neighborhood of a given topic:
$vortex = $tm->induced_vortex ('some-topic-id',
{
't_types' => [ 't_types' ],
't_instances' => [ 't_instances' ],
'a_instances' => [ 'a_instances', 0, 20 ],
'topic' => [ 'topic' ],
'roles' => [ 'role', 0, 10 ],
'members' => [ 'member' ],
},
);
The data structure returned contains all types of this topic, all instances (be they topics or associations),
the topic itself with all properties and all associations where the topic is a role or plays a role. The Mason
component responsible for displaying topic information only has to render this data structure into HTML.
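To illustrate what "rendering this data structure into HTML" might involve, here is a small sketch in plain Perl. The shape of the vortex is an assumption made purely for illustration (one key per requested aspect, each holding a list of identifiers); the structure actually returned by induced_vortex may well differ, and the helpers esc and render_vortex are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical shape of a vortex: one key per requested aspect, each
# holding a list of topic or association identifiers.
my $vortex = {
    t_types     => [ 'language' ],
    t_instances => [],
    members     => [ 'is-implemented-in', 'is-used-by' ],
};

# Minimal HTML escaping for text taken from the map.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Render every non-empty aspect as a heading followed by a list.
sub render_vortex {
    my ($v) = @_;
    my $html = '';
    for my $aspect (sort keys %$v) {
        my @ids = @{ $v->{$aspect} };
        next unless @ids;                 # skip empty aspects
        $html .= '<h3>' . esc($aspect) . "</h3>\n<ul>\n";
        $html .= '<li>' . esc($_) . "</li>\n" for @ids;
        $html .= "</ul>\n";
    }
    return $html;
}

print render_vortex($vortex);
```

In a Mason component the same loop would typically live in the HTML part, with the call to induced_vortex placed in the init section.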
A major design problem appeared with the introduction of topic map views. As detailed in [BaZaRevisit], we needed additional data structures to hold sequencing and rendering information. Unfortunately, this information is tightly connected with various subcomponents of the associated map, as in "that occurrence of that topic must be in position 4 and must be in italics". At that point we had no clear idea of how to cleanly separate the map from the view and decided to piggyback the view information onto the existing topic map data structures.