XSH acts as a command interpreter. Individual commands must be separated with a semicolon. Each command may be followed by a pipeline redirection to capture the command's output. In the interactive shell, a backslash may be used at the end of a line to indicate that the command continues on the next line.
A pipeline redirection may be used either to feed the command's output to a UNIX command or to store it in an XSH string variable.
In the first case, the syntax is xsh-command | shell-command ; where xsh-command is any XSH command and shell-command is any command (or code) recognized by the default shell interpreter of the operating system (i.e. on UNIX systems by sh or csh, on Windows systems by cmd). Brackets may be used to join more shell commands (may depend on which shell is used).
Example 1. Count attributes of words containing the string foo in their name or value.
xsh> ls //words/@* | grep foo | wc
In order to store a command's output in a string variable, the pipeline redirection must take the form xsh-command |> $variable where xsh-command is any XSH command and $variable is any valid name for a string variable.
help command gives a list of all XSH commands.
help type gives a list of all argument types.
help followed by a command or type name gives more information on the particular command or argument type.
XSH is intended to query and manipulate XML and HTML documents. Use one of the open/open-*/create commands to load an XML or HTML document from a local file, external URL (such as http:// or ftp://), string or pipe. While loading, XSH parses and optionally validates (see validation and load-ext-dtd) the document. Parsed documents are stored in memory as DOM trees that can be navigated and manipulated much like a local filesystem.
Every opened document is associated with an identifier (id), which is a symbolic name for the document in XSH and can be used, for example, as a prefix of an xpath.
In the current version, XSH is only able to save documents locally. To store a document in any other location, use the ls command and a pipeline redirection to feed the XML representation of the document to an external program that is able to store it in a remote location.
Example 3. Store XSH document DOC on a remote machine using Secure Shell
xsh> ls DOC:/ | ssh my.remote.org 'cat > test.xml'
turn on backup file creation
use a catalog file during all parsing processes
clone a given document
close document (do not save it, though)
make a new document from a given XML fragment
display a list of open files
file name
identifier
turn off backup file creation
load an XML, HTML, or Docbook SGML document from a file, pipe or URI
load and insert XInclude sections
save a document as XML or HTML
make a given document the current one
process selected elements from an XML stream (EXPERIMENTAL)
set on/off changing current document to newly open/created files
With XSH, it is possible to browse document trees as if they were a local filesystem, except that XPath expressions are used instead of ordinary UNIX paths.
The current position in the document tree is called the current node. The current node's XPath may be queried with the pwd command. In the interactive shell, the current node is also displayed in the command-line prompt. Remember that, besides the cd command, the current node (and document) is silently changed by all variants of the open command, by the create command, and temporarily also by the node-list variant of the foreach statement.
Documents are specified in a similar way to hard drives on DOS/Windows(TM) systems (except that their names are not limited to one letter in XSH), i.e. by a prefix of the form doc: where doc is the id associated with the document.
To mimic the filesystem navigation as closely as possible, XSH contains several commands named by analogy of UNIX filesystem commands, such as cd, ls and pwd.
xsh scratch:/> open docA="testA.xml"
xsh docB:/> open docB="testB.xml"
xsh> pwd
docB:/
xsh docB:/> cd docA:/article/chapter[title='Conclusion']
xsh docA:/article/chapter[5]> pwd
docA:/article/chapter[5]
xsh docA:/article/chapter[5]> cd previous-sibling::chapter
xsh docA:/article/chapter[4]> cd ..
xsh docA:/article> select docB
xsh docB:/>
change current context node
mark elements to be folded by list command
show a given node location (as a canonical XPath)
list a given part of a document as XML
show current context node location (as a canonical XPath)
define XPath extension function (EXPERIMENTAL)
register namespace prefix to use XPath expressions
register XHTML namespace prefix to use XPath expressions
register XSH namespace prefix to use XPath expressions
make a given document the current one
unfold elements folded with fold command
undefine extension function (EXPERIMENTAL)
unregister namespace prefix
XPath expression
XSH provides mechanisms not only to browse and inspect the DOM tree but also to modify its content, by providing commands for copying, moving, and deleting its nodes as well as adding completely new nodes or XML fragments to it. It is quite easy to learn these commands since their names or aliases mimic their well-known filesystem analogies. On the other hand, many of these commands have two versions, one of which is prefixed with the letter "x". This "x" stands for "cross", thus e.g. xcopy should be read as "cross copy". Let's explain the difference using xcopy as an example.
When you copy, you have to specify what you are copying and where you are copying it to, so you have to specify the source and the target. XSH is very much XPath-based, so XPath is used here to specify both of them. However, there might be more than one node that satisfies an XPath expression. So, the rule of thumb is that the "cross" variant of a command places each and every source node at the location of each and every destination node, while the plain variant works one-by-one, placing the first source node at the first destination, the second source node at the second destination, and so on (as long as there are both source nodes and destinations left).
xsh> create a "<X><A/><Y/><A/></X>";
xsh> create b "<X><B/><C/><B/><C/><B/></X>";
xsh> xcopy a://A replace b://B;
xsh> copy b://C before a://A;
xsh> ls a:/;
<?xml version="1.0" encoding="utf-8"?>
<X><C/><A/><Y/><C/><A/></X>
xsh> ls b:/;
<?xml version="1.0" encoding="utf-8"?>
<X><A/><A/><C/><A/><A/><C/><A/><A/></X>
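The difference between the one-to-one and the "cross" mode can also be modelled outside XSH. The following is a minimal Python sketch, not XSH code and not XSH's implementation: the helper names place, plain_copy, cross_copy and compact are hypothetical, only the before and replace locations are modelled, and the standard xml.etree.ElementTree module stands in for the DOM tree. It reproduces the two documents listed in the session above.

```python
import copy
import xml.etree.ElementTree as ET


def place(root, sources, dest, location):
    """Copy `sources` next to `dest` (hypothetical helper, 'before'/'replace' only)."""
    # ElementTree nodes do not know their parent, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}[dest]
    for source in sources:
        # Inserting at dest's current index keeps the copies in source order.
        parent.insert(list(parent).index(dest), copy.deepcopy(source))
    if location == "replace":
        parent.remove(dest)


def plain_copy(sources, root, dests, location):
    # One-to-one mode: the n-th source goes to the n-th destination;
    # zip() stops as soon as either list runs out.
    for source, dest in zip(sources, dests):
        place(root, [source], dest, location)


def cross_copy(sources, root, dests, location):
    # All-to-every mode: every source is copied to every destination.
    for dest in dests:
        place(root, sources, dest, location)


def compact(root):
    # ElementTree serializes empty elements as "<A />"; drop the space.
    return ET.tostring(root, encoding="unicode").replace(" />", "/>")


a = ET.fromstring("<X><A/><Y/><A/></X>")
b = ET.fromstring("<X><B/><C/><B/><C/><B/></X>")

cross_copy(a.findall(".//A"), b, b.findall(".//B"), "replace")  # xcopy a://A replace b://B
plain_copy(b.findall(".//C"), a, a.findall(".//A"), "before")   # copy b://C before a://A

print(compact(a))
print(compact(b))
```

Because the node lists are collected before any change is made, the second call still sees the two original A elements of document a, which matches the order of the commands in the session.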
As already indicated by the example, another issue of tree modification is the way in which the destination node determines the target location. Should the source node be placed before, after, or into the resulting node? Should it replace it completely? This information has to be given in the location argument that usually precedes the destination XPath.
Now, what happens if source and destination nodes are of incompatible types? XSH tries to avoid this by implicitly converting between node types when necessary. For example, if a text, comment, or attribute node is copied into, before or after an attribute node, the original value of the attribute is replaced, prepended or appended respectively with the textual content of the source node. Note, however, that element nodes are never converted into text, attribute or any other textual node. There are many combinations here, so try it yourself and see the results.
You may even use a more sophisticated way to convert between node types, as shown in the following example, where an element is first commented out and then uncommented again. Note that the particular approach used for resurrecting the commented XML material works only for well-balanced chunks of XML.
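The attribute case mentioned above can be summarized in a few lines. This is a rough Python model, not XSH code; the function name copy_text_to_attribute is hypothetical, and the mapping of locations to effects is taken only from the sentence above (into replaces, before prepends, after appends).

```python
def copy_text_to_attribute(text, attribute_value, location):
    """Return the new attribute value after a textual node is copied to it.

    Hypothetical helper: models only the into/before/after cases
    described in the text, nothing else.
    """
    if location == "into":
        return text                      # original value is replaced
    if location == "before":
        return text + attribute_value    # source text is prepended
    if location == "after":
        return attribute_value + text    # source text is appended
    raise ValueError("location not modelled: " + location)


print(copy_text_to_attribute("Mr. ", "Baggins", "before"))
print(copy_text_to_attribute(" of Bag End", "Baggins", "after"))
print(copy_text_to_attribute("Took", "Baggins", "into"))
```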
Example 4. Using string variables to convert between different types of nodes
create doc <<EOF;
<?xml version='1.0'?>
<book>
<chapter>
<title>Intro</title>
</chapter>
<chapter>
<title>Rest</title>
</chapter>
</book>
EOF
# comment out the first chapter
ls //chapter[1] |> $chapter_xml;
add comment $chapter_xml replace //chapter[1];
ls / 0;
# OUTPUT:
<?xml version="1.0"?>
<book>
<!-- <chapter>
<title>Intro</title>
</chapter>
-->
<chapter>
<title>Rest</title>
</chapter>
</book>
# un-comment the chapter
$comment = string(//comment()[1]);
add chunk $comment replace //comment()[1];
ls / 0;
# OUTPUT:
<?xml version="1.0"?>
<book>
<chapter>
<title>Intro</title>
</chapter>
<chapter>
<title>Rest</title>
</chapter>
</book>
clone a given document
copy nodes (in the one-to-one mode)
string-like expression
create a node on a given target location
relative destination specification (such as after, before, etc.)
quickly modify node value/data using Perl code
move nodes (in the one-to-one mode)
node type specification (such as element, attribute, etc.)
normalizes adjacent textnodes
load and insert XInclude sections
remove given nodes
quickly rename nodes with in-line Perl code
set document's charset (encoding)
set document's standalone flag
strip leading and trailing whitespace
copy nodes (in the all-to-every mode)
create nodes on all target locations
move nodes (in the all-to-every mode)
XPath expression
transform document with XSLT
apply XUpdate commands on a document
What kind of scripting language would XSH be without conditional statements, loops, and other constructs that influence the way in which XSH commands are processed?
XSH's most notable feature in this area is that some of the basic flow control statements, namely if, unless, while and foreach, have two variants: an XPath-based one and a Perl-based one. The XPath-based variant uses xpath expressions to specify the condition or the node-lists to iterate over, while the other one uses perl-code for this purpose. See the descriptions of the individual statements for more detail.
call user-defined routine (macro)
single XSH command or a block of XSH commands
sub-routine (macro) declaration
exit XSH shell
loop iterating over a node-list or perl array
if statement
conditionally include another XSH source in current position
include another XSH source in current position
iterate a block over current subtree
immediately exit an enclosing loop
start the next iteration of an enclosing loop
restart an iteration on a previous node
restart the innermost enclosing loop block
return from a subroutine
switch into normal execution mode (quit test-mode)
process selected elements from an XML stream (EXPERIMENTAL)
do not execute any command, only check the syntax
throw an exception
try/catch statement
undefine sub-routine (macro)
negated if statement
simple while loop
Besides the possibility to browse the DOM tree and list some parts of it (as described in Navigation), XSH provides commands to obtain other information related to open documents as well as to the XSH interpreter itself. These commands are listed below.
calculate a given XPath expression and enumerate node-lists
list all user-defined routines (macros)
displays various information about a document
display a list of open files
show document's DTD
show document's original character encoding
on-line documentation
show a given node location (as a canonical XPath)
list a given part of a document as XML
List namespaces in current scope (or in scope of given nodes)
list current settings using XSH syntax
print given stuff on standard console output
show current context node location (as a canonical XPath)
check if the document is valid (according to a DTD, RelaxNG, or XSD schemas)
validate a document against a DTD, RelaxNG, or XSD schemas
display a list of defined variables
show version information
XSH commands accept different types of arguments, such as usual strings (expression) or XPath expressions. Notably, these two types and types based on them support string variable interpolation. See documentation of the individual types for more information.
single XSH command or a block of XSH commands
character encoding (codepage) identifier
string-like expression
file name
identifier
relative destination specification (such as after, before, etc.)
node type specification (such as element, attribute, etc.)
in-line code in Perl programming language
XPath expression
In the current version, XSH supports two types of variables: string (scalar) variables and node-list variables. Perl programmers that might miss some other kinds of variables (arrays or hashes) may use the support for interacting with Perl to access these types (see some examples below).
These two kinds of variables differ syntactically in the prefix: string variables are prefixed with a dollar sign ($) while node-list variables are prefixed with a percent sign (%).
Every string variable name consists of a dollar sign ($) prefix and an id, that has to be unique among other scalar variables, e.g. $variable. Values are assigned to variables either by simple assignments of the form $variable = xpath or by capturing the output of some command with a variable redirection of the form command |> $variable.
String variables may be used in string expressions, XPath expressions, or even in perl-code as $id or ${id}. In the first two cases, variables act as macros in the sense that all variable occurrences are replaced by the corresponding values before the expression itself is evaluated.
To display current value of a variable, use the print command, variables command or simply the variable name:
xsh> $b="chapter";
xsh> $file="${b}s.xml";
xsh> open f=$file;
xsh> ls //$b[count(descendant::para)>10]
xsh> print $b
chapter
xsh> $b
$b='chapter';
xsh> variables
$file='chapters.xml';
$b='chapter';
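The macro-like substitution described above can be illustrated with a short Python sketch (not XSH code). The function name interpolate is hypothetical, and the assumption that an id consists of word characters (\w+) is mine; the text does not say which characters XSH allows in an id.

```python
import re

# Matches ${id} or $id. Treating an id as a run of word characters
# is an assumption made for this sketch.
VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def interpolate(expression, variables):
    """Replace every $id / ${id} by its value before the expression is used."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        # Unknown names are left untouched in this model.
        return variables.get(name, match.group(0))
    return VARIABLE.sub(substitute, expression)


variables = {"b": "chapter"}
variables["file"] = interpolate("${b}s.xml", variables)

print(variables["file"])
print(interpolate("//$b[count(descendant::para)>10]", variables))
```

The ${id} form is what makes "${b}s.xml" work: written as "$bs.xml", the substitution would look for a variable named bs instead.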
Every node-list variable name consists of a percent sign (%) prefix and an id, that has to be unique among other node-list variables, e.g. %variable.
Node-list variables can be used to store lists of nodes that result from evaluating an XPath. This is especially useful when several changes are performed on some set of nodes and evaluating the XPath expression repeatedly would take too long. Another important use is to remember a node that would otherwise be extremely hard or even impossible to locate by XPath expressions after some changes to the tree structure are made, since such an XPath cannot be predicted in advance.
Although node-list variables act just like XPath expressions that would result in the same node-list, for implementation reasons it is not possible to use node-list variables as parts of complex XPath expressions except for one case: they may only be used at the very beginning of an XPath expression. So while constructions such as %creatures[4], %creatures[@race='elf'], or %creatures/parents/father do work as expected, string(%creatures[2]/@name), //creature[%creatures[2]/@name=@name], or %creatures[@race='elf'][2] do not. In the first two cases it is because node-list variables cannot be evaluated in the middle of an XPath expression. The third case fails because this construction actually translates into a sequence of evaluations of self::*[@race='elf'][2] for each node in the %creatures node-list, which is not equivalent to the intended expression, because the [2] filter is applied not to the whole result of %creatures[@race='elf'] at once but to each partial result.
Fortunately, it is usually possible to work around these unsupported constructions quite easily. This is typically done by introducing some more variables as well as using the foreach statement. The following example should provide some idea on how to do this:
# work around for $name=string(%creatures[2]/@name)
xsh> foreach %creatures[2] $name=string(@name)
# work around for ls //creature[%creatures[2]/@name=@name]
xsh> ls //creature[$name=@name]
# work around for ls %creatures[@race='elf'][2]
xsh> %elves = %creatures[@race='elf']
xsh> ls %elves[2]
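Why %creatures[@race='elf'][2] selects nothing while %elves[2] works can be seen in a small Python model (not XSH code). Plain dictionaries stand in for creature nodes, the names and races in the list are made-up sample data, and the helper names is_elf and second are hypothetical.

```python
# Made-up sample data standing in for the %creatures node-list.
creatures = [
    {"name": "Bilbo", "race": "hobbit"},
    {"name": "Legolas", "race": "elf"},
    {"name": "Gimli", "race": "dwarf"},
    {"name": "Elrond", "race": "elf"},
]


def is_elf(creature):
    return creature["race"] == "elf"


def second(nodes):
    """Model of an XPath [2] filter: the second item of a list, if any."""
    return nodes[1:2]


# Per-node evaluation: both filters are applied to each node on its own,
# like self::*[@race='elf'][2]. Every partial result has at most one
# node, so the [2] filter never matches.
per_node = []
for creature in creatures:
    partial = [c for c in [creature] if is_elf(c)]
    per_node.extend(second(partial))

# Workaround: store the filtered list first, then apply [2] to all of it.
elves = [c for c in creatures if is_elf(c)]
whole_list = second(elves)

print(per_node)
print([c["name"] for c in whole_list])
```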
Remember that when a node is deleted from a tree, it is at the same time removed from all node-lists it occurs in. Note also that, unlike string variables, node-list variables cannot be (and are not intended to be) directly accessed from Perl code.
All XSH string variables are usual Perl scalar variables from the XML::XSH::Map namespace, which is the default namespace for any Perl code evaluated from XSH. Thus it is possible to arbitrarily intermix XSH and Perl assignments:
xsh> ls //chapter[1]/title
<title>Introduction</title>
xsh> $a=string(//chapter[1]/title)
xsh> eval { $b="CHAPTER 1: ".uc($a); }
xsh> print $b
CHAPTER 1: INTRODUCTION
If needed, it is, however, possible to use any other type of Perl variable by evaluating the corresponding Perl code. The following example demonstrates using Perl hashes to collect and print some simple racial statistics about the population of Middle-Earth:
foreach a:/middle-earth/creature {
$race=string(@race);
eval { $races{$race}++ };
}
print "Middle-Earth Population (race/number of creatures)";
eval {
echo map "$_/$races{$_}\n",
sort { $a cmp $b } keys(%races);
};
variable assignment
string-like expression
identifier
temporarily assign new value to a variable
XPath expression
The following commands are used to modify the default behaviour of the XML parser or XSH itself. Some of the commands switch between two different modes according to a given expression (which is expected to result either in a zero or a non-zero value). Other commands that also work as a flip-flop have their own explicit counterpart (e.g. verbose and quiet or debug and nodebug). This inconsistency is due to historical reasons.
The encoding and query-encoding options allow you to specify the character encoding that should be expected from the user as well as the encoding to be used by XSH on output. This is particularly useful when you work with UTF-8 encoded documents on a console which supports only 8-bit characters.
The options command displays current settings by means of XSH commands. Thus it can be used not only to review current values, but also to store them for future use, e.g. in the ~/.xshrc file.
xsh> options | cat > ~/.xshrc
turn on backup file creation
display many annoying debugging messages
turn on/off serialization of empty tags
character encoding (codepage) identifier
choose output charset
turn on/off pretty-printing
turn on/off ignorable whitespace preservation
turn on/off external DTD fetching
turn off backup file creation
turn off debugging messages
list current settings using XSH syntax
turn on/off parser's ability to fill default attribute values
turn on/off parser's tendency to expand entities
turn on/off transparent XInclude insertion by parser
make the parser more pedantic
declare the charset of XSH source files and terminal input
turn off many XSH messages
turn on/off parser's ability to fix broken XML
define XPath extension function (EXPERIMENTAL)
register namespace prefix to use XPath expressions
register XHTML namespace prefix to use XPath expressions
register XSH namespace prefix to use XPath expressions
switch into normal execution mode (quit test-mode)
turn on/off serialization of DTD DOCTYPE declaration
set on/off changing current document to newly open/created files
do not execute any command, only check the syntax
undefine extension function (EXPERIMENTAL)
unregister namespace prefix
turn on/off validation in XML parser
make XSH print many messages
sets TAB completion for axes in xpath expressions in the interactive mode
turn on/off TAB completion for xpath expressions in the interactive mode
To allow more complex tasks to be achieved, XSH provides ways for interaction with the Perl programming language and the system shell.
Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, and complete). XSH itself is written in Perl, so it is extremely easy to support this language as an extension to XSH.
Perl expressions or blocks of code can either be simply evaluated with the perl command, used to do quick changes to nodes of the DOM tree (see map command), used to provide list of strings to iterate over in a foreach loop, or to specify more complex conditions for if, unless, and while statements.
To prevent conflict between XSH internals and the evaluated perl code, XSH runs such code in the context of a special namespace XML::XSH::Map. As described in the section Variables, XSH string variables may be accessed and possibly assigned from Perl code in the most obvious way, since they actually are Perl variables defined in the XML::XSH::Map namespace.
The interaction between XSH and Perl also works the other way round, so that you may call back XSH from the evaluated Perl code. For this, the Perl function xsh is defined in the XML::XSH::Map namespace. All parameters passed to this function are interpreted as XSH commands. To simplify evaluation of XPath expressions, another three functions are defined. The first one, named count, returns the same value as would be printed by the count command in XSH on the same XPath expression. The second function, named literal, returns the result of XPath evaluation as if the whole expression was wrapped with the XPath string() function. In other words, literal('doc:expression') returns the same value as count('doc:string(expression)'). The third function, named xml_list, returns the result of the XPath search as an XML string which is equivalent to the output of ls on the same XPath expression (without indentation and without folding or any other limitation on the depth of the listing).
In the following examples we use Perl to populate the Middle-Earth with Hobbits whose names are read from a text file called hobbits.txt, unless there are some Hobbits in Middle-Earth already.
XSH allows the user to extend the set of XPath functions by providing an extension function written in Perl. This can be achieved using the register-function command. The perl code implementing an extension function works as a usual perl routine accepting its arguments in @_ and returning the result. The following conventions are used:
The arguments passed to the perl implementation by the XPath engine are either simple scalars or XML::LibXML::NodeList objects, depending on the types of the XPath arguments. The implementation is responsible for checking the argument number and types. The implementation may use arbitrary XML::LibXML methods to process the arguments and return the result. (XML::LibXML perl module documentation can be found for example at http://search.cpan.org/author/PHISH/XML-LibXML-1.54/LibXML.pm).
The implementation SHOULD NOT, however, MODIFY the document. Doing so could not only confuse the XPath engine but also result in a critical error (such as a segmentation fault).
Calling XSH commands from extension function implementations is not currently allowed.
The perl code must return a single value, which can be of one of the following types: a simple scalar (a number or string), XML::LibXML::Boolean object reference (result is a boolean value), XML::LibXML::Literal object reference (result is a string), XML::LibXML::Number object reference (result is a float), XML::LibXML::Node (or derived) object reference (result is a nodeset consisting of a single node), or XML::LibXML::NodeList (result is a nodeset). For convenience, simple (non-blessed) array references consisting of XML::LibXML::Node objects can also be used for a nodeset result instead of a XML::LibXML::NodeList.
In the interactive mode, XSH interprets all lines starting with an exclamation mark (!) as shell commands and invokes the system shell to interpret them (this is to mimic FTP command-line interpreters).
xsh> !ls -l
-rw-rw-r-- 1 pajas pajas 6355 Mar 14 17:08 Artistic
drwxrwxr-x 2 pajas users 128 Sep 1 10:09 CVS
-rw-r--r-- 1 pajas pajas 14859 Aug 26 15:19 ChangeLog
-rw-r--r-- 1 pajas pajas 2220 Mar 14 17:03 INSTALL
-rw-r--r-- 1 pajas pajas 18009 Jul 15 17:35 LICENSE
-rw-rw-r-- 1 pajas pajas 417 May 9 15:16 MANIFEST
-rw-rw-r-- 1 pajas pajas 126 May 9 15:16 MANIFEST.SKIP
-rw-r--r-- 1 pajas pajas 20424 Sep 1 11:04 Makefile
-rw-r--r-- 1 pajas pajas 914 Aug 26 14:32 Makefile.PL
-rw-r--r-- 1 pajas pajas 1910 Mar 14 17:17 README
-rw-r--r-- 1 pajas pajas 438 Aug 27 13:51 TODO
drwxrwxr-x 5 pajas users 120 Jun 15 10:35 blib
drwxrwxr-x 3 pajas users 1160 Sep 1 10:09 examples
drwxrwxr-x 4 pajas users 96 Jun 15 10:35 lib
-rw-rw-r-- 1 pajas pajas 0 Sep 1 16:23 pm_to_blib
drwxrwxr-x 4 pajas users 584 Sep 1 21:18 src
drwxrwxr-x 3 pajas users 136 Sep 1 10:09 t
-rw-rw-r-- 1 pajas pajas 50 Jun 16 00:06 test
drwxrwxr-x 3 pajas users 496 Sep 1 20:18 tools
-rwxr-xr-x 1 pajas pajas 5104 Aug 30 17:08 xsh
To invoke a system shell command or program from the non-interactive mode or from a complex XSH construction, use the exec command.
Since UNIX shell commands are a very powerful tool for processing textual data, XSH supports direct redirection of XSH command output to a system shell command. This is very similar to the redirection known from UNIX shells, except that here, of course, the first command in the pipeline is an XSH command. Since the semicolon (;) is used in XSH to separate commands, it has to be prefixed with a backslash if it should be used for other purposes.
In the first two cases (where a dollar sign appears), store the result of evaluating the xpath in a variable named $id. In this case, xpath is evaluated in a similar way as in the case of count: if it results in a literal value, this value is used. If it results in a node-list, the number of nodes occurring in that node-list is used. Use the string() XPath function to obtain a literal value in these cases.
Example 10. Arithmetic expressions
xsh> $a=5*100
xsh> $a
$a=500
xsh> $a=(($a+5) div 10)
xsh> $a
$a=50.5
Example 11. Counting nodes
xsh> $a=//chapter
xsh> $a
$a=10
xsh> %chapters=//chapter
xsh> $a=%chapters
xsh> $a
$a=10
Example 12. Some caveats of counting node-lists
xsh> ls ./creature
<creature race='hobbit' name="Bilbo"/>
## WRONG (@name results in a singleton node-list) !!!
xsh> $name=@name
xsh> $name
$name=1
## CORRECT (use string() function)
xsh> $name=string(@name)
xsh> $name
$name=Bilbo
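The rule behind Examples 10 to 12 (a literal is stored as-is, a node-list is stored as its size) can be modelled in a few lines of Python. This is only a sketch of the described behaviour, not XSH code: the function names scalar_value and xpath_string are hypothetical, and a Python list of strings stands in for a node-list.

```python
def scalar_value(result):
    """Model of what `$id = xpath` stores, as described above."""
    if isinstance(result, list):
        # A node-list: the number of nodes is stored, not their content.
        return len(result)
    # A literal (string or number) is stored unchanged.
    return result


def xpath_string(result):
    """Rough stand-in for XPath string(): text of the first node, or ''."""
    if isinstance(result, list):
        return result[0] if result else ""
    return str(result)


name_attribute = ["Bilbo"]  # stand-in for the singleton node-list from @name

print(scalar_value(name_attribute))                # the WRONG case
print(scalar_value(xpath_string(name_attribute)))  # the CORRECT case
print(scalar_value((500 + 5) / 10))                # arithmetic yields a literal
```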
In the other two cases (where percent sign appears) find all nodes matching a given xpath and store the resulting node-list in the variable named %id. The variable may be later used instead of an XPath expression.
Enable creating backup files on save (default).
This command is equivalent to setting the $BACKUPS variable to 1.
call id [xpath | expression]*
Call an XSH subroutine named id previously created using def. If the subroutine requires some parameters, these have to be specified after the id. Node-list parameters are given by means of xpath expressions. String parameters have to be string expressions.
catalog expression
cd [xpath]
Change current context node (and current document) to the first node matching a given xpath argument.
close [id]
Copies nodes matching the first xpath to the destinations determined by the location directive relative to the second xpath. If more than one node matches the first xpath, then each is copied to the position relative to the corresponding node matched by the second xpath, according to the order in which the nodes are matched. Thus, the n'th node matching the first xpath is copied to the location relative to the n'th node matching the second xpath.
The possible values for location are after, before, into, and replace. The first three cause the source nodes to be copied after, before, or into (as the last child node) the destination nodes. If the replace location is used, the source node is copied before the destination node and the destination node is removed.
Some kind of type conversion is used when the types of the source and destination nodes are not equal. Thus, text, cdata, comment or processing instruction node data prepend, append or replace the value of a destination attribute when copied before, after/into or instead of (replace) an attribute, and vice versa.
Attributes may be copied after, before or into some other attribute to append, prepend or replace the destination attribute value. They may also replace the destination attribute completely (both its name and value). To copy an attribute from one element to another, simply copy the attribute node into the destination element.
Elements may be copied into other elements (which results in appending the child-list of the destination element), or before, after or instead (replace) other nodes of any type except attributes.
count xpath
create id expression
Create a new document using expression to form the root element and associate it with a given identifier.
xsh> create t1 root
xsh> ls /
<?xml version="1.0" encoding="utf-8"?>
<root/>
xsh> create t2 "<root id='r0'>Just a <b>test</b></root>"
xsh> ls /
<?xml version="1.0" encoding="utf-8"?>
<root id='r0'>Just a <b>test</b></root>
xsh> files
scratch = new_document.xml
t1 = new_document1.xml
t2 = new_document2.xml
Define a new XSH subroutine named id. The subroutine may require zero or more parameters of nodelist or string type. These are declared as a whitespace-separated list of (so called) parametric variables (of nodelist or string type). The body of the subroutine is specified as a command-block. Note that all subroutine declarations are processed during parsing and not at run-time, so it does not matter where the subroutine is defined.
The routine can be later invoked using the call command followed by the routine name and parameters. Nodelist parameters must be given as XPath expressions, and are evaluated just before the subroutine's body is executed. String parameters must be given as (string) expressions. The resulting node-lists/strings are stored into the parametric variables before the body is executed. These variables are local to the subroutine's call tree (see also the local command). If there is a global variable using the same name as some parametric variable, the original value of the global variable is replaced with the value of the parametric variable for the duration of the subroutine's run.
Note that a subroutine has to be declared before it is called with call. If you cannot do so, e.g. if you want to call a subroutine recursively, you have to pre-declare the subroutine using a def with no command-block. There may be only one full declaration (and possibly one pre-declaration) of a subroutine for one id, and the declaration and pre-declaration have to define the same number of arguments and their types must match.
def l3 %v {
ls %v 3; # list given nodes up to depth 3
}
call l3 //chapter;
Example 14. Commenting and un-commenting pieces of document
def comment
%n # nodes to move to comments
$mark # maybe some handy mark to recognize such comments
{
foreach %n {
if ( . = ../@* ) {
echo "Warning: attribute nodes are not supported!";
} else {
echo "Commenting out:";
ls .;
local $node = "";
ls . |> $node;
add comment "$mark$node" replace .;
}
}
}
def uncomment %n $mark {
foreach %n {
if (. = ../comment()) { # is this node a comment node
local $string = substring-after(.,"$mark");
add chunk $string replace .;
} else {
echo "Warning: Ignoring non-comment node:";
ls . 0;
}
}
}
# comment out all chapters with no paragraphs
call comment //chapter[not(para)] "COMMENT-NOPARA";
# uncomment all comments (may not always be valid!)
$mark="COMMENT-NOPARA";
call uncomment //comment()[starts-with(.,"$mark")] $mark;
doc-info [expression]
In the present implementation, this command displays information provided in the <?xml ...?> declaration of a document: version, encoding, standalone, plus information about level of gzip compression of the original XML file.
dtd [id]
Print external or internal DTD for a given document. If no document identifier is given, the current document is used.
empty-tags expression
If the value of expression is 1 (non-zero), empty tags are serialized as a start-tag/end-tag pair (<foo></foo>). Otherwise, they are compacted into a short-tag form (<foo/>). This option affects both ls and save and possibly other commands. Default value is 0.
This command is equivalent to setting the $EMPTY_TAGS variable.
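The two serialization forms can be compared outside XSH as well. The sketch below uses Python's standard xml.etree.ElementTree module, whose short_empty_elements flag plays a role analogous to (but inverted with respect to) the empty-tags option; it is an analogy only and says nothing about how XSH implements the option. Note that ElementTree writes the short form with a space before the slash.

```python
import xml.etree.ElementTree as ET

foo = ET.Element("foo")

# Analogue of `empty-tags 1`: start-tag/end-tag pair.
expanded = ET.tostring(foo, encoding="unicode", short_empty_elements=False)

# Analogue of `empty-tags 0`: short-tag form.
short = ET.tostring(foo, encoding="unicode", short_empty_elements=True)

print(expanded)  # <foo></foo>
print(short)     # <foo />
```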
enc [id]
Print the original document encoding string. If no document identifier is given, the current document is used.
encoding enc-string
exec expression [expression ...]
Execute the system command(s) given by the expressions.
exit [expression]
fold xpath [expression]
This feature is still EXPERIMENTAL! The fold command may be used to mark elements matching the xpath with an xsh:fold attribute from the http://xsh.sourceforge.net/xsh/ namespace. When listing the DOM tree using ls xpath fold, elements marked in this way are folded to the depth given by the expression (default depth is 0 = fold immediately).
xsh> fold //chapter 1
xsh> ls //chapter[1] fold
<chapter id="intro" xsh:fold="1">
  <title>...</title>
  <para>...</para>
  <para>...</para>
</chapter>
foreach xpath|perl-code command|command-block
If the first argument is an xpath expression, execute the command-block for each node matching the expression making it temporarily the current node, so that all relative XPath expressions are evaluated in its context.
If the first argument is a perl-code, it is evaluated and the resulting perl-list is iterated setting the variable $__ (note that there are two underscores!) to be each element of the list in turn. It works much like perl's foreach, except that the variable used consists of two underscores.
help command|argument-type
if xpath|perl-code command-block [ elsif command-block ]* [ else command-block ]
Execute command-block if a given xpath or perl-code expression evaluates to a non-empty node-list, true boolean-value, non-zero number or non-empty literal. If the first test fails, check any following elsif conditions in order and execute the command-block of the first one that is true. If none of them succeeds, execute the else command-block (if any).
Example 18. Display node type
def node_type %n {
foreach (%n) {
if ( . = self::* ) { # XPath trick to check if . is an element
echo 'element';
} elsif ( . = ../@* ) { # XPath trick to check if . is an attribute
echo 'attribute';
} elsif ( . = ../processing-instruction() ) {
echo 'pi';
} elsif ( . = ../text() ) {
echo 'text';
} elsif ( . = ../comment() ) {
echo 'comment'
} else { # well, this should not happen, but anyway, ...
echo 'unknown-type';
}
}
}
ifinclude filename
Include a file named filename and execute all XSH commands therein unless the file was already included using either include or ifinclude.
indent expression
If the value of expression is 1, format the XML output while saving a document by adding some nice ignorable whitespace. If the value is 2 (or higher), XSH will act as in case of 1, plus it will add a leading and a trailing linebreak to each text node.
Note that since the underlying C library (libxml2) uses a hardcoded indentation of 2 space characters per indentation level, the amount of whitespace used for indentation cannot be altered at runtime.
This command is equivalent to setting the $INDENT variable.
insert node-type expression [namespace expression] location xpath
iterate xpath command-block
Iterate works very much like the XPath variant of foreach, except that iterate evaluates the command-block as soon as a new node matching a given xpath is found. As a limitation, the xpath expression used with iterate may only consist of one XPath step, i.e. it cannot contain an XPath step separator /.
What are the benefits of iterate over a foreach loop, then? Under some circumstances the benefit is efficiency; under others there is none. To clarify this, we have to dive a bit deeper into the details of the XPath implementation. By definition, the node-list resulting from the evaluation of an XPath expression has to be ordered in the canonical document order. That means that an XPath implementation must contain some kind of sorting algorithm. This would not in itself be much trouble if the relative document order of two nodes of a DOM tree could be determined in constant time. Unfortunately, the libxml2 library used behind XSH does not implement mechanisms that would allow this (which is, however, a quite natural and reasonable approach if all the consequences are considered). Thus, when comparing two nodes, libxml2 traverses the tree to find their nearest common ancestor and at that point determines the relative order of the two subtrees by seeking one of them in the list of right siblings of the other. This of course cannot be done in constant time. As a result, the sorting algorithm, reasonably efficient for constant-time comparison (polynomial of a degree < 1.5) or small node-lists, becomes rather unusable for huge node-lists with linear-time comparison (still polynomial, but of a degree > 2).
The iterate command provides a way to avoid sorting the resulting node-list by limiting the allowed XPath expression to one step (and thus one axis) at a time. On the other hand, since iterate is implemented in Perl, a proxy object gluing the C and Perl layers has to be created for every node the iterator passes by. This (plus some extra subroutine calls) makes it about two to three times slower compared to a similar tree-traversing algorithm used by libxml2 itself during XPath evaluation.
Our experience shows that iterate beats foreach in performance on large node-lists (>=1500 nodes, but your mileage may vary) while foreach wins on smaller node-lists.
The following two examples give equivalent results. However, the one using iterate may be faster, especially if the number of nodes being counted is very large.
Example 19. Count inhabitants of the kingdom of Rohan in productive age
cd rohan/inhabitants;
iterate child::*[@age>=18 and @age<60] { perl $productive++ };
echo "$productive inhabitants in productive age";
Example 20. Using XPath
$productive=count(rohan/inhabitants/*[@age>=18 and @age<60]);
echo "$productive inhabitants in productive age";
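The difference between handling nodes as they are found and building a complete node-list first can be sketched outside XSH as well. The Python fragment below is only an analogy, assuming a small made-up sample document and a hypothetical helper in_productive_age. It uses the standard xml.etree.ElementTree module and does not reproduce the libxml2 sorting cost discussed above; it merely shows that a single step along the child axis visits nodes in document order without any sorting.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration only.
XML = """<rohan><inhabitants>
  <person name="a" age="17"/>
  <person name="b" age="18"/>
  <person name="c" age="42"/>
  <person name="d" age="60"/>
</inhabitants></rohan>"""

def in_productive_age(node):
    """Same condition as the XPath predicate [@age>=18 and @age<60]."""
    return 18 <= int(node.get("age", "-1")) < 60

root = ET.fromstring(XML)
inhabitants = root.find("inhabitants")

# iterate-like: one step along the child axis; each node is processed
# as soon as it is met, so no node-list is built or sorted.
streamed = 0
for child in inhabitants:
    if in_productive_age(child):
        streamed += 1

# foreach-like: the whole node-list is collected first, then counted.
collected = [c for c in inhabitants if in_productive_age(c)]

print(streamed, "inhabitants in productive age")
```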
Use e.g. | time cut pipeline redirection to benchmark an XSH command on a UNIX system.
keep_blanks expression
last [expression]
The last command is like the break statement in C (as used in loops); it immediately exits an enclosing loop. The optional expression argument may evaluate to a positive integer number that indicates which level of the nested loops to quit. If this argument is omitted, it defaults to 1, i.e. the innermost loop.
Using this command outside a loop causes an immediate run-time error.
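The level argument has no direct counterpart in many languages. As a rough illustration of the semantics described above, and not of how XSH implements them, the following Python sketch emulates leaving several nested loops at once with an exception. The names Last, loop, outer_body and inner_body are hypothetical.

```python
class Last(Exception):
    """Signal to leave `levels` enclosing loops (hypothetical helper)."""
    def __init__(self, levels=1):
        super().__init__(levels)
        self.levels = levels

def loop(items, body):
    """Run body for each item, honouring Last signals raised by the body."""
    for item in items:
        try:
            body(item)
        except Last as signal:
            if signal.levels > 1:
                # Pass the signal on to the next enclosing loop.
                raise Last(signal.levels - 1)
            break

visited = []

def outer_body(i):
    def inner_body(j):
        if (i, j) == (1, "b"):
            raise Last(2)  # leave both the inner and the outer loop
        visited.append((i, j))
    loop("abc", inner_body)

loop([0, 1, 2], outer_body)
print(visited)
```

In this sketch, a Last raised outside any loop would simply propagate as an uncaught exception, which corresponds to the run-time error mentioned above.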
lcd expression
Changes the filesystem working directory to expression, if possible. If expression is omitted, changes to the directory specified in the HOME environment variable, if set; if not, changes to the directory specified by the LOGDIR environment variable.
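The fallback order can be written down as a small function. This is a Python sketch of the rule as stated above, not XSH code; the function name lcd_target and its env parameter are hypothetical, and treating an empty variable as unset is an assumption of the sketch.

```python
import os

def lcd_target(expression=None, env=None):
    """Return the directory lcd would try to change to, or None.

    Hypothetical helper: an explicit expression wins, then HOME,
    then LOGDIR. Empty values are treated as unset (an assumption).
    """
    env = os.environ if env is None else env
    if expression:
        return expression
    return env.get("HOME") or env.get("LOGDIR")

print(lcd_target("/tmp", {}))
print(lcd_target(None, {"HOME": "/home/eowyn", "LOGDIR": "/var/log"}))
print(lcd_target(None, {"LOGDIR": "/var/log"}))
```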
load_ext_dtd expression
The local command acts in a very similar way as assign does, except that the variable assignment is done temporarily and lasts only for the rest of the nearest enclosing command-block. At the end of the enclosing block or subroutine the original value is restored. This command may also be used without the assignment part, and assignments may be done later using the usual assign command.
Note that the variable itself is not lexically scoped; it is still global in the sense that it is visible to any subroutine called subsequently from within the same block. A local just gives temporary values to global (meaning package) variables. Unlike Perl's my declarations, it does not create a local variable. This is known as dynamic scoping. Lexical scoping is not implemented in XSH.
To sum up for Perl programmers: local in XSH works exactly the same as local in Perl.
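For readers less familiar with Perl, the behaviour can be mimicked in Python. The sketch below is an analogy under stated assumptions, not XSH's implementation: a dictionary named variables stands in for the global variables, and the context manager local and the function show_depth are hypothetical names.

```python
from contextlib import contextmanager

variables = {"depth": 1}  # stand-in for global XSH variables

@contextmanager
def local(name, value):
    """Temporarily assign variables[name]; restore the old value on exit."""
    missing = object()
    saved = variables.get(name, missing)
    variables[name] = value
    try:
        yield
    finally:
        if saved is missing:
            del variables[name]
        else:
            variables[name] = saved

def show_depth():
    # A subroutine called from within the block sees the temporary value,
    # because the variable is still global (dynamic scoping).
    return variables["depth"]

with local("depth", 3):
    inside = show_depth()
after = show_depth()
print(inside, after)
```

The called function sees the temporary value inside the block and the original value once the block has ended.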
ls xpath [expression]
List the XML representation of all nodes matching xpath. The optional expression argument may be provided to specify the depth of XML tree listing. If negative, the tree will be listed to unlimited depth. If the expression results in the word fold, elements marked with the fold command are folded, i.e. listed only to a certain depth (this feature is still EXPERIMENTAL!).
Unless in quiet mode, this command also prints the number of nodes matched to stderr.
If the xpath parameter is omitted, current context node is listed to the depth of 1.
The map command provides an easy way to modify a node's data (content) using arbitrary Perl code.
Each of the nodes matching xpath passes its data to the perl-code via the $_ variable and receives the (possibly) modified data using the same variable.
Since element nodes do not really have any proper content (they are only a storage for other nodes), node's name (tag) is used in case of elements. Note, however, that recent versions of XSH provide a special command rename with a very similar syntax to map, that should be used for renaming element, attribute, and processing instruction nodes.
The move command acts exactly like copy, except that it removes the source nodes after a successful copy. Remember that the moved nodes are actually different nodes from the original ones (which may not be obvious when moving nodes within a single document into locations that do not require type conversion). So, after the move, the original nodes exist neither in the document itself nor in any node-list variable.
See copy for more details on how the copies of the moved nodes are created.
namespaces [xpath]
next [expression]
The next command is like the continue statement in C; it starts the next iteration of an enclosing loop. The optional expression argument may evaluate to a positive integer number that indicates which level of the nested loops should be restarted. If omitted, it defaults to 1, i.e. the innermost loop.
Using this command outside a loop causes an immediate run-time error.
Disable creating backup files on save.
This command is equivalent to setting the $BACKUPS variable to 0.
normalize xpath
normalize puts all text nodes in the full depth of the sub-tree underneath each node selected by a given xpath, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes.
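The effect of this kind of normalization can be observed with any DOM implementation. The following Python sketch uses the standard xml.dom.minidom module as a stand-in (XSH itself does not use minidom); the element name para and the text pieces are made up for the illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<para/>")
para = doc.documentElement

# Build three adjacent text nodes, one of them empty.
for piece in ("Hello, ", "", "world"):
    para.appendChild(doc.createTextNode(piece))

before = len(para.childNodes)   # adjacent and empty text nodes present
para.normalize()                # merge adjacent text nodes, drop empty ones
after = len(para.childNodes)
text = para.firstChild.data

print(before, after, text)
```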
[open [HTML|XML|DOCBOOK] [FILE|PIPE|STRING]] id=expression
Load a new XML, HTML or SGML DOCBOOK document from the file (actually arbitrary URL), command output or string provided by the expression. In XSH the document is given a symbolic name id. To identify the document in commands like close, save, validate, dtd