XPath
XPath (XML Path Language) is a terse (non-XML) syntax for addressing portions of an XML document.
Originally motivated by a desire to provide a common syntax and behavior model between XPointer and XSL, XPath has rapidly been adopted by developers as a small query language.
Notation
The most common kind of XPath expression (and the one which gave the language its name) is a path expression. A path expression is written as a sequence of steps to get from one XML node (the current 'context node') to another node or set of nodes. The steps are separated by "/" (i.e. path) characters. Each step has three components:
- Axis Specifier
- Node Test
- Predicate
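As a rough illustration of these three components, the sketch below splits a path written in XPath's full form, where each step reads axis::node-test[predicate], into (axis, node test, predicates) triples. The function name `split_steps` and the regular expression are illustrative assumptions, not part of any XPath library, and the naive split on "/" would break on predicates that themselves contain paths.

```python
import re

# One unabbreviated step: axis, "::", node test, then zero or more [predicates].
STEP = re.compile(
    r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\]]*\])*)$"
)


def split_steps(path):
    """Split a relative, unabbreviated path into (axis, node test, predicates)."""
    steps = []
    # Naive assumption: no "/" appears inside a predicate.
    for raw in path.split("/"):
        match = STEP.match(raw)
        if match is None:
            raise ValueError(f"not an unabbreviated step: {raw!r}")
        predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
        steps.append((match.group("axis"), match.group("test"), predicates))
    return steps


steps = split_steps("child::A/descendant-or-self::node()/child::B/child::*[1]")
for axis, test, predicates in steps:
    print(axis, test, predicates)
```

Only the last step in this example carries a predicate; the others have an axis and a node test alone.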
Abbreviated syntax
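The compact notation allows many defaults and abbreviations for common cases. The simplest XPath takes a form such as
/A/B/C
which selects C elements that are children of B elements that are children of the A element forming the outermost element of the XML document.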
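The step-by-step reading of such a path can be sketched in Python. The helper `walk_child_steps` and the sample document are hypothetical; the tree comes from the standard xml.etree.ElementTree module, whose root element here is A, so the relative path B/C from that root plays the role of /A/B/C.

```python
import xml.etree.ElementTree as ET


def walk_child_steps(context, path):
    """Follow a path of plain child steps such as 'B/C' from a context element."""
    nodes = [context]
    for name in path.split("/"):
        # Each step replaces the current node list with the matching children.
        nodes = [child for node in nodes for child in node if child.tag == name]
    return nodes


# Hypothetical sample document; the element names mirror the A/B/C example.
doc_a = ET.fromstring(
    "<A><B><C>1</C><C>2</C></B><B><C>3</C></B><D><C>4</C></D></A>"
)
c_texts = [c.text for c in walk_child_steps(doc_a, "B/C")]
print(c_texts)

# ElementTree's own (limited) path support should agree for this simple case.
builtin_texts = [c.text for c in doc_a.findall("B/C")]
```

The C element under D is not selected, because the second step only looks at children of B elements.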
More complex expressions can be constructed by specifying an axis other than the default 'child' axis, a node test other than a simple name, or predicates, which can be written in square brackets after any step. For example, the expression
A//B/*[1]
selects the first element ('[1]'), whatever its name ('*'), that is a child ('/') of a B element that itself is a child or other, deeper descendant ('//') of an A element that is a child of the current context node (the expression does not begin with a '/').
Expanded syntax
In the full, unabbreviated syntax, the two examples above would be written
/child::A/child::B/child::C
child::A/descendant-or-self::node()/child::B/child::*[1]
Here, in each step of the XPath, the axis (e.g. child or descendant-or-self) is explicitly specified, followed by :: and then the node test, such as A or node() in the examples above.
Axis specifiers
The Axis Specifier indicates navigation direction within the tree representation of the XML document. The axes available, in the full and then the abbreviated syntax, are:
- child
- default, does not need specifying in abbreviated syntax
- attribute
- @
- descendant
- not available in abbreviated syntax
- descendant-or-self
- //
- parent
- .. i.e. dot-dot
- ancestor
- not available in abbreviated syntax
- ancestor-or-self
- not available in abbreviated syntax
- following
- not available in abbreviated syntax
- preceding
- not available in abbreviated syntax
- following-sibling
- not available in abbreviated syntax
- preceding-sibling
- not available in abbreviated syntax
- self
- . i.e. dot
- namespace
- not available in abbreviated syntax
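A few of the abbreviated axes can be tried with Python's standard xml.etree.ElementTree module, which implements only a limited subset of XPath. The sample page below is hypothetical. ElementTree cannot return attribute nodes, so the attribute axis is approximated by reading the attribute from each matched element, and the [.='text'] predicate assumes Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page.
page = ET.fromstring(
    "<body>"
    "<h3>Intro</h3>"
    "<p><a href='x.html'>x</a></p>"
    "<h3>See also</h3>"
    "<div><a href='y.html'>y</a><a>no link</a></div>"
    "</body>"
)

# attribute axis (@): find a elements that have an href, then read its value.
hrefs = [a.get("href") for a in page.findall(".//a[@href]")]

# parent axis (..): the distinct parents of every a element.
parents = [p.tag for p in page.findall(".//a/..")]

# self axis (.) inside a predicate: h3 children whose text is 'See also'.
see_also = page.findall("h3[.='See also']")

print(hrefs, parents, [h.text for h in see_also])
```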
As an example of the attribute axis in abbreviated syntax, //a/@href selects an attribute called href in an a element anywhere in the document tree. The self axis is most commonly used within a predicate to refer to the currently selected node. For example, h3[.='See also'] selects an element called h3 in the current context, whose text content is See also.
Node tests
Node tests may consist of specific node names or more general expressions. In the case of an XML document in which the namespace prefix gs has been defined, //gs:enquiry will find all the enquiry nodes in that namespace. Other node test formats are:
- comment()
- finds an XML comment node, e.g. <!-- Comment -->
- text()
- finds a node of type text, e.g. the hello in <k>hello</k>
- processing-instruction()
- finds XML processing instructions such as <?php echo $a; ?>. In this case, processing-instruction('php') would match.
- node()
- finds any node at all.
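The node kinds behind these tests can be inspected with Python's standard xml.dom.minidom module, which keeps comments, text and processing instructions as separate nodes. This is only an approximation of the node tests, not an XPath engine; the sample documents, the `descendants` helper and the namespace URI http://example.com/gs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import Node, minidom


def descendants(node):
    """Yield every node below `node` in document order."""
    for child in node.childNodes:
        yield child
        yield from descendants(child)


# Hypothetical sample with one comment, one text node and one processing instruction.
dom = minidom.parseString("<k><!-- Comment -->hello<?php echo $a; ?></k>")
nodes = list(descendants(dom))  # roughly //node()
comments = [n.data for n in nodes if n.nodeType == Node.COMMENT_NODE]
texts = [n.data for n in nodes if n.nodeType == Node.TEXT_NODE]
php_pis = [
    n for n in nodes
    if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE and n.target == "php"
]

# A namespaced name test such as //gs:enquiry matches on the namespace URI,
# not on the prefix text: the document below uses the prefix "g".
GS = {"gs": "http://example.com/gs"}  # assumed URI
ns_doc = ET.fromstring(
    "<root xmlns:g='http://example.com/gs'>"
    "<g:enquiry/><enquiry/><box><g:enquiry/></box>"
    "</root>"
)
enquiries = ns_doc.findall(".//gs:enquiry", GS)

print(comments, texts, len(php_pis), len(enquiries))
```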
Predicates
Expressions of any complexity can be specified in square brackets; they must be satisfied before the preceding node will be matched by an XPath. Examples include //a[@href='help.php'], which will match an a element with an href attribute whose value is help.php. There is no limit to the number of predicates in a step, and they need not be confined to the last step in an XPath. They can also be nested to any depth. Paths specified in predicates begin at the context of the current step (i.e. that of the immediately preceding node test) and do not alter that context.
//a[@href='help.php'][../../div/@class='header']/@target will select the target attribute of an a element, provided the a element has an href attribute whose value is help.php, and provided the a element's grandparent has a div child (the a element's own parent or one of that parent's siblings) with a class attribute of value header.
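The two chained predicates can be imitated in Python. ElementTree handles the first predicate directly but not the second, so the sketch below checks it by hand, reading ../../div as "a div child of the a element's grandparent". The sample markup, the `parent_of` map and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first link satisfies both predicates.
site = ET.fromstring(
    "<body>"
    "<div class='header'><a href='help.php' target='top'>Help</a></div>"
    "<div class='footer'><p><a href='help.php' target='bottom'>Help</a></p></div>"
    "<div class='header'><a href='other.php' target='side'>Other</a></div>"
    "</body>"
)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in site.iter() for child in parent}


def grandparent_has_header_div(elem):
    """Second predicate: ../../div/@class = 'header'."""
    grandparent = parent_of.get(parent_of.get(elem))
    if grandparent is None:
        return False
    return any(div.get("class") == "header" for div in grandparent.findall("div"))


targets = [
    a.get("target")
    for a in site.findall(".//a[@href='help.php']")  # first predicate
    if grandparent_has_header_div(a) and a.get("target") is not None
]
print(targets)
```

Both predicates are evaluated relative to the a element, and neither changes which node the final /@target step starts from.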
Functions and operators
XPath 1.0 defines four data types: node-sets (sets of nodes with no intrinsic order), strings, numbers and booleans.
The available operators and functions are:
- The "/", "//" and "[...]" operators, used in path expressions, as described above.
- A union operator, "|", which forms the union of two node-sets.
- Boolean operators "and" and "or", and a function "not()"
- Arithmetic operators "+", "-", "*", "div" (divide), and "mod"
- Comparison operators "=", "!=", "<", ">", "<=", ">="
- Functions to manipulate strings: concat(), substring(), contains(), substring-before(), substring-after(), translate(), normalize-space(), string-length()
- Functions to manipulate numbers: sum(), round(), floor(), ceiling()
- Functions to get properties of nodes: name(), local-name(), namespace-uri()
- Functions to get information about the processing context: position(), last()
- Type conversion functions: string(), number(), boolean()
Node set functions
- position()
- returns a number representing the position of this node within the set of nodes matched by the XPath to this point.
- count(node-set)
- returns the number of nodes matched by the XPath in its argument.
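Position and count can be tried with ElementTree, which accepts a numeric index and last() as predicates on a named element. The sample order document is hypothetical, and count() is approximated with Python's len().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
order = ET.fromstring(
    "<order><item>pen</item><note>gift</note><item>ink</item><item>pad</item></order>"
)

items = order.findall("item")
item_count = len(items)                    # count(item)
second = order.findall("item[2]")          # item[position() = 2]
final = order.findall("item[last()]")      # item[last()]

# position() counts from 1 within the matched items, not among all children.
positions = [(pos, item.text) for pos, item in enumerate(items, start=1)]
print(item_count, second[0].text, final[0].text, positions)
```

The note element sits between the first two item elements but does not affect their positions, because it is not part of the matched set.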
String functions
- string(object?)
- converts any of the four XPath data types into a string according to built-in rules. The argument may be an XPath, in which case the first matched node (in document order) is converted into the returned string.
- concat(string, string, string*)
- concatenates any number of strings
- contains(s1, s2)
- returns true if s1 contains s2
- normalize-space(string?)
- all leading and trailing whitespace is removed and any sequences of whitespace characters are replaced by a single space. This is very useful when the original XML may have been pretty-printed, which could make further string processing unreliable.
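The behaviour of three of these functions can be approximated in plain Python. These are illustrative equivalents written for this article, not calls into an XPath library; whitespace is taken to mean space, tab, carriage return and line feed, as in XML.

```python
import re

XML_WHITESPACE = " \t\r\n"


def normalize_space(value):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", value.strip(XML_WHITESPACE))


def contains(s1, s2):
    """True if s1 contains s2."""
    return s2 in s1


def concat(first, second, *rest):
    """Concatenate two or more strings, mirroring concat(string, string, string*)."""
    return "".join((first, second) + rest)


print(normalize_space("  hello \n   world\t"))
print(contains("help.php", "php"), concat("a", "b", "c"))
```

Normalizing both sides before a comparison makes a test such as "is this heading equal to See also" independent of how the source XML was indented.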
Boolean functions
- not(boolean)
- negates any boolean expression.
Number functions
- sum(node-set)
- converts the string values of all the nodes found by the XPath argument into numbers, according to the built-in casting rules, then returns the sum of these numbers.
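A minimal Python sketch of this behaviour follows, assuming the XPath 1.0 casting rule that a string is a number only if it is an optional minus sign and decimal digits surrounded by optional whitespace, and NaN otherwise. The names `to_number` and `xpath_sum` and the sample data are hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET

# Optional whitespace, optional minus, digits with an optional fraction.
NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")


def to_number(text):
    """String-to-number cast: anything malformed becomes NaN."""
    return float(text) if NUMBER.fullmatch(text) else math.nan


def xpath_sum(nodes):
    """Sum the numeric values of the nodes' string values."""
    return sum(to_number("".join(node.itertext())) for node in nodes)


bill = ET.fromstring(
    "<bill><amount>1.5</amount><amount> 2 </amount><amount>10</amount></bill>"
)
total = xpath_sum(bill.findall("amount"))

broken = ET.fromstring("<bill><amount>1</amount><amount>n/a</amount></bill>")
bad_total = xpath_sum(broken.findall("amount"))
print(total, bad_total)
```

A single non-numeric node turns the whole sum into NaN, which is why input is often filtered with a predicate before summing.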
Expressions can be created inside predicates using the comparison operators =, !=, <=, <, >= and >. Boolean expressions may be combined with brackets () and the boolean operators and and or as well as the not() function described above. Numeric calculations can use *, +, -, div and mod. Strings can consist of any Unicode characters.
Inside or outside of predicates, entire node-sets can be combined ('unioned') using the pipe character |.
v[x or y] | w[z] will return a single node-set consisting of all the v elements that have x or y child-elements, as well as all the w elements that have z child-elements, that were found in the current context.
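ElementTree has no union operator, so the sketch below evaluates the two operands separately and merges them by hand, without duplicates and in document order. The sample document and the id attributes are hypothetical and serve only to label the results.

```python
import xml.etree.ElementTree as ET

# Hypothetical context element with a mix of v and w children.
ctx = ET.fromstring(
    "<ctx>"
    "<v id='1'><x/></v>"
    "<w id='2'/>"
    "<v id='3'/>"
    "<w id='4'><z/></w>"
    "<v id='5'><y/></v>"
    "</ctx>"
)


def has_child(elem, name):
    return elem.find(name) is not None


left = [v for v in ctx.findall("v") if has_child(v, "x") or has_child(v, "y")]
right = ctx.findall("w[z]")

# Union: each node once, listed in document order.
members = set(left) | set(right)
union_ids = [e.get("id") for e in ctx.iter() if e in members]
print(union_ids)
```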
//item[@price > 2*@discount] selects items whose price attribute is more than twice the numeric value of the discount attribute.
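The same numeric test can be written out in Python. The shop data and the helper name are hypothetical; a missing attribute is treated as making the comparison false, which is how XPath 1.0 treats a comparison involving an empty node-set or NaN.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
shop = ET.fromstring(
    "<shop>"
    "<item name='a' price='10' discount='4'/>"
    "<item name='b' price='10' discount='5'/>"
    "<item name='c' price='3' discount='2'/>"
    "<item name='d' price='9'/>"
    "</shop>"
)


def price_exceeds_twice_discount(item):
    price, discount = item.get("price"), item.get("discount")
    if price is None or discount is None:
        return False  # a comparison with a missing attribute is false
    return float(price) > 2 * float(discount)


selected = [i.get("name") for i in shop.iter("item") if price_exceeds_twice_discount(i)]
print(selected)
```

Item b, whose price is exactly twice its discount, is not selected because > is a strict comparison.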
XPath 2.0
XPath 1.0 was published as a W3C Recommendation on November 16, 1999. XPath 2.0 followed as a W3C Recommendation on January 23, 2007. XPath 2.0 represents a significant increase in the size and capability of the XPath language.
The most notable change is that XPath 2.0 has a much richer type system; XPath 2.0 supports atomic types, defined as built-in types in XML Schema, and may also import user-defined types from a schema. Every value is now a sequence (a single atomic value or node is regarded as a sequence of length one). XPath 1.0 node-sets are replaced by node sequences, which may be in any order.
To support richer type sets, XPath 2.0 offers a greatly expanded set of functions and operators.
XPath 2.0 is in fact a subset of XQuery 1.0. It offers a for expression which is a cut-down version of the "FLWOR" expressions in XQuery. It is possible to describe the language by listing the parts of XQuery that it leaves out: the main examples are the query prolog, element and attribute constructors, the remainder of the "FLWOR" syntax, and the typeswitch expression.
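As a loose analogy only, the XPath 2.0 for expression behaves much like a Python list comprehension, and the sequence model (flat sequences, with a single item equivalent to a sequence of length one) can be mimicked with a small helper. The function `as_sequence` is hypothetical, and Python lists stand in for XPath sequences.

```python
# XPath 2.0:  for $x in (1, 2, 3) return $x * 2
doubled = [x * 2 for x in (1, 2, 3)]


def as_sequence(value):
    """Treat a single item as a one-item sequence and flatten nested sequences."""
    if isinstance(value, (list, tuple)):
        flat = []
        for item in value:
            flat.extend(as_sequence(item))
        return flat
    return [value]


print(doubled)
print(as_sequence(5), as_sequence((1, (2, 3))))
```

In XPath 2.0 itself the expression (1, (2, 3)) is the same three-item sequence as (1, 2, 3); sequences never nest.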
Implementations
- Implementations for Database Engines
- OpenLink Virtuoso
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
