The Translation File

The file produced by NXtranslate is entirely determined by the contents of the translation file. This chapter describes the format of a translation file and lists the "location strings" for the external formats.

Overview

Translation files are written in XML and read using an XML parser, so each one must be a well-formed XML file. There are many places to find more information about XML: the W3C publishes the definitive standard, while Tellme Studio has a one page overview of what XML is. Being well-formed XML means that the following rules must be adhered to:
Every opening tag must have a corresponding closing tag at the same level. This means that properly nested tags are allowed, while tags that are closed at a different level, or never closed, are not.

Tags and attribute names are case sensitive. Therefore two tags whose names differ only in case are distinct tags. While this can lead to confusion when writing a translation file, it is easily avoided in practice.

Attribute values must be inside single (') or double (") quotes.

Tags and attribute names cannot start with a number or special character. Another way of saying this is that the name must start with a letter.

Certain characters will break the parsing of the XML. The characters, and how to create them, are < (&lt;), > (&gt;), & (&amp;), " (&quot;), and ' (&apos;).

Empty tags, meaning an opening tag immediately followed by its closing tag, can be replaced with a single self-closing tag. This convenience will make more sense during the discussion of translation files when specifying information outside of the file.

There are some other rules to note about the translation file. It is not simply an XML file; there are additional constraints. The translation file is not directly validated against these constraints, but failing to follow them will result in the program exiting early without creating a NeXus file. Also, NXtranslate is intended to be used to write any file readable by the NeXus API, so the translation file is not validated against definition files. This decision was made on the basis of performance, since it was determined that most of the time a "standard" translation file will be used to convert a large number of files. First, some definitions used throughout this document.
Translation file definitions

NAPI
An abbreviation for the NeXus Abstract Program Interface.

node
A point in the hierarchy. It can either contain other nodes (be a parent with children) or not (a leaf node). Any pair of opening and closing tags represents a single node.

group
A node that contains other nodes.

field
A node that does not contain other nodes (a leaf node). In other places in NeXus this is sometimes referred to as "data" or an "SDS".

retriever
An object whose job is to retrieve information from a source external to the translation file. Which retriever is created is determined by the value of &mime-type. The retriever is initialized using the value of &source. Information is produced by the retriever using the &location.

special attribute
An attribute that is interpreted by NXtranslate as a command to deal with external information. The special attributes are &mime-type, &source, &location, and &make-link.

&mime-type
A keyword that denotes what library to use to retrieve information from an external source. It can be a valid MIME type.

&source
A string denoting what a retriever should use to initialize itself. This is generally a file on the local system for the retriever to open.

&location
A string passed to the retriever for it to generate data from. For example, when using the NeXus retriever this is a path to a particular node in the file which will be written out to the resulting NeXus file.

&link-tag
This denotes a node that is a link to another node in the file. It must have a &make-link attribute. All other attributes will be ignored.

&make-link
The attribute denoting what a &link-tag node should be linked to. The syntax for describing the location is the same as for the NeXus retriever. If this attribute appears in a node other than &link-tag it will be treated as a normal attribute.

primitive type
Any of the following types (ignoring bit-length): NX_UINT (unsigned integer), NX_INT (signed integer), NX_FLOAT (floating point number), NX_CHAR (character), NX_BOOLEAN (boolean, or true/false), NX_BINARY (binary value). At the moment NX_BOOLEAN and NX_BINARY are not supported by NXtranslate, and the NeXus API supports only one dimensional arrays of NX_CHAR.
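The relationship between the three data-related special attributes in the definitions above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not NXtranslate's actual implementation: the names `LineRetriever`, `RETRIEVERS`, `create_retriever`, and `get_data` are hypothetical. The toy retriever is registered under `text/plain` and treats the location as a zero-based line number, mirroring what this chapter says later about the simple ASCII retriever.

```python
from typing import Callable, Dict


class LineRetriever:
    """Toy retriever (hypothetical): source is a text file, location a line number."""

    def __init__(self, source: str) -> None:
        # The "source" value initializes the retriever: here, a file to open.
        with open(source, encoding="utf-8") as handle:
            self._lines = handle.read().splitlines()

    def get_data(self, location: str) -> str:
        # The "location" value selects what to produce: a zero-based line number.
        return self._lines[int(location)]


# The "mime type" value decides which retriever gets created (hypothetical registry).
RETRIEVERS: Dict[str, Callable[[str], LineRetriever]] = {
    "text/plain": LineRetriever,
}


def create_retriever(mime_type: str, source: str) -> LineRetriever:
    """Look up the retriever for a mime type and initialize it with the source."""
    try:
        factory = RETRIEVERS[mime_type]
    except KeyError:
        raise ValueError(f"no retriever registered for {mime_type!r}") from None
    return factory(source)
```

In this picture, adding support for a new external format amounts to registering one more retriever under its own mime type, which matches the decoupling between retrievers that the chapter describes.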
Now that the definitions have been presented, the other constraints of a translation file can be explained.

The root node in a file will be NXroot. There will be nothing before or after it, and only one of them. The NXroot can be used to set global values for &mime-type and &source.

Only groups can exist directly inside the root. This is a constraint of the NeXus API.

Every node (except the NXroot and &link-tag) needs a name and type. If the node has a &location then the type can be omitted since the retriever will provide it.

Groups cannot have any attribute other than the special ones. Fields can have any attribute. This reflects a restriction in the NeXus API and does not constrain the contents of resulting NeXus files in any way.

Groups cannot have any data in them. In other words, placing values such as 1 2 3 4 directly inside a group is incorrect.

To specify the dimensions of a field, use square brackets [] after the type. A single precision floating point array with five elements would have type="NX_FLOAT32[5]". If the field has only one element, or is a character array, the dimensions can be left off. For character arrays, the dimensions are ignored.

To specify the type of an attribute, denote the primitive type separated from the value using square brackets. For numeric types only scalars are allowed. If no type is specified it is assumed to be a character array (length is determined automatically).

Simple Translation

While NXtranslate is the "anything to NeXus" translator, it is possible to have everything specified in the translation file. The following example shows a translation file where no information will be taken from any other file.

Simple translation file test_simple.xml
George User
text/plain
The data is a simple parabola, f(x)=x^2
0 1 2 3 4 5 6 7 8 9 10
0 1 4 9 16 25 36 49 64 81 100
George User
text/plain
The data is a two dimensional parabola,
f(x,y)=x^2+y^2
1.0 4.7 2.3 1.6
3.3 6.2 9.2
11.89 32.98 16.18
13.45 39.44 60.53
43.73 41.00 85.64
106.73 89.93 87.20
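As an aside on how values like those in test_simple.xml are read, the following Python sketch illustrates three conventions this chapter describes: string values are whitespace-normalized, numeric arrays are comma delimited lists in which brackets act as extra separators, and a field type may carry dimensions in square brackets. It is a minimal sketch, not NXtranslate's own code. The function names are hypothetical, and two details are assumptions rather than statements from this chapter: that whitespace around numbers is ignored, and that multiple dimensions in a type are comma separated.

```python
import re
from typing import List, Tuple


def normalize_string(raw: str) -> str:
    """Turn tabs, carriage returns and line feeds into spaces, collapse
    runs of whitespace to a single space, and trim both ends."""
    return re.sub(r"\s+", " ", raw).strip()


def parse_number_list(raw: str) -> List[float]:
    """Treat brackets as commas, then split on commas.

    Dropping empty tokens has the same effect as merging adjacent commas.
    Ignoring whitespace around each number is an assumption of this sketch.
    """
    text = raw.replace("[", ",").replace("]", ",")
    tokens = [token.strip() for token in text.split(",")]
    return [float(token) for token in tokens if token]


# A type name optionally followed by dimensions, e.g. NX_FLOAT32[5].
_TYPE_RE = re.compile(r"^([A-Za-z0-9_]+)(?:\[([0-9,\s]+)\])?$")


def parse_type(spec: str) -> Tuple[str, List[int]]:
    """Split a type such as 'NX_FLOAT32[5]' into its name and dimensions.

    Comma separated multiple dimensions are an assumption of this sketch.
    """
    match = _TYPE_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized type: {spec!r}")
    name, dims = match.groups()
    if dims is None:
        return name, []
    return name, [int(dim) for dim in dims.split(",")]
```

For instance, `parse_number_list("[0,1],[4,9]")` and `parse_number_list("0,1,4,9")` give the same flat list, which is why the brackets only matter to a human reader.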
The test_simple.xml example follows all of the rules laid out in the previous section and serves to introduce several of the features of the translation file. First, a style note: in XML files there is a concept of "ignorable whitespace". This consists of carriage returns, line feeds, tabs, and spaces. These are ignored (as suggested by the term "ignorable whitespace") and are present to help those looking at the raw XML see the node hierarchy.

The main purpose of test_simple.xml is to show how to specify information in a translation file. Line 4 demonstrates the method for strings. Here the name is author and the type is NX_CHAR. The length of the character array is determined from the actual string supplied rather than what is specified in the type attribute. The value is created by reading in the supplied string, converting tabs, carriage returns, and line feeds into a single space, turning any sections of multiple whitespace into a single space, then chopping off any whitespace at both ends of the string. This allows the person writing the file to add whitespace in strings as needed to make the raw XML easier to read, without changing what is written into the final NeXus file.

Next to look at is how arrays of numbers are specified. Lines 24-27 show both one and two dimensional arrays. The dimension of the array is specified with the type, as discussed above. The thing to notice here is that arrays of numbers are specified as comma delimited lists. The brackets in the list of values are "syntactic sugar". When the values are read in, NXtranslate converts the brackets into commas and then converts multiple adjacent commas into a single comma. The purpose of this is so translation file authors can more easily see each dimension of the array that they wrote. The brackets can also be removed altogether, as seen in line 24.

Translation from NeXus

Next is to show how to use NXtranslate to bring in information from external sources. The following example demonstrates various features of importing information from external sources, including modifying it before writing.

Translation from NeXus file test_nexus.xml
The functional form of the data
As suggested earlier, the root node (line 1) defines a &source and &mime-type to use for creating a retriever. Line 2 demonstrates that entire entries can be copied from one file to the next and that the name of a node can be changed, in this case from entry1 to entry_1D. Lines 4-7 show how to copy over an entire group and add a new field to it. For finer control over what is added, and the ability to change attributes, look at lines 9-12. Line 11 shows how to change the dimensions of the field by using the type attribute. Please note that this will not work for character arrays and that the total number of array items must remain constant. Also, the type itself cannot be changed (single precision float to double precision float, etc.). Since the dimensions of the f_x_y array change, it makes sense to change the axes for plotting. This is done in both lines 9 and 10 by specifying the attribute and its new value. To add another attribute, just specify it similarly. Line 11 demonstrates erasing the axes attribute: specify the attribute with an empty string as the value.

These two examples have shown the way to set up a translation file. You can import information from multiple files by declaring another &source and &mime-type. There are a couple of things to know about these as well. The default &mime-type is "application/x-NeXus", so it does not need to be specified. For each &source, whatever &mime-type was defined in the parent node will be used for the current &source. The contrived example below shows what, in principle, could be done with NXtranslate as more retrievers get written.

While retrievers that import information from MySQL and JPEG images would be nice, they do not currently exist.

A contrived example
George User
Anatomy of Links

The two nodes involved in a link are the source and the link. The source is the original version of the information; the link is the copy. There is no way to decipher which is the original and which is the copy without direct comparison of IDs using the NeXus API. Links can be to either a group or a field. Links to attributes are not supported by the NAPI. A link to a group and a link to a field are both shown in the "Two links" example. The first link is to a group whose name was group1, while the second link is to a field, array1.

Two links
Strings for Translation

The previous section discussed how to write a translation file and several of its features. This section explains in more detail the strings available for use in a translation file. The list should be considered incomplete, because there may exist retrievers that the authors have not been informed of. Also, by nature, the retrievers are quite decoupled, so the location strings for each retriever can be significantly different from the others.

NeXus

As seen earlier in this chapter, the &mime-type for NeXus files is application/x-NeXus. Similarly, the &location strings are as simple as possible. NeXus files are organized hierarchically, similar to the translation file. A good analogy is to compare a NeXus file to a file system where the groups are directories and the fields are files. Using this analogy, the &location strings are absolute paths to the directory or file to be copied. Since there are examples of NeXus location strings earlier in this chapter, there is only one other thing to mention: the path separator is a forward slash, "/".
Simple ASCII

The &mime-type for the simple ASCII retriever is text/plain. The functionality of the simple ASCII retriever is limited. This is to emphasize the methodology for building retrievers, rather than to build a general purpose one. All of the location strings are integers defining the line number to use. The first line of the file is zero.

SNS Histogram

The &mime-type for the SNS histogram retriever is application/x-SNS-histogram.

The &location is of the general form
[...,dim2,dim1][...,dimY,dimX]#{tag_name_1|operator_1}keyword_1{tag_name_2|operator_2}keyword_2...
Notice that the &location is divided into two parts, declaration and definition, separated by #. The declaration describes the dimensions of the retrieved data. The definition describes which information the data consists of. Both of these are described in greater detail below.

The declaration part, [...,dim2,dim1][...,dimY,dimX], consists of two lists, each surrounded by square brackets. The first pair of brackets contains the size of each dimension of the array to be returned, separated by commas, and the second pair contains the dimensions of the array to read from. The values are specified as positive integers. The current version of the retriever returns an array of the same size as the initial array, no matter the dimensions given between the first set of brackets.
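The split between declaration and definition can be sketched in Python as follows. This is a minimal illustration of the string layout described above, not the retriever's actual parsing code; the function names `split_location` and `_parse_dims` are hypothetical, and the definition is returned unparsed.

```python
import re
from typing import List, Tuple

# Two bracketed lists back to back: [returned dims][dims to read from].
_DECLARATION_RE = re.compile(r"^\[([^\]]*)\]\[([^\]]*)\]$")


def _parse_dims(text: str) -> List[int]:
    """Parse a comma separated list of positive integers."""
    dims = [int(part) for part in text.split(",")]
    if any(dim <= 0 for dim in dims):
        raise ValueError(f"dimensions must be positive: {text!r}")
    return dims


def split_location(location: str) -> Tuple[List[int], List[int], str]:
    """Split an SNS histogram location into its declaration and definition.

    Returns (dimensions to return, dimensions to read from, definition).
    The definition is an empty string when the location has no '#' part.
    """
    declaration, _, definition = location.partition("#")
    match = _DECLARATION_RE.match(declaration.strip())
    if match is None:
        raise ValueError(f"malformed declaration: {declaration!r}")
    return _parse_dims(match.group(1)), _parse_dims(match.group(2)), definition
```

Keeping the definition as an opaque string here is a deliberate simplification, since its tag names, operators, and keywords have their own grammar.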
The definition part, {tag_name_1|operator_1}keyword_1{tag_name_2|operator_2}..., is where the selection of the data to be transferred from the SNS histogram file is described. Each part of the definition consists of a tag_name and an operator separated by a vertical bar, "|". Multiple definitions can exist in a single &location, separated by keywords. If the definition is missing, then all of the available data will be retrieved.

The possible values for the tag_name are

pixelID
Select using unique pixel identifiers. Applicable for all detectors.

pixelX
Select using column numbers. Applicable for all area detectors.

pixelY
Select using row numbers. Applicable for all area detectors.

Tbin
Select using time channels. Applicable for all detectors.

The operator can be of one of two forms

loop(start,end,increment) is used to specify a series of identifiers that runs inclusively from start to end in steps of increment.

List of identifiers. The identifiers specify which data to include. The identifiers must be separated by commas.

The keyword is used to link various definitions together into unions and intersections. Keywords are entirely optional. Keywords that work on two definitions are left associative.
!
The logical "not" operator. This negates the definition following it. Must be placed just in front of the curly braces it is associated with.

()
Grouping operation. This can be used to clarify the order in which multiple keywords are applied. No grouping parentheses are allowed within the curly braces.

AND
The logical "and" operator. This generates the intersection of two definitions. This parameter is case sensitive.

OR
The logical "or" operator. This generates the union of two definitions. This parameter is case sensitive.

Examples

[150,256,167][304,256,167]#{pixelID|loop(1,38400,1)}
This retrieves the first 38400 pixel identifiers and puts the data into a 150x256x167 array where the 167 dimension changes the fastest. In this example, there are 167 time channels, 256 columns, and 150 rows. The data come from a binary file where they are stored as a 304x256x167 flat array.

[50,256,100][304,256,167]#{pixelID|loop(1,12800,1)}AND{Tbin|loop(1,100,1)}
This retrieves the intersection of the first 12800 pixel identifiers with the first 100 time channels, then places the data into a 50x256x100 array. One must keep in mind that if the array declared is of a different size than the data defined, an error will be generated.

[7,167][304,256,167]#{pixelX|45,53,60,61,62,34500,34501}
This retrieves a series of columns.

XML retriever

The &mime-type for the XML retriever is text/xml. The XML retriever is built on top of libxml2's document object model (DOM) parser. Because of this, the entire file that information is to be retrieved from is loaded into memory as a character array. The DOM API was chosen to allow for jumping around the source file without needing to parse its contents multiple times. The location string is formatted according to the following rules:
The location string for a field will look like a (unix) path. Each level of the hierarchy is separated by a forward slash, "/".

To specify the type, the path is preceded by a type name, separated from it by a colon, ":". The allowed names are "INT8,INT16,INT32,UINT8,UINT16,UINT32,FLOAT32,FLOAT64". If no name is specified it is (implicitly) a string. Therefore to get "the_answer" as a double precision float the location is "FLOAT64:/numbers/the_answer".

In the case where the field has a "type" attribute whose value is one of the types above, that type will be used rather than reading the value as a character array. Specifying the type in the location will override what is in the source file.

Arrays can be specified as part of the type, either as an attribute in the XML file or in the location string. To get a six element integer array use the location "/numbers/array" which points to a whitespace delimited list. Multiple dimensions are specified by using a comma delimited list in the square brackets (e.g. "INT16[3,2]:/numbers/array").

To get an attribute, specify it at the end of a path, separated by a hash symbol, "#". Therefore to get attr2 as a single precision float the location is "FLOAT32:/numbers#attr2".

This methodology does not allow for automatically detecting the type of an imported attribute (it will be read as a string), or differentiating two fields at the same level with the same tag name.
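The rules for XML retriever location strings can be summarized in a short Python sketch. It is an illustration of the string format only, not the retriever's libxml2-based implementation; the function name `parse_location` and the shape of the returned dictionary are hypothetical, and the sketch assumes every path begins with a forward slash, as in the examples above.

```python
import re
from typing import Dict, List, Optional, Union

ALLOWED_TYPES = {
    "INT8", "INT16", "INT32",
    "UINT8", "UINT16", "UINT32",
    "FLOAT32", "FLOAT64",
}

# Optional "TYPE[dims]:" prefix, then a path, then an optional "#attribute".
_LOCATION_RE = re.compile(
    r"^(?:(?P<type>[A-Z0-9]+)(?:\[(?P<dims>[0-9,\s]+)\])?:)?"
    r"(?P<path>/[^#]*)"
    r"(?:#(?P<attr>.+))?$"
)

Parsed = Dict[str, Union[Optional[str], List[int], List[str]]]


def parse_location(location: str) -> Parsed:
    """Break a location such as 'INT16[3,2]:/numbers/array' into its parts.

    A missing type is reported as None, which stands for the implicit string.
    """
    match = _LOCATION_RE.match(location)
    if match is None:
        raise ValueError(f"malformed location: {location!r}")
    type_name = match.group("type")
    if type_name is not None and type_name not in ALLOWED_TYPES:
        raise ValueError(f"unknown type name: {type_name!r}")
    dims_text = match.group("dims")
    dims = [int(dim) for dim in dims_text.split(",")] if dims_text else []
    path = [level for level in match.group("path").split("/") if level]
    return {
        "type": type_name,
        "dims": dims,
        "path": path,
        "attribute": match.group("attr"),
    }
```

Because the path is kept as a list of tag names, the limitation noted above is visible in the sketch too: two sibling fields with the same tag name would produce identical paths.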