The Translation File

The file produced by NXtranslate is entirely determined by the contents of the translation file. This chapter describes the format of a translation file and lists the "location strings" for the external formats.

Overview

Translation files are written in XML and read using an XML parser, so each one must be a well-formed XML file. There are many places to find more information about XML: the W3C specification is the definitive standard, while Tellme Studio has a one-page overview of what XML is. Being well-formed means that the following rules must be adhered to:

Every opening tag must have a corresponding closing tag at the same level. This means that, for example, <a><b></b></a> is allowed while <a><b></a></b> and <a><b></a> are not.
Tag and attribute names are case sensitive. Therefore, for example, <tag> and <Tag> are distinct tags. While this can lead to confusion when writing a translation file, it is easily avoided in practice.
Attribute values must be inside single (') or double (") quotes.
Tag and attribute names cannot start with a number or special character. Another way of saying this is that the name must start with a letter (XML also permits a leading underscore).
Certain characters will break the parsing of the XML. The characters, and how to create them, are < (&lt;), > (&gt;), & (&amp;), " (&quot;), and ' (&apos;).
Empty tags, such as <tag></tag>, can be replaced with a single tag, <tag/>. This convenience will make more sense during the discussion of translation files when specifying information outside of the file.

There are some other rules to note about the translation file. It is not simply an XML file; there are additional constraints. The translation file is not directly validated against these constraints, but failing to follow them will result in the program exiting early without creating a NeXus file. Also, NXtranslate is intended to be used to write any file readable by the NeXus API, so the translation file is not validated against definition files.
This decision was made on the basis of performance, since most of the time a "standard" translation file will be used to convert a large number of files.

First, some definitions used throughout this document.

Translation file definitions

NAPI: An abbreviation for the NeXus Abstract Program Interface.
node: A point in the hierarchy. It can either contain other nodes (be a parent with children) or not (a leaf node). Any pair of opening and closing tags represents a single node.
group: A node that contains other nodes.
field: A node that does not contain other nodes (a leaf node). In other places in NeXus this is sometimes referred to as "data" or an "SDS".
retriever: An object whose job is to retrieve information from a source external to the translation file. Which retriever is created is determined by the value of &mime-type. The retriever is initialized using the value of &source. Information is produced by the retriever using the &location.
special attribute: An attribute that is interpreted by NXtranslate as a command to deal with external information. The special attributes are &mime-type, &source, &location, and &make-link.
&mime-type: A keyword that denotes what library to use to retrieve information from an external source. It can be a valid mime type.
&source: A string denoting what a retriever should use to initialize itself. This is generally a file on the local system for the retriever to open.
&location: A string passed to the retriever for it to generate data from. For example, when using the NeXus retriever this is a path to a particular node in the file which will be written out to the resulting NeXus file.
&link-tag: This denotes a node that is a link to another node in the file. It must have a &make-link attribute. All other attributes will be ignored.
&make-link: The attribute denoting what a &link-tag node should be linked to. The syntax for describing location is the same as for the NeXus retriever.
If this attribute appears in a node other than &link-tag it will be treated as a normal attribute.
primitive type: Any of the following types (ignoring bit-length): NX_UINT (unsigned integer), NX_INT (signed integer), NX_FLOAT (floating point number), NX_CHAR (character), NX_BOOLEAN (boolean, or true/false), NX_BINARY (binary value). At the moment NX_BOOLEAN and NX_BINARY are not supported by NXtranslate, and the NeXus API supports only one-dimensional arrays of NX_CHAR.

Now that the definitions have been presented, the other constraints of a translation file can be explained.

The root node in a file will be <NXroot>. There will be nothing before or after it, and only one of them. The NXroot can be used to set global values for &mime-type and &source.
Only groups can exist directly inside the root. This is a constraint of the NeXus API.
Every node (except the NXroot and &link-tag) needs a name and type. If the node has a &location then the type can be omitted since the retriever will provide it.
Groups cannot have any attribute other than the special ones. Fields can have any attribute. This reflects a restriction in the NeXus API and does not constrain the contents of resulting NeXus files in any way.
Groups cannot have any data in them. In other words, a group node that directly contains values such as 1 2 3 4 is incorrect.
To specify the dimensions of a field, use square brackets [] after the type. A single precision floating point array with five elements would have type="NX_FLOAT32[5]". If the field has only one element, or is a character array, the dimensions can be left off. For character arrays, the dimensions are ignored.
To specify the type of an attribute, denote the primitive type separated from the value using square brackets. For numeric types only scalars are allowed. If no type is specified it is assumed to be a character array (length is determined automatically).
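The dimension syntax for field types can be illustrated with a short Python sketch. This is not NXtranslate's own parser: the function name parse_type is hypothetical, and the comma-separated form for several dimensions (for example NX_FLOAT64[3,2]) is an assumption that goes beyond the single NX_FLOAT32[5] case given above.

```python
import re

# Hypothetical helper, not part of NXtranslate.
# Matches a base type such as NX_FLOAT32, optionally followed by [d1,d2,...].
_TYPE_RE = re.compile(r"^(NX_[A-Z]+\d*)(?:\[([0-9,\s]+)\])?$")


def parse_type(type_attr):
    """Split a type attribute into (base_type, dims).

    dims is a list of integers, [1] when no brackets are given, and
    None for NX_CHAR because its dimensions are ignored.
    """
    match = _TYPE_RE.match(type_attr.strip())
    if match is None:
        raise ValueError(f"unrecognized type: {type_attr!r}")
    base, dims_text = match.groups()
    if dims_text is None:
        dims = [1]  # a field with a single element may omit the brackets
    else:
        # Assumption: several dimensions are separated by commas.
        dims = [int(d) for d in dims_text.split(",")]
    if base == "NX_CHAR":
        dims = None  # length comes from the string itself
    return base, dims


if __name__ == "__main__":
    print(parse_type("NX_FLOAT32[5]"))
    print(parse_type("NX_CHAR[80]"))
```

Returning None for NX_CHAR mirrors the rule that the string length, not the declared dimension, decides how much is written.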
Simple Translation

While NXtranslate is the anything-to-NeXus translator, it is possible to have everything specified in the translation file. The example below shows a translation file where no information is taken from any other file.

Simple translation file test_simple.xml
[The markup of this listing is missing; only its text content remains:]
George User
text/plain
The data is a simple parabola, f(x)=x^2
0 1 2 3 4 5 6 7 8 9 10
0 1 4 9 16 25 36 49 64 81 100
George User
text/plain
The data is a two dimensional parabola, f(x,y)=x^2+y^2
1.0 4.7 2.3 1.6
3.3 6.2 9.2
11.89 32.98 16.18 13.45 39.44 60.53 43.73 41.00 85.64 106.73 89.93 87.20

This example follows all of the rules laid out in the previous section and serves to introduce several of the features of the translation file. First, a style note: XML files have a concept of "ignorable whitespace". This consists of carriage returns, line feeds, tabs, and spaces. These are ignored (as the term suggests) and are present to help those looking at the raw XML see the node hierarchy.

The main purpose of this example is to show how to specify information in a translation file. Line 4 of the listing demonstrates the method for strings. Here the name is author and the type is NX_CHAR. The length of the character array is determined from the actual string supplied rather than what is specified in the type attribute. The value is created by reading in the supplied string, converting tabs, carriage returns, and line feeds into a single space, turning any run of multiple whitespace characters into a single space, then trimming any whitespace at both ends of the string. This allows the person writing the file to add whitespace in strings as needed to make the raw XML easier to read, without changing what is written into the final NeXus file.

Next to look at is how arrays of numbers are specified. Lines 24-27 show both one and two dimensional arrays. The dimension of the array is specified with the type as discussed above.
The thing to notice here is that arrays of numbers are specified as comma delimited lists. The brackets in the list of values are "syntactic sugar". When the values are read in, NXtranslate converts the brackets into commas, then converts multiple adjacent commas into a single comma. The purpose of this is so translation file authors can more easily see each dimension of the array that they wrote. The brackets can also be removed altogether, as seen in line 24.

Translation from NeXus

Next is to show how to use NXtranslate to bring in information from external sources. The example below demonstrates various features of importing information from external sources, including modifying it before writing.

Translation from NeXus file test_nexus.xml
[The markup of this listing is missing; only its text content remains:]
The functional form of the data

As suggested earlier, the root node (line 1) has defined a &source and &mime-type to use for creating a retriever. Line 2 demonstrates that entire entries can be copied from one file to the next and that the name of a node can be changed. In this case it is from entry1 to entry_1D. Lines 4-7 show how to copy over an entire group and add a new field to it. For finer control of what is added, and the ability to change attributes, look at lines 9-12. Line 11 shows how to change the dimensions of the field by using the type attribute. Please note that this will not work for character arrays and the total number of array items must remain constant. Also, the type itself cannot be changed (single precision float to double precision float, etc.). Since the dimensions of the f_x_y array change, it makes sense to change the axes for plotting. This is done in both line 9 and 10 by specifying the attribute and its new value. To add another attribute, just specify it similarly. Line 11 demonstrates erasing the axes attribute: specify the attribute with an empty string as the value.

These two examples have shown the way to set up a translation file.
You can import information from multiple files by declaring another &source and &mime-type. There are a couple of things to know about these as well. The default &mime-type is "application/x-NeXus", so it does not need to be specified. For each &source, whatever &mime-type was defined in the parent node will be used for the current &source. The example below shows what, in principle, could be done with NXtranslate as more retrievers get written. While retrievers that import information from MySQL and JPEG images would be nice, they do not currently exist.

A contrived example
[The markup of this listing is missing; only its text content remains:]
George User
Anatomy of Links

The two nodes involved in a link are the source and the link. The source is the original version of the information; the link is the copy. There is no way to determine which is the original and which is the copy without direct comparison of ids using the NeXus API. Links can be to either a group or a field. Links to attributes are not supported by the NAPI. A link to a group and a link to a field are both shown in the example below. The first link is to a group whose name was group1, while the second link is to a field, array1.

Two links
[The markup of this listing is missing.]

Strings for Translation

The previous section discussed how to write a translation file and several of its features. This section explains in more detail the strings available for use in a translation file. The list should be considered incomplete, because retrievers may exist that the authors have not been informed of. Also, by nature, the retrievers are quite decoupled, so the location strings for each retriever can differ significantly from the others.

NeXus

As seen earlier in this chapter, the &mime-type for NeXus files is application/x-NeXus. Similarly, the &location strings are as simple as possible. NeXus files are organized hierarchically, similar to the translation file. A good analogy is a file system where the groups are directories and the fields are files. Using this analogy, the &location strings are absolute paths to the directory or file to be copied. Since there are examples of NeXus location strings in the earlier examples in this chapter, there is only one other thing to mention: the path separator is a forward slash, "/".

Simple ASCII

The &mime-type for the simple ASCII retriever is text/plain. The functionality of the simple ASCII retriever is limited. This is to emphasize the methodology for building retrievers, rather than to build a general purpose one. All of the location strings are integers defining the line number to use. The first line of the file is zero.
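The zero-based line numbering of the simple ASCII retriever can be illustrated with a few lines of Python. This is only a sketch of the idea, not the retriever's actual code: the function name retrieve_line is hypothetical, and stripping the line ending and raising an error for a missing line are assumptions.

```python
def retrieve_line(source, location):
    """Return one line of a text file.

    source   -- path of the file to open (the role played by the source string)
    location -- the line number as a string; the first line of the file is "0"
    """
    index = int(location)
    if index < 0:
        raise ValueError("line number must be non-negative")
    with open(source, "r", encoding="utf-8") as handle:
        for number, line in enumerate(handle):
            if number == index:
                # Assumption: the trailing line ending is not part of the value.
                return line.rstrip("\r\n")
    raise IndexError(f"{source} has no line {index}")
```

With a file whose first line is a title, retrieve_line(path, "0") returns that title, and retrieve_line(path, "1") returns the line after it.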
SNS Histogram

The &mime-type for the SNS histogram retriever is application/x-SNS-histogram. The &location is of the general form

[...,dim2,dim1][...,dimY,dimX]#{tag_name_1|operator_1}keyword_1{tag_name_2|operator_2}keyword_2...

Notice that the &location is divided into two parts, declaration and definition, separated by #. The declaration describes the dimensions of the retrieved data. The definition describes which information the data consists of. Both of these are described in greater detail below.

The declaration part, [...,dim2,dim1][...,dimY,dimX], consists of two sets of square brackets. The first set contains the size of each dimension of the array to be returned, separated by commas; the second set contains the dimensions of the array to read from. The values are specified as positive integers. The current version of the retriever returns an array of the same size as the initial array, no matter the dimensions given between the first set of brackets.

The definition part, {tag_name_1|operator_1}keyword_1{tag_name_2|operator_2}..., describes which data to transfer from the SNS histogram file. Each part of the definition consists of a tag_name and an operator separated by a vertical bar, "|". Multiple definitions can exist in a single &location, separated by keywords. If the definition is missing, then all of the available data will be retrieved.

The possible values for the tag_name are
pixelID: Select using unique pixel identifiers. Applicable for all detectors.
pixelX: Select using column numbers. Applicable for all area detectors.
pixelY: Select using row numbers. Applicable for all area detectors.
Tbin: Select using time channels. Applicable for all detectors.

The operator can be of one of two forms:
loop(start,end,increment): Specifies a series of identifiers that runs inclusively from start to end in steps of increment.
List of identifiers: The identifiers specify which data to include.
The identifiers must be separated by commas.

The keyword is used to link various definitions together into unions and intersections. Keywords are entirely optional. Keywords that work on two definitions are left associative.
!: The logical "not" operator. This negates the definition following it. It must be placed just in front of the curly braces it is associated with.
(): Grouping operation. This can be used to clarify the order in which multiple keywords are applied. No associative parentheses are allowed within the curly braces.
AND: The logical "and" operator. This generates the intersection of two definitions. This parameter is case sensitive.
OR: The logical "or" operator. This generates the union of two definitions. This parameter is case sensitive.

Examples

[150,256,167][304,256,167]#{pixelID|loop(1,38400,1)}
This retrieves the first 38400 pixel identifiers and puts the data into a 150x256x167 array where the 167 dimension changes the fastest. In this example, there are 167 time channels, 256 columns, and 150 rows. The data come from a binary file where they are stored as a 304x256x167 flat array.

[50,256,100][304,256,167]#{pixelID|loop(1,12800,1)}AND{Tbin|loop(1,100,1)}
This retrieves the intersection of the first 12800 pixel identifiers with the first 100 time channels, then places the data into a 50x256x100 array. Keep in mind that if the array declared is of a different size than the data defined, an error will be generated.

[7,167][304,256,167]#{pixelX|45,53,60,61,62,34500,34501}
This retrieves a series of columns.

XML retriever

The &mime-type for the XML retriever is text/xml. The XML retriever is built on top of libxml2's document object model (DOM) parser. Because of this, the entire file from which information is to be retrieved is loaded into memory as character arrays. The DOM API was chosen to allow jumping around the source file without needing to parse its contents multiple times.
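The benefit of a DOM parser, parsing once and then reading from any part of the tree, can be sketched with Python's standard xml.dom.minidom module standing in for libxml2. The sample document and the helper child_text are hypothetical and are not taken from NXtranslate.

```python
from xml.dom import minidom

# Hypothetical source document used only for this illustration.
SAMPLE = (
    "<numbers attr2='2.5'>"
    "<the_answer>42</the_answer>"
    "<array>1 2 3 4 5 6</array>"
    "</numbers>"
)


def child_text(document, *names):
    """Follow element names from the root downward and return the text found."""
    node = document.documentElement
    if node.tagName != names[0]:
        raise KeyError(names[0])
    for name in names[1:]:
        matches = [child for child in node.childNodes
                   if child.nodeType == child.ELEMENT_NODE
                   and child.tagName == name]
        if not matches:
            raise KeyError(name)
        node = matches[0]  # the first element with that tag name wins
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE).strip()


# The document is parsed once and held in memory ...
document = minidom.parseString(SAMPLE)
# ... and can then be queried repeatedly, in any order, without re-parsing.
answer = child_text(document, "numbers", "the_answer")
array = child_text(document, "numbers", "array").split()
```

Every value comes back as text, so any conversion to a numeric type is a separate step.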
The location string will be formatted according to the following rules:

The location string for a field will look like a (Unix) path. Each level of the hierarchy is separated by a forward slash, "/".
To specify the type, the path is preceded by a type name, separated from it by a colon, ":". The allowed names are "INT8,INT16,INT32,UINT8,UINT16,UINT32,FLOAT32,FLOAT64". If no name is specified it is (implicitly) a string. Therefore to get "the_answer" as a double precision float the location is "FLOAT64:/numbers/the_answer". In the case where the field has a "type" attribute whose value is one of the types above, that type will be used rather than a character array. Specifying the type in the location will override what is in the source file.
Arrays can be specified as part of the type, either as an attribute in the XML file or in the location string. To get a six element integer array use the location "/numbers/array" which points to a whitespace delimited list. Multiple dimensions are specified by using a comma delimited list in the square brackets (e.g. "INT16[3,2]:/numbers/array").
To get an attribute, specify it at the end of a path, separated by a hash symbol, "#". Therefore to get attr2 as a single precision float the location is "FLOAT32:/numbers#attr2".

This methodology does not allow for automatically detecting the type of an imported attribute (it will be read as a string), or for differentiating two fields at the same level with the same tag name.
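The location rules above can be summarized in a small Python sketch that splits a location string into its parts. It is an illustration only, not the retriever's parser: the function name parse_xml_location and the returned tuple layout are hypothetical, and it assumes that a colon never appears inside the path itself.

```python
import re

ALLOWED_TYPES = {
    "INT8", "INT16", "INT32",
    "UINT8", "UINT16", "UINT32",
    "FLOAT32", "FLOAT64",
}

# A type name such as INT16, optionally followed by [d1,d2,...].
_PREFIX_RE = re.compile(r"^([A-Z]+\d+)(?:\[([0-9,]+)\])?$")


def parse_xml_location(location):
    """Return (type_name, dims, path_parts, attribute).

    type_name is None when no type is given (the value is then a string),
    dims is None when no brackets are given, and attribute is None when
    the location has no "#" part.
    """
    type_name, dims = None, None
    rest = location
    if ":" in location:
        # Assumption: the first colon separates the type from the path.
        prefix, rest = location.split(":", 1)
        match = _PREFIX_RE.match(prefix)
        if match is None or match.group(1) not in ALLOWED_TYPES:
            raise ValueError(f"unknown type in location: {prefix!r}")
        type_name = match.group(1)
        if match.group(2) is not None:
            dims = [int(d) for d in match.group(2).split(",")]
    path, _, attribute = rest.partition("#")
    parts = [part for part in path.split("/") if part]
    return type_name, dims, parts, (attribute or None)


if __name__ == "__main__":
    for example in ("FLOAT64:/numbers/the_answer",
                    "INT16[3,2]:/numbers/array",
                    "FLOAT32:/numbers#attr2"):
        print(example, "->", parse_xml_location(example))
```

Keeping the attribute separate from the path makes it clear that the type prefix applies to whichever of the two is finally read.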