The Translation File

The file produced by NXtranslate is entirely determined by the contents of the translation file. This chapter discusses the format of a translation file and lists the "location strings" for the external formats.

Overview

Translation files are written in XML and read using an XML parser. For this reason they must be valid XML. There are many places to find more information about XML: the W3C publishes the definitive standard, while Tellme Studio has a one-page overview of what XML is. This means that the following rules must be adhered to:

- Every opening tag must have a corresponding closing tag at the same level. In other words, elements must be properly nested: an element opened inside another element must also be closed inside it, while overlapping elements and unclosed tags are not allowed.
- Tag and attribute names are case sensitive. Therefore two tags whose names differ only in letter case are distinct tags. While this can lead to confusion when writing a translation file, it is easily avoided in practice.
- Attribute values must be inside single (') or double (") quotes.
- Tag and attribute names cannot start with a number or special character. Another way of saying this is that the name must start with a letter.
- Certain characters will break the parsing of the XML. The characters, and how to create them, are < (&lt;), > (&gt;), & (&amp;), " (&quot;), and ' (&apos;).
- An empty element (an opening tag immediately followed by its closing tag) can be replaced with a single self-closing tag. This convenience will make more sense during the discussion of translation files that specify information from outside of the file.

There are some other rules to note about the translation file. It is not simply an XML file; there are additional constraints. The translation file is not directly validated against these constraints, but failing to follow them will result in the program exiting early without creating a NeXus file. Also, NXtranslate is intended to be used to write any file readable by the NeXus API, so the translation file is not validated against definition files.
This decision was made on the basis of performance, since it was determined that most of the time a "standard" translation file will be used to convert a large number of files.

First, some definitions used throughout this document.

Translation file definitions

NAPI: An abbreviation for the NeXus Abstract Program Interface.
node: A point in the hierarchy. It can either contain other nodes (be a parent with children) or not (a leaf node). Any pair of opening and closing tags represents a single node.
group: A node that contains other nodes.
field: A node that does not contain other nodes (a leaf node). In other places in NeXus this is sometimes referred to as "data" or an "SDS".
retriever: An object whose job is to retrieve information from a source external to the translation file. Which retriever is created is determined by the value of &mime-type. The retriever is initialized using the value of &source. Information is produced by the retriever using the &location.
special attribute: An attribute that is interpreted by NXtranslate as a command to deal with external information. The special attributes are &mime-type, &source, &location, and &make-link.
&mime-type: A keyword that denotes what library to use to retrieve information from an external source. It can be a valid mime type.
&source: A string denoting what a retriever should use to initialize itself. This is generally a file on the local system for the retriever to open.
&location: A string passed to the retriever for it to generate data from. For example, when using the NeXus retriever this is a path to a particular node in the file, which will be written out to the resulting NeXus file.
&link-tag: This denotes a node that is a link to another node in the file. It must have a &make-link attribute. All other attributes will be ignored.
&make-link: The attribute denoting what a &link-tag node should be linked to. The syntax for describing the location is the same as for the NeXus retriever.
If this attribute appears in a node other than &link-tag it will be treated as a normal attribute.
primitive type: Any of the following types (ignoring bit-length): NX_UINT (unsigned integer), NX_INT (signed integer), NX_FLOAT (floating point number), NX_CHAR (character), NX_BOOLEAN (boolean, or true/false), NX_BINARY (binary value). At the moment NX_BOOLEAN and NX_BINARY are not supported by NXtranslate, and the NeXus API supports only one-dimensional arrays of NX_CHAR.

Now that the definitions have been presented, the other constraints of a translation file can be explained.

- The root node in a file will be NXroot. There will be nothing before or after it, and only one of them. The NXroot can be used to set global values for &mime-type and &source.
- Only groups can exist directly inside the root. This is a constraint of the NeXus API.
- Every node (except the NXroot and &link-tag) needs a name and type. If the node has a &location then the type can be omitted, since the retriever will provide it.
- Groups cannot have any attribute other than the special ones. Fields can have any attribute. This reflects a restriction in the NeXus API and does not constrain the contents of resulting NeXus files in any way.
- Groups cannot have any data in them. In other words, a group whose direct content is a list of values such as 1 2 3 4 is incorrect.
- To specify the dimensions of a field, use square brackets [] after the type. A single precision floating point array with five elements would have type="NX_FLOAT32[5]". If the field has only one element, or is a character array, the dimensions can be left off. For character arrays, the dimensions are ignored.
- To specify the type of an attribute, denote the primitive type separated from the value using square brackets. For numeric types only scalars are allowed. If no type is specified it is assumed to be a character array (length is determined automatically).
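
The dimension syntax for field types described above can be illustrated with a short Python sketch. It is not taken from NXtranslate: the helper name parse_type is hypothetical, and the handling of more than one dimension (comma-separated sizes inside the brackets) is an assumption, since the text only shows the one-dimensional form NX_FLOAT32[5].

```python
import re
from typing import Tuple

# Hypothetical helper, not part of NXtranslate. Matches a type name
# optionally followed by a bracketed dimension list, e.g. "NX_FLOAT32[5]".
_TYPE_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(?:\[([^\]]*)\])?\s*$")


def parse_type(spec: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a field type string into (type_name, dimensions)."""
    match = _TYPE_RE.match(spec)
    if match is None:
        raise ValueError(f"malformed type string: {spec!r}")
    name, dim_text = match.group(1), match.group(2)
    if name.startswith("NX_CHAR"):
        # For character arrays the dimensions are ignored; the length
        # comes from the string itself.
        return name, ()
    if dim_text is None:
        # No brackets: the field holds a single element.
        return name, (1,)
    # Assumption: several dimensions are written as comma-separated sizes.
    dims = tuple(int(part) for part in dim_text.split(","))
    if any(size <= 0 for size in dims):
        raise ValueError(f"dimensions must be positive: {spec!r}")
    return name, dims


if __name__ == "__main__":
    for example in ("NX_FLOAT32[5]", "NX_INT16", "NX_CHAR[10]"):
        print(example, "->", parse_type(example))
```

Returning an empty tuple for NX_CHAR keeps the "dimensions are ignored" rule visible to the caller instead of silently passing a length along.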
Simple Translation

While NXtranslate is the anything-to-NeXus translator, it is possible to have everything specified in the translation file. The following example shows a translation file where no information will be taken from any other file.

Simple translation file test_simple.xml (markup not preserved; field text only):

  George User
  text/plain
  The data is a simple parabola, f(x)=x^2
  0 1 2 3 4 5 6 7 8 9 10
  0 1 4 9 16 25 36 49 64 81 100
  George User
  text/plain
  The data is a two dimensional parabola, f(x,y)=x^2+y^2
  1.0 4.7 2.3 1.6
  3.3 6.2 9.2
  11.89 32.98 16.18 13.45
  39.44 60.53 43.73 41.00
  85.64 106.73 89.93 87.20

This example follows all of the rules laid out in the previous section and serves to introduce several of the features of the translation file. First a style note, though: XML files have a concept of "ignorable whitespace". These are carriage returns (\r), line feeds (\n), tabs (\t), and spaces. They are ignored (as suggested by the term "ignorable whitespace") and are present to help those looking at the raw XML see the node hierarchy.

The main purpose of this example is to show how to specify information in a translation file. Line 4 demonstrates the method for strings. Here the name is author and the type is NX_CHAR. The length of the character array is determined from the actual string supplied rather than from what is specified in the type attribute. The value is created by reading in the supplied string, converting tabs, carriage returns, and line feeds into a single space, turning any run of multiple whitespace characters into a single space, then chopping off any whitespace at both ends of the string. This allows the person writing the file to add whitespace in strings as needed to make the raw XML easier to read, without changing what is written into the final NeXus file.

Next to look at is how arrays of numbers are specified. Lines 24-27 show both one- and two-dimensional arrays. The dimension of the array is specified with the type, as discussed above.
The thing to notice here is that arrays of numbers are specified as comma-delimited lists. The brackets in the list of values are "syntactic sugar". When the values are read in, NXtranslate converts the brackets into commas, then converts multiple adjacent commas into a single comma. The purpose of this is to let translation file authors more easily see each dimension of the array that they wrote. The brackets can also be removed altogether, as seen in line 24.

Translation from NeXus

Next is to show how to use NXtranslate to bring in information from external sources. The following example demonstrates various features of importing information from external sources, including modifying it before writing.

Translation from NeXus file test_nexus.xml (markup not preserved; the only surviving field text is "The functional form of the data")

As suggested earlier, the root node (line 1) defines a &source and &mime-type to use for creating a retriever. Line 2 demonstrates that entire entries can be copied from one file to the next and that the name of a node can be changed. In this case it is changed from entry1 to entry_1D. Lines 4-7 show how to copy over an entire group and add a new field to it. For finer control over what is added, and the ability to change attributes, look at lines 9-12. Line 11 shows how to change the dimensions of the field by using the type attribute. Please note that this will not work for character arrays, and the total number of array items must remain constant. Also, the type itself cannot be changed (single precision float to double precision float, etc.). Since the dimensions of the f_x_y array change, it makes sense to change the axes for plotting. This is done in both lines 9 and 10 by specifying the attribute and its new value. To add another attribute, just specify it similarly. Line 11 demonstrates erasing the axes attribute: specify the attribute with an empty string as the value.

These two examples have shown the way to set up a translation file.
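
The value-handling rules from the two examples (whitespace collapsing in strings, bracket-to-comma conversion in numeric lists, and the constant element count when dimensions are changed) can be sketched in Python as follows. This is only an illustration of the described behaviour, not NXtranslate's own code: all three function names are hypothetical, and treating whitespace next to a comma as insignificant is an assumption.

```python
import re
from math import prod
from typing import List, Sequence


def normalize_string(raw: str) -> str:
    """Clean a character field the way the text describes.

    Tabs, carriage returns, and line feeds become spaces, runs of
    whitespace collapse to one space, and both ends are trimmed.
    A single regular expression covers the first two steps.
    """
    return re.sub(r"\s+", " ", raw).strip()


def split_array_values(raw: str) -> List[str]:
    """Split a numeric value list into its items, ignoring brackets."""
    # Step 1: brackets are syntactic sugar, so turn them into commas.
    text = re.sub(r"[\[\]]", ",", raw)
    # Step 2: merge adjacent commas into one. Whitespace touching a
    # comma is dropped as well (an assumption, see the lead-in).
    text = re.sub(r"\s*,[\s,]*", ",", text)
    # Step 3: drop the leading/trailing separators left by outer brackets.
    text = text.strip(", \t\r\n")
    return text.split(",") if text else []


def same_element_count(old_dims: Sequence[int], new_dims: Sequence[int]) -> bool:
    """Check that a change of dimensions keeps the number of items fixed."""
    return prod(old_dims) == prod(new_dims)


if __name__ == "__main__":
    print(normalize_string("  George\n\t User  "))
    print(split_array_values("[[1,2],[3,4]]"))
    print(same_element_count((3, 4), (12,)))
```

Keeping the items as strings leaves the conversion to a number to whatever type the field declares.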
You can import information from multiple files by declaring another &source and &mime-type. There are a couple of things to know about these as well. The default &mime-type is "application/x-NeXus", so it does not need to be specified. For each &source, whatever &mime-type was defined in the parent node will be used for the current &source. The following example shows what, in principle, could be done with NXtranslate as more retrievers get written. While retrievers that import information from MySQL and JPEG images would be nice, they do not currently exist.

A contrived example George User