NAME nxingest - 1.4 USAGE nxingest mapping_file nexus_file [output_file] DESCRIPTION nxingest extract the metadata from a NeXus file to create an XML file according to a mapping file. The mapping file will defines the structure (names and hierarchy) and content (from the NeXus file, from the mapping file or from the current time) of the oputput file. See below for a description of the maping file. This tool use the NeXus api so any of the supported format (HDF4, HDF5 and XML) can be read. The output file parameter is optional. if not present nxingest will write the results into output.xml To be accepted by ICAT, the output XML should match the ICAT3 XML schema. See http://nile.dl.ac.uk/trac/isis/browser/Software/icat_xsd/icatXSD.xsd NB: This work is ongoing; the schema hasn't been released yet. This tool use the NeXus api so any of the supported format (HDF4, HDF5 and XML) can be read. nxingest can retrieve metadata from the neXus file, from the mapping file (fix parameter). it can also retrieve the actual time. If the metadata is a date or time, nxingest may modify it to extract only part of the data (e.g. the year) or reformat it to have consistant dates into ICAT. NeXus syntax. ------------- NeXus Data is divided in different classes that hold data sets. The data sets may hold any type of data from a single byte to unlimited dimension arrays. The data sets and the classes may also have attributes. To collect data from a neXus file, you have to build the path to the data you want. - Simple dataset (singular string or number) The path is the name of the different classes separated by '/'the last name is the name of the dataset. e.g. /run/title - Attribute (singular string or number) The attribute name is separated from the dataset by a '.' e.g. /run/data.units - Arrays Most of the data will be stored as multi dimensional arrays. We may want to extract particular information from the data. - Specific value from an array A null or positive number between square brackets after the data set name. nxingest consider all dataset an uni-dimension. e.g. /run/data_array[3] - Derived value nxingest may derived a few value from an array. To express that, you have to put the name of the derived parameter between square brackets. Available values are : [AVG] Average [STD] Standard Deviation [MIN] Minimum Value [MAX] Maximum Value [SUM] Sum of all values e.g. /run/sample/temperature_log/value[AVG] - Generic classes NeXus defined generic classes type that user can name freely. nxingest can use some of these to generalise the mapping files for similar instrument. By Writing the class type under rounded brackets like {NXentry} the program will substitue it with the actual class name from the current file. This is currenlty only available for {NXentry}, {NXinstrument} and {NXuser} e.g. /{NXentry}/{NXinstrument}/source/name is equivalent to /run/MUSR/source/name /entry_0/I18/source/name Also there may be more than 1 user define in a NeXus file. nxingest will loop over each of them if the mapping include a special node 'user_tbl'. Mapping File ------------ The mapping file is an xml document that will control the execution of nxingest, defining the structure and content of the output of nxingest. nxingest will scan the mapping file analysing all the element nodes it find. There are 3 major types of node : 1) Table node that define the hierarchy of the output document. e.g. the mapping : is mapped into : 1') User Table node is a specific case where the node is scan several time according to the number of {NXuser} type classes are present in the neXus file. at each iteration, nxingest will replace the string {NXuser} by the correct name found in the file. e.g. the mapping : ... is mapped into : ... ... ... 2) 'Tag' node which define a simple metadata record. It has 2 child node that contain the name of the output element and e.g. the mapping : name path_to_metadata is mapped into : metadata_from_nexus_file hdf5_version /.HDF5_Version HDF5 Version used in \ creating the file. is mapped into : hdf5_version 3.5 HDF5 Version used in creating the\ file. Metadata Sources ---------------- the source of the metadata is defined by nodes of type 'fix', 'nexus', 'special' and 'mix'. if the type is special. the begining of the text will contain a modifier (fix:, nexus:, time:or sys: ) The value is then the text without the modifier. 1) Fix string from the mapping file itself. - Node type fix - Node type : special; modifier fix: 2) From the NeXus file. - Node type nexus - Node type : special; modifier nexus: - Node type : special; modifier time:nexus(...) See below. 3) Time - Node type : special; modifier time: Time can be expressed in multiple format, so the the value after the modifier will be composed in 3 parts : time:source ; input_format; output format - The source can be 'now' for the current time or 'nexus()' with the path to the time string between the parenthesis. - input and output format are optional. The s/w expect an integer. Currently the possible values are 0 '2007-05-23T12:48:05' (default) 1 '2007-05-23 12:48:05' 2 '2007-05-23' 3 '12:48:05' 4 '20070523' 5 '200705' 6 '2007' 7 '23/05/2007' 4) System - sys:filename gives the filename of the NeXus file. - sys:location gives the path of the NeXus file. - sys:size gives the size in bytes of the NeXus file. 5) Mix of all 3. - Node type : mix; To combine several sources, several modifiers used with node type 'special' are used separated with '|'. e.g. nexus:/{NXentry}/{NXinstrument}.short_name | fix:_ | time:now ; 0 ; 5 should create something like : 'MUSR_2007' EXAMPLE Here is an example of the minimal xml document that can be accepted by ICAT. 000001 01 MUSR ll56 tester, developer test-1 EXPERIMENT_RAW file_01 /test/inv_1/file_01.jpg And the mapping file to create such document. inv_number path_to_metadata visit_id 1 instrument path_to_metadata user_id path_to_metadata role path_to_metadata name path_to_metadata dataset_type EXPERIMENT_RAW name path_to_metadata location path_to_metadata KNOWN ISSUES No test of Ingestion into ICAT has been done so far. Error handling and logging should be improved. The list of time format is relatively limited, it should be expanded. Only the columns NAME, UNITS, STRING_VALUE or NUMERIC_VALUE and DESCRIPTION from the parameter table can be filled at the moment. the column RANGE_TOP, RANGE_BOTTOM and ERROR are currently ignored. HISTORY Version 1.0 05/06/2007 First version. Version 1.1 13/06/2007 Remove trailing white space around string. Version 1.2 16/08/2007 Bug correction: if the string is too short for the expected transformation, an out of bound exception would have been raised by substr Version 1.3 31/08/2007 Bug correction in test. (Logical OR and not bitwise OR) Add a filesize calculation function. Version 1.4 04/09/2007 The parameters will not be created if there is no value attached to them. DEPENDANCIES - NeXux Library. - mxml library version 2.2.2 You may have to change the MXML_WRAP constant in mxml.h to a much higher value to avoid carriage return to appear in the xml document. AUTHOR Laurent Lerusse Science and Technology Facility Council e-Science Center - Data Management Group e-mail : l.lerusse@stfc.ac.uk COPYRIGHT nxingest 1.4 - extraction of metadata from NeXus files into a xml document Copyright (C) 2007 STFC e-Science Center This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.