That is, enough metadata is stored within the files to completely describe the format of the raw user data, so that the library can properly read back that data on any other platform. This makes these file formats portable and circumvents hazards such as endianness and machine-dependent data-type format differences. Other features include associating names with the user data stored within PDB and HDF5 files so that the data can be easily referenced, the ability to access a subpart of any data set, the ability to associate attributes with the data, and mechanisms for grouping data sets into directory-like structures for organization. Both libraries provide a variety of APIs.
HDF5 uses a familiar filesystem-like hierarchy; it is flexible, self-describing, and portable across operating systems and hardware; it can store text and binary data; it can be used by parallel (MPI) applications; it has a large number of language bindings; and it is fairly easy to use.
In a previous article, I introduced HDF5, focusing on its concepts and strengths. Here, I want to give a quick introduction to HDF5 through some simple code examples.
The goal is not to dive deep into HDF5 but to illustrate the basics of using it.
I'll start with Python because it is a widely used language and the HDF5 Python library, h5py, is very easy to use and understand. I also want to illustrate how to use HDF5 with a compiled language.
The h5py library is included with many Python distributions and with most Linux distributions. For the examples here, I use the Anaconda Python distribution for Python 2. The examples in this article are fairly simple and are derived from the Quick Start page on the h5py website.
The first example (Listing 1: Starting Out with h5py) simply illustrates a few concepts, such as how to open and create an HDF5 file. If the file exists, it will be overwritten; if it doesn't exist, it will be created. Remember that HDF5 is really a container for data objects.
After this call, the file will be non-zero in size, even if no data or attributes are written into it. After the file is opened and created, a data set of integers named mydataset is created. At this point, only the object for the data set (its dataspace) is created in the file.
Line 16 puts data into the data object using NumPy.
Notice that you can put data into the object, and the h5py library will take care of updating the HDF5 file. We could also modify the data in the file.
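The steps above can be sketched as follows. The listing itself is not reproduced here, so this is a minimal reconstruction from the description; the file name mydata.h5 is an assumption, while mydataset is the data set name used in the text.

```python
import h5py
import numpy as np

# 'w' mode creates the file, overwriting it if it already exists
f = h5py.File('mydata.h5', 'w')      # file name assumed for illustration

# Create an integer data set; at this point only the object
# (its dataspace) exists in the file -- no data has been written
dset = f.create_dataset('mydataset', (100,), dtype='i')

# Put data into the object; h5py updates the HDF5 file for us
dset[...] = np.arange(100)

f.close()
```

Modifying `dset[...]` again before closing would likewise be written through to the file by h5py.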
Recall that in Python, almost everything is an object, so it has properties. Lines 18, 20, 22, and 24 print out some of the properties of the HDF5 file (line 24) as well as of the first data set (lines 18, 20, and 22). Because HDF5 is object based, it fits well with the object nature of Python. On line 26, a subgroup of the root group, named subgroup, is created; then, on line 28, a new data set that resides in this subgroup is created with a float data type and 50 elements.
Notice that a method of the group object is used for this.
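A short sketch of these subgroup steps, assuming a file name of groups.h5 and a data set name of dataset_two (only "subgroup", the float type, and the 50 elements come from the text):

```python
import h5py

with h5py.File('groups.h5', 'w') as f:
    # Create a subgroup of the root group "/"
    subgroup = f.create_group('subgroup')

    # Use a method of the *group* object to create a data set
    # inside that subgroup: 50 elements of a float type
    dset2 = subgroup.create_dataset('dataset_two', (50,), dtype='f')

    full_name = dset2.name        # '/subgroup/dataset_two'
    num_elems = dset2.shape[0]    # 50
```

The data set's full name reflects its position in the hierarchy, just like a path in a filesystem.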
On line 31, a new data set is created. What is unique is that the data set is created in a new subgroup named subgroup2; h5py will automatically create the subgroup if it doesn't exist. The output from this example Python script is shown below. Note that the NumPy integer type used, int32, represents integers with 32 bits. Another short Python script reads the HDF5 file and outputs some of the attributes.
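This automatic creation of intermediate groups can be sketched as below (file and data set names are assumptions; "subgroup2" is from the text):

```python
import h5py

with h5py.File('implicit.h5', 'w') as f:
    # The group "subgroup2" does not exist yet; h5py creates it
    # automatically when given a path-style data set name
    dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')

    auto_created = 'subgroup2' in f   # the group now exists
```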
This can be done fairly easily using the h5py function visit. This function recursively walks the HDF5 file so you can discover the objects in the file, including groups and data sets. With this function, you can print the "names" of the objects.
Listing 2 (Walking the HDF5 File) is a simple script for walking the HDF5 file and printing the names of the objects. Compiled languages are a little different: using HDF5 with a compiled language is not quite as easy as with Python, but it is not difficult.
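Listing 2 itself is not reproduced here, so the following is a hedged sketch of such a walker. It builds a small file first so the example is self-contained; all names are illustrative.

```python
import h5py

names = []

def printname(name):
    # visit() calls this once per object (group or data set),
    # passing the object's name relative to the root group
    names.append(name)      # the article's version would print(name)

with h5py.File('walk_demo.h5', 'w') as f:
    f.create_group('subgroup')
    f.create_dataset('subgroup/dataset_two', (5,), dtype='f')

    # Recursively walk the file, discovering groups and data sets
    f.visit(printname)
```

For the groups and data sets created above, `names` ends up containing `subgroup` and `subgroup/dataset_two`.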
The developers of HDF5 have created a number of functions and subroutines to be used for manipulating data and objects in an HDF5 file that make programming straightforward.
For this article, I used a CentOS 7 system. It's not difficult to build a Fortran executable with gfortran and the HDF5 library that comes with the distribution. The HDF Group has provided some sample Fortran 90 code to get started, as well as more complex examples.
Using these examples, Listing 3 shows a Fortran 90 version of the first sample Python code. This code is just an example, and I wanted to keep it short in the interest of space. For comparison, storing an array of one hundred doubles named f64Primitive to a PDB file would be written in C as: status = PD_write(tempFile, "f64Primitive()", "double", f64Primitive); The same user-level data space and type information would be conveyed to HDF5 in an equivalent call.
One example calls H5Aiterate, where one attribute is a string array; attrvstr.c shows how to read a variable-length string attribute (space is allocated by HDF5 but freed by the user).
Datasets: coll_test.c uses H5Sselect_none to tell the H5Dwrite call that there will be no data; the fourth process still has to participate, because the write is collective. MPI does not store any "metadata" for applications, such as the datatypes themselves, the dimensionality of an array, or names for specific objects.
This is where scientific data libraries that build on top of MPI, such as HDF5, help applications, because they handle data at a higher level: dataset = H5Dcreate2(*file, name, H5T_NATIVE_UINT8, dataspace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
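The kind of "metadata" HDF5 keeps for you (and raw MPI-IO does not) can be sketched in a few lines of h5py; the data set name, shape, and attribute here are illustrative assumptions:

```python
import h5py

# Write: HDF5 records the name, datatype, dimensionality,
# and any user attributes alongside the raw data
with h5py.File('meta.h5', 'w') as f:
    dset = f.create_dataset('temperature', (4, 5), dtype='f8')
    dset.attrs['units'] = 'K'     # attribute attached to the data set

# Read back on any platform: the metadata travels with the file
with h5py.File('meta.h5', 'r') as f:
    shape = f['temperature'].shape        # (4, 5)
    units = f['temperature'].attrs['units']
```

A raw MPI-IO file, by contrast, is just a stream of bytes; the application itself must remember what they mean.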
For writing out a typical particle array with six coordinates (position and momentum), H5hut (top) uses only 10 lines of code, while equivalent HDF5 calls (bottom) for implementing the same functionality and performance tunings require at least 35 lines.
I created a very simple function that, given three integers, builds up an array and inserts it into a data set. If I add a single array into a data set, it works fine: percorso_sottodir = 'train_data'.
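The poster's function is not shown, so the following is only a guess at its shape; percorso_sottodir is the group path from the question, and every other name is a hypothetical placeholder:

```python
import h5py
import numpy as np

percorso_sottodir = 'train_data'   # group ("subdirectory") path from the question

def store_triple(f, idx, a, b, c):
    """Build an array from three integers and insert it as a data set.
    A sketch of the poster's intent; the naming scheme is assumed."""
    arr = np.array([a, b, c], dtype='i')
    f.create_dataset('%s/sample_%d' % (percorso_sottodir, idx), data=arr)

with h5py.File('train.h5', 'w') as f:
    # Adding a single array works fine, as the poster notes
    store_triple(f, 0, 11, 22, 33)
    stored = f[percorso_sottodir + '/sample_0'][...].tolist()
```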