| 105 | X-ray microtomography is a powerful tool to analyze and understand internal, otherwise invisible mechanisms of small |
| 106 | animals. Resolution and duration of experiments with living objects are currently limited by radiation damage. |
| 107 | Compressed sensing theory has demonstrated that signals can be recovered from undersampled data and, hence, |
| 108 | opens up the possibility to reduce the radiation dose. These reconstruction techniques are computationally very |
| 109 | demanding and have therefore not been used for synchrotron experiments up to now. |
| 110 | |
| 111 | The master's thesis will be carried out within an international project that aims to develop novel instrumentation for |
| 112 | ultrafast imaging at synchrotron light sources. We expect the student to become familiar with the latest developments in the |
| 113 | field of compressive sensing theory and its application to tomographic image reconstruction. Advanced methods |
| 114 | described in the literature have to be evaluated using realistic sample datasets. Promising algorithms should be |
| 115 | implemented. To take advantage of the latest high-performance computing hardware, the selected algorithms have to be adapted for better mapping to massively parallel architectures. The implementation in OpenCL will be optimized for the latest GPU architectures from AMD and NVIDIA. |
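To make the idea of recovering a signal from undersampled data concrete, here is a minimal sketch in plain Python. It uses ISTA (iterative shrinkage-thresholding), which is one common choice and not necessarily the method the project will select. The random matrix `A` is a hypothetical stand-in for a tomographic projection operator, and the sizes, `lam`, and iteration count are illustrative assumptions.

```python
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    r = [p - q for p, q in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(A, y, lam, iters):
    # The squared Frobenius norm is an upper bound on the Lipschitz constant
    # of the gradient, so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        x = [soft_threshold(xj - step * gj, step * lam) for xj, gj in zip(x, grad)]
    return x

# Hypothetical toy problem: 20 measurements of a 40-element sparse signal.
rng = random.Random(0)
n, m = 40, 20
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[17], x_true[31] = 1.0, -2.0, 1.5
y = matvec(A, x_true)

lam = 0.1
x_hat = ista(A, y, lam, iters=500)
print("objective at zero:", objective(A, y, [0.0] * n, lam))
print("objective after ISTA:", objective(A, y, x_hat, lam))
```

Every iteration is dominated by two matrix-vector products, which is the part that would map onto a GPU in a real reconstruction.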
| 116 | |
| 117 | == Enhancing the quality of tomographic reconstruction by advanced iterative algorithms optimized for parallel architectures == |
| 118 | * Contact person: Suren A. Chilingaryan <csa@suren.me> |
| 119 | * [raw-attachment:1407-master-astor.pdf Detailed announcement] |
| 120 | * Required Skills: Strong C and Python knowledge, numerical algorithms in image processing. Experience with parallel programming is a plus. |
| 121 | * Experience Gained: Synchrotron Imaging, 4D Tomography, Image Segmentation, Optical Flow, Parallel programming, GPU programming. |
| 122 | |
| 123 | Recent developments in X-ray microtomography (SR-μCT) facilitate the investigation of internal morphology and structural changes in small living organisms in 4D (3D + time). To analyze internal dynamics, existing instrumentation records hundreds of high-resolution 3D |
| 124 | volumes within a few minutes. The first step in data analysis is the segmentation of the functional units. Currently, this is a manual task requiring months of work by highly skilled biologists. |
| 125 | |
| 126 | The aim of this work is to develop algorithms for semi-automatic segmentation of 4D tomographic volumes and to implement them. One possible solution is to compute the optical flow in sequences of 3D volumes and use it to map manual segmentations of selected volumes to consecutive frames. The algorithms have to be optimized for the latest parallel computing architectures. The work is embedded in national and international collaborations for high data-rate processing and is performed within an interdisciplinary team of computer scientists, synchrotron physicists, and biologists. |
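The mapping of a manual segmentation to the next frame can be sketched as follows. This is a simplified illustration and not the project's algorithm. It assumes the optical flow has already been computed and is given as a backward displacement per voxel, and it uses a nearest-neighbour lookup. The function name `propagate_labels` and the data layout are hypothetical.

```python
def propagate_labels(labels, flow):
    """Warp integer labels from frame t to frame t+1.

    labels[z][y][x] is the manual segmentation of frame t.
    flow[z][y][x] is a (dz, dy, dx) displacement defined on frame t+1 that
    points back to the matching position in frame t (backward flow).
    """
    nz, ny, nx = len(labels), len(labels[0]), len(labels[0][0])
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                dz, dy, dx = flow[z][y][x]
                # Nearest-neighbour lookup keeps the labels integer;
                # source positions outside the volume are clamped to the border.
                sz = min(max(int(round(z + dz)), 0), nz - 1)
                sy = min(max(int(round(y + dy)), 0), ny - 1)
                sx = min(max(int(round(x + dx)), 0), nx - 1)
                out[z][y][x] = labels[sz][sy][sx]
    return out

# Toy volume of 1 x 4 x 4 voxels with a single labelled voxel.
labels = [[[0] * 4 for _ in range(4)]]
labels[0][1][1] = 1

# The object moves one voxel in +x, so the backward flow is -1 in x everywhere.
flow = [[[(0.0, 0.0, -1.0) for _ in range(4)] for _ in range(4)]]

next_labels = propagate_labels(labels, flow)
print(next_labels[0][1])
```

Each output voxel is computed independently of the others, so the triple loop corresponds directly to one work-item per voxel on a GPU.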
| 127 | |
| 128 | == High-speed tracking of fluorescent nanoparticles in 3D and with subnanometer precision using Parallel Accelerators == |
| 129 | * Contact person: Suren A. Chilingaryan <csa@suren.me> |
| 130 | * [raw-attachment:1307-tvt-v1.pdf Detailed announcement] |
| 131 |  * Required Skills: Good knowledge of the C/C++ programming languages. Prior knowledge of parallel programming models is a plus. |
| 132 | * Experience Gained: Parallel programming, GPU programming, Image processing |
| 133 | |
| 134 | In the field of organic and printed electronics (e.g. polymer solar cells, OLEDs or Li-ion batteries) there is a growing demand for thin functional layers with highly homogeneous surface topology. If these layers are coated from the liquid phase, the coating and |
| 135 | drying steps affect the surface quality. During the drying process, Marangoni convection might occur, leading to surface inhomogeneities. To better understand the convection process, we apply μPIV (micro particle image velocimetry) using fluorescent nanoparticles to resolve |
| 136 | the respective flow field in the liquid phase. For 3D measurements, a multifocal system is used to acquire images in different layers at |
| 137 | the same time. |
| 138 | |
| 139 | During an experiment, 4 GB of data is recorded every second by 5 high-speed cameras. Analyzing this amount of |
| 140 | data interactively and extracting particle trajectories is a challenge. As a first step, we expect the student to parallelize the data evaluation codes and |
| 141 | optimize them for the latest GPU architectures from AMD and NVIDIA. In the second stage, the codes should be modified to run in a |
| 142 | GPU-cluster environment. |
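For a sense of scale, the stated data rate can be turned into a per-camera rate and a data volume per run. The total rate and the number of cameras come from the text. The run length of 60 seconds is a hypothetical value chosen only for illustration, and an even split across the cameras is an assumption.

```python
total_rate_gb_s = 4.0   # stated in the text: 4 GB recorded every second
cameras = 5             # stated in the text
run_seconds = 60        # hypothetical run length

# Assumes the load is split evenly across the cameras.
per_camera_gb_s = total_rate_gb_s / cameras
volume_gb = total_rate_gb_s * run_seconds

print(f"per camera: {per_camera_gb_s:.1f} GB/s")
print(f"data per {run_seconds} s run: {volume_gb:.0f} GB")
```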
| 143 | |
| 144 | == Optimizing high speed data transfer and processing of DAQ systems with NVIDIA's GPUDirect == |
| 145 | * Contact person: Suren A. Chilingaryan <csa@suren.me> |
| 146 | * [raw-attachment:1407-master-gpudirect.pdf Detailed announcement] |
| 147 |  * Required Skills: Good knowledge of the C/C++ programming languages as well as Linux kernel and driver development. Knowledge of parallel programming models is a plus. |
| 148 | * Experience Gained: Parallel programming, GPU programming, RDMA data transfer mechanisms. |
| 149 | |
| 150 | Recent data acquisition systems are characterized by increasing data rates and the need for efficient online analysis and monitoring. |
| 151 | Conventional CPUs are no longer able to handle the increased computational demands of scientific processes. In the field of high |
| 152 | performance computing, GPUs with their modern and simple methods to utilize parallel processing make for an easily accessible |
| 153 | alternative to classical CPU computing. Unfortunately, the gap between the computational capabilities of GPU systems and the |
| 154 | throughput of system memory has grown tremendously and has become the main factor limiting performance. This is especially harmful for PCI Express (PCIe) based data acquisition systems using multiple GPU cards for data processing. With standard |
| 155 | approaches to handling PCIe devices, the data is copied into system memory, sometimes multiple times, at each stage of the data |
| 156 | processing pipeline. For instance, a standard pipeline consisting of 3 stages (data readout from the frame-grabber card, |
| 157 | preprocessing on the GPU, and dispatch to a remote server over a network or InfiniBand interface) involves at least 4 copies in system |
| 158 | memory, and usually more depending on the hardware and software configuration. |
| 159 | |
| 160 | Recently, NVIDIA introduced the GPUDirect for RDMA technology to relieve the load on system memory. GPUDirect/RDMA enables point-to-point transfers between PCIe devices and an NVIDIA GPU on the same bus, bypassing system memory entirely. An alternative technology, GPUDirect for Video, was developed by NVIDIA specifically for high-speed frame grabbers. For the diploma thesis, the student is expected to: |
| 161 |  * Compare the GPUDirect/RDMA and GPUDirect/Video technologies, |
| 162 |  * Provide GPUDirect-enabled drivers for our FPGA-based data acquisition platform (FDAP), |
| 163 |  * Investigate whether a similar approach could be used to transfer data directly between FDAP and InfiniBand adapters, |
| 164 |  * Evaluate the technology in terms of latency and throughput compared to the existing drivers, |
| 165 |  * Check if and how the GPUDirect technology can be used with the UFO parallel processing framework to split the load across the nodes in a GPU cluster. |
| 166 | The performance benefit of the technology should be demonstrated for realistic scenarios such as ultra high-speed X-ray |
| 167 | tomography. |
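The latency and throughput evaluation mentioned in the task list could start from a small timing harness like the sketch below. It is only an illustration of the measurement idea. The function `host_copy` is a hypothetical stand-in that copies a buffer in host memory; in the actual work it would be replaced by calls into the existing and the GPUDirect-enabled drivers, whose interfaces are not described here.

```python
import statistics
import time

def benchmark(transfer, payload, repeats=50):
    """Time transfer(payload) and report median latency and throughput."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        transfer(payload)
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "median_latency_s": statistics.median(latencies),
        "throughput_bytes_per_s": len(payload) * repeats / total,
    }

def host_copy(buf):
    # Hypothetical stand-in for a driver transfer: a plain copy in host memory.
    return bytes(buf)

result = benchmark(host_copy, bytearray(1 << 20))  # 1 MiB payload
print(f"median latency: {result['median_latency_s'] * 1e6:.1f} us")
print(f"throughput: {result['throughput_bytes_per_s'] / 1e9:.2f} GB/s")
```

Reporting the median rather than the mean keeps a few slow outliers, for example from scheduling, from dominating the latency figure.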