CFN Data Scientist Research Opportunities

Date posted:
Monday, February 26, 2018
The Center for Functional Nanomaterials

Key CFN materials characterization facilities, including those at beam lines of the NSLS-II
synchrotron, now involve large-scale data collection and/or significant data analytics challenges.
Understanding the underlying mechanisms in complex chemical and electrochemical processes
drives our strategic focus on methods to interrogate functional materials in operando.
Transmission electron microscopy, electron energy loss spectroscopy and X-ray based core-level
spectroscopy are among the state-of-the-art tools being adapted to gain insights into structure-function
relationships in complex, nanostructured materials under operating conditions. Another
strategic focus area, directed assembly and other approaches to control nanostructured material
architecture over multiple length scales, poses the challenge of measuring emergent structure and
properties. Characterization using small- and wide-angle X-ray scattering generates large, multidimensional
(space plus time) data sets. Challenges start with feature identification and tracking;
opportunities include innovative ways to fully exploit the ability to gather upwards of 100,000
scans per day at the NSLS-II CMS end station. For further description of the CFN strategic plan,
The initial research directions for the new CFN data scientist will be chosen to align with one or
more emerging projects. The addition of new data analytics capability realized in an appropriate
software vehicle is envisaged to significantly advance the capabilities of target facilities and the
research performed using them. Following are brief descriptions of three current research
directions, two based in experimental facilities and one directed to theory-experiment
Linking core-level spectroscopy to local structure motifs: One scientific challenge in the use of
X-ray spectroscopy in operando is to extract key structural features at various stages of an
(electro)chemical process. First principles computational spectroscopy methods are widely used
to interpret experimental spectra based on known atomic structure models. However, associating
spectra with structures poses an inverse problem. Demonstrating an approach that solves this
spectrum-to-structure inverse problem would significantly advance the utility of operando studies.
Adapting new techniques for data analysis, we have made progress on a pilot problem related to
the local structure around dopants in metal nanoparticles. We are pursuing a particularly promising approach
that combines machine learning and first principles modeling to link spectra to specific structural
descriptors, for model problems taken from catalysis and energy storage. We seek to build on
these initial results, with a set of tools, databases and links into workflow software. How does
one integrate from experimental databases and first principles capabilities through data analytics
to an extensible capability for interpretation of operando data, ideally on the fly? (Lead: D. Lu,
Theory and Computation)
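The inverse problem described above can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the forward model `simulate_spectrum` is a made-up Gaussian peak standing in for a first-principles calculation, the structural descriptor is a coordination number, and the inverse step is a plain k-nearest-neighbor vote rather than the machine-learning method used in the project.

```python
import math
import random

def simulate_spectrum(coordination, energies, noise, rng):
    """Hypothetical forward model: a single Gaussian peak whose center
    shifts with coordination number. Stands in for a first-principles
    calculation; the numbers are illustrative, not physical."""
    center = 5.0 + 0.5 * coordination
    return [
        math.exp(-((e - center) ** 2) / 2.0) + rng.gauss(0.0, noise)
        for e in energies
    ]

def distance(a, b):
    """Euclidean distance between two spectra on the same energy grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_coordination(query, library, k=3):
    """Inverse step: majority vote over the k library spectra closest to the query."""
    nearest = sorted(library, key=lambda entry: distance(query, entry[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

rng = random.Random(0)
energies = [0.25 * i for i in range(61)]  # 0 to 15, arbitrary units

# Library of simulated spectra labeled by the structure that produced them.
library = [
    (simulate_spectrum(cn, energies, 0.02, rng), cn)
    for cn in (4, 6, 8, 12)
    for _ in range(5)
]

# A "measured" spectrum whose structure we pretend not to know.
measured = simulate_spectrum(8, energies, 0.02, rng)
predicted = predict_coordination(measured, library)
print("predicted coordination number:", predicted)
```

The design point is the direction of the mapping: the forward model goes from structure to spectrum, and the labeled library lets one go back from spectrum to structure without inverting the physics explicitly.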
Optimized interpretation and application of X-ray scattering: X-ray scattering is a powerful
tool for probing material order at the nanoscale (via small-angle scattering) and the
molecular/atomic scale (wide-angle scattering/diffraction). X-ray scattering images are
essentially the squared modulus of the Fourier transform of the real-space structure of the sample. X-ray
instruments at modern synchrotrons generate data at a prodigious rate. Development of
automated data analysis methods that can keep up with the pace of data collection is crucial for
efficient use of this scarce experimental resource. We have tested deep learning (convolutional
neural networks) as a means of automatically analyzing these datasets, i.e., inferring sample
structure by classifying images. Longer term, we
envision an autonomous X-ray scattering instrument, one that can automatically analyze data and
even make decisions about what experiment to perform next. Many challenges remain before
machine learning can be used robustly for real-time data collection. How can the
known physics of scattering (or of the material being studied) be used to improve machine-learning
methods? How can one transfer the learning from one set of data/tags to rather different
kinds of data/tags? How can one use the output of machine-learning to guide experimental
decisions? (Lead: K. Yager, Electronic Nanomaterials)
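The relation between a scattering pattern and the Fourier transform of the real-space structure can be made concrete with a minimal sketch. It assumes a one-dimensional chain of identical point scatterers with a hypothetical spacing; the function name `scattering_intensity` and all numerical values are illustrative and not taken from any CFN software.

```python
import cmath
import math

def scattering_intensity(positions, q):
    """I(q) = |sum_j exp(-i q x_j)|^2 for identical point scatterers in 1D."""
    amplitude = sum(cmath.exp(-1j * q * x) for x in positions)
    return abs(amplitude) ** 2

spacing = 2.0                               # hypothetical lattice spacing, arbitrary units
lattice = [n * spacing for n in range(20)]  # 20 scatterers on a regular chain
shifted = [x + 0.7 for x in lattice]        # the same structure, translated

q_bragg = 2 * math.pi / spacing             # first Bragg condition
q_off = 0.5 * q_bragg                       # halfway to the first Bragg peak

print("on peak :", scattering_intensity(lattice, q_bragg))
print("off peak:", scattering_intensity(lattice, q_off))
print("shifted :", scattering_intensity(shifted, q_bragg))
```

On the Bragg condition all 20 contributions add in phase, while halfway between peaks they cancel. The translated chain gives the same intensity as the original one because taking the modulus discards the phase, which is one reason recovering structure from a scattering image is not a direct inversion.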
Artificially Intelligent-Scanning Transmission Electron Microscopy (AI-STEM): AI-STEM is
an emerging area that encapsulates automated sample loading, STEM data acquisition, data
selection, analysis, and reporting. The aim is to increase throughput, improve
statistical sampling, and remove human bias from the data acquisition process. However, STEM
can acquire data over a wide range of length scales, from tens of microns down to the sub-atomic
scale, which makes it challenging to extract information from STEM images automatically and sensibly.
Below are shown a few atomic-resolution images of different crystals imaged along
different crystallographic orientations. There are clear patterns in those images. However, the
mapping between the patterns and the crystal structures that they belong to can be difficult even
for an expert in STEM. Fundamentally, this poses a multi-scale problem. How would one
decipher the crystallographic information from an atomic-resolution projection image without
using prior knowledge? What methods would one use to constructively incorporate prior
knowledge? (Lead: H. Xin, Electron Microscopy)