Peer reviewed version of the manuscript published in final form at DOI: 10.1002/jbio.201500238.
Article type: Full Article
Quantitative multi-image analysis for biomedical Raman spectroscopic imaging
Hedegaard M A B1*, Bergholt M S2, and Stevens M M2
*Corresponding Author: E-mail: [email protected]
1Department of Chemical Engineering, Biotechnology and Environmental Technology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
2Department of Materials, Department of Bioengineering and Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom
Received: ((will be filled in by the editorial staff))
Revised: ((will be filled in by the editorial staff))
Published online: ((will be filled in by the editorial staff))
Keywords: Raman spectroscopic imaging, multi-image analysis, biochemical quantification
Abstract
Imaging by Raman spectroscopy enables unparalleled label-free insights into cell and tissue
composition at the molecular level. With established approaches limited to single image
analysis, there are currently no general guidelines or consensus on how to quantify
biochemical components across multiple Raman images. Here, we describe a broadly
applicable methodology for the combination of multiple Raman images into a single image
for analysis. This is achieved by removing image specific background interference, unfolding
the series of Raman images into a single dataset, and normalisation of each Raman spectrum
to render comparable Raman images. Multivariate image analysis is finally applied to derive
the contributing ‘pure’ biochemical spectra for relative quantification. We present our
methodology using four independently measured Raman images of control cells and four
images of cells treated with strontium ions from substituted bioactive glass. We show that the
relative biochemical distribution per area of the cells can be quantified. In addition, using k-
means clustering, we are able to discriminate between the two cell types over multiple Raman
images. This study shows a streamlined quantitative multi-image analysis tool for improving
cell/tissue characterisation and opens new avenues in biomedical Raman spectroscopic
imaging.
1. Introduction
Raman spectroscopy is an optical vibrational technique based on inelastic light scattering of
molecules. In recent years there has been an increase in the use of Raman spectroscopic
imaging for the characterisation and analysis of cells and tissues [1, 2], diseases [3, 4], and
spectral pathology [5]. Much of this increase is due to the label-free biochemical information
(i.e. specific structures of proteins, lipids, and nucleic acids) with submicron spatial resolution
that Raman spectroscopic imaging reveals. Recently, Raman spectroscopic imaging has also
been used in regenerative medicine [6, 7].
Biochemical information from Raman spectroscopic imaging can generally be extracted and
analysed using multivariate and cluster analyses. While extensive efforts have been devoted to
Raman cluster analysis in biomedicine, only limited work has been conducted using
quantitative analysis. Recent attention has been drawn to the use of more sophisticated factor-
based analysis including multivariate curve resolution-alternating least squares (MCR-ALS)
and spectral unmixing techniques based on pure pixel approaches such as vertex component
analysis (VCA) or the N-FINDR algorithm [8-11]. Spectral unmixing methods
assume that the acquisition of a large number of spectra results in a few spectra in the dataset
that contain nearly pure components known as ‘endmembers’. For high resolution Raman
spectroscopic imaging, this assumption is usually fulfilled for cells and tissues [8].
Applying Raman spectroscopic imaging to biomedicine often involves characterising cells
and tissues over multiple timepoints and repetitions, and can also involve the comparison of
two or more conditions. One of the recurring issues when analysing multiple Raman images is
their comparability. This is particularly important when the quantification or clustering of
different cell/tissues is involved. Raman spectroscopic imaging is further confounded by
several factors. First, biomedical samples are often immersed in a solvent (e.g. phosphate-
buffered saline [PBS]) on a substrate such as MgF2 that gives rise to varying spectral
backgrounds. Second, cell and tissue spectra can contain superimposed Raman- and auto-
fluorescence signals with prominent variance across images. Third, measured spectra can be
affected by varying cell/tissue morphology and optical properties (e.g. variation in refractive
indices, absorption, scattering and anisotropy), which influences absolute intensity. Finally,
Raman spectroscopy instruments experience day-to-day variation through factors such as
fluctuations in laser excitation power and optical focusing. Together, these effects contribute
differently to each Raman image according to the specific experiment and sample under
consideration. For this reason, spectral analyses are usually performed using each Raman
image separately. However, a separated analysis does not enable comparative studies, which
compromises the reproducibility and significance of results when quantification or clustering
are involved. As a result, the quantitative nature of Raman spectroscopy imaging is not fully
exploited in experiments. Since there are no general guidelines on how to quantify
biochemical components in multiple Raman images, there is a need to introduce standardised
methodologies when a series of independent images are acquired.
Here, we present a broadly applicable methodology that enables a user to combine a series of
Raman spectroscopic images into a single image for analysis. This is achieved by removing
image specific background interferences, unfolding the images into a single dataset, and
normalisation of each Raman spectrum to render comparable images. Spectral unmixing is
finally applied as an example to derive the contributing endmembers for relative
quantification of biochemical components. We present our methodology using four
independently measured Raman images of control cells and four images of cells treated with
strontium ions from substituted bioactive glass. These cells differ in both
substrate/solvent backgrounds and total intensity, which require correction before the images are
directly comparable. Our method is a multi-image analysis tool and illustrates the application
of Raman spectroscopy imaging for quantitative analysis in biomedicine, using a small but
known set of Raman spectroscopic images.
2. Methods for Multi-Image Analysis
Figure 1 shows an overview of our methodology for comparative analysis and quantification
of biochemical content using a series of Raman images. The image analysis consists of six
steps: (i) pre-processing (individual solvent/substrate/auto-fluorescence baseline correction of
images) and removal of spectra with no cellular information; (ii) dataset assembly and
unfolding into an image matrix; (iii) normalisation of the unfolded matrix to reduce absolute
intensity fluctuations among different images; (iv) image analysis (factor analysis or spectral
unmixing) to extract the relative contributions of biochemical components; (v) reconstruction
of ‘pure’ biochemical images; and (vi) relative quantification and clustering of biochemical
content. The selection of pre-processing and image analyses depends on the specific
experiment and sample under consideration. In this work, we applied a selection of pre-
processing and image analysis methods that are generally applicable for the conventional,
non-confocal Raman spectroscopic imaging of cells.
2.1 Pre-processing: Individual solvent/substrate/auto-fluorescence correction of images
To remove the image specific solvent and substrate backgrounds that are unrelated to the
biochemical information, we performed the pre-processing of each Raman image separately
before analysis (Figure 1). Since the solvent and substrate background spectrum can change
significantly in a single image (e.g. due to optical focus effects), its contribution must be
estimated and subtracted for each pixel independently. For known substrate and solvent
backgrounds, extended multiplicative signal correction-spectral interference subtraction (EMSC-
SIS) is an efficient method for cells and tissue [12]. EMSC-SIS is an extension to extended
multiplicative signal correction (EMSC) [13]. EMSC can effectively correct for offsets, linear
and quadratic effects observed across images caused by drifts in the instrument and changes
in optical focus between cells. The spectral interference subtraction component of EMSC-SIS
estimates the amount of background interference by using solvent/substrate reference spectra
and adjusting according to a corrected cell spectrum. For Raman images of cells, both solvent
and substrate references are readily obtained since these backgrounds are present around cells.
To improve the solvent and substrate correction, k-means clustering can advantageously be
used to obtain the most representative solvent/substrate reference spectra.
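The EMSC-SIS idea described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the implementation used in this work: each measured spectrum is modelled as a scaled reference cell spectrum plus a polynomial baseline (offset, linear, quadratic) plus known solvent/substrate spectra, all fitted by ordinary least squares. The function name `emsc_sis`, its arguments, and the scaled wavenumber axis are hypothetical choices made for illustration.

```python
import numpy as np

def emsc_sis(spectra, reference, interferents, poly_order=2):
    """Sketch of EMSC with spectral interference subtraction (illustrative only).

    spectra:      (n_pixels, n_channels) measured spectra
    reference:    (n_channels,) reference cell spectrum
    interferents: (n_interferents, n_channels) solvent/substrate spectra
    """
    n_channels = spectra.shape[1]
    # Assumed scaled wavenumber axis for the polynomial baseline terms
    x = np.linspace(-1.0, 1.0, n_channels)
    poly = np.vstack([x**p for p in range(poly_order + 1)])
    # Design matrix columns: [reference | offset, linear, quadratic | interferents]
    design = np.vstack([reference, poly, interferents]).T
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    b = coefs[0]                            # multiplicative scaling per pixel
    unwanted = design[:, 1:] @ coefs[1:]    # fitted baseline + background per pixel
    corrected = (spectra.T - unwanted) / b  # subtract, then rescale to the reference
    return corrected.T, coefs.T
```

In this sketch the division by `b` fails for pixels that contain no cell signal (`b` close to zero), which is one reason to handle pure solvent/substrate pixels separately.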
2.2 Combining and normalising images in one analysis for direct comparison
Following the image specific solvent/substrate/autofluorescence background correction, all
images are combined and unfolded into a single matrix (Figure 1). During this step, it is
important to keep track of the original position of each spectrum in its respective image since
this is later used to reconstruct the resulting biochemical images. In order to make the series
of images comparable in intensity, the entire dataset is then normalised. Several normalisation
strategies are commonly used to correct for sample specific properties and unwanted
instrument effects such as optical focusing effects and laser power fluctuations. These include
peak normalisation, vector normalisation, standard normal variate (SNV), multiplicative scatter
correction (MSC) [14], and extended multiplicative signal correction (EMSC) [13]. The optimal
method of normalisation should be chosen according to the specific experiment.
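The unfolding and normalisation step can be sketched as follows. This is an illustrative example only: it assumes each image is a NumPy array of shape (rows, columns, channels) sharing a common spectral axis, and it uses SNV as a simple stand-in for whichever normalisation suits the experiment. The names `unfold_images` and `snv` are hypothetical.

```python
import numpy as np

def unfold_images(images):
    """Stack images of shape (rows, cols, channels) into one 2-D matrix.

    Returns the spectra matrix and an index array holding one
    (image_id, row, col) triple per spectrum for later reconstruction.
    """
    blocks, index = [], []
    for image_id, img in enumerate(images):
        rows, cols, channels = img.shape
        blocks.append(img.reshape(rows * cols, channels))
        # Row-major reshape: spectrum i sits at (i // cols, i % cols)
        rr, cc = np.divmod(np.arange(rows * cols), cols)
        index.append(np.column_stack([np.full(rows * cols, image_id), rr, cc]))
    return np.vstack(blocks), np.vstack(index)

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std
```

Keeping the index array alongside the matrix means that rows can be removed (for example background pixels) without losing the mapping back to the original image positions.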
2.3 Image analysis methods
A variety of multivariate image analyses have been applied to Raman images including pure
pixel spectral unmixing techniques, Principal Component Analysis (PCA), Multivariate Curve
Resolution (MCR), constrained least squares regression of reference basis spectra, and cluster
analysis. Pure pixel spectral unmixing is a family of techniques that aims to identify the
corners of the simplex spanned by the matrix of spectra. These distinct spectra containing
biochemical signatures are termed ‘endmembers’. Spectral unmixing assumes that a linear
combination of endmembers can explain the biochemical variance in the dataset. After
spectral unmixing and image reconstruction, each spectrum can be associated with a set of
abundance values for respective endmembers. Algorithms belonging to this family of
techniques include Vertex Component Analysis [10] and N-FINDR [11]. Both techniques
have previously been applied to Raman images [6-8, 15]. Briefly, the N-FINDR algorithm
searches for the spectral vectors spanning the largest volume of the respective image dataset.
Since this process is computationally intensive, it is not uncommon to apply dimension
reduction by PCA before implementing the method. PCA can also be used to determine the
number of components needed to characterise the specific dataset. Typically, including a large
number of endmembers results in redundant endmembers that do not add new biochemical
information for characterisation. It is therefore recommended to develop several models
implementing different numbers of endmembers. Results should be evaluated in terms of
model fit quality and the biomedical interpretation of the endmembers and resulting
abundance images.
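To make the volume-maximisation idea concrete, the following is a simplified brute-force sketch in the spirit of N-FINDR, not the original algorithm of [11] nor the code used in this work. It assumes PCA reduction to p − 1 dimensions for p endmembers and replaces one candidate at a time whenever the simplex volume grows. The function name `nfindr` and its parameters are hypothetical.

```python
import numpy as np

def nfindr(spectra, n_endmembers, max_sweeps=10, seed=0):
    """Return row indices of the spectra spanning the largest simplex (sketch)."""
    rng = np.random.default_rng(seed)
    centred = spectra - spectra.mean(axis=0)
    # PCA scores via SVD; p endmembers span a simplex in p - 1 dimensions
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_endmembers - 1] * s[:n_endmembers - 1]
    n = scores.shape[0]
    # With a row of ones on top, |det| is proportional to the simplex volume
    aug = np.vstack([np.ones(n), scores.T])
    idx = rng.choice(n, size=n_endmembers, replace=False)
    volume = abs(np.linalg.det(aug[:, idx]))
    for _ in range(max_sweeps):
        changed = False
        for j in range(n_endmembers):
            for i in range(n):
                trial = idx.copy()
                trial[j] = i
                v = abs(np.linalg.det(aug[:, trial]))
                if v > volume * (1 + 1e-12):  # accept only genuine increases
                    idx, volume, changed = trial, v, True
        if not changed:
            break
    return idx
```

Each sweep evaluates one determinant per spectrum and endmember, so this sketch is only practical for small datasets; it is meant to show the principle rather than an efficient implementation.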
Cluster algorithms represent another well-used technique for image analysis. These include
Hierarchical Cluster Analysis, k-means, and fuzzy c-means [2]. Cluster analysis has
traditionally been used to group spectra according to their similarity; the resulting image is
then constructed using the representative cluster groups. For the resulting group, a
representative spectrum can then be calculated as the mean of the group for hierarchical
cluster analysis or as the resulting k centroid in k-means. k-means clustering is a commonly
used partitioning method [16]. In general, k-means clustering involves the partitioning of a
data set into clusters so that the differences between the data within each cluster are
minimised and the differences between clusters are maximised according to a defined distance
measure. Typically, the Euclidean distance is applied for spectral datasets. First, the algorithm
selects k random spectra as starting centroids; a centroid denotes the centre or mean of the
clusters. Second, distances are calculated between every spectrum and these centroids, with
each spectrum subsequently assigned to a cluster whose centroid is nearest. When all spectra
have been assigned to the k centroids, a new set of centroids is calculated based on the mean
of the spectra associated with each centroid. This process is repeated until the assignment
no longer changes or the incremental improvement falls below a given threshold. An alternative
stop criterion is the maximum number of iterations. A comparison of different clustering
techniques has been described elsewhere [8].
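The k-means steps described above can be written as a minimal sketch. It assumes the Euclidean distance and random initialisation from the data; the function name `kmeans`, the tolerance `tol`, and the handling of empty clusters are illustrative assumptions rather than part of the method in [16].

```python
import numpy as np

def kmeans(spectra, k, max_iter=100, tol=1e-6, seed=0):
    """Plain k-means on a (n_spectra, n_channels) matrix (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: k random spectra serve as the starting centroids
    centroids = spectra[rng.choice(len(spectra), size=k, replace=False)].copy()
    labels = np.full(len(spectra), -1)
    for _ in range(max_iter):  # maximum iterations as alternative stop criterion
        # Step 2: Euclidean distance from every spectrum to every centroid
        dist = np.linalg.norm(spectra[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned spectra
        new_centroids = centroids.copy()
        for j in range(k):
            members = spectra[new_labels == j]
            if len(members):  # assumption: keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        unchanged = np.array_equal(new_labels, labels)
        centroids, labels = new_centroids, new_labels
        if unchanged or shift < tol:
            break
    return labels, centroids
```

Because the result depends on the random starting centroids, it is common practice to repeat the clustering with several seeds and keep the solution with the smallest within-cluster distances.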
3. Results: Multi-image analysis example
In order to demonstrate multi-image analysis for quantitative Raman spectroscopic imaging,
we used a dataset of cells representing two biological conditions. We present the methodology
using non-confocal Raman spectroscopy images of cells in PBS solution and on MgF2
substrates. Specifically, we show an example where we corrected for solvent and substrate
background, assembled the entire dataset, normalised and finally performed quantitative
multi-image analysis.
3.1 Dataset
The Raman image dataset we used represented cells cultured under two biological conditions
and comes from the same dataset we presented previously in Autefage et al. [7]. This dataset
consisted of control mesenchymal stem cells (MSCs) grown in osteogenic media (control) and
cells exposed to ions from 100% strontium substituted bioactive glass (Sr 100). The goal of
the original study [7] was to evaluate how simple strontium ion incorporation within the
biomaterial 45S5 bioactive glass (BG) influenced the global response of human MSCs
(hMSC). Specifically, hMSCs were seeded at 4,000 cells/cm2 on MgF2 slides and cultured in
CTL or Sr 100-conditioned medium for 5 days. The cells were then fixed in 3.7% (vol/vol)
formaldehyde in PBS for 15 minutes and washed with PBS. We found and previously
reported a significant upregulation in cholesterol and cholesterol esters in the Sr 100 cells
compared to the control. The dataset we now use in the current paper is a subset of the dataset
presented in [7] and contains only four control and four Sr 100 cells. Its purpose here is to
serve only as an example with which to demonstrate the application of multi image Raman
analysis in biomedicine.
The Raman spectroscopy images of individual cells were acquired using a Renishaw 2000
(Renishaw, Gloucestershire, UK) with a 514.5 nm laser excitation (Melles Griot, Carlsbad,
CA) using 30 mW power at the sample. Each spectrum was acquired using 1.0 second
exposure time per spectrum with a water immersion objective 60× (NA = 1.0) (Nikon, Tokyo,
Japan) which gives a spatial resolution of ∼0.8 µm per pixel. A 600 groove/mm grating
covering the spectral region 512 cm−1 to 3,035 cm−1 was used for all Raman maps. Cells
were measured independently on different days (on MgF2 substrate in PBS solution) and
therefore contained both a substrate and solvent background.
3.2 Individual solvent and baseline correction of Raman spectroscopic images
Each Raman spectroscopy image was first corrected for solvent and substrate using EMSC-
SIS. To identify the reference and background spectra for correction, k-means cluster analysis
was performed for each image separately (see supplementary information for details).
Clusters representing substrate/solvent background and cells were identified and used for
EMSC-SIS correction. Figure 2 shows the 1 to 99 percentile raw Raman spectra and the pre-
processed spectra after EMSC-SIS correction. These results demonstrate that the EMSC-SIS
technique can efficiently remove the solvent/substrate background over a series of images.
Subsequently, we removed the areas in the image containing only solvent and substrate
background from the dataset since these would essentially contribute zero spectra.
3.3 Combining and normalising Raman images
Following removal of solvent and substrate background spectra from each individual image,
the data was unfolded into eight separate matrices. These were then combined into a single
matrix to generate a large dataset including all the respective Raman spectra (Figure 1). The
complete dataset was then normalised to make the individual images comparable. In the
example presented, we applied EMSC using the mean spectrum of the combined dataset as
the cell reference spectrum. Figure 2 also shows the representative 1 to 99 percentile Raman
spectra after normalisation using EMSC. These spectra show that by using EMSC, the
spectral variance caused by power density/focusing variations can be significantly reduced,
which enables comparative image analysis.
3.4 Applying N-FINDR spectral unmixing
Due to the ability of high spatial resolution Raman spectroscopy imaging to resolve small
cellular components, we selected N-FINDR for image analysis [8]. We applied N-FINDR
spectral unmixing to the dataset of eight cells to extract the ‘pure’ endmember spectra across
all images. In order to estimate the number of endmembers, we applied PCA to the image
dataset. We found that five components explained 97.22% of the variance (PC1: 96.72%,
PC2: 0.37%, PC3: 0.06%, PC4: 0.04% and PC5: 0.03%). Subsequently we fitted an N-FINDR
model and also found that five endmembers adequately described the biochemical variation in
the multiple Raman images.
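For illustration, the explained variance per principal component can be obtained from the singular values of the unfolded matrix. The sketch below assumes mean-centring of the spectra (the centring used in this work is not specified here), and the function name `explained_variance_ratio` is hypothetical. The last lines simply check that the five reported percentages add up to the stated total.

```python
import numpy as np

def explained_variance_ratio(spectra):
    """Fraction of total variance captured by each principal component."""
    centred = spectra - spectra.mean(axis=0)   # assumption: mean-centred PCA
    s = np.linalg.svd(centred, compute_uv=False)
    variance = s**2
    return variance / variance.sum()

# Arithmetic check of the percentages reported in the text
reported = [96.72, 0.37, 0.06, 0.04, 0.03]
print(round(sum(reported), 2))  # 97.22
```

A cumulative sum of the returned ratios (`np.cumsum`) gives the variance explained by the first n components, which is one way to shortlist candidate numbers of endmembers before fitting several models.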
Endmember 1 was essentially associated with cytoplasmic proteins (e.g. 1004 cm−1 of
phenylalanine, 1660 cm−1 [Amide I], and 2940 cm−1 [CH3 stretching of proteins]).
Endmember 2 was also found in the cytoplasm but represented regions with higher lipid and
phospholipid content (e.g. 2850 and 2885 cm−1 [symmetric and asymmetric CH2,
respectively]). Endmember 3 was associated with cholesterol/cholesterol esters and
phospholipids as indicated by the presence of subtle bands at 717 cm−1 of choline groups and peaks at
1298, 1745, 2850 and 2885 cm−1. This third endmember differed significantly from
endmembers 2 and 4 by the presence of cholesterol, cholesterol esters and higher protein to
lipid content. This endmember showed a very high abundance in the strontium bioactive glass
treated cells indicating an upregulation of cholesterol and cholesterol esters across the entire
cytoplasm as indicated in Figure 4c. This is in contrast to the very low abundance of both
endmembers 2 and 4 in both control and treated cells. This could indicate that the lipids
represented by endmembers 2 and 4 are localised in smaller compartments. Endmember 4 was
also associated with another specific type of lipid and phospholipid showing typical features
at 2850 cm−1 and 2885 cm−1 (symmetric and asymmetric CH2). Endmember 5 essentially
represented the nucleus rich areas with bands associated with DNA and RNA at 785, 1093,
1121, 1335, and 1574 cm−1 assigned to DNA, DNA-backbone stretching vibration, the U, T,
C (ring-breathing modes in the DNA/RNA bases), and the ring-breathing mode of DNA/RNA
and DNA bases, respectively. To generate abundance images showing the specific
biochemical components, we fed the endmembers to a non-negativity constrained least
squares algorithm. For each endmember, we constructed a merged image that represented the
abundance or relative concentration (Figure 3B).
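The abundance estimation and image reconstruction can be sketched with SciPy's non-negative least squares solver. This is an illustration under assumptions: the rescaling of each pixel's coefficients to sum to one is an assumed convention for obtaining values between 0 and 1, and the names `abundances` and `refold`, together with the (image_id, row, col) index layout, are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def abundances(spectra, endmembers, normalise=True):
    """Non-negative least squares fit of each spectrum to the endmember spectra.

    spectra:    (n_spectra, n_channels)
    endmembers: (n_endmembers, n_channels)
    """
    coeffs = np.array([nnls(endmembers.T, s)[0] for s in spectra])
    if normalise:
        # Assumption: scale each pixel so its abundances sum to one
        totals = coeffs.sum(axis=1, keepdims=True)
        coeffs = coeffs / np.where(totals > 0, totals, 1.0)
    return coeffs

def refold(values, index, shape, image_id):
    """Place per-spectrum values back at their (row, col) positions in one image."""
    image = np.full(shape, np.nan)          # removed background pixels stay NaN
    sel = index[:, 0] == image_id
    image[index[sel, 1], index[sel, 2]] = values[sel]
    return image
```

Calling `refold(coeffs[:, j], index, shape, image_id)` for each endmember j and each image would yield one abundance image per endmember and cell.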
3.5 Relative quantification and clustering
In this work, we performed a relative area quantification, which is a principle comparable to
fluorescence quantification for calculating the relative area of the cell associated with specific
chemical components. Following the N-FINDR analysis, each spectrum in the respective
images has an abundance value between 0 and 1 for each endmember spectrum. If a threshold
is set on the abundance value, pixels can be selected that predominantly have contents of that
endmember. In the example shown, we set the threshold at 0.3 and counted the number of
pixels with a higher abundance value. This threshold was selected as being higher than the
average abundance value for all endmembers (0.2). After counting pixels for each endmember
for each cell, we calculated the percentage of the total number of pixels representing each
endmember individually for each cell. The mean percentage abundance with standard
deviations (SD) for control and Sr 100 cells are shown in Figure 3C. These results showed
that Sr 100 cells were associated with distinct abundance profiles. A Mann-Whitney test
further showed that endmember 3 was associated with upregulated cholesterol and cholesterol
esters (p < 0.05) for the Sr 100 cells, which was in line with previous results [7].
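The relative area quantification can be illustrated as follows. The sketch counts the pixels of one cell whose abundance exceeds the threshold and then compares two small groups with an exact two-sided Mann-Whitney U test computed by enumeration. The exact, two-sided, tie-free form of the test is an assumption made for illustration (the variant used in this work is not stated), and the names `percent_area` and `mann_whitney_exact` are hypothetical.

```python
from itertools import combinations
import numpy as np

def percent_area(abundance_values, threshold=0.3):
    """Percentage of a cell's pixels with abundance above the threshold."""
    values = np.asarray(abundance_values, dtype=float)
    return 100.0 * np.mean(values > threshold)

def mann_whitney_exact(a, b):
    """Exact two-sided Mann-Whitney U test for small samples without ties."""
    def u_stat(x, y):
        # Number of pairs in which the value from x exceeds the value from y
        return sum(xi > yi for xi in x for yi in y)
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    u_obs = u_stat(a, b)
    centre = n * m / 2
    count = total = 0
    # Enumerate every way of assigning n of the pooled values to the first group
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        x = [pooled[i] for i in chosen]
        y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        count += abs(u_stat(x, y) - centre) >= abs(u_obs - centre)
    return u_obs, count / total
```

Under these assumptions, two groups of four cells with completely separated values give p = 2/70 ≈ 0.029, whereas a single overlapping rank already gives p = 4/70 ≈ 0.057, which shows how little room a design with n = 4 per group leaves at the 0.05 level.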
To further explore the application of multi-image analysis, we fed the 40 N-FINDR
abundance images (8 cells × 5 biochemical images) into a k-means clustering analysis (Figure
4). Our analysis showed that the cells could be clustered across multiple images and that
control cells and Sr 100 could easily be visually distinguished according to changes in
cholesterol and cholesterol ester contents (Figure 3). These results demonstrate that the multi-
image analysis method can realise robust quantitative compositional analysis that can be used
for segmentation and clustering.
4. Discussion
Raman spectroscopic imaging offers a simple and yet highly effective label-free means by
which to interrogate the biochemical composition of cells and tissue with a high degree of
molecular specificity [4, 17]. This has recently enabled the comprehensive label-free
characterisation of biological cells and tissue across a vast number of biomedical disciplines.
Quantitative imaging analysis in biomedical applications, however, remains challenging. Due
to the spectral variability among different images, conventional approaches have been limited
to single image analysis that does not take full advantage of the quantitative characteristics of
Raman spectroscopy. In this work we describe a general methodology for Raman image
analysis that extends biomedical experiments from single image to multiple image analyses.
To our knowledge, this is the first detailed report on multi-image analysis in Raman
spectroscopic imaging (Figure 1). To demonstrate the practicality of our method, we showed
an example of cellular characterisation using non-confocal Raman spectroscopic imaging. The
dataset comprised both solvent and substrate backgrounds and represented two cellular
conditions measured over an extended time period. This represents one of the most
challenging experimental scenarios in biomedical Raman spectroscopic imaging. We applied
a pre-processing method that efficiently corrects each specific image for solvent and substrate
interferences (Figure 2). Our data show that after solvent and substrate background
correction of each individual Raman image, the spectra essentially contained no residual
background. Other solvent and substrate background correction methods may be preferred or
perform equally well (e.g. MCR or constrained least squares regression methods). If a sample
emits auto-fluorescence, it may be necessary to use other methods (e.g. Whittaker filter or
constrained polynomial subtraction [18]) depending on the specific biological sample and
Raman instrumentation.
We further demonstrated how to combine and normalise a series of images together. In
general, normalisation ensures that all spectra have comparable total intensity. Normalisation
preserves the relative biochemical content and effectively helps to give all samples an equal
impact in factor analysis and spectral unmixing models. The normalisation efficiently reduces
the absolute intensity variability and as a consequence makes the images directly comparable
for multivariate image analyses.
Finally, we applied the pixel unmixing technique N-FINDR on the standardised dataset
containing the series of images. In general, there is a range of methods that can be applied to
analyse the combined images (e.g. PCA, MCR-ALS, VCA, clustering). The choice of method
depends on the specific dataset and objective in question. Pixel unmixing methods such as
VCA and N-FINDR are intuitive since they extract the purest spectra from the dataset.
This means that the endmember spectra are actually contained in the Raman image and
therefore are chemically meaningful. These techniques generally work well for applications
where the pixels contain nearly pure biochemical components, and have been applied to a
range of problems [15, 19, 20]. Factor based methods such as MCR-ALS are based on
spectral variance [9]. In certain instances, this can be used to uncover hidden pure
components buried in the dataset. These models can also result in components that may not be
directly intuitive from an analytical point of view. It is therefore always advised to correlate
the extracted pure components with a reference library of biochemical Raman spectra. Both
factor based and pure pixel based methods have the advantage of being able to explain
complex datasets with very few components or endmembers. By contrast, cluster based
methods (e.g. HCA and k-means) group spectra according to their distance. These techniques
can be efficiently used to segment Raman images with clear distinction of areas. These
methods are not quantitative but readily apply to simpler systems.
Here we demonstrated an example using N-FINDR spectral unmixing that resulted in 5
endmembers (Figure 3A and B). Endmember 3 contained the specific Raman peaks of
cholesterol esters at 702, 1298, 1745, 2850, and 2885 cm−1. By contrast, endmembers 2 and 4
were associated with more complex lipid compositions relatively down-regulated in
strontium-treated cells. We performed relative area quantification in a way that was directly
comparable with conventional biomedical imaging analysis (e.g., immunohistochemistry).
The abundance values for each image showed a significant up-regulation in cholesterol and
cholesterol esters in the strontium-treated cells compared to control (p < 0.05), which was in
agreement with fluorescence, PCR and microarray studies [7]. We further extended our
analysis by applying k-means cluster analysis to the abundance images. These data showed
that the spectral profile of the entire cytoplasm was very distinct for the strontium treated
cells. This resulted in a perfect discrimination between the two cell types across the series of
images and showed that the developed methodology was robust for classification purposes. It
should be noted that the presented approach requires no prior knowledge, and only uses the
information obtained from the spectra to analyse the content. This means that this
methodology can be used as a label free explorative approach. This enables unbiased
explorative investigations of cells and tissue. When the target of the investigation has been
found it has previously been shown that supervised annotation can be used to greatly improve
automatic recognition of the constituents of cells and tissue [21-23].
We believe our methodology is directly transferable to various tissues and cells. The example
we presented involved multi-image Raman spectroscopic analysis using two cell types. We
acquired the images in this work using a non-confocal Raman setup. Recently developed high
performance confocal Raman setups generally reduce solvent and substrate background. The
specific experiment therefore dictates the choice of pre-processing and analysis and should be
chosen during the explorative image analysis. Future studies could aim to systematically
compare different pre-processing and image analyses for further improving Raman
spectroscopic imaging as a characterisation technique in biomedical applications. The
framework presented here represents a very general approach for multi-image analysis and
quantification in biomedical Raman spectroscopy imaging. Our methodology enables
comparisons of repetitions and longitudinal studies. Importantly, this work standardises
Raman spectroscopic imaging and enables database-based Raman spectral image analysis
across experiments that can enable the classification of cells and tissues based on model-based
predictions.
5. Conclusion
In summary, we have presented here a comprehensive methodology that enabled us to
combine a series of Raman images into the same image analysis. We applied our image
analysis to extract ‘pure’ biochemical components from the combined dataset for quantitative
assessment and clustering of cells, which revealed significant biochemical changes in lipid
content across independently measured Raman images. We showed that clustering of the
abundance values was able to distinguish the cells. The quantitative Raman image analysis
presented significantly improves cellular characterisation and opens up new possibilities to
use Raman spectroscopic imaging in biomedicine.
Acknowledgements
The authors wish to thank Hélène Autefage for cell culture assays. M.B. and M.M.S.
acknowledge the support of the Medical Research Council, the Engineering and Physical
Sciences Research Council, and the Biotechnology and Biological Sciences Research Council
UK Regenerative Medicine Platform Hub “A Hub for Engineering and Exploiting the Stem
Cell Niche” (MR/K026666/1). All raw data are available on request from
References
[1] G. J. Puppels, F. F. M. de Mul, C. Otto, J. Greve, M. Robert-Nicoud, D. J. Arndt-Jovin and T. M. Jovin, Nature 347, 301-303 (1990).
[2] I. W. Schie and T. Huser, Appl Spectrosc 67, 813-28 (2013).
[3] C. Krafft, S. B. Sobottka, G. Schackert and R. Salzer, Analyst 130, 1070-7 (2005).
[4] G. Shetty, C. Kendall, N. Shepherd, N. Stone and H. Barr, Br J Cancer 94, 1460-4 (2006).
[5] K. Kong, C. J. Rowlands, S. Varma, W. Perkins, I. H. Leach, A. A. Koloydenko, H. C. Williams and I. Notingher, Proceedings of the National Academy of Sciences 110, 15189-15194 (2013).
[6] T. C. von Erlach, M. A. Hedegaard and M. M. Stevens, Analyst 140, 1798-803 (2015).
[7] H. Autefage, E. Gentleman, E. Littmann, M. A. B. Hedegaard, T. Von Erlach, M. O’Donnell, F. R. Burden, D. A. Winkler and M. M. Stevens, Proceedings of the National Academy of Sciences 112, 4280-4285 (2015).
[8] M. Hedegaard, C. Matthäus, S. Hassing, C. Krafft, M. Diem and J. Popp, Theoretical Chemistry Accounts 130, 1249-1260 (2011).
[9] J. Felten, H. Hall, J. Jaumot, R. Tauler, A. de Juan and A. Gorzsas, Nat Protoc 10, 217-40 (2015).
[10] J. M. P. Nascimento and J. M. Bioucas Dias, Geoscience and Remote Sensing, IEEE Transactions on 43, 898-910 (2005).
[11] M. E. Winter, 1999.
[12] H. Martens and E. Stark, Journal of Pharmaceutical and Biomedical Analysis 9, 625-635 (1991).
[13] H. Martens, J. P. Nielsen and S. B. Engelsen, Anal Chem 75, 394-404 (2003).
[14] P. Geladi, D. MacDougall and H. Martens, Applied Spectroscopy 39, 491-500 (1985).
[15] M. Miljkovic, T. Chernenko, M. J. Romeo, B. Bird, C. Matthaus and M. Diem, Analyst 135, 2002-13 (2010).
[16] J. MacQueen, Berkeley, Calif., 1967.
[17] M. Hedegaard, C. Krafft, H. J. Ditzel, L. E. Johansen, S. Hassing and J. Popp, Anal Chem 82, 2797-802 (2010).
[18] C. A. Lieber and A. Mahadevan-Jansen, Appl Spectrosc 57, 1363-7 (2003).
[19] C. Krafft, B. Belay, N. Bergner, B. F. M. Romeike, R. Reichart, R. Kalff and J. Popp, Analyst 137, 5533-5537 (2012).
[20] T. Chernenko, C. Matthaus, L. Milane, L. Quintero, M. Amiji and M. Diem, ACS Nano 3, 3552-9 (2009).
[21] K. Klein, Alexander M. Gigler, T. Aschenbrenner, R. Monetti, W. Bunk, F. Jamitzky, G. Morfill, Robert W. Stark and J. Schlegel, Biophysical Journal 102, 360-368 (2012).
[22] Krau, D. Petersen, D. Niedieker, I. Fricke, E. Freier, S. F. El-Mashtoly, K. Gerwert and A. Mosig, Analyst 140, 2360-2368 (2015).
[23] B. Bird, M. Miljkovic, S. Remiszewski, A. Akalin, M. Kon and M. Diem, Lab Invest 92, 1358-1373 (2012).
Figure Legends
Figure 1. Overview of the analysis protocol for the relative quantification of biochemical
content in cells using multiple Raman images. The image analysis consists of: (i) pre-
processing on each separate image to correct for the solvent/substrate/auto-fluorescence
background; (ii) dataset assembly and unfolding of the image matrix; (iii) normalisation of the
unfolded matrix to reduce experimental effect among different images; (iv) image analysis to
extract the relative abundances of biochemical components; (v) reconstruction of the
component images; and (vi) relative quantification or clustering of the biochemical content.
Figure 2. Raw Raman spectra representing 1 to 99 percentile spectra for clarity. Also shown
are the solvent/substrate corrected spectra as well as the normalised Raman spectra. All the raw
data for this dataset have been previously published [7].
Figure 3. A) Endmember spectra. These spectra were scaled and truncated in the silent region
(1800-2800 cm-1) for clarity. B) N-FINDR colour images. Each pixel is represented by a
combination of colours depending on the abundance value of an endmember. C) Mean
abundance values with standard deviations for control cells (n = 4) and Sr 100 cells (n = 4) of:
(a) Endmember 1 (cytoplasm); (b) Endmember 2 (lipids); (c) Endmember 3 (cholesterol and
cholesterol ester); (d) Endmember 4 (lipids); (e) Endmember 5 (nucleus). *p < 0.05, (Mann-
Whitney test). All the raw data for this dataset have been previously published [7].
Figure 4. k-means clustering image (k = 4) performed on the relative abundances extracted
using N-FINDR. All the raw data for this dataset have been previously published [7].
Figure 1
Figure 2
Figure 3A-I
Figure 4a-e
Graphical Abstract
We describe a broadly applicable methodology that enables the combination of a series of Raman images into a single image for analysis. This approach takes into account the entire Raman spectral profile of multiple images in order to identify, quantify and cluster the biochemical distribution across a series of Raman images.