Peer reviewed version of the manuscript published in final form at DOI: 10.1002/jbio.201500238.

Article type: Full Article

Quantitative multi-image analysis for biomedical Raman spectroscopic imaging

Hedegaard M A B1*, Bergholt M S2, and Stevens M M2

*Corresponding Author: E-mail: [email protected]

1Department of Chemical Engineering, Biotechnology and Environmental Technology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark

2Department of Materials, Department of Bioengineering and Institute of Biomedical Engineering, Imperial College London, London SW7 2AZ, United Kingdom

Received: ((will be filled in by the editorial staff))

Revised: ((will be filled in by the editorial staff))

Published online: ((will be filled in by the editorial staff))

Keywords: Raman spectroscopic imaging, multi-image analysis, biochemical quantification

Abstract

Imaging by Raman spectroscopy enables unparalleled label-free insights into cell and tissue

composition at the molecular level. With established approaches limited to single image

analysis, there are currently no general guidelines or consensus on how to quantify

biochemical components across multiple Raman images. Here, we describe a broadly

applicable methodology for the combination of multiple Raman images into a single image

for analysis. This is achieved by removing image specific background interference, unfolding

the series of Raman images into a single dataset, and normalisation of each Raman spectrum

to render comparable Raman images. Multivariate image analysis is finally applied to derive

the contributing ‘pure’ biochemical spectra for relative quantification. We present our

methodology using four independently measured Raman images of control cells and four

images of cells treated with strontium ions from substituted bioactive glass. We show that the

relative biochemical distribution per area of the cells can be quantified. In addition, using k-

means clustering, we are able to discriminate between the two cell types over multiple Raman

images. This study shows a streamlined quantitative multi-image analysis tool for improving

cell/tissue characterisation and opens new avenues in biomedical Raman spectroscopic

imaging.

1. Introduction

Raman spectroscopy is an optical vibrational technique based on inelastic light scattering of

molecules. In recent years there has been an increase in the use of Raman spectroscopic

imaging for the characterisation and analysis of cells and tissues [1, 2], diseases [3, 4], and

spectral pathology [5]. Much of this increase is due to the label-free biochemical information

(i.e. specific structures of proteins, lipids, and nucleic acids) with submicron spatial resolution

that Raman spectroscopic imaging reveals. Recently, Raman spectroscopic imaging has also

been used in regenerative medicine [6, 7].

Biochemical information from Raman spectroscopic imaging can generally be extracted and

analysed using multivariate and cluster analyses. While extensive efforts have been devoted to

Raman cluster analysis in biomedicine, only limited work has been conducted using

quantitative analysis. Recent attention has been drawn to the use of more sophisticated factor-

based analysis including multivariate curve resolution-alternating least squares (MCR-ALS)

and spectral unmixing techniques based on pure pixel approaches such as vertex component

analysis (VCA) or the N-FINDR algorithm [8-11]. Spectral unmixing methods

assume that the acquisition of a large number of spectra results in a few spectra in the dataset

that contain nearly pure components known as ‘endmembers’. For high resolution Raman

spectroscopic imaging, this assumption is usually fulfilled for cells and tissues [8].

Applying Raman spectroscopic imaging to biomedicine often involves characterising cells

and tissues over multiple timepoints and repetitions, and can also involve the comparison of

two or more conditions. One of the recurring issues when analysing multiple Raman images is

their comparability. This is particularly important when the quantification or clustering of

different cells/tissues is involved. Raman spectroscopic imaging is further confounded by

several factors. First, biomedical samples are often immersed in a solvent (e.g. phosphate-

buffered saline [PBS]) on a substrate such as MgF2 that gives rise to varying spectral

backgrounds. Second, cell and tissue spectra can contain superimposed Raman and auto-

fluorescence signals with prominent variance across images. Third, measured spectra can be

affected by varying cell/tissue morphology and optical properties (e.g. variation in refractive

indices, absorption, scattering and anisotropy), which influences absolute intensity. Finally,

Raman spectroscopy instruments experience day-to-day variation through factors such as

fluctuations in laser excitation power and optical focusing. Together, these effects contribute

differently to each Raman image according to the specific experiment and sample under

consideration. For this reason, spectral analyses are usually performed using each Raman

image separately. However, a separated analysis does not enable comparative studies, which

compromises the reproducibility and significance of results when quantification or clustering

are involved. As a result, the quantitative nature of Raman spectroscopy imaging is not fully

exploited in experiments. Since there are no general guidelines on how to quantify

biochemical components in multiple Raman images, there is a need to introduce standardised

methodologies when a series of independent images are acquired.

Here, we present a broadly applicable methodology that enables a user to combine a series of

Raman spectroscopic images into a single image for analysis. This is achieved by removing

image specific background interferences, unfolding the images into a single dataset, and

normalisation of each Raman spectrum to render comparable images. Spectral unmixing is

finally applied as an example to derive the contributing endmembers for relative

quantification of biochemical components. We present our methodology using four

independently measured Raman images of control cells and four images of cells treated with

strontium ions from substituted bioactive glass. The images of these cells differ in both

substrate/solvent background and total intensity, which must be corrected before the images are

directly comparable. Our method is a multi-image analysis tool and illustrates the application

of Raman spectroscopy imaging for quantitative analysis in biomedicine, using a small but

known set of Raman spectroscopic images.

2. Methods for Multi-Image Analysis

Figure 1 shows an overview of our methodology for comparative analysis and quantification

of biochemical content using a series of Raman images. The image analysis consists of six

steps: (i) pre-processing (individual solvent/substrate/auto-fluorescence baseline correction of

images) and removal of spectra with no cellular information; (ii) dataset assembly and

unfolding into an image matrix; (iii) normalisation of the unfolded matrix to reduce absolute

intensity fluctuations among different images; (iv) image analysis (factor analysis or spectral

unmixing) to extract the relative contributions of biochemical components; (v) reconstruction

of ‘pure’ biochemical images; and (vi) relative quantification and clustering of biochemical

content. The selection of pre-processing and image analyses depends on the specific

experiment and sample under consideration. In this work, we applied a selection of pre-

processing and image analysis methods that are generally applicable for the conventional,

non-confocal Raman spectroscopic imaging of cells.

2.1 Pre-processing: Individual solvent/substrate/auto-fluorescence correction of images

To remove the image specific solvent and substrate backgrounds that are unrelated to the

biochemical information, we performed the pre-processing of each Raman image separately

before analysis (Figure 1). Since the solvent and substrate background spectrum can change

significantly in a single image (e.g. due to optical focus effects), its contribution must be

estimated and subtracted for each pixel independently. For known substrate and solvent

backgrounds, extended multiplicative scatter correction-spectral interference subtraction (EMSC-

SIS) is an efficient method for cells and tissue [12]. EMSC-SIS is an extension to extended

multiplicative scatter correction (EMSC) [13]. EMSC can effectively correct for offsets, linear

and quadratic effects observed across images caused by drifts in the instrument and changes

in optical focus between cells. The spectral interference subtraction component of EMSC-SIS

estimates the amount of background interference by using solvent/substrate reference spectra

and adjusting according to a corrected cell spectrum. For Raman images of cells, both solvent

and substrate references are readily obtained since these backgrounds are present around cells.

To improve the solvent and substrate correction, k-means clustering can advantageously be

used to obtain the most representative solvent/substrate reference spectra.
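
As an illustration of the per-pixel background estimation described above, the following minimal sketch fits each spectrum with a least-squares model containing a cell reference, a low-order polynomial, and solvent/substrate reference spectra, and then subtracts the fitted background terms. It is a simplified variant written for illustration, not the implementation used in this work. The function name `emsc_sis_subtract`, its arguments, and the choice to leave the multiplicative scaling to the later normalisation step are assumptions.

```python
import numpy as np

def emsc_sis_subtract(spectra, cell_ref, interferents, poly_order=2):
    """Subtract fitted baseline and interferent contributions (illustrative sketch).

    spectra      : (n_pixels, n_channels) spectra of one image
    cell_ref     : (n_channels,) reference cell spectrum
    interferents : (n_interferents, n_channels) solvent/substrate references
    Returns the background-subtracted spectra and the fitted cell-reference scale.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_channels = spectra.shape[1]
    axis = np.linspace(-1.0, 1.0, n_channels)  # scaled wavenumber axis
    # Offset, linear and quadratic baseline terms (for poly_order=2).
    poly = np.vstack([axis ** p for p in range(poly_order + 1)])
    # Design matrix columns: cell reference, polynomial terms, interferents.
    design = np.vstack([cell_ref, poly, np.atleast_2d(interferents)]).T
    # One least-squares fit per pixel, solved for all pixels at once.
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    # Everything except the cell-reference term is treated as background.
    background = design[:, 1:] @ coefs[1:]
    corrected = spectra.T - background
    return corrected.T, coefs[0]
```

In this sketch, pixels whose fitted cell-reference scale is close to zero contain essentially only background and could be flagged for removal before the images are combined.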

2.2 Combining and normalising images in one analysis for direct comparison

Following the image specific solvent/substrate/autofluorescence background correction, all

images are combined and unfolded into a single matrix (Figure 1). During this step, it is

important to keep track of the original position of each spectrum in its respective image since

this is later used to reconstruct the resulting biochemical images. In order to make the series

of images comparable in intensity, the entire dataset is then normalised. Several normalisation

strategies are commonly used to correct for sample specific properties and unwanted

instrument effects such as optical focusing effects and laser power fluctuations. These include

peak normalisation, vector normalisation, standard normal variate (SNV), multiplicative scatter

correction (MSC) [14], and extended multiplicative scatter correction (EMSC) [13]. The optimal

method of normalisation should be chosen according to the specific experiment.
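
The unfolding and bookkeeping step can be sketched as follows. This is a minimal example under stated assumptions: each image is a `(rows, cols, channels)` array with a boolean mask marking the cell pixels, and vector normalisation stands in for whichever normalisation suits the experiment. The helper names `unfold_images`, `vector_normalise` and `refold` are hypothetical.

```python
import numpy as np

def unfold_images(images, masks):
    """Stack the masked pixels of all images into one matrix.

    Returns the (n_spectra, n_channels) matrix and an index array whose rows
    hold (image_id, row, col) for every spectrum.
    """
    blocks, index = [], []
    for image_id, (img, mask) in enumerate(zip(images, masks)):
        rows, cols = np.nonzero(mask)
        blocks.append(img[rows, cols, :])
        index.append(np.column_stack([np.full(rows.size, image_id), rows, cols]))
    return np.vstack(blocks), np.vstack(index)

def vector_normalise(matrix):
    """Scale every spectrum (row) to unit Euclidean length."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)

def refold(values, index, shapes, fill=np.nan):
    """Place one value per spectrum back at its original pixel position."""
    maps = [np.full(shape, fill) for shape in shapes]
    for (image_id, r, c), v in zip(index, values):
        maps[image_id][r, c] = v
    return maps
```

Keeping the `(image_id, row, col)` index alongside the unfolded matrix is what later allows abundance values or cluster labels to be mapped back onto each original image.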

2.3 Image analysis methods

A variety of multivariate image analyses have been applied to Raman images including pure

pixel spectral unmixing techniques, Principal Component Analysis (PCA), Multivariate Curve

Resolution (MCR), constrained least squares regression of reference basis spectra, and cluster

analysis. Pure pixel spectral unmixing is a family of techniques that aims to identify the

corners of the simplex spanned by the matrix of spectra. These distinct spectra containing

biochemical signatures are termed ‘endmembers’. Spectral unmixing assumes that a linear

combination of endmembers can explain the biochemical variance in the dataset. After

spectral unmixing and image reconstruction, each spectrum can be associated with a set of

abundance values for respective endmembers. Algorithms belonging to this family of

techniques include Vertex Component Analysis [10] and N-FINDR [11]. Both techniques

have previously been applied to Raman images [6-8, 15]. Briefly, the N-FINDR algorithm

searches for the spectral vectors spanning the largest volume of the respective image dataset.

Since this process is computationally intensive, it is not uncommon to apply dimension

reduction by PCA before implementing the method. PCA can also be used to determine the

number of components needed to characterise the specific dataset. Typically, including a large

number of endmembers results in redundant endmembers that do not add new biochemical

information for characterisation. It is therefore recommended to develop several models

implementing different numbers of endmembers. Results should be evaluated in terms of

model fit quality and the biomedical interpretation of the endmembers and resulting

abundance images.
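
To make the volume criterion concrete, the sketch below shows a basic N-FINDR-style search: the spectra are projected onto the first p − 1 principal components, and candidate endmembers are swapped one at a time whenever a swap enlarges the simplex volume. It is an illustrative, unoptimised version and not the implementation used in this work; the function names, the number of sweeps and the random initialisation are assumptions.

```python
import numpy as np

def simplex_volume(points):
    """Value proportional to the volume of the simplex with the given vertices.

    points : (p, p - 1) array, one vertex per row in the reduced space.
    """
    p = points.shape[0]
    return abs(np.linalg.det(np.vstack([np.ones(p), points.T])))

def nfindr(spectra, n_endmembers, n_sweeps=3, seed=0):
    """Return the row indices of the spectra selected as endmembers."""
    rng = np.random.default_rng(seed)
    centred = spectra - spectra.mean(axis=0)
    # PCA by SVD; keep p - 1 score dimensions for p endmembers.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[: n_endmembers - 1].T
    idx = rng.choice(spectra.shape[0], n_endmembers, replace=False)
    best = simplex_volume(scores[idx])
    for _ in range(n_sweeps):
        changed = False
        for k in range(n_endmembers):
            for i in range(scores.shape[0]):
                trial = idx.copy()
                trial[k] = i  # try pixel i as endmember k
                vol = simplex_volume(scores[trial])
                if vol > best * (1 + 1e-12):
                    best, idx, changed = vol, trial, True
        if not changed:  # no swap enlarged the simplex
            break
    return idx
```

Running such a search for several values of `n_endmembers` and inspecting the resulting spectra is one way to follow the recommendation above of comparing models with different numbers of endmembers.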

Cluster algorithms represent another well-used technique for image analysis. These include

Hierarchical Cluster Analysis, k-means, and fuzzy c-means [2]. Cluster analysis has

traditionally been used to group spectra according to their similarity; the resulting image is

then constructed using the representative cluster groups. For the resulting group, a

representative spectrum can then be calculated as the mean of the group for hierarchical

cluster analysis or as the resulting centroid in k-means. k-means clustering is a commonly

used partitioning method [16]. In general, k-means clustering involves the partitioning of a

data set into clusters so that the differences between the data within each cluster are

minimised and the differences between clusters are maximised according to a defined distance

measure. Typically, the Euclidean distance is applied for spectral datasets. First, the algorithm

selects k random spectra as starting centroids; a centroid denotes the centre or mean of the

clusters. Second, distances are calculated between every spectrum and these centroids, with

each spectrum subsequently assigned to a cluster whose centroid is nearest. When all spectra

have been assigned to the k centroids, a new set of centroids is calculated based on the mean

of the spectra associated with each centroid. This process is repeated until the assignment

does not change or the incremental improvement falls below a given threshold. An alternative

stop criterion is the maximum number of iterations. A comparison of different clustering

techniques has been described elsewhere [8].
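
The k-means steps described above can be sketched in a few lines. This is a plain illustration using the Euclidean distance, with iteration stopping when the assignment no longer changes or a maximum number of iterations is reached. The function name `kmeans` and the optional `init_idx` argument (row indices of the starting centroids, useful for reproducible runs) are assumptions introduced here.

```python
import numpy as np

def kmeans(spectra, k, max_iter=100, seed=0, init_idx=None):
    """Basic k-means on the rows of `spectra`; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    if init_idx is None:
        # Step 1: pick k random spectra as the starting centroids.
        init_idx = rng.choice(spectra.shape[0], k, replace=False)
    centroids = spectra[init_idx].astype(float)
    labels = np.full(spectra.shape[0], -1)
    for _ in range(max_iter):
        # Step 2: assign every spectrum to its nearest centroid.
        dist = np.linalg.norm(spectra[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignment unchanged
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = spectra[labels == j]
            if members.size:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Because the result depends on the starting centroids, it is common practice to repeat the clustering with several random starts and keep the solution with the smallest within-cluster distances.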

3. Results: Multi-image analysis example

In order to demonstrate multi-image analysis for quantitative Raman spectroscopic imaging,

we used a dataset of cells representing two biological conditions. We present the methodology

using non-confocal Raman spectroscopy images of cells in PBS solution and on MgF2

substrates. Specifically, we show an example where we corrected for solvent and substrate

background, assembled the entire dataset, normalised and finally performed quantitative

multi-image analysis.

3.1 Dataset

The Raman image dataset we used represented cells cultured under two biological conditions

and comes from the same dataset we presented previously in Autefage et al. [7]. This dataset

consisted of control mesenchymal stem cells (MSCs) grown in osteogenic media (control) and

cells exposed to ions from 100% strontium substituted bioactive glass (Sr 100). The goal of

the original study [7] was to evaluate how simple strontium ion incorporation within the

biomaterial 45S5 bioactive glass (BG) influenced the global response of human MSCs

(hMSC). Specifically, hMSCs were seeded at 4,000 cells/cm2 on MgF2 slides and cultured in

CTL or Sr 100-conditioned medium for 5 days. The cells were then fixed in 3.7% (vol/vol)

formaldehyde in PBS for 15 minutes and washed with PBS. We found and previously

reported a significant upregulation in cholesterol and cholesterol esters in the Sr 100 cells

compared to the control. The dataset we now use in the current paper is a subset of the dataset

presented in [7] and contains only four control and four Sr 100 cells. Its purpose here is to

serve only as an example with which to demonstrate the application of multi image Raman

analysis in biomedicine.

The Raman spectroscopy images of individual cells were acquired using a Renishaw 2000

(Renishaw, Gloucestershire, UK) with a 514.5 nm laser excitation (Melles Griot, Carlsbad,

CA) using 30 mW power at the sample. Each spectrum was acquired using 1.0 second

exposure time per spectrum with a water immersion objective 60× (NA = 1.0) (Nikon, Tokyo,

Japan) which gives a spatial resolution of ∼0.8 µm per pixel. A 600 groove/mm grating

covering the spectral region 512 cm−1 to 3,035 cm−1 was used for all Raman maps. Cells

were measured independently on different days (on MgF2 substrate in PBS solution) and

therefore contained both a substrate and solvent background.

3.2 Individual solvent and baseline correction of Raman spectroscopic images

Each Raman spectroscopy image was first corrected for solvent and substrate using EMSC-

SIS. To identify the reference and background spectra for correction, k-means cluster analysis

was performed for each image separately (see supplementary information for details).

Clusters representing substrate/solvent background and cells were identified and used for

EMSC-SIS correction. Figure 2 shows the 1 to 99 percentile raw Raman spectra and the pre-

processed spectra after EMSC-SIS correction. These results demonstrate that the EMSC-SIS

technique can efficiently remove the solvent/substrate background over a series of images.

Subsequently, we removed the areas in the image containing only solvent and substrate

background from the dataset since these would essentially contribute zero spectra.

3.3 Combining and normalising Raman images

Following removal of solvent and substrate background spectra from each individual image,

the data was unfolded into eight separate matrices. These were then combined into a single

matrix to generate a large dataset including all the respective Raman spectra (Figure 1). The

complete dataset was then normalised to make the individual images comparable. In the

example presented, we applied EMSC using the mean spectrum of the combined dataset as

cell reference spectrum. Figure 2 also shows the representative 1 to 99 percentile Raman

spectra after normalisation using EMSC. These spectra show that by using EMSC, the

spectral variance caused by power density/focusing variations can be significantly reduced,

which enables comparative image analysis.

3.4 Applying N-FINDR spectral unmixing

Due to the ability of high spatial resolution Raman spectroscopy imaging to resolve small

cellular components, we selected N-FINDR for image analysis [8]. We applied N-FINDR

spectral unmixing to the dataset of eight cells to extract the ‘pure’ endmember spectra across

all images. In order to estimate the number of endmembers, we applied PCA to the image

dataset. We found that five components explained 97.22% of the variance (PC1: 96.72%,

PC2: 0.37%, PC3: 0.06%, PC4: 0.04% and PC5: 0.03%). Subsequently we fitted an N-FINDR

model and also found that five endmembers adequately described the biochemical variation in

the multiple Raman images.
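
For reference, the explained variance per principal component can be computed from the singular values of the mean-centred data matrix, as in the minimal sketch below. The function name is hypothetical and mean-centring is an assumption; the exact PCA settings used for the values quoted above are not stated in this text. The quoted percentages are internally consistent: 96.72 + 0.37 + 0.06 + 0.04 + 0.03 = 97.22.

```python
import numpy as np

def explained_variance_ratio(spectra):
    """Fraction of the total variance carried by each principal component."""
    centred = spectra - spectra.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()
```

The cumulative sum of the returned ratios (`np.cumsum`) gives the variance explained by the first n components, which can guide the initial choice of the number of endmembers.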

Endmember 1 was essentially associated with cytoplasmic proteins (e.g. 1004 cm–1 of

phenylalanine, 1660 cm–1 [Amide I], and 2940 cm−1 [CH3 stretching of proteins]).

Endmember 2 was also found in the cytoplasm but represented regions with higher lipid and

phospholipid content (e.g. 2850 and 2885 cm−1 [symmetric and asymmetric CH2,

respectively]). Endmember 3 was associated with cholesterol/cholesterol esters and

phospholipids, as indicated by the presence of subtle bands at 717 cm−1 of choline groups and peaks at

1298, 1745, 2850 cm−1 and 2885 cm−1. This third endmember differed significantly from

endmembers 2 and 4 by the presence of cholesterol, cholesterol esters and higher protein to

lipid content. This endmember showed a very high abundance in the strontium bioactive glass

treated cells, indicating an upregulation of cholesterol and cholesterol esters across the entire

cytoplasm as indicated in Figure 4c. This is in contrast to the very low abundance of both

endmembers 2 and 4 in both control and treated cells. This could indicate that the lipids

represented by endmembers 2 and 4 are localised in smaller compartments. Endmember 4 was

also associated with another specific type of lipid and phospholipid showing typical features

at 2850 cm−1 and 2885 cm−1 (symmetric and asymmetric CH2). Endmember 5 essentially

represented the nucleus rich areas with bands associated with DNA and RNA at 785, 1093,

1121, 1335, and 1574 cm−1 assigned to DNA, DNA-backbone stretching vibration, the U, T,

C (ring-breathing modes in the DNA/RNA bases), and the ring-breathing mode of DNA/RNA

and DNA bases, respectively. To generate abundance images showing the specific

biochemical components, we fed the endmembers to a non-negativity constrained least

squares algorithm. For each endmember, we constructed a merged image that represented the

abundance or relative concentration (Figure 3B).
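
The abundance estimation can be illustrated with SciPy's non-negative least squares solver, applied to one spectrum at a time. This is a sketch under assumptions: the function name `abundances` is hypothetical, and rescaling each pixel's coefficients to sum to one (so that values lie between 0 and 1) is one possible convention that is not confirmed by this text.

```python
import numpy as np
from scipy.optimize import nnls

def abundances(spectra, endmembers, normalise=True):
    """Non-negative least-squares abundances of each endmember per spectrum.

    spectra    : (n_spectra, n_channels)
    endmembers : (n_endmembers, n_channels)
    """
    # nnls solves min ||A x - b|| subject to x >= 0 for a single spectrum b.
    coefs = np.array([nnls(endmembers.T, s)[0] for s in spectra])
    if normalise:
        totals = coefs.sum(axis=1, keepdims=True)
        coefs = coefs / np.where(totals == 0, 1.0, totals)
    return coefs
```

Each column of the result can then be mapped back to pixel positions to form one abundance image per endmember.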

3.5 Relative quantification and clustering

In this work, we performed a relative area quantification, which is a principle comparable to

fluorescence quantification for calculating the relative area of the cell associated with specific

chemical components. Following the N-FINDR analysis, each spectrum in the respective

images has an abundance value between 0 and 1 for each endmember spectrum. If a threshold

is set on the abundance value, pixels can be selected that predominantly have contents of that

endmember. In the example shown, we set the threshold at 0.3 and counted the number of

pixels with a higher abundance value. This threshold was selected as being higher than the

average abundance value for all endmembers (0.2). After counting pixels for each endmember

for each cell, we calculated the percentage of the total number of pixels representing each

endmember individually for each cell. The mean percentage abundance with standard

deviations (SD) for control and Sr 100 cells are shown in Figure 3C. These results showed

that Sr 100 cells were associated with distinct abundance profiles. A Mann-Whitney test

further showed that endmember 3 was associated with upregulated cholesterol and cholesterol

esters (p < 0.05) for the Sr 100 cells, which was in line with previous results [7].
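
The relative area quantification described above amounts to counting, for each cell, the fraction of pixels whose abundance exceeds the threshold. A minimal sketch follows; the function name `area_percentages` and the use of a per-spectrum `image_ids` array to identify the cell are assumptions.

```python
import numpy as np

def area_percentages(abund, image_ids, threshold=0.3):
    """Percentage of pixels per cell with abundance above `threshold`.

    abund     : (n_spectra, n_endmembers) abundance values
    image_ids : (n_spectra,) identifier of the image (cell) of each spectrum
    Returns the sorted unique image ids and a (n_images, n_endmembers) array.
    """
    abund = np.asarray(abund, dtype=float)
    image_ids = np.asarray(image_ids)
    ids = np.unique(image_ids)
    out = np.empty((ids.size, abund.shape[1]))
    for row, image_id in enumerate(ids):
        in_cell = image_ids == image_id
        # Mean of a boolean array is the fraction of pixels above threshold.
        out[row] = 100.0 * (abund[in_cell] > threshold).mean(axis=0)
    return ids, out
```

The per-cell percentages for the two groups can then be compared with a non-parametric test such as the Mann-Whitney test used here.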

To further explore the application of multi-image analysis, we fed the 40 N-FINDR

abundance images (8 cells × 5 biochemical images) into a k-means clustering analysis (Figure

4). Our analysis showed that the cells could be clustered across multiple images and that

control cells and Sr 100 could easily be visually distinguished according to changes in

cholesterol and cholesterol ester contents (Figure 3). These results demonstrate that the multi-

image analysis method can realise robust quantitative compositional analysis that can be used

for segmentation and clustering.

4. Discussion

Raman spectroscopic imaging offers a simple and yet highly effective label-free means by

which to interrogate the biochemical composition of cells and tissue with a high degree of

molecular specificity [4, 17]. This has recently enabled the comprehensive label-free

characterisation of biological cells and tissue across a vast number of biomedical disciplines.

Quantitative imaging analysis in biomedical applications, however, remains challenging. Due

to the spectral variability among different images, conventional approaches have been limited

to single image analysis that does not take full advantage of the quantitative characteristics of

Raman spectroscopy. In this work we describe a general methodology for Raman image

analysis that extends biomedical experiments from single image to multiple image analyses.

To our knowledge, this is the first detailed report on multi-image analysis in Raman

spectroscopic imaging (Figure 1). To demonstrate the practicality of our method, we showed

an example of cellular characterisation using non-confocal Raman spectroscopic imaging. The

dataset comprised both solvent and substrate backgrounds and represented two cellular

conditions measured over an extended time period. This represents one of the most

challenging experimental scenarios in biomedical Raman spectroscopic imaging. We applied

a pre-processing method that efficiently corrects each specific image for solvent and substrate

interferences (Figure 2). Our data shows that after solvent and substrate background

correction of each individual Raman image, the spectra essentially contained no residual

background. Other solvent and substrate background correction methods may be preferred or

perform equally well (e.g. MCR or constrained least squares regression methods). If a sample

emits auto-fluorescence, it may be necessary to use other methods (e.g. Whittaker filter or

constrained polynomial subtraction [18]) depending on the specific biological sample and

Raman instrumentation.

We further demonstrated how to combine and normalise a series of images together. In

general, normalisation ensures that all spectra have comparable total intensity. Normalisation

preserves the relative biochemical content and effectively helps to give all samples an equal

impact in factor analysis and spectral unmixing models. The normalisation efficiently reduces

the absolute intensity variability and as a consequence makes the images directly comparable

for multivariate image analyses.

Finally, we applied the pixel unmixing technique N-FINDR on the standardised dataset

containing the series of images. In general, there is a range of methods that can be applied to

analyse the combined images (e.g. PCA, MCR-ALS, VCA, clustering). The choice of method

depends on the specific dataset and objective in question. Pixel unmixing methods such as

VCA and N-FINDR are intuitive since they extract the purest spectra from the dataset.

This means that the endmember spectra are actually contained in the Raman image and

therefore are chemically meaningful. These techniques generally work well for applications

where the pixels contain nearly pure biochemical components, and have been applied to a

range of problems [15, 19, 20]. Factor based methods such as MCR-ALS are based on

spectral variance [9]. In certain instances, this can be used to uncover hidden pure

components buried in the dataset. These models can also result in components that may not be

directly intuitive from an analytical point of view. It is therefore always advised to correlate

the extracted pure components with a reference library of biochemical Raman spectra. Both

factor based and pure pixel based methods have the advantage of being able to explain

complex datasets with very few components or endmembers. By contrast, cluster based

methods (e.g. HCA and k-means) group spectra according to their distance. These techniques

can be efficiently used to segment Raman images with clear distinction of areas. These

methods are not quantitative but readily apply to simpler systems.

Here we demonstrated an example using N-FINDR spectral unmixing that resulted in 5

endmembers (Figure 3A and B). Endmember 3 contained the specific Raman peaks of

cholesterol esters at 702, 1298, 1745, 2850, and 2885 cm−1. By contrast, endmembers 2 and 4

were associated with more complex lipid compositions relatively down-regulated in

strontium-treated cells. We performed relative area quantification in a way that was directly

comparable with conventional biomedical imaging analysis (e.g., immunohistochemistry).

The abundance values for each image showed a significant up-regulation in cholesterol and

cholesterol esters in the strontium-treated cells compared to control (p < 0.05), which was in

agreement with fluorescence, PCR and microarray studies [7]. We further extended our

analysis by applying k-means cluster analysis to the abundance images. These data showed

that the spectral profile of the entire cytoplasm was very distinct for the strontium-treated

cells. This resulted in a perfect discrimination between the two cell types across the series of

images and showed that the developed methodology was robust for classification purposes. It

should be noted that the presented approach requires no prior knowledge, and only uses the

information obtained from the spectra to analyse the content. This means that this

methodology can be used as a label free explorative approach. This enables unbiased

explorative investigations of cells and tissue. When the target of the investigation has been

found, it has previously been shown that supervised annotation can be used to greatly improve

automatic recognition of the constituents of cells and tissue [21-23].

We believe our methodology is directly transferable to various tissues and cells. The example

we presented involved multi-image Raman spectroscopic analysis using two cell types. We

acquired the images in this work using a non-confocal Raman setup. Recently developed high

performance confocal Raman setups generally reduce solvent and substrate background. The

specific experiment therefore dictates the choice of pre-processing and analysis and should be

chosen during the explorative image analysis. Future studies could aim to systematically

compare different pre-processing and image analyses for further improving Raman

spectroscopic imaging as a characterisation technique in biomedical applications. The

framework presented here represents a very general approach for multi-image analysis and

quantification in biomedical Raman spectroscopy imaging. Our methodology enables

comparisons of repetitions and longitudinal studies. Importantly, this work standardises

Raman spectroscopic imaging and enables database-based Raman spectral image analysis

across experiments that can enable the classification of cells and tissues based on model-based

predictions.

5. Conclusion

In summary, we have presented here a comprehensive methodology that enabled us to

combine a series of Raman images into the same image analysis. We applied our image

analysis to extract ‘pure’ biochemical components from the combined dataset for quantitative

assessment and clustering of cells, which revealed significant biochemical changes in lipid

content across independently measured Raman images. We showed that clustering of the

abundance values was able to distinguish the cells. The quantitative Raman image analysis

presented significantly improves cellular characterisation and opens up new possibilities to

use Raman spectroscopic imaging in biomedicine.

Acknowledgements

The authors wish to thank Hélène Autefage for cell culture assays. M.B. and M.M.S.

acknowledge the support of the Medical Research Council, the Engineering and Physical

Sciences Research Council, and the Biotechnology and Biological Sciences Research Council

UK Regenerative Medicine Platform Hub “A Hub for Engineering and Exploiting the Stem

Cell Niche” (MR/K026666/1). All raw data are available on request from

[email protected].

References

[1] G. J. Puppels, F. F. M. de Mul, C. Otto, J. Greve, M. Robert-Nicoud, D. J. Arndt-Jovin and T. M. Jovin, Nature 347, 301-303 (1990).

[2] I. W. Schie and T. Huser, Appl Spectrosc 67, 813-28 (2013).

[3] C. Krafft, S. B. Sobottka, G. Schackert and R. Salzer, Analyst 130, 1070-7 (2005).

[4] G. Shetty, C. Kendall, N. Shepherd, N. Stone and H. Barr, Br J Cancer 94, 1460-4 (2006).

[5] K. Kong, C. J. Rowlands, S. Varma, W. Perkins, I. H. Leach, A. A. Koloydenko, H. C. Williams and I. Notingher, Proceedings of the National Academy of Sciences 110, 15189-15194 (2013).

[6] T. C. von Erlach, M. A. Hedegaard and M. M. Stevens, Analyst 140, 1798-803 (2015).

[7] H. Autefage, E. Gentleman, E. Littmann, M. A. B. Hedegaard, T. Von Erlach, M. O’Donnell, F. R. Burden, D. A. Winkler and M. M. Stevens, Proceedings of the National Academy of Sciences 112, 4280-4285 (2015).

[8] M. Hedegaard, C. Matthäus, S. Hassing, C. Krafft, M. Diem and J. Popp, Theoretical Chemistry Accounts 130, 1249-1260 (2011).

[9] J. Felten, H. Hall, J. Jaumot, R. Tauler, A. de Juan and A. Gorzsas, Nat Protoc 10, 217-40 (2015).

[10] J. M. P. Nascimento and J. M. Bioucas Dias, Geoscience and Remote Sensing, IEEE Transactions on 43, 898-910 (2005).

[11] M. E. Winter, 1999.

[12] H. Martens and E. Stark, Journal of Pharmaceutical and Biomedical Analysis 9, 625-635 (1991).

[13] H. Martens, J. P. Nielsen and S. B. Engelsen, Anal Chem 75, 394-404 (2003).

[14] P. Geladi, D. MacDougall and H. Martens, Applied Spectroscopy 39, 491-500 (1985).

[15] M. Miljkovic, T. Chernenko, M. J. Romeo, B. Bird, C. Matthaus and M. Diem, Analyst 135, 2002-13 (2010).

[16] J. MacQueen, Berkeley, Calif., 1967.

[17] M. Hedegaard, C. Krafft, H. J. Ditzel, L. E. Johansen, S. Hassing and J. Popp, Anal Chem 82, 2797-802 (2010).

[18] C. A. Lieber and A. Mahadevan-Jansen, Appl Spectrosc 57, 1363-7 (2003).

[19] C. Krafft, B. Belay, N. Bergner, B. F. M. Romeike, R. Reichart, R. Kalff and J. Popp, Analyst 137, 5533-5537 (2012).

[20] T. Chernenko, C. Matthaus, L. Milane, L. Quintero, M. Amiji and M. Diem, ACS Nano 3, 3552-9 (2009).

[21] K. Klein, Alexander M. Gigler, T. Aschenbrenner, R. Monetti, W. Bunk, F. Jamitzky, G. Morfill, Robert W. Stark and J. Schlegel, Biophysical Journal 102, 360-368 (2012).

[22] Krau, D. Petersen, D. Niedieker, I. Fricke, E. Freier, S. F. El-Mashtoly, K. Gerwert and A. Mosig, Analyst 140, 2360-2368 (2015).

[23] B. Bird, M. Miljkovic, S. Remiszewski, A. Akalin, M. Kon and M. Diem, Lab Invest 92, 1358-1373 (2012).

Figure Legends

Figure 1. Overview of the analysis protocol for the relative quantification of biochemical

content in cells using multiple Raman images. The image analysis consists of: (i) pre-

processing on each separate image to correct for the solvent/substrate/auto-fluorescence

background; (ii) dataset assembly and unfolding of the image matrix; (iii) normalisation of the

unfolded matrix to reduce experimental effect among different images; (iv) image analysis to

extract the relative abundances of biochemical components; (v) reconstruction of the

component images; and (vi) relative quantification or clustering of the biochemical content.

Figure 2. Raw Raman spectra representing 1 to 99 percentile spectra for clarity. Also shown

are the solvent/substrate-corrected spectra as well as the normalised Raman spectra. All the raw

data for this dataset have been previously published [7].

Figure 3. A) Endmember spectra. These spectra were scaled and truncated in the silent region

(1800-2800 cm-1) for clarity. B) N-FINDR colour images. Each pixel is represented by a

combination of colours depending on the abundance value of an endmember. C) Mean

abundance values with standard deviations for control cells (n = 4) and Sr 100 cells (n = 4) of:

(a) Endmember 1 (cytoplasm); (b) Endmember 2 (lipids); (c) Endmember 3 (cholesterol and

cholesterol ester); (d) Endmember 4 (lipids); (e) Endmember 5 (nucleus). *p < 0.05, (Mann-

Whitney test). All the raw data for this dataset have been previously published [7].

Figure 4. k-means clustering image (k = 4) performed on the relative abundances extracted

using N-FINDR. All the raw data for this dataset have been previously published [7].

Figure 1

Figure 2

Figure 3A-I

Figure 4a-e

Graphical Abstract

We describe a broadly applicable methodology that enables the combination of a series of Raman images into a single image for analysis. This approach takes into account the entire Raman spectral profile of multiple images in order to identify, quantify and cluster the biochemical distribution across a series of Raman images.
