DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Machine learning for blob detection in high-resolution 3D microscopy images

MARTIN TER HAAK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE



EIT Digital Data Science
Date: June 6, 2018
Supervisor: Vladimir Vlassov
Examiner: Anne Håkansson
Electrical Engineering and Computer Science (EECS)


Abstract

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties like intensity or shape. Bio-image analysis is a common application where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing for ribonucleic acid (RNA), for example, the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells.

Traditional methods of blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must seek the optimal parameters for each step, which are often specific to that image and cannot be generalised to other images. Moreover, some of the existing tools are not suitable for the scale of the microscopy images, which are often in very high resolution and 3D.

Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can be effectively used to perform blob detection.

A blob detector is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs, and finally uses a clustering algorithm to split up connected blobs. The detector works out-of-core, meaning it can process images that do not fit in memory, by dividing the images into chunks. Results prove the feasibility of this blob detector and show that it can compete with other popular software for blob detection. Unlike other tools, however, the proposed blob detector does not require parameter tuning, making it easier to use and more reliable.

Keywords
Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing


Sammanfattning

Syftet med blobdetektion är att hitta regioner i en digital bild som skiljer sig från omgivningen med avseende på egenskaper som intensitet eller form. Biologisk bildanalys är en vanlig tillämpning där blobbar kan beteckna intresseregioner som har färgats in med ett fluorescerande färgämne. Vid bildbaserad in situ-sekvensering för ribonukleinsyra (RNA) är blobbarna lokala intensitetsmaxima (dvs. ljusa fläckar) motsvarande platserna för specifika RNA-nukleobaser i celler.

Traditionella metoder för blobdetektering bygger på enkla bildbehandlingssteg som måste vägledas av användaren. Problemet är att användaren måste hitta optimala parametrar för varje steg, som ofta är specifika för just den bilden och som inte kan generaliseras till andra bilder. Dessutom är några av de befintliga verktygen inte lämpliga för storleken på mikroskopibilderna, som ofta är i mycket hög upplösning och 3D.

Maskininlärning (ML) är en samling tekniker som ger datorer möjlighet att “lära sig” från data. För att eliminera beroendet av användarparametrar är tanken att tillämpa ML för att lära sig definitionen av en blob från uppmärkta bilder. Forskningsfrågan är därför hur ML effektivt kan användas för att utföra blobdetektion.

En blobdetekteringsalgoritm föreslås som först extraherar en uppsättning relevanta och icke-överflödiga bildegenskaper, sedan klassificerar pixlar som blobbar och slutligen använder en klustringsalgoritm för att dela upp sammansatta blobbar. Detekteringsalgoritmen fungerar utanför kärnan (out-of-core), vilket innebär att den kan bearbeta bilder som inte får plats i minnet, genom att dela upp bilderna i mindre delar. Resultaten visar att detekteringsalgoritmen är genomförbar och att den kan konkurrera med andra populära programvaror för blobdetektion. Men i motsats till andra verktyg behöver den föreslagna detekteringsalgoritmen ingen justering av sina parametrar, vilket gör den lättare att använda och mer tillförlitlig.

Nyckelord
Biomedicinsk bildanalys; Blobdetektion; Maskininlärning; 3D; Datorseende; Bildbehandling


Acknowledgements

First, I would like to express my gratitude towards my examiner Assoc. Prof. Anne Håkansson at the KTH Royal Institute of Technology for guiding me from the first project proposal all the way to the final deliverable. She was always open to answering the most troublesome questions or providing critical feedback. Due to her meticulous remarks I was able to reshape and tweak my work in order to achieve the high quality it has now.

I would also like to thank my supervisor Jacob Kowalewski at Single Technologies, under whom I performed this research. Not only would he provide me with the required resources at any moment, but he would also not hesitate to free up time for discussion. That I was able to finish the project well within the set time is most likely due to his dependable commitment. Moreover, his ideas and suggestions have strongly contributed to the approach applied in this project.

Furthermore, I would like to thank Single Technologies for providing me with a very interesting thesis subject and a pleasant working space. I want to thank my co-workers for the nice chats and the friendly ambience around the office.

Finally, I would like to thank my university supervisor Assoc. Prof. Vladimir Vlassov, who provided me with some highly needed hints so that I could proceed with my research.

Martin ter Haak

Stockholm, May 2018


Contents

0.1 Acronyms and abbreviations

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
    1.4.1 Benefits, ethics and sustainability
  1.5 Research methodology
  1.6 Delimitations
  1.7 Outline

2 An introduction to in situ RNA sequencing

3 Blob detection
  3.1 Automatic scale selection
  3.2 Algorithms
    3.2.1 Template matching
    3.2.2 Thresholding
    3.2.3 Local extrema
    3.2.4 Differential extrema
    3.2.5 Machine learning
    3.2.6 Super-pixel classification

4 Machine learning
  4.1 Classification
    4.1.1 Naive Bayes
    4.1.2 Logistic regression
    4.1.3 K-Nearest Neighbour
    4.1.4 Decision Tree
    4.1.5 Random Forest
    4.1.6 AdaBoost
    4.1.7 Support Vector Machines
    4.1.8 Neural network
    4.1.9 Validation
  4.2 Clustering
    4.2.1 K-means
    4.2.2 Agglomerative clustering
    4.2.3 MeanShift
    4.2.4 Spectral clustering
    4.2.5 Other clustering algorithms
    4.2.6 Validation
  4.3 Dimensionality reduction
    4.3.1 Principal Component Analysis (PCA)

5 Related work
  5.1 Blob detection
  5.2 Machine learning for biomedical image analysis

6 Methodology
  6.1 Blob detection process
    6.1.1 Feature extraction
    6.1.2 Feature compression
    6.1.3 Pixel classification
    6.1.4 Pixel clustering
    6.1.5 Blob extraction
    6.1.6 Blob filtration
    6.1.7 Chunking
  6.2 Experiments
    6.2.1 A: Feature extraction
    6.2.2 B: Feature compression
    6.2.3 C: Pixel classification
    6.2.4 D: Pixel clustering
    6.2.5 E: Run on whole image
    6.2.6 F: Comparison with state-of-the-art
    6.2.7 Summary
  6.3 Data collection
    6.3.1 Characteristics
    6.3.2 Labelling
  6.4 Experimental design
    6.4.1 Test system
    6.4.2 Software
    6.4.3 Data analysis
    6.4.4 Overall reliability and validity

7 Analysis
  7.1 Results from A: Feature extraction
  7.2 Results from B: Feature compression and C: Pixel classification
  7.3 Results from D: Pixel clustering
  7.4 Results from E: Run on whole image
  7.5 Results from F: Comparison with state-of-the-art

8 Conclusions
  8.1 Discussion
  8.2 Future work

Bibliography

A Experiment F software configurations
  A.1 Crops
  A.2 MFB detector
  A.3 FIJI
  A.4 CellProfiler
  A.5 Ilastik


0.1 Acronyms and abbreviations

Terms related to biology

RNA     Ribonucleic acid
FISH    Fluorescence in situ hybridization
HCS     High content screening
DNA     Deoxyribonucleic acid
FISSEQ  Fluorescent in situ sequencing
mRNA    Messenger RNA
HCA     High content analysis

Terms related to image processing

2D   Two-dimensional
3D   Three-dimensional
LoG  Laplacian of Gaussian
GGM  Gaussian gradient magnitude
DoH  Determinant of Hessian
DoG  Difference of Gaussians

Terms related to machine learning

ML    Machine learning
NN    Neural network
PCA   Principal component analysis
SVD   Singular value decomposition
MI    Mutual information
SVM   Support vector machine
RF    Random forest
DT    Decision tree
LR    Logistic regression
KNN   k-nearest neighbour
NB    Naive Bayes
ReLU  Rectified linear unit
RBF   Radial basis function


Chapter 1

Introduction

This thesis investigates how machine learning can be applied to blob detection. What is meant by machine learning and blob detection is described in their respective chapters. This chapter provides an introduction to the research.

1.1 Background

On the interface of computer science and biology we have an interdisciplinary field called bio-informatics. This field focuses on applying techniques from computer science to better understand biological data. One of its areas, biomedical image analysis, aims to analyse images that have been captured for medical purposes.

Microscopy imaging is an important tool in the biomedical field for applications like the study of the anatomy of cells and tissues (histology) [1], urine analysis [2] and cancer diagnosis [3]. Fluorescent chemicals are often added to mark interesting features in the images, such as with fluorescence in situ hybridization (FISH). FISH is the binding of fluorescent dyes to specific ribonucleic acid (RNA) sequences in tissue cells [4]. By capturing microscopy images under certain lighting conditions, these sequences light up as groups of local intensity maxima, also called blobs (see Figure 1.1 for an example). The location and the order of RNA sequences that are detected can be used for gene expression profiling. This profiling allows researchers to determine the types and structure of single cells [5].

As microscopes are becoming faster and supporting higher resolutions, the scale of the produced images makes it unfeasible for researchers to do manual analysis. Moreover, it has been demonstrated that machine learning methods can outperform human vision at recognising patterns in microscopy images [6]. Therefore several bio-informatics software packages [7–10] have been developed that facilitate the analysis or even make it fully automatic in so-called high-content screening (HCS) [11]. Furthermore, confocal microscopes are increasingly being used to create 3D images of cell tissue. These microscopes, which were to a large extent originally developed at KTH [12], can capture images at different depths.

Machine learning, as a field of computer science, aims to "train" programs to perform specific tasks by supplying them with data. Learning from data is useful when the task is hard to formalise, as is often the case in object detection. For example, explaining to a computer how it can find cells in an image of animal tissue is hard. One way to do this is by providing the computer with a large dataset of cell images. With this data, machine learning algorithms can be applied to deduce a visual definition of a cell. Using this new definition, the computer can spot instances of cells in any image. The same reasoning can be applied to detecting blobs in biomedical images. By supplying a program with a set of examples of blobs, it can learn to detect blobs in images analogously to how it can detect cells.

1.2 Problem

In this thesis the aim is to do blob detection on high-resolution 3D microscopy images. This is a difficult task because, firstly, it is often not possible to check the veracity of the found blobs. Experts can usually only assess the results by looking at them visually or by checking whether they match prior knowledge. Secondly, the scale of the images poses a challenge both for the blob detection algorithms and for verifying the results.

Figure 1.1: Microscope image of human tissue cells where RNA sequences have been stained with a specific fluorescent dye. The blobs, visible as bright spots, are spatially clustered within cells. A single cell and its most clear blobs have been labelled as an example.

Popular methods for biomedical image analysis rely on a number of simple image processing steps for which the user has to set the right parameters, such as in FIJI [13] and CellProfiler [10]. The main drawback of this approach is that some assumptions have to be made in order to tune parameters for the algorithms. Because these parameters are optimised only for the current image set, they cannot always be generalised to other image sets. Put simply, what can be a blob in one image may not be a blob in another image. Secondly, to deal with noise, popular methods usually apply a number of pre-processing steps, incurring extra time and additional parameters. Moreover, FIJI and CellProfiler were not created with high-content screening in mind since they can only process images that fully fit in memory, which is not always the case. Also, for a tool that is so popular, CellProfiler is quite slow and some of its functions only work for 2D images.

To tackle the issue of user-set parameters, machine learning can be applied to train a model that can find blobs without user interaction. In addition, the models can be taught to ignore noise, thereby skipping the pre-processing steps. The algorithms have to deal with the 3D aspect and ideally use that information in their analysis. Furthermore, the algorithms have to operate out-of-core, meaning that they can process images that do not fit in memory. Lastly, efficiency is a major concern because of both the high resolution of modern microscopy images and the extra computations that machine learning algorithms usually require. Therefore a requirement is that the analysis of an image does not take longer than the time needed to capture that image.
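The out-of-core requirement amounts to processing the image in overlapping chunks and stitching the per-chunk results together. The sketch below illustrates that idea under stated assumptions: `detect_fn` stands in for any pixel-wise blob classifier, `margin` is the overlap that keeps blobs on chunk borders intact, and in practice the image would be a memory-mapped or HDF5-backed array rather than an in-memory one. All names here are hypothetical, not the thesis's implementation.

```python
import numpy as np

def process_out_of_core(image, chunk_shape, margin, detect_fn):
    """Apply detect_fn to a large 3D image chunk by chunk.

    detect_fn maps a 3D array to a boolean mask of blob pixels.
    Each chunk is read with a margin of context on every side so
    the detector is not blinded at chunk borders; only the interior
    of each window is written back, so no pixel is counted twice.
    """
    result = np.zeros(image.shape, dtype=bool)
    for z in range(0, image.shape[0], chunk_shape[0]):
        for y in range(0, image.shape[1], chunk_shape[1]):
            for x in range(0, image.shape[2], chunk_shape[2]):
                lo = np.array([z, y, x])
                hi = np.minimum(lo + chunk_shape, image.shape)
                # Enlarged read window (clipped at image borders).
                pad_lo = np.maximum(lo - margin, 0)
                pad_hi = np.minimum(hi + margin, image.shape)
                window = image[pad_lo[0]:pad_hi[0],
                               pad_lo[1]:pad_hi[1],
                               pad_lo[2]:pad_hi[2]]
                mask = detect_fn(window)
                # Crop the mask back to the chunk proper.
                inner = tuple(slice(l - pl, l - pl + (h - l))
                              for l, h, pl in zip(lo, hi, pad_lo))
                result[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] = mask[inner]
    return result
```

With a memory-mapped `image` (e.g. `np.memmap`), only one window at a time is resident in RAM, which is what makes arbitrarily large volumes tractable.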

The research question is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? Note that here 'effective' combines both the notions of high quality and low running time, since solutions that only excel in one aspect but lack in the other are useless.

1.3 Purpose

The purpose of this thesis is to apply and test different machine learning techniques for blob detection in high-resolution 3D microscopy images. As a proof of concept, images produced for in situ RNA sequencing are analysed, as these images usually satisfy these characteristics. Since multiple steps are needed to distinguish blobs, machine learning can be applied at different stages in different forms. Therefore, in each step suitable machine learning techniques are tested. The result is an analysis that compares the tested machine learning techniques and concludes which are best suited for solving the problem.

1.4 Goals

The aim of this project is to aid the development of autonomous bio-image analysis tools such that they require minimal user interaction. As user-guided image processing is replaced by computer vision, the hope is that these tools become both faster and more accurate. While humans are limited by their cognitive capabilities, machines can continuously be enhanced by iterative upgrades. Faster hardware, smarter algorithms and better data will all help to progress the performance of such analytical tools.


Even though blob detection is only one task of current bio-image analysis tools, insights originating from this research can be applied to other common tasks as well, such as edge or corner detection. Machine learning models can be taught to recognise cell membranes, cytoplasms or nuclei in a similar fashion to blob detection. Different training data and alternative features have to be used, but the algorithms will be analogous.

1.4.1 Benefits, ethics and sustainability

With the ongoing research on cell tissue such as the brain and organs, the ability to do large-scale gene expression profiling of single cells has great advantages. The identity and function of every cell can be determined, which allows researchers to accurately map the structure of complex tissues. Having an automated analysis pipeline can be a significant benefit to effective research in this field. Researchers do not wish to continuously adjust settings by trial and error to find the parameters that give the best results. Therefore an approach is needed that picks the optimal settings for them so they can focus on their research.

Letting computers take over human image analysis tasks can lead to great gains in terms of performance. Computers will surely be much faster and work longer, but their accuracy will not necessarily be comparable to that of humans. Human experts can directly profit from their prior knowledge, whereas computer programs have to be specifically tailored for this. This means that the precision of such computer programs depends on the experience of both the original domain expert and the software engineer. Human mistakes can lead to errors in the software, but while a human will usually notice when something has gone wrong, a computer does not care as long as the exception is not caught. When machine learning is employed, this problem becomes even more significant because then the accuracy of the software hinges on the quality of the training data. As biomedical images are frequently used in the research, diagnosis or treatment of human health, it is important to think about who should take responsibility when image analysis tools produce incorrect results. Ethics play an important role in deciding whether the producer of the software or its user should be held accountable. It is easy to shove the blame towards the original creator, but there is also the responsibility of the operating researchers and doctors. This is a difficult predicament, but in my view the liability should be investigated on a case-by-case basis. When an incident has occurred, a thorough inspection of the involved events should be performed. The inspection should determine whether the cause was a doctor's mistake, a software error or a hardware fault. Based on this information, a verdict needs to be made on who should be held accountable.

Regarding the possible medical applications of an automated image analysis tool, it is not hard to imagine the benefits it brings for the sustainability of health. As we humans are surpassed by computer vision in our image analysis ability, we can focus on the tasks in which we are still superior, such as interpreting the results and drawing conclusions. The consequence is that we become more efficient at treating health. There is clearly a strong relationship with the third Sustainable Development Goal (SDG), "Good Health and Well-being", set down by the United Nations on 25 September 2015 [14]. The project is not related to environmental sustainability.

1.5 Research methodology

Research can be classified as either quantitative, meaning that a phenomenon is proved by experiments or tested with large data sets (quantity), or qualitative, wherein a phenomenon is studied through probing the terrain or environment (quality) [15]. Since in this thesis the goal is to find the algorithms that perform best on a certain input, quantitative results are collected. The performance is measured by predetermined metrics; therefore numbers dictate the conclusions.

The philosophical assumption followed is post-positivism. Even though reality is objectively given through reproducible results, as in positivism [15], different observers can have divergent opinions on what is the 'optimal' algorithm for the problem, which distinguishes post-positivism from positivism [15]. Additionally, it may in practice also depend on which characteristics of the algorithm are deemed most important. For example, a low-quality but fast solution can be preferable to a high-quality but slow solution in some cases. Realism, the other potential philosophical assumption in this case [15], is not applicable because it assumes that matters do not depend on the person who is thinking about them. However, it has just been argued that the interpreter may assess the results subjectively.

The research method used is applied research, because the practical problem of blob detection needs to be solved, which is the main characteristic of applied research [15]. Multiple approaches are tested to find the best one with the application of RNA sequencing in mind. Possible competing research methods are fundamental research, also called basic research since it drives new innovations, principles and theories, and descriptive research, which focuses on more statistical research and on describing the characteristics of a situation as opposed to describing causes and effects. However, since the goal of the thesis is to improve the performance of known solutions, it should not be characterised as basic or descriptive research, but rather as applied research.

A deductive approach is adopted because a generalisation that answers the research question is concluded from large amounts of quantitative data [15]. An abductive approach could also be chosen, but this approach assumes that the data is incomplete [15]. Since more data can be generated if desired, this is not the case in this project.

1.6 Delimitations

The main product of this thesis is the results and conclusions of the analysis, as opposed to the developed software. Since the developed software is not meant to be used in production as-is, it does not have to be highly optimised or robust to bad user input. Nevertheless, its quality must be sufficient for the test results to be credible. In addition, the focus will be on evaluating existing techniques instead of coming up with custom algorithms and methods, unless necessary. Available tried-and-tested implementations will be deployed to limit the amount of coding and debugging needed. This means that only those algorithms will be tested of which there are trustworthy implementations, such as those found in popular software libraries.


1.7 Outline

The first three chapters introduce the background information that is needed to understand the context and the experiments. Chapter 2: An introduction to in situ RNA sequencing provides a broad description of an example application of blob detection. Chapter 3: Blob detection describes the current state of the art in algorithmic blob detection with biomedical image analysis in mind. Chapter 4: Machine learning introduces the basic theory of the machine learning concepts and algorithms that are applicable in this thesis. It is followed by Chapter 5: Related work, which discusses the papers and corresponding research that are relevant to this thesis. Next, Chapter 6: Methodology lays out the strategy for answering the research question through six experiments. Chapter 7: Analysis contains the results of the experiments while arguing their reliability. The thesis ends with Chapter 8: Conclusions, which answers the research question, discusses the implications and suggests some open questions that remain.


Chapter 2

An introduction to in situ RNA sequencing

In order to do phenotypic profiling1 of single cells traditionally onewould look at the appearance of the cells by morphological methods2

[16]. In image-based cell profiling3, hundreds of morphological fea-tures [such as the shape, structure and texture] are measured froma population of cells treated with either chemical or biological per-turbagens [16]. A perturbagen is an agent (small molecule, geneticreagent, etc.) that can be used to produce gene expression changesin cell lines [17]. If one would then like to quantify the effects of atreatment, he or she can measure the changes in those morphologicalfeatures compared to untreated cell in the control group.

However, instead of looking at the results of gene expression such asthe shape and structure of cells, one could also look more directly atwhich RNA4 sequences are being synthesised by transcription. In tran-scription, messenger RNA (mRNA5) is synthesised as a complemen-tary copy of a DNA segment by an enzym called RNA polymerase6

[18]. These RNA sequences are used to transport the genetic informa-1use the set of observable characteristics to create a profile2methods that are based on form and structure3gaining information on a cell4ribonucleic acid, a molecule essential in various biological roles in coding, de-

coding, regulation, and expression of genes5RNA molecule that convey genetic information from DNA to the ribosomes6enzyme that is responsible for copying a DNA sequence into a RNA sequence

9

Page 21: Machine learning for blob detection in high-resolution 3D

10 CHAPTER 2. AN INTRODUCTION TO IN SITU RNA SEQUENCING

tion from the DNA in the nucleus to the ribosomes7 where they specifythe amino acid8 sequence for the creation of proteins. Protein productslike enzymes control the processes in the cell by facilitating the chem-ical reactions [19]. By knowing which enzymes are being produced,one can tell the type and functions of single cells.

Developments in high-resolution microscopy together with fluorescencein situ hybridization (FISH) allow gene expression profiling for resolv-ing molecular states of many different cell types [20] without losingspacial information. The FISH procedure starts by binding specific flu-orescent chemicals to specific nucleobases in RNA-strings [4]. Thesechemicals are chosen such that they absorb light and emit it with alonger wavelength [21]. When capturing an image of that specific wave-length the locations of the fluorescent chemicals are revealed, and thusthe locations of the tagged nucleobases. The nucleobases will show upas local intensity maxima in the images, that are usually called blobs.By capturing multiple photos with different fluorescent agents, frag-ments of nucleobases (sometimes called barcodes) can be distinguishedthat can encode for the full RNA string [20]. One popular method offluorescent in situ RNA sequencing is FISSEQ [5].

Automated microscopy systems with the ability to make large amountsof high-resolution images each hour allow the transcriptonomic9 pro-filing of thousands of cells [22]. Even more, confocal microscopes canbe used to capture photos of the cells at different depths of the tissueresulting in 3D images [12]. One of the main challenges from a bio-informatics point of view is accurately finding the blobs correspondingto different nucleobases and use them to do RNA sequencing.

7. complex molecule that acts as a factory for protein synthesis in cells
8. building blocks of proteins
9. based on information relayed through transcription


Chapter 3

Blob detection

Blob detection falls within the field of visual feature detection. This field, which is part of computer vision, focuses on finding image primitives such as corners, edges, curves and other points of interest in digital images [23]. Blob detection is aimed at finding regions in an image that differ from their surroundings with respect to properties like brightness, colour and shape (see Figure 3.1a for more properties). These regions are called blobs (see Figure 3.1b for an example). As there are multiple definitions of blobs depending on the application, there are also many different algorithms for finding them.

A different but more exact definition, used by Tony Lindeberg, an influential researcher on multi-scale feature detection, is that a blob is a region with at least one local extremum [24], such as a bright spot in a dark image or a dark spot in a light image. Even though most classical definitions consider blobs in 2D, the definition can be extended to 3D as well. In this thesis, blobs are defined as small (< 50 pixels) round 3D spots in an image that are brighter than their background (i.e. local intensity maxima). Refer back to 1.1 for an example.

3.1 Automatic scale selection

A majority of blob detection methods are based on automatic scale selection as inspired by Lindeberg [27]. Before detection, the image is converted to a scale-space representation by applying a convolutional


Figure 3.1: (a) Examples of blob properties. From [25]. (b) Blob detection in a field of sunflowers. From [26].

g(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²))

Equation 3.2: Two-dimensional Gaussian function. x is the distance from the origin along the horizontal axis, y is the distance from the origin along the vertical axis, and σ is the standard deviation of the Gaussian distribution.

smoothing kernel over the image. In most cases this is the Gaussian filter, which computes for each pixel a weighted average of its surrounding pixels based on the Gaussian distribution (see Equation 3.2 for the 2D filter), leading to a blurred image. The main purpose of scale-space representation is to understand the image structure at multiple levels of resolution simultaneously [27]. The scale can be set by changing the parameter σ. A larger scale σ increases the amount of smoothing, which means more noise is ignored and larger objects can be detected [28]. By running the blob detection algorithms on the same image at different scales, blobs of different sizes can be detected. Figure 3.3 shows how, with different scale levels of Gaussian smoothing, variously sized blobs can be found.
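The scale-space construction can be sketched in a few lines. This is an illustrative example only, assuming Python with SciPy is available; the toy image and the σ values are invented.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy image: one small bright "blob" on a noisy dark background.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.1, size=(64, 64))
image[30:34, 30:34] += 1.0

# Scale-space representation: the same image smoothed at increasing scales.
# A larger sigma suppresses more noise but blurs away small structures.
scale_space = {sigma: gaussian_filter(image, sigma) for sigma in (1.0, 3.5, 10.0)}

for sigma, smoothed in scale_space.items():
    print(f"sigma={sigma:5.1f}  intensity std={smoothed.std():.4f}")
```

Running a blob detector on each level of `scale_space` would then find blobs of correspondingly different sizes.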


Figure 3.3: Smoothed and thresholded images of an old telephone for scale levels s² = 0, 2, 16, 32, 128, 1024 (from top-left to bottom-right). From [24].

3.2 Algorithms

For every combination of blob definition and application, different blob detection algorithms can be optimal. In the domain of this thesis, a few algorithms stand out that are either popularly used or potential candidates: template matching, thresholding, local extrema algorithms, differential algorithms, algorithms using machine learning, and over-segmentation.

3.2.1 Template matching

Since blobs can be regarded as simple objects in an image, template matching can be applied to find them. This algorithm requires an image of the expected appearance of the object, called a template (Figure 3.4a). The template is moved over the search image (Figure 3.4b) with a stride of 1, and objects are detected where the template matches part of the image [28]. At every position, the sum of absolute differences (SAD) or sum of squared differences (SSD) is stored in a correlation matrix (Figure 3.4c). The best-matching values (local extrema) in the correlation matrix correspond to a high probability that an object is located there. A threshold can then be used to extract the most significant objects and their locations. To find objects of different shapes and sizes, multiple


Figure 3.4: Template matching for finding a coin in an image of a set of coins. (a) Template. (b) Search image. (c) Correlation image. From [30].

templates can be designed beforehand. Template matching is easy to implement and very fast [29]. However, its main drawback is that it has a hard time finding objects that do not match the precise template. Since the blobs in our case can be of slightly different sizes and are sometimes clumped up with other blobs, this method will not be very effective.
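The sliding SSD comparison can be sketched as a minimal NumPy implementation on an invented toy image; real implementations use faster correlation-based formulations, and since SSD is a difference measure, the best match is the minimum of the resulting matrix.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the template over the image with a stride of 1 and return the
    sum-of-squared-differences (SSD) map; low values mean a good match."""
    ih, iw = image.shape
    th, tw = template.shape
    ssd = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(ssd.shape[0]):
        for x in range(ssd.shape[1]):
            diff = image[y:y + th, x:x + tw] - template
            ssd[y, x] = np.sum(diff * diff)
    return ssd

# Toy example: plant the template at a known location in a noisy image.
rng = np.random.default_rng(1)
template = np.ones((5, 5))
image = rng.normal(0.0, 0.05, size=(40, 40))
image[12:17, 20:25] += template

ssd = match_template_ssd(image, template)
best = np.unravel_index(np.argmin(ssd), ssd.shape)  # (row, col) of best match
print(best)
```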

3.2.2 Thresholding

When blobs are defined as either bright or dark spots in an image (Figure 3.5a), one can simply threshold the pixels to obtain a binary image with regions corresponding to blobs (Figure 3.5b). Many thresholding techniques exist that exploit different information such as shape, clustering, entropy and object attributes. Sezgin and Sankur performed a survey and comparison of 40 selected thresholding methods from various categories [31]. Common processing steps that follow are filling up holes within the blobs that result from noise, and splitting up multiple connected blobs using a watershed algorithm (Figure 3.5c). The next step consists of locating the blobs by looking for connected components: groups of neighbouring blob pixels. Blobs that do not adhere to certain criteria, such as size and shape, can also be filtered out. Finally the centroids of the blobs are calculated and returned as the locations of the blobs (Figure 3.5d).
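The threshold, hole filling, connected components and centroid steps can be sketched with SciPy; the watershed split is omitted for brevity, and the image and threshold value are invented for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

# Synthetic image with two bright square blobs on a dark background
# (a hypothetical stand-in for a fluorescence microscopy image).
image = np.zeros((50, 50))
image[10:15, 10:15] = 1.0
image[30:36, 35:41] = 1.0

# 1. Threshold to obtain a binary image.
binary = image > 0.5
# 2. Fill holes inside the blobs caused by noise.
binary = ndi.binary_fill_holes(binary)
# 3. Find connected components: groups of neighbouring blob pixels.
labels, n_blobs = ndi.label(binary)
# 4. Filter on criteria such as size, then return the centroids.
sizes = ndi.sum(binary, labels, index=range(1, n_blobs + 1))
centroids = ndi.center_of_mass(binary, labels, index=range(1, n_blobs + 1))
print(n_blobs, sizes, centroids)
```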

Exactly this approach is used by the popular bio-image analysis tool CellProfiler [10]. This interactive tool lets users create a custom pipeline that takes an image as input and outputs results according to the chosen steps.

(a) Input image (b) Binary image by thresholding (c) Binary image after watershed (d) Final clustering and count

Figure 3.5: Common steps in a thresholding algorithm. Created using Fiji [13].

These steps are simple image processing operations such as background removal, smoothing, enhancement and object detection. The tool works well when the user has time to tweak the parameters for each image, or when images are similar. If not, batch processing of a large number of images can become quite time-consuming.

Watershed

Watershed works by treating an image as a topographic map and letting "water" flow from the peaks of the image downwards. In Figure 3.6, the peaks are marked as red circles. The boundary where the water from two markers meets indicates where the blobs should be split.


Figure 3.6: Starting markers for watershed. First the shortest distance to the edge of the blobs is computed for each pixel. The darker the pixel, the further it is from the edge. The local minima that then appear are used as markers (visualised as red circles). When multiple markers are close together, all but one are purged. Created using scikit-image [32].

3.2.3 Local extrema

One can also simply look at the local maxima or minima in intensity to find the bright or dark blobs in the image. At run-time, for every 3×3 region (other sizes are possible) the location of the pixel with the maximum or minimum intensity is recorded, usually only when it is above a certain threshold to ignore noise. These pixels are assumed to be the centres of blobs. Next, a filtering step often follows to remove the extrema that are not centres of blobs. Sometimes a segmentation algorithm like watershed (see section 3.2.2) is used to find which other pixels belong to the blobs. Although this method is simple, problems occur when there are large blobs with multiple local extrema. In this case the algorithm will output multiple smaller blobs instead of one large blob.
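The neighbourhood-maximum search can be sketched with SciPy's maximum filter; the toy image, the neighbourhood size and the noise threshold are invented for illustration.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima(image, size=3, threshold=0.5):
    """Return coordinates of pixels that are the maximum of their
    size x size neighbourhood and above a noise threshold."""
    peaks = (image == maximum_filter(image, size=size)) & (image > threshold)
    return np.argwhere(peaks)

# Invented toy image with two bright blob centres and one noise pixel.
image = np.zeros((20, 20))
image[5, 5] = 1.0
image[14, 8] = 0.9
image[2, 2] = 0.1   # below the threshold, so ignored
print(local_maxima(image))
```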

3.2.4 Differential extrema

Differential methods can be used instead when local extrema are not sufficient to distinguish blobs due to noise. These methods are based on the derivative of the intensity function with respect to the coordinates and will therefore pinpoint regions where the intensity changes faster than in the rest of the image. Blobs can be mathematically represented by a pair consisting of a saddle point and one extremum point,


making it look like a peak in the frequency domain [33] (see Figure 3.7).

Laplacian of the Gaussian (LoG) is a popular differential method used for blob detection [34]. First it convolves the input image with a Gaussian kernel at a certain scale t = σ² to give a scale-space representation L(x, y; t), where x and y are the pixel coordinates. Next, it applies the Laplacian operator (3.1), which results in a strong positive response for dark blobs of a specific size [34]. To capture blobs of different sizes, the Gaussian kernel is usually applied at different scales simultaneously with the scale-normalised Laplacian operator (3.2). Figure 3.8 shows the result of applying the LoG with different scales to the same image. Since the Laplacian is expensive to compute, the Difference of Gaussians (DoG) is commonly used instead. This operator can be seen as an approximation of the Laplacian but is faster to compute. Similarly to the LoG method, blobs can be detected in different scale-spaces. It is computed as the difference between two images smoothed with Gaussian kernels of different scales (3.3).

∇²L = Lxx + Lyy   (3.1)

∇²norm L = t(Lxx + Lyy)   (3.2)

∇²norm L ≈ (t/Δt)(L(x, y; t + Δt) − L(x, y; t))   (3.3)

The scale-normalised Determinant of the Hessian (DoH) is another popular differential method. It uses the Monge-Ampère operator (3.4), where HL denotes the Hessian matrix of the scale-space representation L. In a detailed analysis, Lindeberg found that the Hessian operator has better scale selection properties under linear image transformations than the Laplacian operator [34].

det Hnorm L = t²(Lxx Lyy − Lxy²)   (3.4)
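The DoG approximation of equation (3.3) can be sketched directly with SciPy; the scale values and the toy image are invented. For a bright blob, the response at a matching scale is strongly negative at the blob centre, so the minimum of the response is taken.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_normalised_dog(image, t, dt):
    """Approximate the scale-normalised Laplacian (equation 3.3) as a
    difference of Gaussians: (t/dt) * (L(t+dt) - L(t)), with sigma = sqrt(t)."""
    L1 = gaussian_filter(image, np.sqrt(t))
    L2 = gaussian_filter(image, np.sqrt(t + dt))
    return (t / dt) * (L2 - L1)

# A bright blob gives a strong negative DoG response at its centre
# when the scale roughly matches the blob size.
image = np.zeros((40, 40))
image[18:23, 18:23] = 1.0
response = scale_normalised_dog(image, t=4.0, dt=1.0)
centre = np.unravel_index(np.argmin(response), response.shape)
print(centre)
```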


Figure 3.7: Intensity function over the x-axis of a sunflower image; this is applicable to 2D and 3D as well. (a) Sunflower with a line straight through the y-centre. Adapted from [33]. (b) Intensity of the pixels on the red line in (a). The local minimum used as the blob centre is indicated, together with the saddle points. Created with Matplotlib [35].

Figure 3.8: Laplacian of Gaussian applied to an image at different scales σ (original image, σ = 1.0, σ = 3.5, σ = 10.0). Created using Ilastik [36].


3.2.5 Machine learning

The problem with the previous algorithms is that they require the user to tune the parameters in order to find the desired blobs. Moreover, what may be good parameters in one image may not be satisfactory in another. So what if we could teach the program what a good blob is by giving it examples, and then let it find the other blobs according to the learned definition? This is exactly how supervised machine learning can be applied to blob detection. In advance, different features are calculated for each pixel that describe the intensity, edges and texture. By preceding the feature extraction with Gaussian smoothing at multiple scales, features are generated for multiple scale-spaces (as explained in section 3.1). Next, the user interactively selects some pixels belonging to a blob and some that do not belong to a blob. With this information a supervised machine learning algorithm like a random forest (see subsection 4.1.5) or the support vector machine (SVM) (subsection 4.1.7) is trained, which can predict the class of the remaining pixels.

The connected components of the blob pixels are then tagged as candidate blobs in the next step. If these candidate blobs are as desired, their centroids can be returned as the blob positions. But if there are stricter criteria, machine learning can be applied again to distinguish the true blobs from the false blobs. First a set of features is calculated for each candidate blob, such as shape, size or intensity histogram. Then a few blobs must be selected by the user as being true and a few others as being false. A machine learning algorithm can then use this information to identify only the correct blobs.

Since an arbitrary number of features can be included, this method of finding blobs can be very accurate. The people behind Ilastik thought so as well, because their software does exactly this [36].
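A minimal sketch of this pixel-classification workflow, assuming scikit-learn and SciPy are available. The user annotations are simulated here by sampling a few labelled pixels from a synthetic ground truth, and the feature set (raw plus Gaussian-smoothed intensities) is a simplified stand-in for the much richer Ilastik feature set.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.ensemble import RandomForestClassifier

# Synthetic image: bright blob pixels (class 1) on a noisy background (class 0).
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.1, size=(32, 32))
image[10:16, 10:16] += 1.0

# Per-pixel features: raw intensity plus Gaussian-smoothed intensity
# at several scales (features over multiple scale-spaces, section 3.1).
features = np.stack(
    [image] + [gaussian_filter(image, s) for s in (1.0, 2.0, 4.0)], axis=-1
).reshape(-1, 4)
labels = np.zeros((32, 32), dtype=int)
labels[10:16, 10:16] = 1
labels = labels.ravel()

# Simulate sparse user annotations: train on a small random subset of pixels.
idx = rng.choice(len(labels), size=200, replace=False)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[idx], labels[idx])

# Predict the class of all remaining pixels.
predicted = clf.predict(features).reshape(32, 32)
accuracy = (predicted.ravel() == labels).mean()
print(f"pixel accuracy: {accuracy:.3f}")
```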

3.2.6 Super-pixel classification

Super-pixel classification is another approach to blob detection. It starts by segmenting the pixels into regions called super-pixels. This is essentially a clustering step that tries to group together neighbouring pixels that are similar with respect to specific properties.

(a) Felzenszwalb (b) Quickshift

Figure 3.9: Products of two over-segmentation algorithms on an image of the astronaut Eileen Collins. From [41].

Algorithms producing such so-called over-segmentations are, among others, Felzenszwalb's image segmentation algorithm [37] and Quickshift [38] (see Figure 3.9). The next step is classifying these super-pixels as being a blob or not. A popular approach is to extract SIFT (scale-invariant feature transform) descriptors [39], map these to clusters and create a bag-of-visual-words histogram of the clusters appearing in the super-pixel, as in [40]. The histogram is then classified as blob or non-blob using a supervised machine learning algorithm such as the SVM (see 4.1.7). This requires off-line training of the classifier with labelled super-pixels prior to run-time.


Chapter 4

Machine learning

In this chapter the machine learning techniques that can be applied to the project's problem are treated. As the focus of this thesis is to evaluate their performance, the techniques are only briefly discussed. These descriptions are not meant to be exhaustive, so the reader is advised to consult more elaborate sources if he or she requires a more thorough explanation.

Machine learning is a field of computer science that gives computer systems the ability to "learn" (i.e. progressively improve performance on a specific task) from data, without being explicitly programmed [42]. Data is usually structured as a multi-dimensional array of values. Each row corresponds to one instance (e.g. a customer) that we can call a datapoint. The columns are called features and describe characteristics of that instance (e.g. name, birth year, address, phone number, etc.). Typical tasks of machine learning are classification, regression, clustering, anomaly detection and structured prediction. A distinction that is commonly made between machine learning algorithms is whether they are supervised or unsupervised.

Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs [43]. The output value is commonly called the label. After training, the learned function can be used to predict the label for new inputs. In classification the algorithm needs to decide to which discrete class a datapoint belongs. A classic example is classifying e-mails as either spam or non-spam. Regression, on the other hand, aims to predict a continuous target value


for some datapoint. Let's say you want to approximate the price of a house with input information such as the floor area, location, build year and the number of bedrooms. Then you could look at other houses and build a model that describes the relationship between the house information and the price. With enough data this model is then usable for predicting other house prices.

Unsupervised learning algorithms are not provided with the labels during training. This means that they have to find patterns on their own. One of the most common types of unsupervised learning is clustering. Here an algorithm is applied that groups together datapoints that are similar with respect to some properties [44]. For a set of music tracks, for example, it can be investigated whether they can be partitioned into categories by considering their metadata like the year, artist, genre and length. As another unsupervised learning type, dimensionality reduction aims to describe the original data using fewer dimensions [45]. The main advantages of this are gaining a better conceptual understanding of the data, decreasing the required storage space and improving the running time of subsequent algorithms.

In the literature, different terms are used for the same concepts. Therefore note that datapoints, instances, observations and example inputs all mean the same thing, namely the individual data-units. The attributes of the data-units are sometimes called properties, features or dimensions. The attribute that needs to be predicted can be called label, decision class, output class, response variable or target output.

4.1 Classification

4.1.1 Naive Bayes

As a baseline classifier, which is an algorithm to which other algorithms are compared, the naive Bayesian classifier is often used [46]. This classifier uses the famous Bayesian theorem to make predictions (see Equation 4.1). It works by determining the probability of a datapoint belonging to a certain class by considering prior knowledge of conditions related to the datapoint. For example, if one would like to estimate the probability of a person Bert of age 57 having cancer,


P(A|B) = P(B|A) · P(A) / P(B)

Equation 4.1: Bayes' theorem, where A and B are events and P(B) ≠ 0.

P(Bert has cancer | Bert is 57 years old), the Bayesian formula can be used with P(A) as the probability of someone having cancer and P(B) as the probability of someone being 57 years old.

To do a binary classification of Bert having cancer or not, this conditional probability is calculated and compared to the threshold of 50%. If the probability is more than 50%, then Bert is classified as having cancer. Since the probabilities are usually assumed to be normally distributed, probabilities can be estimated for conditions that have not been seen before.

This method can be extended to consider multiple conditions (i.e. features) by calculating the product of the conditional probabilities for the given conditions. Unfortunately, the main drawback of this method is that it assumes that within one class all features are statistically independent, hence the name "naive" [46]. On the good side, research has shown that this is not a very significant problem in practice, especially for highly dimensional data [47]. Furthermore, this algorithm has very convenient properties that make it worth trying in many cases. Namely, it offers a range of important services such as learning from very large datasets, incremental learning, anomaly detection, row pruning, and feature pruning - all in near linear time [46]. In addition, it requires a minimal memory footprint and is fast to train.
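The cancer example can be sketched with scikit-learn's GaussianNB, which makes exactly the normality assumption mentioned above; all ages and labels below are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented toy data: predict a condition from age alone.
ages = np.array([[25], [30], [35], [40], [60], [65], [70], [75]])
has_condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# GaussianNB assumes the features are normally distributed within each class,
# so it can estimate probabilities for ages it has never seen before.
clf = GaussianNB().fit(ages, has_condition)

prob = clf.predict_proba([[57]])[0, 1]  # P(condition | age 57)
print("positive" if prob > 0.5 else "negative", round(prob, 3))
```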

4.1.2 Logistic regression

The probability of a datapoint belonging to a certain class can be estimated in other ways as well. A common method is logistic regression, which uses regression to fit a line h(x) through the data. By inserting the h(x) value for a datapoint x into a sigmoid function (see Figure 4.2), a number between 0 and 1 is returned. This number indicates the probability that the datapoint belongs to the positive class (in the case of binary logistic regression). Multinomial logistic regression may be used in cases where the dependent variable has more than two outcome categories.

f(x) = 1 / (1 + e^(−x))

Figure 4.2: Sigmoid function.
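A hedged sketch of binary logistic regression with scikit-learn on invented 1D data, together with the sigmoid function from Figure 4.2.

```python
import math
from sklearn.linear_model import LogisticRegression

def sigmoid(x):
    """Squash a real value h(x) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))  # 0.5: the decision boundary of the fitted line

# Toy 1D binary classification: class 0 below roughly 5, class 1 above.
X = [[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
clf = LogisticRegression().fit(X, y)

print(clf.predict([[2.0], [8.0]]))       # predicted classes for two new points
print(clf.predict_proba([[8.0]])[0, 1])  # P(class 1 | x = 8)
```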

4.1.3 K-Nearest Neighbour

A very simple classification algorithm that has seen popular usage in research is k-Nearest Neighbour (or kNN). Its ease of understanding and implementation, together with its general applicability, is the reason that it was included in the top 10 algorithms in data mining [48]. Instead of building a model from the training data, as most other learning algorithms do, it uses the training data directly for classification. For this reason it is called a non-parametric classifier. For every datapoint it finds the k nearest datapoints in the training dataset. The classes of those nearest neighbours dictate the class of the input datapoint through a majority vote. The distance function used depends on the application, but common types are the Euclidean and cosine distance [49]. kNN is notorious for being sensitive to noise such as outliers. Too small values of k can lead to noisy datapoints receiving a strong influence when classifying new datapoints [48]. Because n comparisons need to be made for each input datapoint, performance is a big issue for large datasets as well. For this reason a number of improvements have been proposed, such as 'condensing' [50] or 'editing' [51] the training dataset such that it becomes smaller but approximately retains its accuracy.
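The algorithm fits in a few lines of pure Python; this sketch uses the Euclidean distance and a majority vote on invented toy data.

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points under the Euclidean distance."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    k_nearest = [label for _, label in dists[:k]]
    return Counter(k_nearest).most_common(1)[0][0]

# Two toy clusters: label 'a' near the origin, label 'b' near (10, 10).
points = [(0, 0), (1, 0), (0, 1), (10, 10), (9, 10), (10, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(points, labels, query=(0.5, 0.5)))
print(knn_classify(points, labels, query=(9.5, 9.5)))
```

Note that every query requires a comparison with all n training points, which is exactly the scalability problem mentioned above.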


Figure 4.3: An example of a decision tree for deciding whether to go for a trip when considering the weather. From [52].

4.1.4 Decision Tree

Decision trees are models that map observations about an item to conclusions about its target value using a series of decisions based on the observation's attributes [52]. The decision tree model has the shape of a directed acyclic graph in the form of a tree, where each internal node represents a decision and each leaf node represents the predicted class for a given observation (see Figure 4.3). Every time a new observation has to be classified, it starts with a comparison at the root of the tree. There one of its attributes is compared to a certain value, and based on this decision it continues down one of the node's branches. At every node such a comparison is made until the observation arrives at a leaf node and a final classification is made.

Inducing decision trees from training data is called decision tree learning. The goal is to generate a general model that can be used to classify new observations [52]. There are different algorithms for generating such a model, but they all rely on the main idea that at each node the decision has to be made that best splits the data with respect to the target class. The quality of the split is measured by the information gain or information gain ratio that the decision produces. The information is often defined as the weighted average of the Shannon entropy (4.1) or the Gini impurity (4.2) over the new branches, where P(xi) is


defined as the probability of a possible value from {x1, ..., xn}.

H(X) = −Σi P(xi) · log2 P(xi)   (4.1)

Gini(X) = 1 − Σi P(xi)²   (4.2)
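Equations (4.1) and (4.2) translate directly into code; the class probabilities are assumed to sum to one.

```python
import math

def entropy(probs):
    """Shannon entropy (4.1) of a class probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity (4.2) of a class probability distribution."""
    return 1.0 - sum(p * p for p in probs)

# A 50/50 split is maximally impure; a pure branch has zero impurity.
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5
print(entropy([1.0]), gini([1.0]))            # 0.0 0.0
```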

4.1.5 Random Forest

A common extension of decision trees are ensemble methods like random forests. These are sets of multiple induced decision trees that combine their outputs into a single classification to improve the overall accuracy [52]. Decision trees are known for being very sensitive to irregularities in the training data, which makes them susceptible to over-fitting [52].

A random forest is created by building multiple decision trees, each with a different random sample of features from the training data. This ensemble method is also sometimes called the "random subspace method" or "feature bagging". The motivation for this method is that it prevents classifiers from focusing on only a single feature (or a few features) that are strong predictors of the response variable. Because the classifiers have to look for more general features, they are less likely to over-fit. Ho performed an analysis of how random subspace projection leads to accuracy gains [53]. Random forests have been successfully applied to pixel classification in the bio-image analysis software Ilastik. The accompanying paper adds that "The ability of the random forest to capture highly non-linear decision boundaries in feature space is a major prerequisite for the application to general sets of use cases." [36].

4.1.6 AdaBoost

AdaBoost is another ensemble method that has shown good results in practice. Similarly to bagging, it combines the predictions of multiple arbitrary classifiers. It was invented by Y. Freund and R.E. Schapire in 1996 [54]. Where bagging takes all the predictors into account equally, boosting differs by taking a weighted sum of the predictions


as the final output. The "Ada" in the name stands for adaptive, because the algorithm is able to tweak subsequent weak learners such that they focus on instances that are harder to classify. By combining weak learners that are only slightly better than random guessing, the final model is provably going to converge to a strong learner.

4.1.7 Support Vector Machines

Support vector machines (SVM) represent a powerful technique for classification, regression and outlier detection [55]. Similarly to decision trees and random forests, they are non-probabilistic. For a binary classification, an SVM seeks out an optimal hyperplane separating the two classes involved such that the distance between the closest representatives of the two classes is maximised. During training, SVM algorithms build an SVM model that splits the training data into two classes with the least error. Next, new datapoints are classified based on which side of the hyperplane they fall on. The decision boundary can be linear, as in regular linear SVMs, but sometimes has other shapes such as curves. In these cases a kernel function can be used to map the data into a different feature space [55]. In this new feature space it should be easier to find a linear hyperplane that divides the transformed data. Popular kernels are the Gaussian radial basis function (RBF) kernel, the exponential kernel and the polynomial kernel.
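The effect of the kernel can be sketched with scikit-learn's SVC on XOR-like data, which no linear hyperplane can separate; the γ and C values below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like data: no linear hyperplane separates the classes,
# but the RBF kernel maps them to a space where one exists.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]] * 10, dtype=float)
y = np.array([0, 0, 1, 1] * 10)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

print("linear accuracy:", linear.score(X, y))  # at most 0.75 on XOR
print("rbf accuracy:   ", rbf.score(X, y))
```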

4.1.8 Neural network

The previously described machine learning algorithms are heavily used in industry and work well on a wide variety of important problems. However, for some problems central to artificial intelligence (AI), such as speech recognition and object detection, they have not achieved the required performance. Therefore a new field of machine learning called deep learning has emerged, motivated in part by the failure of traditional algorithms to generalise well on such AI tasks [56]. A significant challenge for more complex data is the curse of dimensionality, which makes machine learning exceedingly more difficult when the number of dimensions is high [56]. In order to cope with this problem, traditional machine learning algorithms need prior beliefs to be guided


f(x) = max(0, x)

Figure 4.4: Rectified Linear Unit (ReLU) function.

about what kind of function to learn. However, these priors hurt the algorithm's ability to generalise over more complex functions.

Deep learning relies heavily on the concept of artificial neural networks, which are networks of nodes inspired loosely by the neural networks of which animal brains are composed. These networks consist of connected layers, where each layer is made up of nodes. The output of each node is a linear function of the node's input connections followed by a non-linear activation function such as the sigmoid (Figure 4.2) or the Rectified Linear Unit (Figure 4.4). What makes these neural networks 'deep' are their hidden layers that are situated between the input and output layer. These hidden layers enable the network to learn the very complex non-linear functions that are required in more complicated tasks. Usually the nodes of each layer are connected to all the nodes in the neighbouring layers; this is called densely connected.

The most common type of artificial neural network is the feed-forward network, which aims to approximate some function in order to predict the output for any arbitrary input. This can be useful for tasks such as classification and regression, but also for more complex tasks such as data compression or image segmentation. These networks are called 'feed-forward' because the data 'flows' from the input layer to the output layer (see Figure 4.5). There are no feedback connections in this type of network, in contrast to recurrent neural networks for example. The method for training a feed-forward neural network (and most other artificial neural networks) is called backpropagation. Backpropagation is used to calculate the gradient of the loss function with respect to the weights, working from the final layer back to the first hidden layer. This


Figure 4.5: Example of a feed-forward neural network. Adapted from [59].

gradient is then needed in gradient descent to update the weights of each layer. There are also more advanced optimisation algorithms such as Adadelta [57] and the Adam optimiser [58].
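A single forward pass through a one-hidden-layer feed-forward network can be sketched in NumPy. The weights below are random and untrained, purely for illustration; training them would require backpropagation as described above.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit activation (Figure 4.4)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid activation (Figure 4.2)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    """One forward pass: each layer applies a linear function of its
    inputs followed by a non-linear activation."""
    hidden = relu(W1 @ x + b1)          # input layer -> hidden layer
    return sigmoid(W2 @ hidden + b2)    # hidden layer -> output probability

# Randomly initialised (untrained) weights for a 3-4-1 network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

output = forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2)
print(output)  # a value strictly between 0 and 1
```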

4.1.9 Validation

F1-score

To measure the performance of a binary classification algorithm, the F1-score is often used. It is defined as the harmonic mean (4.3) of the precision (4.4) and the recall (4.5). Its range runs from 0.0 to 1.0.

F1 = 2 · (precision · recall) / (precision + recall)   (4.3)

precision = |true positives| / (|true positives| + |false positives|)   (4.4)

recall = |true positives| / (|true positives| + |false negatives|)   (4.5)
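Equations (4.3) to (4.5) can be computed directly from the raw counts; the counts in the example are invented.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision (4.4), recall (4.5) and their harmonic mean,
    the F1-score (4.3), from raw classification counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 8 true positives, 2 false positives, 2 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)  # all equal to 0.8 (up to floating point)
```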


4.2 Clustering

4.2.1 K-means

K-means clustering is one of the simplest and most popular clustering algorithms [60]. It starts by selecting k random points from the data as centroids (though smarter initialisation methods exist). In the next step it assigns each of the remaining points to the closest centroid. At the end of this iteration the points have been partitioned into k disjoint clusters. Next, for each cluster a new centroid is calculated as the mean of all the attribute values of the points in the cluster. In the next iteration the points are assigned to the new centroids. The algorithm continues iterating until either the centroids stop moving between iterations or another stopping criterion is reached. The space requirements for K-means are modest because only the data points and centroids are stored [60]. K-means is also quite fast because its running time is linear with respect to the dataset size. This makes it a powerful multi-purpose clustering algorithm and a good starting point for more advanced clustering algorithms.
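The iteration described above (Lloyd's algorithm) can be sketched in NumPy; the toy clusters and the iteration cap are invented for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: pick k random data points as initial
    centroids, then alternate between assigning points to the closest
    centroid and recomputing each centroid as the mean of its points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array(
            [points[assignment == j].mean(axis=0) for j in range(k)]
        )
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, assignment

# Two well-separated toy clusters around (0, 0) and (10, 10).
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(10.0, 0.5, size=(50, 2)),
])
centroids, assignment = kmeans(points, k=2)
print(np.round(centroids))
```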

4.2.2 Agglomerative clustering

Agglomerative clustering is an example of a hierarchical clustering method that first derives a hierarchical tree from the data and then infers the main clusters from it [60]. Agglomerative clustering is also sometimes called bottom-up because it starts by putting each point in a separate cluster, and then builds new, larger clusters from the smaller clusters until all points are connected. At every step it determines which two clusters are closest together and then merges them. There are different linkage criteria for deciding the distance between clusters, such as: the minimal distance between the closest members (single linkage), the minimal distance between the furthest members (complete linkage), the distance between centroids, and the minimal sum of squared differences between clusters ('ward' linkage). The biggest drawback of this algorithm is its running time: since it needs to compare every pair of clusters in each step, it requires O(n³) computations [60]. There are, however, faster implementations that run in O(n² log n). Another challenge is the non-triviality

Page 42: Machine learning for blob detection in high-resolution 3D

CHAPTER 4. MACHINE LEARNING 31

of inferring the flat clusters from the hierarchical tree since the criteriacan be subjective. Examples of criteria are a fixed number of clustersor a maximum distance between clusters.
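A minimal sketch with scikit-learn's AgglomerativeClustering, using ’ward’ linkage and a fixed number of clusters as the tree-cutting criterion (synthetic three-group data, for illustration only):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal((0, 0), 0.2, (30, 2)),
    rng.normal((4, 4), 0.2, (30, 2)),
    rng.normal((0, 4), 0.2, (30, 2)),
])

# 'ward' linkage merges the pair of clusters that minimises the increase in
# the within-cluster sum of squares; n_clusters=3 cuts the tree into 3 flat clusters.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(points)
print(np.bincount(agg.labels_))  # size of each flat cluster
```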

4.2.3 MeanShift

MeanShift was proposed in 2002 as a non-parametric clustering algorithm for feature spaces [61]. The algorithm relies on centroids that it continuously updates to be the mean of the points within a given region. It aims to discover ’blobs’ in a smooth density of samples [62]. This property makes the algorithm an attractive candidate for clustering pixels, since blobs have a consistent density and are often quite dense. The algorithm is, however, not highly scalable, because it requires multiple nearest-neighbour searches during its execution [62]. The only parameter it requires is the bandwidth, which dictates the size of the region to search through. The bandwidth can be set beforehand or estimated from the data.
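The bandwidth estimation mentioned above is available in scikit-learn as `estimate_bandwidth`; a small sketch on synthetic data (the `quantile` value is an illustrative choice, not from the thesis):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(2)
points = np.vstack([
    rng.normal((0, 0), 0.3, (40, 2)),
    rng.normal((6, 6), 0.3, (40, 2)),
])

# The bandwidth, the only required parameter, can be estimated from the data.
bw = estimate_bandwidth(points, quantile=0.3)
ms = MeanShift(bandwidth=bw).fit(points)
print(len(ms.cluster_centers_))  # number of discovered density modes
```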

4.2.4 Spectral clustering

Spectral clustering algorithms use the top eigenvectors of a matrix derived from the distances between points (also called the affinity matrix) [63]. A common approach for this family of algorithms goes as follows. First the affinity matrix is calculated for all the points. Then an eigendecomposition is performed on the normalised Laplacian of this matrix. Next, the k eigenvectors belonging to the k highest eigenvalues are selected. These vectors are concatenated into an n × k matrix. Finally, the points in this lower-dimensional space are assigned to k clusters using a simple clustering algorithm such as k-means. k is usually set beforehand to the expected number of clusters, but other approaches exist that estimate k from the eigendecomposition. Spectral clustering is a popular algorithm because it is simple to implement, can be solved efficiently by standard linear algebra software and very often outperforms traditional clustering algorithms such as k-means [64].
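The pipeline described above (affinity matrix, Laplacian eigendecomposition, k-means in the embedded space) is implemented end-to-end in scikit-learn's SpectralClustering; a toy-data sketch:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(3)
points = np.vstack([
    rng.normal((0, 0), 0.2, (30, 2)),
    rng.normal((5, 0), 0.2, (30, 2)),
])

# affinity='rbf' builds the affinity matrix from pairwise distances; the final
# assignment in the embedded space is done with k-means, as described above.
sc = SpectralClustering(n_clusters=2, affinity="rbf",
                        assign_labels="kmeans", random_state=0).fit(points)
print(np.bincount(sc.labels_))
```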


4.2.5 Other clustering algorithms

The list above of course does not cover all algorithms that have been invented for the purpose of clustering; that would be impossible given the overwhelming amount of literature on the subject. Other popular algorithms that have been considered but were deemed unsuitable are: affinity propagation [65], because it does not scale well with n; DBSCAN [66], because all the blobs have the same density, so the algorithm would not be able to distinguish them; and clustering using a Gaussian mixture model, because it has too many parameters and is not scalable [67].

4.2.6 Validation

Silhouette Score

Since the ground truth of the cluster to which a point belongs is often either subjective or unknown, it is difficult to evaluate the quality of a clustering algorithm. The Silhouette Coefficient was therefore proposed by P.J. Rousseeuw in 1987 [68], because it can be calculated solely from the clustering results. It is composed of two scores [69]:

a: the mean distance between a data point and all other points in the same cluster

b: the mean distance between a data point and all other points in the next nearest cluster

The Silhouette Coefficient s for a single data point is then defined as:

s = (b − a) / max(a, b)

To determine the Silhouette Score for a dataset, the mean of the Silhouette Coefficients of all data points is calculated (or of a random sample if there are too many).

The score is bounded by -1 for incorrect clustering and +1 for highly dense clustering. A score around zero means that clusters are likely overlapping [69]. If the clusters are dense and well separated, the score is higher, which corresponds to the conventional definition of a cluster. Furthermore, a large Silhouette Score corresponds with highly rounded clusters, which is fortunately a desired property of blobs.
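The mean Silhouette Coefficient over a dataset is available as `silhouette_score` in scikit-learn; a sketch on dense, well-separated synthetic clusters, where the score should be close to +1:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(4)
dense = np.vstack([rng.normal((0, 0), 0.1, (50, 2)),
                   rng.normal((8, 8), 0.1, (50, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dense)
# Mean Silhouette Coefficient over all points; near +1 for dense,
# well-separated clusters, around 0 for overlapping ones.
score = silhouette_score(dense, labels)
print(score)
```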

4.3 Dimensionality reduction

4.3.1 Principal Component Analysis (PCA)

Principal Component Analysis is likely the most popular multivariate statistical technique [70] and is widely used for dimensionality reduction. It can be thought of as fitting a k-dimensional ellipsoid to the data such that each axis corresponds to a principal component. The axes are chosen so that they explain the largest amount of variance and are orthogonal to each other. The first principal component is the axis that explains the largest amount of variance. The second principal component lies orthogonally to the first and explains the second-highest amount of variance, and so on. Shorter axes do not provide as much information and can therefore be removed.

The aim of PCA is to find these components from the data X; they are linear combinations of the original variables. Singular Value Decomposition (SVD) is used to calculate X = P∆Q^T [70]. The matrix P∆ holds the factor scores, or in other words the importance of each dimension. Matrix Q holds the coefficients of the linear combinations used to compute the factor scores. By multiplying the original matrix X with Q, the data can be projected onto a lower-dimensional space. The result is a compressed version of the original matrix, which can be used to speed up subsequent steps. Because of this characteristic, PCA is often used for image compression, where it has proved to be effective [71].
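A small sketch of this projection with scikit-learn's PCA (which uses SVD internally): the synthetic data below is 3-dimensional but nearly flat, so two components retain almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
# 3D data lying almost entirely in a 2D plane: the third direction
# carries only tiny noise, so two principal components suffice.
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], 0.01 * rng.normal(size=200)])

pca = PCA(n_components=2).fit(X)
X_low = pca.transform(X)  # data projected onto the top 2 components
print(X_low.shape)                          # (200, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0
```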


Chapter 5

Related work

The related work is divided into two subjects: blob detection and machine learning for biomedical image analysis. Each subject is treated separately.

5.1 Blob detection

In 2016 a survey was conducted on the usage of blob detection algorithms for biomedical image analysis as described in the literature [72]. The authors gathered and examined 30 relevant papers in which classical blob detection algorithms are utilised, i.e. algorithms that do not use any machine learning or artificial intelligence. They found that a majority (20 of 30) of the papers used either the Laplacian of Gaussian (LoG), the Difference of Gaussians (DoG) or the Determinant of Hessian (DoH) method (see Figure 5.1). The authors did not provide an explanation of why they think these methods are the most popular.

Blob detection is not only useful for analysing biomedical images, but also for images in other fields. When fruits are interpreted as blobs, for example, machine vision techniques can be used to count them in trees [40, 73]. Tracking piglets in videos [74] and traffic sign detection for autonomous driving [75] are alternative applications of blob detection. Other types of images may be analysed as well, such as infra-red images [76] and ultrasound images for the purpose of detecting breast abnormalities [77]. Ultimately, blob detection can be used for any problem where regions that are visually distinct from their surroundings need to be detected.

Figure 5.1: Frequency of blob detection methods used in 30 biomedical image analysis papers. From [72].

5.2 Machine learning for biomedical image analysis

With the development of more powerful computing systems, the use of artificial intelligence is becoming ubiquitous. Bio-informatics experts discovered the advantages of computer-assisted image analysis in the 2000s, and a great deal of literature has since been written about it.

Before the popularity of machine learning for computer vision, simpler image processing techniques were used, such as segmentation, thresholding and watershed (see 3.2.2), for example in CellProfiler [10]. CellProfiler allows the user to define a sequence of processing modules for performing analysis on cell images. Another popular software application for image processing is FIJI [13], which is a ”batteries-included” version of the powerful ImageJ 1.x [78] image processing tool. Even though these tools have many features, each step in the analysis requires parameters to be determined by the user. This can be difficult because the results depend highly on these parameters and it is hard to ascertain the correct ones. Also, neither tool can do out-of-core image processing without resorting to custom plugins or scripts. This missing feature makes them unsuitable for analysing images that do not fit in memory. Moreover, 3D is not yet fully supported in CellProfiler, with some crucial functions missing.

Gene expression profiling using image processing methods is described in [5, 20, 79]. Transcriptomics, the set of techniques used to study an organism's transcriptome (the sum of all its RNA transcripts), is often used for gene expression profiling. Image analysis methods have been successfully applied to transcriptomics [22, 80]. The authors of [81, 82] provide an extensive explanation of the common steps in an automated bio-image analysis pipeline. Especially in high-throughput experiments, image analysis is used heavily to quantify phenotypes of interest to biologists [16]. Papers such as [16, 83, 84] specifically treat the common case of phenotypic cell profiling.

Since mere image processing methods were not always sufficient, machine learning techniques have become more prevalent in biomedical image analysis, often in the context of high-content screening (HCS). By using techniques from image processing, computer vision and machine learning, large amounts of bio-image data can be analysed, which is also frequently called high-content analysis (HCA) [82]. Machine learning has also been applied to cell segmentation [85, 86] and nucleus detection [87]. In [88] the authors use a type of neural network, called a convolutional neural network, to detect nuclei. For cell segmentation SVMs are utilised in [89], while deep learning algorithms are compared in [90].

Besides proprietary software, free tools that apply machine learning to image-based cell analysis have been developed, such as CellCognition [8], CellClassifier [7], Advanced Cell Classifier (ACC) [91] and cellXpress [9]. CellCognition is a computational framework for quantitative analysis of high-throughput fluorescence microscopy. It has functions for, among other things, image segmentation, object detection, feature extraction and statistical classification. The sole purpose of CellClassifier is the automatic classification of single-cell phenotypes using supervised machine learning. It requires images that have been prepared with CellProfiler. CellCognition and CellClassifier rely on an SVM for phenotypic classification. Advanced Cell Classifier is an improvement over CellClassifier that is more user-friendly, allows for more advanced machine learning with 16 different classifiers and was made for high-content screens. cellXpress is another fully featured and highly optimised software platform for cellular phenotype profiling. The platform is designed for fast and high-throughput analysis of cellular phenotypes based on microscopy images.

Notably, the bio-image analysis tool Ilastik [36] was a major inspiration for this project because its use of active learning, which lets users label a few instances iteratively on-line, proved to be very effective. It uses a random forest for pixel classification, without any indication of other algorithms having been tried. Therefore, in this project other machine learning candidates will be evaluated as well. The described tools are primarily meant for cell detection and profiling, which is slightly different from blob detection. Besides that, they all work semi-automatically by requiring the user to label a few instances beforehand, while the aim of this project is to perform blob detection fully automatically.


Chapter 6

Methodology

Machine learning can be applied to blob detection by first classifying the pixels as blob/non-blob, followed by clustering these pixels into blobs. This approach is used in this research, as extensively described in section 6.1. Based on this approach a number of experiments can be devised that help answer the research question. These are discussed in section 6.2. Section 6.3 describes the characteristics of the data, how the data was collected and how it has been labelled. The last section, 6.4, discusses the details of the experiments and how overall reliability and validity will be assured.

6.1 Blob detection process

In this project the blob detection process consists of six consecutive steps. The input is a 3D image and the output is a list of blob coordinates. The first step extracts a set of features for each pixel in the image. These features capture intensity, edges and texture. An optional intermediate step compresses the features with PCA. Next, a trained classification model is used to classify the pixels into two classes: blob and non-blob. The resulting binary image is passed into a clustering step that attempts to declump the touching blobs. In the next step the locations and characteristics of all blobs are extracted. Finally, the blobs are filtered based on their characteristics and returned as output. An overview of the process is shown in Figure 6.1. Each step will now be discussed more thoroughly.

Figure 6.1: The blob detection process.

6.1.1 Feature extraction

Besides the single pixel intensities, filters can be applied to the input image to obtain additional features for each pixel. The filters are partly inspired by Ilastik [36]. The intensities of neighbouring pixels are represented by the raw image smoothed with a Gaussian filter. The Laplacian of Gaussian, Difference of Gaussians, Determinant of Hessian (see 3.2.4) and Gaussian of gradient magnitude are used to detect edges. The texture of regions is distinguished by the eigenvalues of the structure tensor (see 3.2.5) and the eigenvalues of the Hessian of Gaussian [36]. The scale σ of the filter for each feature can be specified. The features can be calculated in 2D by computing them for each z-plane separately, but some can also be calculated in 3D by applying a Gaussian filter in the z dimension.


6.1.2 Feature compression

The idea behind this step is that the number of extracted features may be so high that training and running a classification algorithm takes very long. Also, some features may turn out not to be very relevant. A dimensionality reduction algorithm such as PCA can reduce the number of features without sacrificing too much accuracy. This step takes in the pixel features, transforms them using a fitted PCA model and outputs the resulting pixel features in a lower dimensionality.

6.1.3 Pixel classification

A trained classifier model can now be used to classify pixels by their features. The output of the classification is a list of predicted labels, one for each pixel, saying whether it is likely part of a blob or not. The labels for all pixels are then put together again so that we get a binary image with a background of 0's and regions of 1's that denote blobs.

6.1.4 Pixel clustering

Since blobs can be clumped together, the aim of this step is to split them up. The method commonly used in image-based cell analysis software is a watershed algorithm (see 3.2.2). The starting markers are the local maxima in the blobs and the ”water” flows until it reaches the edges of the blobs.

Even though the results of watershed are usually acceptable, other algorithms can be useful for declumping as well. When the x and y coordinates are treated as features of each blob pixel, the pixels can be grouped into clusters by a clustering algorithm with the inverse of the Euclidean distance as the similarity measure. The pixels that are close together will form clusters which correspond to individual blobs.

In order to reduce the running time, the clustering is performed for each connected component separately, so that only the pixels in one component are considered at a time. The result of this step is a segmented image with labelled regions. The background is labelled with 0's, while each cluster is labelled with a unique id from {1, 2, ...}.
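The coordinate-clustering variant of this step can be sketched with scipy's connected-component labelling and k-means on the pixel coordinates. The toy image below (two overlapping discs) and the hard-coded rule for the number of clusters per component are illustrative assumptions; a real declumping step would estimate the blob count per component, e.g. from local maxima.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

# Binary image with two touching circular blobs (stand-in for the
# output of the pixel classification step).
yy, xx = np.mgrid[0:40, 0:40]
binary = (((yy - 20) ** 2 + (xx - 13) ** 2 < 49)
          | ((yy - 20) ** 2 + (xx - 26) ** 2 < 49))

components, n_comp = ndimage.label(binary)  # connected components
segmented = np.zeros(binary.shape, dtype=int)
next_id = 1
for comp in range(1, n_comp + 1):
    coords = np.argwhere(components == comp)  # pixel coordinates as features
    # Toy heuristic: large components are assumed to hold two clumped blobs.
    k = 2 if len(coords) > 60 else 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
    for lab in range(k):
        segmented[tuple(coords[labels == lab].T)] = next_id
        next_id += 1
print(next_id - 1)  # number of declumped blobs
```

Background pixels stay 0 and each declumped blob receives a unique id from {1, 2, ...}, matching the segmented-image convention above.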

6.1.5 Blob extraction

In this step the segmented image is processed and all the clusters (i.e. blobs) are extracted along with their characteristics. For each blob the centroid is calculated as the mean of the x, y and z coordinates of its pixels. The radius is the maximum distance from any of the pixels to the centroid. The output of this step is a list of blobs with their respective characteristics.
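The centroid and radius definitions above translate directly into numpy; `extract_blobs` is a hypothetical helper name used only for this sketch:

```python
import numpy as np

def extract_blobs(segmented):
    """Compute centroid and radius for every labelled region (label 0 = background).

    The centroid is the mean of the pixel coordinates; the radius is the
    maximum distance from any pixel of the blob to that centroid.
    """
    blobs = []
    for blob_id in range(1, int(segmented.max()) + 1):
        coords = np.argwhere(segmented == blob_id).astype(float)
        centroid = coords.mean(axis=0)
        radius = np.linalg.norm(coords - centroid, axis=1).max()
        blobs.append({"id": blob_id, "centroid": centroid, "radius": radius})
    return blobs

# Tiny 3D example: a single blob of two adjacent voxels.
img = np.zeros((3, 3, 3), dtype=int)
img[1, 1, 1] = 1
img[1, 1, 2] = 1
print(extract_blobs(img))  # centroid (1, 1, 1.5), radius 0.5
```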

6.1.6 Blob filtration

With some prior knowledge about the size of blobs, this step filters out the blobs that are either too small or too large. This is needed because noise may be mistaken for blobs in the previous steps. Finally, this step outputs the filtered blobs from the original input image.

6.1.7 Chunking

Since an image may not fit in memory completely, it has to be processed out-of-core. The solution in this thesis is to apply the analysis to separate parts of the image by chunking. A chunk is a rectangular cuboid whose depth is the same as the image depth but whose width and height are usually much smaller (typically 500 × 500 pixels). In addition to the image data within its boundaries, every chunk also contains the data of a 10-pixel-thick border around the chunk, called the overlap. This is needed for the convolutional filters, which would otherwise need to guess the values outside of the chunk boundaries. Also, some blobs may lie across chunk boundaries and would otherwise be split in two. The value of 10 pixels was chosen because blobs are very unlikely to be bigger than 20 pixels in diameter, which means that blobs whose centroid lies within a chunk are fully encompassed by the overlap. All the steps are performed on each chunk in sequence. In the end the blobs of each chunk are collected and returned as the final list of blobs.
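A minimal sketch of such a chunking scheme; `iter_chunks` is a hypothetical helper mirroring the description (full-depth tiles in xy, clipped at the image borders, with an overlap border):

```python
import numpy as np

def iter_chunks(image, chunk=500, overlap=10):
    """Yield (x0, y0, view) for chunk-sized xy tiles with an overlap border.

    Each view spans the full z depth; the overlap gives convolutional
    filters real data beyond the chunk boundary where available.
    """
    depth, height, width = image.shape
    for y0 in range(0, height, chunk):
        for x0 in range(0, width, chunk):
            ys, xs = max(y0 - overlap, 0), max(x0 - overlap, 0)
            y1 = min(y0 + chunk + overlap, height)
            x1 = min(x0 + chunk + overlap, width)
            yield x0, y0, image[:, ys:y1, xs:x1]

img = np.zeros((5, 1200, 800), dtype=np.uint16)
chunks = list(iter_chunks(img))
print(len(chunks))  # 3 rows x 2 columns = 6 chunks
```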


6.2 Experiments

The research question of this thesis is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? The research comprises six experiments.

The first four experiments evaluate how machine learning techniques can be applied to the first four steps of the blob detection process: feature extraction, feature compression, pixel classification and pixel clustering. There are no experiments related to the last two steps, blob extraction and blob filtration, because there is not enough data to justify a machine learning approach over a simple heuristic method.

The fifth experiment assesses the feasibility of the blob detection process, using the optimal machine learning techniques found, by running it over a whole image. In the sixth experiment the blob detector of this thesis is compared to the state-of-the-art tools that are commonly used for blob detection in biomedical images.

6.2.1 A: Feature extraction

Even though an unlimited number of features can be extracted from an image, only a limited set may be useful for a specific application. The features that may be beneficial for differentiating blob pixels from non-blob pixels are shown in Table 6.1. The single pixel value and the Gaussian filter are the most basic features and respond to light/dark regions in general. The motivation for choosing the Laplacian of Gaussian, Difference of Gaussians and Determinant of Hessian features is their popularity in blob detection (see Figure 5.1), which stems from their high response to local extrema (see 3.2.4). The Gaussian of gradient magnitude is suitable for detecting edges in images. After applying a Gaussian filter, it employs a gradient magnitude filter that reveals the gradients of the pixel intensities. The eigenvalues of the structure tensor and the eigenvalues of the Hessian of Gaussian are both also used in Ilastik [36] to reveal texture.


Feature | Code | 2D/3D | Scales σ
Value | value | N/A | N/A
Gaussian filter | gaus | 2D, 3D | 0.7, 1.0, 1.6, 2.5, 4.0
Laplacian of Gaussian | log | 2D, 3D | 0.7, 1.0, 1.6, 2.5, 4.0
Gaussian of gradient magnitude | ggm | 2D, 3D | 0.7, 1.0, 1.6, 2.5, 4.0
Difference of Gaussians | dog | 2D, 3D | 0.7, 1.0, 1.6, 2.5, 4.0
Determinant of Hessian | doh | 2D | 0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of structure tensor | stex, stey | 2D | 0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of Hessian of Gaussian | hogex, hoge | 2D | 0.7, 1.0, 1.6, 2.5, 4.0

Table 6.1: All the pixel features that are tested. The value feature represents the single pixel intensities. The Determinant of Hessian, eigenvalues of structure tensor and eigenvalues of Hessian of Gaussian are not implemented in 3D. Because the eigenvalues consist of both an x and a y component, these features consist of two attributes.

A1: Feature selection

To optimise feature extraction it is necessary to find a subset of features from Table 6.1 that is both relevant for classifying a pixel as blob/non-blob and non-redundant.

The feature selection process proposed by José Bins and Bruce A. Draper does exactly this [92]. They suggest a three-step approach for selecting a small number of important features from the huge set of features that is often available in the computer vision domain. The feature selection method in this thesis is largely inspired by their work.

The first step is filtering out the irrelevant features by their Relief score [93] with respect to their predictive power for the label. Even though Relief has been shown to detect relevant features well, in practice it is very time-consuming to compute for large datasets. Therefore, in this thesis the choice was made to use mutual information (MI) instead, which has seen use in feature selection as well [94]. This metric, also sometimes called information gain, is used to calculate the gain in information (defined as the decrease in entropy) when instances are split by some condition. This makes it a suitable metric because the features are evaluated for their significance to classification, which essentially is splitting instances by their features. Moreover, the benefit of mutual information is that it does not make assumptions about the data, unlike other methods such as the Chi-square test, which assumes categorical variables [95], and the Pearson correlation coefficient, which only considers linear correlations [96]. The filtering step ends by removing the features whose mutual information score does not reach some minimum value. Additionally, because some features take longer to calculate than others, the duration of the calculation is also taken into account.
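This MI-based filtering can be sketched with scikit-learn's `mutual_info_classif`; the synthetic features and the threshold of 0.1 below are illustrative choices, not the thesis's values:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(6)
y = rng.integers(0, 2, size=500)              # blob / non-blob labels
informative = y + 0.1 * rng.normal(size=500)  # feature correlated with the label
noise = rng.normal(size=500)                  # irrelevant feature
X = np.column_stack([informative, noise])

# Mutual information with the label: high for the informative feature,
# near zero for the noise feature; features below the threshold are dropped.
mi = mutual_info_classif(X, y, random_state=0)
keep = mi >= 0.1  # example threshold, chosen for this toy data
print(mi, keep)
```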

Since mutual information does not detect redundancy, it is possible that some of the most relevant features are very similar to each other. Therefore the second step aims to eliminate redundancy by keeping only the most relevant feature of each group of similar features. Similar features are found by applying k-means. This application of k-means is unusual: normally instances are clustered by their features, but here features are clustered by their values for each instance.
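Clustering features rather than instances amounts to running k-means on the transposed data matrix; a toy sketch where two of three features are nearly identical (the data and k are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
n_pixels = 300
base = rng.normal(size=n_pixels)
# Three features: two nearly identical (redundant) and one independent.
X = np.column_stack([base,
                     base + 0.01 * rng.normal(size=n_pixels),
                     rng.normal(size=n_pixels)])

# Transpose so each *feature* becomes a point described by its values over
# all pixels, then cluster: redundant features fall into the same cluster.
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X.T)
print(groups)  # the first two features share a cluster label
```

From each group, only the feature with the highest relevance score would then be kept.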

In their paper the authors suggest a third step that uses the Sequential Floating Forward Selection (SFFS) algorithm [97] to create an optimal subset of features, but this is not needed in our case because the number of features is already sufficiently low after the second step.

The field of feature selection in machine learning is very broad and many algorithms have been proposed that aim to find the optimal set of features. However, in order to limit the scope of this thesis, other feature selection methods are not considered. The reason for choosing the above method is that it has been shown to be more effective than standard feature selection algorithms for large datasets with many irrelevant and redundant features [92]. It seeks the most relevant features with the lowest amount of redundancy, which are the desired properties of a feature set.

A2: 2D versus 3D comparison

In addition, it is interesting to see whether features are more relevant when calculated in 3D rather than in 2D. This can be done by comparing their mutual information with respect to the label using a two-sided paired t-test1 for the null hypothesis that the mean mutual information of the 2D feature equals that of the 3D feature: µ(MI(f2D)) = µ(MI(f3D)). For the features where this hypothesis is rejected, it is determined whether the mutual information is higher for the 2D feature or for the 3D feature.

1From scipy.stats.ttest_rel
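A sketch of this comparison using `scipy.stats.ttest_rel` (the function named in the footnote); the paired MI scores below are synthetic stand-ins for the 100 samples:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(8)
# Paired mutual-information scores over 100 samples (synthetic stand-ins),
# constructed so that the 3D variant is genuinely higher.
mi_2d = rng.normal(0.30, 0.02, size=100)
mi_3d = mi_2d + 0.05 + rng.normal(0.0, 0.01, size=100)

# Two-sided paired t-test for H0: mean(MI_2D) == mean(MI_3D)
stat, p = ttest_rel(mi_2d, mi_3d)
if p < 0.05 and mi_3d.mean() > mi_2d.mean():
    print("3D variant has significantly higher mutual information")
```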

Reliability and validity

To reduce the effect of chance, 100 random samples of about 26,400 pixels each, together with their corresponding labels, are selected from the training image. In total about 1% of the pixels in the blob regions are sampled. In each sample about 144 (0.55%) of the pixels are labelled as blob, the rest as non-blob. Then, for each sample, the desired features are calculated, followed by a mutual information calculation with respect to the label.

6.2.2 B: Feature compression

For the sake of reducing classification time, it must be investigated how much the pixel features can be compressed with PCA whilst keeping enough information for accurate classification. This is measured by the performance of the pixel classification after the features have been transformed using a fitted PCA model. The features are compressed to {1, 2, ..., m} components, where m is the number of extracted features.

6.2.3 C: Pixel classification

For pixel classification there are a number of suitable algorithms, ranging from simple to more complex. Simpler algorithms have fewer parameters because they make more assumptions about the data. This can make them more useful than complex algorithms when little data is available. Algorithms with more hyper-parameters require more data and effort to train, but can become more accurate because they can detect less obvious patterns in the data.

As for candidates, naive Bayes can be used as a baseline classifier against which the other algorithms are compared. Logistic regression is known to perform well on binary classification problems. k-Nearest neighbour can work well on problems that are not too complex. It is simple to understand and implement, but its major drawback is the long running time for large datasets. Decision trees can be effective too, because they can potentially emulate any decision boundary. However, since they are prone to over-fitting, a random forest of multiple bagged decision trees may be a better alternative. AdaBoost is another ensemble method that, like a random forest, can mitigate the shortcomings of a decision tree by focusing on the harder-to-classify instances. As a classifier that can achieve high accuracy even with little data, the support vector machine is a popular additional contestant. Finally, a simple feed-forward neural network is chosen as the last candidate because, due to its high number of parameters, it can potentially achieve very high accuracy when given enough data.

The metric used for measuring classification quality is the f1-score (see 4.1.9). The accuracy metric is not useful here because the blob/non-blob classes are highly imbalanced, with at most 1% of the pixels being blob pixels. The speed is measured as the prediction time for classifying a set of pixels. The training time of the classifiers is not taken into consideration because all training happens off-line.

C1: Hyper-parameter tuning

First, the optimal hyper-parameters are sought for each classifier such that they achieve the highest f1-score on the best selected features. Table 6.2 shows the classifiers that will be tested and their hyper-parameter search spaces. The same optimised decision tree is used as the base estimator in Random Forest and in AdaBoost. Naive Bayes does not have any hyper-parameters that can be optimised. For logistic regression, the support vector machine, the decision tree and k-nearest neighbour, grid search [56] is used to find the optimal combination of hyper-parameters. For the neural network a random search [56] over 100 random combinations is performed to approach the best hyper-parameters, followed by a grid search to attain the local maxima.
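A sketch of such a grid search with scikit-learn's GridSearchCV, using the logistic-regression search space from Table 6.2 on illustrative imbalanced toy data (the dataset, solver and cv choices are assumptions for this example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Imbalanced toy data, roughly mimicking the rarity of blob pixels.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.95], random_state=0)

# Grid over the logistic-regression search space of Table 6.2; f1 is used
# as the selection metric because of the class imbalance.
grid = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid={"penalty": ["l1", "l2"], "C": [0.5, 1.0, 1.5, 2.0, 2.5]},
    scoring="f1", cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```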

With the exception of the neural network classifier, which is implemented in Keras [98] with a TensorFlow [99] back-end, the classifiers use the implementations in scikit-learn [100].


Table 6.2: Classifiers tested for pixel classification and their search spaces of hyper-parameters. The remaining hyper-parameters take the default values of scikit-learn 0.19.1, or of Keras 2.1.5 for the neural network.

Classifier | Search space of hyper-parameters
Naive Bayes | None
Logistic regression | penalty1 ∈ {’l1’, ’l2’}, C2 ∈ {0.5, 1.0, 1.5, 2.0, 2.5}
k-Nearest neighbour | n_neighbors3 ∈ {1, 3, 5, 10}, weights4 ∈ {’uniform’, ’distance’}, p5 ∈ {1, 2}
Decision tree | criterion6 ∈ {’gini’, ’entropy’}, splitter7 ∈ {’best’, ’random’}, max_depth8 ∈ {3, 4, ..., 12}, max_features9 ∈ {1, 2, ..., 10}
Random forest | None (uses 50 optimised decision trees)
AdaBoost | None (uses 50 optimised decision trees)
Support vector machine | C2 ∈ {0.5, 1.0, 1.5, 2.0, 2.5}, kernel10 ∈ {’linear’, ’poly’, ’rbf’, ’sigmoid’}, gamma11 ∈ {0.1, 0.4, 0.7, 1.0, 1.3}
Neural network | n_neurons112 ∈ {1, 5, 10, 15, 20, 25, 30}, n_neurons213 ∈ {1, 5, 10, 15, 20, 25, 30}, dropout14 ∈ {0.0, 0.1, 0.2, 0.3, 0.5}, lr15 ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01}, decay16 ∈ {0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1}

1 ’l1’ or ’l2’ regularisation
2 regularisation term; smaller means stronger regularisation
3 number of neighbours considered when classifying a new point
4 ’uniform’ weights all neighbours equally; ’distance’ weights each neighbour by the inverse of its distance to the point
5 p=1 is the Manhattan distance and p=2 is the Euclidean distance [49]
6 criterion for determining the best split; ’gini’ is the Gini impurity, ’entropy’ is the information gain
7 strategy for choosing the split at each node; ’best’ chooses the best split and ’random’ a random split
8 maximum depth of the tree
9 number of features to consider when choosing the best split
10 kernel used in the algorithm; ’poly’ is a polynomial kernel of degree 3 and ’rbf’ stands for radial basis function
11 coefficient used in the ’rbf’, ’poly’ and ’sigmoid’ kernels
12 number of neurons in the first hidden layer
13 number of neurons in the second hidden layer
14 fraction of input units to drop
15 learning rate
16 learning rate decay over each update


C2: Pixel classifier comparison on different PCA compressions

To test the optimised classifiers, they are first trained on the selected best features, which are compressed with PCA to different numbers k = m, m-1, m-2, ..., 1 of components. Then they are used to predict the labels for a different set of features. Next, the classifiers are compared to each other in terms of f1-score and prediction time.

Design neural network

Compared to other classification algorithms, a neural network has many more hyper-parameters. In order to limit the options, the following parameters are fixed. The input layer has a number of neurons equal to the number of best features found in A1. The number of hidden layers is 2, with the reasoning that more layers are better for learning complex functions, but the problem is not complex enough to justify a deeper neural network. The output layer has only a single neuron because the neural network is expected to give a binary output. The layers are all dense, meaning that all neurons between every pair of layers are connected. An exception is the optional drop-out between the first and second hidden layer, where some neurons may not be connected to the next layer, depending on the dropout parameter. The benefit of drop-out is that it prevents overfitting: the neural network cannot focus on only a few features and therefore needs to find general patterns. The activation functions for the neurons in the hidden layers are ReLUs (see Figure 4.4), which are easy to train because of their linear behaviour, making them in general an excellent choice [56]. The activation function in the output layer is a sigmoid (see Figure 4.2), as is common for binary classification problems with gradient-based learning [56]. In terms of optimisation algorithm there is no consensus on what is best [101]. However, since the Adam optimiser [58] has been shown to be robust and is used frequently in the literature [56], it is our choice as well. With the exception of the learning rate and learning rate decay, all the hyper-parameters of the Adam optimiser have the default values provided in the original paper. The number of training epochs is 150 and the batch size is 10000.
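The fixed architecture above can be sketched with the Keras API. This is only an illustrative sketch: `n_features` and the hyper-parameter values below are placeholders from the search space, not the thesis' final values, and the learning-rate decay is omitted from the optimiser call for brevity:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

n_features = 10                      # number of best features from A1
n_neurons1, n_neurons2 = 20, 10      # placeholder hidden-layer sizes
dropout, lr = 0.1, 0.001             # placeholder dropout and learning rate

model = Sequential([
    Input(shape=(n_features,)),
    Dense(n_neurons1, activation="relu"),   # first hidden layer (ReLU)
    Dropout(dropout),                       # optional drop-out between hidden layers
    Dense(n_neurons2, activation="relu"),   # second hidden layer (ReLU)
    Dense(1, activation="sigmoid"),         # single neuron for binary output
])
model.compile(optimizer=Adam(learning_rate=lr),
              loss="binary_crossentropy", metrics=["accuracy"])
# Training would then use: model.fit(X, y, epochs=150, batch_size=10000)
```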


Reliability and validity

For both hyper-parameter optimisation and the comparison of the classifiers, cross validation is used to test how well a classification algorithm predicts the labels of an unseen dataset. First, 1% of the pixels in the blob regions of the training image are randomly sampled, together with their labels. For these pixels the best features, as determined by A1, are calculated. Stratified 10-fold cross validation is then applied: the classifiers are trained on 9 folds and their f1-score is calculated for predicting the labels in the remaining fold. Each of the 10 folds is the test fold exactly once. Stratification makes sure that the ratio of blob pixels is equal in each fold. This is important because the blob ratio must be similar to what is expected in a whole image. The mean of the f1-scores over all 10 folds is taken as the overall f1-score for each classifier.
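The cross-validation protocol can be sketched with scikit-learn (the data below is a synthetic stand-in for the sampled pixel features; the decision tree is one example classifier, not the only one evaluated):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] > 1.2).astype(int)   # rare positive class, like blob pixels

# Stratified 10-fold: each fold keeps the blob/non-blob ratio of the sample.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    fold_scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))

overall_f1 = float(np.mean(fold_scores))  # mean over the 10 folds
```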

6.2.4 D: Pixel clustering

For the pixel clustering step, the goal is to find the most suitable algorithm on the basis of its clustering quality and running time. The metric used for measuring clustering quality is the silhouette score (see 4.2.6).

In order to compare it with the other clustering algorithms, the watershed algorithm is treated as a clustering algorithm in this experiment. Not every clustering algorithm is suitable for pixel clustering, because some have different definitions of clusters or do not scale well, for instance. The candidates have therefore been chosen for the following reasons. K-means is a popular clustering algorithm with good reason: it is simple and fast. Besides that, it looks for round clusters of similar size, which is beneficial in the case of blobs. Agglomerative clustering algorithms make few assumptions about the data, making them a suitable general-purpose candidate. The MeanShift algorithm was created for finding high-density clusters, which is exactly what blobs are. Spectral clustering is known to be slower than the others but can find clusters of high quality.

The clustering algorithms and their hyper-parameters are shown in Table 6.3. Their implementations in scikit-learn [100] are used.


Table 6.3: Algorithms and their parameters used for pixel clustering. The values of the remaining parameters are the default values in scikit-learn 0.19.1.

Clustering algorithm       Parameters

Watershed                  None
K-means                    n_clusters = number of local maxima[1]
Agglomerative (centroid)   t[2] = 5
Agglomerative (ward)       n_clusters = number of local maxima
MeanShift                  bandwidth[3] = 4
Spectral                   n_clusters = number of local maxima

[1] starting centroids are initialised at the locations of the local maxima
[2] maximum distance between clusters - two clusters are merged when the distance between their centroids is smaller than t
[3] coefficient used in the RBF kernel

Reliability and validity

To reduce the effect of chance, the blob region of the test image is split into chunks of 500 × 500 × depth pixels and 1% of the chunks are randomly sampled. All the pixels in each chunk are first classified using the best classifier as determined by C2, with the best hyper-parameters as determined by C1. The pixels in each chunk are then clustered using every clustering algorithm in Table 6.3. For each clustering result the silhouette score and other blob statistics are calculated. The clustering algorithms are finally evaluated by the mean of the silhouette scores over the sampled chunks.
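The evaluation of one chunk can be sketched as follows. The coordinates below are a synthetic stand-in for classified blob pixels, and only two of the candidate algorithms are shown:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)

# Stand-in for classified blob pixels in one chunk: three touching blobs,
# represented by their (x, y, z) coordinates.
centres = np.array([[5, 5, 2], [12, 6, 2], [8, 12, 3]])
coords = np.vstack([c + rng.normal(scale=1.5, size=(60, 3)) for c in centres])

# Cluster the pixel coordinates and score the result with the silhouette.
results = {}
for name, algo in {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative_ward": AgglomerativeClustering(n_clusters=3),
}.items():
    labels = algo.fit_predict(coords)
    results[name] = silhouette_score(coords, labels)
```

Averaging these scores over all sampled chunks gives the per-algorithm quality measure used in the comparison.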

6.2.5 E: Run on whole image

The purpose of this experiment is to determine whether the overall machine learning approach taken in this thesis is suitable for detecting blobs in whole images in practice. The optimal features from A1, the optimal PCA compression from B, the optimal classifier from C and the optimal clustering algorithm from D are used in the process. The blob detector with these properties is run over one complete image. Both the blob detection results and the running time are analysed.


6.2.6 F: Comparison with state-of-the-art

As a final step in the validation of the blob detector proposed in this thesis, it is compared to the current state-of-the-art tools for blob detection. It is not fair to compare the blobs found by the different tools. Firstly, because all the tools require different parameters, it cannot be guaranteed that the used parameters are optimal. Secondly, there is no ground truth available on the number, locations and sizes of the blobs to check the results against. Therefore the tools are only compared on their running time. The blob detection approach of this thesis, called the MFB detector, is compared to FIJI [13], CellProfiler [10] and Ilastik [36]. For the configuration of each tool, the reader should consult Appendix A. Each tool is run over 10 random crops of 500 × 500 × 16 pixels of the test image. The mean of those times is used to compare the performance of the four tools.
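The timing protocol can be sketched as below. The detector and the crop size are placeholders (real crops are 500 × 500 × 16; a smaller size is used here to keep the sketch fast):

```python
import time
import numpy as np

def time_detector(detector, crops):
    """Mean wall-clock running time of `detector` over a list of image crops."""
    times = []
    for crop in crops:
        start = time.perf_counter()
        detector(crop)
        times.append(time.perf_counter() - start)
    return float(np.mean(times))

# Hypothetical stand-ins: 10 random crops and a trivial placeholder detector.
rng = np.random.RandomState(0)
crops = [rng.randint(0, 256, size=(50, 50, 4), dtype=np.uint8) for _ in range(10)]

def dummy_detector(crop):
    return (crop > 128).sum()   # placeholder for a real blob detection tool

mean_time = time_detector(dummy_detector, crops)
```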

6.2.7 Summary

For a summary of the solutions that will be tested in the experiments, see Table 6.4.

6.3 Data collection

6.3.1 Characteristics

The data consists of high-resolution 3D microscopy images made with a confocal microscope. The images are saved in TIFF files where the depth layers are stored as a z-series. The descriptions of the TIFF files contain metadata in the OME-TIFF XML format [102]. The characteristics of the images are summarised in Table 6.5.

Experiments A: Feature extraction, B: Feature compression and C: Pixel classification require images with labelled pixels. For these steps the same image, with the characteristics in Table 6.6, is used for both training and evaluation.

For experiment D: Pixel clustering there is no ground-truth data for the clusters.


A - Image features                   B - Feature compression   C - Classification algorithms

Raw pixel values                     PCA transform to          Naive Bayes
Gaussian filter                      m to 1 components         Logistic regression
Laplacian of Gaussian                                          k-Nearest neighbour
Gaussian of gradient magnitude                                 Decision tree
Difference of Gaussians                                        Random forest
Determinant of Hessian                                         AdaBoost
Eigenvalues of structure tensor                                Support vector machine
Eigenvalues of Hessian of Gaussian                             Neural network

D - Clustering algorithms            E - Run on whole image    F - Blob detection tools

Watershed                            N/A                       MFB Detector
K-means                                                        FIJI
Agglomerative (centroid)                                       CellProfiler
Agglomerative (ward)                                           Ilastik
MeanShift
Spectral

Table 6.4: Summary of solutions that will be tested in each experiment.

Characteristic     Values

Width/height       Any size, but typically around 30000 × 30000
Number of layers   Any number ≥ 1, but typically 3-12
Pixel size x/y     Any value, but typically 0.27 µm
Pixel size z       At least 0.7 µm, but typically 1 µm
Storage size       Typically 0.5-10 GB
Data format        Uncompressed OME-TIFF file with description in the OpenMicroscopy OME-XML format [102]
Color depth        8 bits, but usually less than 256 unique values

Table 6.5: General characteristics of the biomedical images considered in this research.


Characteristic     Values

Width/height       2310 × 115000
Number of layers   3
Pixel size x/y     0.217 µm, 0.219 µm
Pixel size z       1.0 µm
Storage size       762 MB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 75 unique values
Blob coverage      0.528% blob pixels

Table 6.6: Characteristics of the training image. This image has been labelled with the procedure described in 6.3.2.

Characteristic     Values

Width/height       24097 × 14445
Number of layers   16
Pixel size x/y     0.273 µm, 0.272 µm
Pixel size z       1.0 µm
Storage size       5.19 GB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 210 unique values
Capture time       18548 s (5:09 hours)

Table 6.7: Characteristics of the test image. This image is unlabelled.

Since there is more freedom to pick any image here, a different image is used than in the previous experiments. The characteristics of this larger image can be found in Table 6.7. The same image is used for experiments E: Run on whole image and F: Comparison with state-of-the-art.

6.3.2 Labelling

As supervised machine learning algorithms, the pixel classifiers must be trained with image data in which the pixels have been labelled as blobs and non-blobs. Two methods have been considered for generating these labels: humans can label the pixels, or a program labels them using prior knowledge that is not present during classification. Human labelling is tedious and error-prone, while machines are much faster but can make errors too without noticing. To reduce the drawbacks of these methods, it has been decided to combine them.

There are two versions of the training image: one with blobs, and one in which the fluorescent dyes have been stripped so that no blobs are visible. A computer program can subtract the image without blobs from the image with blobs to reduce the background. The background is not completely removed because the two images differ slightly due to noise and misalignment. In the resulting image the blobs are more distinct from their surroundings than in the original image; see also Figure 6.2. Then, to accentuate the local maxima in the image, the Difference of Gaussians (DoG) of the image is calculated. A scale of σ = 1.5 has been shown by trial-and-error to give the best results.

Next, a sample of the pixels (0.1%) from the DoG image is randomly chosen and used to fit a K-means clustering algorithm with 2 clusters. Standardisation (see Equation 6.1) is used to improve the speed and accuracy of the convergence. Since pixels that belong to a blob receive a much higher value after a DoG transformation, the clustering algorithm groups the blob pixels in one cluster and the non-blob pixels in the other. K-means is chosen because it is general-purpose and reasonably efficient for a clustering algorithm. Its drawback, however, is that it can get stuck in a local minimum because of an unfortunate choice of initial centroids. Therefore the algorithm is re-initialised 50 times with different centroid seeds. The centroids of the best of the 50 runs in terms of inertia, defined as the sum of squared distances of samples to their closest cluster centre [103], are used to k-means cluster the remaining pixels.
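The labelling pipeline can be sketched end-to-end on a small synthetic image pair. The images, blob positions and DoG scale ratio below are stand-ins for illustration only:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the two versions of the training image.
stripped = rng.normal(100, 5, size=(64, 64))   # dyes stripped: background only
blobs = stripped.copy()
blobs[20:24, 20:24] += 80                      # one bright blob
blobs[40:43, 50:53] += 90                      # another blob

# Background reduction as in Figure 6.2, then a Difference of Gaussians
# (sigma ratio 1.6 is an assumption; the thesis only states sigma = 1.5).
diff = blobs - 2 * gaussian_filter(stripped, sigma=1)
dog = gaussian_filter(diff, sigma=1.5) - gaussian_filter(diff, sigma=1.5 * 1.6)

# Standardise (Equation 6.1) and cluster the pixel values into blob/non-blob,
# re-initialising K-means 50 times with different centroid seeds.
values = dog.reshape(-1, 1)
values = (values - values.mean()) / values.std()
km = KMeans(n_clusters=2, n_init=50, random_state=0).fit(values)
labels = km.labels_.reshape(dog.shape)

# The cluster with the higher mean DoG value corresponds to blob pixels.
blob_cluster = int(np.argmax(km.cluster_centers_))
```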

The results of the machine-labelled image are visually checked by a human to make sure that the blobs are correctly labelled. The labelling method utilises two pieces of prior knowledge that are not available during run-time of the classifier. Firstly, the classifier does not have access to the blob-stripped image during run-time, because it is too costly to strip the fluorescent dyes and then recapture each image. Secondly, in practice the results will not be checked by a human.

(a) Blobs visible (b) Blobs stripped (c) Difference

Figure 6.2: Image c is produced as c = a − 2 · gaussian_filter(b, σ = 1). The values were chosen empirically.

Specification      Value

Operating system   Windows 10 Home (64-bit)
Processor          Intel(R) Core(TM) i7-3630QM CPU @ 2.40 GHz
Memory size        8.00 GB

Table 6.8: Specifications of the test system.

X′ = (X − µ) / σ        (6.1)

6.4 Experimental design

6.4.1 Test system

All the experiments are performed on the same computer, with the specifications in Table 6.8.


Package name              Version   Purpose

javabridge [104]          1.0.15    Dependency of python-bioformats
Keras [98]                2.1.5     User-friendly API for neural networks
Matplotlib [35]           2.2.2     Visualisation of analysis results
NumPy [105]               1.14.2    nD arrays, memory maps and helper functions
NetworkX [106]            2.1       Graph colouring
pandas [107]              0.22.0    Data manipulation and analysis
python-bioformats [108]   1.3.2     Reading OME-TIFF image files
PyQt5 [109]               5.10.1    Matplotlib backend
scikit-image [110]        0.12.3    Image processing
scikit-learn [100]        0.19.1    Machine learning
SciPy [111]               1.0.0     Statistics and other helper functions
TensorFlow [99]           1.7.0     Neural network framework
tifffile [112]            0.14.0    Reading OME-TIFF image files
PyYAML [113]              3.12      Reading and writing YAML files

Table 6.9: Python software packages used in this project.

6.4.2 Software

All the software is written in Python 3.5 from the Anaconda² 4.2.0 distribution. The used Python packages are listed in Table 6.9.

6.4.3 Data analysis

During the experiments the relevant data is stored in pandas DataFrames, which are 2D tabular data structures. After the experiments have finished, the data is analysed using pandas and statistical functions from SciPy. The results are either displayed as text or plotted with Matplotlib.

6.4.4 Overall reliability and validity

To guarantee statistical significance of the experiments, the significance level is fixed at α = 0.001 for each statistical test. For the purpose of repeatability, the random seed is set to 0 before the creation of every sample. The validity of the optimised blob detector is evaluated by running it over a whole image. To check that its performance is reasonable, it is compared with the performance of state-of-the-art tools.

² See Anaconda: https://anaconda.org/


Chapter 7

Analysis

In this chapter the results are listed for each experiment. Furthermore, interesting observations from the results are described in the text.

7.1 Results from A: Feature extraction

This section lists the results collected from the tests on feature extraction. Refer back to Table 6.1 for the features that correspond to the feature abbreviations gaus, log, ggm, etc.

A1: Selected best features

In the first step of the feature selection process we filter out those features that are either irrelevant or take too long to calculate. Figure 7.1 shows the mutual information and calculation time per feature and scale.

The figure shows that especially the Hessian of Gaussian eigenvalues do not provide much information at large scales, while at the same time being expensive to compute. Based on the information in the figure, we filter out the features whose mutual information is smaller than 0.010 and whose calculation time is larger than 0.4 s.
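This filtering step can be sketched as follows. The feature matrix, labels and per-feature times below are made-up stand-ins; only the feature names and the two thresholds come from the text:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.RandomState(0)

# Hypothetical feature matrix for sampled pixels and their blob labels.
X = rng.normal(size=(1500, 5))
y = (X[:, 0] > 1.0).astype(int)                 # only feature 0 is informative
feature_names = ["3d_dog_1.6", "3d_gaus_0.7", "2d_stex_1.0",
                 "3d_log_1.6", "2d_hogey_0.7"]
# Illustrative per-feature computation times in seconds (made up).
times = np.array([0.085, 0.039, 0.167, 0.074, 0.209])

# Mutual information of each feature with the blob/non-blob label.
mi = mutual_info_classif(X, y, random_state=0)

# Keep features that are informative enough and cheap enough to compute.
keep = [name for name, m, t in zip(feature_names, mi, times)
        if m >= 0.010 and t <= 0.4]
```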

In the second step the features are clustered so that redundant features can be removed. The results of the clustering are visible in Figure 7.2.

Figure 7.1: Mutual information of the features with respect to the blob/non-blob label for different scales on the left axis. On the right axis, the average time to calculate the feature for a 500 × 500 × 3 pixel image chunk.

Figure 7.2: Results of clustering the features with K-means and k = 10. The colours denote the cluster assignment. t-SNE [114] was used to project the points to a 2D space and spread them out.

It is interesting to see that some clusters contain only one feature (e.g. value and 2d_log_4.0) while some clusters are much larger. For each cluster, only the feature with the highest mutual information score is kept. Table 7.1 shows the final selection of the 10 best features as determined by their mutual information. The value of 10 was chosen because it resulted in the highest silhouette score (see 4.2.6).
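The keep-the-best-feature-per-cluster step can be sketched on a toy example. The feature responses and mutual-information values below are synthetic; only the selection rule comes from the text:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Hypothetical: rows are features, columns their responses on sampled pixels;
# mi holds each feature's mutual information with the label.
base = rng.normal(size=(1, 200))
features = np.vstack([
    base + rng.normal(scale=0.1, size=(1, 200)),   # two nearly redundant
    base + rng.normal(scale=0.1, size=(1, 200)),   # copies of each other
    rng.normal(size=(2, 200)),                     # two distinct features
])
mi = np.array([0.030, 0.026, 0.022, 0.013])

# Group redundant features, then keep the most informative one per cluster.
k = 3
assign = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
selected = [int(np.flatnonzero(assign == c)[np.argmax(mi[assign == c])])
            for c in range(k)]
```

Here the redundant pair (features 0 and 1) falls into one cluster, and only the one with the higher mutual information survives.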

Name           Mutual info   Time (s)

3d_dog_1.6     0.030         0.085
3d_gaus_0.7    0.026         0.039
2d_stex_1.0    0.024         0.167
3d_log_1.6     0.022         0.074
2d_log_1.6     0.022         0.060
value          0.022         0.012
3d_log_2.5     0.020         0.075
2d_log_2.5     0.018         0.062
2d_log_4.0     0.013         0.073
2d_hogey_0.7   0.011         0.209

Table 7.1: The 10 selected best features with their mutual information, as calculated using the two-step feature selection process discussed in 6.2.1. The shown time is the average time to calculate the feature for a 500 × 500 × 3 pixel image chunk.

A2: Comparison 2D vs 3D

Table 7.2 shows that for the features with small scale it does not matter whether they are calculated in two or three dimensions. For features with larger scale there is a difference between 2D and 3D, but no consistent pattern is visible. That for the Difference of Gaussians (dog) with σ = 1.6 the third dimension provides more information on the blob/non-blob label than 2D was to be expected, since the 3D version of the DoG was used to predict the label during the generation of the training data.

7.2 Results from B: Feature compression and C: Pixel classification

During the tests, the support vector machine classifier proved too hard to train because its training time scales more than quadratically with the number of samples. The SVM was therefore trained with a smaller sample of 0.01% of the pixels. But since a large sample is crucial due to the significant class imbalance, the SVM performed badly.


       Scale σ
       0.7        1.0        1.6         2.5         4.0

gaus   = [0.725]  = [0.617]  2d [0.000]  2d [0.000]  ×
log    = [0.938]  = [0.704]  3d [0.000]  3d [0.000]  3d [0.000]
ggm    = [0.795]  = [0.596]  = [0.696]   2d [0.000]  2d [0.000]
dog    = [0.574]  3d [0.000] 3d [0.000]  2d [0.000]  2d [0.000]

Table 7.2: Results of t-tests comparing the means of the mutual information with respect to the label between the 2D and 3D versions of the features. '×' means that one of the samples was not normally distributed, making the t-test invalid. '=' means that the hypothesis can be accepted and there is likely no difference in predictive power on the label between 2D and 3D. '2D' or '3D' means that the hypothesis must be rejected and the respective dimension has more predictive power than the other. The number between brackets is the p-value.
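The decision procedure behind each cell of Table 7.2 can be sketched with SciPy. The mutual-information samples below are synthetic stand-ins, and the normality check via the Shapiro-Wilk test is an assumption (the text does not name the normality test used):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)

# Hypothetical mutual-information samples for the 2D and 3D feature versions.
mi_2d = rng.normal(0.020, 0.002, size=30)
mi_3d = rng.normal(0.026, 0.002, size=30)

alpha = 0.001

# The t-test is only valid if both samples are normally distributed.
normal = (stats.shapiro(mi_2d)[1] > alpha and
          stats.shapiro(mi_3d)[1] > alpha)

if normal:
    t, p = stats.ttest_ind(mi_2d, mi_3d)
    if p >= alpha:
        verdict = "="                          # no detectable difference
    else:
        verdict = "3d" if mi_3d.mean() > mi_2d.mean() else "2d"
else:
    verdict = "x"                              # t-test invalid
```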

C1: Hyper-parameter optimisation

Table 7.3 lists the optimised hyper-parameter values for the classifiers used in pixel classification.

Classifier              Best hyper-parameters                                f1-score

Naive Bayes             N/A                                                  0.159
Logistic regression     penalty='l1', C=2.5                                  0.831
k-Nearest neighbour     n_neighbours=10, weights='distance', p=1             0.803
Decision tree           criterion='entropy', max_depth=4, max_features=10    0.894
Random forest           Same as decision tree                                0.848
AdaBoost                Same as decision tree                                0.897
Support vector machine  C=7.5, kernel='rbf', gamma=1.3                       0.812
Neural network          n_neurons1=26, n_neurons2=10, dropout=0.1,           0.903
                        lr=0.08, decay=0.0001

Table 7.3: Found optimal hyper-parameters of the pixel classification algorithms and their f1-scores. The values of the missing hyper-parameters are the defaults in scikit-learn 0.19.1, or in Keras 2.1.5 for the neural network.

C2: Comparison of classifiers with respect to PCA compression

Each optimised pixel classifier was run over the same pixel sample, compressed with PCA to a varying number of components. Figure 7.3 shows that for most classification algorithms more components (i.e. features) lead to a higher f1-score, as expected. The exceptions are the decision tree, random forest and naive Bayes. The disappointing performance of naive Bayes does not come as a surprise, since its independence assumption is known to be too limiting for complex problems. What is surprising is that the decision tree and random forest actually profit from stronger PCA compression (i.e. fewer components). This suggests that they are under-fitted when trained on PCA-compressed features. AdaBoost, which uses a decision tree as base estimator, also loses out compared to k-nearest neighbour, the neural network and logistic regression. Without PCA compression, all the classifiers (with the exception of naive Bayes and the SVM) perform similarly, with an f1-score around 0.89, the neural network being the best performer.

The interesting observation from Figure 7.4 is that some classification algorithms (k-nearest neighbour, support vector machine, naive Bayes) profit from PCA compression in terms of prediction time, others are slowed down by it (random forest, neural network, logistic regression, decision tree), and for AdaBoost there does not seem to be a difference. This means that the decision whether to apply PCA compression should depend on the chosen classifier. The same figure also shows that k-nearest neighbour scales very badly with the number of components. This can be explained by the added complexity of calculating distances between points in a higher dimension. Its huge prediction time makes it unsuitable for our problem.

In Figure 7.5 the classifiers can be evaluated on both their f1-score and prediction time. The most accurate and fastest classifiers are found in the top-left corner: the neural network and decision tree on uncompressed data, and logistic regression on both compressed and uncompressed data. The neural network is the most accurate classifier but is slower to train and run than a decision tree or logistic regression.

Figure 7.3: f1-score of the classification algorithms for varying numbers of PCA components. The dashed line indicates the f1-score of the classifiers on non-compressed data. The SVM classifier is not included due to its lacklustre performance.

Figure 7.4: Prediction time of the classification algorithms for varying numbers of PCA components. The dashed line indicates the prediction time of the classifiers on non-compressed data. Note the logarithmic scale of the time axis.

It is surprising that a decision tree classifier performs better than a random forest classifier. The output of a random forest is essentially the mode of the outputs of multiple bagged decision trees. An explanation could be that there is only a single feature, or a few features, on which the decision tree relies for its classification. Since these essential features are ignored in some of the bagged decision trees, the overall accuracy of the random forest is less than that of a single decision tree. The finding that a random forest performs worse than a single decision tree for pixel classification of blobs is bad news for Ilastik, which uses a random forest.

7.3 Results from D: Pixel clustering

The clustering algorithms were run on a subset of the chunks of the test image after the pixels had been classified. Since the decision tree without PCA compression has shown to be both accurate and fast, it was used for the pixel classification. Figure 7.6 shows how well the clustering algorithms perform in terms of silhouette score and running time. Interestingly, agglomerative clustering and k-means show similar performance: both are fast with a reasonable silhouette score. MeanShift and spectral clustering are definitely too slow for our purpose. Moreover, Table 7.4 shows that MeanShift creates fewer but larger clusters, and its high standard deviation shows that it is highly unreliable in terms of clustering time. Watershed is not suitable because of its low silhouette score, which is likely the result of it creating irregularly sized clusters. Figure 7.7 provides additional evidence for this claim by showing the output of the different pixel clustering algorithms on the same connected components. Indeed, in that example there is more variation in the size of the clusters produced by watershed. Subjectively speaking, agglomerative (centroid) has created the best-shaped clusters in the figure. Based on these results, agglomerative clustering with inter-centroid distance as pairing metric seems the most appropriate clustering algorithm.


Figure 7.5: Comparison of f1-score and prediction time for the classification algorithms. The number between square brackets indicates the number of PCA components; missing brackets mean no compression. The support vector machine is not included due to its lacklustre performance, and k-nearest neighbour is not included since its prediction time is much longer than the rest.


Figure 7.6: Comparison of silhouette score and running time of the clustering algorithms. The time is the mean clustering time for a 500 × 500 × 16 pixel chunk.


(a) Watershed (b) K-means

(c) Agglomerative (centroid) (d) Agglomerative (ward)

(e) MeanShift (f) Spectral

Figure 7.7: The same connected components segmented with watershed and the clustering algorithms.


Algorithm                  Time (s)             # Blobs   Blob size             Silhouette

Watershed                  1.416 (+/- 1.266)    15306     96.603 (+/- 97.092)   0.284 (+/- 0.171)
K-means                    0.131 (+/- 0.085)    15686     94.232 (+/- 78.200)   0.372 (+/- 0.188)
Agglomerative (centroid)   0.066 (+/- 0.083)    16437     89.953 (+/- 57.253)   0.381 (+/- 0.189)
Agglomerative (ward)       0.110 (+/- 0.108)    15616     94.580 (+/- 75.068)   0.375 (+/- 0.190)
MeanShift                  6.378 (+/- 41.416)   9530      154.930 (+/- 91.375)  0.424 (+/- 0.222)
Spectral                   4.249 (+/- 9.835)    13081     87.701 (+/- 86.807)   0.350 (+/- 0.198)

Table 7.4: Statistics of the pixel clustering algorithms. Time is the mean clustering time per 500 × 500 × 16 pixel chunk; # blobs is the total number of found blobs; blob size is the mean blob size in pixels; silhouette is the mean silhouette score over the clustered chunks. The number between parentheses is the standard deviation.

7.4 Results from E: Run on whole image

The optimal blob detector found uses the 10 features in Table 7.1, no PCA compression, a decision tree classifier and agglomerative clustering with inter-centroid distance as metric. Running this complete blob detection process on the whole test image took 10,696 seconds in total, which is 2:58 hours. This duration is good considering that capturing the image took 5:09 hours.

The share of each step can be found in Figure 7.8. It is unsurprising that the feature extraction step takes the majority of the running time, because 10 filters have to be applied to the image. Feature extraction could be made faster by no longer calculating the heavier filters. In this case the 2d_stex_1.0 and 2d_hogey_0.7 filters have been selected, but since they take more time to calculate than the other filters, they could be dropped to improve speed.

Furthermore, a total of 1,556,913 blobs were found, with an average radius of 2.587 pixels and a density of 317.05 · 10⁹ blobs per mm.

Figure 7.10 shows an example of the steps of the optimised blob detector.


Figure 7.8: Share of each step in the total running time of the blob detection process.

7.5 Results from F: Comparison with state-of-the-art

Figure 7.9 shows how the running time of this thesis' MFB detector compares with the speed of blob detection performed with FIJI, CellProfiler and Ilastik. Even though the MFB detector uses a very similar approach to Ilastik, it is twice as slow. The likely reason is that Ilastik is better optimised: it uses the C++ VIGRA library [115] and makes use of multiple cores, whereas the MFB detector is implemented in Python and works on a single core only. Moreover, for a tool that solely uses simple image processing algorithms, CellProfiler is very slow. CellProfiler performs approximately the same steps as the FIJI macro but its performance is far worse.


Figure 7.9: Mean running time of blob detection on a 500 × 500 × 16 pixel chunk with the different tools.


(a) Input image (b) Extracted features (examples)

(c) Classified pixels (d) Clustered pixels (colours are for visible separation only)

(e) Extracted blob centroids

Figure 7.10: Input, output and intermediate images of the blob detection steps on a chunk from the test image.


Chapter 8

Conclusions

The goal of this project is to move away from simple user-guided image processing towards fully automated computer vision for the task of blob detection. With the existing tools for biomedical image analysis, the user has to tune the parameters for each step of the blob detection process. Not only is this tedious, it also has to be repeated for every image set, which makes automatic operation of such pipelines impossible. Besides this, some of the current software was not designed for the scale of modern high-resolution 3D microscopy images, which can reach tens of gigabytes in size. To avoid the manual selection of blob detection parameters, machine learning techniques can be used to learn the definition of blobs from a large amount of image data. The scaling problem can be solved by out-of-core processing methods. Since performance is a major concern in this project, the research question was formulated as: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images?

A blob detection pipeline, inspired by Ilastik [36], was designed thatconsists of 6 sequential steps. For steps 1 to 4, machine learning tech-niques were tested, while in step 5 and 6 a heuristic approach is usedinstead. For the first step of feature extraction a set of 10 image featureswas selected that is characterised by high relevancy and low redun-dancy. These features are used in the next step of pixel classification todecide for each pixel of the input image whether it is part of a blobor not. For this step 8 popular classifiers were first optimised for the


problem, and then evaluated by their f1-score and prediction time. The results show that a decision tree and logistic regression are most suitable because of their high accuracy combined with a low running time. A neural network can achieve a slightly higher f1-score but is slower to classify with. Experiments on feature compression show that PCA compression is not worthwhile for the top classifiers, because the accuracy suffers more than the prediction time gains. The fourth step of pixel clustering aims to split up touching blobs. Conventional approaches use a watershed algorithm, but the novelty of this thesis is to apply clustering algorithms as well. Of the 6 clustering algorithms that were deemed suitable, agglomerative clustering and k-means show the most potential. Not only are they simple and fast, but they also create a high-quality clustering, as measured by silhouette score.
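The evaluation scheme described above, scoring each candidate classifier by f1 together with prediction time, can be sketched as follows. The synthetic data and the two candidates shown are stand-ins for the real pixel features and the 8 classifiers from the thesis.

```python
# Rank candidate pixel classifiers by f1-score and prediction time,
# the two criteria used in the evaluation. Data is synthetic.
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    t0 = time.perf_counter()
    y_pred = clf.predict(X_te)          # time only the prediction step
    results[name] = (f1_score(y_te, y_pred), time.perf_counter() - t0)

for name, (f1, secs) in results.items():
    print(f"{name}: f1={f1:.3f}, predict time={secs * 1e3:.2f} ms")
```

In the thesis, the same kind of table of (f1, time) pairs is what singles out the decision tree and logistic regression as the best trade-offs.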

In the next experiment the optimised blob detection process was applied to a typical image captured for the purpose of in situ RNA sequencing. The aim was to ascertain that the pipeline would work in practice. The running time was just over 3 hours, which satisfies the requirement that it must be less than the time needed to capture the image (5 hours). Furthermore, the experiment shows that feature extraction is by far (75%) the most time-consuming step. This suggests that feature selection is likely more significant than the choice of classification or clustering algorithm. Moreover, the results of experiment A show large discrepancies between the relevancy of the features and the time to calculate them. This means that by selecting a less computation-heavy set of features, the running time can be greatly decreased.
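Finding which stage dominates the running time, as done above for the pipeline, amounts to simple per-step wall-clock profiling. A minimal sketch with hypothetical stand-in steps:

```python
# Time each pipeline step and report its share of the total running time.
# The two lambda steps are stand-ins for real feature extraction etc.
import time

def profile_steps(steps, data):
    """Run named steps in order, recording wall-clock time per step."""
    timings = {}
    for name, fn in steps:
        t0 = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - t0
    return data, timings

steps = [
    ("feature extraction", lambda xs: [v * 2 for v in xs]),
    ("pixel classification", lambda xs: [v > 1 for v in xs]),
]
out_data, timings = profile_steps(steps, [0.5, 1.5, 2.5])
total = sum(timings.values())
for name, t in timings.items():
    print(f"{name}: {100 * t / total:.0f}% of total")
```

Applied to the real pipeline, this kind of breakdown is what revealed the 75% share of feature extraction.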

The final experiment compares the running time of this thesis' blob detector, called the MFB detector, with FIJI, CellProfiler and Ilastik. The running time of the MFB detector was similar to FIJI but slower than Ilastik. The probable reason is that Ilastik is more highly optimised.

All in all, the results show that machine learning can indeed be very effective for blob detection in high-resolution 3D microscopy images. The proposed blob detector, using 10 optimal features, a decision tree pixel classifier and an agglomerative clustering algorithm, approaches the state of the art in terms of speed. Since the MFB detector is trained on labelled data, it is presumed to be more accurate than the other blob detectors that rely on user-set parameters. This suspicion is however hard to verify because there is no ground truth available on the blobs. Table 8.1 provides a comparison of the four blob detectors.
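The overall shape of the proposed detector can be sketched end-to-end on a small synthetic 2D image: filter-based features, a decision tree pixel classifier, and agglomerative clustering of positive pixels into blob centroids. The feature set, image and labels below are illustrative stand-ins, not the thesis' selected 10 features or real data.

```python
# Minimal sketch of the pipeline shape: features -> decision tree ->
# agglomerative clustering -> centroids. All data here is synthetic.
import numpy as np
from scipy import ndimage as ndi
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.05, (64, 64))
yy, xx = np.mgrid[:64, :64]
for cy, cx in [(16, 16), (48, 40)]:                 # two synthetic blobs
    img += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 8.0)

# Two blob-sensitive filter responses per pixel (illustrative features).
feats = np.stack([ndi.gaussian_filter(img, 1).ravel(),
                  -ndi.gaussian_laplace(img, 2).ravel()], axis=1)
labels = (img > 0.5).astype(int).ravel()            # stand-in for real labels

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(feats, labels)
mask = clf.predict(feats).reshape(img.shape)

coords = np.argwhere(mask)                          # positive pixels
clusters = AgglomerativeClustering(n_clusters=2).fit_predict(coords)
centroids = [coords[clusters == k].mean(axis=0) for k in range(2)]
print("blob centroids:", sorted(c.round().astype(int).tolist() for c in centroids))
```

The real detector adds the remaining pipeline steps (e.g. out-of-core chunking) around this core.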


                MFB detector         FIJI              CellProfiler       Ilastik
Thresholding    Supervised ML        User parameters   User parameters    Supervised ML
                (labels from data)                                        (labels from user)
Declumping      xy-clustering        Watershed         Watershed          None
                or watershed
Out-of-core     Yes                  No                No                 Yes
3D              Yes                  Yes               Yes, but limited   Yes
Time (s)        16                   14                90                 8

Table 8.1: Comparison of blob detectors. The time denotes the mean duration to process a 500 × 500 × 16 pixel chunk.

8.1 Discussion

The implication of the demonstrated good performance is that the MFB detector can eventually be integrated in high-content screening pipelines for the analysis of biomedical images. Because the MFB detector does not rely on user parameters, it saves medical experts time and cognitive effort. Instead of needing to tune the blob detection, they can focus on other stages of the analysis such as diagnosis and interpretation. Furthermore, since the detector was trained on labelled data of blobs, it is supposedly more accurate than approaches that rely on user parameters. As a potential drawback, this does mean that the labelled data has to be correct, because the performance of the blob detection depends strongly on it.

There always comes a time when one should be critical of one's own work. Starting with what is good about the research: the resulting blob detector combines a machine-learned thresholding algorithm with the novel use of a clustering algorithm for blob declumping. Along the way an optimal set of features was selected, and it was found that there can be slight differences between features calculated in 2D and in 3D. Moreover, PCA compression was found to be unhelpful for the problem. An automatic method of creating labelled image data was devised by using the difference between two images with and without blobs.
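The automatic labelling idea mentioned above, deriving labels from the difference between an image with blobs and one without, can be sketched in a few lines. The threshold value is illustrative:

```python
# Label a pixel as blob (1) where the two images differ strongly.
# The threshold of 0.5 is an illustrative choice, not the thesis' value.
import numpy as np

def labels_from_difference(with_blobs, without_blobs, threshold=0.5):
    diff = with_blobs.astype(float) - without_blobs.astype(float)
    return (diff > threshold).astype(np.uint8)

background = np.zeros((8, 8))
image = background.copy()
image[2:4, 2:4] = 1.0                     # a synthetic 2x2 blob
labels = labels_from_difference(image, background)
print("labelled blob pixels:", int(labels.sum()))  # → 4
```

The resulting label mask is what the pixel classifiers are trained against.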

The research question has been thoroughly investigated. For every step in the blob detection process, those features or algorithms have been


tested that are either used in related work or have a similar function. An additional requirement for the algorithms was that they must be easy to implement using available software libraries. By focusing on this low-hanging fruit, it is possible that some other good candidates were missed. It is also possible that the tested candidates have not been perfectly optimised. Since a support vector machine scales more than quadratically with the number of samples, it was arduous to train. The solution was therefore to train on a smaller sample. But this resulted in a low f1-score compared to the other classifiers, even though the SVM is generally known to be a good performer. Moreover, only a very simple version of a feed-forward neural network was evaluated. Perhaps other hyper-parameters such as the activation function, learning algorithm and network structure can improve its performance.

For feature selection, the (confirmed) suspicion was that the choice of features is highly important for the overall performance of blob detection. Therefore an approach was consciously chosen that is specifically designed for computer vision. In a structured fashion it selects a subset of features that exhibits both high relevancy and low redundancy. Despite this, there may be better ways to pick the features. For example, wrapper methods can be applied to recursively find an optimal subset of features using a performance evaluation at every iteration [94]. Or embedded methods can be tested that integrate directly with the learning algorithm [94]. Feature selection being a whole field of its own, a great number of methods have been published already, and with open problems such as scalability, stability and model selection [94], more papers are added every year.
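As a concrete instance of the wrapper methods mentioned above, scikit-learn's recursive feature elimination (RFE) repeatedly refits the wrapped estimator and drops the weakest features. The data below is synthetic:

```python
# Wrapper-style feature selection: RFE with a logistic regression,
# keeping the 5 strongest of 20 synthetic features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.support_))
```

Applied to the thesis' problem, the estimator inside the wrapper would be one of the evaluated pixel classifiers and `X` the per-pixel feature matrix.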

The training data came from a single image only, because there was no access to more. This may hurt the generality of the learned blob definition. Ideally, data from multiple images would be used to train the pixel classifiers. There was also no ground-truth data on the blobs to check the clustering results against. Moreover, the unsupervised silhouette score metric was used to measure clustering quality, but this may be suboptimal.
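The silhouette-based comparison of clusterings mentioned above can be sketched as follows; the toy 3D points stand in for the coordinates of classified blob pixels.

```python
# Compare candidate clustering algorithms on the same points using the
# (unsupervised) silhouette score; higher is better.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (30, 3)),     # two well-separated
               rng.normal(3.0, 0.2, (30, 3))])    # synthetic "blobs"

scores = {}
for name, algo in [("k-means", KMeans(n_clusters=2, n_init=10, random_state=0)),
                   ("agglomerative", AgglomerativeClustering(n_clusters=2))]:
    scores[name] = silhouette_score(X, algo.fit_predict(X))

for name, s in scores.items():
    print(f"{name}: silhouette={s:.2f}")
```

Because the silhouette score needs no ground truth, it is usable here, which is exactly why it was chosen despite its possible suboptimality.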

Speaking of metrics, those chosen may not provide a good assessment of the image features and algorithms. For feature selection many other metrics exist, to name a few: Fisher score, ReliefF, chi-square and F-score [94]. There are also alternative metrics for classification and clustering. For classification there are the Receiver Operating Characteristic Area Under Curve (ROC AUC), log loss and fbeta-score [116]. For measuring clustering quality there is also the Calinski-Harabasz score [117]. With so much choice it is difficult to determine the optimal metric to use.
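The alternative classification and clustering metrics listed above are all available in scikit-learn; a quick sketch on a toy prediction and a toy clustering:

```python
# Alternative evaluation metrics on toy data: ROC AUC, log loss,
# recall-weighted f2-score, and Calinski-Harabasz for clustering.
import numpy as np
from sklearn.metrics import (calinski_harabasz_score, fbeta_score,
                             log_loss, roc_auc_score)

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])      # predicted P(blob)

auc = roc_auc_score(y_true, y_prob)           # ranking quality
ll = log_loss(y_true, y_prob)                 # penalises confident mistakes
f2 = fbeta_score(y_true, (y_prob > 0.5).astype(int), beta=2)

points = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
ch = calinski_harabasz_score(points, [0, 0, 1, 1])

print(f"ROC AUC={auc:.2f}, log loss={ll:.2f}, f2={f2:.2f}, CH={ch:.1f}")
```

Any of these could replace f1 or silhouette in the experiments with only a one-line change per metric.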

In the final experiment F, the blob detector from this thesis was compared to other popular bio-image analysis tools. The reported performance of the different tools may be inaccurate because the chosen parameters could be less than optimal. The trial-and-error nature of those tools made it very costly to try every combination of parameters. Therefore the experiment should be regarded as an approximate comparison of the running times of state-of-the-art tools.

8.2 Future work

As mentioned in the delimitations, the blob detector implementation in this project must be viewed as a leap in the right direction and not as a final software product. It works reasonably well, but numerous improvements to speed, accuracy and usability are still possible.

Image processing, which is now performed by the Python-based scikit-image, could be accelerated by C-based libraries like OpenCV [118] or VIGRA [115]. Within the Python environment, general speed improvements can be achieved by code optimisations such as Numba's [119] just-in-time compilation. For feature extraction, instead of letting the CPU apply convolutional filters to the input image, GPUs, which are far more efficient at such tasks, can be used. Parallelisation over multiple cores or multiple CPUs may decrease the running time even more. Since the images are currently processed in chunks anyway, it should not be too hard to distribute them over multiple processors. A higher division of work leads to more overhead, though. So perhaps it would be interesting to compare this to the more efficient method of storing the complete image in a single memory array and processing it on a single system.
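The chunked processing that makes this parallelisation straightforward can be sketched as follows; the plain `map()` could be swapped for `concurrent.futures.ProcessPoolExecutor.map` to spread chunks over multiple cores. The per-chunk detector here is a trivial stand-in:

```python
# Split a 3D volume into fixed-size chunks, process each independently,
# and reassemble the result. map() is a drop-in point for a process pool.
import numpy as np

def iter_chunks(shape, chunk=(4, 8, 8)):
    """Yield slice tuples covering a 3D volume in chunk-sized blocks."""
    for z in range(0, shape[0], chunk[0]):
        for y in range(0, shape[1], chunk[1]):
            for x in range(0, shape[2], chunk[2]):
                yield (slice(z, z + chunk[0]),
                       slice(y, y + chunk[1]),
                       slice(x, x + chunk[2]))

def process_chunk(block):
    return block > block.mean()      # stand-in for the real per-chunk detector

volume = np.random.default_rng(0).random((8, 16, 16))
out = np.empty(volume.shape, dtype=bool)
slices = list(iter_chunks(volume.shape))
for sl, res in zip(slices, map(process_chunk, (volume[s] for s in slices))):
    out[sl] = res
print("chunks processed:", len(slices))
```

Since chunks are independent, distributing them is mostly a matter of choosing the executor; the overhead trade-off mentioned above then shows up as serialisation and scheduling cost per chunk.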

In this thesis only one machine learning approach to blob detection has been tested, but there are others that may be successful as well. Over-segmentation, upon which subsection 3.2.6 briefly touched, may be used to first split the image into superpixels and then classify each


group of pixels as being a blob or not using a machine-learned classifier. In this project there was no access to data for enabling this, but perhaps future research can evaluate the over-segmentation approach as well. For instance, white blood cell over-segmentation has been done with support vector machines in [120]. In addition, a convolutional neural network may be used for segmentation of bio-images as in [121, 122]. The advantage is that feature extraction is not necessary, because the neural network is able to detect visual patterns of neighbouring pixels using convolutions.

For images where only a small fraction of the total area contains blobs, there may be ways to avoid searching through empty regions. An idea could be to first predict for every region whether it contains blobs, perhaps using machine learning, before running the full blob detector on the region.
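Such a pre-filter only needs to be cheap, not perfect. A minimal sketch, with a simple intensity-variance test standing in for the learned region predictor and an illustrative threshold:

```python
# Cheaply score a region before running the expensive detector on it.
# The variance test and threshold are illustrative stand-ins.
import numpy as np

def region_may_contain_blobs(region, var_threshold=0.01):
    return region.var() > var_threshold

rng = np.random.default_rng(0)
empty = np.full((16, 16), 0.1) + rng.normal(0, 0.001, (16, 16))
busy = empty.copy()
busy[4:8, 4:8] += 1.0                      # a bright synthetic blob

print(region_may_contain_blobs(empty), region_may_contain_blobs(busy))  # → False True
```

Regions that fail the test are skipped entirely, so the full pipeline only runs where blobs are plausible.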

By observing the classification results, it could be noted that noisy artefacts in the images, caused by imperfections of the tissue samples, were sometimes classified as blobs. Since this leads to a lower overall precision, one could look into ways of avoiding these false positives.

As an extension to the blob detection, machine learning may be applied to the sequencing of RNA as well. With enough data of different sequences, a model can be trained to predict the RNA sequences based on the found blobs in multiple input images. In [123] the authors use graph optimisation to find the most probable RNA sequence. How this could instead be achieved with machine learning is left as an open question.

Finally, since the tool was not intended for production as-is, usability has been left out of scope. But for the software tool to be convenient in practice, several additions are required that aid the user. A graphical user interface could be included that shows the available options, provides feedback and displays the end results. Extra checks on user input could be added to improve robustness. Options for different input and output formats would be a good addition as well.


Bibliography

[1] D. Smyth, J. Bowman, and E. Meyerowitz, “Early flower development in Arabidopsis,” Plant Cell, vol. 2, no. 8, pp. 755–767, 1990.

[2] Z. Zaman, G. Fogazzi, G. Garigali, M. Croci, G. Bayer, and T. Kránicz, “Urine sediment analysis: Analytical and diagnostic performance of sediMAX® - A new automated microscopy image-based urine sediment analyser,” Clinica Chimica Acta, vol. 411, no. 3-4, pp. 147–154, 2010. doi: 10.1016/j.cca.2009.10.018.

[3] H. Krivinkova, J. Pontén, and T. Blöndal, “THE DIAGNOSIS OF CANCER FROM BODY FLUIDS: A Comparison of Cytology, DNA Measurement, Tissue Culture, Scanning and Transmission Microscopy,” Acta Pathologica Microbiologica Scandinavica Section A Pathology, vol. 84 A, no. 6, pp. 455–467, 1976. doi: 10.1111/j.1699-0463.1976.tb00143.x.

[4] E. Volpi and J. Bridger, “FISH glossary: An overview of the fluorescence in situ hybridization technique,” BioTechniques, vol. 45, no. 4, pp. 385–409, 2008. doi: 10.2144/000112811.

[5] J. Lee, E. Daugharthy, J. Scheiman, et al., “Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues,” Nature Protocols, vol. 10, no. 3, pp. 442–458, 2015. doi: 10.1038/nprot.2014.191.

[6] T. W. Nattkemper, T. Twellmann, H. Ritter, and W. Schubert, “Human vs machine: Evaluation of fluorescence micrographs,” Computers in Biology and Medicine, vol. 33, no. 1, pp. 31–43, Jan. 2003, issn: 0010-4825.


[7] P. Rämö, R. Sacher, B. Snijder, B. Begemann, and L. Pelkmans, “CellClassifier: Supervised learning of cellular phenotypes,” Bioinformatics, vol. 25, no. 22, pp. 3028–3030, 2009. doi: 10.1093/bioinformatics/btp524.

[8] M. Held, M. Schmitz, B. Fischer, et al., “CellCognition: Time-resolved phenotype annotation in high-throughput live cell imaging,” Nature Methods, vol. 7, no. 9, pp. 747–754, 2010. doi: 10.1038/nmeth.1486.

[9] D. Laksameethanasan, R. Tan, G.-L. Toh, and L.-H. Loo, “CellXpress: A fast and user-friendly software platform for profiling cellular phenotypes,” BMC Bioinformatics, vol. 14, no. SUPPL16, 2013. doi: 10.1186/1471-2105-14-S16-S4.

[10] A. Carpenter, T. Jones, M. Lamprecht, et al., “CellProfiler: Image analysis software for identifying and quantifying cell phenotypes,” Genome Biology, vol. 7, no. 10, 2006. doi: 10.1186/gb-2006-7-10-r100.

[11] F. Zanella, J. B. Lorens, and W. Link, “High content screening: Seeing is believing,” Trends in Biotechnology, vol. 28, no. 5, pp. 237–245, May 2010, issn: 1879-3096. doi: 10.1016/j.tibtech.2010.02.005.

[12] K. Carlsson, R. Lenz, and N. Åslund, “Three-dimensional microscopy using a confocal laser scanning microscope,” Optics Letters, vol. 10, no. 2, pp. 53–55, 1985. doi: 10.1364/OL.10.000053.

[13] J. Schindelin, I. Arganda-Carreras, E. Frise, et al., “Fiji: An open-source platform for biological-image analysis,” Nature Methods, vol. 9, no. 7, pp. 676–682, 2012. doi: 10.1038/nmeth.2019.

[14] Health. [Online]. Available: http://www.un.org/sustainabledevelopment/health/ (visited on 03/28/2018).

[15] A. Håkansson, “Portal of Research Methods and Methodologies for Research Projects and Degree Projects,” in DIVA, CSREA Press U.S.A, 2013, pp. 67–73. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-136960 (visited on 01/24/2018).

[16] J. Caicedo, S. Cooper, F. Heigwer, et al., “Data-analysis strategies for image-based cell profiling,” Nature Methods, vol. 14, no. 9, pp. 849–863, 2017. doi: 10.1038/nmeth.4397.


[17] B. Edris, J. A. Fletcher, R. B. West, M. van de Rijn, and A. H. Beck, Comparative Gene Expression Profiling of Benign and Malignant Lesions Reveals Candidate Therapeutic Compounds for Leiomyosarcoma, Research article, 2012. doi: 10.1155/2012/805614. [Online]. Available: https://www.hindawi.com/journals/sarcoma/2012/805614/ (visited on 02/20/2018).

[18] E. Solomon, L. Berg, and D. W. Martin, Biology, 8th edition. Belmont, CA: Brooks Cole, Jan. 2007, isbn: 978-0-495-31714-2.

[19] K. Hofmann, “Enzyme Bioinformatics,” in Enzyme Catalysis in Organic Synthesis, K. Drauz and H. Waldmann, Eds., Wiley-VCH Verlag GmbH, 2002, pp. 139–162, isbn: 978-3-527-61826-2. doi: 10.1002/9783527618262.ch5. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9783527618262.ch5/summary (visited on 02/20/2018).

[20] R. Ke, M. Mignardi, A. Pacureanu, et al., “In situ sequencing for RNA analysis in preserved tissue and cells,” Nature Methods, vol. 10, no. 9, pp. 857–860, 2013. doi: 10.1038/nmeth.2563.

[21] D. J. S. Birch, Y. Chen, and O. J. Rolinski, “Fluorescence,” in Photonics, D. L. Andrews, Ed., John Wiley & Sons, Inc., 2015, pp. 1–58, isbn: 978-1-119-01180-4. doi: 10.1002/9781119011804.ch1. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9781119011804.ch1/summary (visited on 02/20/2018).

[22] N. Battich, T. Stoeger, and L. Pelkmans, “Image-based transcriptomics in thousands of single human cells at single-molecule resolution,” Nature Methods, vol. 10, no. 11, p. 1127, Oct. 2013, issn: 1548-7105. doi: 10.1038/nmeth.2657. [Online]. Available: https://www.nature.com.focus.lib.kth.se/articles/nmeth.2657 (visited on 01/25/2018).

[23] Y. Li, S. Wang, Q. Tian, and X. Ding, “A survey of recent advances in visual feature detection,” Neurocomputing, vol. 149, pp. 736–751, Feb. 2015, issn: 0925-2312. doi: 10.1016/j.neucom.2014.08.003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231214010121 (visited on 01/22/2018).


[24] T. Lindeberg, “Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention,” International Journal of Computer Vision, vol. 11, no. 3, pp. 283–318, Dec. 1993, issn: 0920-5691, 1573-1405. doi: 10.1007/BF01469346. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/BF01469346 (visited on 01/30/2018).

[25] Blob Detection Using OpenCV (Python, C++) | Learn OpenCV. [Online]. Available: https://www.learnopencv.com/blob-detection-using-opencv-python-c/ (visited on 02/21/2018).

[26] S. Lazebnik, Blob detection, Feb. 2011. [Online]. Available: http://www.cs.unc.edu/~lazebnik/spring11/lec08_blob.pdf (visited on 02/21/2018).

[27] T. Lindeberg and J.-O. Eklundh, “Scale detection and region extraction from a scale-space primal sketch,” 1990, pp. 416–426.

[28] A. Kaspers, “Blob detection,” Image Science Institute, UMC Utrecht, Tech. Rep., 2011. (visited on 02/21/2018).

[29] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007, isbn: 978-0-495-08252-1.

[30] Template Matching — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/features_detection/plot_template.html (visited on 02/21/2018).

[31] M. Sezgin and B. Sankur, “Survey over image thresholding techniques and quantitative performance evaluation,” J. Electronic Imaging, vol. 13, pp. 146–168, Jan. 2004.

[32] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., “Scikit-image: Image processing in Python,” PeerJ, vol. 2, e453, Jun. 2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Available: https://peerj.com/articles/453 (visited on 02/21/2018).

[33] T. Lindeberg, “Feature Detection with Automatic Scale Selection,” International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.


[34] T. Lindeberg, “Scale Selection Properties of Generalized Scale-Space Interest Point Detectors,” Journal of Mathematical Imaging and Vision, vol. 46, no. 2, pp. 177–210, Jun. 2013, issn: 0924-9907, 1573-7683. doi: 10.1007/s10851-012-0378-3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10851-012-0378-3 (visited on 03/22/2018).

[35] J. D. Hunter, “Matplotlib: A 2d Graphics Environment,” Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007, issn: 1521-9615. doi: 10.1109/MCSE.2007.55. [Online]. Available: http://ieeexplore.ieee.org/document/4160265/ (visited on 02/21/2018).

[36] C. Sommer, C. Straehle, U. Kothe, and F. Hamprecht, “Ilastik: Interactive learning and segmentation toolkit,” 2011, pp. 230–233. doi: 10.1109/ISBI.2011.5872394.

[37] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, Sep. 2004, issn: 0920-5691, 1573-1405. doi: 10.1023/B:VISI.0000022288.19776.77. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/B:VISI.0000022288.19776.77 (visited on 01/30/2018).

[38] A. Vedaldi and S. Soatto, “Quick Shift and Kernel Methods for Mode Seeking,” in Computer Vision – ECCV 2008, ser. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, Oct. 2008, pp. 705–718, isbn: 978-3-540-88692-1 978-3-540-88693-8. doi: 10.1007/978-3-540-88693-8_52. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-540-88693-8_52 (visited on 01/30/2018).

[39] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94.

[40] W. S. Qureshi, A. Payne, K. B. Walsh, R. Linker, O. Cohen, and M. N. Dailey, “Machine vision for counting fruit on mango tree canopies,” Precision Agriculture, vol. 18, no. 2, pp. 224–244, Apr. 2017, issn: 1385-2256, 1573-1618. doi: 10.1007/s11119-016-9458-5. [Online]. Available: https://link.springer.com/article/10.1007/s11119-016-9458-5 (visited on 01/22/2018).


[41] Comparison of segmentation and superpixel algorithms — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/segmentation/plot_segmentations.html (visited on 02/21/2018).

[42] A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, Jul. 1959, issn: 0018-8646. doi: 10.1147/rd.33.0210.

[43] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2010, isbn: 978-0-13-604259-4.

[44] P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data: Recent Advances in Clustering, 2006, pp. 25–71. doi: 10.1007/3-540-28349-8_2.

[45] S. T. Roweis and L. K. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000, issn: 0036-8075, 1095-9203. doi: 10.1126/science.290.5500.2323. [Online]. Available: http://science.sciencemag.org/content/290/5500/2323 (visited on 02/23/2018).

[46] T. Menzies, “Data Mining,” in Recommendation Systems in Software Engineering, Springer, Berlin, Heidelberg, 2014, pp. 39–75, isbn: 978-3-642-45134-8 978-3-642-45135-5. doi: 10.1007/978-3-642-45135-5_3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-642-45135-5_3 (visited on 02/26/2018).

[47] P. Domingos and M. Pazzani, “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss,” Machine Learning, vol. 29, no. 2-3, pp. 103–130, Nov. 1997, issn: 0885-6125, 1573-0565. doi: 10.1023/A:1007413511361. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/A:1007413511361 (visited on 02/26/2018).

[48] X. Wu, V. Kumar, J. R. Quinlan, et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Jan. 2008, issn: 0219-1377, 0219-3116. doi: 10.1007/s10115-007-0114-2. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10115-007-0114-2 (visited on 02/28/2018).


[49] M. M. Deza and E. Deza, Encyclopedia of Distances, 4th ed. Berlin Heidelberg: Springer-Verlag, 2016, isbn: 978-3-662-52843-3. [Online]. Available: //www.springer.com/la/book/9783662528433 (visited on 03/22/2018).

[50] P. Hart, “The condensed nearest neighbor rule (Corresp.),” IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 515–516, May 1968, issn: 0018-9448. doi: 10.1109/TIT.1968.1054155.

[51] D. L. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, no. 3, pp. 408–421, Jul. 1972, issn: 0018-9472. doi: 10.1109/TSMC.1972.4309137.

[52] V. Podgorelec and M. Zorman, “Decision trees,” in Computational Complexity: Theory, Techniques, and Applications, 2012, pp. 827–845. doi: 10.1007/978-1-4614-1800-9_53.

[53] T. K. Ho, “A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors,” Pattern Analysis & Applications, vol. 5, no. 2, pp. 102–112, Jun. 2002, issn: 1433-7541. doi: 10.1007/s100440200009. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s100440200009 (visited on 02/28/2018).

[54] Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997, issn: 0022-0000. doi: 10.1006/jcss.1997.1504. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S002200009791504X (visited on 02/28/2018).

[55] D. Simovici, “Intelligent Data Analysis Techniques—Machine Learning and Data Mining,” in Artificial Intelligent Approaches in Petroleum Geosciences, Springer, Cham, 2015, pp. 1–51, isbn: 978-3-319-16530-1 978-3-319-16531-8. doi: 10.1007/978-3-319-16531-8_1. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-319-16531-8_1 (visited on 02/27/2018).

[56] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.


[57] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” arXiv:1212.5701 [cs], Dec. 2012. [Online]. Available: http://arxiv.org/abs/1212.5701 (visited on 04/11/2018).

[58] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs], Dec. 2014. [Online]. Available: http://arxiv.org/abs/1412.6980 (visited on 04/11/2018).

[59] D. Mysid, A simplified view of an artificial neural network, Nov. 2006. [Online]. Available: https://commons.wikimedia.org/w/index.php?curid=1412126 (visited on 04/11/2018).

[60] S. Firdaus and M. A. Uddin, “A Survey on Clustering Algorithms and Complexity Analysis,” International Journal of Computer Science, vol. 12, no. 2, pp. 62–85, Mar. 2015, issn: 1694-0814. [Online]. Available: https://www.ijcsi.org/papers/IJCSI-12-2-62-85.pdf (visited on 03/01/2018).

[61] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002, issn: 0162-8828. doi: 10.1109/34.1000236.

[62] Sklearn.cluster.MeanShift — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift (visited on 03/05/2018).

[63] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems, MIT Press, 2001, pp. 849–856.

[64] U. V. Luxburg, A Tutorial on Spectral Clustering. 2007.

[65] B. J. Frey and D. Dueck, “Clustering by Passing Messages Between Data Points,” Science, vol. 315, no. 5814, pp. 972–976, Feb. 2007, issn: 0036-8075, 1095-9203. doi: 10.1126/science.1136800. [Online]. Available: http://science.sciencemag.org/content/315/5814/972 (visited on 03/02/2018).


[66] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, ser. KDD’96, Portland, Oregon: AAAI Press, 1996, pp. 226–231. [Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507 (visited on 03/02/2018).

[67] 2.3. Clustering — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/clustering.html#birch (visited on 03/02/2018).

[68] P. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, no. C, pp. 53–65, 1987. doi: 10.1016/0377-0427(87)90125-7.

[69] Sklearn.metrics.silhouette_score — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score (visited on 03/05/2018).

[70] H. Abdi and L. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, Jul. 2010, issn: 1939-5108. doi: 10.1002/wics.101. [Online]. Available: http://onlinelibrary.wiley.com/doi/abs/10.1002/wics.101 (visited on 04/03/2018).

[71] S. Ng, “Principal component analysis to reduce dimension on digital image,” Procedia Computer Science, vol. 111, 2017, pp. 113–119. doi: 10.1016/j.procs.2017.06.017.

[72] K. T. M. Han and B. Uyyanonvara, “A Survey of Blob Detection Algorithms for Biomedical Images,” in 2016 7th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES), Mar. 2016, pp. 57–60. doi: 10.1109/ICTEmSys.2016.7467122.

[73] K. Yamamoto, Y. Yoshioka, and S. Ninomiya, “Detection and counting of intact tomato fruits on tree using image analysis and machine learning methods,” 2013, pp. 664–667.


[74] N. J. B. McFarlane and C. P. Schofield, “Segmentation and tracking of piglets in images,” Machine Vision and Applications, vol. 8, no. 3, pp. 187–193, May 1995, issn: 0932-8092, 1432-1769. doi: 10.1007/BF01215814. [Online]. Available: https://link.springer.com/article/10.1007/BF01215814 (visited on 01/23/2018).

[75] J. Zavadil, J. Tuma, and V. Santos, “Traffic signs detection using blob analysis and pattern recognition,” 2012, pp. 776–779. doi: 10.1109/CarpathianCC.2012.6228752.

[76] L. Minor and J. Sklansky, “The Detection and Segmentation of Blobs in Infrared Images,” IEEE Transactions on Systems, Man and Cybernetics, vol. 11, no. 3, pp. 194–201, 1981. doi: 10.1109/TSMC.1981.4308652.

[77] W. Moon, Y.-W. Shen, M. Bae, C.-S. Huang, J.-H. Chen, and R.-F. Chang, “Computer-aided tumor detection based on multi-scale blob detection algorithm in automated breast ultrasound images,” IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1191–1200, 2013. doi: 10.1109/TMI.2012.2230403.

[78] C. A. Schneider, W. S. Rasband, and K. W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis, Comments and Opinion, Jun. 2012. doi: 10.1038/nmeth.2089. [Online]. Available: http://www.nature.com/articles/nmeth.2089 (visited on 05/15/2018).

[79] J. Lee, E. Daugharthy, J. Scheiman, et al., “Highly multiplexed subcellular RNA sequencing in situ,” Science, vol. 343, no. 6177, pp. 1360–1363, 2014. doi: 10.1126/science.1250212.

[80] T. Stoeger, N. Battich, M. Herrmann, Y. Yakimovich, and L. Pelkmans, “Computer vision for image-based transcriptomics,” Methods, vol. 85, pp. 44–53, 2015. doi: 10.1016/j.ymeth.2015.05.016.

[81] O. Z. Kraus and B. J. Frey, “Computer vision for high content screening,” Critical Reviews in Biochemistry and Molecular Biology, vol. 51, no. 2, pp. 102–109, Mar. 2016, issn: 1040-9238. doi: 10.3109/10409238.2015.1135868. [Online]. Available: https://doi.org/10.3109/10409238.2015.1135868 (visited on 02/01/2018).

Page 101: Machine learning for blob detection in high-resolution 3D

90 BIBLIOGRAPHY

[82] A. Shariff, J. Kangas, L. P. Coelho, S. Quinn, and R. F. Murphy,“Automated image analysis for high-content screening and anal-ysis,” eng, Journal of Biomolecular Screening, vol. 15, no. 7, pp. 726–734, Aug. 2010, issn: 1552-454X. doi: 10.1177/1087057110370894.

[83] C. Sommer and D. Gerlich, “Machine learning in cell biology-teaching computers to recognize phenotypes,” Journal of CellScience, vol. 126, no. 24, pp. 5529–5539, 2013. doi: 10.1242/jcs.123604.

[84] B. T. Grys, D. S. Lo, N. Sahin, et al., “Machine learning and com-puter vision approaches for phenotypic profiling,” en, J Cell Biol,vol. 216, no. 1, pp. 65–71, Jan. 2017, issn: 0021-9525, 1540-8140.doi: 10.1083/jcb.201610026. [Online]. Available: http://jcb.rupress.org/content/216/1/65 (visited on 01/23/2018).

[85] M. Wang, X. Zhou, F. Li, J. Huckins, R. King, and S. Wong, “Novelcell segmentation and online SVM for cell cycle phase identifi-cation in automated microscopy,” Bioinformatics, vol. 24, no. 1,pp. 94–101, 2008. doi: 10.1093/bioinformatics/btm530.

[86] K. Vermeer, d. S. van der, H. Lemij, and B. de, “Automated seg-mentation by pixel classification of retinal layers in ophthalmicOCT images,” Biomedical Optics Express, vol. 2, no. 6, pp. 1743–1756, 2011.

[87] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu, “Methodsfor nuclei detection, segmentation, and classification in digitalhistopathology: A review-current status and future potential,”IEEE Reviews in Biomedical Engineering, vol. 7, pp. 97–114, 2014.doi: 10.1109/RBME.2013.2295804.

[88] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead,I. A. Cree, and N. M. Rajpoot, “Locality Sensitive Deep Learn-ing for Detection and Classification of Nuclei in Routine ColonCancer Histology Images,” IEEE Transactions on Medical Imag-ing, vol. 35, no. 5, pp. 1196–1206, May 2016, issn: 0278-0062, 1558-254X. doi: 10 . 1109 / TMI . 2016 . 2525803. [Online]. Available:http://ieeexplore.ieee.org/document/7399414/ (visited on05/03/2018).

[89] S. Niu and K. Ren, “Neural cell image segmentation methodbased on support vector machine,” vol. 9675, 2015. doi: 10.1117/12.2205114.

Page 102: Machine learning for blob detection in high-resolution 3D

BIBLIOGRAPHY 91

[90] N. Hatipoglu and G. Bilgin, “Cell segmentation in histopatho-logical images with deep learning algorithms by utilizing spa-tial relationships,” Medical and Biological Engineering and Com-puting, vol. 55, no. 10, pp. 1829–1848, 2017. doi: 10.1007/s11517-017-1630-1.

[91] F. Piccinini, T. Balassa, A. Szkalisity, et al., “Advanced Cell Clas-sifier: User-Friendly Machine-Learning-Based Software for Dis-covering Phenotypes in High-Content Imaging Data,” eng, CellSystems, vol. 4, no. 6, 651–655.e5, Jun. 2017, issn: 2405-4712. doi:10.1016/j.cels.2017.05.012.

[92] J. Bins and B. A. Draper, “Feature selection from huge featuresets,” in Proceedings Eighth IEEE International Conference on Com-puter Vision. ICCV 2001, vol. 2, 2001, 159–165 vol.2. doi: 10.1109/ICCV.2001.937619.

[93] K. Kira and L. A. Rendell, “The Feature Selection Problem: Tra-ditional Methods and a New Algorithm,” in Proceedings of theTenth National Conference on Artificial Intelligence, ser. AAAI’92,San Jose, California: AAAI Press, 1992, pp. 129–134, isbn: 978-0-262-51063-9. [Online]. Available: http://dl.acm.org/citation.cfm?id=1867135.1867155 (visited on 03/27/2018).

[94] J. Li, K. Cheng, S. Wang, et al., “Feature Selection: A Data Per-spective,” arXiv:1601.07996 [cs], Jan. 2016, arXiv: 1601.07996. [On-line]. Available: http://arxiv.org/abs/1601.07996 (visitedon 03/27/2018).

[95] K. Yeager, LibGuides: SPSS Tutorials: Chi-Square Test of Indepen-dence, en. [Online]. Available: https://libguides.library.kent.edu/SPSS/ChiSquare (visited on 04/09/2018).

[96] ——, LibGuides: SPSS Tutorials: Pearson Correlation, en. [Online].Available: https : / / libguides . library . kent . edu / SPSS /PearsonCorr (visited on 04/09/2018).

[97] R. Caruana and D. Freitag, “Greedy Attribute Selection,” in InProceedings of the Eleventh International Conference on Machine Learn-ing, Morgan Kaufmann, 1994, pp. 28–36.

[98] F. Chollet et al., Keras. 2015. [Online]. Available: https://keras.io.

Page 103: Machine learning for blob detection in high-resolution 3D

92 BIBLIOGRAPHY

[99] M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-ScaleMachine Learning on Heterogeneous Systems. 2015. [Online]. Avail-able: https://www.tensorflow.org/.

[100] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn:Machine Learning in Python,” Journal of Machine Learning Re-search, vol. 12, no. Oct, pp. 2825–2830, 2011, issn: ISSN 1533-7928. [Online]. Available: http : / / jmlr . org / papers / v12 /pedregosa11a.html (visited on 03/27/2018).

[101] T. Schaul, I. Antonoglou, and D. Silver, “Unit Tests for StochasticOptimization,” arXiv:1312.6055 [cs], Dec. 2013, arXiv: 1312.6055.[Online]. Available: http://arxiv.org/abs/1312.6055 (visitedon 04/11/2018).

[102] I. G. Goldberg, C. Allan, J.-M. Burel, et al., “The Open MicroscopyEnvironment (OME) Data Model and XML file: Open tools forinformatics and quantitative analysis in biological imaging,” eng,Genome Biology, vol. 6, no. 5, R47, 2005, issn: 1474-760X. doi: 10.1186/gb-2005-6-5-r47.

[103] Sklearn.cluster.KMeans — scikit-learn 0.19.1 documentation. [On-line]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (visited on 03/13/2018).

[104] L. Kamentsky, Python-javabridge: Python wrapper for the Java Na-tive Interface, original-date: 2014-03-05T16:10:38Z, May 2018. [On-line]. Available: https://github.com/LeeKamentsky/python-javabridge (visited on 05/15/2018).

[105] NumPy — NumPy. [Online]. Available: http://www.numpy.org/(visited on 03/27/2018).

[106] A. Hagberg, P. Swart, and D. S Chult, “Exploring Network Struc-ture, Dynamics, and Function Using NetworkX,” in Proceedingsof the 7th Python in Science Conference, Jan. 2008.

[107] W. McKinney, “Data Structures for Statistical Computing in Python,”in Proceedings of the 9th Python in Science Conference, S. v. d. Waltand J. Millman, Eds., 2010, pp. 51–56.

[108] Python-bioformats: Read and write life sciences file formats, original-date: 2014-03-05T16:23:41Z, Apr. 2018. [Online]. Available: https://github.com/CellProfiler/python-bioformats (visited on05/15/2018).

Page 104: Machine learning for blob detection in high-resolution 3D

BIBLIOGRAPHY 93

[109] Riverbank | Software | PyQt | What is PyQt? [Online]. Avail-able: https://www.riverbankcomputing.com/software/pyqt/intro (visited on 05/15/2018).

[110] S. v. d. Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., “Scikit-image: Image processing in Python,” en, PeerJ, vol. 2, e453, Jun.2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Avail-able: https://peerj.com/articles/453 (visited on 03/27/2018).

[111] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source sci-entific tools for Python. 2001. [Online]. Available: http://www.scipy.org/.

[112] S. Silvester, Tifffile: Read and write image data from and to TIFFfiles. [Online]. Available: https : / / github . com / blink1073 /tifffile (visited on 03/27/2018).

[113] Pyyaml: Canonical source repository for PyYAML, original-date:2011-11-03T05:09:49Z, May 2018. [Online]. Available: https://github.com/yaml/pyyaml (visited on 05/15/2018).

[114] D. M. Van and G. Hinton, “Visualizing data using t-SNE,” Jour-nal of Machine Learning Research, vol. 9, pp. 2579–2625, 2008.

[115] U. Köthe, Generische Programmierung für die Bildverarbeitung, Deutsch.Hamburg: Books on Demand, Sep. 2000, isbn: 978-3-8311-0239-6.

[116] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to In-formation Retrieval. New York, NY, USA: Cambridge UniversityPress, 2008, isbn: 978-0-521-86571-5.

[117] M. Kozak, ““A Dendrite Method for Cluster Analysis” by Cal-iński and Harabasz: A Classical Work that is Far Too Often In-correctly Cited,” Communications in Statistics - Theory and Meth-ods, vol. 41, no. 12, pp. 2279–2280, Jun. 2012, issn: 0361-0926. doi:10.1080/03610926.2011.560741. [Online]. Available: https:/ / doi . org / 10 . 1080 / 03610926 . 2011 . 560741 (visited on05/16/2018).

[118] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of SoftwareTools, 2000.

[119] Numba — Numba. [Online]. Available: https://numba.pydata.org/ (visited on 05/16/2018).

Page 105: Machine learning for blob detection in high-resolution 3D

94 BIBLIOGRAPHY

[120] X. Zheng, Y. Wang, and G. Wang, “White blood cell segmen-tation using expectation-maximization and automatic supportvector machine learning,” Shuju Caiji Yu Chuli/Journal of Data Ac-quisition and Processing, vol. 28, no. 5, pp. 614–619, 2013.

[121] D. Cireşan, A. Giusti, L. Gambardella, and J. Schmidhuber, “Deepneural networks segment neuronal membranes in electron mi-croscopy images,” vol. 4, 2012, pp. 2843–2851.

[122] P. Moeskops, M. Viergever, A. Mendrik, V. De, M. Benders, andI. Isgum, “Automatic Segmentation of MR Brain Images witha Convolutional Neural Network,” IEEE Transactions on MedicalImaging, vol. 35, no. 5, pp. 1252–1261, 2016. doi: 10.1109/TMI.2016.2548501.

[123] G. Partel, G. Milli, and C. Wählby, “Improving Recall of In SituSequencing by Self-Learned Features and a Graphical Model,”arXiv:1802.08894 [cs, q-bio], Feb. 2018, arXiv: 1802.08894. [On-line]. Available: http://arxiv.org/abs/1802.08894 (visitedon 05/16/2018).

Page 106: Machine learning for blob detection in high-resolution 3D

Appendix A

Experiment F software configurations

A.1 Crops

The coordinates of the crops that are processed by the four blob detection programs can be found in Table A.1.

A.2 MFB detector

The proposed blob detector in this thesis follows the blob detection process described in Section 6.1. The top 10 features from Table 7.1 are extracted for each input image. The features are not compressed. A decision tree classifier is used to classify the pixels based on these features. Then an agglomerative clustering algorithm relying on inter-centroid distance groups the blob pixels into blobs. Finally, blobs smaller than 4 pixels are filtered out and the blob centroids are calculated.
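The four steps above can be sketched in Python. This is only an illustration under simplifying assumptions, not the thesis implementation: the ten features are replaced by a single raw-intensity feature, the training labels are synthetic, and the clustering distance threshold is an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Toy 3D image: dark background with two bright blobs.
image = rng.normal(10.0, 2.0, size=(4, 32, 32))
image[1:3, 5:9, 5:9] += 50
image[1:3, 20:24, 20:24] += 50

# 1. Per-pixel features (intensity only here; the thesis uses 10 features).
features = image.reshape(-1, 1)

# 2. Pixel classification with a decision tree (toy labels for training).
train_labels = (features[:, 0] > 30).astype(int)
clf = DecisionTreeClassifier(max_depth=2).fit(features, train_labels)
is_blob = clf.predict(features).reshape(image.shape)

# 3. Group blob pixels into blobs with agglomerative clustering on
#    their (z, y, x) coordinates, merging by inter-point distance.
coords = np.argwhere(is_blob == 1)
blob_ids = AgglomerativeClustering(
    n_clusters=None, distance_threshold=10.0, linkage="average"
).fit_predict(coords)

# 4. Discard blobs smaller than 4 pixels and compute blob centroids.
centroids = [coords[blob_ids == b].mean(axis=0)
             for b in np.unique(blob_ids)
             if (blob_ids == b).sum() >= 4]
print(len(centroids))  # 2
```

Note that scikit-learn's agglomerative clustering merges on pairwise point distance rather than strict inter-centroid distance, so it only approximates the linkage criterion used in the thesis.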


x       y
10462   6331
11060   5698
12158   4521
15579   4741
17481   13492
20019   13236
21200   7883
3729    2072
6315    8851
8038    12185

Table A.1: Locations in pixels of the random crops of the test image. All crops are of size 500 × 500 × 16 pixels.
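Extracting such fixed-size crops from a 3D volume amounts to a single array slice. The sketch below uses a small synthetic volume and illustrative coordinates rather than the actual test image; the axis order, volume shape and helper name are assumptions.

```python
import numpy as np

def extract_crop(volume, x, y, crop_xy=500, crop_z=16):
    """Return a crop of shape (crop_z, crop_xy, crop_xy) whose top-left
    corner is at (x, y). `volume` is assumed to be indexed as (z, y, x)."""
    return volume[:crop_z, y:y + crop_xy, x:x + crop_xy]

# Small synthetic volume so the sketch runs quickly end-to-end.
rng = np.random.default_rng(0)
volume = rng.integers(0, 256, size=(16, 700, 700), dtype=np.uint16)

crop = extract_crop(volume, x=120, y=80)
print(crop.shape)  # (16, 500, 500)
```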

A.3 FIJI

ImageJ macro, executed with FIJI (ImageJ 1.51s) for each input image with Process -> Batch -> Macro...:

name = getTitle;
run("Smooth (3D)", "method=Gaussian sigma=1.000 use");
run("3D Fast Filters", "filter=TopHat radius_x_pix=2.0 radius_y_pix=2.0 radius_z_pix=1.0 Nb_cpus=8");
run("Make Binary", "method=MaxEntropy background=Default");
run("3D Fill Holes");
run("3D Maxima Finder", "radiusxy=1.50 radiusz=0.5 noise=100");
run("3D Watershed Split", "binary=3D_TopHat seeds=peaks radius=1");
run("3D object counter...", "threshold=100 slice=8 min.=4 max.=4000000 statistics");
filename = name + "_blobs.csv";
saveAs("Results", "D:\\Single\\my-first-blobs\\analysis\\F\\fiji\\" + filename);


A.4 CellProfiler

CellProfiler pipeline, executed with CellProfiler 3.0.0 for each input image:

CellProfiler Pipeline: http://www.cellprofiler.org
Version:3
DateRevision:300
GitHash:
ModuleCount:13
HasImagePlaneDetails:False

Images:[module_num:1|svn_version:\'Unknown\'|variable_revision_number:2|show_window:False|notes:\x5B\'To begin creating your project, use the Images module to compile a list of files and/or folders that you want to analyze. You can also specify a set of rules to include only the desired files in your selected folders.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Filter images?:Images only
    Select the rule criteria:and (extension does isimage) (directory doesnot containregexp "\x5B\\\\\\\\\\\\\\\\/\x5D\\\\\\\\.")

Metadata:[module_num:2|svn_version:\'Unknown\'|variable_revision_number:4|show_window:False|notes:\x5B\'The Metadata module optionally allows you to extract information describing your images (i.e, metadata) which will be stored along with your measurements. This information can be contained in the file name and/or location, or in an external file.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Extract metadata?:Yes
    Metadata data type:Text
    Metadata types:{}
    Extraction method count:1
    Metadata extraction method:Extract from file/folder names
    Metadata source:File name
    Regular expression to extract from file name:(?P<filename>.*)
    Regular expression to extract from folder name:(?P<Date>\x5B0-9\x5D{4}_\x5B0-9\x5D{2}_\x5B0-9\x5D{2})$
    Extract metadata from:All images
    Select the filtering criteria:and (file does contain "")
    Metadata file location:
    Match file and image metadata:\x5B\x5D
    Use case insensitive matching?:No

NamesAndTypes:[module_num:3|svn_version:\'Unknown\'|variable_revision_number:8|show_window:False|notes:\x5B\'The NamesAndTypes module allows you to assign a meaningful name to each image by which other modules will refer to it.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Assign a name to:All images
    Select the image type:Grayscale image
    Name to assign these images:input
    Match metadata:\x5B\x5D
    Image set matching method:Order
    Set intensity range from:Image metadata
    Assignments count:1
    Single images count:0
    Maximum intensity:255.0
    Process as 3D?:Yes
    Relative pixel spacing in X:1.0
    Relative pixel spacing in Y:1.0
    Relative pixel spacing in Z:3.7
    Select the rule criteria:and (file does contain "")
    Name to assign these images:DNA
    Name to assign these objects:Cell
    Select the image type:Grayscale image
    Set intensity range from:Image metadata
    Maximum intensity:255.0

Groups:[module_num:4|svn_version:\'Unknown\'|variable_revision_number:2|show_window:False|notes:\x5B\'The Groups module optionally allows you to split your list of images into image subsets (groups) which will be processed independently of each other. Examples of groupings include screening batches, microtiter plates, time-lapse movies, etc.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Do you want to group your images?:No
    grouping metadata count:1
    Metadata category:None

GaussianFilter:[module_num:5|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:input
    Name the output image:gaussian_filtered
    Sigma:0.3

EnhanceOrSuppressFeatures:[module_num:6|svn_version:\'Unknown\'|variable_revision_number:6|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:gaussian_filtered
    Name the output image:enhanced
    Select the operation:Enhance
    Feature size:12
    Feature type:Speckles
    Range of hole sizes:1,10
    Smoothing scale:2.0
    Shear angle:0.0
    Decay:0.95
    Enhancement method:Tubeness
    Speed and accuracy:Fast

Threshold:[module_num:7|svn_version:\'Unknown\'|variable_revision_number:10|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:enhanced
    Name the output image:thresholded
    Threshold strategy:Global
    Thresholding method:Manual
    Threshold smoothing scale:0.0
    Threshold correction factor:1.0
    Lower and upper bounds on threshold:0.0,1.0
    Manual threshold:0.10
    Select the measurement to threshold with:None
    Two-class or three-class thresholding?:Two classes
    Assign pixels in the middle intensity class to the foreground or the background?:Foreground
    Size of adaptive window:50
    Lower outlier fraction:0.05
    Upper outlier fraction:0.05
    Averaging method:Mean
    Variance method:Standard deviation
    # of deviations:2.0
    Thresholding method:Otsu

RemoveHoles:[module_num:8|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:thresholded
    Name the output image:removed_holes
    Size:1.0

Watershed:[module_num:9|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:removed_holes
    Name the output object:watershed
    Generate from:Distance
    Markers:None
    Mask:Leave blank
    Connectivity:8
    Downsample:1

MeasureObjectSizeShape:[module_num:10|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select objects to measure:watershed
    Calculate the Zernike features?:No

FilterObjects:[module_num:11|svn_version:\'Unknown\'|variable_revision_number:8|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the objects to filter:watershed
    Name the output objects:filtered_objects
    Select the filtering mode:Measurements
    Select the filtering method:Limits
    Select the objects that contain the filtered objects:None
    Select the location of the rules or classifier file:Elsewhere...\x7C
    Rules or classifier file name:rules.txt
    Class number:1
    Measurement count:1
    Additional object count:0
    Assign overlapping child to:Both parents
    Select the measurement to filter by:AreaShape_Area
    Filter using a minimum measurement value?:Yes
    Minimum value:4
    Filter using a maximum measurement value?:Yes
    Maximum value:1000

MeasureObjectSizeShape:[module_num:12|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select objects to measure:filtered_objects
    Calculate the Zernike features?:No

ExportToSpreadsheet:[module_num:13|svn_version:\'Unknown\'|variable_revision_number:12|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the column delimiter:Comma (",")
    Add image metadata columns to your object data file?:Yes
    Select the measurements to export:No
    Calculate the per-image mean values for object measurements?:No
    Calculate the per-image median values for object measurements?:No
    Calculate the per-image standard deviation values for object measurements?:No
    Output file location:Elsewhere...\x7CD\x3A\\\\\\\\Single\\\\\\\\my-first-blobs\\\\\\\\analysis\\\\\\\\F\\\\\\\\cellprofiler
    Create a GenePattern GCT file?:No
    Select source of sample row name:Metadata
    Select the image to use as the identifier:None
    Select the metadata to use as the identifier:None
    Export all measurement types?:No
    Press button to select measurements:filtered_objects\x7CAreaShape_Area,filtered_objects\x7CAreaShape_MeanRadius,Image\x7CCount_filtered_objects,Image\x7CExecutionTime_01Images,Image\x7CExecutionTime_04Groups,Image\x7CExecutionTime_02Metadata,Image\x7CExecutionTime_11FilterObjects,Image\x7CExecutionTime_03NamesAndTypes,Image\x7CExecutionTime_07Threshold,Image\x7CExecutionTime_08RemoveHoles,Image\x7CExecutionTime_05GaussianFilter,Image\x7CExecutionTime_09Watershed,Image\x7CExecutionTime_06EnhanceOrSuppressFeatures,Image\x7CExecutionTime_10MeasureObjectSizeShape,Image\x7CFileName_input,Experiment\x7CModification_Timestamp,Experiment\x7CRun_Timestamp
    Representation of Nan/Inf:NaN
    Add a prefix to file names?:No
    Filename prefix:MyExpt_
    Overwrite existing files without warning?:Yes
    Data to export:filtered_objects
    Combine these object measurements with those of the previous object?:No
    File name:blobs.csv
    Use the object name for the file name?:No

Figure A.1: List of input images.

A.5 Ilastik

The parameters of the Ilastik Pixel Classification + Object Classification project, executed with Ilastik 1.3.0, can be found in Figures A.1, A.2, A.3, A.4, A.5 and A.6.


Figure A.2: Selected pixel features, based on the best features found in Table 7.1.

Figure A.3: Labels in the Training step. Label 1 denotes non-blob pixels and Label 2 denotes blob pixels. For each label, around 50 example pixels were indicated.


Figure A.4: Parameters of the Thresholding step.

Figure A.5: Parameters of the Object Feature Selection step. Only the size feature was selected.


Figure A.6: Labels in the Object Classification step. The labels are irrelevant because object classification is not part of blob detection. However, two labels were needed in order to export the blobs.


TRITA-EECS-EX-2018:125

ISSN 1653-5146

www.kth.se