Future Generation Computer Systems 43–44 (2015) 61–73


Accessing medical image file with co-allocation HDFS in cloud✩

Chao-Tung Yang a,∗, Wen-Chung Shih b, Lung-Teng Chen a, Cheng-Ta Kuo a, Fuu-Cheng Jiang a, Fang-Yie Leu a

a Department of Computer Science, Tunghai University, Taichung, 40704, Taiwan ROC
b Department of Applied Informatics and Multimedia, Asia University, Taichung, 41354, Taiwan ROC

Highlights

• The motivation of this paper is to attempt to resolve the problems of storing and sharing electronic medical records and medical images between different hospitals.

• Specifically, this study develops a Medical Image File Accessing System (MIFAS) based on HDFS of Hadoop in cloud.

• The proposed system can improve medical imaging storage, transmission stability, and reliability while providing an easy-to-operate management interface.

• This paper focuses on the cloud storage virtualization technology to achieve high-availability services.

• The experimental results show that the high reliability data storage clustering and fault tolerance capabilities can be achieved.

Article info

Article history:
Received 31 December 2013
Received in revised form 31 July 2014
Accepted 15 August 2014
Available online 28 September 2014

Keywords: EMR; PACS; Hadoop; HDFS; Co-allocation; Cloud computing

Abstract

Patient privacy has recently become a most important issue for the World Health Organization (WHO) and in the United States and Europe. However, inter-hospital medical information is currently shared using paper-based operations, and the complete and immediate exchange of electronic medical records to avoid duplicate prescriptions or procedures is an important research issue. An electronic medical record (EMR) is a computerized medical record created by a care-giving organization, such as a hospital or doctor’s surgery. Using electronic medical records can improve patient privacy and health care efficiency. Although there are many advantages to electronic medical records, the problem of exchanging and sharing medical images remains to be solved. The motivation of this paper is to resolve the problems of storing and sharing electronic medical records and medical images between different hospitals. Cloud computing is enabled by existing parallel and distributed technology, and provides computing, storage, and software services to users. Specifically, this study develops a Medical Image File Accessing System (MIFAS) based on the HDFS of Hadoop in the cloud. The proposed system can improve medical imaging storage, transmission stability, and reliability while providing an easy-to-operate management interface. This paper focuses on cloud storage virtualization technology to achieve high-availability services. We have designed and implemented a medical imaging system with a distributed file system. The experimental results show that high-reliability data storage clustering and fault-tolerance capabilities can be achieved.

© 2014 Elsevier B.V. All rights reserved.

✩ This work is sponsored by Tunghai University under the U-Care ICT Integration Platform for the Elderly, No. 103GREEnS004-2, Aug. 2014. This work was supported in part by the Ministry of Science and Technology, Taiwan ROC, under grant numbers MOST 101-2218-E-029-004 and MOST 102-2218-E-029-002.
∗ Corresponding author.

E-mail addresses: [email protected], [email protected] (C.-T. Yang), [email protected] (W.-C. Shih), [email protected] (L.-T. Chen), [email protected] (C.-T. Kuo), [email protected] (F.-C. Jiang), [email protected] (F.-Y. Leu).

http://dx.doi.org/10.1016/j.future.2014.08.008
0167-739X/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Medical records are important documents that store patient health care data and information, and every medical institution whose clinical teams deliver care must produce and maintain them. Although most hospitals have established computerized medical record systems, they still keep written records stored on paper. This practice creates many problems, including the cost of space management, time-consuming human access and transmission, difficulty in backing up data, and increasing paper costs and waste.


An Electronic Medical Records (EMR) system refers to a paperless, digital, and computerized system of maintaining patient data. An EMR system is designed to increase efficiency and reduce documentation errors by streamlining the process. Implementing an EMR system is a complex, expensive investment that has created a demand for healthcare IT professionals and accounts for a growing segment of the healthcare workforce [1]. According to the Medical Records Institute, EMR systems include Automated Medical Records, Computerized Medical Records, Electronic Medical Records, Electronic Patient Records, and Electronic Health Records [1,2].

Using EMR can protect patients’ privacy and reduce paper usage and records-management problems. This helps improve the efficiency and financial management of medical institutions, integrates their resources, provides mutual support for management and decision-making systems, and ensures the cross-organizational sharing of resources. As a result, EMR is becoming increasingly popular.

It is worth noting that while the storage cost per terabyte is declining, the overall cost to manage storage is growing. The common misperception is that storage is inexpensive, but the reality is that storage volumes continue to grow faster than hardware prices decline. Cloud computing promises reduced costs, high scalability, availability, and disaster recoverability, and may solve the long-standing problem of archiving medical images [3,1,4–6].

Cloud computing is built on large-scale parallel computing clusters that now grow to thousands of processors [7,8,3,6,9–21]. With such a large number of compute nodes, faults become commonplace. Current virtualization fault-tolerance plans focus on failure recovery and usually rely on a checkpoint/restart mechanism. However, in today’s systems, node failures can often be predicted by detecting the deterioration of health conditions [22–33].

The need for medical image retrieval and replication is increasing, as hospitals must handle a growing number of medical photographs and images. However, the increase in high-quality imaging devices is currently outpacing the related infrastructure. Current picture archiving and communication systems (PACS) are unable to provide efficient query response services. It is difficult to sustain huge numbers of queries and file retrievals with limited bandwidth. Conventional access strategies and bandwidth restrict the quality of communication in the Web PACS network for exchanging and downloading large numbers of images. To enhance the quality of medical treatment, medical imaging requires an efficient file transfer strategy to achieve high-speed access. This paper tries to solve the EMR issues of exchanging, storing, and sharing medical images.

Based on the ‘‘Medical Images Exchange’’ concept of EMR, this study presents a medical image file accessing system (MIFAS) built on the Hadoop platform [34] to solve the problems of exchanging, storing, and sharing medical images. This study also presents a new strategy for accessing medical images that involves a co-allocation mechanism in a cloud environment. The Hadoop platform and the proposed co-allocation mechanism establish a cloud environment for MIFAS. MIFAS helps users retrieve, share, and store medical images between different hospitals. The remainder of this paper is organized as follows. Section 2 presents a background review. Section 3 introduces the system architecture. Section 4 presents experimental results. Finally, Section 5 concludes the article.

2. Background

2.1. Challenges in medical image exchanging

For over a decade, most hospitals and private radiology practices have transformed from film-based image management systems to fully digital (filmless and paperless) environments, a transition subtly dissimilar (in concept only) to converting from a paper medical chart to an electronic health record (EHR). Film and film libraries have given way to modern picture archiving and communication systems (PACS). These systems offer highly redundant archives that tightly integrate with historical patient metadata derived from radiology information systems. They are more efficient than film and paper, and are more secure because they incorporate safeguards to limit access and sophisticated auditing systems to track scanned data. Although radiologists prefer efficient access to the comprehensive imaging records of their patients, there are no reliable methods to discover or obtain access to similar records that might be stored elsewhere [31,32,35].

A literature review reveals that there are very few cloud-based medical image implementations. However, previous research discusses the benefits of cloud-based medical images, including scalability, cost effectiveness, and replication [29]. The same study presented a Hadoop-based PACS system, but failed to provide an appropriate management interface.

2.2. Hadoop and HDFS

Hadoop is one of the most salient pieces of the data-mining renaissance and offers the ability to tackle large datasets in ways that were not previously possible due to time and cost constraints. It is a part of the Apache Software Foundation and is being built by a global community of contributors. The Hadoop project promotes the development of open-source software and supplies a framework for the development of highly scalable distributed computing applications [30,36].

Hadoop is a top-level project of the Apache Software Foundation and supports the development of open-source software [34]. Hadoop provides a framework for developing highly scalable distributed applications; the developer focuses on applying logic to datasets instead of processing details. The Hadoop Distributed File System (HDFS) stores large files on multiple machines. This approach achieves reliability by replicating data across multiple hosts, and hence does not require RAID storage on hosts. The HDFS is built from a cluster of data nodes, each of which serves blocks of data over the network using a block protocol. The HDFS also serves the data over HTTP, allowing access to all content from a web browser or other client. Data nodes can connect to each other to rebalance data, move copies around, and ensure high replication of data. A file system requires one unique server, the name node, which is a single point of failure for any HDFS installation. If the name node goes down, the file system goes off-line. When it comes back up, the name node must replay all outstanding operations. This replay process can take over half an hour for a large cluster [37].
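The block-and-replica scheme described above can be sketched in a few lines of Python. This is a toy illustration, not the actual HDFS implementation: the class, node names, block size, and file name below are all hypothetical.

```python
import random

class MiniHDFS:
    """Toy model of HDFS-style storage: files are split into fixed-size
    blocks and each block is replicated on several data nodes. A single
    in-memory block map plays the role of the name node's metadata."""

    def __init__(self, datanodes, block_size=64, replication=3):
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}          # block id -> list of node names
        self.datanodes = list(datanodes)

    def put(self, filename, size):
        """Split a file of `size` units into blocks and place replicas."""
        n_blocks = -(-size // self.block_size)   # ceiling division
        for i in range(n_blocks):
            block_id = f"{filename}#{i}"
            # choose distinct nodes for this block's replicas
            self.block_map[block_id] = random.sample(
                self.datanodes, min(self.replication, len(self.datanodes)))
        return n_blocks

    def locations(self, filename, block_index):
        """Name-node lookup: which data nodes hold this block?"""
        return self.block_map[f"{filename}#{block_index}"]

fs = MiniHDFS(["dn1", "dn2", "dn3", "dn4"], block_size=64, replication=3)
n = fs.put("ct_scan.dcm", size=200)   # 200 units -> 4 blocks of up to 64
print(n, fs.locations("ct_scan.dcm", 0))
```

Because every block lives on several nodes, losing any single data node leaves all blocks readable; only the name node's block map is a single point of failure, which mirrors the limitation noted above.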

2.3. PACS

PACS is an acronym for Picture Archiving and Communication System. PACS has revolutionized the field of radiology, which now consists of all-digital, computer-generated images as opposed to the analog film of yesteryear. Analog film took up space and time for filing, retrieval, and storage, and was prone to being lost or misfiled. PACS saves time and money, and reduces the liability caused by filing errors and lost films. A PACS consists of four major components: imaging modalities such as CT and MRI, a secure network for transmitting patient information, workstations for interpreting and reviewing images, and archives for storing and retrieving images and reports. Combined with available and emerging Web technology, PACS can deliver timely and efficient access to images, interpretations, and related data. PACS breaks down the physical and time barriers associated with traditional film-based image retrieval, distribution, and display.


PACS is primarily responsible for the inception of virtual radiology, as images can now be viewed from across town, or even from around the world. PACS also acts as a digital filing system to store patients’ images in an organized way that enables records to be retrieved with ease as needed for future reference.

2.4. DICOM

DICOM is short for Digital Imaging and Communications in Medicine, a standard in the field of medical informatics for exchanging digital information between medical imaging equipment and other systems, ensuring interoperability. The standard specifies: a set of protocols for devices communicating over a network; the syntax and semantics of the commands and associated information that can be exchanged using these protocols; a set of media storage services for devices claiming conformance to the standard; and a file format and a medical directory structure to facilitate access to the images and related information stored on shared media. The standard was developed jointly by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) as an extension of an earlier standard for exchanging medical imaging data that did not include provisions for networking or offline media formats [38].

DICOM integrates scanners, servers, workstations, printers, and network hardware from multiple manufacturers into a picture archiving and communication system (PACS). These devices come with DICOM conformance statements that clearly state the DICOM classes they support. DICOM has been widely adopted by hospitals and is making inroads in dentists’ and doctors’ offices. DICOM differs from some, but not all, data formats in that it groups information into datasets. This means that a file of a chest X-ray image actually contains the patient ID within the file, so that the image can never be separated from this information by mistake. This is similar to the way that image formats such as JPEG can also have embedded tags to identify and otherwise describe the image.

A DICOM data object consists of a number of attributes, such as its name and ID, and one special attribute containing the image pixel data (i.e., logically, the main object has no ‘‘header’’ as such, merely a list of attributes, including the pixel data). The same basic format is used for all applications, including network and file usage, but when written to a file, a true ‘‘header’’ (containing copies of a few key attributes and details of the application that wrote it) is usually added. A single DICOM object can contain only one attribute containing pixel data. For many modalities, this corresponds to a single image. However, the attribute may contain multiple ‘‘frames’’, which makes it possible to store cine loops or other multi-frame data. Another example is NM data, where an NM image by definition is a multi-dimensional multi-frame image. In these cases, three- or four-dimensional data can be encapsulated in a single DICOM object. Pixel data can be compressed using a variety of standards, including JPEG, JPEG Lossless, JPEG 2000, and run-length encoding (RLE). LZW (zip) compression can be used for the whole dataset (not just the pixel data), but this is rarely implemented.
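The attribute-list structure described above can be illustrated with a toy Python dictionary. The tag constants are the standard DICOM tags for patient name, patient ID, and pixel data; the dataset values themselves are invented sample data, and this is a sketch of the data model, not a DICOM parser.

```python
# A DICOM data object is just a list of tagged attributes; the pixel data
# is one attribute among the rest, so patient metadata travels with the image.
PATIENT_NAME = (0x0010, 0x0010)   # standard DICOM tag for PatientName
PATIENT_ID   = (0x0010, 0x0020)   # standard DICOM tag for PatientID
PIXEL_DATA   = (0x7FE0, 0x0010)   # standard DICOM tag for PixelData

dataset = {
    PATIENT_NAME: "DOE^JANE",                 # invented sample value
    PATIENT_ID:   "CSMU-000123",              # hypothetical patient ID
    PIXEL_DATA:   bytes(512 * 512),           # one frame of raw pixel bytes
}

# Because the ID lives inside the same object as the pixels, separating the
# image from its identity requires deliberately deleting an attribute.
assert PATIENT_ID in dataset and PIXEL_DATA in dataset
print(len(dataset[PIXEL_DATA]))  # 262144 bytes for a single 512 x 512 frame
```

A multi-frame object would simply store several frames' worth of bytes under the same single pixel-data attribute, which is why one object can hold a cine loop.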

2.5. Co-allocation mechanism

A co-allocation architecture enables parallel downloading from data nodes. It can also speed up downloads and overcome network faults. A previously proposed architecture [13] consists of three main components: an information service, a broker/co-allocator, and local storage systems. Co-allocation of data transfers is an extension of the basic template for resource management [5]. Various applications specify the characteristics of the desired data and pass

attribute descriptions to a broker. The broker searches for available resources, obtains replica locations from the Information Service [4] and Replica Management Service [10], and then obtains the lists of physical file locations. We have implemented the following eight co-allocation schemes: Brute-Force (Brute), History-based (History), Conservative Load Balancing (Conservative), Aggressive Load Balancing (Aggressive), Dynamic Co-allocation with Duplicate Assignments (DCDA), Recursively-Adjusting Mechanism (RAM), Dynamic Adjustment Strategy (DAS), and Anticipative Recursively-Adjusting Mechanism (ARAM) [23–25].
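As a rough illustration of the bandwidth-proportional idea behind schemes such as Conservative Load Balancing (a sketch, not the authors' exact algorithm), a file can be split among replica servers according to their measured bandwidth, so faster servers carry larger sections. The server names and bandwidth figures below are hypothetical.

```python
def co_allocate(file_size, servers):
    """Split `file_size` bytes among replica servers in proportion to their
    measured bandwidth. Returns a list of (server, offset, length) sections
    that together cover the whole file."""
    total_bw = sum(bw for _, bw in servers)
    plan, offset = [], 0
    for i, (name, bw) in enumerate(servers):
        if i == len(servers) - 1:
            length = file_size - offset        # last server takes the remainder
        else:
            length = file_size * bw // total_bw
        plan.append((name, offset, length))
        offset += length
    return plan

# Hypothetical replica sites and bandwidths (Mbps)
plan = co_allocate(65_000_000, [("THU1", 50), ("THU2", 30), ("CSMU", 20)])
for name, offset, length in plan:
    print(name, offset, length)
```

A real broker would re-measure bandwidth as sections complete and re-plan the remaining bytes, which is the adjustment that the RAM/DAS/ARAM family of schemes adds on top of this static split.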

2.6. Related works

HDFS servers (i.e., DataNodes) and traditional streaming media servers are both used to support client applications with access patterns characterized by long sequential reads and writes. As such, both systems are designed to favor high storage bandwidth over low access latency [39].

‘‘Cloud computing’’ has recently become a hot topic in this field. S. Sagayaraj [29] proposed that Apache Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides reliability and data motion. Hadoop implements a Map/Reduce computational paradigm that divides an application into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. Thus, simply replacing the PACS server with the Hadoop framework can produce a good, scalable, and cost-effective tool for health care imaging. The same study presents an HPACS system, but fails to provide an appropriate management interface.

J. Shafer et al.’s [40] study, ‘‘The Hadoop distributed filesystem: Balancing portability and performance’’, was favorably received in this field. The poor performance of HDFS can be attributed to challenges in maintaining portability, including disk scheduling under concurrent workloads, file system allocation, and file system page cache overhead. Application-level I/O scheduling can significantly improve HDFS performance under concurrent workloads while preserving portability. Further improvements from reducing fragmentation and cache overhead are also possible, at the expense of portability. However, maintaining Hadoop portability whenever possible simplifies development and benefits users by reducing installation complexity, thus encouraging the spread of this parallel computing paradigm.

Previous research [11,13,22,24,25,27] shows that co-allocation can solve the data transfer problem in grid environments; these results are categorized as RAM [12,24,22], DAS [27,13], and ARAM [28,23]. This concept is the foundation for the current study. However, the significant difference between this paper and previous works is that the proposed system is implemented in a cloud environment.

3. System design and implementation

Fig. 1 illustrates an overview of the distributed file system. MIFAS has three HDFS groups. The first group, THU1, and the second group, THU2, are both hosted at Tunghai University. The third group is hosted at Chung Shan Medical University (CSMU) Hospital. All of the groups are connected with 100 Mbps network bandwidth in the TANET (Taiwan Academic Network) environment. The number of HDFS groups is flexible: the minimum is one, but the maximum can be many. More HDFS groups increase the amount of duplication available. Because the PACS image sources come from the HDFS, increasing the number of sources (i.e., building more HDFS groups) changes the download behavior accordingly.


Fig. 1. Overview of distribution file system.

Fig. 2. System architecture of MIFAS.

3.1. System architecture

Fig. 2 shows the proposed MIFAS, based on a cloud environment. The distributed file system was built on the HDFS of the Hadoop environment (Section 2.2). This Hadoop platform can be described as PaaS (Platform as a Service), effectively extending SaaS (Software as a Service) to platforms. The top level of MIFAS consists of a web-based interface, which provides a user-friendly interface for medical image queries. IaaS: in our previous work, we used OpenNebula to manage our VMs [29]. To achieve our goal, we can migrate the servers into the OpenNebula environment, which is also a key step in developing the VFT (virtualization fault tolerance) approach [41].

Middleware: this mechanism handles the transmission issues in MIFAS and is called the MIFAS Middleware in this study. The Middleware’s purpose is to assign and acquire the best transmission path to the distributed file system. It also collects the necessary information, such as server-to-server bandwidth, server utilization rates, and network efficiency. This information allows the MIFAS co-allocation distributed file system to determine the best allocation of download jobs. The software components of MIFAS are shown in Fig. 3.

Information service: this service analyzes the status of each host. The MIFAS Middleware fetches the information of hosts through a component called the

Fig. 3. The software components of MIFAS.

information service. The experiments in this study installed Ganglia [42] on each Hadoop node to determine the real-time state of all members. The Middleware can therefore derive the best data-transmission strategy from the information service, which is one of the components of the MIFAS Middleware.

Co-allocation: as mentioned in Section 2.5, the proposed co-allocation mechanism allows parallel downloading from data nodes, speeds up downloading, and overcomes network problems. When a user accesses medical images through MIFAS, co-allocation is enabled automatically. To realize parallel downloading, the system splits files into different parts and obtains the data from different clouds, depending on their status. This allows the best downloading strategy. Earlier research [11,13,22] adopts the same co-allocation mechanism.

Replication location service: the experiments in this study involved three HDFS groups in different locations, and each HDFS group owned a certain number of data nodes. The replication location service automatically makes duplicates


Fig. 4. System workflow of MIFAS.

Fig. 5. Authorization interface.

from one private cloud to another when medical images are uploaded to MIFAS.
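A minimal sketch of this duplication step, with hypothetical group names and an in-memory dictionary standing in for the per-group HDFS writes:

```python
def replicate_upload(image_name, payload, cloud_groups):
    """On upload, copy the image into every private-cloud HDFS group, so any
    group can later serve it and co-allocation has multiple sources."""
    stores = {}
    for group in cloud_groups:
        # in the real system this would be an HDFS write into each group
        stores.setdefault(group, {})[image_name] = payload
    return stores

# Hypothetical image and the three groups used in this study's test-bed
stores = replicate_upload("mr_brain_001.dcm", b"\x00" * 1024,
                          ["THU1", "THU2", "CSMU"])
print(sorted(stores))
```

Keeping a full copy in every group is what lets the experiments in Section 4 survive the loss of an entire site: the remaining groups still hold complete replicas.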

3.2. System workflow

Fig. 4 depicts the workflow of MIFAS operations. This study analyzes a real system to determine the performance of the proposed approach. First, users input a username and password for authentication. Second, users input search terms to query patient information. Third, users can view patients’ medical images. Fourth, users can configure MIFAS. Fifth, when users invoke the MIFAS downloading mechanism, the Middleware handles the transfer.

3.3. System interface

MIFAS offers an authentication interface (Fig. 5) that users must pass before logging in to MIFAS. After passing validation, users see the MIFAS Portal (Fig. 6). Fig. 6 shows that there are three main functional blocks, each with its own purpose.

Examination Type: this is the catalog of medical examination types. Medical images can come from various medical imaging instruments, including ultrasound (US), magnetic resonance (MR), positron emission tomography (PET), computed tomography (CT), endoscopy (ENDO), mammograms (MG), direct radiography (DR), computed radiography (CR), etc. The examination type is cataloged according to the DICOM definition.

Filter: this block provides a search function; users can obtain patient information through an inputted keyword. In a web-based interface system, it is easy to achieve this goal. Users can capture any information they want using filters. Thus, like other systems on the internet, MIFAS provides multidimensional information for users. There are four main options in the filter function: ‘‘Chart NO’’ (Examination No), ‘‘Patient Name’’, ‘‘Start of Examination Date’’, and ‘‘End of Examination Date’’.

Patient information list: this block shows detailed information according to the search conditions of Block B and Block A. This block also possesses several important functions.

Fig. 7 shows the functional items, which include the Image Status, Thumbnail viewer, PACS Reporting, and Download File. Fig. 8 displays the file distribution status, including the photographic description and photographic catalog. Fig. 9 shows the thumbnail viewer function, which displays a thumbnail of the medical image and the examination report. Fig. 10 shows PACS Reporting, a detailed patient medical record. For more detail on the medical images, users can invoke the Download File function and then open the files in a professional DICOM viewer. Fig. 11 shows how medical images are uploaded to MIFAS; as described above, the Replication Location Service duplicates the images to each cloud. Finally, the fourth icon in Fig. 7 is for downloading DICOM


Fig. 6. Portal of the MIFAS system.

Fig. 7. MIFAS functions.

Fig. 8. File status.

Fig. 9. Medical image preview.


Fig. 10. Patient records.

Fig. 11. Upload medical images.

Fig. 12. The node summary of the cloud test-bed.

format medical images from MIFAS. MIFAS uses a co-allocation mechanism to allocate files through an optimal strategy.

4. Experiments and results

4.1. Environments

In this section, we compare the performance of MIFAS with PACS. A cloud computing test-bed consisting of three distributed file systems was built with Hadoop by the High-Performance Computing Laboratory of the Computer Science Department at Tunghai University. A summary of the Hadoop nodes is shown in Fig. 12. Fig. 13 shows the topology of the three PACS systems used in

Chung Shan Medical University Hospital. This is a production PACS in CSMU, and there are three PACS systems in total: CSMU, the CSMU Chung Kang Branch Hospital, and the CSMU Tai-Yuan Branch Hospital. Each PACS system has a synchronization mechanism. The network bandwidth between the PACS sites is under 100 Mbps, as listed in Table 1.

4.2. Comparing proximal image retrieval times from PACS and MIFAS

This experiment analyzes the same medical images, listed in Table 2, in PACS and MIFAS. Fig. 14 depicts the general interaction between PACS and MIFAS, and illustrates the environment of experiment 1. All the nodes are in good condition, and the same files were downloaded from each PACS and MIFAS. The purpose


Fig. 13. The topology of three PACS systems in Chung Shan Medical University Hospital.

Fig. 14. System workflow of MIFAS.

Table 1. Bandwidth of PACS in CSMU.

End-to-end transmission rates (Mbps) of PACS in CSMU

Node from | Node to | Bandwidth (Mbps)
CSMU | Chung Kang branch | 100
CSMU | Tai-Yuan branch | 100
Chung Kang branch | Tai-Yuan branch | 100

of this experiment is to compare the image retrieval times from PACS and MIFAS.

Fig. 15 shows the average results after testing each site 500 times. Fig. 16 shows the image retrieval times at the proximal site. The results of Figs. 15 and 16 show that PACS has better retrieval efficiency than MIFAS. This is primarily because PACS was built on high-performance servers and is also a high-cost medical image system. MIFAS cannot easily cross

Table 2. Experimental medical images.

Image type       Pixel       Qty.   Total size (MB)   Testing times
CR chest         2804×2931   1      7.1               500 times
A series of CT   512×512     389    65                500 times

this threshold. However, experiment 2 reveals the advantages of MIFAS.
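The per-site measurement above (averaging 500 downloads of the same study) can be sketched as follows. This is an illustrative sketch, not the paper's test harness; `fetch` is a placeholder for whichever client call downloads one DICOM study from PACS or MIFAS:

```python
import time
import statistics

def average_retrieval_time(fetch, runs=500):
    """Time fetch() over `runs` repetitions and return the mean latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()  # one download of the same study from PACS or MIFAS
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)
```

Running this once per site yields the averages plotted in Figs. 15 and 16.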

This study also compares PACS and MIFAS in terms of proximal failure problems. Because MIFAS provides a co-allocation strategy, a single-site failure does not affect MIFAS operation. On the other hand, if the PACS system encounters the same problem, the only solution is to use another PACS site. Thus, experiment 2 assumes that the PACS in CSMU encounters a system failure, and the medical staff must retrieve medical images from another CSMU branch


Fig. 15. Image retrieval time results over 500 test runs.

Fig. 16. Image retrieval time results at the proximal site.

Fig. 17. Hardware failures in both environments.

hospital. In the same situation, an HDFS group failure occurs in MIFAS. Even if MIFAS has only two HDFS groups left, users can still retrieve medical images from the remaining groups. This experiment assumes that both failures occur at a proximal site. Fig. 17 illustrates the experimental results: the PACS in CSMU is not available, but MIFAS can still function well.
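The failover behavior described above can be sketched as follows. The function and group names (`retrieve_with_failover`, `fake_fetch`) are illustrative assumptions, not part of the MIFAS code:

```python
def retrieve_with_failover(groups, fetch):
    """Try each HDFS group in order (proximal site first) and return the
    first successful download; fetch(site) raises on failure."""
    errors = {}
    for site in groups:
        try:
            return fetch(site)
        except OSError as exc:  # a real client would catch its own error types
            errors[site] = exc
    raise RuntimeError(f"all HDFS groups failed: {errors}")
```

With the proximal group down, the call transparently falls through to the next group, which is the single-site-failure tolerance observed in Fig. 17.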

Fig. 18 depicts a hardware failure state in both systems. Obviously, only one PACS is left, in the CSMU Tai-Yuan branch, but a strong contrast can be seen in MIFAS: it still supports medical image transfers to users under the co-allocation mechanism. In other words, the benefit of MIFAS is that it tolerates network and hardware failures.

PACS has its own synchronization mechanism. However, if there is a hardware or network failure in PACS, the only option is to use the surviving site and try to reduce the recovery time. The PACS system also has limitations on the number of concurrent users.

Fig. 18. Hardware failure in both environments.

Fig. 19. Users accessing PACS and MIFAS over a WAN.

Unlike MIFAS, it cannot distribute the workload of user access under a co-allocation mechanism. This section shows that MIFAS can effectively mitigate single-site failures and the problems of broken networks and hardware.

4.3. Comparing retrieval times from MIFAS and PACS across hospitals

Because HDFS performs well over a WAN, we simulated the environment shown in Fig. 19. Generally, a hospital PACS provides a single download node, with many potential problems, while MIFAS is better suited for multi-user access over a WAN due to its flexible architecture. Fig. 20 shows PACS and MIFAS accessing the same CR medical record (7.1 MB) with different numbers of users. MIFAS provides comparatively better performance as the number of users increases. We then repeated the same experiment with the co-allocation mechanism in MIFAS enabled, as shown in Fig. 21.
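The co-allocation idea, splitting one file across several replica sites and downloading the pieces in parallel, can be sketched as below. The names `coallocated_download` and `fetch_range`, and the site list, are assumptions for illustration rather than the actual MIFAS API:

```python
from concurrent.futures import ThreadPoolExecutor

def coallocated_download(sites, total_size, fetch_range):
    """Assign one contiguous byte range of the file to each replica site,
    fetch the ranges concurrently, and reassemble them in order.
    fetch_range(site, start, end) returns the bytes of [start, end)."""
    bounds = [round(i * total_size / len(sites)) for i in range(len(sites) + 1)]
    jobs = [(site, bounds[i], bounds[i + 1]) for i, site in enumerate(sites)]
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        parts = pool.map(lambda job: fetch_range(*job), jobs)
    return b"".join(parts)
```

Spreading the ranges over several sites is what lets MIFAS distribute the load of many concurrent users in Fig. 21.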

4.4. Using DRBD with Heartbeat to increase the reliability of MIFAS

This section describes how the reliability of MIFAS is increased by applying DRBD with Heartbeat to the THU1 HDFS, and examines the system's fault tolerance when the NameNode fails. To view or download DICOM files, the system must go through the HDFS Name


Fig. 20. Retrieval CR image on both systems.

Fig. 21. Retrieval CR image on PACS and MIFAS (Co-allocation).

Node to reach each DataNode and fetch the files. Therefore, in a NameNode failure state, the whole HDFS cluster becomes unavailable.

Service IP: this is the channel through which external users access services; users reach the services through this IP. VM2 is the primary node, and VM1 is the secondary. For the difference between primary and secondary, please refer to Section 2. Debian1, Debian2, and Debian3 are the hosts: VM2 runs on Debian1, and the secondary node VM1 runs on Debian2. The whole environment is shown in Fig. 22.

In Fig. 23, shutting down Debian1 does not cause the Service IP to stop providing service. The connection status of the Service IP at the bottom of Fig. 23 shows that only one packet was lost. The reason is that under our VFT mechanism, the secondary node immediately replaces the primary node; in HA terminology this is called failover, and it is a good practice of the HA mechanism. VFT also boots VM2 on the online host Debian3. This case shows that VFT is a good solution to the HA problem under virtualization. The GUI is shown in Fig. 24.
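A minimal sketch of one heartbeat round, assuming a simple primary/secondary pair behind a shared service IP; the function names here are our own illustration, not the VFT implementation:

```python
def failover_check(primary, secondary, is_alive, take_over_ip):
    """One heartbeat round: if the primary stops answering, move the
    service IP to the secondary so clients keep using the same address.
    Returns the node currently serving the IP."""
    if is_alive(primary):
        return primary
    take_over_ip(secondary)  # e.g. bring up the service IP and daemons there
    return secondary
```

Because clients only see the Service IP, the promotion is invisible to them apart from the single lost packet observed in Fig. 23.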

Fig. 23. Shutdown host and VM2.

Fig. 24. The GUI of VFT mechanism.

Fig. 22. Experimental environments.


Fig. 25. Experimental environments.

Fig. 26. Comparison of physical host and virtual machine throughput.

In this MIFAS with VFT environment, HDFS is built into three clusters (THU1, THU2 and CSMU). Each NameNode is configured on two VMs, using DRBD with Heartbeat for synchronization, and each NameNode is configured with four DataNodes. Fig. 25 shows the details of the environment.

In this part, we conduct stress testing with JMeter. We set 10 threads with a loop count of 5, and both physical machines and virtual machines downloaded files of 1, 10, and 50 MB to measure the resulting throughput and download capability. The results in Fig. 26 show that the smaller the file size, the greater the throughput, and the difference between physical machines and VMs becomes more obvious. Fig. 27 shows that when downloading a small file, the transmission performance of VMs is better than that of physical machines.
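The JMeter setup (10 threads, loop count 5, aggregate throughput) can be approximated with a sketch like the following; `download` is a stand-in for the actual HTTP transfer and returns the number of bytes moved:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(download, n_threads=10, loops=5):
    """JMeter-style load test: n_threads workers perform `loops` downloads
    each; the result is aggregate throughput in MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        moved = list(pool.map(lambda _: download(), range(n_threads * loops)))
    elapsed = time.perf_counter() - start
    return sum(moved) / elapsed / 1e6
```

Running it once per file size (1, 10, and 50 MB) on both the physical and virtual environments reproduces the shape of the comparison in Fig. 26.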

In this experiment, we download the same files from each PACS and MIFAS. The purpose is to compare the networking performance of PACS and MIFAS. Fig. 28 shows the results: for smaller files, MIFAS has better transmission capacity, whereas for large files, PACS has better transmission performance.

Fig. 27. Comparison of physical host and VM networking performance.

Fig. 28. Comparison of PACS and MIFAS networking performance.

5. Conclusions and future work

This work develops a Medical Image File Accessing System (MIFAS) based on HDFS of Hadoop in the cloud. MIFAS is a flexible,


stable, and reliable system that addresses the issues of sharing, storing, and exchanging medical images. Therefore, medical images can easily be shared between different hospitals. MIFAS offers the following advantages:

Scalability: the ability to extend the nodes as required. Adding a node to the network is as simple as hooking a Linux box to the network and copying a few configuration files. Hadoop provides details about the available space in the cluster, which makes it easy to decide whether a node must be added.

Cost effective: since Linux nodes are relatively cheap, there is no need to invest much in hardware and the OS.

Best strategy: the Hadoop platform offers a distributed filesystem and uses a co-allocation mechanism to retrieve medicalimages.

Replication: because data can be saved through the replica location service in the MIFAS middleware, it can easily be shared among different private clouds.

Easy management: we provide a friendly management interface. This interface makes it easy to set up and manage the private cloud environment in MIFAS.

The experimental results show that high-reliability data storage clustering and fault tolerance can be achieved. The MIFAS system achieves acceptable redundancy in medical resources at much less expense. Furthermore, MIFAS will be improved by enhancing file-access performance. This study serves as an example of cloud-based medical image file accessing and achieves the goal of exchanging medical images between patients and their caregivers.

Acknowledgments

This work is sponsored by Tunghai University under the U-Care ICT Integration Platform for the Elderly, No. 103GREEnS004-2, Aug. 2014. This work was supported in part by the Ministry of Science and Technology, Taiwan ROC, under grant numbers MOST 101-2218-E-029-004 and MOST 102-2218-E-029-002.



Chao-Tung Yang is a Professor of Computer Science at Tunghai University in Taiwan. He received his Ph.D. in Computer Science from National Chiao Tung University in July 1996. In August 2001, he joined the faculty of the Department of Computer Science at Tunghai University. He serves on a number of journal editorial boards, including the International Journal of Communication Systems, the Journal of Applied Mathematics, the Journal of Cloud Computing, the ''Grid Computing, Applications and Technology'' Special Issue of the Journal of Supercomputing, and the ''Grid and Cloud Computing'' Special Issue of the International Journal of Ad Hoc and Ubiquitous Computing. Dr. Yang has published more than 250 papers in journals, book chapters, and conference proceedings. His present research interests are in cloud computing and services, grid computing, parallel computing, and multicore programming. He is a member of the IEEE Computer Society and ACM.

Wen-Chung Shih received a B.S. degree in Computer and Information Science from National Chiao Tung University in 1992 and an M.S. degree in Computer and Information Science from National Chiao Tung University in 1994. He received the Ph.D. degree in Computer Science from National Chiao Tung University in 2008. He passed the second class of the National Higher Examination in the Information Processing field in 1994 and in the Library Information Management field in 2004, respectively. Since 2008, he has worked as an Assistant Professor at Asia University, Taiwan. Since August 2014, he has been an Associate Professor and Chairman of the Department of Computer Science and Information Engineering at Asia University. His research interests include e-Learning, data mining, and expert systems.

Lung-Teng Chen received the master's degree in Information Engineering from Tunghai University in 2011. Since 2006, he has served as a director in the information system group of Chung Shan Medical University Hospital, working mainly on the use of information systems to assist clinical medical affairs. His interests include information computing architecture, storage architecture, computer networks, networking applications, etc.

Cheng-Ta Kuo received the M.S. degree from the Department of Computer Science at Tunghai University in 2011. Since 2007, he has worked as a Software Engineer at WE CAN MEDICINES CO., LTD. His research interests include cloud computing, virtualization, and data mining.

Fuu-Cheng Jiang is currently with the Department of Computer Science at Tunghai University in Taiwan. His research interests include network modeling, cloud computing, wireless networks, and simulation. Dr. Jiang was the recipient of the Best Paper Award at the 5th International Conference on Future Information Technology 2010 (FutureTech2010), which ranked his paper first among the 201 submittals. He has served on the TPC of the ICCCT 2011–2012 International Conference, BWCCA2010, and IEEE CloudCom 2012, and as a CSE2011 Session Chair. Moreover, he has served as a journal reviewer for the Computer Journal, Ad Hoc Networks, and the International Journal of Communication Systems.

Fang-Yie Leu received his B.S., M.S. and Ph.D. degrees from National Taiwan University of Science and Technology, Taiwan, in 1983, 1986 and 1991, respectively, and another M.S. degree from Knowledge System Institute, USA, in 1990. His research interests include wireless communication, network security, Grid applications, and Chinese natural language processing. He is currently a Professor at Tunghai University, Taiwan, the director of the database and network security laboratory of the university, the workshop chair of the MCNCS and CWECS workshops, and an editorial board member of several international journals. He is also a member of the IEEE Computer Society.