
Int. J. Computer Applications in Technology, Vol. 51, No. 3, 2015

Copyright © 2015 Inderscience Enterprises Ltd.

Towards improving the performance of distributed virtual memory based reversal cache approach

Sa’ed Abed* Computer Engineering Department, College of Computing Sciences and Engineering, Kuwait University, P.O. Box 5969, Safat, 13060, Kuwait Fax: +965 2483946 Email: [email protected] *Corresponding author

Ashraf Hasan Bqerat Department of Computer Engineering, Hashemite University, P.O. Box 150459, Zarqa, 13115, Jordan Email: [email protected]

Ahmad Al-Khasawneh, Ayoub Alsarhan and Ibrahim Obeidat Computer Information System Department, Hashemite University, P.O. Box 150459, Zarqa, 13115, Jordan Email: [email protected] Email: [email protected] Email: [email protected]

Abstract: Distributed virtual memory (DVM) is one of the techniques that aim to maximise throughput; it is motivated by the fact that memory-to-memory data transfer takes less time than memory-to-hard-disk transfer. Distributed reversal cache (DRC) is a recently developed DVM technique and is considered a completely new approach. In this paper, part of the memory of each master node is set aside to create the reversal cache, and its load is then distributed over all nodes to free more area in the master node. Furthermore, we tackle the relevant issues of the reversal cache and modify it to overcome some stalls that mainly affect the performance of the master node, by connecting the reversal cache technique with the DVM concept. Experimental results show that our technique outperforms other techniques, with a page fault reduction of 14% and a thrashing enhancement of 27%.

Keywords: DVM; distributed virtual memory; CVM; conventional virtual memory; reversal cache; page fault; thrashing.

Reference to this paper should be made as follows: Abed, S., Bqerat, A.H., Al-Khasawneh, A., Alsarhan, A. and Obeidat, I. (2015) ‘Towards improving the performance of distributed virtual memory based reversal cache approach’, Int. J. Computer Applications in Technology, Vol. 51, No. 3, pp.247–256.

Biographical notes: Sa’ed Abed received his BSc and MSc in Computer Engineering from Jordan University of Science and Technology (J.U.S.T.), Jordan in 1994 and 1996, respectively. In 2008, he received his PhD in Computer Engineering from Concordia University, Canada. He previously worked as a lecturer at King Faisal University in Saudi Arabia from 1997 until 2003. He was an Assistant Professor at Hashemite University, Jordan, from 2008 until 2014. Currently, he is an Assistant Professor in the Department of Computer Engineering at Kuwait University. His research interests include formal methods, theorem proving, model checking, SAT-solvers, VLSI design and automation, synthesis, computer architecture and fault tolerance. He has also served as a reviewer for various international conferences and journals.


Ashraf Hasan Bqerat earned his BS in Electrical and Computer Engineering with honours from the Hashemite University in 2009. Before that, he attended the Jubilee School (a school for gifted students in the Middle East under the supervision of the Arab Council for the Gifted and Talented Students) in Amman, Jordan, from 2001 to 2005. His first research was in abstract maths at the age of 16. He is currently a computer engineer at the Hashemite University, Jordan. His research interests involve computer architecture, operating systems and VLSI.

Ahmad Al-Khasawneh is an Associate Professor of Computer Information System at Hashemite University. He holds a PhD and MSc in Information and Software Systems and Technology from Newcastle University, Australia and a BS in Computer and Automatic Control Engineering. Prior to joining the Hashemite University of Jordan, he held several key positions with major international ICT consultancy and solutions firms and lectured in IT-related topics at Newcastle University, Australia. Currently, he is the Dean of the Prince Hussein Bin Abdullah II for Information Technology at Hashemite University.

Ayoub Alsarhan received his PhD in Electrical and Computer Engineering from Concordia University, Canada, in 2011, his MSc in Computer Science from Al al-Bayt University, Jordan, in 2001, and his BE in Computer Science from Yarmouk University, Jordan, in 1997. He is currently an Assistant Professor in the Computer Information System Department at Hashemite University, Zarqa, Jordan. His research interests include cognitive networks, parallel processing, machine learning and real-time multimedia communication over the internet.

Ibrahim Obeidat is the Chair of the Computer Information System Department of Hashemite University in Jordan. He received his BSc in Electrical Engineering from Jordan, his Master's from the New York Institute of Technology, and his PhD from George Washington University, USA. His research interests lie in computer science and networking. He has experience in networking reliability. Prior to joining the Hashemite University of Jordan, he held several key positions with major international ICT and IS consultancy and solutions firms.

1 Introduction

Computers keep getting faster and gain more capabilities over time in terms of speed (CPU speed, bus speed and memory speed), number of CPUs per computer system and capacity of storage devices. More hardware (HW) means more throughput: more work is finished in a fixed period of time. On the other hand, architects should consider software (SW) efficiency in the design. SW controls the HW, and thus attention is directed to the operating system (OS). To increase the throughput of a computer system, we have the following two choices:

• investing more in the HW, with extra cost, area and power

• enhancing and optimising the SW on the same HW where the cost is not affected too much.

Resource utilisation is important for maximising throughput. To that end, researchers in many fields of computer engineering have devised numerous techniques. Resource sharing maximises throughput by putting otherwise unused resources to work. Distributed virtual memory (DVM) is one of the techniques that rely on resource sharing: DVM uses the unused random access memory (RAM) of other computers. The degree of utilisation depends on the techniques used for page replacement, page out and address translation. If we take a sneak peek at the RAM system and the hard disk (HD) system, we see that data access time in RAM is measured in microseconds, while in HD it is measured in milliseconds. This tells us how much time is consumed when accessing the HD.

In conventional virtual memory (CVM), pages are exchanged between HD and RAM. The idea of this technique is to enlarge RAM by extending it onto the HD system. Virtually, they appear as one unit, but physically they are separate. In DVM, by contrast, a computer in a LAN may make use of other nodes’ RAM, so the virtual memory is distributed between local memory, the memory of other nodes and the local HD.

In DVM, when a page is stripped from a node N during page out, instead of being sent to node N’s HD (as in CVM), where storing and retrieving it takes a long time, it is sent to the memory of a node k (where k ≠ N). Thus, throughput is increased by reducing page seek time, since memory-to-HD transfer takes longer than memory-to-memory transfer.

The DVM technique works in a way close to distributing processes (at the level of pages) to other nodes, enhancing system throughput by balancing the load across the nodes (Nuttall, 1996). Looking to the future, the gap between disk and network performance is increasing rapidly. According to statistics, disk latency improves by 10% every year and disk bandwidth by 20%. On the other hand, the latency of the network (the backbone of the distributed system) improves by 20% every year and its bandwidth by 45% (Chu et al., 2007). Therefore, DVM is becoming increasingly efficient.
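
To make the widening gap concrete, the quoted annual rates can be compounded over several years. The following is a minimal sketch in Python, assuming that both rates stay constant and compound yearly (the source does not state this); the function name relative_gain and the five-year horizon are illustrative choices.

```python
def relative_gain(network_rate: float, disk_rate: float, years: int) -> float:
    """Ratio of network improvement to disk improvement after `years`.

    Assumes (hypothetically) that each technology improves by a fixed
    fraction every year and that the improvements compound.
    """
    return ((1.0 + network_rate) / (1.0 + disk_rate)) ** years


# Annual rates quoted in the text: latency 20% vs 10%, bandwidth 45% vs 20%.
latency_gap = relative_gain(0.20, 0.10, years=5)
bandwidth_gap = relative_gain(0.45, 0.20, years=5)

print(f"network latency advantage after 5 years:   {latency_gap:.2f}x")
print(f"network bandwidth advantage after 5 years: {bandwidth_gap:.2f}x")
```

Under these assumptions the network's advantage over the disk grows every year, which is the trend the argument for DVM relies on.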


In this paper, our system consists of loosely coupled computers (nodes). Each node is a standalone system with CPU, memory and input/output subsystems. Any LAN topology may be used to connect the nodes to each other. In this setting, a new level of memory is added to each node: the memory of the other nodes. Li (1988) and Barrera (1993) were the first to introduce memory sharing at the page level, followed by Clancey and Francioni (1990) and later by many other authors (Abaza, 1992; Malkawi et al., 1991; Fellah and Abaza, 1999; Geva and Wiseman, 2007).

Shared virtual memory (SVM) was also proposed over two decades ago and was first introduced by Li and Hudak (1989). In Li et al. (1989), the authors introduced the problem of memory coherency in loosely coupled systems. Li (1986) went deeper into SVM; his dissertation was a basis for other works on SVM for many years. Many SVM techniques have evolved over time. SVM is a technique that supports shared space addressing, where the memory of different nodes can be seen by the user as one coherent memory (Iftode and Singh, 1999). This technique creates the illusion that there is one big memory.

Distributed shared virtual memory (DSVM) (Zhang et al., 2000; Bilas et al., 1999; Dwarkada et al., 1999; Christine and Isabelle, 1997; Jun, 2009) is a technique in which a chunk of the memory of a node is shared between different nodes. In this case, part of a given node’s memory is accessible from other nodes in the system. The sum of the chunks in the network produces a logical SVM. This SVM is distributed among all nodes of the system and forms a second storage space that sits between a node’s main memory and its HD, as shown in Figure 1.

Figure 1 DSVM block diagram

Figure 1 shows a DSVM example distributed between three nodes. The virtual memory is physically distributed between these nodes, but virtually it is one unit. It is shared between the three nodes and accessible from any of them. Note that the utilisation of all nodes’ RAM will increase, since the level of multiprogramming is increased, leading to more parallelism and hence higher RAM utilisation.

In Abed et al. (2011), the authors suggested a new approach to enhance DVM: they added a new level of memory called the cluster cache and created a central node (master node) for each cluster to temporarily store pages from all cluster nodes after they are sent out of their node, thus enlarging the virtual memory as an alternative to the HD. They then evaluated the approach to show its efficiency over the CVM technique. The problem with this technique was that the enhancement applied only to the regular nodes, while the master node was penalised by the area cut out of its memory for the reversal cache.

The objective of this paper is to present a new technique, distributed reversal cache (DRC), for organising the virtual memory and to demonstrate how our enhancement does better than the existing methods. In the DRC technique, the master node possesses the reversal cache, which is a part of its RAM; this reversal cache is separated virtually from the rest of the RAM. It is responsible for caching some pages for all the nodes over the LAN. We overcome the problem in Abed et al. (2011) by distributing the reversal cache over all nodes to free more area in the master node, so it appears as one chunk (logically) while it is distributed over all nodes (physically). Pages are stored in this cache according to a predetermined algorithm.

Figure 2 illustrates the idea of the CVM, DVM, DSVM and DRC techniques. The CVM approach treats the node as an independent unit: no resource sharing exists between network nodes and pages are always exchanged between memory and HD (1 and 4). Thus, the memory is expanded onto the HD. In the DVM approach, when a page is stripped out of memory, the node first searches for space in another node’s memory; if there is no vacancy, the page is sent to the node’s HD (1, 3 and 4). In the DSVM approach, when a page is evicted from main memory it is sent to the DSVM; when a process requests a page, it checks the main memory, then the DSVM and finally the local HD (1, 3′ and 4). In the DRC approach, the distributed virtual memory is a little different: when a page is ejected from local memory, it is directed to the reversal cache. The reversal cache is a part of the memory of a nominated master node that has been reserved for all nodes of the LAN to store least recently used (LRU) pages from all nodes; a page replaced from the reversal cache then searches for a node to be stored in. If it does not find a vacancy, it goes to the node’s local HD (1, 2, 3 and 4). We are interested here in the DRC technique. To evaluate our approach and show its efficiency, the results were compared with existing techniques on test benches under different system loads. More details about this technique are given in the upcoming sections.
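
The page-out paths of the four techniques can be summarised as ordered lists of destinations. The following Python sketch is a simplification under stated assumptions: each technique is modelled as a fallback chain, the destination labels and the function place_evicted_page are illustrative, the local HD is assumed to always have room, and for DRC the page that is displaced from a full reversal cache is not modelled separately.

```python
# Ordered destinations tried when a page leaves local memory, following the
# description of Figure 2. Labels are illustrative, not taken from the paper.
EVICTION_PATHS = {
    "CVM": ["local HD"],
    "DVM": ["another node's memory", "local HD"],
    "DSVM": ["DSVM", "local HD"],
    "DRC": ["reversal cache", "another node's memory", "local HD"],
}


def place_evicted_page(technique, has_room):
    """Return the first destination on the technique's path that has room.

    `has_room` maps a destination label to True/False. The local HD is
    assumed (as in the simulated system) to have unlimited space.
    """
    for destination in EVICTION_PATHS[technique]:
        if destination == "local HD" or has_room.get(destination, False):
            return destination
    raise ValueError("every path is expected to end at the local HD")


# Example: the reversal cache is full but a neighbouring node has a free frame.
print(place_evicted_page("DRC", {"reversal cache": False,
                                 "another node's memory": True}))
```

The point of the sketch is only the ordering: DRC inserts one extra memory-resident level before the page can fall back to the HD.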

This paper is organised as follows: Section 2 reviews the closest related work in this area. Section 3 gives some preliminaries on the concept of the reversal cache system. Section 4 describes the proposed DRC methodology and introduces the DRC page look up and its usage in address translation. The experimental results of applying our approach are presented in Section 5. Finally, Section 6 concludes the paper and presents directions for future research.

Figure 2 Overview of CVM, DVM, DSVM and DRC methods

2 Related work

Over decades of software research, carried out side by side with hardware development, many studies have adopted both DVM and DSVM techniques. This section gives a short review of both techniques and presents the work closest to ours.

2.1 DVM-related work

Clancey and Francioni (1990) introduced the node page table (NPT) technique, in which a table is created for each node in the system. The table contains the page owner, the physical address, the process ID and the page destination in case the page is transferred to a different node (TO), in addition to other required page details. Its function is to keep information about the locations of the node’s pages and to help track them so they can be retrieved when needed. The system is based entirely on message passing. These messages contain the information that fills the NPT, which is updated every T quantum with the information encapsulated in the exchanged messages.
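
The fields listed above suggest a simple record per page. The following Python sketch is one possible layout, not the authors' data structure: the class and field names are hypothetical, and it assumes that a page which has not been transferred resides at its owner node.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class NPTEntry:
    """One row of a node page table (field names are illustrative)."""
    owner_node: int                       # node that owns the page
    process_id: int                       # process the page belongs to
    page_number: int
    physical_address: Optional[int]       # frame address while resident locally
    transferred_to: Optional[int] = None  # destination node (TO) if moved away


class NodePageTable:
    """Keeps the latest known location of every page of this node."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], NPTEntry] = {}

    def update(self, entry: NPTEntry) -> None:
        """Apply the information carried by an exchanged message."""
        self._entries[(entry.process_id, entry.page_number)] = entry

    def locate(self, process_id: int, page_number: int) -> Optional[int]:
        """Return the node believed to hold the page, or None if unknown."""
        entry = self._entries.get((process_id, page_number))
        if entry is None:
            return None
        if entry.transferred_to is not None:
            return entry.transferred_to
        return entry.owner_node  # assumption: untransferred pages stay at the owner
```

A periodic update every T quantum would simply call update() for each entry carried in the received messages.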

To map a logical address into a physical one, an address translation process is required. Clancey and Francioni (1990) introduced an algorithm to search for a page v that belongs to process p at node x, as described in Figure 3.

Later, Abaza (1992) tested many algorithms for page out, which happens when the memory is fully occupied and there is no place for the requested page of a process. In this case, a selected page is removed from the memory (Kermarrec and Pautet, 1994; Qanzu’a, 1996; Silberschatzs et al., 2005; Andrew, 1995). The tested algorithms were round robin (RR), least loaded neighbour (LLN) and least active neighbour (LAN).

Figure 3 NPT search algorithm

If the page does not exist in the memory, a page replacement strategy is applied: the system picks a victim page and sends it to a destination node. Several replacement techniques have been used with NPT, such as local least recently used (LLRU), least recently brought (LRB) and global least recently used or least recently brought (GLRUOB). The least recently used page among local and remote pages is then selected (Clancey and Francioni, 1990). A major problem with this technique is that each node has limited knowledge about the rest of the system, especially about the other nodes’ capacities.
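
As an illustration of victim selection, the following Python sketch implements plain least recently used replacement for the frames of a single node. It is a generic LRU model, not the specific LLRU, LRB or GLRUOB variants, whose details are not given here; the class name LocalLRUMemory is hypothetical.

```python
from collections import OrderedDict


class LocalLRUMemory:
    """A fixed number of frames with least-recently-used replacement."""

    def __init__(self, frames: int) -> None:
        self.frames = frames
        self._pages = OrderedDict()  # least recently used page comes first

    def access(self, page):
        """Reference `page`; return the evicted victim page, or None."""
        if page in self._pages:
            self._pages.move_to_end(page)  # hit: mark as most recently used
            return None
        victim = None
        if len(self._pages) >= self.frames:
            # Page fault with full memory: evict the least recently used page.
            victim, _ = self._pages.popitem(last=False)
        self._pages[page] = True
        return victim
```

In a DVM setting the returned victim would then be sent to a destination node rather than written straight to the HD.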

Abaza and Fellah (1997) discussed the DVM technique in a carrier sense multiple access with collision detection (CSMA/CD) environment. The authors divided the memory in each node into two sets: the home set and the copy set. The home set is loaded before program execution and remains static during it. The copy set is the set of pages that are owned by other nodes; these pages are distributed before program execution to the copy set part of another node’s memory. The results demonstrated performance improvements for the proposed technique (Clancey and Francioni, 1990). They partitioned the node memory into three areas:

• the home set of pages, which are the pages owned by the running local process

• the guest set of pages, which are the pages owned by other processes residing at the node of the faulting process

• the remote set of pages, which are the pages owned by the current node but residing in other nodes in the system.


They also tested four algorithms in this environment: the Accept-Until-Full, Odd/Even, Mod and Class Priority algorithms. They showed that the Class Priority algorithm outperforms the other methods in terms of memory performance, and that the DVM technique is more efficient in terms of memory utilisation.

Later, Fellah (2001) proposed a new hybrid memory management mechanism that combines the advantages of both pages and objects structure in object oriented systems while avoiding their disadvantages. His approach is based on the inherent relationship among objects for caching different types of remote pages and objects. High-profile objects (HPO) and low-profile objects (LPO) with their respective h-cache and Z-cache were introduced. He implemented his technique through simulation and showed that it dramatically reduces miss-penalties and outperforms earlier methods.

Qanzu’a (1996) introduced a new approach for managing DVM that replaces the NPT presented in Abaza (1992) with two data structures: the process page table (PPT) and the processor page table (PrPT). The PPT contains information about all pages of a certain process, while the PrPT contains information about all pages that reside in a certain node; these pages may belong to a local process or to a remote process on another node. Figure 4 shows the address translation in the PrPT and PPT technique.
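
A two-table scheme of this kind can be sketched as a two-step look up. The Python code below is an assumption-laden illustration rather than the translation procedure of Figure 4: it supposes that the PPT records which node holds each page of a process and that the PrPT of that node records the frame; all class, field and function names are hypothetical.

```python
class ProcessPageTable:
    """PPT: for one process, the node currently holding each of its pages."""

    def __init__(self, process_id):
        self.process_id = process_id
        self.page_to_node = {}  # page number -> node ID


class ProcessorPageTable:
    """PrPT: for one node, the frame of every resident page,
    whether it belongs to a local or a remote process."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.resident = {}  # (process ID, page number) -> frame number


def translate(ppt, prpts, page_number):
    """Return (node ID, frame) for a page, or None if no node holds it."""
    node_id = ppt.page_to_node.get(page_number)
    if node_id is None:
        return None
    frame = prpts[node_id].resident.get((ppt.process_id, page_number))
    if frame is None:
        return None
    return (node_id, frame)
```

Splitting the information this way keeps per-process data with the process and per-node data with the node, so neither table has to describe the whole system.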

Figure 4 Address translation in PrPT and PPT technique

2.2 DSVM-related work

DSVM has been discussed in many papers over the last two decades. Using this concept greatly increases the level of multiprogramming. Most of the related works concentrate on memory coherence (Iftode and Singh, 1999; Hsu and Tam, 1989), consistency (Iftode and Singh, 1999), shared space addressing in parallel computing (Iftode and Singh, 1999; Subashini and Bhuvaneswari, 2012), network performance for distributed systems (Bitam and Alla, 2006) and synchronisation and deadlock issues (Hsu and Tam, 1989; Blount and Butrico, 1993).

Hsu and Tam (1989) shed light on the issue of process synchronisation between different nodes. They showed that not only memory coherence but also process synchronisation is important in loosely coupled systems.

Kermarrec and Pautet (1994) studied page replacement in DSVM. Their system consists of groups of diskless embedded systems. They introduced a new algorithm that integrates page replacement and divides the physical memory of each node into two parts: backing storage and cache storage. The backing storage is reserved for the pages that are owned by the node; the page owner is determined at initialisation time. Other nodes know which node owns a page, so when a node flushes the page it is sent directly to the backing storage of the owner node. The cache storage is the part of the physical memory that stores pages owned by other nodes when they are needed by the executing process after an acquisition operation. When more memory is needed, the node may flush some pages in this storage to the backing storage of their owner nodes.
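
The split between backing storage and cache storage can be sketched as follows. This Python model is illustrative only: the class DisklessNode, its method names and the choice of the oldest cached page as the flush victim are assumptions, since the victim-selection rule of the original algorithm is not described here.

```python
class DisklessNode:
    """A node whose memory is split into backing storage (pages it owns)
    and cache storage (pages owned by other nodes)."""

    def __init__(self, node_id, cache_capacity):
        self.node_id = node_id
        self.cache_capacity = cache_capacity
        self.backing = {}  # page -> data, for pages owned by this node
        self.cache = {}    # page -> (owner ID, data), for acquired remote pages

    def acquire(self, page, owner, nodes):
        """Bring a remote page into cache storage, flushing one page if full."""
        if len(self.cache) >= self.cache_capacity:
            # Assumption: flush the page that was cached first.
            self.flush(next(iter(self.cache)), nodes)
        self.cache[page] = (owner, nodes[owner].backing[page])

    def flush(self, page, nodes):
        """Send a cached page back to the backing storage of its owner."""
        owner, data = self.cache.pop(page)
        nodes[owner].backing[page] = data
```

Because every page has a fixed owner, a flush never needs to search for a destination: the owner's backing storage always has a slot reserved for it.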

Blount and Butrico (1993) presented a real implementation of DSVM on an IBM RISC System/6000 workstation. The workstation has a high-speed fibre connection, and communication is controlled by low-overhead link protocols. They applied DSVM to the AIX v3 operating system running on this workstation and called it DSVM6k. The project had several goals: to modify the AIX v3 operating system to work as a distributed system using DSVM; to keep this modification as minimal as possible; and to make the modified operating system efficient, feasible and traceable, so that new approaches can be applied to it in future. No non-DSVM operations in AIX v3 were affected or downgraded by the modification. The modified AIX v3 was later offered as a commercial operating system.

To sum up, the main difference of our work compared with earlier published work (as shown above) is that the DRC approach enhances the throughput of the master node by connecting the reversal cache technique with the DSVM concept. This is carried out by distributing the cache load between all nodes to free more area for the master node based on a modified replacement algorithm.

3 Preliminaries

In DSVM generally, we have two choices for implementing data transfer between different nodes: either message-based communication, where data are transferred by value, or pass-by-reference communication via read/write operations. In the former, deadlocks can occur, so programmers should provide a deadlock detection algorithm side by side with message-based communication. The latter is not preferred when moving large data structures between nodes, since it will consume more time than the former (Blount and Butrico, 1993).

The important fields of the general address space of a page in a DSVM are the processor ID (PrID) and the page physical address. The PrID identifies the node in which the page resides at the moment, and the physical (real) address locates the page in that node’s memory. In this form, however, the address is not recognised by the other nodes in the DSVM system. A piece of software called the virtual memory manager (VMM) is therefore needed; it is responsible for translating (mapping) this physical address into a known virtual address that is recognised by all system nodes. Other fields may be included in this structure, such as the Present bit, which indicates whether the page is in memory or on disk, the Rights bits, which control read, write and execute access, and many other entries.
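
The address structure and the role of the VMM can be illustrated with a small Python sketch. The names GlobalPageAddress and VirtualMemoryManager, the sequential allocation of virtual addresses and the default access rights are all assumptions made for illustration; a real VMM would use its own mapping scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalPageAddress:
    """Address fields of one page in a DSVM (names are illustrative)."""
    pr_id: int             # PrID: node where the page currently resides
    physical_address: int  # real address inside that node's memory
    present: bool          # Present bit: True if in memory, False if on disk
    can_read: bool = True  # Rights bits
    can_write: bool = False
    can_execute: bool = False


class VirtualMemoryManager:
    """Maps node-local physical addresses to system-wide virtual addresses."""

    def __init__(self):
        self._by_virtual = {}
        self._next_virtual = 0

    def register(self, entry):
        """Assign the next free virtual address to `entry` and return it."""
        virtual = self._next_virtual
        self._by_virtual[virtual] = entry
        self._next_virtual += 1
        return virtual

    def resolve(self, virtual):
        """Return the (PrID, physical address, ...) record for a virtual address."""
        return self._by_virtual[virtual]
```

Any node can then refer to a page by its virtual address alone and leave the PrID and physical address to the VMM.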

3.1 Reversal cache system

Reversal cache is a novel approach in which a logical space of memory (memory of the master node) is used to store pages from all nodes over the LAN. The name of the technique explains itself: cache and reversal. By cache, we mean that evicted pages are cached to the owner node instead of being sent directly to the memory of some node, which is large. A bigger memory means more time is consumed searching for a page, whereas the reversal cache is smaller. By reversal, we mean that, whereas most caches are the first storage unit in the storage hierarchy, here the cache position is reversed from the first storage unit to a second one.

The reversal cache technique depends on dividing the LAN into groups; in each group, the system nominates a node and virtually cuts off a part of its memory. This part represents the reversal cache, and the node that possesses this cache is called the master node. In Abed et al. (2011), the authors took 10% of each master node’s memory and reserved it as a reversal cache. In the DVM technique, a page is paged out from a node after page replacement. This page is not sent directly to the node’s HD; instead, the node searches for another node that has a free frame in which to store the page (Clancey and Francioni, 1990).

Figure 5 shows a reversal cache system with several nodes, a master node and a backbone network that communicate via a layer 2 switch. Note that the master node holds the reversal cache in a part of its memory and this cache is accessible from all nodes.

4 Design methodology

The enhanced DVM technique presented in Abed et al. (2011) showed a good enhancement when it was applied to the algorithm of Qanzu’a (1996). The simulation results showed a reduction in page faults by an average of 15%.

Fewer page faults mean increased throughput, and higher utilisation is gained without adding any HW (resources), only by utilising the existing resources more. However, this enhancement applied to all nodes except the master node, for a simple reason: the reserved memory took 10% of its RAM. Thus, for the master node, the enhancement made by applying this method was cancelled out.

Figure 5 Overview of the reversal cache system (see online version for colours)

The aim of this paper is to enhance the throughput of the master node. This can be achieved by connecting the reversal cache technique with the DSVM concept. The reversal cache approach can be applied not only to the methodology of Qanzu’a (1996); it is a concept that can be applied to many other techniques, such as the NPT technique (Clancey and Francioni, 1990), to increase their throughput.

Distributing the reversal cache, as in DSVM, appears to be the solution: distribute the reversal cache physically over all the LAN nodes, keep it coherent as one unit and call it DRC. This balances the load by spreading the reversal cache over all nodes instead of reserving a large space in the master node, which downgrades its performance. The space is distributed fairly between the nodes, which increases the master node’s memory capacity. Figure 6 illustrates the idea of DRC.

Figure 6 shows how DRC looks in the system. Many nodes are connected by a layer 2 switch in the LAN (our DRC technique is implemented at the level of a layer 2 switch; there is no need for a layer 3 switch, since it is too slow and the technique is implemented within a LAN). In Abed et al. (2011), the cache was a single block in the master node, which placed an extra burden on the master node. In this work, we distribute the DRC over all nodes, so the reversal cache is physically separated but logically one unit.


Figure 6 Overview of the DRC (see online version for colours)

Here are some facts about the targeted system based on DRC technique:

• Loosely coupled nodes.

• Backbone network with a communication control protocol. We can use the TCP/IP protocol since our main concern is throughput.

• All the nodes are in the same LAN; so we only need layer 2 switch.

• The topology is star topology.

• Only memory can be shared between nodes. Neither CPU nor Input/Output subsystems are shared.

• Our system is built using message passing for communication.

• Nodes are heterogeneous since they do not have the same capabilities.

• Each node has an infinite HD space.

The distribution applies to the reversal cache pages and not to the LLN table. This table remains in the master node and is not distributed, since there is no need to move it.

In the next sections, we introduce the DRC page look up and its usage in address translation and explain the concept of DRC page out policy.

4.1 DRC page look up

Figure 7 presents the page look up in the DRC technique. A process P requests page x, which exists in node N: the page is first checked at node n, where n is the owner of process P and n ≠ N. Either a page hit occurs, or the page is sought in node n’s cluster cache. If it is not found there, the LLN table is checked to see whether page x is in other nodes’ memories. If the page is still not found, the search goes directly to node n’s HD, where the page resides. To improve cache throughput (increase cache hits), a new cache (called the reversal cache) is added for each cluster, which is considered an innovative technique in DVM. The difference in page look up between the method of Abed et al. (2011) and our DRC is shown in bold in Figure 7, where the usage of DRC is added for the first time in our novel approach.
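
The look up order described above can be sketched as a chain of checks. The Python function below is a simplified illustration, not the flow chart of Figure 7: the function name drc_lookup and its parameters are hypothetical, each memory is modelled as a set of page identifiers, and the nodes reachable through the LLN table are passed in as a plain dictionary.

```python
def drc_lookup(page, local_memory, drc, other_nodes):
    """Return where `page` is found, checking the levels in order.

    local_memory: set of pages resident at the requesting node n
    drc:          set of pages held in the cluster's distributed reversal cache
    other_nodes:  dict of node ID -> set of pages (nodes listed in the LLN table)
    """
    if page in local_memory:                      # page hit at node n
        return "local memory"
    if page in drc:                               # cluster (reversal) cache
        return "DRC"
    for node_id, memory in other_nodes.items():   # other nodes' memories
        if page in memory:
            return f"memory of node {node_id}"
    return "local HD"                             # last resort: node n's HD


# Example: the page is absent locally and from the DRC but held by node 4.
print(drc_lookup("x", {"a", "b"}, {"c"}, {3: {"d"}, 4: {"x"}}))
```

Only the final branch involves the HD; every earlier level is served from memory somewhere on the LAN.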

4.2 DRC page out

Page out is the process of getting rid of a page that is not needed right now. In Abed et al. (2011) and in our DRC method as well, the system is divided into groups. Each group is called a cluster, and each cluster has its own memory (cache). The cache resides in the master node’s memory in Abed et al. (2011), while in DRC it is distributed (as described earlier).

In our system, the cache contains the LLN data which provides the necessary information about number of free frames in each node. This table has two entries:

• processor ID (PrID)

• the number of free frames (f).

The aim of creating such a table is to select the LLN. The table is updated every T time units. The page out policy follows that described in Abed et al. (2011).
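
A table with these two entries can be sketched in a few lines of Python. The sketch assumes that the least loaded neighbour is the node with the most free frames and that a node with no free frame is never selected; the class name LLNTable and the exclude parameter (used to skip the requesting node itself) are hypothetical.

```python
class LLNTable:
    """Least loaded neighbour table: PrID -> number of free frames (f)."""

    def __init__(self):
        self._free_frames = {}

    def update(self, pr_id, free_frames):
        """Record the latest report from a node (done every T time units)."""
        self._free_frames[pr_id] = free_frames

    def least_loaded(self, exclude=None):
        """Return the PrID with the most free frames, or None if no other
        node has a free frame."""
        candidates = {
            pr_id: f
            for pr_id, f in self._free_frames.items()
            if pr_id != exclude and f > 0
        }
        if not candidates:
            return None
        return max(candidates, key=candidates.get)
```

When least_loaded() returns None, the page has no vacancy on the LAN and falls back to the local HD.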

Figure 7 Page look up in DRC technique (see online version for colours)

5 Simulation results

Simulation results show that the performance of the master node improved to reach the level of enhancement achieved for the other nodes when applying the reversal cache technique.

The results show that extending Abed et al.’s (2011) methodology with distributed memory works as a compatible system, which affirms the concept of distributed systems. The combination of both concepts forms the DRC concept.

5.1 Simulation environment

A Before talking about the simulator environment, we concentrate on the following aspects:

A1 The system is loosely coupled.

A2 Star topology is used.

A3 It is a message passing system.

A4 Simulation is expressed in terms of page fault rate.

B The simulator was built in C++. The system structure consists of four clusters with four master nodes; each cluster has five nodes. Thus, the overall system has 20 nodes. Each node has been assigned 60 frames of memory and an infinite HD. Under the methodology used in Abed et al. (2011), 10% of each master node’s memory would be reserved as a cache for each cluster, which means taking six frames from each master node’s memory. Here, instead, we distribute the reservation over all nodes, so each node has one frame reserved for use as DRC. The work load varies from 82 to 263 pages. Before execution, the pages are distributed at random. Once execution starts, the pages are swapped between node memory, DRC, other nodes’ memory and secondary storage devices. The LLN table is located in the master node of each cluster.
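
The frame budget implied by these figures can be checked with a few lines of arithmetic. The Python sketch below only restates the numbers given above; the constant and variable names are illustrative, and the per-cluster DRC total is simply derived from the stated one frame per node.

```python
CLUSTERS = 4
NODES_PER_CLUSTER = 5
FRAMES_PER_NODE = 60
RESERVED_PERCENT = 10   # share of master node memory reserved in Abed et al. (2011)
DRC_FRAMES_PER_NODE = 1  # reservation per node under DRC, as stated in the text

total_nodes = CLUSTERS * NODES_PER_CLUSTER

# Centralised reversal cache: all reserved frames come from the master node.
centralised_cache_frames = FRAMES_PER_NODE * RESERVED_PERCENT // 100
master_frames_centralised = FRAMES_PER_NODE - centralised_cache_frames

# Distributed reversal cache: every node in the cluster gives up one frame.
drc_frames_per_cluster = DRC_FRAMES_PER_NODE * NODES_PER_CLUSTER
master_frames_drc = FRAMES_PER_NODE - DRC_FRAMES_PER_NODE

print(f"nodes in the system:              {total_nodes}")
print(f"master frames, centralised cache: {master_frames_centralised}")
print(f"master frames, DRC:               {master_frames_drc}")
print(f"DRC frames per cluster:           {drc_frames_per_cluster}")
```

The comparison makes the motivation explicit: under DRC the master node keeps nearly all of its frames, at the cost of one frame on each regular node.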

5.2 Results

5.2.1 Page faults

The simulation compared CVM, Qanzu’a’s method (Qanzu’a, 1996), Abed et al.’s (2011) method and DRC.

Figure 8 shows the number of page faults for CVM, Qanzu’a’s method (Qanzu’a, 1996), Abed et al.’s (2011) method and DRC as a function of the number of work load pages:

Page faults No. for 10K page references = f(No. of work load pages).

Figure 8 shows the page fault rate for node 2 in cluster 3 (a regular node) when using CVM, Qanzu’a’s method (Qanzu’a, 1996), Abed et al.’s (2011) method and the DRC method. The X-axis shows the work load (number of pages) and the Y-axis shows the number of page faults for ~10,000 page references. According to the results, the DRC technique outperformed CVM by 25% and Qanzu’a’s method (Qanzu’a, 1996) by 15%. DRC also had nearly the same performance as Abed et al.’s (2011) method, which means that distributing the reversal cache over all the nodes does not affect the enhancement made by Abed et al.’s (2011) method.

Figure 9 shows the page faults against workload (number of pages) for master node 1 of cluster 3 under DRC, the method of Abed et al. (2011), Qanzu’a’s method (Qanzu’a, 1996) and CVM. Note the enhancement produced by distributing the reversal cache over all the nodes. With the method of Abed et al. (2011) the master node shows no enhancement, whereas with DRC the enhancement is clear: the number of page faults decreases by nearly 8%. In effect, the DRC methodology overcomes the master node page fault problem, which was one of the main defects of the method of Abed et al. (2011). According to the simulation results, the DRC technique on the master node outperformed Abed et al. (2011) by 14%, Qanzu’a (1996) by 15% and CVM by 25%.

To sum up, this paper addresses a weakness of Abed et al. (2011), namely the memory performance of the master node, by distributing its reversal cache to the other nodes in the cluster. The results show that the master node now gains the same enhancement as the regular nodes, which was not the case in Abed et al. (2011).
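The percentages quoted above can be read as relative reductions in the number of page faults. The helper below is a minimal sketch assuming that reading; the paper does not give the formula explicitly, and the function name and any counts passed to it are hypothetical rather than taken from the simulation.

```cpp
#include <cassert>

// Percentage by which `improved` lowers the fault count relative to
// `baseline`. Assumes the reported figures are relative reductions.
double percentReduction(double baseline, double improved) {
    assert(baseline > 0.0);
    return (baseline - improved) / baseline * 100.0;
}
```

For example, a hypothetical drop from 1000 faults to 750 faults corresponds to a 25% reduction under this definition.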

Figure 8 Comparison between CVM, Qanzu’a (1996), Abed et al. (2011) and DRC methods in regular node (see online version for colours)

Figure 9 Comparison between CVM, Qanzu’a (1996), Abed et al. (2011) and DRC methods in master node (see online version for colours)


5.2.2 Thrashing consistency

Thrashing is a major problem for CVM systems, since most of the execution time of a process is spent fetching pages from the HD. As the page fault rate decreases, thrashing is reduced. Another important issue for any OS is that thrashing is not consistent over time. Plotted as a function of time, it has an irregular shape, which makes the HD work irregularly and increases the number of segments, leading to more latency. The DRC technique gives thrashing a consistent pattern, so that the HD works at an even rate throughout. The simulation results confirm this, as shown in Figure 10.
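One way to put a number on this notion of consistency is the coefficient of variation of HD page fetches per time window: a lower value means a more even HD load. The paper does not define such a metric, so the sketch below is an assumption made for illustration, and the function name and sample values are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Coefficient of variation (standard deviation divided by mean) of the
// number of HD page fetches observed in each time window.
// This metric is an illustrative choice, not one defined in the paper.
double fetchVariation(const std::vector<double>& fetchesPerWindow) {
    assert(!fetchesPerWindow.empty());
    double sum = 0.0;
    for (double f : fetchesPerWindow) sum += f;
    const double mean = sum / fetchesPerWindow.size();
    assert(mean > 0.0);

    double squared = 0.0;
    for (double f : fetchesPerWindow) squared += (f - mean) * (f - mean);
    return std::sqrt(squared / fetchesPerWindow.size()) / mean;
}
```

A perfectly even series such as {10, 10, 10, 10} gives 0, while a bursty series gives a larger value, so two thrashing traces can be compared by this single figure.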

Figure 10 DRC thrashing vs. CVM (see online version for colours)

Thrashing is undesirable: more of it means more page faults, lower throughput, a shorter HD life and higher HD power consumption, especially when there are numerous computers and servers. As depicted in Figure 10, thrashing is more stable under the DRC technique and is reduced by nearly 27% compared with CVM. This has several advantages:

• page seeking time reduction

• data congestion reduction especially when taking the existing LAN into consideration

• power saving when using HD, since the HD consumes a lot of energy compared with RAM because of the usage of its mechanical parts

• HD heat reduction

• HD lifetime extension.

6 Conclusions

The DRC technique shows good enhancements in the overall performance of the computer system without adding any extra HW. Simulation results showed a fair enhancement in throughput, obtained by assigning part of the memory of each master node to build a reversal cache and then sharing it out to all nodes. The paper also addressed the essential aspects of the DRC method and DVM theory that affect the performance of the master node.

On the basis of the simulation results, our technique has outperformed the previous techniques as follows:

• Reduced page faults by 15% compared with Qanzu’a’s method (Qanzu’a, 1996) and by 25% compared with CVM.

• Reduced page faults of the master node by 14% compared with the method of Abed et al. (2011).

• Reduced thrashing by 27% compared with CVM.

• Although the programming complexity is increased, this is compensated for by the enhanced throughput and the increased OS efficiency.

Future research may simplify our algorithm without affecting the enhancement it provides, which would lead to less complex implementations. The technique could be included implicitly in future operating system implementations for small and medium-sized LANs to increase efficiency.

References

Abaza, M. (1992) Distributed Virtual Memory Systems, PhD Thesis, The University of Wisconsin-Milwaukee.

Abaza, M. and Fellah, A. (1997) ‘Distributed virtual memory in the CSMA/CD environment’, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, 10 Years PACRIM 1987–1997, Networking the Pacific Rim, Vol. 2, pp.778–781.

Abed, S., Bqerat, A.H., Alouneh, S. and Mohd, B.J. (2011) ‘A novel approach to enhance distributed virtual memory’, Computers and Electrical Engineering, Vol. 38, No. 2, pp.388–398.

Andrew, S.T. (1995) Distributed Operating Systems, Prentice Hall International Editions.

Barrera, J.S. (1993) Odin: A Virtual Memory System for Massively Parallel Processors, Microsoft Research, Microsoft Corporation One Microsoft Way Redmond, WA 98052.

Bilas, A., Liao, C. and Singh, J.P. (1999) ‘Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems’, Proceedings of the 26th International Symposium on Computer Architecture, pp.282–293.

Bitam, M. and Alla, H. (2006) ‘Performance evaluation of communication networks for distributed systems’, Int. J. of Computer Applications in Technology, Vol. 25, No. 4, pp.218–226.

Blount, M.L. and Butrico, M. (1993) ‘DSVM6K: distributed shared virtual memory on the RISC System/6000’, Compcon Spring ’93, Digest of Papers, pp.491–500.

Christine, M. and Isabelle, P. (1997) ‘A survey of recoverable distributed shared virtual memory systems’, IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 9, pp.959–969.

Chu, R., Xiao, N. and Lu, X. (2007) ‘A clustering model for memory resource sharing in large scale distributed system’, International Conference on Parallel and Distributed Systems, Vol. 2, pp.1–8.

Clancey, P.M. and Francioni, J.M. (1990) ‘Distribution of pages in a distributed virtual memory’, International Conference on Parallel Processing, pp.258–265.

Dwarkadas, S., Gharachorloo, K., Kontothanassis, L., Scales, D.J., Scott, M.L. and Stets, R. (1999) ‘Comparative evaluation of fine- and coarse-grain approaches for software distributed shared memory’, Fifth International Symposium on High-Performance Computer Architecture, pp.260–269.

Fellah, A. (2001) ‘On virtual page-based and object-based memory managements in distributed environments’, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Vol. 1, pp.311–314.

Fellah, A. and Abaza, M. (1999) ‘On page blocks in distributed virtual memory systems’, IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp.605–607.

Geva, M. and Wiseman, Y. (2007) ‘Distributed shared memory integration’, IEEE International Conference on Information Reuse and Integration, pp.146–151.

Hsu, M. and Tam, V.O. (1989) ‘Transaction synchronization in distributed shared virtual memory systems’, Proceedings of the 13th Annual International Computer Software and Applications Conference COMPSAC’89, pp.166–175.

Iftode, L. and Singh, J.P. (1999) ‘Shared virtual memory: progress and challenges’, Proceedings of the IEEE, Special Issue on Distributed Shared Memory, Vol. 87, No. 3, pp.498–507.

Jun, W.C. (2009) ‘The research on the dynamic paging algorithm based on working set’, Second International Conference on Future Information Technology and Management Engineering, pp.396–399.

Kermarrec, Y. and Pautet, L. (1994) ‘Integrating page replacement in a distributed shared virtual memory’, Proceedings of the 14th International Conference on Distributed Computing Systems, pp.355–362.

Li, K. (1986) Shared Virtual Memory on Loosely Coupled Multiprocessors, PhD Thesis, Dept. of Computer Science, Yale University, New Haven.

Li, K. (1988) ‘IVY: a shared virtual memory system for parallel computing’, International Conference on Parallel Processing, Vol. 2, pp.94–101.

Li, K. and Hudak, P. (1989) ‘Memory coherence in shared virtual memory systems’, ACM Trans. Comput. Syst., Vol. 7, No. 4, pp.321–359.

Malkawi, M., Knox, D. and Abaza, M. (1991) ‘Dynamic page distribution in distributed virtual memory systems’, Proceedings of the Fourth ISSM International Conf. on Parallel and Distributed Computing and Systems, pp.87–91.

Nuttall, M. (1996) ‘Survey of systems providing process or object migration’, Operating Systems Review, Vol. 28, pp.64–80.

Qanzu’a, G.E. (1996) Practical Enhancements of Distributed Virtual Memory, MS Thesis, Jordan University of Science and Technology.

Silberschatz, A., Galvin, P. and Gagne, G. (2005) Operating System Concepts, 7th ed., John Wiley and Sons.

Subashini, G. and Bhuvaneswari, M.C. (2012) ‘Task allocation in distributed computing systems using adaptive particle swarm optimisation’, Int. J. of Computer Applications in Technology, Vol. 44, No. 4, pp.293–302.

Zhang, X., Qu, Y. and Xiao, L. (2000) ‘Improving distributed workload performance by sharing both CPU and memory resources’, Proceedings of the 20th International Conference on Distributed Computing Systems, pp.233–241.
