JaSkel: A Java Skeleton-Based Framework for Structured Parallel and Distributed Computing

João Ferreira, João Sobral, and Alberto Proença

This work was supported in part by the PPC-VM project POSI/CHS/47158/2002, funded by the Portuguese FCT (POSI) and by European funds (FEDER).

Abstract—This paper presents JaSkel, a skeleton-based framework to develop parallel and distributed applications. The framework provides a set of Java abstract classes as a skeleton catalog, which implements recurring parallel interaction paradigms. This approach aims to improve the programmer's productivity and the code's efficiency and portability. It also helps to structure scalable applications through the refinement and composition of skeletons. Evaluation results show that the use of skeletons does contribute to improve both the development time and the execution performance of an application.

Index Terms—Parallel programming, distributed programming, programming environments, skeletons, Java
I. INTRODUCTION
CLUSTER and Grid computing environments require adequate tools to structure scalable applications, taking advantage of the underlying multi-layer architecture, namely a grid of clusters of shared-memory multi-core processing nodes.
Skeleton based tools are especially attractive for these
environments, since they provide a structured way to develop
scalable applications, supporting refinement or composition of
base skeletons. This work addresses the use of skeletons
applied to an object-oriented environment, Java.
A skeleton catalog can be a collection of code templates
implemented as an abstract class library. The catalog aims to
help the programmer to create Java code for a
parallel/distributed computing platform. Skeletons are
abstractions modeling a common, reusable parallelism
exploitation pattern [1][2]. A skeleton may also be seen as a higher-order construct (i.e., parameterized by other pieces of code) that defines a particular parallel behavior.
The computing development environment presented here contains a skeleton-based framework, JaSkel, as a support structure where software projects can be organized and developed. Programmers select appropriate skeletons and their grouping, and fill in the gaps with pieces of domain-specific code. The environment generates the source code with all the relevant components for scalable computing environments, including run-time adaptive load distribution.
This work differs from other research environments [3][4] in the way it uses different and orthogonal components for distinct tasks: a skeleton-based framework to structure scalable applications (which may use one or more processors), a source code generator that supports distribution of selected object classes, and an adaptive run-time load and data scheduler. The independence between these components lets programmers develop, test and run structured applications in a non-distributed environment; it also allows the efficient use of skeletons on distributed and shared-memory architectures, including clusters of SMPs with multi-core processor chips. JaSkel also exhibits a competitive advantage over other skeleton frameworks: JaSkel is an inheritance-based framework. Domain-specific code is specified by refining the provided skeletons, implementing abstract methods. This structure overcomes the lack of extensibility of other skeleton frameworks, since the provided skeletons can be modified by inheritance. The framework itself is organized as a hierarchy of classes, e.g., a concurrent farm derives from a sequential farm and a demand-driven farm derives from a concurrent farm.
More complex scalable structures can be obtained through a
free composition of supplied skeletons such as multi-level
farms or farms of pipelines.
The description of the code generator and the run-time load distribution tools is beyond the scope of this paper.
The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 describes the skeleton-based Java framework built to help programmers structure scalable applications. Section 4 presents a qualitative and performance evaluation of skeletons from the skeleton-based framework. The last section discusses the obtained results and draws some conclusions on the work done so far.
II. RELATED WORK
Several skeleton-based systems have been proposed, and most support skeleton composition (e.g., nesting) [5][6][2]. The most relevant Java environments for parallel programming based on skeletons are Lithium [4] and CO2P3S [3]. Lithium supports skeletons for pipeline, farm and divide & conquer, as well as other low-level skeletons (map, composition). CO2P3S is based on generative patterns, where skeletons are generated and the programmer must fill the provided hooks with application-specific functionality.
The main differences between JaSkel and these approaches are:
- our approach provides separate tools to structure parallel
applications and to implement communication and distribution code;
- JaSkel explores class hierarchy and inheritance along with object composition.
A skeleton hierarchy overcomes the main limitations of other alternatives, supporting refinement of the provided skeletons, including the implementation of non-pure functional skeletons (e.g., skeletons that maintain state between invocations).
III.  A JAVA SKELETON-BASED FRAMEWORK
Many parallel algorithms share the same generic patterns of computation and interaction. Skeletal programming proposes that such patterns be abstracted and provided as a programmer's toolkit. We call these abstractions algorithmic skeletons, parallel skeletons or simply skeletons.
We define parallel skeletons as abstractions modeling a common, reusable parallelism pattern. A skeleton may also be seen as a higher-order construct (i.e., parameterized by other pieces of code and other parameters) that defines a particular parallel behavior.
Skeletons provide programmers with simple interfaces to an incomplete structure that can be parameterized by the number of processors, domain-specific code or data distribution; programmers can focus on the computational side of their algorithms rather than on the control of parallelism. Since the lower-level operations are hidden, programmers' productivity increases.
Skeleton-based programs are smaller, easier to read and maintain, and less prone to error. Since many parallel applications share the same interaction patterns, these properties make skeletons a potential tool for code reusability. Skeletons also promote code portability, since the same skeleton can be used on different architectures: the skeleton implementation can be optimized for a particular target platform. This way, no source code modification is required to move to a different architecture.
Skeleton-based approaches promote a more orthogonal approach to parallel computing, since parallelism-related issues are delegated to the skeleton implementation. Application-specific code completes the skeleton structure. Moving from one parallelization strategy to another can be as simple as changing the skeleton base classes.
JaSkel supports skeleton refinement through skeleton hierarchies, which can be used to develop more application-specific skeletons.
There is always a trade-off between these benefits and the resulting performance. However, the skeletal approach provides programmers an easy path to optimize the computational part of their algorithms, and skeletons may be carefully tuned to run efficiently in the underlying architecture.
All these issues – productivity, reusability, portability and orthogonality – are illustrated below, while the efficiency of this framework (and the associated environment) is evaluated in the following section.
A. Common skeletons
Skeletons can be divided into two main classes: data parallel skeletons and task parallel skeletons. Data parallel skeletons are based on a distributed data structure, where data is distributed among several processors; each processor usually executes the same code on different data blocks. Task parallel skeletons are based on the distribution of the execution of independent tasks over several processors. The most common skeletons for scalable applications are described below:
1) Farm: The Farm skeleton is a data parallel skeleton and it consists of a master entity and multiple workers. The master decomposes the input data into smaller independent pieces and sends them to each worker. Workers process the data and send their results back to the master, which merges them into the final result.
Farm skeletons may use a static or dynamic load-distribution approach. In the first case, all data is distributed at the beginning of the computation. This strategy is suitable for homogeneous environments and for regular problems. The second approach is better suited to unbalanced problems or heterogeneous environments.
There is an interesting farm skeleton variation with dynamic load distribution, where data is sent only when workers request it: the Dynamic Farm or Farm-on-Demand. This variation improves efficiency in heterogeneous environments, when there is a large number of data pieces, reducing workers' idle time; however, communication costs may increase and performance may degrade.
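The demand-driven idea can be illustrated, outside JaSkel, with a few lines of plain Java; this is only a conceptual sketch, and the names (DemandFarmSketch, process) are made up for illustration:

import java.util.Queue;

class DemandFarmSketch {
    // Workers pull the next data piece from a shared queue whenever they
    // become idle, instead of receiving a fixed share up front.
    static void run(Queue<int[]> tasks, int nWorkers) throws InterruptedException {
        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(() -> {
                int[] piece;
                while ((piece = tasks.poll()) != null)
                    process(piece);            // domain-specific work on one piece
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();     // wait until all work is consumed
    }
    static void process(int[] piece) { /* domain-specific computation */ }
}

A thread-safe queue such as java.util.concurrent.ConcurrentLinkedQueue provides the poll operation used above.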
A single master can be a bottleneck for a large number of workers, but skeletons can be tuned to handle these limitations; a farm skeleton can, for instance, use several masters to improve performance.
2) Pipeline: The Pipeline skeleton is a task parallel skeleton and it corresponds to the well-known functional composition. The tasks of the algorithm are serially decomposed and each processor executes one task. Each processor/task is usually called a stage.
In most cases, input data is sent to the first stage and then flows between the adjacent stages of the pipeline. The computation ends when the last stage finishes processing. However, the initial input data can also be decomposed into smaller blocks, which uses the workers more efficiently, since the pipeline can process different data blocks at the same time.
3) Divide-and-conquer: The Divide-and-Conquer skeleton corresponds to the well-known sequential algorithm of the same name. Basically, a problem is divided into subproblems and each subproblem is solved independently. Subproblems are independent from each other and can be solved concurrently on different processors. The results of each subproblem are combined to produce the final result.
4) Heartbeat: The Heartbeat skeleton models a very common pattern present in many parallel algorithms: data is spread among workers, each updates a particular part, and new data values depend on values held by other workers. Each worker behaves like the beating of a heart: it gathers new data, processes it and sends it out; then it repeats all over [7]. Heartbeat is appropriate for iterative algorithms and it is a communication-intensive skeleton.
5) Skeleton composition: Conceptually, skeletons may be composed in order to get different interaction patterns. If a worker of a farm can be expressed as a heartbeat skeleton, then it seems appropriate to write it using the heartbeat skeleton.
B. JaSkel framework
The current JaSkel prototype (JaSkel 1.01) provides farm and pipeline skeletons, which have been successfully evaluated for parallel coding. Later versions will be extended to include the other parallel skeletons and to improve the current ones.
To write a scalable application using JaSkel, a programmer must perform the following steps:
1) structure the parallel program and express it using the available skeletons;
2) refine the supplied abstract classes and write the domain-specific code used as skeleton parameters;
3) write the code that starts the skeleton, defining other relevant parameters (the number of processors, the load distribution policy, ...).
The current JaSkel prototype provides the programmer with different versions of the farm and pipeline skeletons:
- a fully sequential farm;
- a concurrent farm, which creates a new thread for each worker;
- a dynamic farm, which sends data to workers only when they request it;
- a fully sequential pipeline;
- a concurrent pipeline, which creates a new thread for each data flow.
The current version does not provide explicit support for divide & conquer and heartbeat skeletons, although these can be developed through refinement of the farm skeleton. A divide & conquer skeleton can be implemented by multi-level farming, where farming levels are created on demand. A heartbeat skeleton can be implemented by a repeated farming, where workers keep data between iterations.
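As a hypothetical illustration of the first idea (not JaSkel API), a divide & conquer worker could create an inner farm on demand whenever a task is still too large; here DCFarm is assumed to be a FarmConcurrent subclass whose split halves the task and whose join combines the partial results, and threshold and solve are illustrative names:

import java.util.Vector;

public class DCWorker extends Compute {
    private final int threshold;
    public DCWorker(int threshold) { this.threshold = threshold; }

    public Object compute(Object input) {
        int[] task = (int[]) input;
        if (task.length <= threshold)
            return solve(task);                 // base case: solve sequentially
        Vector inner = new Vector();            // workers of the inner farm
        inner.add(new DCWorker(threshold));
        inner.add(new DCWorker(threshold));
        DCFarm farm = new DCFarm(inner, task);  // inner farming level, created
        farm.eval();                            // on demand for this task
        return farm.getResult();
    }
    private Object solve(int[] task) { /* sequential solve */ return task; }
}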
A JaSkel skeleton is a simple Java class that implements the Skeleton interface and extends the Compute class. The Skeleton interface defines a method eval that must be defined by all the skeletons. This method starts the skeleton activity.
To create objects that will perform domain-specific computations, the programmer must create a subclass of the class Compute (inspired by Lithium). The Compute abstract class declares an abstract method public abstract Object compute(Object input) that defines the domain-specific computation involved in a skeleton.
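The text above implies the following minimal shape for these two abstractions (each type in its own file); the actual JaSkel classes may declare additional members:

public interface Skeleton {
    void eval();   // starts the skeleton activity
}

public abstract class Compute {
    // domain-specific computation, supplied by refining this class
    public abstract Object compute(Object input);
}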
For instance, to create a Farm, a programmer needs to perform the following steps (suggested by the supplied skeleton template):
1) create the worker's class, which is a subclass of Compute;
2) define the worker's inherited method public Object compute(Object input);
3) create the master's class, which is a subclass of Farm;
4) define the methods public Collection split(Object initialTask) and public Object join(Collection partialResults);
5) create a new instance of the master's class and call the method eval; this method will basically perform the following steps (a minimal sketch follows this list):
- it creates multiple workers;
- it splits the initial data using the defined split method;
- it calls the compute method of each worker with the pieces of data returned by the method split;
- it merges the partial results using the defined join method.
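The following is a minimal sketch of how a sequential farm's eval method could carry out these steps; it is not the actual JaSkel source, and the fields workers, inputTask and result are assumptions for illustration:

public void eval() {
    Collection pieces = split(inputTask);            // split the initial data
    Vector partialResults = new Vector();
    Iterator w = workers.iterator();
    for (Iterator it = pieces.iterator(); it.hasNext(); ) {
        if (!w.hasNext()) w = workers.iterator();    // cycle over the workers
        Compute worker = (Compute) w.next();
        partialResults.add(worker.compute(it.next())); // compute one piece
    }
    result = join(partialResults);                   // merge the partial results
}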
Figure 1 shows the Farm skeleton UML class diagram. Some methods were omitted.
The specialization or the creation of a new skeleton is done by class refinement. Figure 2 illustrates the concurrent farm skeleton UML class diagram, which extends the sequential farm skeleton.
Fig. 1. The sequential farm skeleton implements the Skeleton interface and extends the Compute class to allow skeleton composition.
Both the skeletons and the entities that will perform domain-specific code extend the class Compute. Figure 3 illustrates the pipeline skeleton, which also extends the Compute class.
JaSkel skeletons are subclasses of the Compute class to allow composition. The method public Object compute(Object input) on skeletons usually calls the eval method to start the skeleton activity.
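In other words, a skeleton's compute can wrap its own eval, so that a whole farm may act as a single worker inside another skeleton. The sketch below assumes a setTask method for feeding the input; that name is an illustration, not confirmed JaSkel API:

public Object compute(Object input) {
    setTask(input);      // feed the input to this (inner) skeleton
    eval();              // run the inner skeleton to completion
    return getResult();  // its result is the outer worker's result
}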
C. Building JaSkel applications
The best way to show how to build a skeleton-based application is through an example: to find and count all the prime numbers up to N.
One solution (from v/dads/HTML/sieve.html) begins with an (unmarked) array of integers from 2 to N; the first unmarked integer, 2, is the first prime; mark every multiple of this prime; then repeatedly take the next unmarked integer as the next prime and mark every multiple of it. This algorithm was devised by Eratosthenes of Cyrene (276 BC - 194 BC).
A Java implementation of this algorithm marks the multiples by setting them to 0. The implementation consists of two entities: a number generator and a prime filter. The first generates the input integer array [2..N] and the latter filters out the non-prime integers.
A prime filter holds a list with the primes from 2 to sqrt(N) (called the filter), and every integer n from the input array is divided by each prime of this list; if no divisor is found, then n is prime.
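In plain sequential Java, and independently of JaSkel, this filtering step could look as follows (the method name and signature are illustrative):

static int[] filterPrimes(int[] filter, int[] candidates) {
    int count = 0;
    int[] out = new int[candidates.length];
    for (int n : candidates) {
        boolean isPrime = true;
        for (int p : filter) {
            if (p * p > n) break;                      // no divisor up to sqrt(n)
            if (n % p == 0) { isPrime = false; break; }
        }
        if (isPrime) out[count++] = n;
    }
    return java.util.Arrays.copyOf(out, count);        // keep only the primes
}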
This algorithm can be parallelized in different ways:
- as a simple farm: the input array is decomposed into smaller pieces, and each piece is sent to a prime filter; each prime filter tests the input integers using the filter [2..sqrt(N)];
- as a simple pipeline: each prime filter constitutes a pipeline stage and uses a filter with a distinct set of values in the range [2..sqrt(N)]; the input data is sent to the first pipeline stage and then flows between the adjacent stages; when it reaches the end, all the non-prime integers have been filtered out;
- as a composition of farms and/or pipelines.
The examples below show how the JaSkel framework codes this algorithm using different scalable structures. The implementation counts the primes up to 10,000,000, and the code was slightly modified to improve readability.
1) Primes sieve as a farm: the prime filter (the farm worker) is illustrated in Code 1. Its main method is filter  (line 07), which filters the given integer array. The compute  method (lines 10-12), needed to define the skeleton's domain-specific code, delegates its job to the method filter . Note that the class FarmPrimeFilter  is a subclass of Compute (line 01).
Code 2 illustrates the class GeneratorFarm (the farm master): it extends the skeleton FarmConcurrent (line 01) and uses private methods to implement the method split (lines 05-07) and the method join (lines 08-10).

01  public class FarmPrimeFilter extends Compute {
02    int[] myPrimes; // buffer to hold primes already calculated
03    …
04    public void init(int minP, int maxP) {
05      … // computes primes between minP and maxP, updates myPrimes list
06    }
07    public synchronized int[] filter(final int[] num) {
08      … // removes non-primes from the list using myPrimes list
09    }
10    public Object compute(Object input) {
11      return this.filter((int[]) input);
12    }
13  }

Code 1. The FarmPrimeFilter class.
Fig. 3. Pipeline skeleton; it implements the Skeleton interface and extends the Compute class to allow skeleton composition.
Fig. 2. The concurrent farm skeleton extends the farm skeleton, rewriting the eval method.
Code 3 shows the code that connects these entities:
- it creates several prime filter objects (lines 08-12) and initializes each filter (method init);
- it creates a new farm generator object, setting its parameters: the workers and the input data (line 13);
- it starts the skeleton activity, calling the method eval (line 14);
- it gets the final result, using the method getResult (line 15).
2) Primes sieve as a farm of farmers: A two-level farming can be easily created with loop nesting: only a few modifications to lines 08-11 are required to create this hierarchy (Code 4). The inner loop in lines 05-09 creates each inner farm in the same way as in the previous example. The outer loop in lines 03-12 creates the main farm, where each worker is itself a farm.
3) Primes sieve as a pipeline: the prime filter for a pipeline is identical to a worker in a farming (Code 1). The only difference between the farm and the pipeline prime filter is that the former is a subclass of Compute, while the latter is a subclass of PipelineWorker. The PipelineWorker class is a subclass of Compute, but it defines and implements three new methods (sketched below):
- setNextWorker, which sets the pipeline's stages;
- sendNext, which sends data to the next stage;
- eval, which calls the method compute and makes the data flow between the adjacent stages.
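This behaviour might be realised as in the following hypothetical sketch; the real JaSkel class may differ in signatures and in how threading and termination are handled:

public abstract class PipelineWorker extends Compute {
    protected PipelineWorker next;  // the following stage, or null at the end

    public void setNextWorker(PipelineWorker next) { this.next = next; }

    protected void sendNext(Object data) {
        if (next != null) next.eval(data);  // push the data to the next stage
    }

    public void eval(Object data) {
        sendNext(compute(data));  // process this stage, then send downstream
    }
}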
The pipeline class generator is defined in the same way as the farm class generator (Code 2), but it extends the PipelineConcurrent class. Code 5 shows the main code differences between a pipeline and a farming approach (Code 3):
- a list of prime filters is created; all the filters are different and disjoint (lines 09-10);
- it creates a new pipeline generator object, setting the parameters stage list and input data (line 13).
4) Primes sieve as a pipeline of farms: combining a pipeline with a farming is straightforward. Code 6 presents code equivalent to Code 4 (a two-level farm), now using a pipeline of farms. The most relevant change is that the outer loop creates a set of pipeline stages where each stage is a concurrent farm.
IV. PERFORMANCE EVALUATION
This section aims to show that the benefits of the skeleton-based framework do not impose a performance penalty.
The first part presents execution times for the previous case study – the prime number sieve – while the second evaluates the performance of a reference algorithm and its implementation: a parallel ray tracer from the Java Grande Forum [8]. Figure 4 compares execution times and speed-ups for the two basic versions of the farm and pipeline prime sieve presented in the previous section.

…
05  int range = min / nstages;
06
07  Vector v = new Vector();
08  for(int j=0; j<nstages; j++) {
09    PipePrimeFilter pf = new PipePrimeFilter();
10    pf.init(j * range + 1, (j + 1) * range);
11    v.add(pf);
12  }
13  GeneratorPipeline g = new GeneratorPipeline(v, ar);
14  g.eval();
…

Code 5. The main pipeline code.
int range = min / nstages;
Vector vout = new Vector();
for(int i=0; i<nstages; i++) {        // for each pipeline stage
  Vector vin = new Vector();
  for(int j=0; j<nworkers; j++) {     // creates the workers of each stage
    FarmPrimeFilter pf = new FarmPrimeFilter();
    pf.init(i * range + 1, (i + 1) * range);
    vin.add(pf);
  }
  GeneratorFarm gin = new GeneratorFarm(vin, null);
  vout.add(gin);
}
GeneratorPipeline g = new GeneratorPipeline(vout, ar);
…

Code 6. A pipeline of farms.
01  public class GeneratorFarm extends FarmConcurrent {
02    public GeneratorFarm(Collection workers, Object inputTask) {
03      super(workers, inputTask);
04    }
05    public Collection split(Object initialTask) {
06      return(Packs.split((int[]) initialTask, blocksize));
07    }
08    public Object join(Collection partialResults) {
09      return(Packs.join((Vector) partialResults));
10    }
11  }

Code 2. The generator farm class.
01 …
02 Vector vout = new Vector();
03 for(int i=0; i<outerworkers; i++) {
04   Vector vin = new Vector();
05   for(int j=0; j<innerworkers; j++) {
06     FarmPrimeFilter pf = new FarmPrimeFilter();
07     pf.init(1, max);
08     vin.add(pf);
09   }
10   GeneratorFarm gin = new GeneratorFarm(vin, null);
11   vout.add(gin);
12 }
13 GeneratorFarm g = new GeneratorFarm(vout, ar);
14 …

Code 4. A two-level farm.
01 int nprocess;                // environment-specified number of workers
02 int max = 10000000;          // max prime
03 int sMax = (int) Math.sqrt(max);       // square root of the max prime
04 int[] ar = new int[(max-sMax+1)/2];    // buffer to hold odd numbers to filter
05 int min = sMax+1;
06 for(int i=min; i<=max; i+=2) ar[(i-min)/2]=i; // place odd numbers in the list
07 Vector v = new Vector();
08 for(int j=0; j<nprocess; j++) {
09   FarmPrimeFilter pf = new FarmPrimeFilter(); // create filter
10   pf.init(1, max);           // initialize the filter
11   v.add(pf);                 // store filter reference into vector v
12 }
13 GeneratorFarm g = new GeneratorFarm(v, ar);   // create farmer
14 g.eval();                    // starts the farming process
15 Object o = g.getResult();    // get results
16 System.out.println("Number of Primes: " + Packs.check((int[]) o));

Code 3. The main farm code.
These results were collected on a dual Xeon 3.2 GHz 1 MB L2 cache computing node, with multi-threading enabled (i.e., with four Intel HT virtual processors), running CentOS 4.1. Presented values are the median of 5 runs on a dedicated node.
Farming execution times were collected using one pack of numbers per worker, while the pipeline version was tuned to use two packs of numbers per pipeline stage, which allows several packs to be processed concurrently within the pipeline. The farming versions performed better, as expected, due to a better load distribution. In the pipeline, a regular block distribution of prime filters among stages is not well balanced (the first pipeline stages have heavier workloads). Additionally, there is an initial delay to fill the pipeline, and each pack must cross all the pipeline stages, increasing the sequential workload. However, a better tuned pipeline version can achieve an execution time within 3% of the best farming time (3.33 seconds versus 3.22 seconds, with a pipeline using 4 filters, 16 packs of numbers and halving the filter size for the first stage). Future versions of JaSkel skeletons will include support for automatic tuning of these algorithms [9]. Fortunately, with JaSkel, these changes from one parallelization to another are straightforward to implement.
The second case study analyses the execution time of the parallel RayTracer application from the Java Grande Forum. This application is a typical farm that renders an image of sixteen spheres. Each worker renders an image subset, distributed in a cyclic fashion.
A simplified version of the JaSkel code is presented in Code 7. The render method (line 03) renders a sub-image given by the Interval parameter. The compute method (lines 06-08) was implemented based on this method. The main code follows the farming approach, presented before: the split method (lines 13-19) generates several Interval objects and the join method (lines 21-29) merges the resulting sub-images. Line 35 creates several workers (RayTracer) and lines 38-40 create the farm, start its activity and retrieve the result.
The experimental setup to measure the execution times used 7 computing nodes (described earlier) on a Gigabit-interconnected cluster.
Figure 5 compares execution times and speed-ups of three implementations of the RayTracer algorithm (with an image of size 500x500): the original version with mpiJava [10], a version converted to the MPP package (www./~bjornoh/mtj/mpp) and a JaSkel version that also uses MPP. All these implementations place each worker on a separate physical CPU. Speed-ups are relative to the JGF sequential version. The object distribution in JaSkel follows the technique presented in [11]. The MPP package is fully written in Java (using java.nio), which avoids the instability problems of Java bindings to MPI, with performance values close to MPI implementations.
01 public class RayTracer extends Compute {
02  …
03 public int[] render(Interval interval) {  // renders the image
04  …
05  }
06 public Object compute(Object o) {
07  return(render((Interval)o));
08  }
09 }
11 public class RayTracerFarm extends FarmConcurrent {
12  …
13 public Collection split(Object initialTask) {
14 Vector v = new Vector();
15 for(int i=0; i<nworkers; i++) {
16 Interval inp = new Interval(…);
17    v.add(inp);
18  }
19  return(v);
20  }
21 public Object join(Collection partialResults) {
22 Vector v = (Vector) partialResults;
23 int[] row = …
24 for(int j=0; j<nworkers; j++) {
25 int p_row_aux[] = (int[]) v.elementAt(j);
26 … // join p_row_aux to row
27  }
28  return(row);
29  }
30 public static void main(String argv[]) {
31  ...
32 Interval interval = new Interval(…);
33 Vector workers = new Vector();
34 for(int i=0; i<nworkers; i++) {
35 RayTracer rt = new RayTracer();
36  workers.add(rt);
37  }
38 RayTracerFarm rf = new RayTracerFarm(workers,interval);
39  rf.eval();
40 int[] row = (int[]) rf.getResult();
41 }
Code 7. JaSkel RayTracer code.
