JaSkel: A Java Skeleton-Based Framework for Structured Parallel and Distributed Computing

João Ferreira, João Sobral, and Alberto Proença

This work was supported in part by the PPC-VM project POSI/CHS/47158/2002, funded by the Portuguese FCT (POSI) and by European funds (FEDER).

Abstract—This paper presents JaSkel, a skeleton-based framework to develop parallel and distributed applications. The framework provides a set of Java abstract classes as a skeleton catalog, which implements recurring parallel interaction paradigms. This approach aims to improve the programmer's productivity and the code's efficiency and portability. It also helps to structure scalable applications through the refinement and composition of skeletons. Evaluation results show that the use of skeletons does contribute to improve both the development time and the execution performance of an application.

Index Terms—Parallel programming, distributed programming, programming environments, skeletons, Java
I. INTRODUCTION
CLUSTER and Grid computing environments require adequate tools to structure scalable applications, taking advantage of the underlying multi-layer architecture, namely a grid of clusters of shared-memory multi-core processing nodes.
Skeleton based tools are especially attractive for these
environments, since they provide a structured way to develop
scalable applications, supporting refinement or composition of
base skeletons. This work addresses the use of skeletons
applied to an object-oriented environment, Java.
A skeleton catalog can be a collection of code templates
implemented as an abstract class library. The catalog aims to
help the programmer to create Java code for a
parallel/distributed computing platform. Skeletons are
abstractions modeling a common, reusable parallelism
exploitation pattern [1][2]. A skeleton may also be seen as a higher-order construct (i.e., parameterized by other pieces of code) that defines a particular parallel behavior.
The computing development environment presented here contains a skeleton-based framework, JaSkel, as a support structure where software projects can be organized and developed. Programmers select appropriate skeletons and their grouping, and fill in the gaps with pieces of domain-specific code. The environment generates the source code with all the relevant components for scalable computing environments, including run-time adaptive load distribution.
This work differs from other research environments [3][4] in the way it uses different and orthogonal components for distinct tasks: a skeleton-based framework to structure scalable applications (which may use one or more processors), a source code generator that supports distribution of selected object classes, and an adaptive run-time load and data scheduler. The independence between these components lets programmers develop, test and run structured applications in a non-distributed environment; it also allows the efficient use of skeletons on distributed and shared-memory architectures, including clusters of SMPs with multi-core processor chips. JaSkel also exhibits a competitive advantage over other skeleton frameworks: JaSkel is an inheritance-based framework. Domain-specific code is specified by refining the provided skeletons, implementing abstract methods. This structure overcomes the lack of extensibility of other skeleton frameworks, since the provided skeletons can be modified by inheritance. The framework itself is organized as a hierarchy of classes, e.g., a concurrent farm derives from a sequential farm and a demand-driven farm derives from a concurrent farm.
More complex scalable structures can be obtained through a
free composition of supplied skeletons such as multi-level
farms or farms of pipelines.
The description of the code generator and the run-time load distribution tools is beyond the scope of this paper.
The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 describes the skeleton-based Java framework built to help programmers structure scalable applications. Section 4 presents a qualitative and performance evaluation of skeletons from the skeleton-based framework. The last section discusses the obtained results and draws some conclusions on the work done so far.
II. RELATED WORK
Several skeleton-based systems have been proposed, and most support skeleton composition (e.g., nesting) [5][6][2]. The most relevant Java environments for parallel programming based on skeletons are Lithium [4] and CO2P3S [3]. Lithium supports skeletons for pipeline, farm and divide & conquer, as well as other low-level skeletons (map, composition). CO2P3S is based on generative patterns, where skeletons are generated and the programmer must fill the provided hooks with application-specific functionality.
The main differences between JaSkel and these approaches are:
- our approach provides separate tools to structure parallel
applications and to implement communication and distribution code;
- JaSkel explores class hierarchy and inheritance along with object composition.
A skeleton hierarchy overcomes the main limitations of other alternatives, supporting refinement of the provided skeletons, including the implementation of non-pure functional skeletons (e.g., skeletons that maintain state between invocations).
III.  A JAVA SKELETON-BASED FRAMEWORK
Many parallel algorithms share the same generic patterns of computation and interaction. Skeletal programming proposes that such patterns be abstracted and provided as a programmer's toolkit. We call these abstractions algorithmic skeletons, parallel skeletons or simply skeletons.
We define parallel skeletons as abstractions modeling a common, reusable parallelism pattern. A skeleton may also be seen as a higher-order construct (i.e., parameterized by other pieces of code and other parameters) that defines a particular parallel behavior.
Skeletons provide programmers with simple interfaces to an incomplete structure that can be parameterized by the number of processors, domain-specific code or data distribution; programmers can focus on the computational side of their algorithms rather than on the control of parallelism. Since the lower-level operations are hidden, programmers' productivity increases.
Skeleton-based programs are smaller, easier to read and maintain, and less prone to error. Since many parallel applications share the same interaction patterns, these properties make skeletons a potential tool for code reusability. Skeletons also promote code portability, since the same skeleton can be used on different architectures: the skeleton implementation can be optimized for a particular target platform. This way, no source code modification is required to move to a different architecture.
Skeleton-based approaches promote a more orthogonal approach to parallel computing, since parallelism-related issues are delegated to the skeleton implementation. Application-specific code completes the skeleton structure. Moving from one parallelization strategy to another can be as simple as changing the skeleton base classes.
JaSkel supports skeleton refinement through skeleton hierarchies, which can be used to develop more application-specific skeletons.
There is always a trade-off between these benefits and the resulting performance. However, the skeletal approach provides programmers an easy path to optimize the computational part of their algorithms, and skeletons may be carefully tuned to run efficiently in the underlying architecture.
All these issues – productivity, reusability, portability and orthogonality – are illustrated below, while the efficiency of this framework (and the associated environment) is evaluated in the following section.
A. Common skeletons
Skeletons can be divided into two main classes: data parallel skeletons and task parallel skeletons. Data parallel skeletons are based on a distributed data structure, where data is distributed among several processors; each processor usually executes the same code on different data blocks. Task parallel skeletons are based on the distribution of the execution of independent tasks over several processors. The most common skeletons for scalable applications are described below:
1) Farm: The Farm skeleton is a data parallel skeleton and it consists of a master entity and multiple workers. The master decomposes the input data into smaller independent pieces and sends them to each worker. Workers process the data and send their results back to the master, which merges them into the final result.
Farm skeletons may use a static or dynamic load-distribution approach. In the first case, all data is distributed at the beginning of the computation. This strategy is suitable for homogeneous environments and for regular problems. The second approach is better suited to unbalanced problems or heterogeneous environments.
There is an interesting farm skeleton variation with dynamic load distribution, where data is sent only when workers request it: the Dynamic Farm or Farm-on-Demand. This variation improves efficiency in heterogeneous environments, when there is a large number of data pieces, reducing workers' idle time; however, communication costs may increase and performance may degrade.
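The demand-driven idea can be illustrated, outside JaSkel, with a few lines of plain Java; this is only a conceptual sketch, and the names (DemandFarmSketch, process) are made up for illustration:

import java.util.Queue;

class DemandFarmSketch {
    // Workers pull the next data piece from a shared queue whenever they
    // become idle, instead of receiving a fixed share up front.
    static void run(Queue<int[]> tasks, int nWorkers) throws InterruptedException {
        Thread[] workers = new Thread[nWorkers];
        for (int i = 0; i < nWorkers; i++) {
            workers[i] = new Thread(() -> {
                int[] piece;
                while ((piece = tasks.poll()) != null)
                    process(piece);            // domain-specific work on one piece
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();     // wait until all work is consumed
    }
    static void process(int[] piece) { /* domain-specific computation */ }
}

A thread-safe queue such as java.util.concurrent.ConcurrentLinkedQueue provides the poll operation used above.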
A single master can be a bottleneck for a large number of workers, but skeletons can be tuned to handle these limitations; a farm skeleton can, for instance, use several masters to improve performance.
2) Pipeline: The Pipeline skeleton is a task parallel skeleton and it corresponds to the well-known functional composition. The tasks of the algorithm are serially decomposed and each processor executes one task. Each processor/task is usually called a stage.
In most cases, input data is sent to the first stage and then flows between the adjacent stages of the pipeline. The computation ends when the last stage finishes processing. However, the initial input data can also be decomposed into smaller blocks, which uses the workers more efficiently, since the pipeline can process different data blocks at the same time.
3) Divide-and-conquer: The Divide-and-Conquer skeleton corresponds to the well-known sequential algorithm of the same name. Basically, a problem is divided into subproblems and each subproblem is solved independently. Subproblems are independent from each other and can be solved concurrently on different processors. The results of each subproblem are combined to produce the final result.
4) Heartbeat: The Heartbeat skeleton models a very common pattern present in many parallel algorithms: data is spread among workers, each updates a particular part, and new data values depend on values held by other workers. Each worker behaves like the beating of a heart: it gathers new data, processes it and sends it out; then it repeats all over [7]. Heartbeat is appropriate for iterative algorithms and it is a communication-intensive skeleton.
5) Skeleton composition: Conceptually, skeletons may be composed in order to get different interaction patterns. If a worker of a farm can be expressed as a heartbeat skeleton, then it seems appropriate to write it using the heartbeat skeleton.
B. JaSkel framework
The current JaSkel prototype (JaSkel 1.01) provides farm and pipeline skeletons, which have been successfully evaluated for parallel coding. Later versions will be extended to include the other parallel skeletons and to improve the current ones.
To write a scalable application using JaSkel, a programmer must perform the following steps:
1) structure the parallel program and express it using the available skeletons;
2) refine the supplied abstract classes and write the domain-specific code used as skeleton parameters;
3) write the code that starts the skeleton, defining other relevant parameters (the number of processors, the load distribution policy, ...).
The current JaSkel prototype provides the programmer with different versions of the farm and pipeline skeletons:
- a fully sequential farm;
- a concurrent farm, which creates a new thread for each worker;
- a dynamic farm, which sends data to workers only when they request it;
- a fully sequential pipeline;
- a concurrent pipeline, which creates a new thread for each data flow.
The current version does not provide explicit support for divide & conquer and heartbeat skeletons, although these can be developed through refinement of the farm skeleton. A divide & conquer skeleton can be implemented by multi-level farming, where farming levels are created on demand. A heartbeat skeleton can be implemented by a repeated farming, where workers keep data between iterations.
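As a hypothetical illustration of the first idea (not JaSkel API), a divide & conquer worker could create an inner farm on demand whenever a task is still too large; here DCFarm is assumed to be a FarmConcurrent subclass whose split halves the task and whose join combines the partial results, and threshold and solve are illustrative names:

import java.util.Vector;

public class DCWorker extends Compute {
    private final int threshold;
    public DCWorker(int threshold) { this.threshold = threshold; }

    public Object compute(Object input) {
        int[] task = (int[]) input;
        if (task.length <= threshold)
            return solve(task);                 // base case: solve sequentially
        Vector inner = new Vector();            // workers of the inner farm
        inner.add(new DCWorker(threshold));
        inner.add(new DCWorker(threshold));
        DCFarm farm = new DCFarm(inner, task);  // inner farming level, created
        farm.eval();                            // on demand for this task
        return farm.getResult();
    }
    private Object solve(int[] task) { /* sequential solve */ return task; }
}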
A JaSkel skeleton is a simple Java class that implements the Skeleton interface and extends the Compute class. The Skeleton interface defines a method eval that must be defined by all the skeletons. This method starts the skeleton activity.
To create objects that will perform domain-specific computations, the programmer must create a subclass of the class Compute (inspired by Lithium). The Compute abstract class declares an abstract method public abstract Object compute(Object input) that defines the domain-specific computation involved in a skeleton.
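The text above implies the following minimal shape for these two abstractions (each type in its own file); the actual JaSkel classes may declare additional members:

public interface Skeleton {
    void eval();   // starts the skeleton activity
}

public abstract class Compute {
    // domain-specific computation, supplied by refining this class
    public abstract Object compute(Object input);
}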
For instance, to create a Farm, a programmer needs to perform the following steps (suggested by the supplied skeleton template):
1) create the worker's class, which is a subclass of Compute;
2) define the worker's inherited method public Object compute(Object input);
3) create the master's class, which is a subclass of Farm;
4) define the methods public Collection split(Object initialTask) and public Object join(Collection partialResults);
5) create a new instance of the master's class and call the method eval; this method will basically perform the following steps (a minimal sketch follows this list):
- it creates multiple workers;
- it splits the initial data using the defined split method;
- it calls the compute method of each worker with the pieces of data returned by the method split;
- it merges the partial results using the defined join method.
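The following is a minimal sketch of how a sequential farm's eval method could carry out these steps; it is not the actual JaSkel source, and the fields workers, inputTask and result are assumptions for illustration:

public void eval() {
    Collection pieces = split(inputTask);            // split the initial data
    Vector partialResults = new Vector();
    Iterator w = workers.iterator();
    for (Iterator it = pieces.iterator(); it.hasNext(); ) {
        if (!w.hasNext()) w = workers.iterator();    // cycle over the workers
        Compute worker = (Compute) w.next();
        partialResults.add(worker.compute(it.next())); // compute one piece
    }
    result = join(partialResults);                   // merge the partial results
}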
Figure 1 shows the Farm skeleton UML class diagram. Some methods were omitted.
The specialization or the creation of a new skeleton is done by class refinement. Figure 2 illustrates the concurrent farm skeleton UML class diagram, which extends the sequential farm skeleton.
Fig. 1. The sequential farm skeleton implements the Skeleton interface and extends the Compute class to allow skeleton composition.
Both the skeletons and the entities that will perform domain-specific code extend the class Compute. Figure 3 illustrates the pipeline skeleton, which also extends the Compute class.
JaSkel skeletons are subclasses of the Compute class to allow composition. The method public Object compute(Object input) on skeletons usually calls the eval method to start the skeleton activity.
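In other words, a skeleton's compute can wrap its own eval, so that a whole farm may act as a single worker inside another skeleton. The sketch below assumes a setTask method for feeding the input; that name is an illustration, not confirmed JaSkel API:

public Object compute(Object input) {
    setTask(input);      // feed the input to this (inner) skeleton
    eval();              // run the inner skeleton to completion
    return getResult();  // its result is the outer worker's result
}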
C. Building JaSkel applications
The best way to show how to build a skeleton-based application is through an example: to find and count all the prime numbers up to N.
One solution (from v/dads/HTML/sieve.html) begins with an (unmarked) array of integers from 2 to N; the first unmarked integer, 2, is the first prime; mark every multiple of this prime; then repeatedly take the next unmarked integer as the next prime and mark every multiple of it. This algorithm was devised by Eratosthenes of Cyrene (276 BC - 194 BC).
A Java implementation of this algorithm marks the multiples by setting them to 0. The implementation consists of two entities: a number generator and a prime filter. The first generates the input integer array [2..N] and the latter filters out the non-prime integers.
A prime filter holds a list with the primes from 2 to sqrt(N) (called the filter), and every integer n from the input array is divided by each prime of this list; if no divisor is found, then n is prime.
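In plain sequential Java, and independently of JaSkel, this filtering step could look as follows (the method name and signature are illustrative):

static int[] filterPrimes(int[] filter, int[] candidates) {
    int count = 0;
    int[] out = new int[candidates.length];
    for (int n : candidates) {
        boolean isPrime = true;
        for (int p : filter) {
            if (p * p > n) break;                      // no divisor up to sqrt(n)
            if (n % p == 0) { isPrime = false; break; }
        }
        if (isPrime) out[count++] = n;
    }
    return java.util.Arrays.copyOf(out, count);        // keep only the primes
}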
This algorithm can be parallelized in different ways:
- as a simple farm: the input array is decomposed into smaller pieces, and each piece is sent to a prime filter; each prime filter tests the input integers using the filter [2..sqrt(N)];
- as a simple pipeline: each prime filter constitutes a pipeline stage and uses a filter with a distinct set of values in the range [2..sqrt(N)]; the input data is sent to the first pipeline stage and then flows between the adjacent stages; when it reaches the end, all the non-prime integers have been filtered out;
- as a composition of farms and/or pipelines.
The examples below show how the JaSkel framework codes this algorithm using different scalable structures. The implementation counts the primes up to 10,000,000, and the code was slightly modified to improve readability.
1) Primes sieve as a farm: the prime filter (the farm worker) is illustrated in Code 1. Its main method is filter  (line 07), which filters the given integer array. The compute  method (lines 10-12), needed to define the skeleton's domain-specific code, delegates its job to the method filter . Note that the class FarmPrimeFilter  is a subclass of Compute (line 01).
Code 2 illustrates the class GeneratorFarm (the farm master): it extends the skeleton FarmConcurrent (line 01) and uses private methods to implement the method split (lines 05-07) and the method join (lines 08-10).

01  public class FarmPrimeFilter extends Compute {
02    int[] myPrimes; // buffer to hold primes already calculated
03    …
04    public void init(int minP, int maxP) {
05      … // computes primes between minP and maxP, updates myPrimes list
06    }
07    public synchronized int[] filter(final int[] num) {
08      … // removes non-primes from the list using myPrimes list
09    }
10    public Object compute(Object input) {
11      return this.filter((int[]) input);
12    }
13  }

Code 1. The FarmPrimeFilter class.
Fig. 3. Pipeline skeleton; it implements the Skeleton interface and extends the Compute class to allow skeleton composition.
Fig. 2. The concurrent farm skeleton extends the farm skeleton, rewriting the eval method.
Code 3 shows the code that connects these entities:
- it creates several prime filter objects (lines 08-12) and initializes each filter (method init);
- it creates a new farm generator object, setting its parameters: the workers and the input data (line 13);
- it starts the skeleton activity, calling the method eval (line 14);
- it gets the final result, using the method getResult (line 15).
2) Primes sieve as a farm of farmers: A two-level farming can be easily created with loop nesting: only a few modifications to lines 08-11 are required to create this hierarchy (Code 4). The inner loop in lines 05-09 creates each inner farm in the same way as in the previous example. The outer loop in lines 03-12 creates the main farm, where each worker is itself a farm.
3) Primes sieve as a pipeline: the prime filter for a pipeline is identical to a worker in a farming (Code 1). The only difference between the farm and the pipeline prime filter is that the former is a subclass of Compute, while the latter is a subclass of PipelineWorker. The PipelineWorker class is a subclass of Compute, but it defines and implements three new methods (sketched below):
- setNextWorker, which sets the pipeline's stages;
- sendNext, which sends data to the next stage;
- eval, which calls the method compute and makes the data flow between the adjacent stages.
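This behaviour might be realised as in the following hypothetical sketch; the real JaSkel class may differ in signatures and in how threading and termination are handled:

public abstract class PipelineWorker extends Compute {
    protected PipelineWorker next;  // the following stage, or null at the end

    public void setNextWorker(PipelineWorker next) { this.next = next; }

    protected void sendNext(Object data) {
        if (next != null) next.eval(data);  // push the data to the next stage
    }

    public void eval(Object data) {
        sendNext(compute(data));  // process this stage, then send downstream
    }
}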
The pipeline class generator is defined in the same way as the farm class generator (Code 2), but it extends the PipelineConcurrent class. Code 5 shows the main code differences between a pipeline and a farming approach (Code 3):
- a list of prime filters is created; all the filters are different and disjoint (lines 09-10);
- it creates a new pipeline generator object, setting the parameters stage list and input data (line 13).
4) Primes sieve as a pipeline of farms: combining a pipeline with a farming is straightforward. Code 6 presents code equivalent to Code 4 (a two-level farm), now using a pipeline of farms. The most relevant change is that the outer loop creates a set of pipeline stages where each stage is a concurrent farm.
IV. PERFORMANCE EVALUATION
This section aims to show that the benefits of the skeleton-based framework do not impose a performance penalty.
The first part presents execution times for the previous case study – the prime number sieve – while the second evaluates the performance of a reference algorithm and its implementation: a parallel ray tracer from the Java Grande Forum [8]. Figure 4 compares execution times and speed-ups for the two basic versions of the farm and pipeline prime sieve presented in the previous section.

…
05  int range = min / nstages;
06
07  Vector v = new Vector();
08  for(int j=0; j<nstages; j++) {
09    PipePrimeFilter pf = new PipePrimeFilter();
10    pf.init(j * range + 1, (j + 1) * range);
11    v.add(pf);
12  }
13  GeneratorPipeline g = new GeneratorPipeline(v, ar);
14  g.eval();
…

Code 5. The main pipeline code.
int range = min / nstages;
Vector vout = new Vector();
for(int i=0; i<nstages; i++) {        // for each pipeline stage
  Vector vin = new Vector();
  for(int j=0; j<nworkers; j++) {     // creates the workers of each stage
    FarmPrimeFilter pf = new FarmPrimeFilter();
    pf.init(i * range + 1, (i + 1) * range);
    vin.add(pf);
  }
  GeneratorFarm gin = new GeneratorFarm(vin, null);
  vout.add(gin);
}
GeneratorPipeline g = new GeneratorPipeline(vout, ar);
…

Code 6. A pipeline of farms.
01  public class GeneratorFarm extends FarmConcurrent {
02    public GeneratorFarm(Collection workers, Object inputTask) {
03      super(workers, inputTask);
04    }
05    public Collection split(Object initialTask) {
06      return(Packs.split((int[]) initialTask, blocksize));
07    }
08    public Object join(Collection partialResults) {
09      return(Packs.join((Vector) partialResults));
10    }
11  }

Code 2. The generator farm class.
01 …
02 Vector vout = new Vector();
03 for(int i=0; i<outerworkers; i++) {
04   Vector vin = new Vector();
05   for(int j=0; j<innerworkers; j++) {
06     FarmPrimeFilter pf = new FarmPrimeFilter();
07     pf.init(1, max);
08     vin.add(pf);
09   }
10   GeneratorFarm gin = new GeneratorFarm(vin, null);
11   vout.add(gin);
12 }
13 GeneratorFarm g = new GeneratorFarm(vout, ar);
14 …

Code 4. A two-level farm.
01 int nprocess;                // environment-specified number of workers
02 int max = 10000000;          // max prime
03 int sMax = (int) Math.sqrt(max);       // square root of the max prime
04 int[] ar = new int[(max-sMax+1)/2];    // buffer to hold odd numbers to filter
05 int min = sMax+1;
06 for(int i=min; i<=max; i+=2) ar[(i-min)/2]=i; // place odd numbers in the list
07 Vector v = new Vector();
08 for(int j=0; j<nprocess; j++) {
09   FarmPrimeFilter pf = new FarmPrimeFilter(); // create filter
10   pf.init(1, max);           // initialize the filter
11   v.add(pf);                 // store filter reference into vector v
12 }
13 GeneratorFarm g = new GeneratorFarm(v, ar);   // create farmer
14 g.eval();                    // starts the farming process
15 Object o = g.getResult();    // get results
16 System.out.println("Number of Primes: " + Packs.check((int[]) o));

Code 3. The main farm code.
These results were collected on a dual Xeon 3.2 GHz 1 MB L2 cache computing node, with multi-threading enabled (i.e., with four Intel HT virtual processors), running CentOS 4.1. Presented values are the median of 5 runs on a dedicated node.
Farming execution times were collected using one pack of numbers per worker, while the pipeline version was tuned to use two packs of numbers per pipeline stage, which allows several packs to be processed concurrently within the pipeline. The farming versions performed better, as expected, due to a better load distribution. In the pipeline, a regular block distribution of prime filters among stages is not well balanced (the first pipeline stages have heavier workloads). Additionally, there is an initial delay to fill the pipeline, and each pack must cross all the pipeline stages, increasing the sequential workload. However, a better tuned pipeline version can achieve an execution time within 3% of the best farming time (3.33 seconds versus 3.22 seconds, with a pipeline using 4 filters, 16 packs of numbers and halving the filter size for the first stage). Future versions of JaSkel skeletons will include support for automatic tuning of these algorithms [9]. Fortunately, with JaSkel, these changes from one parallelization to another are straightforward to implement.
The second case study analyses the execution time of the parallel RayTracer application from the Java Grande Forum. This application is a typical farm that renders an image of sixteen spheres. Each worker renders an image subset, distributed in a cyclic fashion.
A simplified version of the JaSkel code is presented in Code 7. The render method (line 03) renders a sub-image given by the Interval parameter. The compute method (lines 06-08) was implemented based on this method. The main code follows the farming approach, presented before: the split method (lines 13-19) generates several Interval objects and the join method (lines 21-29) merges the resulting sub-images. Line 35 creates several workers (RayTracer) and lines 38-40 create the farm, start its activity and retrieve the result.
The experimental setup to measure the execution times used 7 computing nodes (described earlier) on a Gigabit-interconnected cluster.
Figure 5 compares execution times and speed-ups of three implementations of the RayTracer algorithm (with an image of size 500x500): the original version with mpiJava [10], a version converted to the MPP package (www./~bjornoh/mtj/mpp) and a JaSkel version that also uses MPP. All these implementations place each worker on a separate physical CPU. Speed-ups are relative to the JGF sequential version. The object distribution in JaSkel follows the technique presented in [11]. The MPP package is fully written in Java (using java.nio), which avoids the instability problems of Java bindings to MPI, with performance values close to MPI implementations.
01 public class RayTracer extends Compute {
02  …
03 public int[] render(Interval interval) {  // renders the image
04  …
05  }
06 public Object compute(Object o) {
07  return(render((Interval)o));
08  }
09 }
11 public class RayTracerFarm extends FarmConcurrent {
12  …
13 public Collection split(Object initialTask) {
14 Vector v = new Vector();
15 for(int i=0; i<nworkers; i++) {
16 Interval inp = new Interval(…);
17    v.add(inp);
18  }
19  return(v);
20  }
21 public Object join(Collection partialResults) {
22 Vector v = (Vector) partialResults;
23 int[] row = …
24 for(int j=0; j<nworkers; j++) {
25 int p_row_aux[] = (int[]) v.elementAt(j);
26 … // join p_row_aux to row
27  }
28  return(row);
29  }
30 public static void main(String argv[]) {
31  ...
32 Interval interval = new Interval(…);
33 Vector workers = new Vector();
34 for(int i=0; i<nworkers; i++) {
35 RayTracer rt = new RayTracer();
36  workers.add(rt);
37  }
38 RayTracerFarm rf = new RayTracerFarm(workers,interval);
39  rf.eval();
40 int[] row = (int[]) rf.getResult();
41 }
Code 7. JaSkel RayTracer code.
