Optimising Java RMI Programs by Communication Restructuring
Kwok Cheung Yeung and Paul H. J. Kelly
Department of Computing
Imperial College, London, UK
Abstract. We present an automated run-time optimisation framework that can improve the performance of distributed applications written using Java RMI whilst preserving their semantics.
Java classes are modified at load-time in order to intercept RMI calls as they occur. RMI calls are not executed immediately, but are delayed for as long as possible. When a dependence forces execution of the delayed calls, the aggregated calls are sent over to the remote server to be executed in one step. This reduces network overhead and the quantity of data sent, since data can be shared between calls. The sequence of calls may be cached on the server side along with any known constants in order to speed up future calls. A remote server may also make RMI calls to another remote server on behalf of the client if necessary.
Our results show that the techniques can speed up distributed programs significantly, especially when operating across slower networks. We also discuss some of the challenges involved in maintaining program semantics, and show how the approach can be used for more ambitious optimisations in the future.
1 Introduction
Frameworks for distributed programming such as the Common Object Request Broker Architecture (CORBA) [9] and Java Remote Method Invocation (RMI) [12] aim to provide a location-transparent object-oriented programming model, but do not completely succeed, since the cost of a remote call may be several orders of magnitude greater than that of a local call due to marshalling overheads and relatively slow network connections. This means that developers must explicitly code with performance in mind, leading to reduced productivity and increased program complexity.
The usual approach to optimising distributed programs has been to optimise the connection between the communicating hosts, fine-tuning the remote call mechanism and the underlying communication protocol to cut the overhead of each call to a minimum. Although this leads to a general speed-up, it does not help the performance of programs that are slow because they use many fine-grained methods instead of a few coarse-grained ones. Our approach to solving this problem has been to consider all communicating nodes as part of one large program, rather than as many disjoint ones.
We delay the execution of remote calls on the client for as long as possible, until a dependency on the delayed calls blocks further progress. At this point, the delayed calls are executed in one step, after which the blocked operation may proceed. By delaying the execution of remote calls, we build up knowledge of the context in which calls were made on the client. This enables us to find opportunities for optimisations between calls that would have been lost had the calls been executed immediately.
1.1 Contributions
– We present an optimisation tool which can improve the performance of Java/RMI applications by combining static analysis of application bytecode with run-time optimisation of sequences of remote operations. This tool operates on unmodified Java RMI applications, and runs on a standard JVM.
– By aggregating sequences of remote calls to the same server, the total number of message exchanges is reduced. By avoiding redundant parameter and result transfers, the total amount of data transferred can also be reduced. When calls to different servers are aggregated together, results can be forwarded directly from one server to another, bypassing the client in some cases.
– We show how run-time overheads can be reduced by caching execution plans at the servers.
– We demonstrate the use of the tool using a number of examples.
The framework presented here provides the basis for a programme of research aimed at extending aggressive optimisation techniques across distributed systems, and deploying the results in large-scale industrial systems. We conclude with a discussion of the potential of the work, and the challenges that remain.
1.2 Structure
We begin in Section 2 with a discussion of related work. We then give a high-level overview of the run-time optimisation framework used to implement our optimisations in Section 3. We proceed to cover the optimisations performed in Section 4, and the challenges involved in maintaining the semantics of the original application in Section 5. We then present some performance results in Section 6, finish off with some suggestions for future work in Section 7, and conclude in Section 8.
2 Related Work
Most work on optimising RMI has concentrated on reducing the run-time overhead of each remote call, by reducing the amount of work done per call or by using more lightweight network protocols. Examples include the UKA serialisation work [14], KaRMI [13], and R-UDP [10]. Similar work has been done on CORBA by Gokhale and Schmidt [7].
Asynchronous RPC [11, 15] aims to overlap client computation with communication and remote execution, replacing results with 'promises', which block the client only when actually used.
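The flavour of this technique can be illustrated with a minimal sketch using the standard java.util.concurrent API rather than any of the cited systems; RemoteService and expensiveCall are hypothetical stand-ins for a remote interface:

import java.util.concurrent.CompletableFuture;

class PromiseExample {
    interface RemoteService { int expensiveCall(int x); } // hypothetical

    static void run(RemoteService svc) {
        // Issue the call asynchronously; the client keeps computing
        CompletableFuture<Integer> promise =
            CompletableFuture.supplyAsync(() -> svc.expensiveCall(42));
        doLocalWork();                // overlaps with communication
        int result = promise.join();  // blocks only here, when the value is used
        System.out.println(result);
    }

    static void doLocalWork() { /* client-side computation */ }
}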
A more ambitious approach is to cache the state of a remote object locally [10]. This works well provided that most operations on cached objects are reads. However, a write operation incurs high penalties for all users of the cached object, since the client has to wait for the invalidation of all copies of the object to finish before proceeding. The first request for invalidated data will also incur an extra delay as the server fetches it from the client that performed the last update.
A later implementation of remote-object caching [5] introduces the notion of reduced objects, where only a subset of the remote-object state is cached on the client. The subset that is cached depends on the properties of the invoked methods; for example, if a called method only accesses immutable variables, then those variables can be cached on the client without needing to deal with consistency issues.
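As a hypothetical illustration of the reduced-object idea (the class and its fields are ours, not taken from [5]):

class Account implements java.io.Serializable {
    private final String id;  // immutable: may be cached client-side
    private int balance;      // mutable: reads must remain remote

    Account(String id) { this.id = id; }

    String getId()   { return id; }      // touches only immutable state, so it
                                         // can be served from the client cache
    int getBalance() { return balance; } // must not be cached without a
                                         // consistency protocol
}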
Neither of these approaches to RMI optimisation conflicts with our aggregation optimisations, and although we have not done so ourselves, these optimisations could theoretically be combined. It may be argued that our optimisations become redundant under certain circumstances (e.g. if the aggregated calls are cached locally).
The concept of aggregating numerous small operations into a single larger operation is very old, and appears in numerous other contexts, especially in the hardware domain. In the context of RPC mechanisms, concepts such as stored procedures in database systems or commands in IBM's San Francisco project [3] are also capable of aggregating calls, but these are explicit mechanisms. Implicit call aggregation is much rarer and harder to implement. One example is the concept of batched futures [2] in the Thor database system.
3 The Veneer Framework
The RMI optimisations are built on top of Veneer, a generalised framework that we have developed to ease the development of run-time optimisation techniques. The framework is written in standard Java, using the BCEL [4] library for bytecode generation and the Soot [16] library for program analysis. Veneer is not tied to any particular JVM implementation, which is essential since it is likely to be used in a heterogeneous environment. We refer to Veneer as a 'virtual JVM', since it behaves like a highly configurable Java virtual machine, without actually being one.
The framework presents a simplified model of the Java run-time environment, working with what appears to be a simple interpreter, called an executor. A basic executor, which executes a method with no modifications whatsoever, is shown in Figure 1.
When a method that we are interested in is called, control passes to our executor instead of the original method. The executor is initialised with an execution plan, which is essentially a control-flow graph of the method, with executable code-blocks forming the nodes. The executor sits in a loop which executes the current block, then sets the current block to the next block in line to be executed.
The power of this framework lies in the fact that the plan is a first-class object that we can change while the executor is still running, effectively modifying the code that will be executed. The executor has full control over the process of method execution between blocks, so that we can perform operations such as jumping to arbitrary code-blocks, modifying local variables, or timing operations if necessary.
We minimise the interpretive overhead by delegating as much work as possible to the underlying JVM, and by making the code-blocks as coarse as possible. There is also an option to permit blocks to run continuously without returning to the executor, though certain block types will always force a return.
The mapping of bytecode to code-blocks in the plan, and the methods affected by our framework, are determined by a plug-in policy class. The policy class also contains numerous call-back methods that are invoked on certain events, such as the initial loading of a class.
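The shape of such a policy might look roughly as follows (a hypothetical sketch; the interface and method names are illustrative, not Veneer's actual API):

interface VeneerPolicy {
    // Should this method be run under an executor rather than natively?
    boolean shouldInstrument(String className, String methodName);

    // Callback invoked on certain events, e.g. the initial loading of a class
    void onClassLoaded(Class<?> loadedClass);
}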
public class BasicExecutor extends Executor {
    public int execute() throws Exception {
        int next = -1;
        while (block != null
               && !lockWasReleased()) {
            try {
                // Run the current block; it returns the index of its successor
                next = block.execute(this);
                setCurrentBlock(next);
            } catch (ExecuteException e) {
                // Pass control to exception handler
                block = getExceptionHandler(e);
                // Propagate exception if no handler
                if (block == null)
                    throw e.getException();
                locals[1] = e.getException();
            }
        }
        return next;
    }
}
Fig. 1. Structure of a basic executor
4 Optimisations
In this section we detail the RMI optimisations that have been implemented. The examples used to illustrate the optimisations are deliberately simplified for clarity.
4.1 Call Aggregation
Delaying calls to form call aggregates is the core technique upon which this project is based. It is an important optimisation in its own right, and can also open up further optimisation opportunities. For example, consider the following code fragment:
void m(RemoteObject r, int a) {
    int x = r.f(a);
    int y = r.g(x);
    int z = r.h(y);
    System.out.println(z);
}
This program fragment incurs three remote method calls, with six data transfers. However, for this example, we can do better:
– Since all three calls are to the same remote object, they can be aggregated into a single large call, so that call overhead is incurred only once (see Figure 2).
– x is returned as the result of the call to f from the remote server, but is subsequently passed back to it during the next call. The same occurs with the variable y. If the values of x and y were retained by the remote object between remote method calls, then the number of communications could be reduced from six to four.
– The variables x and y are unused by the client except as arguments to remote calls on the remote object from which they originated. x and y may therefore be considered dead variables from the client's point of view, and there is no need for their values to be passed back to the client at all, thereby further reducing the total communication to just two messages with payloads of size int.
Fig. 2. Example of call aggregation, with and without aggregation
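To make the combined effect concrete, the aggregated plan behaves as if the server exposed a single composite operation (a hypothetical illustration; no such method is actually generated as source code):

// Server-side equivalent of the aggregated plan for m():
int fgh(int a) {
    int x = f(a);  // x never leaves the server
    int y = g(x);  // nor does y
    return h(y);   // only a and this result cross the network
}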
Client-side Implementation. We have created a Veneer policy that only affects methods that are statically determined to contain potentially remote method calls. Calls are deemed to be potentially remote if they are invoked via an interface, and have java.rmi.RemoteException or one of its super-classes in their throws list. A run-time check is later used to ensure that the potential remote call is actually remote. Note that it is not sufficient just to check that the receiver of the call implements java.rmi.Remote, since the receiver may be a local object rather than a stub.
The client runs under the control of the Veneer framework using this policy. If the executor encounters a confirmed remote call during the course of execution, it places the call in a queue and proceeds to the next instruction. Sequences of adjacent calls to the same remote object are grouped together into remote plans. Remote plans also contain metadata about the calls, such as variable liveness and data dependencies. Calls to other remote objects will not force execution unless the target of the call is defined by a previous delayed call, leading to a control dependency. However, even this condition is relaxed by server forwarding, detailed in Section 4.2.
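For example (a hypothetical fragment; RemoteA and RemoteB stand for remote interfaces):

void example(RemoteA a) throws java.rmi.RemoteException {
    a.step1();             // queued into a remote plan targeting a
    a.step2();             // adjacent call on the same object: same plan
    RemoteB b = a.makeB(); // delayed, but b is defined by a delayed call
    b.step3();             // control dependency: would normally force
                           // execution, unless relaxed by server forwarding
}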
When a non-remote block is encountered with delayed calls remaining in the queue, a decision has to be made whether or not to force execution of the calls. In general, it is safe to execute the current block without forcing if there are no dependencies between the current instruction and the delayed operations. If dependencies exist, or if it is impossible to tell, then we must force execution.
We detect data dependencies by noting attempts to access data returned by RMI calls. Since the results of RMI calls are constructed by deserialising the data returned by the server, there can be no other references to the returned data except for the local variable that the result of the remote call was placed in. We therefore regard local code that accesses locals that should contain the results of RMI calls as being dependent on the delayed calls.
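For instance, in the following hypothetical fragment, it is the local read of y that triggers execution of the queue:

int x = r.f(a);        // delayed: x is not yet materialised on the client
int y = r.g(x);        // delayed: adjacent call, same plan
System.out.println(y); // local code accesses y, so the executor forces the
                       // queued calls to run before this block executes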
This scheme is rather conservative: even simple assignments from one local variable to another can force the execution of the delayed plans. We hope to improve this in the future using better static analysis. The scheme also cannot detect indirect data dependencies. For example, if an RMI call modifies a remote database which the client then accesses through another API, that access will go unnoticed.
When executing local code in the presence of delayed remote calls, we must ensure that the variables used by the delayed calls are not overwritten or modified by the local code. This is done by making a copy of all locals supplied to the delayed calls that may be touched by the local code.
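A small hypothetical fragment (r is a remote object) shows why the copy is necessary:

int n = 5;
r.put(n);   // delayed: the plan must capture the value 5
n = 10;     // local write to n; without the copy, the plan would later
            // marshal 10, changing the program's observable behaviour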
On forcing execution, the queue of delayed remote plans is traversed, and the plans are sent one by one, along with the set of data used by each plan, to the corresponding remote proxy on the server side, to be executed via a standard RMI invocation. The proxy call may either return successfully or throw an exception.
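The proxy-side entry point might plausibly take the following shape (a sketch under our own naming; the actual interface is internal to the framework):

interface RemotePlan extends java.io.Serializable { /* plan representation */ }

interface RemoteProxy extends java.rmi.Remote {
    // Execute one delayed plan with the live data it needs, returning
    // the values of variables that are still live on the client
    Object[] executePlan(RemotePlan plan, Object[] liveData)
        throws java.rmi.RemoteException;
}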
If the call returns successfully, then the variables defined by the plan that are still live are copied back into the locals set of the executing method. If an exception was thrown, then the executor goes through the normal process of finding a handler for the exception within the method, propagating it up the call chain if none is found.
The same Veneer policy also runs a remote proxy server on startup, which first registers itself with a naming service via JNDI. The proxy keeps track of all remote objects present on the JVM by inserting a small callback into the constructors of all remote classes at load time.
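Registration of the proxy could look roughly like this (a sketch; the binding name and ProxyBootstrap class are illustrative):

import javax.naming.InitialContext;
import javax.naming.NamingException;

class ProxyBootstrap {
    static void register(java.rmi.Remote proxy) throws NamingException {
        InitialContext ctx = new InitialContext();
        ctx.rebind("VeneerRemoteProxy", proxy); // well-known name for clients
    }
}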