Hoard: A Scalable Memory Allocator for Multithreaded Applications

Emery D. Berger∗   Kathryn S. McKinley†   Robert D. Blumofe∗   Paul R. Wilson∗

∗Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712
{emery,rdb,wilson}@cs.utexas.edu

†Department of Computer Science
University of Massachusetts
Amherst, Massachusetts 01003
mckinley@cs.umass.edu

This work is supported in part by the Defense Advanced Research Projects Agency (DARPA) under Grant F30602-97-1-0150 from the U.S. Air Force Research Laboratory. Kathryn McKinley was supported by DARPA Grant 5-21425, NSF Grant EIA-9726401, and NSF CAREER Award CCR-9624209. In addition, Emery Berger was supported by a Novell Corporation Fellowship. Multiprocessor computing facilities were provided through a generous donation by Sun Microsystems.
ABSTRACT

Parallel, multithreaded C and C++ programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a dramatic increase in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This increase in memory consumption can range from a factor of P (the number of processors) to unbounded memory consumption.

This paper introduces Hoard, a fast, highly scalable allocator that largely avoids false sharing and is memory efficient. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and per-processor heaps with a novel discipline that provably bounds memory consumption and has very low synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
1. Introduction

Parallel, multithreaded programs are becoming increasingly prevalent. These applications include web servers [35], database managers [27], and news servers [3], as well as more traditional parallel applications such as scientific applications [7]. For these applications, high performance is critical. They are generally written in C or C++ to run efficiently on modern shared-memory multiprocessor servers.
Many of these applications make intensive use of dynamic memory allocation. Unfortunately, the memory allocator is often a bottleneck that severely limits program scalability on multiprocessor systems [21]. Existing serial memory allocators do not scale well for multithreaded applications, and existing concurrent allocators do not provide one or more of the following features, all of which are needed in order to attain scalable and memory-efficient allocator performance:
Speed. A memory allocator should perform memory operations (i.e., malloc and free) about as fast as a state-of-the-art serial memory allocator. This feature guarantees good allocator performance even when a multithreaded program executes on a single processor.
Scalability. As the number of processors in the system grows, the performance of the allocator must scale linearly with the number of processors to ensure scalable application performance.

False sharing avoidance. The allocator should not introduce false sharing of cache lines, in which threads on distinct processors inadvertently share data on the same cache line.
Low fragmentation. We define fragmentation as the maximum amount of memory allocated from the operating system divided by the maximum amount of memory required by the application. Excessive fragmentation can degrade performance by causing poor data locality, leading to paging.

Certain classes of memory allocators (described in Section 6) exhibit a special kind of fragmentation that we call blowup. Intuitively, blowup is the increase in memory consumption caused when a concurrent allocator reclaims memory freed by the program but fails to use it to satisfy future memory requests. We define blowup as the maximum amount of memory allocated by a given allocator divided by the maximum amount of memory allocated by an ideal uniprocessor allocator. As we show in Section 2.2, the common producer-consumer programming idiom can cause blowup. In many allocators, blowup ranges from a factor of P (the number of processors) to unbounded memory consumption (the longer the program runs, the more memory it consumes). Such a pathological increase in memory consumption can be catastrophic, resulting in premature application termination due to exhaustion of swap space.

The contribution of this paper is to introduce the Hoard allocator and show that it enables parallel multithreaded programs to achieve scalable performance on shared-memory multiprocessors. Hoard achieves this result by simultaneously solving all of the above problems. In particular, Hoard solves the blowup and false sharing problems, which, as far as we know, have never been addressed in the literature. As we demonstrate, Hoard also achieves nearly zero synchronization costs in practice.
Hoard maintains per-processor heaps and one global heap. When a per-processor heap's usage drops below a certain fraction, Hoard transfers a large fixed-size chunk of its memory from the per-processor heap to the global heap, where it is then available for reuse by another processor. We show that this algorithm bounds blowup and synchronization costs to a constant factor. This algorithm avoids false sharing by ensuring that the same processor almost always reuses (i.e., repeatedly mallocs) from a given cache line. Results on eleven programs demonstrate that Hoard scales linearly as the number of processors grows and that its fragmentation costs are low. On 14 processors, Hoard improves performance over the standard Solaris allocator by up to a factor of 60, and by a factor of 18 over the next best allocator we tested. These features have led to its incorporation in a number of high-performance commercial applications, including the Twister, Typhoon, Breeze and Cyclone chat and USENET servers [3] and BEMSolver, a high-performance scientific code [7].
The rest of this paper is organized as follows. In Section 2, we explain in detail the issues of blowup and allocator-induced false sharing. In Section 3, we motivate and describe in detail the algorithms used by Hoard to simultaneously solve these problems. We sketch proofs of the bounds on blowup and contention in Section 4. We demonstrate Hoard's speed, scalability, false sharing avoidance, and low fragmentation empirically in Section 5, including comparisons with serial and concurrent memory allocators. We also show that Hoard is robust with respect to changes to its key parameter. We classify previous work into a taxonomy of memory allocators in Section 6, focusing on the speed, scalability, false sharing, and fragmentation problems described above. Finally, we discuss future directions for this research in Section 7, and conclude in Section 8.

2. Motivation
In this section, we focus special attention on the issues of allocator-induced false sharing of heap objects and blowup to motivate our work. These issues must be addressed to achieve efficient memory allocation for scalable multithreaded applications, but they have been neglected in the memory allocation literature.
2.1 Allocator-Induced False Sharing of Heap Objects

False sharing occurs when multiple processors share words in the same cache line without actually sharing data; it is a notorious cause of poor performance in parallel applications [20, 15, 36]. Allocators can cause false sharing of heap objects by dividing cache lines into a number of small objects that distinct processors then write. A program may introduce false sharing by allocating a number of objects within one cache line and passing an object to a different thread. It is thus impossible to completely avoid false sharing of heap objects unless the allocator pads out every memory request to the size of a cache line. However, no allocator we know of pads memory requests to the size of a cache line, and with good reason; padding could cause a dramatic increase in memory consumption (for instance, objects would be padded to a multiple of 64 bytes on a SPARC) and could significantly degrade spatial locality and cache utilization.
Unfortunately, an allocator can actively induce false sharing even on objects that the program does not pass to different threads. Active false sharing is due to malloc satisfying memory requests by different threads from the same cache line. For instance, single-heap allocators can give many threads parts of the same cache line. The allocator may divide a cache line into 8-byte chunks. If multiple threads request 8-byte objects, the allocator may give each thread one 8-byte object in turn. This splitting of cache lines can lead to false sharing.
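To make the active false sharing scenario concrete, the following C++ sketch (ours, not from the paper; it assumes a 64-byte cache line and uses std::thread for brevity, where a Solaris-era program would use threads directly) issues the two near-simultaneous small requests described above and checks whether they landed on the same line:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <thread>

int main() {
    void *p1 = nullptr, *p2 = nullptr;
    // Two threads request 8-byte objects at about the same time. A
    // single-heap allocator may satisfy both from one cache line.
    std::thread t1([&] { p1 = std::malloc(8); });
    std::thread t2([&] { p2 = std::malloc(8); });
    t1.join();
    t2.join();
    // 64-byte lines assumed (e.g., the SPARC line size mentioned above).
    const std::uintptr_t kLineSize = 64;
    bool sameLine = ((std::uintptr_t)p1 / kLineSize) ==
                    ((std::uintptr_t)p2 / kLineSize);
    std::printf("objects share a cache line: %s\n", sameLine ? "yes" : "no");
    // If sameLine is true, concurrent writes to *p1 and *p2 would
    // ping-pong the line between processors: false sharing.
    std::free(p1);
    std::free(p2);
    return 0;
}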
Allocators may also passively induce false sharing. Passive false sharing occurs when free allows a future malloc to produce false sharing. If a program introduces false sharing by spreading the pieces of a cache line across processors, the allocator may then passively induce false sharing after a free by letting each processor reuse the pieces it freed, perpetuating the false sharing the program created.
2.2 Blowup

Many previous allocators suffer from blowup. As we show in Section 3.1, Hoard keeps blowup to a constant factor, but many existing concurrent allocators either have unbounded blowup (the Cilk and STL allocators [6, 30]), where memory consumption grows without bound while the memory required is fixed, or have memory consumption that grows linearly with P, the number of processors (Ptmalloc and LKmalloc [9, 22]). It is important to note that these worst cases are not just theoretical. Threads in a producer-consumer relationship, a common programming idiom, may induce this blowup. To the best of our knowledge, papers in the literature do not address this problem. For example, consider a program in which a producer thread repeatedly allocates a block of memory and gives it to a consumer thread which frees it. If the memory freed by the consumer is unavailable to the producer, the program consumes more and more memory as it runs.
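The following sketch (ours; the block size and iteration count are arbitrary, and the hand-off is a busy-wait for brevity) exhibits this producer-consumer pattern. At most one block is live at any instant, yet an allocator that never returns consumer-freed memory to the producer keeps acquiring fresh memory:

#include <atomic>
#include <cstdlib>
#include <thread>

std::atomic<void*> slot{nullptr};   // single-item hand-off channel

void producer(int n) {
    for (int i = 0; i < n; ++i) {
        void* p = std::malloc(4096);            // allocated on the producer's heap
        while (slot.load() != nullptr) { }      // wait until the last block is taken
        slot.store(p);
    }
}

void consumer(int n) {
    for (int i = 0; i < n; ++i) {
        void* p;
        while ((p = slot.load()) == nullptr) { }  // wait for the next block
        slot.store(nullptr);
        std::free(p);                           // freed on the consumer's heap
    }
}

int main() {
    std::thread prod(producer, 100000), cons(consumer, 100000);
    prod.join();
    cons.join();
    return 0;
}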
This unbounded memory consumption is plainly unacceptable, but a P-fold increase in memory consumption is also cause for concern. The scheduling of multithreaded programs can cause them to require much more memory when run on multiple processors than when run on one processor [6, 28]. Consider a program with P threads. Each thread calls x = malloc(s); free(x). If these threads are serialized, the total memory required is s. However, if they execute on P processors, each call to malloc may run in parallel, increasing the memory requirement to P·s. If the allocator multiplies this consumption by another factor of P, then memory consumption increases to P²·s.
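Written in the notation of Section 4 (a sketch of the arithmetic for this example; A(t) is memory allocated by the allocator, U(t) memory in use by the program):

\[ \text{blowup} \;=\; \frac{A(t)}{U(t)} \;=\; \frac{P^2 \cdot s}{P \cdot s} \;=\; P. \]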
3. The Hoard Memory Allocator

This section describes Hoard in detail. Hoard can be viewed as an allocator that generally avoids false sharing and that trades increased (but bounded) memory consumption for reduced synchronization costs.
Hoard augments per-processor heaps with a global heap that every thread may access (similar to Vee and Hsu [37]). Each thread can access only its heap and the global heap. We designate heap 0 as the global heap and heaps 1 through P as the per-processor heaps. In the implementation we actually use 2P heaps (without altering our analytical results) in order to decrease the probability that concurrently-executing threads use the same heap; we use a simple hash function to map thread ids to per-processor heaps, which can result in collisions. We need such a mapping function because in general there is not a one-to-one correspondence between threads and processors, and threads can be reassigned to other processors. On Solaris, however, we are able to avoid collisions of heap assignments to threads by hashing on the light-weight process (LWP) id. The number of LWPs is usually set to the number of processors [24, 33], so each heap is generally used by no more than one LWP.

Hoard maintains usage statistics for each heap. These statistics are u_i, the amount of memory in use ("live") in heap i, and a_i, the amount of memory allocated by Hoard from the operating system held in heap i.
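A minimal sketch of such a mapping (ours; the constants and the modulo hash are illustrative, not Hoard's actual hash function):

#include <cstddef>

const std::size_t P         = 14;       // number of processors (example value)
const std::size_t kNumHeaps = 2 * P;    // 2P per-processor heaps, as above

// Heap 0 is the global heap; per-processor heaps are 1..2P. Distinct
// threads may collide on the same heap, which is safe but slower.
std::size_t heapFor(std::size_t threadId) {
    return 1 + (threadId % kNumHeaps);
}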
Hoard allocates memory from the system in chunks we call superblocks. Each superblock is an array of some number of blocks (objects) and contains a free list of its available blocks maintained in LIFO order to improve locality. All superblocks are the same size (S). Objects larger than half the size of a superblock are managed directly via the virtual memory system, and all of the blocks in a superblock are in the same size class. By using size classes that are a power of b apart (where b is greater than 1) and rounding the requested size up to the nearest size class, we bound worst-case internal fragmentation within a block to a factor of b. In order to reduce external fragmentation, we recycle completely empty superblocks for reuse by any size class. For clarity of exposition, we assume a single size class in the discussion below.
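A sketch of the superblock layout just described (field names are ours; Hoard's actual representation differs in detail):

#include <cstddef>

struct Block { Block* next; };   // free-list link, stored inside the free block

struct Superblock {
    std::size_t sizeClass;  // every block in a superblock has the same size class
    std::size_t u;          // bytes in use ("live") in this superblock
    int         owner;      // index of the heap that currently owns it
    Block*      freeList;   // LIFO: the most recently freed block is reused first

    Block* getBlock() {             // pop a free block (LIFO order aids locality)
        Block* b = freeList;
        if (b != nullptr) freeList = b->next;
        return b;
    }
    void putBlock(Block* b) {       // push a freed block
        b->next = freeList;
        freeList = b;
    }
};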
3.1 Bounding Blowup

Each heap "owns" a number of superblocks. When there is no memory available in any superblock on a thread's heap, Hoard obtains a superblock from the global heap if one is available. If the global heap is also empty, Hoard creates a new superblock by requesting virtual memory from the operating system and adds it to the thread's heap. Hoard does not currently return empty superblocks to the operating system; it instead makes these superblocks available for reuse.
Hoard moves superblocks from a per-processor heap to the global heap when the per-processor heap crosses the emptiness threshold: more than f, the empty fraction, of its blocks are not in use (u_i < (1 − f) a_i), and there are more than some number K of superblocks' worth of free memory on the heap (u_i < a_i − K·S). As long as a heap is at most f empty or holds no more than K superblocks' worth of free memory, Hoard will not move superblocks from that per-processor heap to the global heap. Whenever a per-processor heap does cross the emptiness threshold, Hoard transfers one of its superblocks that is at least f empty to the global heap. Always removing such a superblock whenever we cross the emptiness threshold maintains the following invariant on the per-processor heaps: (u_i ≥ a_i − K·S) ∨ (u_i ≥ (1 − f) a_i). When we remove a superblock, we reduce u_i by at most (1 − f)·S but reduce a_i by S, thus restoring the invariant. Maintaining this invariant bounds blowup to a constant factor, as we show in Section 4.
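In code form, the emptiness threshold is just the conjunction of the two conditions above (a sketch; u_i and a_i are the per-heap statistics, with f, K, and S as defined):

#include <cstddef>

// A per-processor heap crosses the emptiness threshold only when BOTH
// hold: more than fraction f of its memory is free (u_i < (1-f)·a_i),
// and it holds more than K superblocks' worth of free memory
// (u_i < a_i - K·S).
bool crossedEmptinessThreshold(std::size_t u_i, std::size_t a_i,
                               double f, std::size_t K, std::size_t S) {
    return u_i < (1.0 - f) * a_i     // more than f of the heap is empty
        && u_i + K * S < a_i;        // rewritten to avoid unsigned underflow
}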
Hoard finds f-empty superblocks in constant time by dividing superblocks into a number of bins that we call "fullness groups". Each bin contains a doubly-linked list of superblocks that are in a given fullness range (e.g., all superblocks that are between 3/4 and completely empty are in the same bin). Hoard moves superblocks from one group to another when appropriate, and always allocates from nearly-full superblocks. To improve locality, we order the superblocks within a fullness group using a move-to-front heuristic. Whenever we free a block in a superblock, we move the superblock to the front of its fullness group. If we then need to allocate a block, we will be likely to reuse a superblock that is already in memory; because we maintain the free blocks in LIFO order, we are also likely to reuse a block that is already in cache.
3.2 Example

Figure 1 illustrates, in simplified form, how Hoard manages superblocks. For simplicity, we assume there are two threads and two heaps (thread i maps to heap i). In this example (which reads from top left to top right, then bottom left to bottom right), the empty fraction f is 1/4 and K is 0. Thread 1 executes code written on the left-hand side of each diagram (prefixed by "t1:") and thread 2 executes code on the right-hand side (prefixed by "t2:"). Initially, the global heap is empty, heap 1 has two superblocks (one partially full, one empty), and heap 2 has a completely-full superblock.

The top left diagram shows the heaps after thread 1 allocates x9 from heap 1. Hoard selects the fullest superblock in heap 1 for this allocation. Next, in the top right diagram, thread 1 frees y4, which is in a superblock that heap 2 owns. Because heap 2 is still more than 3/4 full, Hoard does not remove a superblock from it. In the bottom left diagram, thread 2 frees x2, which is in a superblock owned by heap 1. This free does not cause heap 1 to cross the emptiness threshold, but the next free (of x9) does. Hoard then moves the completely-free superblock from heap 1 to the global heap.
3.3 Avoiding False Sharing

Hoard uses the combination of superblocks and multiple heaps described above to avoid most active and passive false sharing. Only one thread may allocate from a given superblock, since a superblock is owned by exactly one heap at any time. When multiple threads make simultaneous requests for memory, the requests will always be satisfied from different superblocks, avoiding actively induced false sharing. When a program deallocates a block of memory, Hoard returns the block to its superblock. This coalescing prevents multiple threads from reusing pieces of cache lines that were passed to these threads by a user program, avoiding passively-induced false sharing.
While this strategy can greatly reduce allocator-induced false sharing, it does not completely avoid it. Because Hoard may move superblocks from one heap to another, it is possible for two heaps to share cache lines. Fortunately, superblock transfer is a relatively infrequent event (occurring only when a per-processor heap has dropped below the emptiness threshold). Further, we have observed that in practice, superblocks released to the global heap are often completely empty, eliminating the possibility of false sharing. Released superblocks are guaranteed to be at least f empty, so the opportunity for false sharing of lines in each superblock is reduced.

Figure 1 also shows how Hoard generally avoids false sharing.
malloc(sz)
1. If sz > S/2, allocate the superblock from the OS and return it.
2. i ← hash(the current thread).
3. Lock heap i.
4. Scan heap i's list of superblocks from most full to least (for the size class corresponding to sz).
5. If there is no superblock with free space,
6.   Check heap 0 (the global heap) for a superblock.
7.   If there is none,
8.     Allocate S bytes as superblock s and set the owner to heap i.
9.   Else,
10.    Transfer the superblock s to heap i.
11.    u_0 ← u_0 − s.u
12.    u_i ← u_i + s.u
13.    a_0 ← a_0 − S
14.    a_i ← a_i + S
15. u_i ← u_i + sz.
16. s.u ← s.u + sz.
17. Unlock heap i.
18. Return a block from the superblock.
free(ptr)
1. If the block is "large",
2.   Free the superblock to the operating system and return.
3. Find the superblock s this block comes from and lock it.
4. Lock heap i, the superblock's owner.
5. Deallocate the block from the superblock.
6. u_i ← u_i − block size.
7. s.u ← s.u − block size.
8. If i = 0, unlock heap i and the superblock and return.
9. If u_i < a_i − K·S and u_i < (1 − f)·a_i,
10.   Transfer a mostly-empty superblock s1 to heap 0 (the global heap).
11.   u_0 ← u_0 + s1.u, u_i ← u_i − s1.u
12.   a_0 ← a_0 + S, a_i ← a_i − S
13. Unlock heap i and the superblock.

Figure 2: Pseudo-code for Hoard's malloc and free.
Notice that when thread 1 frees y4, Hoard returns this memory to y4's superblock and not to thread 1's heap. Since Hoard always uses heap i to satisfy memory allocation requests from thread i, only thread 2 can reuse that memory. Hoard thus avoids both active and passive false sharing in these superblocks.
3.4 Algorithms

In this section, we describe Hoard's memory allocation and deallocation algorithms in more detail. We present the pseudo-code for these algorithms in Figure 2. For clarity of exposition, we omit discussion of the management of fullness groups and superblock recycling.
Allocation

Hoard directly allocates "large" objects (size > S/2) via the virtual memory system. When a thread on processor i calls malloc for small objects, Hoard locks heap i and gets a block of a superblock with free space, if there is one on that heap (line 4). If there is not, Hoard checks the global heap (heap 0) for a superblock. If there is one, Hoard transfers it to heap i, adding the number of bytes in use in the superblock (s.u) to u_i, and the total number of bytes in the superblock (S) to a_i (lines 10-14). If there are no superblocks in either heap i or heap 0, Hoard allocates a new superblock and inserts it into heap i (line 8). Hoard then chooses a single block from a superblock with free space, marks it as allocated, and returns a pointer to that block.
Deallocation

Each superblock has an "owner" (the processor whose heap it's in). When a processor frees a block, Hoard finds its superblock (through a pointer in the block's header). (If this block is "large", Hoard immediately frees the superblock to the operating system.) It first locks the superblock and then locks the owner's heap. Hoard then returns the block to the superblock and decrements u_i. If the heap is too empty (u_i < a_i − K·S and u_i < (1 − f) a_i), Hoard transfers a superblock that is at least f empty to the global heap (lines 10-12). Finally, Hoard unlocks heap i and the superblock.
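The following is a toy, single-size-class C++ rendering of Figure 2 (ours, for illustration only: the constants are arbitrary, and fullness groups, size classes, per-superblock locks, and superblock recycling are all omitted):

#include <cstddef>
#include <mutex>
#include <vector>

const std::size_t S   = 8192;   // superblock size (8K, as in Section 5)
const std::size_t BLK = 64;     // one size class: 64-byte blocks (assumed)
const double      F   = 0.25;   // empty fraction f
const std::size_t K   = 4;      // free-superblock threshold
const int         NH  = 9;      // heap 0 is global; heaps 1..8 per-processor

struct Superblock {
    int owner;                      // heap that currently owns this superblock
    std::size_t u = 0;              // bytes in use in this superblock
    std::vector<char*> freeList;    // LIFO free list of blocks
};

struct Heap {
    std::mutex mtx;
    std::size_t u = 0, a = 0;       // the statistics u_i and a_i
    std::vector<Superblock*> sbs;
};
Heap heaps[NH];

// Each block is prefixed by a pointer to its superblock, the "pointer in
// the block's header" used by free to find the owner.
Superblock* superblockFromOS(int owner) {
    Superblock* sb = new Superblock{owner};
    char* mem = new char[S];                       // stands in for OS memory
    const std::size_t stride = sizeof(Superblock*) + BLK;
    for (std::size_t off = 0; off + stride <= S; off += stride) {
        *reinterpret_cast<Superblock**>(mem + off) = sb;
        sb->freeList.push_back(mem + off + sizeof(Superblock*));
    }
    return sb;
}

void* toyMalloc(int i) {                           // i = heapFor(current thread)
    std::lock_guard<std::mutex> gi(heaps[i].mtx);
    Superblock* s = nullptr;
    for (Superblock* c : heaps[i].sbs)             // Figure 2 scans most-full
        if (!c->freeList.empty()) { s = c; break; }//   first; any one will do here
    if (s == nullptr) {
        std::lock_guard<std::mutex> g0(heaps[0].mtx);
        for (std::size_t n = 0; n < heaps[0].sbs.size(); ++n) {
            if (!heaps[0].sbs[n]->freeList.empty()) {    // global heap has one
                s = heaps[0].sbs[n];
                heaps[0].sbs.erase(heaps[0].sbs.begin() + n);
                heaps[0].u -= s->u; heaps[i].u += s->u;  // lines 11-12
                heaps[0].a -= S;    heaps[i].a += S;     // lines 13-14
                s->owner = i;
                break;
            }
        }
        if (s == nullptr) { s = superblockFromOS(i); heaps[i].a += S; } // line 8
        heaps[i].sbs.push_back(s);
    }
    heaps[i].u += BLK; s->u += BLK;                      // lines 15-16
    char* b = s->freeList.back(); s->freeList.pop_back();
    return b;
}

void toyFree(void* p) {
    char* block = static_cast<char*>(p);
    Superblock* s = *reinterpret_cast<Superblock**>(block - sizeof(Superblock*));
    int i = s->owner;            // racy in general; real Hoard locks the
                                 // superblock first and re-checks the owner
    std::lock_guard<std::mutex> gi(heaps[i].mtx);
    s->freeList.push_back(block);                        // line 5
    s->u -= BLK; heaps[i].u -= BLK;                      // lines 6-7
    if (i == 0) return;                                  // line 8
    if (heaps[i].u + K * S < heaps[i].a &&               // line 9 (rearranged to
        heaps[i].u < (1.0 - F) * heaps[i].a) {           //  avoid unsigned underflow)
        std::size_t e = 0;                               // pick the emptiest superblock
        for (std::size_t n = 1; n < heaps[i].sbs.size(); ++n)
            if (heaps[i].sbs[n]->u < heaps[i].sbs[e]->u) e = n;
        Superblock* s1 = heaps[i].sbs[e];
        heaps[i].sbs.erase(heaps[i].sbs.begin() + e);
        std::lock_guard<std::mutex> g0(heaps[0].mtx);
        s1->owner = 0;
        heaps[0].u += s1->u; heaps[i].u -= s1->u;        // line 11
        heaps[0].a += S;     heaps[i].a -= S;            // line 12
        heaps[0].sbs.push_back(s1);
    }
}

Both paths lock a per-processor heap before the global heap, so the toy's lock order is acyclic; real Hoard additionally locks the superblock itself, as in Figure 2.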
4. Analytical Results

In this section, we sketch the proofs of bounds on blowup and synchronization. We first define some useful notation. We number the heaps from 0 to P: 0 is the global heap, and 1 through P are the per-processor heaps. We adopt the following convention: capital letters denote maxima and lower-case letters denote current values. Let A(t) and U(t) denote the maximum amount of memory allocated and in use by the program ("live memory") after memory operation t. Let a(t) and u(t) denote the current amount of memory allocated and in use by the program after memory operation t. We add a subscript for a particular heap (e.g., u_i(t)) and add a caret (e.g., â(t)) to denote the sum for all heaps except the global heap.
4.1 Bounds on Blowup

We now formally define the blowup for an allocator as its worst-case memory consumption divided by the ideal worst-case memory consumption for a serial memory allocator (a constant factor times its maximum memory required [29]):

DEFINITION 1. blowup = O(A(t)/U(t)).
We first prove the following theorem that bounds Hoard's worst-case memory consumption: A(t) = O(U(t) + P). We can show that the maximum amount of memory in the global and the per-processor heaps (A(t)) is the same as the maximum allocated into the per-processor heaps (Â(t)). We make use of this lemma, whose proof is straightforward but somewhat lengthy (the proof may be found in our technical report [4]).

LEMMA 1. A(t) = Â(t).

Intuitively, this lemma holds because these quantities are maxima; any memory in the global heap was originally allocated into a per-processor heap. Now we prove the bounded memory consumption theorem:

THEOREM 1. A(t) = O(U(t) + P).
PROOF. We restate the invariant from Section 3.1 that we maintain over all the per-processor heaps: (a_i(t) − K·S ≤ u_i(t)) ∨ ((1 − f) a_i(t) ≤ u_i(t)).

The first inequality is sufficient to prove the theorem. Summing over all P per-processor heaps gives us

  Â(t) ≤ Σ_{i=1}^{P} u_i(t) + P·K·S    (def. of Â(t))
       ≤ Û(t) + P·K·S                  (def. of Û(t))
       ≤ U(t) + P·K·S.                 (Û(t) ≤ U(t))

Since by the above lemma A(t) = Â(t), we have A(t) = O(U(t) + P).
Because the number of size classes is constant, this theorem holds over all size classes. By the definition of blowup above, and assuming that P << U(t), Hoard's blowup is O((U(t) + P)/U(t)) = O(1). This result shows that Hoard's worst-case memory consumption is at worst a constant factor overhead that does not grow with the amount of memory required by the program.

Our discipline for using the empty fraction (f) enables this proof, so it is clearly a key parameter for Hoard. For reasons we describe and validate with experimental results in Section 5.5, Hoard's performance is robust with respect to the choice of f.
4.2 Bounds on Synchronization

In this section, we analyze Hoard's worst-case and expected synchronization costs. Synchronization costs come in two flavors: contention for a per-processor heap and acquisition of the global heap lock. We argue that the first form of contention is not a scalability concern, and that the second form is rare. Further, for common program behavior, the synchronization costs are low over most of the program's lifetime.
Per-processor Heap Contention

While the worst-case contention for Hoard arises when one thread allocates memory from the heap and a number of other threads free it (thus all contending for the same heap lock), this case is not particularly interesting. If an application allocates memory in such a manner and the amount of work between allocations is so low that heap contention is an issue, then the application itself is fundamentally unscalable. Even if heap access were completely independent, the application itself could only achieve a two-fold speedup, no matter how many processors are available.
Since we are concerned with providing a scalable allocator for scalable applications, we can bound Hoard's worst case for such applications, which occurs when pairs of threads exhibit the producer-consumer behavior described above. Each malloc and each free will be serialized. Modulo context-switch costs, this pattern results in at most a two-fold slowdown. This slowdown is not desirable, but it is scalable as it does not grow with the number of processors (as it does for allocators with one heap protected by a single lock).

It is difficult to establish an expected case for per-processor heap contention. Since most multithreaded applications use dynamically-allocated memory for the exclusive use of the allocating thread and only a small fraction of allocated memory is freed by another thread [22], we expect per-processor heap contention to be quite low.

Global Heap Contention
Global heap contention arises when superblocks are first created, when superblocks are transferred to and from the global heap, and when blocks are freed from superblocks held by the global heap. We simply count the number of times the global heap's lock is acquired by each thread, which is an upper bound on contention. We analyze two cases: a growing phase and a shrinking phase. We show that worst-case synchronization for the growing phases is inversely proportional to the superblock size and the empty fraction, and that the worst case for the shrinking phase, while expensive, arises only in a pathological case that is unlikely to occur in practice. Empirical evidence from Section 5 suggests that for most programs, Hoard will incur low synchronization costs for most of the program's execution.
Two key parameters control the worst-case global heap contention while a per-processor heap is growing: f, the empty fraction, and S, the size of a superblock. When a per-processor heap is growing, a thread can acquire the global heap lock at most k/(f·S/s) times for k memory operations, where f is the empty fraction, S is the superblock size, and s is the object size. Whenever the per-processor heap is empty, the thread will lock the global heap and obtain a superblock with at least f·S/s free blocks. If the thread then calls malloc k times, it will exhaust its heap and acquire the global heap lock at most k/(f·S/s) times.
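For concreteness, with the default parameters from Section 5 (f = 1/4, S = 8K) and a hypothetical object size of s = 64 bytes:

\[ \frac{f \cdot S}{s} = \frac{8192 / 4}{64} = 32, \]

so during growth a thread acquires the global heap lock at most once per 32 calls to malloc.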
When a per-processor heap is shrinking, a thread will first acquire the global heap lock when the release threshold is crossed. The release threshold could then be crossed on every single call to free if every superblock is exactly f empty. Completely freeing each superblock in turn will cause the superblock to first be released to the global heap, and every subsequent free to a block in that superblock will then acquire the global heap lock. Luckily, this pathological case is highly unlikely to occur, since it requires an improbable sequence of operations: the program must systematically free (1 − f) of each superblock and then free every block in a superblock one at a time.
For the common case, Hoard will incur very low contention costs for any memory operation. This situation holds when the amount of live memory remains within the empty fraction of the maximum amount of memory allocated (and when all frees are local). Johnstone and Stefanović show in their empirical studies of allocation behavior that for nearly every program they analyzed, the memory in use tends to vary within a range that is within a fraction of total memory currently in use, and this amount often grows steadily [18, 32]. Thus, in the steady state case, Hoard incurs no contention, and in gradual growth, Hoard incurs low contention.
5. Experimental Results

In this section, we describe our experimental results. We performed experiments on uniprocessors and multiprocessors to demonstrate Hoard's speed, scalability, false sharing avoidance, and low fragmentation. We also show that these results are robust with respect to the choice of the empty fraction. The platform used is a dedicated 14-processor Sun Enterprise 5000 with 2GB of RAM and 400MHz UltraSparcs with 4MB of level 2 cache, running Solaris 7. Except for the Barnes-Hut benchmark, all programs (including the allocators) were compiled using the GNU C++ compiler at the highest possible optimization level (-O6). We used GNU C++ instead of the vendor compiler (Sun Workshop compiler version 5.0) because we encountered errors when we used high optimization levels. In the experiments cited below, the size of a superblock S is 8K, the empty fraction f is 1/4, the number of superblocks K that must be free for superblocks to be released is 4, and the base of the exponential for size classes b is 1.2 (bounding internal fragmentation to 1.2).
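To illustrate the internal fragmentation bound (our arithmetic, idealizing size classes as exact powers of b): since classes are a factor of b apart and requests round up to the nearest class,

\[ \frac{\text{allocated}}{\text{requested}} < b = 1.2; \]

e.g., a request for 1000 bytes is rounded up to a size class below 1.2 × 1000 = 1200 bytes, so at most 20% of the block is wasted.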
We compare Hoard (version 2.0.2) to the following single and multiple-heap memory allocators: Solaris, the default allocator provided with Solaris 7; Ptmalloc [9], the Linux allocator included in the GNU C library that extends a traditional allocator to use multiple heaps; and MTmalloc, a multiple-heap allocator included with Solaris 7 for use with multithreaded parallel applications. (Section 6 includes extensive discussion of Ptmalloc, MTmalloc, and other concurrent allocators.) The latter two are the only publicly-available concurrent allocators of which we are aware for the Solaris platform (for example, LKmalloc is Microsoft proprietary). We use the Solaris allocator as the baseline for calculating speedups.

We use the single-threaded applications from Wilson and Johnstone, and Grunwald and Zorn [12, 19]: espresso, an optimizer for programmable logic arrays; Ghostscript, a PostScript interpreter; LRUsim, a locality analyzer; and p2c, a Pascal-to-C translator. We chose these programs because they are allocation-intensive and have