arXiv:cs/0703003v1 [cs.NA] 28 Feb 2007

Functions to Support Input and Output of Intervals
M. H. van Emden∗   B. Moa†   S. C. Somosan‡
Research Report DCS-311-IR
Department of Computer Science
University of Victoria, Canada
Abstract

Interval arithmetic is hardly feasible without directed rounding as provided, for example, by the IEEE floating-point standard. Equally essential for interval methods is directed rounding for conversion between the external decimal and internal binary numerals. This is not provided by the standard I/O libraries. Conversion algorithms exist that guarantee identity upon conversion followed by its inverse. Although it may be possible to adapt these algorithms for use in decimal interval I/O, we argue that outward rounding in radix conversion is computationally a simpler problem than guaranteeing identity. Hence it is preferable to develop decimal interval I/O ab initio, which is what we do in this paper.

1 Introduction

Interval arithmetic endows every computation with the authority of proof, the theorem being that the real-valued solution x belongs to a set of reals [a,b], where a and b are IEEE-standard floating-point numbers. Yet it can easily happen that we get as output something obviously wrong such as [0.33333, 0.33333] when x = 1/3. It may well be that a correct interval has been computed. For example, if [a,b] is the narrowest single-length IEEE-standard floating-point interval containing 1/3, and if we do not allow the output to be truncated, we get [a,b] as
[0.333333313465118408203125,
0.3333333432674407958984375],
which is best written as
0.3333333[13465118408203125, 432674407958984375],

a notation proposed and analyzed in [9]. This is the unabridged version of [0.33333, 0.33333],
which is what we get with a typical default precision. The untruncated decimal numerals are exact representations of [a,b] because every binary floating-point number can be represented as a decimal numeral. However, if one wants to ensure exact representation, then we get a decimal digit for every bit, which is not an economical representation. Then we might as well write the bits themselves, which gives
0.010101010101010101010101[0,1].
On output the only problem is that any reasonable choice of display precision causes the standard output routines to shorten the numerals for the bounds to the same result. The improvement needed here is output that is aware of whether an upper or a lower bound is to be displayed.
On input, however, we have a more serious problem. Suppose we want to initialize an interval variable at 0.1. Initializing the lower and upper bounds with
lb=ub=0.1
is guaranteed to generate an interval that does not contain 0.1, for the simple reason that there is no floating-point number equal to 0.1. Thus the best the compiler can do is to produce one of the floating-point numbers closest to 0.1. We don't know which one it is. The table in Figure 1 shows for each of 1/2, 1/3, ..., 1/10 that the compiler picks the upper bound of the narrowest interval containing the fraction concerned. To remind us that we cannot count on this to happen, the other choice is made for 1/11. Thus, for input we need an algorithm that produces, for every fraction or numeral as input, a narrow interval containing it; ideally the narrowest.
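The effect is easy to observe. The following is a minimal sketch (ours, not part of the library developed in this paper): it prints the single-length value actually stored for 0.1 and checks the degenerate interval against the much closer double approximation of 1/10.

#include <cstdio>

int main() {
  float lb = 0.1f, ub = 0.1f;   // both bounds "initialized at 0.1"
  // Printing with many digits shows that neither bound is equal to 1/10:
  // the compiler has silently picked one of the two neighbouring floats.
  printf("lb = %.30f\n", lb);
  // 0.1 as a double is itself rounded, but it is so much closer to 1/10
  // than any float that it serves to show that the interval misses 1/10.
  printf("contains 1/10? %s\n", (lb <= 0.1 && 0.1 <= ub) ? "yes" : "no");
  return 0;
}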
Floating-point numbers are normalized as a power of 2 multiplied by a mantissa that is between 1 and 2, like this:
2^(-2) * 1.0101010101010101010101[0,1].
This shows the single-length format, which has 23 bits after the binary point.
Long strings of bits are usually shown in hexadecimal notation, which uses the 16 characters

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f

to represent four consecutive bits at a time. As there are only 23 bits to be displayed, we represent the first three bits in octal notation, with the characters

0, 1, 2, 3, 4, 5, 6, 7.

Thus the 23 bits after the binary point are shown as an octal character followed by five hexadecimal characters. For example, the narrowest interval containing 1/3 is
2^(-2) * 1.2aaaa[a,b].
In this way we get the table in Figure 1.
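For concreteness, here is a small sketch (ours, not the code used to produce Figure 1) that prints a positive, normal single-length float in this notation, assuming the IEEE 754 layout of one sign bit, eight exponent bits and 23 fraction bits:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Print a positive, normal single-length float as 2^(E) * 1.OHHHHH, where
// O is one octal digit (the first 3 fraction bits) and HHHHH are five
// hexadecimal digits (the remaining 20 fraction bits).
void print_figure1_style(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);                    // reinterpret the IEEE 754 layout
  int exponent = (int)((bits >> 23) & 0xFF) - 127;   // unbiased exponent
  uint32_t fraction = bits & 0x7FFFFF;               // the 23 bits after the binary point
  printf("2^(%d) * 1.%o%05x\n",
         exponent,
         fraction >> 20,                             // first 3 bits as one octal digit
         fraction & 0xFFFFF);                        // remaining 20 bits as 5 hex digits
}

int main() {
  print_figure1_style(1.0f / 3.0f);   // prints 2^(-2) * 1.2aaaab
  print_figure1_style(1.0f / 7.0f);   // prints 2^(-3) * 1.124925
  return 0;
}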
In this paper we develop algorithms to support software that performs correct input and output of intervals.
1/i   floating-point number        narrowest interval
      produced by standard         containing 1/i
      I/O library via compiler
--------------------------------------------------------
1/2   2^(-1) * 1.000000            2^(-1) * 1.000000[,]
1/3   2^(-2) * 1.2aaaab            2^(-2) * 1.2aaaa[a,b]
1/4   2^(-2) * 1.000000            2^(-2) * 1.000000[,]
1/5   2^(-3) * 1.4ccccd            2^(-3) * 1.4cccc[c,d]
1/6   2^(-3) * 1.2aaaab            2^(-3) * 1.2aaaa[a,b]
1/7   2^(-3) * 1.124925            2^(-3) * 1.12492[4,5]
1/8   2^(-3) * 1.000000            2^(-3) * 1.000000[,]
1/9   2^(-4) * 1.638e39            2^(-4) * 1.638e3[8,9]
1/10  2^(-4) * 1.4ccccd            2^(-4) * 1.4cccc[c,d]
1/11  2^(-4) * 1.3a2e8c            2^(-4) * 1.3a2e8[c,d]

Figure 1: Why standard input cannot be relied on to obtain an interval for some of the common fractions.
2 Previous work
The problems described in the introduction have motivated Rump [8] to develop a method for decimal input and output for intervals. He computes two 2-dimensional arrays low and upp of double-length floating-point numbers that satisfy
low[d, e] ≤ d × 10^e ≤ upp[d, e]
for d ∈ {1, 2, ..., 9} and for e ∈ {−340, −339, ..., 308}. These two arrays contain a total of 11682 (that is, 2 × 9 × 649) double-length floating-point numbers.
With these precomputed numbers available, a decimal 0.d_1 d_2 ... d_k × 10^e is converted on input to the interval

  [ Σ_{i=1}^{k} low[d_i, e−i] ,  Σ_{i=1}^{k} upp[d_i, e−i] ]        (1)
of double-length floating-point numbers, where the inner brackets enclose array subscripts. The additions for the lower (upper) bound are performed in downward (upward) rounding mode. As a result of the roundings that may occur, the interval obtained is not in general as narrow as possible. To minimize the inevitable widening, Rump recommends performing the additions starting with the smallest array elements.
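The rounding discipline in (1) can be sketched with the C99/C++11 floating-point environment functions. The code below is our illustration, not Rump's: the terms are arbitrary stand-ins for the table entries low[d_i, e−i] and upp[d_i, e−i], and only the rounding modes of the two summations differ.

#include <cfenv>
#include <cstdio>

// Sum the same terms twice, once with every addition rounded downward and
// once rounded upward, so that the exact sum of the stored doubles lies in
// [lo, hi].  In Rump's method the two summations would also use different
// terms (low versus upp), and the additions start with the smallest terms.
int main() {
  // In principle #pragma STDC FENV_ACCESS ON is required for fesetround to
  // be respected; aggressive optimization may also interfere.
  double terms[] = {1e-5, 1e-4, 1e-3, 1e-2, 1e-1};   // smallest first
  int saved = fegetround();

  fesetround(FE_DOWNWARD);
  double lo = 0.0;
  for (int i = 0; i < 5; ++i) lo += terms[i];

  fesetround(FE_UPWARD);
  double hi = 0.0;
  for (int i = 0; i < 5; ++i) hi += terms[i];

  fesetround(saved);                                 // restore the caller's mode
  printf("lo = %.17g\nhi = %.17g\n", lo, hi);        // lo <= exact sum <= hi
  return 0;
}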
The work of Rump should be compared and contrasted with what we will call here general-purpose conversion algorithms. Though these are not specifically intended for interval conversions, they can perhaps be adapted to this purpose because of their guarantees on accuracy.
Let us briefly review these algorithms. Steele and White [5] formulated identity requirements for conversions between internal and external numerals. The internal identity requirement is that conversion of an internal numeral to external and back results in the same internal numeral. If the external numerals are unlimited in length, this requirement can always be met. Similarly, the external identity requirement is that conversion of an external numeral to internal and back results in the same external numeral. Usually, the internal numerals have a fixed length, so this requirement can only be met when the external numeral is not longer than is warranted by this fixed length.
Every binary numeral is representable as a decimal one. Hence, if the internal numerals are binary and the external ones are decimal, then it is trivial to satisfy the internal identity requirement by using the decimal equivalent of the internal binary numeral. The problem addressed by Steele and White is to do so with as few decimal digits as possible.
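A quick illustration of the internal identity requirement (our example, not from the cited papers): 17 significant decimal digits are known to be sufficient for an IEEE double to round-trip, whereas Steele and White's algorithm emits as few digits as will still round-trip.

#include <cstdio>
#include <cstdlib>

int main() {
  double x = 1.0 / 3.0;
  char buf[64];
  snprintf(buf, sizeof buf, "%.17g", x);   // internal (binary) -> external (decimal)
  double y = strtod(buf, nullptr);         // external (decimal) -> internal (binary)
  printf("%s round-trips: %s\n", buf, x == y ? "yes" : "no");   // prints "yes"
  return 0;
}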
To transfer binary floating-point numerals from one computer to another by means of binary files, one needs to be assured that these files are written and read in a compatible way. Moreover, the layout of the floating-point numerals of both machines needs to be the same. Adherence to the IEEE floating-point format is not enough: in addition, both machines have to be big-endian or both need to be little-endian. Because of these difficulties it is attractive to convert internal binary numerals to an external text file containing decimal numerals and to have an algorithm that guarantees faithful translation back to internal format.
Such use of decimal I/O requires that the conversion be efficient. For this reason, Gay [3] and Burger and Dybvig [1] devised faster decimal output. The identity requirement assumes sufficiently accurate decimal input; this was addressed by Clinger [2] and by Gay [3].
Compared to Rump's approach, this has the advantage of not requiring a database of precomputed numbers.
The identity requirements satisfied by [5, 3, 1, 2] are convincing for the purposes envisaged by these authors. However, the requirements for decimal interval I/O are equally compelling and quite different:

1. On input, compute the narrowest floating-point interval containing the decimal numeral.

2. On output, compute the narrowest interval bounded by decimal numerals of specified length that contains the floating-point number.
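For example (our illustration, using the single-length neighbours of 1/10): on input, the numeral 0.1 must yield the interval

  [0.0999999940395355224609375, 0.100000001490116119384765625],

whose bounds are the two floats enclosing 1/10; on output with five decimal digits, the float 0.100000001490116119384765625 must yield [0.10000, 0.10001].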
To satisfy these requirements it seemed simplest to develop our algorithms from first principles rather than to attempt to adapt the work referenced above.
From [4] it seems that XSC does the conversions between binary and decimal numerals correctly, but the authors do not say how this was done. Although the source code of XSC is publicly available (GPL license), it contains more than 150 files. Moreover, the code is not documented in such a way as to facilitate finding the pieces of code that do the conversion, gathering them, and understanding how the conversion was done.
Our task, then, is to explain how to do the conversion, and to show how to implement it. As the utility of this kind of work requires executable programs, we needed to pick an implementation language. The primary language for numerical work is Fortran. However, this work is more in the direction of systems programming, for which C/C++ is a reasonable choice of language.
3 Decimal input
The "scientific notation" for a number consisting of a decimal fraction and an exponent is almost universally used. Thus it would seem that one only has to cater to this format in I/O routines. Yet for input there is a strong case to be made for the pre-scientific notation of a fraction as a pair of integers. Therefore we consider these two in turn.
3.1 Rational fractions
In many situations the most convenient way to input a number is as a rational fraction p/q, where p and q are integers. Not only is it convenient, but there is also a convincing correctness criterion: as there exists, in a given format, a unique least floating-point interval that contains p/q, the input function should yield this interval.
An algorithm for this purpose is one that has been widely, but not universally, taught to children for at least two centuries. We will illustrate this algorithm with p/q = 3/7. As this number is less than 1, we know that the binary fraction has the form 0.… How do we get the missing digits? Multiply by two to get 6/7. As the result is less than one, the result has the form 0.0… Multiply by two to get 12/7. As the result is not less than one, the result has the form 0.01…, and we subtract 1 from 12/7, so that we continue with 5/7. Double again, get 10/7, so that we continue with 3/7 after noting that the result has the form 0.011… Since we are back at 3/7, the digits repeat, and the binary fraction is 0.011011011…
Let us check our computation. The first group of three digits after the binary point is worth 3/8. Every next group is worth 1/8 times the previous group of three. So the value of the infinite string is (3/8) ∗ (1 + (1/8) + (1/8)^2 + ...). Let S = 1 + (1/8) + (1/8)^2 + .... Then S/8 = (1/8) + (1/8)^2 + ... = S − 1. Hence S = 8/7 and we find that the infinite string is (3/8) ∗ (8/7) = 3/7.
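The hand computation is easy to mechanize. Here is a minimal sketch (ours) that prints the first n binary digits of p/q by repeated doubling; for 3/7 it prints the repeating pattern 011011…

#include <cstdio>

// Print the first n binary digits (after the binary point) of p/q,
// for 0 < p < q, using the doubling procedure described above.
void binary_digits(int p, int q, int n) {
  for (int i = 0; i < n; ++i) {
    p = 2 * p;                 // multiply the remaining fraction by two
    if (p >= q) {              // digit is 1: note it and subtract 1
      putchar('1');
      p = p - q;
    } else {                   // digit is 0
      putchar('0');
    }
  }
  putchar('\n');
}

int main() {
  binary_digits(3, 7, 24);     // prints 011011011011011011011011
  return 0;
}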
Let us now translate this procedure to a machine-executable algorithm. We can decide whether the next binary digit should be a 0 or a 1 by computing in a variable pwr (from "power") the successive powers 2^(−k) for k = 0, 1, 2, ... and adding some of these powers in a variable named frac (from "fraction"). We compute the next value of pwr at every step, but only add it to frac when deciding to write a 1 in the algorithm described above. This idea is embodied in the following code.
float frac = 0.0;   // fraction to be built up out of powers of two
float pwr = 1.0;    // power of 2 to add to fraction
// let r = p/q
while (frac + pwr > frac && p > 0) {
  // p > 0 and r - frac = (p/q) * pwr
  pwr = pwr/2.0; p = 2*p;
  if (p >= q) {
    p = p - q; frac = frac + pwr;
  }
  // if p = 0 then r = frac
  // if p > 0 then 0 < r - frac <= pwr
}
Typically, there are infinitely many binary digits. Only a finite initial segment can be accommodated in a floating-point number. At every iteration pwr is halved. At some point this quantity becomes insignificant. The criterion for this is whether adding pwr to frac makes any difference to frac. As soon as that is not the case, the iteration terminates.
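For instance, in single precision (a quick check of the termination criterion, not part of the algorithm; the volatile qualifiers merely keep the arithmetic in single precision):

#include <cstdio>

int main() {
  volatile float frac = 1.0f;
  volatile float pwr  = 1.0e-8f;     // well below half an ulp of 1.0f (about 6e-8)
  volatile float sum  = frac + pwr;  // rounds back to 1.0f
  printf("frac + pwr > frac ? %s\n", (sum > frac) ? "yes" : "no");   // prints "no"
  return 0;
}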
The last assertion states that every time around the loop we have that frac is a lower bound for p/q and that the difference between the two is at most 2^(−i), where i is the number of times around the loop. This code builds in frac a lower bound to p/q that approaches p/q as closely as the precision allows. If p/q can be represented in the floating-point number format, this shows up in p becoming 0.
The above segment of code is the kernel of the function shown in Figure 2 for the IEEE standard 754 single-length format. It produces the least floating-point interval that contains the fraction p/q.
Interval* convert(int p, int q) {
  // Assumes p and q are positive and less than 2^{30}.
  // Returns the least floating-point interval containing
  // the rational r = p/q.
  float sf = scaleFactor(p, q);   // scale factor; see Figure 3
  float frac = 0.0;   // fraction to be built up out of powers of two
  float pwr = 1.0;    // power of 2 to add to fraction
  while (frac + pwr > frac && p > 0) {
    // p > 0 and r - frac = (p/q) * pwr
    pwr = pwr/2.0; p = 2*p;
    if (p >= q) {
      p = p - q; frac = frac + pwr;
    } // p > 0 and r - frac = (p/q) * pwr
  }
  frac = frac*sf;
  if (p == 0)   // r = frac
    return new Interval(frac, frac);
  else
    return new Interval(frac, next(frac));
}

Figure 2: A C++ function to convert a rational p/q to a floating-point interval. The function next() produces the least floating-point number greater than its argument.
The function assumes that 0.5 ≤ p/q < 1, which is of course not in general the case. It therefore needs the function in Figure 3.
3.2 Input of decimal floating-point numerals
Many programming languages and data files use a similar format for fractional numbers. Though details may vary, it is easy to extract from such files an integer e containing the
