The Delta Tree:
An Object-Centered Approach to Image-Based Rendering
William J. Dally*†, Leonard McMillan†, Gary Bishop†, and Henry Fuchs†
*Artificial Intelligence Laboratory, Massachusetts Institute of Technology
†Department of Computer Science, University of North Carolina at Chapel Hill
Abstract
This paper introduces the delta tree, a data structure that represents an object using a set of reference images. It also describes an algorithm for generating arbitrary re-projections of an object by traversing its delta tree. Delta trees are an efficient representation in terms of both storage and rendering performance. Each node of a delta tree stores an image taken from a point on a sampling sphere that encloses the object. Each image is compressed by discarding pixels that can be reconstructed by warping its ancestors' images to the node's viewpoint. The partial image stored at each node is divided into blocks and represented in the frequency domain. The rendering process generates an image at an arbitrary viewpoint by traversing the delta tree from a root node to one or more of its leaves. A subdivision algorithm selects only the required blocks from the nodes along the path. For each block, only the frequency components necessary to reconstruct the final image at an appropriate sampling density are used. This frequency selection mechanism handles both antialiasing and level-of-detail within a single framework. A complex scene is initially rendered by compositing images generated by traversing the delta trees of its components. Once the reference views of a scene have been rendered in this manner, the entire scene can be reprojected to an arbitrary viewpoint by traversing its own delta tree. Our approach is limited to generating views of an object from outside the object's convex hull. In practice we work around this problem by subdividing objects to render views from within the convex hull.
1. Introduction
The use of images as an underlying model for scene representation is an exciting new paradigm in computer graphics. Generating a scene by re-projecting and compositing images of objects rather than rendering geometric object representations has advantages in performance, level-of-detail management, and model creation. In image-based approaches the computation required to render a scene is proportional to the number of pixels in the images that compose the scene model, independent of the scene's geometric complexity. Since the process of re-projecting an image to a new viewpoint, often called an image warp [Wolberg90], uses fast incremental calculations, it can usually be accomplished faster than rendering a geometric model. Another advantage of image-based rendering is that both level-of-detail and antialiasing can be handled within a single consistent framework by using a frequency-domain representation of the image and warping only those frequency components that are necessary for the desired re-projection. This frequency-based approach automatically allows for differences in the sampling densities between the reference images used to model the scene components and the desired re-projections. This further reduces the number of pixels warped. The reference images used in an image-based rendering system can be acquired from a multitude of sources including rendered polygonal models, photographs, and volume data sets. This provides significant modeling flexibility because the representation of each scene element can be determined according to both its complexity and its overall importance to the scene.
Efficient data representation is a key problem in building any image-based rendering system. Typically several hundred images are required to represent each object, and rendering an image of an object at a new viewpoint may involve reprojecting pixels from several of these images. An ideal space-efficient representation stores exactly one sample for every potentially visible component of an object down to some minimal solid angle of resolvability. Likewise, a representation that requires reading only one sample for each pixel rendered to the screen would provide for optimal rendering performance.
In this paper we introduce the delta tree, an efficient data structure for representing an object with images. We also present an algorithm for generating arbitrary views of an object using its delta tree. The delta tree stores an object's images in a multi-rooted tree where each node corresponds to a region of a sampling sphere centered on the object and stores an image of the object from a viewpoint within that region. The view stored at each node is compressed by discarding pixels that can be reconstructed by warping its ancestors' views to the node's viewpoint, thus providing a space-efficient representation. A view of the object from an arbitrary viewpoint is generated by traversing a portion of the object's delta tree and reprojecting portions of the partial views at each node. The set of nodes to traverse and the portion of each view to reproject are selected by a simple recursive subdivision algorithm to give bandwidth-efficient playback.
2. Background and Related Work
Image-based representations were first introduced to computer graphics in the form of texture mapping
[Catmull74] [Blinn76] [Blinn78] [Heckbert86]. The advantage of textures is that they increase the apparent visual complexity of a scene while adding only a small cost to the rasterization process. Notice that there are significant parallels between the incremental computation of texture indices and the image warping process. Similarly, techniques such as MIP mapping [Williams83] and summed-area tables [Crow84] bear a strong resemblance to frequency-domain reconstruction methods. The limitation of texture maps is that the image elements are constrained to follow the underlying geometry of the primitive upon which the texture is mapped.
The recently introduced methods of view interpolation [Chen93], Quicktime VR [Chen95], fundamental-matrix-based re-projection [Laveau94], and plenoptic modeling [McMillan95b] demonstrate the potential of the image-based rendering approach. All of these methods take advantage of incremental evaluation in their image warps, and the amount of computation required is proportional to either the number of pixels in the reference images (view interpolation and plenoptic modeling) or the output image (fundamental-matrix-based re-projection and Quicktime VR). One notable disadvantage of all of these image-based rendering approaches is that they model the entire scene as a single image element, rather than as a composition of individual scene elements. Put another way, most previous image-based rendering approaches have taken a viewpoint-centered approach to modeling a scene rather than an object-centered approach. However, in many applications it is desirable to synthesize scenes from the bottom up by combining various precomputed image-based scene elements. This object-centered view characterizes the image-based rendering system described in this paper.

Other researchers have also built image-based systems with an object-centered orientation. Digital compositing techniques [Porter84] [Duff85] have been used to construct complicated scenes by combining prerendered images that share a common viewpoint. Other object-centered approaches include the object-movie component of the QuicktimeVR system [Chen95] and the tree generation system described in [Max95].
3. Definitions
A reference image or view, $V_p^x$, is a hybrid photometric/geometric representation of object x, with range data, taken from viewpoint p with a specified field-of-view. A view is represented as a dense two-dimensional array of samples with a given width and height. An image sample or pixel stores the range information, transparency, surface attributes, and surface normal of a point on the surface of an object.¹ A partial view, $V_{a,p}^x$, includes those pixels of $V_p^x$ that sample surfaces of x that are not adequately sampled by the set of views, a. We represent a partial view as a sparse array of patches, $n \times n$ arrays of pixels. A subview of a view is a rectangular subset of the view with possibly reduced width or height.

1. Our prototype system does not yet handle transparency.
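To make the sample format concrete, the sketch below shows one possible in-memory layout for a view and its pixels, in C++. The field names and types are our assumptions; the paper does not specify a layout, and transparency is included even though the prototype does not yet handle it.

```cpp
#include <cstdint>
#include <vector>

// One image sample (pixel): range, transparency, surface attributes,
// and surface normal, as listed in the definition above.
struct Pixel {
    float   range;        // distance from the viewpoint to the sampled surface
    uint8_t r, g, b;      // surface attributes (here, color)
    uint8_t alpha;        // transparency (unused by the prototype)
    float   nx, ny, nz;   // unit surface normal at the sampled point
};

// A reference view V_p^x: a dense width x height array of samples taken
// from viewpoint p with a specified field of view.
struct View {
    int width = 0, height = 0;
    std::vector<Pixel> samples;   // row-major, width * height entries
    const Pixel& at(int x, int y) const { return samples[y * width + x]; }
};
```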
An object is a scene element represented by a set of views. An object may be primitive or compound. A primitive object is represented only by its reference images, which may have been acquired by rendering a polygonal model, taking photographs of a real object, or some other means external to our system. A compound object is represented by both its reference images and a set of object-transform pairs. The images of a compound object are generated by transforming and reprojecting the images of its constituent objects. If a compound object is to be viewed only a few times it is easier to use its constituent representation, reprojecting each object to generate a view. On the other hand, if a compound object is to be viewed many times, it is more efficient to use its image representation, generating the reference images of the compound object once and reprojecting them to generate a view. A scene is just a compound object.
We represent an object, primitive or compound, with a set of views taken from viewpoints on a sampling sphere, a sphere with a given radius, $r_s$, centered on the object. The center of the sphere is considered to be the origin of the object coordinate system. We will refer to points on the sampling sphere with just their angular coordinates, $(\theta, \phi)$, taking the radius to be implied. Within our system we represent these angles by their coordinates on the unit sphere.
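For concreteness, here is a small sketch of this convention: a viewpoint named by $(\theta, \phi)$ is stored as its position on the unit sphere and scaled by $r_s$ when an object-space position is needed. The particular spherical convention (theta as longitude, phi as latitude, both in degrees) is our assumption; the paper does not fix one.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// Map angular coordinates (theta, phi), in degrees, to a point on the
// unit sphere. Theta is treated as longitude and phi as latitude.
Vec3 unitSphere(double thetaDeg, double phiDeg) {
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    double t = thetaDeg * kDegToRad, p = phiDeg * kDegToRad;
    return { std::cos(p) * std::cos(t),
             std::cos(p) * std::sin(t),
             std::sin(p) };
}

// A viewpoint in object coordinates: the unit direction scaled by the
// sampling-sphere radius r_s (the object center is the origin).
Vec3 viewpoint(double thetaDeg, double phiDeg, double rs) {
    Vec3 d = unitSphere(thetaDeg, phiDeg);
    return { rs * d.x, rs * d.y, rs * d.z };
}
```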
4. Representing an Object with Views
As shown in Figure 1, reprojecting a view, $V_p$, of an object from viewpoint p on a sampling sphere to a view $V_q$ from viewpoint q may result in holes, if a surface visible from q is occluded from p, or blurring, if a surface is at an oblique angle to p and thus undersampled in $V_p$. Plate 1 shows this effect on an actual image. When a view of the teapot (left) is warped to a viewpoint 30 degrees to the right, the unsampled surfaces behind the spout and to the rear of the pot leave holes. Constructing an adequately sampled $V_q$ may require reprojecting pixels from several views. The view of the teapot in Plate 1 (right), for example, was reconstructed by warping portions of images from five viewpoints. However, it requires reprojecting only the number of pixels visible in the final image, $V_q$. The challenge of object representation is to index the views so the proper set of pixels from the proper set of views can be quickly identified and accessed.
A view from a viewpoint off the sampling sphere is reconstructed from a surrounding set of viewpoints. In three dimensions, a minimum of three viewpoints must be used to reconstruct an object. The delta tree, as described below, divides space into square regions and always uses the four corners of the region enclosing the viewpoint to reconstruct an image.
Figure 1: Reprojecting a view from viewpoint p to viewpoint q.
Theoretically, determining the number of views required to represent an object is a difficult problem and there is a broad literature on determining the aspect graph of an object [Plantinga90]. In practice, however, we have found a simple regular grid on the sampling sphere generates a set of views which adequately covers the surface of most objects. We are not concerned with generating a minimal set of views as the delta tree encoding, described below, automatically discards views that are completely redundant.
Our approach is limited to generating views of an object from viewpoints outside the convex hull of the object, as surfaces may be visible inside the object that are not visible from the sampling sphere. This means, for example, that we cannot move the viewpoint inside a tunnel through an object. In practice we overcome this limitation by splitting the object down the center of the tunnel to make the inside of the tunnel visible from a sampling sphere.

5. The Delta Tree
A delta tree encodes the views representing an object in a quad-tree that divides the angular space of the sampling sphere. Each view is compressed by discarding pixels that sample surfaces adequately sampled by its ancestors. The number of pixels stored to represent an object is equal to the number of pixels needed to adequately sample the surface of the object as seen from the sampling sphere. The views are arranged so an image of the object from an arbitrary viewpoint can be rendered by reprojecting portions of a small set of views identified by a walk of the tree.
5.1 Structure
As shown in Figure 5, each node of the delta tree is associated with a region of the sampling sphere and stores an image of the object as seen from a corner of its region. For example, the region of root node A is the entire square shown in the figure, $(\theta, \phi) = ([0, 45], [0, 45])$, and node A stores a view from (0,0). A complete delta tree contains 4P such root nodes, one for each quadrant about P primary views. Typically P=6, corresponding to the six cardinal directions. Each child of a node inherits a quarter of its parent's region. The regions of B1, C11, and D112, for example, are the shaded regions of the figure. The labeled dots in the figure show the location of the viewpoints associated with the nodes. Each child stores a partial view, $V_a^p$, at the corner, p, of its region that is also a corner of its parent's region. The partial view includes only those pixels that cannot be adequately reconstructed from the partial views stored by the child's ancestors, a. For leaf nodes, we include the node's older siblings in set a since, as described below, all siblings of a leaf are visited together during reconstruction. If the partial views associated with all of a node's children are empty, the children are deleted and the node becomes a leaf.
Figure 5: Each node of the Delta Tree corresponds to a region of viewing space.
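The sketch below shows one possible node layout and the quadrant subdivision of a node's $(\theta, \phi)$ region, under the structure this section describes. All names are our own, and PartialView here is a placeholder for the sparse patch array defined in Section 3 and sketched under View Sharing below.

```cpp
#include <array>
#include <memory>

struct Region { double theta0, phi0, theta1, phi1; };   // in degrees

struct PartialView { /* sparse array of n x n pixel patches; see below */ };

// One delta-tree node: a region of the sampling sphere, the corner
// viewpoint from which its (partial) view was taken, and four children.
// An empty children array marks a leaf.
struct DeltaTreeNode {
    Region region;
    double vpTheta, vpPhi;     // a corner of region, shared with the parent
    PartialView view;          // pixels not reconstructable from ancestors
    std::array<std::unique_ptr<DeltaTreeNode>, 4> children;
};

// Each child inherits one quadrant of its parent's region. The child's
// viewpoint is then the corner of that quadrant that is also a corner of
// the parent's region (e.g., B1's viewpoint in Figure 5).
Region childRegion(const Region& r, int quadrant) {
    double tm = 0.5 * (r.theta0 + r.theta1);   // midpoints split the square
    double pm = 0.5 * (r.phi0 + r.phi1);
    switch (quadrant) {
        case 0:  return { r.theta0, r.phi0, tm, pm };
        case 1:  return { tm, r.phi0, r.theta1, pm };
        case 2:  return { r.theta0, pm, tm, r.phi1 };
        default: return { tm, pm, r.theta1, r.phi1 };
    }
}
```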
An Example Delta Tree: Plate 2 shows one of 24 quadrants of a three-level delta tree constructed for a teapot. The partial view stored at each non-empty tree node is plotted in a corner of the node's region in $(\theta, \phi)$ space. This matches the layout of the left side of Figure 5 down to the level of the C nodes. The four columns correspond to viewpoints with $\theta$ = 0, 22.5, 22.5, and 45 from left to right, and the four rows correspond to viewpoints with $\phi$ = 0, 22.5, 22.5, and 45 from bottom to top. Note that two nodes with the same viewpoint, e.g., C03 and C31 at (22.5, 0), have very different partial views. The view at C31 has many fewer pixels as it inherits more pixels from its ancestors and siblings than C03. Storing fewer pixels at C31 reduces the number of pixels that must be reprojected to render a view in its region. Storing multiple views from a single viewpoint does not increase the storage overhead as these views share pixels by reference as described below. The compression by partial views is evident in the figure. View A contains 34,000 non-background pixels while the average third-level view contains 3800 non-background pixels.

View Sharing: There are four distinct cases in which nodes of the tree share views. In each case, common pixels are shared by reference and no redundant pixels are stored.
1. The four trees corresponding to the four quadrants about a primary view share views along their common edges. For example, the views stored by A, C01, C13, and B1 are shared with the tree that covers quadrant 2, $\theta = [-45, 0]$. In this case the view at A and the partial views along the edge are identical in the two quadrants, so the views are shared by reference.

2. One child of each node shares a view with its parent. For example, B0 shares a view with A, and C10 shares a view with B1. In this case, the child, e.g., B0, need not store any pixels of its view since every pixel is adequately sampled by an ancestor, e.g., A. These children with empty partial views are shown cross-hatched on the right side of Figure 5.

3. First cousins within a tree may share a view. For example, C11 shares a view $V_{p_1}$ with C23, and four cousins share $V_{p_2}$. In this case, the cousins store different partial views. For example, C11 stores $V_{\{A,B_1\}}^{p_1}$, containing only those pixels that are not adequately sampled by A or B1, while C23 stores $V_{\{A,B_2\}}^{p_1}$. Any pixels in common between the partial views are shared by reference, as described below and sketched after this list. In some cases the set of pixels stored by a node contains the set stored by its cousin. For example, C01 contains all of the pixels of C13 since C13's ancestors {A, B1} include C01's ancestors {A}.

4. The trees associated with two primary views share views along their common edges. This case is identical to case (3). The nodes have different ancestors and thus store different partial views, sharing common pixels by reference.
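The sketch below shows one plausible mechanism for the "shared by reference" behavior in all four cases: partial views hold reference-counted patches, so a patch common to two nodes is stored once. The patch size n = 8 and the use of shared_ptr are our assumptions; Pixel is the struct sketched in Section 3.

```cpp
#include <map>
#include <memory>
#include <utility>

constexpr int N = 8;   // patch edge length n; the paper leaves n unspecified

struct Patch { Pixel pixels[N][N]; };   // Pixel as sketched in Section 3

// A partial view: a sparse array of patches keyed by patch coordinates.
struct PartialView {
    std::map<std::pair<int, int>, std::shared_ptr<Patch>> patches;
};

// If partial view 'a' already stores the patch at (px, py), make 'b'
// reference the same copy instead of storing the pixels again.
void sharePatch(const PartialView& a, PartialView& b, int px, int py) {
    auto it = a.patches.find({px, py});
    if (it != a.patches.end())
        b.patches[{px, py}] = it->second;   // shared by reference, not copied
}
```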
Redundancy: Even with parent-child compression and view sharing, a number of redundant pixels remain in the delta tree. Two sibling nodes may sample the same surface, and undersampled pixels in an ancestor may be resampled in a descendant. For example, the views corresponding to B1 and B2 in Plate 2 both sample the entire top of the teapot. Redundant samples of the teapot top also appear in C01 and C33. Several views also include redundant samples that are due to an artifact in the present version of our warper.² The delta tree is robust in the presence of such artifacts. If a single pixel in a patch is not properly reconstructed from an ancestor, the entire patch is retained in a node's partial view. This results in a final image free from artifacts at the expense of extra storage and rendering time.
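A sketch of the patch-retention rule just described: after warping the ancestors' views to the node's viewpoint, a patch of the node's full view is kept only if at least one of its pixels failed to reconstruct. reconstructedOK() is a hypothetical predicate (for example, a range and color comparison against the warped ancestors); View, Patch, PartialView, and N are from the earlier sketches.

```cpp
#include <memory>

// Hypothetical test: was pixel (x, y) of the node's true view adequately
// reconstructed by warping the ancestor views to this node's viewpoint?
bool reconstructedOK(const View& truth, const View& warped, int x, int y);

PartialView buildPartialView(const View& truth, const View& warped) {
    PartialView pv;
    for (int py = 0; py < truth.height / N; ++py)
        for (int px = 0; px < truth.width / N; ++px) {
            bool keep = false;
            for (int y = py * N; y < (py + 1) * N && !keep; ++y)
                for (int x = px * N; x < (px + 1) * N && !keep; ++x)
                    if (!reconstructedOK(truth, warped, x, y))
                        keep = true;   // one bad pixel retains the whole patch
            if (keep) {                // copy the n x n patch into the node
                auto patch = std::make_shared<Patch>();
                for (int y = 0; y < N; ++y)
                    for (int x = 0; x < N; ++x)
                        patch->pixels[y][x] = truth.at(px * N + x, py * N + y);
                pv.patches[{px, py}] = patch;
            }
        }
    return pv;
}
```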
Memory Management: A delta tree typically has hundreds of nodes, of which a small fraction are used at a particular time. For example, the subtree representing the side of an object opposite the viewpoint is not likely to be accessed in the near future. To optimize storage, the nodes of a delta tree can be loaded or synthesized on demand by treating a small set of in-memory nodes as a cache. Pointers to non-resident nodes are flagged with a reserved value. When this value is detected in traversing a tree, one of the in-memory nodes is evicted and the requested node is loaded from backing store or generated on-the-fly, for example by a polygonal renderer. If the demand for a node can be anticipated, for example by predicting viewpoint motion, then the generation or loading of nodes can be done at less than real-time rates.
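A sketch of such a node cache follows. The paper specifies only the mechanism (a reserved flag value, eviction, and loading or regenerating on demand); the LRU policy, the NodeId indexing, and loadNode() are our assumptions.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

using NodeId = std::uint32_t;
constexpr NodeId kNonResident = 0xFFFFFFFFu;  // reserved flag for absent nodes

class NodeCache {
public:
    explicit NodeCache(std::size_t capacity) : capacity_(capacity) {}

    // Return the node, loading (or regenerating) it if the reserved
    // non-resident flag was encountered during traversal.
    DeltaTreeNode* fetch(NodeId id) {
        if (resident_.find(id) == resident_.end()) {
            if (resident_.size() >= capacity_) {   // evict an in-memory node
                evict(lru_.back());
                lru_.pop_back();
            }
            resident_[id] = loadNode(id);  // from backing store or a renderer
        }
        lru_.remove(id);                   // mark id most recently used
        lru_.push_front(id);
        return resident_[id];
    }

private:
    void evict(NodeId id) { delete resident_[id]; resident_.erase(id); }
    DeltaTreeNode* loadNode(NodeId id);    // hypothetical; disk or on-the-fly

    std::size_t capacity_;
    std::list<NodeId> lru_;                // front = most recently used
    std::unordered_map<NodeId, DeltaTreeNode*> resident_;
};
```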
5.2 Reconstructing a View
Views on the sampling sphere: A view of the object from a viewpoint, p, on the sampling sphere is generated by walking the tree from the root to the leaf whose region contains p and including the siblings of this leaf to surround p with the nearest possible views. For example, if p is in the small shaded region in Figure 5, the path {A, B1, C11, D111, D112, D113} is traversed. As shown in Figure 6, the layout of the delta tree causes the traversal to follow a spiral pattern in $(\theta, \phi)$ space. This ensures that, except in a few degenerate cases, the first four nodes visited will surround the viewpoint. The nearest child node at each level is selected by dotting the normal corresponding to p with the normal corresponding to the center of the child's region. At each node, each pixel of the required portion of the stored partial view is reprojected (warped) to viewpoint p and composited in a z-buffer with the warped views of its ancestors. The z-buffer is required to eliminate hidden surfaces visible along the path.

Generating a view involves reprojecting slightly more than the number of pixels that appear in the final view. The delta tree's construction ensures that no surface of the object is sampled more than once. However, some surfaces that are not visible from p will be rendered and some surfaces that are viewed obliquely from p will be slightly oversampled.

Views off the sampling sphere: As illustrated in Figure 2, generating a view from a point off the sampling sphere may require reprojecting pixels from different points on the sampling sphere and hence from different leaf nodes of the delta tree. To handle this situation we recursively subdivide the view we are generating as we traverse the delta tree so that at each level of the delta tree all of the pixels in a subview are rendered from the same region.
2. We expect to have this eliminated before preparing the final draft of the paper.
Figure 6: The delta tree is traversed in a spiral pattern.
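A sketch of the on-sphere walk described above: warp each node's partial view into a shared z-buffer, pick the child whose region center has the largest dot product with the viewpoint direction, and at the bottom visit the chosen leaf together with its siblings. ZBuffer, warpAndComposite(), and regionCenter() are hypothetical helpers; Vec3, Region, DeltaTreeNode, and childRegion() are from the earlier sketches.

```cpp
struct ZBuffer;                        // hypothetical target image with depth
void warpAndComposite(const PartialView& v, Vec3 p, ZBuffer& zb);  // hypothetical
Vec3 regionCenter(const Region& r);    // unit normal of the region's center

// Select the child whose region center is nearest the viewpoint direction
// pDir, by dotting the two unit normals as the text describes.
int nearestChild(const DeltaTreeNode& node, Vec3 pDir) {
    int best = 0;
    double bestDot = -2.0;
    for (int q = 0; q < 4; ++q) {
        Vec3 c = regionCenter(childRegion(node.region, q));
        double d = c.x * pDir.x + c.y * pDir.y + c.z * pDir.z;
        if (d > bestDot) { bestDot = d; best = q; }
    }
    return best;
}

// Walk from a root toward the leaf whose region contains pDir, compositing
// each partial view along the path; at the last level, visit all siblings
// so that the viewpoint is surrounded by the nearest available views.
void reconstruct(const DeltaTreeNode& node, Vec3 pDir, ZBuffer& zb) {
    warpAndComposite(node.view, pDir, zb);
    const DeltaTreeNode* next = node.children[nearestChild(node, pDir)].get();
    if (!next) return;                               // node itself is a leaf
    bool nextIsLeaf = !next->children[0] && !next->children[1] &&
                      !next->children[2] && !next->children[3];
    if (nextIsLeaf) {
        for (const auto& sib : node.children)        // leaf plus its siblings
            if (sib) warpAndComposite(sib->view, pDir, zb);
    } else {
        reconstruct(*next, pDir, zb);
    }
}
```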