MODEL-BASED INFLUENCE DIAGRAMS FOR MACHINE VISION

T. S. Levitt,* J. M. Agosta,+ T. O. Binford+



* Advanced Decision Systems, Mountain View, California
+ Stanford University, Stanford, California

Abstract

We show the soundness of automated control of machine vision systems based on incremental creation and evaluation of a particular family of influence diagrams that represent hypotheses of imagery interpretation and possible subsequent processing decisions. In our approach, model-based machine vision techniques are integrated with hierarchical Bayesian inference to provide a framework for representing and matching instances of objects and relationships in imagery, and for accruing probabilities to rank order conflicting scene interpretations. We extend a result of Tatman and Shachter to show that the sequence of processing decisions derived from evaluating the diagrams at each stage is the same as the sequence that would have been derived by evaluating the final influence diagram that contains all random variables created during the run of the vision system.

I. Introduction

Levitt and Binford [Levitt et al.-88], [Binford et al.-87] presented an approach to performing automated visual interpretation from imagery. The objective is to infer the content and structure of visual scenes of physical objects and their relationships. Inference for machine vision is an errorful process because the evidence provided in an image does not map in a one to one fashion into the space of possible object models. Evidence in support or denial of a given object is always partial and sometimes incorrect due to obscuration, occlusion, noise and/or compounding of errorful interpretation algorithms. On the other hand, there is typically an abundance of evidence [Lowe-86].

In our approach, three dimensional model-based machine vision techniques are integrated with hierarchical Bayesian inference to provide a framework for representing and matching instances of objects and relationships in imagery, and for accruing probabilities to rank order conflicting scene interpretations. In particular, the system design approach uses probabilistic inference as a fundamental, integrated methodology in a system for reasoning with geometry, material and sensor modeling.


Our objective is to be capable of interpreting observed objects using a very large visual memory of object models. Nevatia [Nevatia-74] demonstrated efficient hypothesis generation, selecting subclasses of similar objects from a structured visual memory by shape indexing using coarse, stick-figure, structural descriptions. Ettinger [Ettinger-88] has demonstrated the reduction in processing complexity available from hierarchical model-based search and matching. In hierarchical vision system representation, objects are recursively broken up into sub-parts. The geometric and functional relations between sub-parts in turn define objects that they comprise. Taken together, the models form an interlocking network of orthogonal part-of and is-a hierarchies.

Besides their shape, geometrical decomposition, material and surface markings, in our approach, object models hold knowledge about the image processing and aggregation operations that can be used to gather evidence supporting or denying their existence in imagery. Thus, relations or constraints between object sub-parts, such as the angle at which two geometric primitives meet in forming a joint in a plumbing fixture, are modeled explicitly as procedures that are attached to the node in the model to represent the relation. Thus model nodes index into executable actions representing image evidence gathering operations, image feature aggregation procedures, and 3D volume from 2D surface inference.

In Binford and Levitt's previous work, the model structuring was guided by the desire to achieve conditional independence between multiple children (i.e., sub-parts) of the same parent (super-part, or mechanical joint). This structuring allowed Pearl's parallel probability propagation algorithm [Pearl-86] to be applied. Similarly, the concept of value of information was applied to hierarchical object models to enable a partially parallelized algorithm for decision-theoretic system control. That is, the Bayes net was incrementally built by searching the model space to match evidence extracted from imagery. At each cycle, the model space dictated what evidence gathering or net-instantiating actions could be taken, and a decision theoretic model was used to choose the best set of actions to execute.

However, the requirement to force conditional independence may lead to poor approximations to reality in object modeling [Agosta-88]. Further, the authors did not prove the coherence or optimality of the decision making process that guided system control.

In this paper we make first steps toward formalizing the approach developed by Binford and Levitt. We set up the problem in an influence diagram framework in order to use their underlying theory in the formalization. Image processing evidence, feature aggregation operations used to generate hypotheses about imagery interpretation, and the hypotheses themselves are represented in the influence diagram formalism. We want to capture the processes of searching a model database to choose system processing actions that aggregate (i.e., generate higher level object hypotheses from lower level ones), search (i.e., predict and look elsewhere in an image for object parts based on what has already been observed) and refine (i.e., gather more evidence in support or denial of instantiated hypotheses).

The behavior of machine vision system processing is represented as dynamic, incremental creation of influence diagrams. Image evidence and inferences against object models are used to direct the creation of new random variables representing hypotheses of additional details of imagery interpretation. Dynamic instantiation of hypotheses is formally realized as a sequence of influence diagrams, each of whose random variables and influence relations is a superset of the previous. The optimal system control can be viewed as the optimal policy for decision making based on the diagram that is the "limit" of the sequence. We extend a result of Tatman and Shachter [Tatman-86], [Tatman and Shachter-89] to show that the sequence of processing decisions derived from evaluating the diagrams at each stage is the same as the sequence that would have been derived by evaluating the final influence diagram that contains all random variables created during the run of the vision system.

In the following we first review our approach to inference, section 2, and control, section 3, in computer vision. In section 4 we represent results of the basic image understanding strategies of aggregation, search and refinement in influence diagram formalisms. In section 5 we sketch a proof of the soundness of control of a vision system by incremental creation and evaluation of influence diagrams.

II. Model-Based Reasoning for Machine Vision

We take the point of view that machine vision is the process of predicting and accumulating evidence in support or denial of run-time generated hypotheses of instances of a priori models of physical objects and their photometric, geometric, and functional relationships. Therefore, in our approach, any machine vision system architecture must include a database of models of objects and relationships, methods for acquiring evidence of instances of models occurring in the world, and techniques for matching that evidence against the models to arrive at interpretations of the imaged world.

Basic image evidence for objects and relationships includes structures extracted from images such as edges, vertices and regions. In non-ranging imagery, these are one or two dimensional structures. Physical objects, on the other hand, are three dimensional. The inference process from image evidence to 3D interpretation of an imaged scene tends to break up into a natural hierarchy of representation and processing [Binford-80].

Processing in a machine vision system has two basic components: image processing to transform the image data to other representations that are believed to have significance for interpretation; and aggregation operations over the processed data to generate the relations that are the basis for interpretation. For example, we might run an edge operator on an image to transform the data into a representation where imaged object boundaries are likely to have high values, while interior surfaces of objects are likely to have low values. We then threshold and run an edge linking operator on this edge image (another image processing operator) to produce a representation where connected sets of pixels are likely to be object boundaries. Now we search for pairs of edges that are roughly parallel and an appropriate distance apart as candidates for the opposite sides of the projected image of an object we have modeled. This search "aggregates" the boundaries into pairs that may have significance for object recognition.

Aggregation and segmentation operations are fundamental in data reduction. We show how the concept of aggregation in bottom up reasoning can be the basis for generating hypotheses of object existence and type. Aggregation applies constraints from our understanding of geometry and image formation. The aggregation operators also correspond to the transformations between levels in the object recognition hierarchies; sub-parts are grouped together at one level by relationships that correspond to a single node at the next higher level. Therefore grouping operators dictate the "out-degree" of a hypothesis at one hierarchy level with its children at the level below.

Control of a machine vision system consists of selecting and executing image processing and grouping operations, searching the object model network to match groups to models, instantiating hypotheses of possible observed objects or object parts, accruing the evidence to infer image interpretations, and deciding when interpretation is sufficient or complete.

III. Sequential Control for Machine Vision Inference

Presented with an image, the first task for a machine vision system is to run some basic image processing and aggregation operators to obtain evidence that can be used to find local areas of the image where objects may be present. This initial figure-from-ground reasoning can be viewed as bottom-up model matching to models that are at the coarsest level of the is-a hierarchy, i.e., the "object/not-object" level. Having initialized the processing on this image, basic hypotheses, such as "surface/not-surface", can be instantiated by matching surface models. After initialization, a method of sequential control for machine vision is as follows:

0. Check to see if we are done. If not, continue.
1. Create a list of all executable evidence gathering and aggregation actions by concatenating the actions listed in each model node that correspond to an instantiated hypothesis.
2. Select an action to execute.
3. Action execution results in either new hypotheses being instantiated, or more evidence being attached to an existing hypothesis.
4. Propagate evidence to accomplish inference for interpretation, and go to 0.

From our model-based point of view, an action associated with a model node that corresponds to an instantiated hypothesis has one of the following effects: refining, searching or aggregation. In the following we explain these actions. In the next section, we show a method of representing the effects of these actions in an influence diagram formalism.

Refining a hypothesis is either gathering more evidence in direct support of it by searching for sub-parts or relationships on the part-of hierarchy below the model corresponding to the hypothesis, or instantiating multiple competing hypotheses at a finer level of the is-a hierarchy that are refined interpretations of the hypothesized object. For example, given a hypothesized screwdriver handle, in refinement we might look for grooves in the hypothesized screwdriver handle.

Searching from a hypothesis is both predicting the location of other object parts or relationships on the same hierarchy level, and executing procedures to gather evidence in support or denial of their existence. In searching from the screwdriver handle, we might look for the blade of the screwdriver, predicting it to be affixed to one end or the other of the handle.

Aggregation corresponds to moving up the part-of hierarchy to instantiate hypotheses that include the current hypothesis as a sub-part or sub-relationship. Having hypothesized the screwdriver handle and the screwdriver blade, we can aggregate sub-parts to hypothesize the existence of the whole screwdriver.

In summary, we spawn hypotheses dynamically at runtime; hypothesis instantiation is guided by a priori models of objects, the evidence of their components, and their relationships. System control alternates between examination of instantiated hypotheses, comparing them against models, and choosing what actions to take to grow the instantiated hypothesis space, which is equivalent to seeing more structure in the world. The possible actions are also stored in the model space, either as lists of functions that gather evidence (e.g., infer-specularity, find-edges, etc.) or that aggregate object components or other evidence nodes. Inference proceeds by choosing actions from the model space that create new hypotheses and relationships between them. It follows that all possible chains of inference that the system can perform are implicitly specified a priori in the model-base.

This feature clearly distinguishes inference from control. Control chooses actions and allocates them over available processors, and returns results to the inference process. Inference uses the existing hypothesis space and the current results of actions (i.e., collected evidence), generates hypotheses and relationships, propagates probabilities, and accumulates the selectable actions for examination by control. In this approach, it is impossible for the system to reason circularly, as all instantiated chains of inference must be supported by evidence in a manner consistent with the model-base.

IV. Model Guided Influence Diagram

The influence diagram formalism with which we build the model-base allows three kinds of nodes: probability nodes, value nodes and decision nodes. Probability nodes are the same as in belief nets [Pearl-86]. Value nodes and decision nodes represent the value and decision functions from which a sequential stochastic decision procedure may be constructed. The diagram consists of a network showing the relations among the nodes. Solution techniques exist to solve for the decision functions (the optimal policies) given a complete diagram. Formulating the model-base as an influence diagram allows existing solution techniques [Shachter-86] to be exploited for evaluation of the interpretation process.

The step of generating new hypotheses dynamically upward, from the evidence and hypothesis at the current stage, adds structure to the influence diagram. Expanding the network then re-evaluating it introduces a new operation that is not equivalent to any evaluation step for influence diagrams.

In an aggregation step, a hypothesis is created to represent a part composed of a set of sub-parts at the lower level. For example, in the domain of low level image constructs, such as lines and vertices, aggregation by higher level parts determines a segmentation of the areas of the image into projected surfaces. This concept of segmentation differs from "segmentation" used in image processing in several ways. First, a common process of aggregation is used throughout the part-of hierarchy; there is no unique segmentation operator. Second, the segmentation need not be complete; the aggregation operator may only distinguish the most salient features. The notion of segmentation as "partitioning a region into segments" no longer applies. Finally, because the refinement step allows the prediction by higher level hypotheses of lower level features that have not yet been hypothesized, the segmentation may be extended by interpretations from above.

Figure 1: Deterministic Aggregation Process
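The sequential control cycle of Section III (steps 0 to 4) can be sketched in Python. This is a minimal illustration and not the authors' implementation: the names `Hypothesis`, `Action`, `control_loop` and `demo`, the `select` callback, and the `max_cycles` safeguard are all hypothetical, and probability propagation is reduced to a placeholder callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    """An instantiated hypothesis tied to a model node (hypothetical structure)."""
    model: str
    evidence: List[str] = field(default_factory=list)


@dataclass
class Action:
    """An evidence-gathering or aggregation action listed on a model node."""
    name: str
    target: Hypothesis
    # Returns a new hypothesis, or None when it only attaches evidence.
    run: Callable[[Hypothesis], Optional[Hypothesis]]


def control_loop(hypotheses, actions_for, select, propagate, done, max_cycles=100):
    """Check done, list actions, select one, execute it, propagate, repeat."""
    for _ in range(max_cycles):
        if done(hypotheses):                                          # step 0
            break
        candidates = [a for h in hypotheses for a in actions_for(h)]  # step 1
        if not candidates:
            break
        action = select(candidates)                                   # step 2
        new_hypothesis = action.run(action.target)                    # step 3
        if new_hypothesis is not None:
            hypotheses.append(new_hypothesis)
        propagate(hypotheses)                                         # step 4
    return hypotheses


def demo():
    """Toy run: refine a handle hypothesis, then aggregate to a screwdriver."""
    hyps = [Hypothesis("screwdriver-handle")]

    def find_grooves(h):
        h.evidence.append("grooves")      # refinement: attach evidence
        return None

    def aggregate(h):
        return Hypothesis("screwdriver")  # aggregation: new parent hypothesis

    def actions_for(h):
        acts = []
        if h.model == "screwdriver-handle":
            if "grooves" not in h.evidence:
                acts.append(Action("find-grooves", h, find_grooves))
            elif all(x.model != "screwdriver" for x in hyps):
                acts.append(Action("aggregate", h, aggregate))
        return acts

    return control_loop(
        hyps,
        actions_for,
        select=lambda cands: cands[0],   # stand-in for a decision-theoretic choice
        propagate=lambda hs: None,       # stand-in for probability propagation
        done=lambda hs: any(h.model == "screwdriver" for h in hs),
    )
```

In this sketch the available actions are derived only from instantiated hypotheses, which mirrors the point that every chain of inference is specified a priori in the model-base. A real controller would replace `select` with a value-based choice.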

Hypothesis generation is implemented by aggregation operators. Combining all features at a level with all aggregation operators that apply is a combinatorially demanding step. To avoid this complexity the adjacency of features is exploited. Features that are aggregated belong to objects that are connected in space. This does not necessarily mean that the features appear next to each other in the image; rather, they are near each other in object space. Exploiting this constraint limits the hypotheses generated to a small number of all possible sets of features.

Aggregation operators are derived from the models of parts in terms of the measured parameters of their sub-parts. From a physical model of the part, a functional relation among parameters is derived that distinguishes the presence of the part. In general, the aggregation operator calculates a score, based on distance and "congruence" between a part's sub-parts. Aggregation hypotheses may be sorted so that "coarse" sub-parts are considered before "fine," to further restrict the set of hypotheses generated. As described, this score is a deterministic function of the parameters of the features to be aggregated. (See Figure 1.) The distribution of the aggregation function is conditioned by the hypothesis. It is described by a likelihood, p{s|h}, the probability of the score, given the hypothesis. (Figure 2.)

Figure 2: Hypothesis Generation from the Aggregation Process

From a model of the appearance of the object, a stochastic model of the distribution of the aggregation score can be derived for the cases that the hypothesis does or does not exist. This likelihood distribution is the probabilistic aspect of the aggregation node that allows the hypothesis probability to be inferred from the sub-part parameters.

This formulation is valuable because it shows how the recognition process may be formalized as distributions within a probability net. Consider a search for projected-surface boundaries, to identify the surfaces that compose them. In this instance, suppose the projected-surface boundaries are adjacent parallel lines. To aggregate projected-surface boundaries we derive a scoring function based on both the parallelism and proximity of line boundaries. In searching for projected-surface boundaries, the model generation may disregard most potential boundaries of lines by physical arguments without resort to calculating the aggregation function. Those boundaries for which the scoring rule succeeds spawn a parent node containing a surface hypothesis. This is how the aggregation operator participates in the aggregation process.

A sub-part may be a member of the sets of several aggregation operators. Further rules are then applied to determine whether hypotheses so formed exclude each other, are independent or are necessarily co-incident. The range of exclusion through co-incidence may be captured in the derivation of the likelihood distributions of a sub-part when it is conditioned on more than one hypothesis.

In general, the diagrams, Figures 1 and 2, are solved by first substituting in the deterministic scoring functions, then applying Bayes' rule. To derive a general form for the aggregation operator influence diagram, imagine the aggregation operator node as a parent to the part nodes. In Pearl's solution method, the parent receives lambda messages that are functions of the parameters in each of the sub-part nodes. This message contains the aggregation function. Because the aggregation operator expresses a relation among the parts, it may not be factorizable as it would be if the sub-part nodes were conditionally independent, hence the dependency expressed by the aggregation node among the part nodes.

If we consider the aggregation node's clique to involve both the high level hypothesis and the sub-part nodes, then an additional set of arcs appears from the hypothesis to its sub-parts. This is clear when Bayes' rule is written out for the posterior distribution of the hypothesis. The aggregation operator likelihood appears multiplied by a set of other factors. These additional terms we term "existence" likelihoods. They are the arcs to the sub-parts, $l_i$, from the hypothesis, h. Their interpretation is: given that h is observed (or is not observable), does the sub-part appear? Most often these are certainty relations: if there is no obscuration, existence of h implies appearance of its composite features, and vice versa. Thus they may express observability relations where h exists but not all of its features are observed.

To further clarify, think of each feature node's state space as the range of parameters that describe it, plus one point: that the node is not observed. The probability that the node appears is the integral of all the probability mass over the range of parameters. Thus each part can be envisioned as two probabilistic nodes; one a dichotomy, either the part is known to exist or it is not; the other a distribution over parameters that describe the location and shape, dependent on the existence node. The aggregation function expresses a relation between composite sub-part parameters and the existence of the parent. The additional terms in Bayes' rule suggest direct relations between the existence node of the parent and the appearance of sub-part features. These additional terms may be thought of as the membership relations in the is-part-of network.

The relation between the parameters of the sub-parts and the parent's parameters poses an additional inference problem, much along the lines of traditional statistical inference of estimating a set of model parameters from uncertain data.

This method emphasizes the use of measured and inferred values to determine the existence of features; we are converting parameters into existence probabilities as we move up the network. The method concentrates on the classification aspect rather than the estimation and localization aspect. The hope is that once a set of stable, high level hypotheses are generated, the more difficult part of recognition has been solved, and accurate estimation can follow using the data classification generated by what is effectively an "interpretation driven" segmentation process. Estimation can be thought of as a "value to value" process. It might well be necessary to carry this out concurrently if accurate values are required. Alternately, evidence may enter the network directly at higher levels. Neither possibility presents a problem to the algorithm.

V. Dynamic Instantiation for Sequential Control

In this section, we present a way to formalize the control problem for inference up the machine vision hierarchy. We show how control over the hierarchy can be expressed as a dynamic program by an influence diagram formulation. At this level of generality we can abstract out the structure at each level and coalesce all hypotheses at one level of the hierarchy into one node. These hypotheses nodes form a chain from the top level (the object) hypothesis to the lowest level. Each level has corresponding aggregation and, possibly, evidence nodes for the aggregation process at that level. This high level structure lets us show that for purposes of control the levels of the hierarchy can be considered as stages of a dynamic programming problem. Thus each level has the structure shown in Figure 3.

Figure 3: Generic Influence Diagram Probability Nodes for Machine Vision

Each stage in the dynamic program is constructed from the aggregation operators at one level of the hierarchy. We add decision and sub-value nodes to the influence diagram to represent control in a dynamic program. In the following, we use $e_i$ to represent the i-th set of observations (i.e., evidence from image processing operators), $g_i$ to represent the i-th aggregation score, $h_i$ to represent hypotheses about physical objects, $d_i$ to represent processing decisions, and a per-stage cost term to represent control costs. The V node represents the values assigned to the top-level hypotheses.

The process starts at the bottom of the diagram with the first aggregation forming the first set of hypotheses from the original evidence. The evidence may guide the choice of aggregations, which we show by the decision, d0, with a knowledge arc from e0. An example would be to choose an edgel linking aggregation operator as g1, where e0 are edgels found in the image, and h1 are hypothesized object boundaries. This first stage is shown in Figure 4. The final decision, d1, selects the object hypothesis with the highest value. It will float to the top stage as we add more stages. The top level value V depends on the object hypotheses. Intermediate hypotheses do not contribute to the value; stage decisions only affect the costs of calculation, which are additive, as the dynamic programming formulation requires. It may be interesting to consider what are the computational gains from a value function that is separable by object hypotheses; such a value function is not considered here.

Next the system makes a decision of which processing action to take at the superior stage. If we add the decision at d1 to, for example, match boundaries into parallel sets with aggregation operator g2 and so generate projected-surface hypotheses, h2, we have the diagram shown in Figure 5. Here d2 is, as described, the choice-of-object decision.
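As a concrete illustration of an aggregation operator such as g2, the following Python sketch scores a pair of line segments on parallelism and proximity, then applies Bayes' rule to obtain the probability of a projected-surface hypothesis from the score likelihood p{s|h}. The scoring formula, the Gaussian likelihoods for p{s|h} and p{s|not h}, their parameters, and the prior are assumptions made purely for illustration; the paper does not specify them.

```python
import math


def segment_angle(seg):
    """Orientation of a segment ((x1, y1), (x2, y2)), folded into [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def midpoint(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def aggregation_score(seg_a, seg_b, expected_gap):
    """Deterministic score in [0, 1]: high when the segments are parallel and
    their midpoint separation is close to expected_gap (illustrative formula)."""
    d_theta = abs(segment_angle(seg_a) - segment_angle(seg_b))
    d_theta = min(d_theta, math.pi - d_theta)   # orientation wraps at pi
    (ax, ay), (bx, by) = midpoint(seg_a), midpoint(seg_b)
    gap = math.hypot(ax - bx, ay - by)
    parallelism = math.cos(d_theta) ** 2
    proximity = math.exp(-abs(gap - expected_gap) / expected_gap)
    return parallelism * proximity


def gaussian(x, mean, std):
    """Normal density, used here as an assumed form for the score likelihood."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def posterior(score, prior=0.1, on=(0.9, 0.15), off=(0.2, 0.25)):
    """Bayes' rule: P(h | s) from assumed likelihoods p(s | h) and p(s | not h)."""
    joint_h = gaussian(score, *on) * prior
    joint_not_h = gaussian(score, *off) * (1.0 - prior)
    return joint_h / (joint_h + joint_not_h)


# Two horizontal boundaries 10 units apart versus a perpendicular pair.
parallel_score = aggregation_score(((0, 0), (10, 0)), ((0, 10), (10, 10)), 10.0)
crossed_score = aggregation_score(((0, 0), (10, 0)), ((5, 5), (5, 15)), 10.0)
```

Under these assumed parameters the parallel pair yields a high posterior for the surface hypothesis and the perpendicular pair a posterior near zero, which is the behavior the deterministic score plus likelihood construction is meant to produce.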

We can continue to iterate the diagram building process to add another aggregation stage, as shown in Figure 6. It is clear how the sequence of diagrams proceeds as we continue to generalize upward to complete the part-of hierarchy. If we look at the sequence of influence diagrams from initialization to object recognition, then we can regard the final diagram as if it had been built before evaluation took place. The distributions within the nodes will differ depending on the solution to the diagram. It follows that if we show that the evaluation method is sound in terms of legal influence diagram operations, then we have a formal framework with which to develop an optimal recognition scheme, and in particular a value based method of control.

Figure 4: First Inference Stage Influence Diagram

Figure 5: Aggregation Processing Second Level Influence Diagram

Figure 6: Aggregation Processing Third Level Influence Diagram

These results are an application of work by Tatman and Shachter [Tatman-85], [Tatman and Shachter-89] on sub-value nodes and dynamic programming techniques represented in influence diagram form. Tatman shows that optimal policies for diagrams such as those above can be obtained by influence diagram techniques that are equivalent to dynamic programming methods, and like these methods increase linearly in complexity with the number of stages.

In particular, Tatman's influence diagram realization of Bellman's Principle of Optimality [Bellman-57] states that in a diagram with stage decision variables $d_1, \ldots, d_n$, if there exists a set of nodes $X_k = \{x_k(1), x_k(2), \ldots\}$ associated to each decision $d_k$, such that

1. the elements of $X_k$ are informational predecessors of $d_k$,
2. the value node is a sum or product of sub-value nodes, and
3. at least one element of $X_k$ is on every directed path from the predecessors of $X_k$ to the successors of $X_k$ (with the exception of the value nodes),

then the optimal policy for the decision process, $\{d_1^*, \ldots, d_n^*\}$, will have the property that policy $\{d_k^*, d_{k+1}^*, \ldots, d_n^*\}$ is optimal for the decision process defined by the original decision process with all nodes, except $X_k$ and its successors, deleted.
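The linear growth of solution effort with the number of stages can be illustrated with a generic backward-induction sketch in Python. This is ordinary finite-horizon dynamic programming, not Tatman and Shachter's influence diagram algorithm; it only assumes what the text states, namely a terminal value plus additive stage costs. The state names, the `transition` and `cost` callbacks, and all numbers in `demo` are hypothetical.

```python
def backward_induction(n_stages, states, actions, transition, cost, terminal_value):
    """Finite-horizon dynamic program with additive stage costs.

    transition(k, s, a) -> list of (probability, next_state)
    cost(k, s, a)       -> stage cost, subtracted from the expected value
    terminal_value(s)   -> value assigned to the final state
    Returns (value, policy): value[k][s] is the optimal expected value-to-go,
    policy[k][s] the optimal action at stage k in state s.
    """
    value = [dict() for _ in range(n_stages + 1)]
    policy = [dict() for _ in range(n_stages)]
    for s in states:
        value[n_stages][s] = terminal_value(s)
    # One sweep per stage, last stage first, so work is linear in n_stages.
    for k in range(n_stages - 1, -1, -1):
        for s in states:
            best_action, best_value = None, float("-inf")
            for a in actions:
                expected = sum(p * value[k + 1][s2] for p, s2 in transition(k, s, a))
                v = expected - cost(k, s, a)
                if v > best_value:
                    best_action, best_value = a, v
            value[k][s] = best_value
            policy[k][s] = best_action
    return value, policy


def demo():
    """Toy problem: choose a cheap or costly operator at each of two stages."""
    states = ["weak", "strong"]      # strength of belief in an object hypothesis
    actions = ["cheap", "costly"]

    def transition(k, s, a):
        if s == "strong":
            return [(1.0, "strong")]
        p = 0.5 if a == "cheap" else 0.9
        return [(p, "strong"), (1.0 - p, "weak")]

    def cost(k, s, a):
        return 1.0 if a == "cheap" else 3.0

    def terminal_value(s):
        return 10.0 if s == "strong" else 0.0

    return backward_induction(2, states, actions, transition, cost, terminal_value)
```

In this toy problem the optimal action in the "weak" state differs between stages (cheap first, costly last), and the stage-1 policy is computed without reference to stage 0, which is the tail-optimality property the theorem expresses.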

If at each level in the hierarchy we set the aggregation node, 9• equal to the set x., we have met the requirements of Tatman's Theo­ rem. So far our influence diagram does not


low the incorporation of evidenc� above the lowest lev�l.

Decisions above

d1 receive no

evidence in addition to the deterministic ag­ gregation computation

from tb� level below.

We now consider the representation of the processes of search and refinement. These operations will extend the range of actions at a decision node to incorporate evidence hypotheses at the same level or just above. Notice that aggregation is the process of generating a hypothesis at the next higher level. As such it is a process of generalization from sub-parts to hypothesized super-parts.

In comparison, search is a process of adding more evidence to disambiguate the competing higher level hypotheses. This is typically done by using the object models in combination with the location of currently hypothesized imaged objects to direct search and processing elsewhere in the image. For example, having hypothesized a projected-surface, h2, we could search in the region bounded by the projected-surface boundaries (from g2) to run a region operator to infer surface-like qualities, or we could search near the projected-surface to attempt to infer neighboring surfaces. The influence diagram structure is pictured below. Either operation involves gathering more imagery-based evidence. Hence we denote this as e3, because it corresponds to the third level of the processing hierarchy. Notice that there is no direct dependency between g3 and e3. By letting Tatman's X3 = {g3, e3}, we still fulfill the requirements of Bellman's theorem.

Figure 7: Search Process Influence Diagram

We now turn to refinement.
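Both search and refinement add imagery-based evidence that re-weights hypotheses already present in the diagram. As a minimal sketch of that re-weighting, assuming a single discrete evidence observation and using plain Bayes' rule rather than the paper's full influence diagram evaluation, the function below updates the probabilities of competing hypotheses. The hypothesis names, priors and likelihoods are hypothetical illustration values.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Return P(h | e) for each hypothesis h.

    prior[h]      = P(h) before the new evidence e is gathered
    likelihood[h] = P(e | h), how well hypothesis h explains the evidence
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / total for h, p in joint.items()}


# Hypothetical numbers: two competing explanations of an image region.
prior_h = {"projected_surface": 0.5, "shadow": 0.5}
# Hypothetical operator response: far more likely if the region is a surface.
likelihood_e = {"projected_surface": 0.8, "shadow": 0.2}

post_h = posterior(prior_h, likelihood_e)
ranked = sorted(post_h, key=post_h.get, reverse=True)
print(ranked, post_h)
```

The same update applies whether the evidence node sits at the level above the hypothesis (search) or at the same level (refinement); what differs is where the evidence node attaches in the diagram.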

A refinement operation might be to run an operator over the projected-surface, h2, that compares contrasts across the projected-surface boundaries to see if they are likely to bound the same projected surface. Such an operator is chosen after h2 is instantiated, and so is made at decision time d2. In this way the evidence collected revises a hypothesis already "aggregated." This is the critical distinction between refinement and search. We view it as additional evidence about h2, so we call it e2; e2 "refines" the hypothesis h2. This process is pictured below.

Figure 8: Refined Process Influence Diagram


This diagram violates Tatman's requirements. In particular h2 is a predecessor of g2, and e2 is a successor of g2, but there is a path from h2 to e2 that does not pass through g2. Also, e2 is not a predecessor of d2, so it cannot be included in X(2). (Notice that e2 is a predecessor of d3.)

However, by applying the same proof principles to this particular diagram structure that Tatman applied in his optimality proof, we can show that sub-value modularity is maintained, at the cost of complicating each stage with an additional arc. To see this we apply the standard influence diagram solution steps to roll back the diagram as shown in Figures 9 and 10. As we reverse arcs and connect predecessors, the computational complexity rises from steps a-e, and then as nodes are absorbed, at the third level, the diagram simplifies almost back to the original, incremental two-stage (up to h2) diagram, except that we have an extra arc from g2 to the top node. As we continue rolling back the diagram, reversal of arcs entering the value node again raises complexity, and again resolves with only one additional arc from g1 to the value node. We see that the incremental solution maintains the modularity of the original dynamic programming method so that the solution time is still linear in the number of stages.

Figure 9: Solution Steps for Refinement Processing A-F

Figure 10: Solution Steps for Refinement Processing G-L

VI. Conclusions

We have formalized an approach to machine vision in an influence diagram framework and shown that system processing can be represented as dynamic instantiation of image interpretation hypotheses in influence diagrams. Hypotheses are generated by matching aggregated imagery features against physical object models. Instantiating new hypotheses corresponds to introducing new nodes and random variables in the influence diagram. We showed a method of representing the effects of basic imagery interpretation actions of search, refinement and aggregation in influence diagram formalisms. Each new action that is taken leads to a new influence diagram. We choose the next vision action by evaluating the current diagram. The final influence diagram contains all random variables dynamically instantiated during control by the vision system.

We showed that the sequence of decisions to act taken by the vision system is the same as it would have been had we derived those decisions from evaluation of the final influence diagram. Our method of vision system control by incrementally evaluating influence diagrams as we build them results in a consistent, evaluated, final influence diagram. Development of an efficient evaluation method for partial instantiation of diagrams remains as future research.

There is much to do to complete the task of machine vision system representation and execution. So far we have only represented aggregation, search and refinement between neighboring hierarchical levels of inference. However, the general vision problem allows these operations to jump around between levels. This in turn raises the issue of classifying machine vision operations in terms of their probabilistic dependencies. We believe we have captured some fundamental paradigms for computer vision, but there are many operators and processing paradigms in the literature. For example, aggregation operations between inferred 3D volumes and adjacent 2D surfaces involve violating modularity assumptions used in this work.

Another major issue is the pre-runtime computation of expected values from system [...]. In [Levitt et al.-88] a scheme was presented for hierarchical value computation. This work shows that because we can cast machine vision control as a dynamic programming construct, the concept of value of information can be applied. Casting this concept in this framework is work in progress.

Finally, the combinatorics of machine vision demand distributed processing. This requires multiple processing decisions to be made simultaneously. Here optimality computation is burdened with the expected interactions between processing results. It is likely that more engineering solutions will be realized before a formal analysis of this problem is completed.

Support for this research has been provided under the ADRIES contract (U.S. Government Contract DACA76-86-C-0010) by the Defense Advanced Research Projects Agency (DARPA), the U.S. Army Engineer Topographic Laboratory (ETL), the U.S. Army Space Program Office (ASPO), and the U.S. Air Force Wright Avionics Development [...].

[Agosta-88] J. M. Agosta, "The Structure of Bayes Nets for Vision Recognition," Proceedings of the Fourth Uncertainty in Artificial Intelligence Workshop, sponsored by AAAI, Minneapolis, Minnesota, August 1988.

[Bellman-57] R. Bellman, Dynamic Programming (Princeton, New Jersey: Princeton University Press, 1957).

[Binford-80] T. O. Binford, R. A. Brooks, and D. G. Lowe, "Image Understanding Via Geometric Models," in Proceedings of the Fifth International Conference on Pattern Recognition, Miami, Florida, December 1980.

[Binford et al.-87] T. O. Binford, T. S. Levitt, and W. B. Mann, "Bayesian Inference in Model-Based Machine Vision," in Proceedings of the AAAI Uncertainty in Artificial Intelligence Workshop, Seattle, Washington, July 1987.

[Ettinger-88] G. J. Ettinger, "Large Hierarchical Object Recognition Using Libraries of Parameterized Model Sub-parts," in Proceedings of the Computer Vision and Pattern Recognition Conference, Ann Arbor, Michigan, June 1988.

[Levitt et al.-88] T. S. Levitt, T. O. Binford, G. J. Ettinger, and P. Gelband, "Utility-Based Control for Computer Vision," Proceedings of the Fourth Uncertainty in Artificial Intelligence Workshop, sponsored by AAAI, Minneapolis, Minnesota, August 1988.

[Lowe-86] D. Lowe, "Perceptual Organization and Visual Recognition," Ph.D. Thesis, Department of Computer Science, Stanford University, Stanford, California, 1984.

[Nevatia-74] R. Nevatia, "Structured Descriptions of Complex Curved Objects for Recognition and Visual Memory," Memo AIM250, Artificial Intelligence Laboratory, Stanford University, Stanford, California, 1974.

[Pearl-86] J. Pearl, "Fusion, Propagation, and Structuring in Bayesian Networks," Artificial Intelligence, 1986.

[Shachter-86] R. D. Shachter, "Evaluating Influence Diagrams," Operations Research, vol. 34, no. 6 (Nov-Dec 1986), pp. 871-882.

[Tatman-86] J. A. Tatman, "Decision Processes in Influence Diagrams: Formulation and Analysis," Ph.D. Thesis, Department of Engineering-Economic Systems, Stanford University, Stanford, California, December 1985.

[Tatman and Shachter-89] J. A. Tatman and R. D. Shachter, in preparation.