**How does PSP work?**

PSP is a search problem that attempts to answer the question: how does one determine the range of data patterns a model can generate? The problem is addressed by first adopting a particular definition of a data pattern. Doing so induces an unknown but finite partition of the model's parameter space, with each region corresponding to a data pattern. The goal is to design a search procedure that visits each of these unknown regions at least once. Model analyses follow from what is learned during the search.

__Difficulties Inherent in a Sampling-based Approach__

An intuitive approach to finding regions is to sample the parameter space enough times to ensure that each region is sampled at least once. In this approach, one effectively samples parameter values at random from some distribution and then checks whether they produce a new region. In a naive, simple Monte Carlo (SMC) approach, this would be a uniform distribution over the whole parameter space. As the number of samples drawn increases, this procedure will eventually uncover all regions of non-zero volume with probability 1. However, as the dimensionality of the model increases, SMC rapidly becomes inefficient: an unreasonably large number of samples is required for accurate estimation. This problem is commonly referred to as the *curse of dimensionality*, and it forces us to turn to more efficient search methods. In essence, this amounts to finding a better probability distribution from which to sample.

One of the major inefficiencies in SMC search is that it takes no account of the fact that some regions are much larger than others. With the goal of PSP being to visit each search region at least once, it is inefficient to spend most of one's time in a few big regions, which is what would happen using a uniform distribution. Thus, an ideal sampling distribution would be one that assigns equal probability to every region in the partition. In short, a natural way to improve on SMC would be to rescale the parameter space so that all regions are treated as being the same size. Our PSP algorithm uses a Markov Chain Monte Carlo (MCMC) procedure to perform this kind of rescaling.
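
The curse of dimensionality can be seen in a small simulation. The toy "model" below is hypothetical, purely for illustration: one data pattern occupies a corner cube of side 0.1, so the probability of a uniform draw landing in it is 0.1^d, and the number of SMC draws needed to see both patterns grows roughly as 10^d.

```python
import random

def pattern(x):
    """Toy 'model': map a parameter vector to a data pattern label.
    Pattern B occupies a corner cube of side 0.1; pattern A is the rest."""
    return "B" if all(v < 0.1 for v in x) else "A"

def smc_draws_to_find_all(dim, patterns={"A", "B"}, max_draws=10_000_000):
    """Count uniform (SMC) draws needed until every pattern is seen once."""
    seen = set()
    for n in range(1, max_draws + 1):
        x = [random.random() for _ in range(dim)]
        seen.add(pattern(x))
        if seen == patterns:
            return n
    return max_draws

random.seed(0)
for d in (1, 2, 4):  # P(corner) = 0.1**d, so expected draws grow like 10**d
    print(d, smc_draws_to_find_all(d))
```

Even in this trivial two-region example, each added dimension multiplies the expected search cost by an order of magnitude, which is why a better sampling distribution is needed.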

__PSP Sampling by MCMC: The Basic Idea__

MCMC (Gilks et al., 1996) is a powerful method for sampling from a nonstandard target distribution, denoted *f*(x), where x stands for the parameter vector theta. The assumption is that we know how to calculate the value of *f*(x) but have no simple method of sampling from it. The Metropolis-Hastings algorithm (see Chib & Greenberg, 1995) provides such a method. One begins by specifying a *jumping distribution* (described more fully below), denoted *q*(x(k) | x(k-1)), which is used to sample candidate points by "jumping" around in the target distribution. With probability *a* = min{1, [*f*(x(k)) *q*(x(k-1) | x(k))] / [*f*(x(k-1)) *q*(x(k) | x(k-1))]}, the next sampled point is x(k). However, with probability 1 - *a*, we stay in the same spot (i.e., the next point is x(k-1)). The ergodicity of the algorithm guarantees that the values of x converge to samples from the target distribution.
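
The update rule above can be sketched in a few lines. This is a minimal illustration, not the PSP implementation: the jumping distribution here is symmetric (uniform in a box around the current point), so the *q* terms cancel and the acceptance probability reduces to min{1, *f*(x*)/*f*(x)}; the Gaussian target is a stand-in chosen only because its true mean is known.

```python
import math
import random

def metropolis_hastings(f, x0, step, n_samples, rng):
    """Metropolis-Hastings with a symmetric jumping distribution.

    Because q(x'|x) = q(x|x') for a symmetric proposal, the acceptance
    probability simplifies to a = min(1, f(x')/f(x))."""
    x = x0
    samples = []
    for _ in range(n_samples):
        # Propose a candidate by "jumping" around the current point.
        cand = [xi + rng.uniform(-step, step) for xi in x]
        a = min(1.0, f(cand) / f(x)) if f(x) > 0 else 1.0
        if rng.random() < a:
            x = cand          # accept: move to the candidate
        samples.append(x)     # otherwise stay in the same spot
    return samples

# Target: an (unnormalised) 1-D Gaussian density, true mean 0.
f = lambda x: math.exp(-0.5 * x[0] ** 2)
rng = random.Random(1)
chain = metropolis_hastings(f, [3.0], step=1.0, n_samples=20000, rng=rng)
mean = sum(x[0] for x in chain) / len(chain)
print(round(mean, 2))  # the chain mean settles near the target's mean of 0
```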

Recall that our goal was to specify a target distribution that is uniform over the *regions* in parameter space. Obviously it is impossible to specify this distribution directly, since the partition itself is unknown. However, we can specify a series of approximations to this ideal distribution that will eventually converge to it, under certain assumptions. At any given point in the search, the PSP algorithm will have discovered some number of regions. To ensure that each of the discovered regions is treated equally, we run a separate Markov chain employing the Metropolis-Hastings algorithm for each of these regions, each with a different target distribution that is uniform on that region (i.e., *f*(x) = *c*₀ for some positive constant *c*₀ on the region, and zero elsewhere).

__“Growing” the Partition: How PSP finds regions__

The procedure outlined above has the rather awkward property of converging to the ideal target distribution (uniform on the regions) only once the algorithm has already discovered all the regions, which rather defeats the point of the search. To address this problem, we need to ensure that the algorithm looks *outside* the discovered regions often enough to find new ones. Our approach to discovering new regions is based on the assumption that new regions will be found along the edges of old regions ("contiguity," see below). It also helps if adjacent regions are correlated in size. A natural way to find regions that satisfy this assumption is to adapt the jumping distribution so that the Markov chains jump outside their regions at an ideal rate. To understand why this approach succeeds, recall that the target distribution for a given MCMC routine is uniform on a particular region, so points that lie outside the region will always be rejected; these rejected candidates are precisely where new regions are encountered. If the parameter space meets the aforementioned conditions, all regions will be found.

Another aspect of our approach is that, whenever a new region is encountered, the PSP algorithm concentrates solely on that region until it has "caught up" to the pre-existing regions in terms of the number of samples drawn from it. (If multiple new regions are found, they are queued until they have all caught up.) The reason for this is that the discovery of a new region is essentially a source of information about a potentially informative area of the parameter space that has not yet been fully explored. Accordingly, since we wish to treat all regions equally, the algorithm concentrates on a new region until it is on an equal footing with all the regions that were previously discovered. By deliberately ensuring that the various chains frequently jump outside their borders, the PSP algorithm is constantly "looking" for new regions. In sum, the algorithm aggressively searches for new regions in the vicinity of a newly discovered region, at a level of granularity that is scaled to the size of the current region.

__The Jumping Distribution__

As the previous discussion should make clear, the choice of a jumping distribution is critical to the success of the PSP algorithm, even though it is chosen independently of the target distribution and serves only as the source from which to generate candidate points. We have used a uniform distribution over a hypersphere centered on the current location in the parameter space. The hypersphere is widely used in cases where the sampling region is constrained and no prior information is available about the shape of the region; in our view, a hyperspherical jumping distribution is the shape least likely to favor a particular shape of the sampling domain a priori.

In PSP, the size of the jumping distribution (i.e., the radius of the hypersphere) must adapt to each region. If it is too small, almost all candidate points will be accepted, but every jump will be so short that exhaustively searching a new region requires far too many jumps; moreover, rejected points, which are the means by which new regions outside the current region are discovered, will rarely be generated. In contrast, if the jumping distribution is too large, candidate points will be rejected too often, and the jumps will be too coarse to define the edges of a region. Unfortunately, unless one is dealing with a normal distribution, no theory exists that defines the optimal jumping distribution.

With the PSP algorithm, we have found it best to adapt the size of the jumping distribution so that each Markov chain accepts, on average, 20% to 25% of the candidate points. In other words, each chain spends 20-25% of its time moving around inside the region and the rest of its time searching for new regions just outside the borders. This range was identified heuristically by testing the algorithm many times on real and toy models, using jumping distributions that varied widely in size. Two performance measures were used in these tests: search speed and the variance of the regions' volume estimates. The goal was to find an acceptance rate that maximized search speed and minimized the variance of the volume estimates.
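
The text does not specify how the radius is adapted, so the sketch below is one simple possibility: for a well-behaved region, the fraction of hypersphere draws that land inside it falls monotonically as the radius grows, so the 20-25% band can be bracketed and bisected. The unit-cube region and all function names are hypothetical.

```python
import math
import random

def sample_in_hypersphere(center, radius, rng):
    """Uniform draw from a d-dimensional hypersphere around `center`:
    a random direction, with the distance scaled by U^(1/d)."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * v / norm for c, v in zip(center, g)]

def tune_radius(in_region, x, rng, lo=1e-3, batch=500, band=(0.20, 0.25)):
    """Find a jump radius whose in-region (acceptance) rate falls in `band`."""
    def accept_rate(r):
        hits = sum(in_region(sample_in_hypersphere(x, r, rng))
                   for _ in range(batch))
        return hits / batch

    hi = lo
    while accept_rate(hi) > band[1]:   # grow until rejections are frequent
        hi *= 2.0
    mid, rate = hi, accept_rate(hi)
    for _ in range(40):                # then bisect into the 20-25% band
        if band[0] <= rate <= band[1]:
            break
        mid = 0.5 * (lo + hi)
        rate = accept_rate(mid)
        lo, hi = (mid, hi) if rate > band[1] else (lo, mid)
    return mid, rate

# Hypothetical region: the unit hypercube [0, 1]^5, chain at its centre.
rng = random.Random(7)
in_cube = lambda x: all(0.0 <= v <= 1.0 for v in x)
radius, rate = tune_radius(in_cube, [0.5] * 5, rng)
print(round(radius, 2), round(rate, 2))
```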

__Formal Description of the PSP Algorithm__

Having presented the main concepts underlying PSP, we are now in a position to define the search algorithm formally.

*Initialize.* Given x(1) and pattern 1, set *m* = *i* = 1.

*Step 1.* Establish *q_m*(· | ·) by adapting the size of its hyperspherical domain. Go to Step 2.

*Step 2.* Set *i* = mod(*i*, *m*) + 1. Go to Step 3.

*Step 3.* Sample x* from *q_i*(· | x(i)). If x* generates a new valid pattern, set *m* = *m* + 1, set x(m) = x*, record the new pattern as pattern *m*, and then go to Step 1. If x* generates pattern *i*, set x(i) = x* and go to Step 2. Otherwise, go to Step 2.

In the algorithm, *q_i*(· | x(i)) denotes the jumping distribution of the region corresponding to pattern *i*, centered at x(i). The subscript 1 ≤ *i* ≤ *m* indexes the region from which we are currently sampling, and *m* represents the number of regions found so far. The algorithm terminates when a preset number of search trials (i.e., the size of an MCMC chain) is obtained for *each* of the discovered regions. Several analytic methods are available to help determine how many search trials must be used for the series to converge to the target distribution (Cowles & Carlin, 1996; Gelman, 1996).
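
The steps above can be sketched in Python. This is a simplified illustration, not the published implementation: Step 1's radius adaptation and the catch-up rule are omitted (every region uses one fixed jump radius), and the toy `pattern_of` function, which splits the unit square into two patterns, stands in for a real model.

```python
import math
import random

def sample_in_hypersphere(center, radius, rng):
    """Uniform draw from a hypersphere around `center`."""
    d = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * v / norm for c, v in zip(center, g)]

def psp_search(pattern_of, x1, radius, n_trials, rng):
    """Minimal PSP loop: one chain per discovered region, visited round-robin.

    pattern_of(x) returns the data pattern produced by parameter vector x,
    or None for invalid points outside the parameter space."""
    chains = {pattern_of(x1): [x1]}     # Initialize: m = 1
    order = [pattern_of(x1)]
    i = 0
    for _ in range(n_trials):
        p = order[i]                    # Step 2: cycle through the regions
        x = chains[p][-1]
        cand = sample_in_hypersphere(x, radius, rng)   # Step 3: sample x*
        cp = pattern_of(cand)
        if cp is not None and cp not in chains:
            chains[cp] = [cand]         # new valid pattern: m = m + 1
            order.append(cp)
        elif cp == p:
            chains[p].append(cand)      # same pattern: accept the move
        # otherwise: reject the candidate and stay in the same spot
        i = (i + 1) % len(order)
    return chains

# Hypothetical model on [0, 1]^2: the pattern is which half x lies in.
def pattern_of(x):
    if not all(0.0 <= v <= 1.0 for v in x):
        return None                     # outside the parameter space
    return "left" if x[0] < 0.5 else "right"

rng = random.Random(3)
chains = psp_search(pattern_of, [0.25, 0.5], radius=0.3, n_trials=5000, rng=rng)
print(sorted(chains))                   # both regions get discovered
```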

__Assumptions when using PSP__

For the PSP algorithm to succeed in finding all data patterns, some assumptions must be met. They are:

1. *Continuity*. A data pattern occupies a single continuous region in the parameter space, such that any two points that produce the pattern can be joined by a path that passes only through points that produce that same pattern.

2. *Contiguity*. Regions are contiguous with one another on a predefined parameter space, in the sense that any two patterns can be joined by a path that passes only through points that produce a valid data pattern.

3. *Stationarity*. Model behavior is stationary in the sense that a given parameter set always generates a single, fixed data pattern. This means that the boundaries of the regions are fixed, not varying every time the model generates a data pattern.

Among these three conditions, the first two (continuity and contiguity) are the key conditions necessary for the algorithm to perform satisfactorily: continuity is implicit in Step 1, and contiguity is necessary for Step 3. In what follows, we discuss how potential violations of all three can be detected and handled if suspected.

The assumptions of continuity and contiguity are reasonable to make if the computational model being analyzed is composed of mathematical functions that are known to exhibit these properties (i.e., the functions are continuous and smooth). If these assumptions are suspected of being violated (e.g., there are breaks in the range of a parameter), the modeler can check the consistency of the PSP solution through multiple runs of the algorithm. Estimated center (or mean) points and covariance matrices of discovered regions, which are output by the algorithm, will reveal inconsistencies, such as discrepant volume estimates or differing numbers of data patterns found. Inspection of these values across multiple PSP runs should validate any suspicions and reveal the nature of the inconsistency, if it exists.

The stationarity assumption implies that PSP is not applicable to simulation-based, probabilistic models (e.g., distributed connectionist models; algorithmic models like REM, Shiffrin & Steyvers, 1997). For this reason, a probabilistic model can be analyzed by PSP only if its simulation component can be replaced with a closed-form probability density (or mass) function, so that data patterns are defined on the space of probability distributions. Implementation of an algorithm like PSP requires dealing with other issues as well. Many of these are practical in nature but have the potential to influence the results. For example, if neighboring regions are similar in size, there is a good chance of finding them, because the probability of sampling from a similarly sized region is higher than that of sampling from a much smaller one. If one suspects that tiny regions are nestled between larger ones, the best course of action is to increase the sample size so that the likelihood of landing in these tiny regions is greater. The use of multiple runs, as discussed above, will also help in this regard. Other pragmatic issues include ensuring that the parameter ranges are finite and defining appropriate stopping criteria.

The preceding discussion is intended to alert potential users of PSP to the issues that must be considered when applying it. Our sampling-based approach to partitioning the parameter space makes it a powerful tool for model analysis, but like other numerical methods that work in high dimensions, including MCMC and non-linear optimization, correct answers are not guaranteed and cannot be proven in general settings, except for trivial cases. This is why heuristics must be devised and used during application of the method to combat potential problems. Their purpose is to minimize anomalous results and ensure that an answer is accurate and trustworthy.

__Accuracy of Volume Estimation__

The ability to analyze the volumes of regions is an attractive feature of PSP. Nevertheless, it must be recognized that it is nontrivial to estimate the volume of an arbitrary multidimensional object whose shape and location have no closed form. Volume estimation was performed in two steps, both using sampling-based methods. In the first, a region's border is approximated by an ellipsoid defined by the estimated mean vector and covariance matrix of the MCMC sample of the region. If we use the volume of this ellipsoid as a first approximation to the volume of the region, the result should be biased towards overestimation, because an ellipsoid is an "evenly packed" object in every direction, whereas data patterns are unlikely to have similarly simple, compact shapes. Accordingly, the amount of bias should be related to the degree to which all or part of the region is concave. This bias can affect the volume analysis because the degree of concavity may differ across the regions being compared.

The second step of volume estimation adjusts for this bias via the Hit-or-Miss method of Monte Carlo integration. First, a random sample of size *n* is drawn from the uniform distribution over the ellipsoid. Next, the ratio of the number of sample points within the region (i.e., the "hits") to the total number of samples, *n*, is used as a bias-correction factor. This value is then multiplied by the volume of the ellipsoid to obtain the final volume estimate of the region. Our expectation in using this two-step procedure was that any bias of underestimation after adjustment would be negligible compared to the overestimation before adjustment. Nonetheless, it was necessary to assess the severity of the bias in order to determine how it might limit analyses using the volume measure.
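
The two-step estimator can be sketched as follows. One simplification is assumed for brevity: an axis-aligned ellipsoid built from per-dimension means and variances replaces the full covariance ellipsoid of the actual algorithm (the semi-axes are set to std · √(d+2), the moment match for a uniform ellipsoid). The unit-square test region and all names are hypothetical.

```python
import math
import random

def ellipsoid_volume(semi_axes):
    """Volume of a d-ellipsoid: unit-ball volume times the semi-axis product."""
    d = len(semi_axes)
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return unit_ball * math.prod(semi_axes)

def estimate_region_volume(in_region, sample, n_mc, rng):
    """Two-step PSP-style volume estimate (axis-aligned simplification).

    Step 1: bound the MCMC sample with an ellipsoid (semi-axis = std*sqrt(d+2)).
    Step 2: correct the ellipsoid volume by the Hit-or-Miss ratio."""
    d, n = len(sample[0]), len(sample)
    mean = [sum(x[j] for x in sample) / n for j in range(d)]
    var = [sum((x[j] - mean[j]) ** 2 for x in sample) / n for j in range(d)]
    semi = [math.sqrt(v * (d + 2)) for v in var]
    hits = 0
    for _ in range(n_mc):               # draw uniformly from the ellipsoid
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        r = rng.random() ** (1.0 / d)
        pt = [mean[j] + semi[j] * r * g[j] / norm for j in range(d)]
        hits += in_region(pt)           # count the "hits"
    return ellipsoid_volume(semi) * hits / n_mc

# Hypothetical region with known volume: the unit square [0, 1]^2 (area 1).
rng = random.Random(5)
square = lambda x: all(0.0 <= v <= 1.0 for v in x)
mcmc_like = [[rng.random(), rng.random()] for _ in range(4000)]
est = estimate_region_volume(square, mcmc_like, 20000, rng)
print(round(est, 2))
```

Note that for this square the moment-matched ellipse clips the corners, so the estimate comes out slightly below the true area of 1, illustrating the small post-adjustment underestimation bias discussed above.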

We therefore designed a very stringent simulation test to assess the accuracy of volume estimation. A region in a multidimensional space was randomly generated in the following way. On an evenly spaced grid in this space, a cell was selected as the starting point of the region. The region was then expanded (i.e., grew) into one of its orthogonally neighboring cells in a random direction, to form a two-celled region. The region was expanded again into a neighboring cell adjacent to any part of the region, not just the new cell. This process was repeated until this randomly shaped region, *R*, filled a preset number of cells. Its volume, *V_R*, was estimated by PSP using the equation

*V_R* = (*h*/*n*) · *V_d*,

where *V_d* is the volume of the bounding ellipsoid in the *d*-dimensional space, and *h* is the number of "hits" among the *n* points sampled uniformly from that ellipsoid.
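
The random region-generation procedure just described can be sketched as follows; the grid cells are represented by integer coordinate tuples, and the function name is hypothetical.

```python
import random

def grow_region(dim, n_cells, rng):
    """Grow a random region on an integer grid, as in the simulation test:
    start from one cell, then repeatedly expand into a cell orthogonally
    adjacent to ANY cell already in the region, not just the newest one."""
    region = {(0,) * dim}
    while len(region) < n_cells:
        base = rng.choice(sorted(region))     # any existing cell may sprout
        axis = rng.randrange(dim)             # pick a random direction...
        step = rng.choice((-1, 1))
        cell = tuple(c + step * (j == axis) for j, c in enumerate(base))
        region.add(cell)                      # re-adding a cell is a no-op
    return region

rng = random.Random(11)
r = grow_region(dim=3, n_cells=10, rng=rng)
print(len(r))  # → 10
```

Because each new cell touches an existing one, the resulting region is always connected, yet after many expansions it can take on the bizarre, multi-pronged shapes mentioned below.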

Accuracy was measured by comparing the estimated volumes of two of these arbitrarily-shaped regions. The ratio of these two volumes was used as the measure of accuracy, for which the known, true ratio served as the reference point. It is important to understand that it is not appropriate to compare the estimated volume of a region with its “true” volume. The reason is that the scale of the parameter space is arbitrary, so the value of a region’s volume is not meaningful by itself. The value is meaningful only in comparison with another region’s volume, which is how the measure was used in our three application examples, and how it was used in this test.

The simulation was carried out using a two-factor design that yielded a variety of situations in which PSP's volume estimation might be used in real applications. The first factor was the number of dimensions of the parameter space, with three levels (4, 7, 10) chosen to cover a range of models in use in the discipline. The second factor was the number of cells used to form each of the two randomly shaped regions (3, 10, 30, 100). All four levels were completely crossed (e.g., 3&10, 3&30, 3&100) to increase the diversity of possible shapes being compared, which was meant to further increase the realism and difficulty of the test. Note that, because of the algorithm's scale-adapting ability discussed earlier, comparing regions whose volumes comprise different numbers of cells (e.g., 3 vs. 30) poses no obstacle. The two regions were simply scaled to have the same volume, so that their ratio should be one in all comparisons.

Because the region-generation method described above was highly unconstrained (requiring only continuity between cells), the 10-, 30-, and 100-cell regions had the potential to have extremely bizarre shapes (e.g., multi-pronged objects in various directions and dimensions). As a result, conditions 3&30, 3&100, and 10&100 were likely to be the most challenging pairings. The accuracy of volume estimation was pushed to the limits in these cases.

Figure 1A.

In each of the pairings across dimensionality, the region-generation and volume-estimation-and-comparison process was repeated 300,000 times. Regions were sampled 60,000 times (for ellipsoid estimation) and 15,000 times (for the Hit-or-Miss adjustment) on each trial, which is equivalent to the five multiple runs used in the Merge and TRACE analyses. A subset of the results, which includes the most difficult cases, is shown in Figure 1A. Each panel shows the distribution of volume ratio estimates for a given dimensionality, *d*, across the different cell-number pairs. The ratio of the larger estimate to the smaller was used, so values greater than one indicate the size of the error in volume comparison.

Across most pairing trials, volume estimation was good, with the peaks of the distributions at or near 1.0 and the tails falling off sharply, so that few values exceeded 1.5 even when *d* = 10. Errors were greatest for the pairings in which cell numbers differed most (3&100, 10&100), and this tendency was magnified with dimensionality. Nevertheless, the bias is surprisingly modest: the 3&100 comparison spans a full order of magnitude more cells than the 3&10 pairing, yet the means of the two distributions differ by only 0.163 (1.234 and 1.071, respectively). When this same test was performed without the Hit-or-Miss volume-adjustment step, biases were much more severe, with the peaks of the error distributions far beyond 1.0, especially when the numbers of dimensions and cells were large.

In sum, the results of this test demonstrate that although volume estimation can be biased, its robustness across a wide range of region shapes indicates that the method is reasonably accurate and trustworthy most of the time. For the models used in the present study, volume differences between regions were *exponentially* large; to view them all on the same graph, we had to plot them on a log scale. Thus, the volume estimation method is more than sufficiently accurate for rank-ordering the volumes of parameter regions, no matter what their shape, and in most instances comparison on an interval scale is appropriate as well.