MATLAB Codes and PSP Manual

 

The Matlab codes of PSP consist of two MATLAB functions: psp and optsetpsp. Commented source files can be downloaded:

 

psp.m     optsetpsp.m

 

Provided below is the manual of psp. We believe they are sufficiently detailed for using the algorithm in practice.

 

psp

Find data patterns a computational model can possibly generate. The PSP (Parameter Space Partitioning) algorithm finds discrete data patterns the given model could produce by identifying their corresponding regions in the unknown partition of the model's parameter space.

Syntax

Y = psp(model,x0,xbound)

Y = psp(model,x0,xbound,options)

Y = psp(model,x0,xbound,options,p1,p2,...)

[Y,xmcv] = psp(...)

[Y,xmcv,optjmp] = psp(...)

[Y,xmcv,optjmp,stime] = psp(...)

Description

Y = psp(model,x0,xbound) searches the parameter space defined by xbound for the data patterns of model, with starting point(s), x0, and returns discovered data patterns in Y.

Y = psp(model,x0,xbound,options) runs with the default search options replaced by values in the structure options. Use optsetpsp to set these options.

Y = psp(model,x0,xbound,options,p1,p2,...) passes the problem-dependent parameters p1,p2,... directly to the function model. In psp, model is evaluated in the form: feval(model,x,p1,p2,...). Use options=[] as a place holder if no options are set.

[Y,xmcv] = psp(...) returns information about the partition of parameter space in a cell array xmcv.

[Y,xmcv,optjmp] = psp(...) returns the results of the adaptation (i.e., optimal size of jumping distribution) of the Markov chain in each discovered region.

[Y,xmcv,optjmp,stime] = psp(...) returns the information of elapsed search time before finding each data pattern.

Input Arguments

model              The function whose input space is to be searched for its output patterns. model is a MATLAB function that accepts parameter values in a column vector x and returns a pattern of data in a row vector y. The function model can be specified as a function handle for an M-file function:

 

                          Y = psp(@myfun,x0,xbound)

 

                          where myfun is a MATLAB function such as

 

                          function y = myfun(x)

          y = ...          % Compute a data pattern at x

                                   The function must always return a row vector of the same length for any given parameter point x ranged within xbound.

x0,xbound    x0 must be an NDIM-by-M matrix where NDIM is the number of model parameters and M is the number of starting points--the user may supply multiple starting points. xbound is an NDIM-by-2 matrix whose first column sets the lower bounds of parameters, and second column, the upper bounds. xbound must be finite (i.e., Inf or -Inf is not allowed). In case some unconstrained parameters are unavoidable (i.e., plausible data patterns could be generated from their extreme values), it is advisable to reparameterize model, making use of log, inverse logistic, or other transformation function.

Options   Search options used in psp are MaxPSP, IniJump, SmpSize, VolEst, VolSmpSize, Display, and DispInterval. For details, see Options below.

Output Arguments

Y                        Y stores discovered data patterns in rows.

xmcv      xmcv is a 1-by-4 cell array containing the information about the resultant partition of the algorithm run. Each cell element contains:

                                   xmcv {1}            First element of xmcv is an NDIM-by-N matrix that consists of N column vectors of parameter values that can generate each of N discovered data patterns.

                                   xmcv {2}         Second element is a an NDIM-by-N matrix that contains N column vectors of mean parameter values, each corresponding to each of the regions identified in the partition.

                          xmcv {3}         Third element is a 3-dimensional matrix in which the covariance matrix of parameter values over the uniform distribution on each region stacks up along the 3rd dimension.

                          xmcv {4}         fourth element is a row vector of approximated volumes of the discovered regions on a log scale.

optjmp        A 1-by-N vector optjmp stores the results of adapting the Markov chain in each discovered region. The size of adapted jumping distribution (radius of a sphere) for the i-th region can be retrieved by IniJump*2^optjmp(i).

stime            stime is a 2-by-(N+1) matrix whose first row lists (cumulative) time points at which each of N data patterns is discovered, and the time of terminating the algorithm in the last column. The second row lists basically the same information except that the unit is the number of search trials made before finding each new pattern.

Options

Search options used by psp. You can use optsetpsp to set or change the values of these fields in the options structure options. Each of them is explained below:

MaxPSP           Maximum number of search cycles allowed before termination. The psp search stops when the number of search cycles of every Markov chain (in every discovered region) reaches MaxPSP. A cycle consists of a certain number of search trials (i.e., MCMC sample points). The cycle size is set by the 2nd element of SmpSize (see below). The counting of cycles does not begin until the adaptive stage of every Markov chain enters the stage of a regular MCMC sampling. The default value is 6. Setting MaxPSP to a low value may cause an incomplete search or unreliable information on the partition of parameter space.

IniJump         Initial size of jumping distribution with which the Markov chain in a newly found region starts. The jumping distribution of the MCMC process is a uniform density over a hyper-sphere. IniJump specifies the initial value for the sphere's radius. The regular MCMC sampling process in each discovered region does not begin until the size of jumping distribution is adapted so that it meets a certain rate of sample acceptance. IniJump does not depend on the scale of parameters because the range of every parameter is rescaled to [0 1] in psp. Even so, IniJump is problem-dependent and should affect the efficiency of psp. The default value is 0.1.

SmpSize         Sizes of MCMC samples used to adapt the Markov chain of each discovered region. The adaptation is performed in two stages each of which consists of a few sampling cycles. SmpSize specifies the size of MCMC sample in one cycle. SmpSize is a row vector of two integers, e.g., [smpsz1 smpsz2], where smpsz1 defines the size of one cycle in the Level 1 stage, and smpsz2 in the Level 2 stage. The former must be smaller than the latter (e.g., smpsz1 is half of smpsz2) since the Level 1 adaptation is designed to be fast but coarse-grained whereas the Level 2 adaptation to be slow but fine-grained. smpsz2 also serves as the size of regular MCMC cycles following the adaptation, which are counted to determine the termination of psp (see MaxPSP above). The default setting in psp is ceil([100 200]*1.2^NDIM).

VolEst               Method of volume estimation of dicovered regions.

                                   ‘Ellipsoid’ (default) estimates the volume through ellipsoid approximation using eigen decomposition of the estimated covariance matrix from each region's MCMC sample. Although this method does not require additional computing time, it causes a bias toward overestimation because an ellipsoid fills up the, if any, concavity of the region.

                                   ‘HitMiss’  uses a methodinspired by the hit-or-miss Monte Carlo integration, which corrects the bias due to ellipsoid approximation. This method is the same as the regular hit-or-miss integration method except that the estimated ellipsoid is employed as a base sampling domain, which may exclude some portion of the region and may thus result in underestimation. This method, however, will give better estimates in general. Additional computing time is required after the completion of search process.

VolSmpSize Size of a sample used to estimate a region's volume when VolEst is set as 'HitMiss.' Default is ceil(500*1.2^NDIM).

Display           Level of display.

                                   'off'               displays no output;

                                   'final'         displays just the final output (number of patterns found & elapsed time);

                                   'min'               displays the region index each time a new data pattern is discovered;

                                   'all'               (default) displays all the information regarding the status of search and MCMC adaptation.

DispInterval Length of intervals (in search trials) at which psp displays elapsed time while no new pattern is found. By default, psp does so every 30,000 search trials.

 


optsetpsp

Creates a psp search options structure options in which the named parameters have the specified values.

Syntax

options = optsetpsp('param1',value1,'param2',value2,...)

Notes

Any unspecified parameters are set to [] (parameters with value [] indicate to use the default value for that parameter when options is passed to psp). It is sufficient to type only the leading characters that uniquely identify the parameter. Case is ignored for parameter names. See psp above for details of search options.