Title: | Score Matching Estimation by Automatic Differentiation |
---|---|
Description: | Hyvärinen's score matching (Hyvärinen, 2005) <https://jmlr.org/papers/v6/hyvarinen05a.html> is a useful estimation technique when the normalising constant for a probability distribution is difficult to compute. This package implements score matching estimators using automatic differentiation in the 'CppAD' library <https://github.com/coin-or/CppAD> and is designed for quickly implementing score matching estimators for new models. Also available is general robustification (Windham, 1995) <https://www.jstor.org/stable/2346159>. Already in the package are estimators for directional distributions (Mardia, Kent and Laha, 2016) <doi:10.48550/arXiv.1604.08470> and the flexible Polynomially-Tilted Pairwise Interaction model for compositional data. The latter estimators perform well when there are zeros in the compositions (Scealy and Wood, 2023) <doi:10.1080/01621459.2021.2016422>, even many zeros (Scealy, Hingee, Kent, and Wood, 2024) <doi:10.1007/s11222-024-10412-w>. |
Authors: | Kassel Liam Hingee [aut, cre] , Janice Scealy [aut] , Bradley M. Bell [cph] |
Maintainer: | Kassel Liam Hingee <[email protected]> |
License: | GPL (>=3) |
Version: | 0.0.68 |
Built: | 2024-11-17 02:55:05 UTC |
Source: | https://github.com/kasselhingee/scorematchingad |
This package's main features are:
- A general capacity to implement score matching estimators that use algorithmic differentiation to avoid tedious manual algebra. The package uses CppAD and Eigen to differentiate model densities and compute the score matching discrepancy function (see scorematchingtheory). The score matching discrepancy is usually minimised by solving a quadratic equation, but a method for solving numerically (through optimx::Rcgmin()) is also included. On Linux platforms using the gcc compiler, new models can be fitted with the help of customll(), in a similar fashion to models in the TMB package. New manifolds or new transforms require small alterations to the source code of this package.
- Score matching estimators for the Polynomially-Tilted Pairwise Interaction (PPI) model (Scealy and Wood 2023; Scealy et al. 2024). See function ppi().
- Score matching and hybrid score matching estimators for the von Mises-Fisher, Bingham and Fisher-Bingham directional distributions (Mardia et al. 2016). See vMF(), Bingham() and FB().
- Implementation of a modification of Windham's robustifying method (Windham 1995) for many exponential family distributions. See Windham(). For some models the density approaches infinity at some locations, creating difficulties for the weights in Windham's original method (Scealy et al. 2024).
Colleagues Andrew T. A. Wood and John T. Kent played important roles in developing the statistical ideas and theory for score matching estimation for the PPI model (Scealy et al. 2024).
We developed this package on Ngunnawal and Ngambri Country. We thank the Country for its influence.
Maintainer: Kassel Liam Hingee [email protected] (ORCID)
Authors: Janice Scealy (ORCID)
Other contributors: Bradley M. Bell [copyright holder]
Amaral GJA, Dryden IL, Wood ATA (2007).
“Pivotal Bootstrap Methods for k-Sample Problems in Directional Statistics and Shape Analysis.”
Journal of the American Statistical Association, 102(478), 695–707.
JSTOR 27639898, http://www.jstor.org/stable/27639898.
Bell B (2023).
“CppAD: A Package for Differentiation of C++ Algorithms.”
https://github.com/coin-or/CppAD.
Hyvärinen A (2005).
“Estimation of Non-Normalized Statistical Models by Score Matching.”
Journal of Machine Learning Research, 6(24), 695–709.
https://jmlr.org/papers/v6/hyvarinen05a.html.
Hyvärinen A (2007).
“Some extensions of score matching.”
Computational Statistics & Data Analysis, 51(5), 2499–2512.
doi:10.1016/j.csda.2006.09.003.
Liu S, Kanamori T, Williams DJ (2019).
“Estimating Density Models with Truncation Boundaries using Score Matching.”
doi:10.48550/arXiv.1910.03834.
Mardia K (2018).
“A New Estimation Methodology for Standard Directional Distributions.”
In 2018 21st International Conference on Information Fusion (FUSION), 724–729.
doi:10.23919/ICIF.2018.8455640.
Mardia KV, Jupp PE (2000).
Directional Statistics, Probability and Statistics.
Wiley, Great Britain.
ISBN 0-471-95333-4.
Mardia KV, Kent JT, Laha AK (2016).
“Score matching estimators for directional distributions.”
doi:10.48550/arXiv.1604.08470.
Martin I, Uh H, Supali T, Mitreva M, Houwing-Duistermaat JJ (2019).
“The mixed model for the analysis of a repeated-measurement multivariate count data.”
Statistics in Medicine, 38(12), 2248–2268.
doi:10.1002/sim.8101.
Scealy JL, Hingee KL, Kent JT, Wood ATA (2024).
“Robust score matching for compositional data.”
Statistics and Computing, 34, 93.
doi:10.1007/s11222-024-10412-w.
Scealy JL, Wood ATA (2023).
“Score matching for compositional distributions.”
Journal of the American Statistical Association, 118(543), 1811–1823.
doi:10.1080/01621459.2021.2016422.
Windham MP (1995).
“Robustifying Model Fitting.”
Journal of the Royal Statistical Society. Series B (Methodological), 57(3), 599–609.
JSTOR 2346159, http://www.jstor.org/stable/2346159.
Wiria AE, Prasetyani MA, Hamid F, Wammes LJ, Lell B, Ariawan I, Uh HW, Wibowo H, Djuardi Y, Wahyuni S, Sutanto I, May L, Luty AJ, Verweij JJ, Sartono E, Yazdanbakhsh M, Supali T (2010).
“Does treatment of intestinal helminth infections influence malaria?”
BMC Infectious Diseases, 10, 77.
doi:10.1186/1471-2334-10-77.
Yu S, Drton M, Shojaie A (2019).
“Generalized Score Matching for Non-Negative Data.”
Journal of Machine Learning Research, 20(76), 1–70.
https://jmlr.org/papers/v20/18-278.html.
Yu S, Drton M, Shojaie A (2020).
“Generalized Score Matching for General Domains.”
doi:10.48550/arXiv.2009.11428.
Useful links:
Report bugs at https://github.com/kasselhingee/scorematchingad/issues
This is a low-level object useful for implementing score matching estimators. An R6 class for storing a 'pointer' to a CppAD tape in C++ (also called an ADFun) and associated information. Currently tools for modifying this information are not available in this package, however tools for creating new ADFun objects from an existing ADFun are available. Typically an ADFun object will be created by buildsmdtape().
This package uses version 2024000.5 of the algorithmic differentiation library CppAD (Bell 2023) to build score matching estimators. Full help for CppAD can be found at https://cppad.readthedocs.io/.
Differentiation proceeds by taping the basic (atomic) operations performed on the independent variables and dynamic parameters. The atomic operations include multiplication, division, addition, sine, cosine, exponential and many more. Example values for the variables and parameters are used to conduct this taping, so care must be taken with any conditional (e.g. if-then) operations; CppAD has a special tool for this called CondExp (short for conditional expressions). The result of taping is an object of class ADFun in CppAD and is often called a tape. This ADFun object can be evaluated, differentiated, used for further taping (via CppAD's base2ad()), solving differential equations and more.
The differentiation is with respect to the independent variables; however, the dynamic parameters can be altered, which allows for creating a new ADFun object where the dynamic parameters become independent variables (see tapeSwap()). For the purposes of score matching, there are also fixed parameters, which are the elements of the model's parameter vector that are given and not estimated.
Each time a tape is evaluated the corresponding C++ object is altered. Parallel use of the same ADFun object thus requires care and is not tested. For now I recommend creating a new ADFun object for each CPU.
ptr: A Rcpp external pointer to a CppAD ADFun object.
xtape: The (numeric) vector of independent variable values used for taping.
dyntape: The (numeric) vector of dynamic parameters used for taping.
usertheta: A (numeric) vector of NA values and fixed values specifying the parameters of the taped function that were considered dynamic parameters or fixed parameters, respectively.
name: An easy-to-read name for the taped function.
new(): Create a new ADFun object from an external pointer to a CppAD ADFun object.
ADFun$new(ptr, name = NULL, xtape = vector("numeric"), dyntape = vector("numeric"), usertheta = rep(NA_real_, length(dyntape)))
ptr: A Rcpp external pointer to a CppAD ADFun object.
name: An easy-to-read name for the taped function.
xtape: The (numeric) vector of independent variables used for taping.
dyntape: The (numeric) vector of dynamic parameters used for taping.
usertheta: A (numeric) vector of NA values and fixed values specifying the inputs of the taped function that were considered independent variables or dynamic parameters, respectively.
tapes <- buildsmdtape(
  "sim", "sqrt", "sph",
  ll = "ppi",
  ytape = rep(1/3, 3),
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq",
  acut = 0.01,
  verbose = FALSE)
tapes$smdtape$xtape
tapes$smdtape$dyntape
tapes$smdtape$name
tapes$smdtape$ptr
Score matching estimators for the Bingham distribution's parameter matrix. Two methods are available: a full score matching method that estimates the parameter matrix directly and a hybrid method by Mardia et al. (2016) that uses score matching to estimate just the eigenvalues of the parameter matrix.
Bingham(Y, A = NULL, w = rep(1, nrow(Y)), method = "Mardia")
Y: A matrix of multivariate observations in Cartesian coordinates. Each row is a multivariate measurement (i.e. each row corresponds to an individual).
A: For full score matching only: if supplied, then NA elements of A are estimated and the remaining elements are fixed at their supplied values.
w: An optional vector of weights for each measurement in Y.
method: Either "Mardia" or "hybrid" for the hybrid score matching estimator from Mardia et al. (2016), or "smfull" for the full score matching estimator.
The Bingham distribution has a density proportional to exp(z^T A z), where A is a symmetric matrix and the trace (sum of the diagonals) of A is zero for identifiability (p181, Mardia and Jupp 2000).
The full score matching method estimates all elements of A directly except the final element of the diagonal, which is calculated from the sum of the other diagonal elements to ensure that the trace of A is zero.
The method by Mardia et al. (2016) first calculates the maximum-likelihood estimate of the eigenvectors of A. The observations Y are then standardised using these eigenvectors. This standardisation corresponds to diagonalising A, where the eigenvalues of A become the diagonal elements of the new A. The diagonal elements of the new A are then estimated using score matching, with the final diagonal element calculated from the sum of the other elements. See Mardia et al. (2016) for details.
A list of est, SE and info.
est contains the estimated matrix A and a vector form, paramvec, of A (ordered according to c(diag(A)[1:(p-1)], A[upper.tri(A)])). For the Mardia method, the estimated eigenvalues of A (named evals) and eigenvectors of A (named G) are also returned.
SE contains estimates of the standard errors if computed. See cppad_closed().
info contains a variety of information about the model fitting procedure and results.
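The trace constraint and the paramvec ordering described above can be sketched in base R (no package functions needed; the matrix here is arbitrary illustrative data):

```r
set.seed(1)
p <- 4
A <- matrix(rnorm(p * p), p, p)
A <- (A + t(A)) / 2                  # symmetrise
A[p, p] <- -sum(diag(A)[1:(p - 1)])  # final diagonal element enforces trace = 0
sum(diag(A))                         # 0, up to floating point

# vector form used by Bingham(): p-1 free diagonal elements, then the upper triangle
paramvec <- c(diag(A)[1:(p - 1)], A[upper.tri(A)])
length(paramvec)                     # (p - 1) + p * (p - 1) / 2 = 9 for p = 4
```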
Mardia KV, Jupp PE (2000).
Directional Statistics, Probability and Statistics.
Wiley, Great Britain.
ISBN 0-471-95333-4.
Mardia KV, Kent JT, Laha AK (2016).
“Score matching estimators for directional distributions.”
doi:10.48550/arXiv.1604.08470.
Other directional model estimators: FB(), vMF_robust(), vMF()
p <- 4
A <- rsymmetricmatrix(p)
A[p, p] <- -sum(diag(A)[1:(p-1)]) # to satisfy the trace = 0 constraint
if (requireNamespace("simdd")) {
  Y <- simdd::rBingham(100, A)
  Bingham(Y, method = "Mardia")
}
For a parametric model family, the function buildsmdtape() generates CppAD tapes (called ADFuns) for the improper log-likelihood (without normalising constant) of the family and the score matching discrepancy function (defined in scorematchingtheory). Three steps are performed by buildsmdtape(): first an object that specifies the manifold and any transformation to another manifold is created; then a tape of the log-likelihood (without normalising constant) is created; finally a tape of the score matching discrepancy function is created.
buildsmdtape(
  start,
  tran = "identity",
  end = start,
  ll,
  ytape,
  usertheta,
  bdryw = "ones",
  acut = 1,
  thetatape_creator = function(n) { seq(length.out = n) },
  verbose = FALSE
)
start: The starting manifold. Used for checking that tran and end match start.
tran: The name of a transformation. See the available start-tran-end combinations in Details.
end: The name of the manifold that tran maps start to.
ll: The name of an inbuilt improper log-likelihood function to tape (which also specifies the parametric model family). On Linux operating systems a custom log-likelihood function created by customll() can also be used.
ytape: An example measurement value to use for creating the tapes. In the natural (i.e. start) manifold.
usertheta: A vector of parameter elements for the likelihood function.
bdryw: The name of the boundary weight function. "ones" for manifolds without boundary. For the simplex and positive orthant of the sphere, "prodsq" and "minsq" are possible; see ppi().
acut: A parameter passed to the boundary weight function bdryw.
thetatape_creator: A function that accepts an integer n and returns a vector of n parameter values for taping.
verbose: If TRUE, more information is printed during taping.
The improper log-likelihood (without normalising constant) must be implemented in C++ and is selected by name. Similarly the transforms of the manifold must be implemented in C++ and selected by name.
When using CppAD, one first creates tapes of functions. These tapes can then be used for evaluating the function and its derivatives, and generating further tapes through argument swapping, differentiation and composition. The taping relies on specifying typical argument values for the functions (see Introduction to CppAD Tapes above). Tapes can have both independent variables and dynamic parameters. The differentiation with CppAD occurs with respect to the independent variables. Tapes of tapes are possible, including tapes that swap the independent variables and dynamic parameters - this is how this package differentiates with respect to dynamic parameters (see tapeSwap()).
To build a tape for the score matching discrepancy function, the package first tapes the map from a point z on the end manifold to the value of the improper log-likelihood, where the independent variable is z, the dynamic parameter is a vector of the parameters to estimate, and the remaining model parameters are fixed and not estimated. This tape is then used to generate a tape for the score matching discrepancy function where the parameters to estimate are the independent variable.
Only some combinations of start, tran and end are available, because tran must map between start and end. These combinations of start-tran-end are currently available:
sim-sqrt-sph
sim-identity-sim
sim-alr-Euc
sim-clr-Hn111
sph-identity-sph
Euc-identity-Euc
Currently available improper log-likelihood functions are:
dirichlet
ppi
vMF
Bingham
FB
A list of:
an ADFun object containing a tape of the improper log-likelihood, with a point on the end manifold as the independent variable
an ADFun object containing a tape of the score matching discrepancy function, with the non-fixed parameters as the independent variable and the measurements on the end manifold as the dynamic parameter
some information about the tapes
There is no checking of the inputs ytape and usertheta.
Bell B (2023). “CppAD: A Package for Differentiation of C++ Algorithms.” https://github.com/coin-or/CppAD.
Other tape builders: moretapebuilders
p <- 3
u <- rep(1/sqrt(p), p)
ltheta <- p # length of vMF parameter vector
intheta <- rep(NA, length.out = ltheta)
tapes <- buildsmdtape("sph", "identity", "sph", "vMF",
  ytape = u,
  usertheta = intheta,
  "ones",
  verbose = FALSE)
evaltape(tapes$lltape, u, runif(n = ltheta))
evaltape(tapes$smdtape, runif(n = ltheta), u)

u <- rep(1/3, 3)
tapes <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = u,
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq",
  acut = 0.01,
  verbose = FALSE)
evaltape(tapes$lltape, u, rppi_egmodel(1)$theta)
evaltape(tapes$smdtape, rppi_egmodel(1)$theta, u)
For a tape (i.e. an ADFun object) of a quadratic-form score matching discrepancy function, calculates the vector of parameters such that the gradient of the score matching discrepancy is zero. Also estimates standard errors and covariance. Many score matching estimators have an objective function that has a quadratic form (see scorematchingtheory).
cppad_closed(
  smdtape,
  Y,
  Yapproxcentres = NA * Y,
  w = rep(1, nrow(Y)),
  approxorder = 10
)
smdtape: A CppAD tape of a score matching discrepancy function that has a quadratic form.
Y: A matrix of multivariate observations. Each row is an observation.
Yapproxcentres: A matrix of Taylor approximation centres for rows of Y that require approximation.
w: Weights for each observation.
approxorder: The order of Taylor approximation to use.
When the score matching discrepancy function is of quadratic form, the gradient of the score matching discrepancy is zero at the parameter vector theta satisfying H theta = -b, where H is the average of the Hessian of the score matching discrepancy function evaluated at each measurement and b is the average of the gradient offset (see quadratictape_parts()) evaluated at each measurement. Both the Hessian and the gradient offset are constant with respect to the model parameters for quadratic-form score matching discrepancy functions.
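A base-R sketch of this closed-form solve, using synthetic per-observation Hessians and gradient offsets (all names here are hypothetical illustrations, not package API):

```r
set.seed(2)
n <- 50; d <- 3
# synthetic per-observation Hessians (positive definite) and gradient offsets
H_i <- replicate(n, crossprod(matrix(rnorm(d * d), d, d)) + diag(d), simplify = FALSE)
b_i <- replicate(n, rnorm(d), simplify = FALSE)
Hbar <- Reduce(`+`, H_i) / n   # average Hessian
bbar <- Reduce(`+`, b_i) / n   # average gradient offset
theta_hat <- solve(Hbar, -bbar)
# the gradient of the quadratic form, Hbar %*% theta + bbar, is zero at theta_hat:
max(abs(Hbar %*% theta_hat + bbar))
```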
Standard errors use the Godambe information matrix (aka sandwich method) and are only computed when the weights are constant. The estimate of the negative of the sensitivity matrix is the average of the Hessian of smdtape evaluated at each observation in Y. The estimate of the variability matrix is the sample covariance (with denominator n) of the gradient of smdtape evaluated at each of the observations in Y for the estimated theta. The variance of the estimator is then estimated as S^{-1} J S^{-1} / n, where S is the sensitivity matrix, J is the variability matrix, and n is the number of observations.
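The sandwich computation above can be sketched in base R with synthetic per-observation gradients and Hessians (hypothetical data, not package output):

```r
set.seed(3)
n <- 100; d <- 2
grads <- matrix(rnorm(n * d), n, d)  # per-observation gradients at the estimate
H_i <- replicate(n, diag(d) + crossprod(matrix(rnorm(d * d), d, d)) / d,
                 simplify = FALSE)
S <- Reduce(`+`, H_i) / n            # sensitivity estimate (average Hessian)
# variability: sample covariance of the gradients, with denominator n
J <- crossprod(scale(grads, center = TRUE, scale = FALSE)) / n
vcov_theta <- solve(S) %*% J %*% solve(S) / n  # sandwich variance estimate
sqrt(diag(vcov_theta))               # standard errors
```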
Taylor approximation is available because boundary weight functions and transformations of the measure in Hyvärinen divergence can remove singularities in the model log-likelihood, however evaluation at these singularities may still involve computing intermediate values that are unbounded. If the singularity is ultimately removed, then Taylor approximation from a nearby location will give a very accurate evaluation at the removed singularity.
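Why this works can be seen with a base-R toy example: f(x) = x*log(x) has a removable singularity at 0 (the limit is 0) but evaluates to NaN there, while a Taylor polynomial from a nearby centre evaluates cleanly. The centre 5e-4 and order 10 below are arbitrary choices for illustration, not package defaults:

```r
f <- function(x) x * log(x)
f(0)  # NaN: 0 * -Inf

# Taylor polynomial of f around centre a, using f'(x) = log(x) + 1 and
# f^(k)(x) = (-1)^k * (k-2)! * x^(-(k-1)) for k >= 2
taylor_xlogx <- function(x, a, order = 10) {
  val <- f(a) + (log(a) + 1) * (x - a)
  for (k in 2:order) {
    fk <- (-1)^k * factorial(k - 2) * a^(-(k - 1))
    val <- val + fk / factorial(k) * (x - a)^k
  }
  val
}
taylor_xlogx(0, a = 5e-4)  # very close to the true limit 0
```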
Other generic score matching tools: Windham(), cppad_search()
smdtape <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = rep(1/3, 3),
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq",
  acut = 0.01,
  verbose = FALSE)$smdtape
Y <- rppi_egmodel(100)
cppad_closed(smdtape, Y$sample)
Uses conjugate gradient descent to search for a vector of parameters such that the gradient of the score matching discrepancy is within tolerance of zero. Also estimates standard errors and covariance.
cppad_search(
  smdtape,
  theta,
  Y,
  Yapproxcentres = NA * Y,
  w = rep(1, nrow(Y)),
  approxorder = 10,
  control = list(tol = 1e-15, checkgrad = TRUE)
)
smdtape: A CppAD tape of a score matching discrepancy function.
theta: The starting parameter set for the search.
Y: A matrix of multivariate observations. Each row is an observation.
Yapproxcentres: A matrix of Taylor approximation centres for rows of Y that require approximation.
w: Weights for each observation.
approxorder: The order of Taylor approximation to use.
control: Control parameters passed to optimx::Rcgmin().
The score matching discrepancy function and the gradient of the score matching function are passed to optimx::Rcgmin(). The call to optimx::Rcgmin() uses the sum of observations (as opposed to the mean) to reduce floating point inaccuracies. This has implications for the meaning of the control parameters passed to Rcgmin() (e.g. tol). The results are converted into averages, so the use of sums can be ignored when not setting control parameters or studying the behaviour of Rcgmin.
Standard errors use the Godambe information matrix (aka sandwich method) and are only computed when the weights are constant. The estimate of the sensitivity matrix is the negative of the average of the Hessian of smdtape evaluated at each observation in Y. The estimate of the variability matrix is the sample covariance (with denominator n) of the gradient of smdtape evaluated at each of the observations in Y for the estimated theta. The variance of the estimator is then estimated as S^{-1} J S^{-1} / n, where S is the sensitivity matrix, J is the variability matrix, and n is the number of observations.
Taylor approximation is available because boundary weight functions and transformations of the measure in Hyvärinen divergence can remove singularities in the model log-likelihood, however evaluation at these singularities may still involve computing intermediate values that are unbounded. If the singularity is ultimately removed, then Taylor approximation from a nearby location will give a very accurate evaluation at the removed singularity.
There appear to be floating point issues with evaluation of the gradient, leading to slow or no convergence in many situations. Tweaking the convergence tolerance tol may be appropriate. If a closed-form solution exists, please use cppad_closed() instead.
Other generic score matching tools: Windham(), cppad_closed()
smdtape <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = rep(1/3, 3),
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq",
  acut = 0.01,
  verbose = FALSE)$smdtape
Y <- rppi_egmodel(100)
cppad_search(smdtape, 0.9 * Y$theta, Y$sample)
sum((smvalues_wsum(smdtape, Y$sample, Y$theta)$grad / nrow(Y$sample))^2)
Supply C++ code to specify a custom log-likelihood, much like TMB::compile() is passed C++ code that formulates models. For score matching, the normalising constant of the log-likelihood can be omitted. Taping of the function currently only works with the gcc compiler, typically on Linux; taping works if customll_test() returns TRUE. Packages RcppEigen and RcppXPtrUtils must be installed.
customll(
  code,
  rebuild = FALSE,
  includes = character(),
  cacheDir = getOption("rcpp.cache.dir", tempdir()),
  showOutput = verbose,
  verbose = getOption("verbose")
)

customll_test()
code: C++ code for the log-likelihood function, as a character string (see Details).
rebuild: passed to RcppXPtrUtils::cppXPtr().
includes: passed to RcppXPtrUtils::cppXPtr().
cacheDir: passed to RcppXPtrUtils::cppXPtr().
showOutput: passed to RcppXPtrUtils::cppXPtr().
verbose: passed to RcppXPtrUtils::cppXPtr().
The function uses RcppXPtrUtils::cppXPtr() and Rcpp::cppFunction(). It is good practice to check the returned object using evalll(). The first compilation in a session can be very slow.
customll() returns an adloglikelood object (which is just an externalptr with attributes) for the compiled log-likelihood function. The returned object has an attribute fname. customll_test() returns TRUE if taping works in your R session. Otherwise it will return FALSE and generate warnings.
code must be C++ that uses only CppAD and Eigen, which makes it very similar to the requirements of the input to TMB::compile() (which also uses CppAD and Eigen).
The start of code should always be "a1type fname(const veca1 &x, const veca1 &theta){", where fname is your chosen name of the log-likelihood function, x represents a point in the data space and theta is a vector of parameters for the log-likelihood. This specifies that the function will have two vector arguments (of type veca1) and will return a single numeric value (a1type).
The type a1type is a double with special ability for being taped by CppAD. The veca1 type is a vector of a1type elements, with the vector wrapping supplied by the Eigen C++ package (that is, an Eigen matrix with 1 column and a dynamic number of rows).
The body of the function must use operations from Eigen and/or CppAD, prefixed by Eigen:: and CppAD:: respectively. There are no easy instructions for writing these as it is genuine C++ code, which can be very opaque to those unfamiliar with C++. I've found the quick reference pages for Eigen useful. Limited unary and binary operations are available directly from CppAD without Eigen.
For the purposes of score matching the operations should all be smooth to create a smooth log-likelihood, and the normalising constant may be omitted.
## Not run: 
customll_test()

myll <- customll("a1type dirichlet(const veca1 &u, const veca1 &beta) {
  size_t d = u.size();
  a1type y(0.); // initialize summation at 0
  for (size_t i = 0; i < d; i++) {
    y += beta[i] * log(u[i]);
  }
  return y;
}")
evalll(myll, rep(1/3, 3), rep(-0.5, 3))

tapes <- buildsmdtape("sim", "identity", "sim", myll,
  rep(1/3, 3), rep(NA, 3),
  bdryw = "minsq", acut = 0.01)
evaltape(tapes$lltape, rep(1/3, 3), rep(-0.5, 3))
## End(Not run)
Compute the natural logarithm of the improper density for the PPI model for the given matrix of measurements Y. Rows with negative values, or with a sum that differs from 1 by more than 1E-15, are assigned a value of -Inf.
dppi(Y, ..., paramvec = NULL)
Y: A matrix of measurements in the simplex. Each row is a multivariate measurement.
...: Arguments passed on to ppi_paramvec().
paramvec: The PPI parameter vector, created easily using ppi_paramvec().
The value calculated by dppi is
z_{-p}^T A_L z_{-p} + b_L^T z_{-p} + beta^T log(z),
where z is the multivariate observation (i.e. a row of Y), and z_{-p} omits the final element of z.
Other PPI model tools: ppi_param_tools, ppi_robust(), ppi(), rppi()
m <- rppi_egmodel(10)
dppi(m$sample, paramvec = m$theta)
Evaluates a custom log-likelihood function from customll() without taping. This is useful to check that the custom log-likelihood is behaving. To check a tape of the custom log-likelihood, use buildsmdtape() then evaltape().
evalll(ll, u, theta)
ll: A compiled log-likelihood function created by customll().
u: A vector of measurements for an individual.
theta: A vector of parameters.
Returns the value of the log-likelihood at u with parameters theta.
Evaluates a tape exactly or approximately for an array of provided variable values and dynamic parameter values. The function evaltape_wsum() computes the column-wise weighted sum of the result.
evaltape(tape, xmat, pmat, xcentres = NA * xmat, approxorder = 10)

evaltape_wsum(
  tape,
  xmat,
  pmat,
  w = NULL,
  xcentres = NA * xmat,
  approxorder = 10
)
tape: An ADFun object (i.e. a tape).
xmat: A matrix of (multivariate) independent variables where each row represents a single independent variable vector. Or a single independent variable vector that is used for all rows of pmat.
pmat: A matrix of dynamic parameters where each row specifies a new set of values for the dynamic parameters of tape.
xcentres: A matrix of Taylor approximation centres for rows of xmat that require approximation.
approxorder: Order of Taylor approximation.
w: Weights to apply to each row of xmat.
Approximation is via Taylor approximation of the independent variable around the approximation centre provided in xcentres.
Returns a matrix, each row corresponding to the evaluation of the same row in xmat, pmat and xcentres.
Other tape evaluators: quadratictape_parts(), smvalues(), testquadratic()
u <- rep(1/3, 3)
tapes <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = u,
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq",
  acut = 0.01,
  verbose = FALSE)
evaltape(tapes$lltape, u, rppi_egmodel(1)$theta)
evaltape(tapes$smdtape, rppi_egmodel(1)$theta, u)
evaltape(tapes$lltape,
  rbind(c(0, 0, 1), c(0, 0, 1)),
  rppi_egmodel(1)$theta,
  xcentres = rbind(c(0.0005, 0.0005, 0.999), NA))
Estimates parameters for the Fisher-Bingham distribution using score-matching.
FB(Y, km = NULL, A = NULL)
Y |
A matrix of multivariate observations in Cartesian coordinates. Each row is a multivariate measurement (i.e. each row corresponds to an individual). |
km |
Optional. A vector of the same length as the dimension, representing the parameter vector for the von Mises-Fisher component (i.e. the |
A |
Optional. The Bingham matrix. If supplied the non-NA elements of the Bingham matrix are fixed.
The final element of the diagonal of |
The density of the Fisher-Bingham distribution is proportional to
exp(z^T A z + κ μ^T z),
where A is a matrix as in the Bingham distribution, and κ and μ are the concentration and mean direction, respectively, as in the von Mises-Fisher distribution.
Score matching estimates of all elements of A and κμ converge slowly with sample size.
Even with a million simulated measurements,
the gradient of the score matching discrepancy at the true parameters can have size (Euclidean norm) more than 0.001, which is substantially non-zero.
Other directional model estimators:
Bingham()
,
vMF_robust()
,
vMF()
p <- 3
A <- rsymmetricmatrix(p, -10, 10)
A[p, p] <- -sum(diag(A)[1:(p-1)]) # to satisfy the trace = 0 constraint
m <- runif(p, -10, 10)
m <- m / sqrt(sum(m^2))
if (requireNamespace("simdd")) {
  Y <- simdd::rFisherBingham(1000, 2 * m, A)
  FB(Y)
}
The microbiome
data contains paired DNA samples from before treatment and 21 months after treatment for helminth infections (Martin et al. 2019).
This data was analysed by Martin et al. (2019) and a further subset was studied by Scealy and Wood (2023).
The data are from a study into the effect of helminth infections on the course of malaria infections (ImmunoSPIN-Malaria) in the Nangapanda subdistrict, Indonesia (Wiria et al. 2010).
As part of the study, some participants were given 400mg of albendazole every three months for 1.5 years, while the remaining participants were given a placebo (Wiria et al. 2010).
microbiome
A dataframe with 300 rows (two rows per individual) and 31 columns:
An integer uniquely specifying the individual.
The collection year for the sample. 2008
for before treatment. 2010
for after treatment.
1
if female, 0
otherwise.
TRUE
if individual given 400mg of albendazole every three months for 1.5 years, FALSE
otherwise.
Age at first sample.
A Helminth measurement: The qPCR cycle threshold (CT) for Ascaris lumbricoides (large roundworm). Ascaris lumbricoides can be considered present if the value is 30 or less.
A Helminth measurement: The qPCR cycle threshold (CT) for Necator americanus (a hookworm). Necator americanus can be considered present if the value is 30 or less.
A Helminth measurement: The qPCR cycle threshold (CT) for Ancylostoma duodenale (a hookworm). Ancylostoma duodenale can be considered present if the value is 30 or less.
A Helminth measurement: The presence of Trichuris trichiura as determined by microscopy. A value of TRUE
means Trichuris trichiura was detected.
A Helminth measurement: If any of the above helminths were detected then TRUE
, otherwise FALSE
.
Count prevalence of 18 bacterial phyla and 2 unclassified columns.
The measurements in the data come from stool samples before and after treatment. Gut microbiome prevalence was measured using 16s rRNA 454 sequencing (Martin et al. 2019). Helminth infections were detected by PCR or microscopy (Martin et al. 2019).
The subset studied by Scealy and Wood (2023) contained only the measurements from before treatment, and only those individuals with a helminth infection. These measurements can be obtained by running
microbiome[(microbiome$Year == 2008) & microbiome$Helminth, ]
Two further individuals (IndividualID
of 2079
and 2280
) were deemed outliers by Scealy and Wood (2023).
The microbiome
data was created from the file S1_Table.xlsx
hosted on Nematode.net
at http://nematode.net/Data/environmental_interaction/S1_Table.xlsx
using the below code.
microbiome <- readxl::read_excel("S1_Table.xlsx", range = "A3:AE303")
# avoids the genus data, keeping only phyla
metacolnames <- readxl::read_excel("S1_Table.xlsx", range = "A2:J2", col_names = FALSE)
colnames(microbiome)[1:ncol(metacolnames)] <- metacolnames[1, ]
colnames(microbiome)[2] <- "Year"
microbiome[, 11] <- (microbiome$ct_Al <= 30) | (microbiome$ct_Na <= 30) |
  (microbiome$ct_Ad <= 30) | (microbiome$ct_St <= 30) | (microbiome$micr_Tt == 1)
colnames(microbiome)[11] <- "Helminth"
microbiome <- microbiome |>
  dplyr::mutate(across(c(1, 2, 3, 12:31), as.integer)) |>
  dplyr::mutate(micr_Tt = as.logical(micr_Tt), Treatment = as.logical(Treatment)) |>
  dplyr::rename(IndividualID = `Individual ID`)
microbiome <- as.data.frame(microbiome)
http://nematode.net/Data/environmental_interaction/S1_Table.xlsx from http://nematode.net.
S1_Table.xlsx
was created by Dr. Bruce A Rosa for Martin et al. (2019). Permission to share this data was obtained from Dr. Bruce Rosa and Dr. Ivonne Martin.
Martin I, Uh H, Supali T, Mitreva M, Houwing-Duistermaat JJ (2019).
“The mixed model for the analysis of a repeated-measurement multivariate count data.”
Statistics in Medicine, 38(12), 2248–2268.
doi:10.1002/sim.8101.
Scealy JL, Wood ATA (2023).
“Score matching for compositional distributions.”
Journal of the American Statistical Association, 118(543), 1811–1823.
doi:10.1080/01621459.2021.2016422.
Wiria AE, Prasetyani MA, Hamid F, Wammes LJ, Lell B, Ariawan I, Uh HW, Wibowo H, Djuardi Y, Wahyuni S, Sutanto I, May L, Luty AJ, Verweij JJ, Sartono E, Yazdanbakhsh M, Supali T (2010).
“Does treatment of intestinal helminth infections influence malaria?”
BMC Infectious Diseases, 10, 77.
doi:10.1186/1471-2334-10-77.
Build new tapes (i.e. ADFun objects) from existing tapes, including differentiation, swapping independent variables and dynamic parameters, and Jacobian determinants.
tapeJacobian(tape) tapeHessian(tape) tapeGradOffset(tape) tapeLogJacDet(tape) tapeSwap(tape)
tape |
An |
The information in the fields xtape and dyntape of tape is used to perform the taping.
The returned vector is ordered with the range elements iterating fastest, then the domain elements. See https://cppad.readthedocs.io/latest/Jacobian.html.
Suppose the function represented by tape maps from d-dimensional space to 1-dimensional space; then the first d elements of the vector form the gradient of the partial derivative with respect to the first dimension of the function's domain, the next d elements form the gradient of the partial derivative with respect to the second dimension of the function's domain, and so on.
The Hessian as a matrix can be obtained by using matrix() with ncol = d.
A quadratic function can be written as
f(x; t) = (1/2) x^T W(t) x + b(t)^T x + c,
where the vector x is the independent variable of tape and the vector t is the dynamic parameter vector of tape.
The gradient of f(x; t) with respect to x is
W(t) x + b(t).
The Hessian is
H f(x; t) = W(t),
which does not depend on x,
so the gradient of the function can be rewritten as
H f(x; t) x + b(t).
The tape calculates b(t) as the gradient minus H f(x; t) x,
which does not depend on x.
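As an illustrative base R check (W, b and x below are arbitrary stand-ins, not values from any package tape), the gradient offset identity can be verified numerically:

```r
# Quadratic f(x) = 0.5 * t(x) %*% W %*% x + sum(b * x) + c for fixed t.
# W and b are arbitrary illustrative values, not from a package tape.
W <- matrix(c(2, 1, 1, 3), nrow = 2)  # symmetric Hessian W(t)
b <- c(-1, 4)                         # gradient offset b(t)
x <- c(0.5, -2)                       # independent variable
grad <- as.vector(W %*% x + b)        # gradient of f at x
H <- W                                # Hessian, independent of x
offset <- grad - as.vector(H %*% x)   # recovers b(t) exactly
stopifnot(all.equal(offset, b))
```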
An ADFun
object.
tapeJacobian
: Tape the Jacobian of a tape. The resulting tape returns the Jacobian as a vector.
tapeHessian
: Tape the Hessian of a tape. The resulting tape returns the Jacobian as a vector (see https://cppad.readthedocs.io/latest/Hessian.html).
tapeGradOffset
: A quadratic function can be written as f(x; t) = (1/2) x^T W(t) x + b(t)^T x + c. The function tapeGradOffset creates a tape of b(t), where t is the independent variable.
tapeLogJacDet
: Creates a tape of the log of the Jacobian determinant of a function taped in tape
. The dimensions of the domain (length of independent variable) and range (length of output variable) of tape
must be equal for computation of the determinant.
tapeSwap
: Convert an ADFun so that the independent variables become dynamic parameters
and the dynamic parameters become independent variables.
Other tape builders:
buildsmdtape()
tapes <- buildsmdtape("sph", "identity", "sph", "vMF",
  ytape = rep(1, 3)/sqrt(3),
  usertheta = rep(NA, 3)
)
tapeJacobian(tapes$smdtape)
tapeHessian(tapes$smdtape)
tapeLogJacDet(tapeJacobian(tapes$smdtape))
tapeSwap(tapes$smdtape)
Estimates the parameters of the Polynomially-Tilted Pairwise Interaction (PPI) model (Scealy and Wood 2023) for compositional data.
By default ppi() uses cppad_closed() to find the estimate.
For many situations a hard-coded implementation of the score matching estimator is also available.
For a given parameter vector evalparam
, ppi_smvalues()
computes the score matching discrepancy, the gradient and the Hessian of the score matching discrepancy (see smvalues()
) and the gradient offset of the score matching discrepancy (see quadratictape_parts()
and tapeGradOffset()
).
ppi(
  Y,
  paramvec = NULL,
  trans,
  method = "closed",
  w = rep(1, nrow(Y)),
  constrainbeta = FALSE,
  bdryw = "ones",
  acut = NULL,
  bdrythreshold = 1e-10,
  shiftsize = bdrythreshold,
  approxorder = 10,
  control = list(tol = 1e-15, checkgrad = TRUE),
  paramvec_start = NULL
)

ppi_smvalues(
  Y,
  paramvec = NULL,
  evalparam,
  trans,
  method = "closed",
  w = rep(1, nrow(Y)),
  bdryw = "ones",
  acut = NULL,
  bdrythreshold = 1e-10,
  shiftsize = bdrythreshold,
  approxorder = 10,
  average = TRUE
)
Y |
A matrix of measurements. Each row is a compositional measurement (i.e. each row sums to 1 and has non-negative elements). |
paramvec |
Optionally a vector of the PPI model's parameters. |
trans |
The name of the transformation of the manifold in Hyvärinen divergence (See |
method |
|
w |
Weights for each observation, if different observations have different importance. Used by |
constrainbeta |
If |
bdryw |
The boundary weight function for down weighting measurements as they approach the manifold boundary. Either "ones", "minsq" or "prodsq". See details. |
acut |
The threshold |
bdrythreshold |
|
shiftsize |
|
approxorder |
|
control |
|
paramvec_start |
|
evalparam |
The parameter set to evaluate the score matching values.
This is different to |
average |
If TRUE return the (weighted average) of the measurements, otherwise return the values for each measurement. |
Estimation may be performed via transformation of the measure in Hyvärinen divergence from Euclidean space to the simplex (inverse of the additive log ratio transform), from a hyperplane to the simplex (inverse of the centred log ratio transform), from the positive quadrant of the sphere to the simplex (inverse of the square root transform), or without any transformation. In the latter two situations there is a boundary and weighted Hyvärinen divergence (Equation 7, Scealy and Wood 2023) is used. Properties of the estimator using the square root transform were studied by Scealy and Wood (2023). Properties of the estimator using the additive log ratio transform were studied by Scealy et al. (2024).
There are three boundary weight functions available:
The function "ones" applies no weights and should be used whenever the manifold does not have a boundary.
The function "minsq" is the minima-based boundary weight function for the PPI model (Equation 12, Scealy and Wood 2023)
h(z)^2 = min(z_1^2, z_2^2, ..., z_p^2, a_c^2),
where z is a point in the positive orthant of the p-dimensional unit sphere, z_j is the jth component of z, and a_c is the threshold acut.
The function "prodsq" is the product-based boundary weight function (Equation 9, Scealy and Wood 2023)
h(z)^2 = min(prod_j z_j^2, a_c^2),
where z is a point in the positive orthant of the p-dimensional unit sphere and z_j is the jth component of z.
Scealy and Wood (Theorem 1, Scealy and Wood 2023) prove that minimising the weighted Hyvärinen divergence is equivalent to minimising the score matching discrepancy (see scorematchingtheory) when the boundary weight function is smooth, or for the functions "minsq" and "prodsq" above when the manifold is the simplex or positive orthant of a sphere.
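As a base R sketch (the helper names below are hypothetical, not the package's internal implementations), the two boundary weight functions can be written:

```r
# Hypothetical sketches of the boundary weight functions h(z)^2;
# z is a point on the positive orthant of the unit sphere, acut the threshold.
minsq_bdryw  <- function(z, acut) min(c(z^2, acut^2))    # minima-based
prodsq_bdryw <- function(z, acut) min(prod(z^2), acut^2) # product-based
z <- c(0.1, 0.3, sqrt(1 - 0.1^2 - 0.3^2))  # a point on the unit sphere
minsq_bdryw(z, acut = 0.01)   # capped at acut^2 = 1e-4
prodsq_bdryw(z, acut = 0.01)  # also capped at 1e-4 here
```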
Hard-coded estimators are available for the following situations:
Square root transformation ("sqrt") with the "minsq" boundary weight function:
full parameter vector (paramvec not provided)
paramvec fixes only the final element of β
paramvec fixes all elements of β
paramvec fixes β and provides fixed values of b_L
paramvec fixes β and b_L, leaving A_L to be fitted.
Square root transformation ("sqrt") with the "prodsq" boundary weight function:
paramvec fixes all elements of β
paramvec fixes β and provides fixed values of b_L
paramvec fixes β and b_L, leaving A_L to be fitted.
The additive log ratio transformation ("alr") using the final component in the denominator, with b_L = 0 and a fixed final component of β.
ppi()
returns:
A list of est
, SE
and info
.
est contains the estimates in vector form, paramvec, and as A_L, b_L and β.
SE
contains estimates of the standard errors if computed. See cppad_closed()
.
info
contains a variety of information about the model fitting procedure and results.
ppi_smvalues() returns a list of
obj: the score matching discrepancy value
grad: the gradient of the score matching discrepancy
hess: the Hessian of the score matching discrepancy
offset: the gradient offset (see quadratictape_parts())
The PPI model density is proportional to
exp(z_L^T A_L z_L + b_L^T z_L) * prod_{i=1}^p z_i^{β_i},
where p is the dimension of a compositional measurement z, and z_L is z without the final (pth) component.
A_L is a (p−1) × (p−1) symmetric matrix that controls the covariance between components.
b_L is a (p−1)-vector that controls the location within the simplex.
The ith component β_i of β controls the concentration of density when z_i is close to zero: when β_i = 0 there is no concentration and β_i is hard to identify; when β_i < 0 the probability density of the PPI model increases unboundedly as z_i approaches zero, with the increase occurring more rapidly and sharply the closer β_i is to −1.
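The unnormalised density above can be sketched in base R; the helper name dppi_unnorm is illustrative only (the package's own density function is dppi()):

```r
# Unnormalised PPI density: exp(zL' AL zL + bL' zL) * prod(z^beta),
# where zL drops the final component of the composition z.
dppi_unnorm <- function(z, AL, bL, beta) {
  zL <- z[-length(z)]
  exp(drop(t(zL) %*% AL %*% zL) + sum(bL * zL)) * prod(z^beta)
}
z <- c(0.2, 0.3, 0.5)  # a composition (non-negative, sums to 1)
AL <- diag(2); bL <- c(2, 3); beta <- c(-0.8, -0.8, -0.5)
dppi_unnorm(z, AL, bL, beta)  # a positive number
```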
Scealy JL, Hingee KL, Kent JT, Wood ATA (2024).
“Robust score matching for compositional data.”
Statistics and Computing, 34, 93.
doi:10.1007/s11222-024-10412-w.
Scealy JL, Wood ATA (2023).
“Score matching for compositional distributions.”
Journal of the American Statistical Association, 118(543), 1811–1823.
doi:10.1080/01621459.2021.2016422.
Other PPI model tools:
dppi()
,
ppi_param_tools
,
ppi_robust()
,
rppi()
model <- rppi_egmodel(100)
estalr <- ppi(model$sample,
  paramvec = ppi_paramvec(betap = -0.5, p = ncol(model$sample)),
  trans = "alr")
estsqrt <- ppi(model$sample,
  trans = "sqrt", bdryw = "minsq", acut = 0.1)
These functions help to quickly generate a set of Windham exponents for use in ppi_robust()
or Windham()
.
Rows and columns of A_L and elements of b_L corresponding to components with strong concentrations of probability mass near zero have a non-zero constant tuning exponent, and all other elements have a tuning exponent of zero.
All elements of β have a tuning exponent of zero.
The function ppi_cW_auto() automatically detects concentrations near zero by fitting a PPI distribution with A_L = 0 and b_L = 0 (i.e. a Dirichlet distribution) with the centred log-ratio transformation of the manifold.
ppi_cW(cW, ...) ppi_cW_auto(cW, Y)
cW |
The value of the non-zero Windham tuning exponents. |
... |
Values of |
Y |
A matrix of observations |
The Windham robustifying method involves weighting observations by a function of the proposed model density (Windham 1995).
Scealy et al. (2024) found that only some of the tuning constants should be non-zero:
the tuning exponents corresponding to β should be zero to avoid infinite weights; and, to improve efficiency, any rows or columns of A_L corresponding to components without concentrations of probability mass (i.e. where outliers can't exist) should have exponents of zero.
Scealy et al. (2024) set the remaining tuning exponents to a constant.
A vector of the same length as the parameter vector of the PPI model. Elements of A_L will have a value of cW if both their row and column correspond to components with probability mass concentrated near zero. Similarly, elements of b_L will have a value of cW if their row corresponds to a component with probability mass concentrated near zero. All other elements are zero.
Scealy JL, Hingee KL, Kent JT, Wood ATA (2024).
“Robust score matching for compositional data.”
Statistics and Computing, 34, 93.
doi:10.1007/s11222-024-10412-w.
Windham MP (1995).
“Robustifying Model Fitting.”
Journal of the Royal Statistical Society. Series B (Methodological), 57(3), 599–609.
http://www.jstor.org/stable/2346159.
Y <- rppi_egmodel(100)$sample
ppi_cW_auto(0.01, Y)
ppi_cW(0.01, TRUE, TRUE, FALSE)
Computes a marginal moment matching estimator (Section 6.2, Scealy and Wood 2023), which assumes β is a known vector with the same value in each element, and b_L = 0.
Only A_L is estimated.
ppi_mmmm(Y, ni, beta0, w = rep(1, nrow(Y)))
Y |
Count data, each row is a multivariate observation. |
ni |
The total for each sample (sum across rows) |
beta0 |
|
w |
Weights for each observation. Useful for weighted estimation in |
β is fixed and not estimated.
b_L is fixed at zero.
See Section 6.2 and Appendix A.8 of Scealy and Wood (2023).
The boundary weight function in the score matching discrepancy is the unthresholded product weight function
h(z)^2 = prod_j z_j^2.
A vector of estimates for the entries of A_L (diagonal and off-diagonal).
Scealy JL, Wood ATA (2023). “Score matching for compositional distributions.” Journal of the American Statistical Association, 118(543), 1811–1823. doi:10.1080/01621459.2021.2016422.
The default parameterisation of the PPI model is a symmetric covariance-like matrix A_L, a location-like vector b_L and a set of Dirichlet exponents β. For p components, A_L has p−1 rows and columns, b_L is a vector with p−1 elements and β is a vector with p elements.
For score matching estimation this form of the parameters must be converted into a single parameter vector using ppi_paramvec().
ppi_paramvec()
also includes easy methods to set parameters to NA
for estimation with ppi()
(in ppi()
the NA-valued elements are estimated and all other elements are fixed).
The reverse of ppi_paramvec()
is ppi_parammats()
.
An alternative parametrisation of the PPI model uses a single p by p matrix A* (Astar) instead of A_L and b_L; for identifiability, A* is such that 1^T A* 1 = 0, where 1 is the vector of ones (Scealy and Wood 2023).
Convert between parametrisations using
ppi_toAstar() and ppi_fromAstar().
ppi_paramvec(
  p = NULL,
  AL = NULL,
  bL = NULL,
  Astar = NULL,
  beta = NULL,
  betaL = NULL,
  betap = NULL
)

ppi_parammats(paramvec)

ppi_toAstar(AL, bL)

ppi_fromAstar(Astar)
p |
The number of components. If |
AL |
Either |
bL |
Either |
Astar |
The |
beta |
Either |
betaL |
Either |
betap |
Either |
paramvec |
A PPI parameter vector, typically created by |
ppi_paramvec() returns a vector starting with the diagonal elements of A_L, then the off-diagonal elements extracted by upper.tri() (which extracts elements of A_L along each row, left to right, then top to bottom), then b_L, then β.
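Under this ordering, the packing can be sketched in base R (illustrative only; use ppi_paramvec() in practice):

```r
# Pack (AL, bL, beta) into one vector in the documented order:
# diagonal of AL, then upper-triangle of AL, then bL, then beta.
AL <- matrix(c(1, 0.2, 0.2, 2), nrow = 2)  # symmetric (p-1) x (p-1)
bL <- c(2, 3)
beta <- c(-0.8, -0.8, -0.5)
paramvec <- c(diag(AL), AL[upper.tri(AL)], bL, beta)
length(paramvec)  # 8 for p = 3 components
```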
The Astar parametrisation rewrites the PPI density as proportional to
exp(z^T A* z) * prod_{i=1}^p z_i^{β_i},
where A* (Astar) is a p by p matrix.
Because z lies in the simplex (in particular 1^T z = 1), the density is the same regardless of the value of 1^T A* 1 = sum(Astar), where 1 is the vector of ones. Thus A_L and b_L specify A* only up to an additive factor. In the conversion ppi_toAstar(), A* is returned such that sum(Astar) = 0.
NULL
values or NA
elements are not allowed for ppi_toAstar()
and ppi_fromAstar()
.
ppi_paramvec(): a vector of length p(p+1)/2 + p − 1.
ppi_parammats(): A named list of A_L, b_L, and β.
ppi_toAstar(): The matrix A*.
ppi_fromAstar(): A list of the matrix A_L, the vector b_L and a discarded constant.
The PPI model density is proportional to
exp(z_L^T A_L z_L + b_L^T z_L) * prod_{i=1}^p z_i^{β_i},
where p is the dimension of a compositional measurement z, and z_L is z without the final (pth) component.
A_L is a (p−1) × (p−1) symmetric matrix that controls the covariance between components.
b_L is a (p−1)-vector that controls the location within the simplex.
The ith component β_i of β controls the concentration of density when z_i is close to zero: when β_i = 0 there is no concentration and β_i is hard to identify; when β_i < 0 the probability density of the PPI model increases unboundedly as z_i approaches zero, with the increase occurring more rapidly and sharply the closer β_i is to −1.
Other PPI model tools:
dppi()
,
ppi_robust()
,
ppi()
,
rppi()
ppi_paramvec(AL = "diag", bL = 0, betap = -0.5, p = 3)
vec <- ppi_paramvec(AL = rsymmetricmatrix(2), beta = c(-0.8, -0.7, 0))
ppi_parammats(vec)
Astar <- rWishart(1, 6, diag(3))[,,1]
ppi_fromAstar(Astar)
ppi_toAstar(ppi_fromAstar(Astar)$AL, ppi_fromAstar(Astar)$bL)
ppi_robust()
uses Windham()
and ppi()
to estimate a PPI distribution robustly.
There are many arguments to the ppi()
function and we highly recommend testing your arguments on ppi()
first before running ppi_robust()
.
ppi_robust_alrgengamma() performs the Windham robustification algorithm exactly as described in Scealy et al. (2024) for score matching via the additive log-ratio transform of the PPI model with the final component of β fixed. This function calls the more general
Windham() and ppi().
ppi_robust(Y, cW, ...) ppi_robust_alrgengamma( Y, cW, ..., fpcontrol = list(Method = "Simple", ConvergenceMetricThreshold = 1e-10) )
Y |
A matrix of measurements. Each row is a measurement, each component is a dimension of the measurement. |
cW |
A vector of robustness tuning constants. Easy to build using |
... |
|
fpcontrol |
A named list of control arguments to pass to |
ppi_robust_alrgengamma(): must fit a PPI model via the additive log-ratio transform of the simplex with the final element of β fixed.
The default convergence metric and threshold differ from the defaults of ppi_robust() to match the implementation in Scealy et al. (2024): convergence is measured by the change in the first element of the parameter vector, and convergence is reached when the change is smaller than 1E-6. Override this behaviour by specifying the elements ConvergenceMetric and ConvergenceMetricThreshold in a list passed as fpcontrol.
Windham()
is called with alternative_populationinverse = TRUE
.
A list:
est
The estimated parameters in vector form (paramvec
) and as AL
, bL
and beta
.
SE
"Not calculated." Returned for consistency with other estimators.
info
Information returned in the optim
slot of Windham()
. Includes the final weights in finalweights
.
Scealy JL, Hingee KL, Kent JT, Wood ATA (2024). “Robust score matching for compositional data.” Statistics and Computing, 34, 93. doi:10.1007/s11222-024-10412-w.
Other PPI model tools:
dppi()
,
ppi_param_tools
,
ppi()
,
rppi()
Other Windham robustness functions:
Windham()
,
vMF_robust()
set.seed(7)
model <- rppi_egmodel(100)
estsqrt <- ppi_robust(model$sample,
  cW = ppi_cW_auto(0.01, model$sample),
  paramvec_start = model$theta,
  trans = "sqrt", bdryw = "minsq", acut = 0.1)

set.seed(14)
model <- rppi_egmodel(100)
ppi_robust_alrgengamma(model$sample,
  cW = ppi_cW_auto(0.01, model$sample),
  paramvec = ppi_paramvec(betap = -0.5, p = ncol(model$sample)))
When the score matching discrepancy function is quadratic, its gradient can be written using the Hessian and an offset term. This can be useful for finding the parameter set at which the gradient is zero.
The Hessian and offset term are computed using CppAD
tapes.
Taylor approximation can be used for locations at removed singularities (i.e. where intermediate values are unbounded).
quadratictape_parts()
will error if testquadratic(tape)
returns FALSE
.
quadratictape_parts(tape, tmat, tcentres = NA * tmat, approxorder = 10)
tape |
A tape of a quadratic function where the independent and dynamic parameters correspond to the |
tmat |
A matrix of vectors corresponding to values of |
tcentres |
A matrix of Taylor approximation centres for rows of |
approxorder |
The order of the Taylor approximation to use. |
A quadratic function can be written
f(x; t) = (1/2) x^T W(t) x + b(t)^T x + c,
where t is considered a vector that is constant with respect to the differentiation.
The Hessian of the function with respect to x is
H f(x; t) = W(t).
The gradient of the function with respect to x can then be written
H f(x; t) x + b(t),
where the Hessian and offset b(t) depend only on t.
The functions here evaluate the Hessian and offset b(t) for many values of t.
Tapes of the Hessian and gradient offset are created using
tapeHessian()
and tapeGradOffset()
respectively.
These tapes are then evaluated for every row of tmat
.
When the corresponding tcentres
row is not NA
, then approximate (but very accurate) results are calculated using Taylor approximation around the location given by the row of tcentres
.
For score matching, x is the set of model parameters and the vector t is a (multivariate) measurement.
A list of
offset: Array of offsets b(t), each row corresponding to a row in tmat.
Hessian: Array of vectorised Hessians H f(x; t) (see tapeHessian()), each row corresponding to a row in tmat. For each row, obtain the Hessian in matrix format by using matrix(ncol = length(tape$xtape)).
Other tape evaluators:
evaltape()
,
smvalues()
,
testquadratic()
u <- rep(1/3, 3)
smdtape <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = u,
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq", acut = 0.01, verbose = FALSE
)$smdtape
quadratictape_parts(smdtape,
  tmat = rbind(u, c(1/4, 1/4, 1/2)))
Given parameters of the PPI model, generates independent samples.
rppi(n, ..., paramvec = NULL, maxden = 4, maxmemorysize = 1e+05) rppi_egmodel(n, maxden = 4)
n |
Number of samples to generate |
... |
Arguments passed on to
|
paramvec |
The PPI parameter vector, created easily using |
maxden |
This is the constant |
maxmemorysize |
Advanced use. The maximum size, in bytes, for matrices containing simulated Dirichlet samples. The default of |
We recommend running rppi()
a number of times to ensure the choice of maxden
is good. rppi()
will error when maxden
is too low.
The simulation uses a rejection-sampling algorithm with a Dirichlet proposal (Appendix A.1.3, Scealy and Wood 2023).
Initially n Dirichlet proposals are generated. After rejection there are fewer samples remaining.
The proportion of proposals accepted is used to guess the number of new Dirichlet proposals to generate until
n samples of the PPI model are reached.
Advanced use: The number of Dirichlet proposals created at a time is limited such that the matrices storing the Dirichlet proposals are always smaller than maxmemorysize bytes (give or take a few bytes for wrapping).
Larger maxmemorysize leads to faster simulation so long as maxmemorysize bytes are reliably contiguously available in RAM.
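The acceptance step can be sketched generically in base R. The tilt function and bound below are toy stand-ins, not the PPI density used by rppi():

```r
# Rejection sampling with a Dirichlet proposal (normalised Gamma draws).
# tilt() and maxden are toy stand-ins for the tilting term and its bound.
rdirichlet1 <- function(alpha) { g <- rgamma(length(alpha), alpha); g / sum(g) }
tilt <- function(z) 2 * (z[1] + z[2])  # toy log tilting function, <= 2
maxden <- 2                            # upper bound on tilt(z)
set.seed(1)
accepted <- list()
while (length(accepted) < 100) {
  z <- rdirichlet1(c(1, 1, 1))
  if (runif(1) < exp(tilt(z) - maxden)) accepted[[length(accepted) + 1]] <- z
}
samples <- do.call(rbind, accepted)    # 100 compositions, one per row
```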
A matrix with n
rows and p
columns. Each row is an independent draw from the specified PPI distribution.
rppi_egmodel
returns a list:
sample
A matrix of the simulated samples (n
rows)
p
The number of components of the model
theta
The PPI parameter vector
AL
The parameter matrix
bL
The parameter vector
beta
The parameter vector
rppi_egmodel
: Simulates the 3-component PPI model from (Section 2.3, Scealy and Wood 2023) and returns both simulations and model parameters.
Scealy JL, Wood ATA (2023). “Score matching for compositional distributions.” Journal of the American Statistical Association, 118(543), 1811–1823. doi:10.1080/01621459.2021.2016422.
Other PPI model tools:
dppi()
,
ppi_param_tools
,
ppi_robust()
,
ppi()
beta0 <- c(-0.8, -0.8, -0.5)
AL <- diag(nrow = 2)
bL <- c(2, 3)
samp <- rppi(100, beta = beta0, AL = AL, bL = bL)
rppi_egmodel(1000)
A simple function for generating a symmetric matrix for use in examples. The diagonal and upper-triangular elements of the matrix are simulated independently from a uniform distribution. The lower triangle of the output matrix is copied from the upper triangle. These matrices do not represent the full range of possible symmetric matrices.
rsymmetricmatrix(p, min = 0, max = 1)
p |
The desired dimension of the matrix |
min |
The minimum of the uniform distribution. |
max |
The maximum of the uniform distribution |
A p x p symmetric matrix.
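The construction described above can be sketched in base R (rsym_sketch is a hypothetical name, not the package function):

```r
# Simulate the diagonal and upper triangle, then mirror into the lower triangle.
rsym_sketch <- function(p, min = 0, max = 1) {
  m <- matrix(0, p, p)
  m[upper.tri(m, diag = TRUE)] <- runif(p * (p + 1) / 2, min, max)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]  # copy upper triangle to lower
  m
}
S <- rsym_sketch(5)
stopifnot(isSymmetric(S))
```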
rsymmetricmatrix(5)
This package includes score matching estimators for particular distributions and a general capacity to implement additional score matching estimators. Score matching is a popular estimation technique when normalising constants for the proposed model are difficult to calculate or compute. Score matching was first developed by Hyvärinen (2005) and was further developed for subsets of Euclidean space (Hyvärinen 2007; Yu et al. 2019; Yu et al. 2020; Liu et al. 2019), Riemannian manifolds (Mardia et al. 2016; Mardia 2018), and Riemannian manifolds with boundary (Scealy and Wood 2023). In this help entry we briefly describe score matching estimation.
In the most general form (Riemannian manifolds with boundary) score matching minimises the weighted Hyvärinen divergence (Equation 7, Scealy and Wood 2023)
φ(f, f₀) = (1/2) ∫_M f₀(z) h(z)² ‖P(z)(∇_z log f(z) − ∇_z log f₀(z))‖² dM(z),
where
M is the manifold, isometrically embedded in Euclidean space, and dM is the unnormalised uniform measure on M.
P(z) is the matrix that projects points onto the tangent space of the manifold at z, which is closely related to the Riemannian metric of M.
f₀ is the density of the data-generating process, defined with respect to dM.
f is the density of a posited model, again defined with respect to dM.
h(z) is a function, termed the boundary weight function, that is zero on the boundary of M and smooth (Section 3.2, Scealy and Wood 2023) or potentially piecewise smooth.
∇_z is the Euclidean gradient operator that differentiates with respect to z.
‖·‖ is the Euclidean norm.
Note that, because P(z) is a projection matrix, ‖P(z)(∇_z log f(z) − ∇_z log f₀(z))‖² is the natural inner product of the gradient of the log ratio of f and f₀.
When the density functions f and f_0 are smooth and positive inside M, and the boundary weight function is smooth or of particular forms for specific manifolds (Section 3.2, Scealy and Wood 2023), then minimising the weighted Hyvärinen divergence \phi(f, f_0) is equivalent to minimising the score matching discrepancy (Theorem 1, Scealy and Wood 2023)

\psi(f, f_0) = \int_M f_0(z) \big( A(z) + B(z) + C(z) \big) \, dM(z),

where

A(z) = \frac{1}{2} h(z)^2 \left( \nabla_z \log f(z) \right)^T P(z) \left( \nabla_z \log f(z) \right),
B(z) = h(z)^2 \Delta_z \log f(z),
C(z) = \left( \nabla_z h(z)^2 \right)^T P(z) \left( \nabla_z \log f(z) \right),

and \Delta_z is the Laplacian operator. We term \psi(f, f_0) the score matching discrepancy function.
We suspect that Theorem 1 of Scealy and Wood (2023) holds more generally for nearly all realistic continuous and piecewise-smooth boundary weight functions, although no proof exists to our knowledge.
When n independent observations z_1, ..., z_n from f_0 are available, the integration in \psi(f, f_0) can be approximated by an average over the observations,

\hat{\psi}(f) = \frac{1}{n} \sum_{i=1}^{n} \big( A(z_i) + B(z_i) + C(z_i) \big).

If we parameterise a family of models according to a vector of parameters \theta, then the score matching estimate is the \theta that minimises \hat{\psi}(f(\cdot\,; \theta)).
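To make the empirical discrepancy concrete, here is a package-independent toy sketch in base R for a normal model on the real line (so the manifold is the whole real line, P(z) = 1 and h(z) = 1, with no boundary): the discrepancy reduces to mean((z - mu)^2 / (2 sigma^4) - 1/sigma^2), and minimising it numerically recovers the known closed-form minimisers. All names below are illustrative, not package functions.

```r
set.seed(1)
z <- rnorm(1000, mean = 2, sd = 3)

# Empirical score matching discrepancy for N(mu, sigma^2) on the real line:
# grad log f = -(z - mu)/sigma^2 and Laplacian log f = -1/sigma^2, so
# psihat = mean( (z - mu)^2 / (2 sigma^4) - 1 / sigma^2 ).
psihat <- function(par, z) {
  mu <- par[1]
  s2 <- exp(par[2]) # log parameterisation keeps the variance positive
  mean((z - mu)^2 / (2 * s2^2) - 1 / s2)
}

fit <- optim(c(0, 0), psihat, z = z, method = "BFGS")
muhat <- fit$par[1]
s2hat <- exp(fit$par[2])

# For the Gaussian the minimisers are known exactly:
c(muhat, mean(z))               # approximately equal
c(s2hat, mean((z - mean(z))^2)) # approximately equal
```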
In general, the score matching estimate must be found via numerical optimisation techniques, such as in the function cppad_search(). However, when the family of models is a canonical exponential family, the score matching discrepancy function is often a quadratic function of \theta (Mardia 2018) and the minimum has a closed-form solution found by cppad_closed().
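To see why a closed form exists in the quadratic case: if the discrepancy can be written as psihat(theta) = 0.5 t(theta) W theta + b . theta + const for some matrix W and vector b, the minimiser solves the linear system W theta = -b. A base-R sketch with illustrative W and b (not produced by the package):

```r
# Illustrative quadratic discrepancy: psihat(theta) = 0.5 t(theta) W theta + b . theta
W <- matrix(c(4, 1, 1, 2), 2, 2) # positive definite quadratic part
b <- c(-2, 1)

theta_closed <- solve(W, -b)     # closed-form minimiser, the property cppad_closed() exploits

# Numerical optimisation reaches the same point
psihat <- function(theta) 0.5 * drop(t(theta) %*% W %*% theta) + sum(b * theta)
theta_num <- optim(c(0, 0), psihat, method = "BFGS")$par

max(abs(theta_closed - theta_num)) # negligible difference
```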
Note that when z has a few or more dimensions, the calculations of A(z), B(z) and C(z) can become cumbersome. This package uses CppAD to automatically compute A(z), B(z) and C(z), and the quadratic simplification if it exists.
Hyvärinen divergence \phi(f, f_0) is sensitive to transformations of the measure dM(z), including transformations of the manifold. That is, transforming the manifold M changes the divergence between distributions and changes the minimum of \hat{\psi}. The transformation changes the measure dM(z), the divergence and the estimator, but does not transform the data.
For example, many different transformations of the simplex (i.e. compositional data) are possible (Appendix A.3, Scealy et al. 2024). Hyvärinen divergences that use the sphere, obtained from the simplex by a square root, behave differently from Hyvärinen divergences that use Euclidean space, obtained from the simplex using logarithms (Scealy et al. 2024). The estimator for the latter does not apply logarithms to the observations; in fact, the estimator involves only polynomials of the observed compositions (Scealy et al. 2024).
The variety of estimator behaviour available through different transformations was a major motivator for this package: each transformation leads to different A(z), B(z) and C(z), and without automatic differentiation, implementing the score matching estimator for each case would require a huge programming effort.
Hyvärinen A (2005).
“Estimation of Non-Normalized Statistical Models by Score Matching.”
Journal of Machine Learning Research, 6(24), 695–709.
https://jmlr.org/papers/v6/hyvarinen05a.html.
Hyvärinen A (2007).
“Some extensions of score matching.”
Computational Statistics & Data Analysis, 51(5), 2499–2512.
doi:10.1016/j.csda.2006.09.003.
Liu S, Kanamori T, Williams DJ (2019).
“Estimating Density Models with Truncation Boundaries using Score Matching.”
doi:10.48550/arXiv.1910.03834.
Mardia K (2018).
“A New Estimation Methodology for Standard Directional Distributions.”
In 2018 21st International Conference on Information Fusion (FUSION), 724–729.
doi:10.23919/ICIF.2018.8455640.
Mardia KV, Kent JT, Laha AK (2016).
“Score matching estimators for directional distributions.”
doi:10.48550/arXiv.1604.08470.
Scealy JL, Hingee KL, Kent JT, Wood ATA (2024).
“Robust score matching for compositional data.”
Statistics and Computing, 34, 93.
doi:10.1007/s11222-024-10412-w.
Scealy JL, Wood ATA (2023).
“Score matching for compositional distributions.”
Journal of the American Statistical Association, 118(543), 1811–1823.
doi:10.1080/01621459.2021.2016422.
Yu S, Drton M, Shojaie A (2019).
“Generalized Score Matching for Non-Negative Data.”
Journal of Machine Learning Research, 20(76), 1–70.
https://jmlr.org/papers/v20/18-278.html.
Yu S, Drton M, Shojaie A (2020).
“Generalized Score Matching for General Domains.”
doi:10.48550/arXiv.2009.11428.
Computes a range of relevant information for investigating score matching estimators.
smvalues(smdtape, xmat, pmat, xcentres = NA * xmat, approxorder = 10)

smvalues_wsum(tape, xmat, pmat, w = NULL, xcentres = NA * xmat, approxorder = 10)
smdtape |
A taped score matching discrepancy, most easily created by buildsmdtape(). |
xmat |
A matrix of (multivariate) independent variables where each row represents a single independent variable vector, or a single independent variable vector that is used for all rows of pmat. |
pmat |
A matrix of dynamic parameters where each row specifies a new set of values for the dynamic parameters of smdtape. |
xcentres |
A matrix of Taylor approximation centres for rows of xmat that require approximation; NA for rows that do not require approximation. |
approxorder |
Order of Taylor approximation |
tape |
An ADFun object (a tape of the score matching discrepancy). |
w |
Weights to apply to each row of xmat. |
Computes the score matching discrepancy function from scorematchingtheory
or weighted sum of the score matching discrepancy function.
The gradient and Hessian are returned as arrays of row-vectors with each row corresponding to a row in xmat
and pmat
.
Convert a Hessian row-vector to a matrix using matrix(ncol = length(smdtape$xtape))
.
A list of
obj
the score matching discrepancy values
grad
the gradient of the score matching discrepancy
hess
the Hessian of the score matching discrepancy
Other tape evaluators:
evaltape()
,
quadratictape_parts()
,
testquadratic()
m <- rppi_egmodel(100)
smdtape <- buildsmdtape("sim", "sqrt", "sph", "ppi",
  ytape = rep(1/m$p, m$p),
  usertheta = ppi_paramvec(beta = m$beta),
  bdryw = "minsq", acut = 0.01)$smdtape
smvalues(smdtape, xmat = m$sample, pmat = m$theta[1:5])
smvalues_wsum(smdtape, m$sample, m$theta[1:5])$grad / nrow(m$sample)
Uses the CppAD
parameter property and derivatives (via tapeJacobian()
) to test whether
the tape is quadratic.
testquadratic(tape, xmat = NULL, dynparammat = NULL, verbose = FALSE)
tape |
An ADFun object (a tape of the function to test). |
xmat |
If non-NULL, a matrix of values for the independent variables; used with dynparammat to check the Jacobian of the Hessian. |
dynparammat |
If non-NULL, a matrix of dynamic parameter values; used with xmat to check the Jacobian of the Hessian. |
verbose |
If TRUE, information about the failed tests is printed. |
Uses the xtape
and dyntape
values stored in tape
to create new tapes.
A tape of the Hessian is obtained by applying tapeJacobian() twice, and then a CppAD property is used to test whether the Hessian is constant. A function of quadratic form should have a constant Hessian.
If xmat
and dynparammat
are non-NULL
then testquadratic()
also checks the Jacobian of the Hessian at xmat
and dynparammat
values. For quadratic form functions the Jacobian of the Hessian should be zero.
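The same idea can be checked outside the package with finite differences: a quadratic-form function has an identical Hessian at every point (equivalently, the Jacobian of its Hessian is zero). In this base-R sketch the function f and the helper hess_fd are illustrative, not package code:

```r
# Quadratic form f(x) = 0.5 x' A x + b' x has Hessian A everywhere.
A <- matrix(c(2, 1, 1, 3), 2, 2)
b <- c(-1, 4)
f <- function(x) 0.5 * drop(t(x) %*% A %*% x) + sum(b * x)

# Central finite-difference Hessian (illustrative helper)
hess_fd <- function(x, eps = 1e-3) {
  p <- length(x)
  H <- matrix(0, p, p)
  for (i in 1:p) for (j in 1:p) {
    ei <- ej <- rep(0, p)
    ei[i] <- eps
    ej[j] <- eps
    H[i, j] <- (f(x + ei + ej) - f(x + ei - ej) -
                f(x - ei + ej) + f(x - ei - ej)) / (4 * eps^2)
  }
  H
}

# The Hessian agrees at well-separated points, the property testquadratic() checks
max(abs(hess_fd(c(0, 0)) - hess_fd(c(5, -3)))) # essentially zero
```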
TRUE or FALSE.
Other tape evaluators:
evaltape()
,
quadratictape_parts()
,
smvalues()
tapes <- buildsmdtape("sim", "sqrt", "sph",
  ll = "ppi",
  ytape = c(0.2, 0.3, 0.5),
  usertheta = ppi_paramvec(p = 3),
  bdryw = "minsq", acut = 0.1, verbose = FALSE)
testquadratic(tapes$smdtape)
In general the normalising constant in von Mises Fisher distributions is hard to compute, so Mardia et al. (2016) suggested a hybrid method that uses maximum likelihood to estimate the mean direction and score matching for the concentration.
We can also estimate all parameters using score matching (smfull
method), although this estimator is likely to be less efficient than the hybrid estimator.
On the circle the hybrid estimators were often nearly as efficient as maximum likelihood estimators (Mardia et al. 2016).
For maximum likelihood estimators of the von Mises Fisher distribution, which all use approximations of the normalising constant, consider movMF::movMF()
.
vMF(Y, paramvec = NULL, method = "Mardia", w = rep(1, nrow(Y)))
Y |
A matrix of multivariate observations in Cartesian coordinates. Each row is a multivariate measurement (i.e. each row corresponds to an individual). |
paramvec |
|
method |
Either "Mardia" or "hybrid" for the hybrid score matching estimator from Mardia et al. (2016) or "smfull" for the full score matching estimator. |
w |
An optional vector of weights for each measurement in Y. |
The full score matching estimator (method = "smfull") estimates \kappa and \mu simultaneously. The hybrid estimator (method = "Mardia") estimates \kappa and \mu separately. Both use cppad_closed() for score matching estimation.
A list of est
, SE
and info
.
est
contains the estimates in vector form, paramvec, and with user-friendly names k and m.
SE
contains estimates of the standard errors if computed. See cppad_closed()
.
info
contains a variety of information about the model fitting procedure and results.
The von Mises Fisher density is proportional to

\exp(\kappa \mu^T z),

where z is on a unit sphere, \kappa is termed the concentration, and \mu is the mean direction unit vector. The effects of \kappa and \mu can be decoupled in a sense (p169, Mardia and Jupp 2000), allowing \kappa and \mu to be estimated separately.
Mardia KV, Jupp PE (2000).
Directional Statistics, Probability and Statistics.
Wiley, Great Britain.
ISBN 0-471-95333-4.
Mardia KV, Kent JT, Laha AK (2016).
“Score matching estimators for directional distributions.”
doi:10.48550/arXiv.1604.08470.
Other directional model estimators:
Bingham()
,
FB()
,
vMF_robust()
if (requireNamespace("movMF")) {
  Y <- movMF::rmovMF(1000, 100 * c(1, 1) / sqrt(2))
  movMF::movMF(Y, 1) # maximum likelihood estimate
} else {
  Y <- matrix(rnorm(1000 * 2, sd = 0.01), ncol = 2)
  Y <- Y / sqrt(rowSums(Y^2))
}
vMF(Y, method = "smfull")
vMF(Y, method = "Mardia")
vMF(Y, method = "hybrid")
Robust estimation for von Mises Fisher distribution using Windham()
.
vMF_robust(Y, cW, ...)
Y |
A matrix of observations in Cartesian coordinates. |
cW |
Tuning constants for each parameter in the vMF parameter vector. If a single number then the constant is the same for each element of the parameter vector. |
... |
Passed to vMF() and Windham(). |
Other directional model estimators:
Bingham()
,
FB()
,
vMF()
Other Windham robustness functions:
Windham()
,
ppi_robust()
if (requireNamespace("movMF")) {
  Y <- movMF::rmovMF(1000, 100 * c(1, 1) / sqrt(2))
} else {
  Y <- matrix(rnorm(1000 * 2, sd = 0.01), ncol = 2)
  Y <- Y / sqrt(rowSums(Y^2))
}
vMF_robust(Y, cW = c(0.01, 0.01), method = "smfull")
vMF_robust(Y, cW = c(0.01, 0.01), method = "Mardia")
Performs a generalisation of Windham's robustifying method (Windham 1995) for exponential models with natural parameters that are a linear function of the parameters for estimation. Estimators must solve estimating equations of the form

\sum_{i=1}^{n} w(z_i; \theta) U(z_i; \theta) = 0,

where U is the estimating function of the unweighted estimator and the weights w(z; \theta) are computed from the proposed model density as described in Details.
The estimate is found iteratively through a fixed point method as suggested by Windham (1995).
Windham(
  Y,
  estimator,
  ldenfun,
  cW,
  ...,
  fpcontrol = list(Method = "Simple", ConvergenceMetricThreshold = 1e-10),
  paramvec_start = NULL,
  alternative_populationinverse = FALSE
)
Y |
A matrix of measurements. Each row is a measurement, each component is a dimension of the measurement. |
estimator |
A function that estimates parameters from weighted observations.
It must have arguments Y, a matrix of measurements, and w, a vector of weights for each row of Y. |
ldenfun |
A function that returns a vector of values proportional to the log-density for a matrix of observations Y and a parameter vector theta. |
cW |
A vector of robustness tuning constants. When computing the weight for an observation the parameter vector is multiplied element-wise with |
... |
Arguments passed to estimator. |
fpcontrol |
A named list of control arguments to pass to the fixed-point solver FixedPoint::FixedPoint(). |
paramvec_start |
Initially used to check the function estimator. |
alternative_populationinverse |
The default is to use Windham_populationinverse(); if TRUE, the alternative transform Windham_populationinverse_alternative() is used instead. |
For any family of models with density f(z; \theta), Windham's method finds the parameter set \theta such that the estimator applied to observations weighted by f(z; \theta)^c returns an estimate that matches the theoretical effect of weighting the full population of the model.
When f is proportional to \exp(\eta(\theta)^T T(z)), where T(z) is a vector of sufficient statistics, and the natural parameter \eta(\theta) is linear in \theta, these weights are equivalent to \exp(\eta(c\theta)^T T(z)), and the theoretical effect of the weighting on the full population is to scale the parameter vector \theta by 1 + c.
The function Windham() assumes that f is proportional to \exp(\eta(\theta)^T T(z)) and that \eta is linear. It allows a generalisation where c is a vector, so the weight for an observation z is

\exp(\eta(c \odot \theta)^T T(z)),

where \theta is the parameter vector, c is a vector of tuning constants, and \odot is the element-wise product (Hadamard product).
The solution is found iteratively (Windham 1995).
Given a parameter set \theta_n, Windham() first computes the weight for each observation z.
Then, a new parameter set \tilde{\theta} is estimated by estimator with the computed weights.
This new parameter set is element-wise-multiplied by the (element-wise) reciprocal of 1 + c to obtain an adjusted parameter set \theta_{n+1}.
The estimate returned by Windham() is the parameter set \theta_* such that \theta_n = \theta_{n+1} = \theta_*.
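A package-independent sketch of this iteration in base R, using the exponential density f(z; theta) = theta * exp(-theta * z) (a canonical exponential family with linear natural parameter, so the weighted population has parameter (1 + c) * theta) and the weighted maximum likelihood estimator; the model, estimator and constants are illustrative only:

```r
set.seed(2)
z <- rexp(2000, rate = 1.5) # observations from rate 1.5
cW <- 0.1                   # scalar tuning constant

theta <- 1 / mean(z)        # start at the unweighted MLE
for (iter in 1:100) {
  w <- exp(-cW * theta * z) # weights proportional to f(z; cW * theta)
  w <- w / sum(w)
  theta_tilde <- 1 / sum(w * z)       # weighted MLE of the rate
  theta_new <- theta_tilde / (1 + cW) # undo the population effect of weighting
  converged <- abs(theta_new - theta) < 1e-10
  theta <- theta_new
  if (converged) break
}
theta # close to the data-generating rate 1.5
```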
A list:
paramvec
the estimated parameter vector
optim
information about the fixed point iterations and optimisation process. Including a slot finalweights
for the weights in the final iteration.
Other generic score matching tools:
cppad_closed()
,
cppad_search()
Other Windham robustness functions:
ppi_robust()
,
vMF_robust()
if (requireNamespace("movMF")) {
  Y <- movMF::rmovMF(1000, 100 * c(1, 1) / sqrt(2))
} else {
  Y <- matrix(rnorm(1000 * 2, sd = 0.01), ncol = 2)
  Y <- Y / sqrt(rowSums(Y^2))
}
Windham(Y = Y,
  estimator = vMF,
  ldenfun = function(Y, theta) { # here theta is km
    drop(Y %*% theta)
  },
  cW = c(0.01, 0.01),
  method = "Mardia")
Returns the matrix which reverses the effect of weights on a population for certain models.
Windham_populationinverse(cW)

Windham_populationinverse_alternative(newtheta, previoustheta, cW, cWav)
cW |
A vector of tuning constants for the Windham robustification method performed by Windham(). |
newtheta |
The parameter vector most recently estimated |
previoustheta |
The parameter vector estimated in the previous step |
cWav |
The value of the non-zero elements of cW. |
In the Windham robustification method (Windham()) the effect of weighting a population plays a central role.
When the model density is proportional to \exp(\eta(\theta)^T T(z)), where T(z) is a vector of sufficient statistics for a measurement z and \eta is a linear function, then weights proportional to

\exp(\eta(c \odot \theta)^T T(z)),

where c is a vector of tuning constants and \odot is the Hadamard (element-wise) product, have a very simple effect on the population parameter vector \theta: the weighted population follows a density of the same form, but with a parameter vector of (1 + c) \odot \theta.
The inverse of this change to the parameter vector is then multiplication by a diagonal matrix with elements 1/(1 + c_i), with c_i denoting the elements of c.
A diagonal matrix with the same number of columns as cW
.
Windham_populationinverse
: The matrix with diagonal elements 1/(1 + cW_i), where cW_i are the elements of cW.
Windham_populationinverse_alternative
: The transform implemented as described by Scealy et al. (2024). It is mathematically equivalent to multiplication by the result of Windham_populationinverse()
in the situation in Scealy et al. (2024).