Bayesian Mass Balance — BMB • BayesMassBal

Allows the user to specify the covariance structure for a Bayesian mass balance, simulates draws from reconciled masses and relevant covariance matrix, and approximates the log-marginal likelihood.

Usage

BMB(
  X,
  y,
  cov.structure = c("indep", "component", "location"),
  priors = "default",
  BTE = c(500, 20000, 1),
  lml = FALSE,
  ybal = TRUE,
  diagnostics = TRUE,
  verb = 1
)

Arguments

X: A matrix that maps constrained masses to observed masses. Can be built from the function constrainProcess, see documentation for details.
y: A list of matrices of observed mass flow rates. Each matrix is a separate sample component. The rows of each matrix index the sampling location, and the columns index the sample set number. Can be specified using the importObservations function.
cov.structure: Character string. "indep" allows for no correlation. "component" indicates correlation within an individual sample component. "location" indicates correlation within an individual sampling location. Not specifying cov.structure defaults to the "indep" structure.
priors: List or character string. When the default value priors = "default" is used, the BMB uses a set of default conjugate priors. When passing a list to the argument, the list must contain user specified hyperparameter values for each conjugate prior. To see the required list structure run BMB with BTE = c(1,2,1) and inspect the output. When priors = "Jeffreys" the Jeffreys priors for $\Sigma$ and $\sigma^2$ with known mean given in (Yang and Berger 1996) . When priors = "Jeffreys", the prior used for $\beta$ is proportional to the indicator function $I\lbrack \beta > 0 \rbrack$. See Details for more information.
BTE: Numeric vector giving c(Burn-in, Total-iterations, and Every) for MCMC approximation of target distributions. The function BMB produces a total number of samples of $(T - B)/E$. $E$ specifies that only one of every $E$ draws are saved. $E > 1$ reduces autocorrelation between obtained samples at the expense of computation time.
lml: Logical indicating if the log-marginal likelihood should be approximated. Default is FALSE, which reduces computation time. Log-marginal likelihood is approximated using methods in (Chib 1995) .
ybal: Logical indicating if the mass balanced samples for each $y$ should be returned. Default is TRUE. Setting ybal=FALSE results in a savings in RAM and computation time.
diagnostics: Logical or list indicating if diagnostic functions geweke.diag and effectiveSize (Plummer et al. 2006) should computed for the obtained samples. The default of TRUE indicates diagnostics should be run with their default parameters. Alternatively, passing a list of the structure list(frac1 = 0.1, frac2 = 0.5) will run both diagnostics and allow geweke.diag to be run with parameters other than the default.
verb: Numeric indicating verbosity of progress printed to R-console. The default of 1 prints messages and a progress bar to the console during all iterative methods. verb = 0 indicates no messages are printed.

Value

Returns a list of outputs

beta

List of matrices of samples from the distribution of reconciled data. Each matrix in the list is a separate sample component. Each column of a matrix in beta is a draw from the target distribution.

Sig

List of matrices containing draws from the distribution of each covariance matrix. If S.t is the $t^{th}$ draw from the distribution of covariance matrix S and:

cov.structure = "component" or "location", the $t^{th}$ column of a matrix in Sig is equal to S.t[upper.tri(S.t, diag = TRUE)].

priors

List of prior hyperparameters used in generating conditional posterior distributions and approximating log-marginal likelihood. The structure of the input argument priors is required to be the same as the structure of this returned list slice. See Details.

cov.structure

Character string containing the covariance structure used.

y.cov

List of character matrices indicating details for the structure of each covariance matrix. Only returned when cov.structure = "location"

lml

Numeric of the log-marginal likelihood approximation. Returns NA when lml = FALSE

diagnostics

List containing results from diagnostic functions geweke.diag and effectiveSize

ybal

List of samples from the distribution of reconciled mass flow rates, in the same format as the function argument y. Produced with argument ybal = TRUE. Equivalent to lapply(BMB(...)$beta,function(X,x){x %*% X} , x = X) . Viewing this output is more intuitive than viewing samples of beta, at the expense of RAM and some computation time.

X

The function argument X is passed to the output so that it can be used with other BayesMassBal functions.

type

Character string used by plot.BayesMassBal. type = "BMB" for an object returned from the BMB function.

Details

See vignette("Two_Node_Process", package = "BayesMassBal") for further details on how to use function outputs.

When the priors argument is left unspecified, a set of default conjugate priors are used, which are chosen to allow BMB() to work well in a general setting. In the current version of the BayesMassBal package, only the conjugate priors stated below can be used, but hyperparameter values can be specified by the user using the priors argument.

The prior distribution on beta is a normal distribution truncated at 0. The mean of this distribution before truncation is the ordinary least squares (OLS) estimate of $\beta$. OLS estimates less than 0, are changed to 0. The prior variance, before truncation, of each element of $\beta$ is set to:

$$10^{\mathrm{number of integer digits of an element of } \beta + 6}$$

Currently, there is only support for diagonal prior covariance matrices for $\beta$

When cov.structure = "indep" the error of all observations in a sample set are independent. An inverse gamma prior distribution, with $\alpha_0 = 0.000001$ and $\beta_0 = 0.000001$, is placed on the variance of the mass flow rate for each sample component at each sample location.

When cov.structure = "component" or "location", the prior distribution on $\Sigma_i$ is inverse Wishart $(\nu_0, \nu_0 \times S_0)$. The degrees of freedom parameter, $\nu_0$, is equal to the dimension of $\Sigma_i$. The scale matrix parameter is equal to a matrix, $S_0$, with the sample variance of the relevant observation on the diagonal, multiplied by $\nu_0$.

The user is able to specify the prior hyperparameters of the mean and variance of beta, $\alpha_0$ and $\beta_0$ for each $\sigma^2$, and the degrees of freedom and scale matrix for each $\Sigma_i$ using the priors argument. It is advisable for the user to specify their own prior hyperparameters for $p(\sigma^2)$ if the variance of any element is well under 1, or $p(\beta)$ if the there is a wide range in the magnitude of observations.

When priors = "Jeffreys" Jeffreys priors are used for the prior distribution of the variance and covariance parameters. Priors used are $p(\sigma^2) \propto \frac{1}{\sigma^2}$ and $p(\Sigma) \propto |\Sigma|^{-(p+1)/2}$, as listed in (Yang and Berger 1996) . The Jeffreys prior for a $\beta$ with infinite support is $p(\beta) \propto 1$. To preserve the prior information that $\beta > 0$, $p(\beta)\propto I\lbrack \beta > 0 \rbrack$ is chosen. It is not possible to calculate log-marginal likelihood using the methods in (Chib 1995) with Jeffreys priors. Therefore, if priors = "Jeffreys" and lml = TRUE, the lml argument will be ignored and a warning will be printed.

lml is reported in base $e$. See here for some guidance on how to interpret Bayes Factors, but note log base 10 is used on Wikipedia.

References

Chib S (1995). “Marginal likelihood from the Gibbs output.” Journal of the american statistical association, 90(432), 1313--1321. Casella G, George EI (1992). “Explaining the Gibbs sampler.” The American Statistician, 46(3), 167--174. Plummer M, Best N, Cowles K, Vines K (2006). “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News, 6(1), 7--11. https://journal.r-project.org/archive/. Yang R, Berger JO (1996). A catalog of noninformative priors. Institute of Statistics and Decision Sciences, Duke University.

Examples

y <- importObservations(file = system.file("extdata", "twonode_example.csv",
                                    package = "BayesMassBal"),
                   header = TRUE, csv.params = list(sep = ";"))

C <- matrix(c(1,-1,0,-1,0,0,1,-1,0,-1), byrow = TRUE, ncol = 5, nrow = 2)
X <- constrainProcess(C = C)

BMB_example <- BMB(X = X, y = y, cov.structure = "indep",
                   BTE = c(10,300,1), lml = FALSE, verb=0)

summary(BMB_example)
#> Mass Flow Rates:
#> 
#> CuFeS2:
#> --------------------
#>  Sampling Location Expected Value    95% LB     95% UB
#>                  1     1.19311040 1.1421729 1.23467276
#>                  2     1.16962330 1.1211585 1.21267635
#>                  3     1.10673464 1.0600670 1.14944982
#>                  4     0.02348710 0.0138185 0.03412558
#>                  5     0.06288866 0.0553323 0.07481783
#> 
#> gangue:
#> --------------------
#>  Sampling Location Expected Value     95% LB      95% UB
#>                  1     99.2508540 95.6684424 102.0460354
#>                  2      6.6095539  5.8357639   7.2884030
#>                  3      0.2634349  0.2326396   0.2913704
#>                  4     92.6413001 89.2196506  95.6711067
#>                  5      6.3461190  5.5881684   7.0089890
#> 
#> Total:
#> --------------------
#>  Sampling Location Expected Value    95% LB     95% UB
#>                  1     100.443964 96.831132 103.233470
#>                  2       7.779177  7.060423   8.519826
#>                  3       1.370170  1.308998   1.422567
#>                  4      92.664787 89.245861  95.700231
#>                  5       6.409008  5.651150   7.077618
#> 
#> 
#> log-marginal likelihood:
#> NA