You can find more information about pyOptimalEstimation, including examples, in:
Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., doi:https://doi.org/10.1175/BAMS-D-19-0027.1
Please cite our publication if you use the pyOptimalEstimation package.
The core optimalEstimation class, which contains all required parameters.
See [1] for an extensive introduction to Optimal Estimation theory;
[2] discusses this library.
Parameters:
x_vars (list of str) – names of the elements of state vector x.
x_a (pd.Series or list or np.ndarray) – prior information of state x.
S_a (pd.DataFrame or list or np.ndarray) – covariance matrix of state x.
y_vars (list of str) – names of the elements of measurement vector y.
y_obs (pd.Series or list or np.ndarray) – observed measurement vector y.
S_y (pd.DataFrame or list or np.ndarray) – covariance matrix of measurement y. If there is no b vector, S_y
is equal to S_e
forward (function) – forward model expected as forward(xb, **forwardKwArgs): return y
with xb = pd.concat((x, b)).
userJacobian (function, optional) – For forward models that can calculate the Jacobian internally (e.g.
RTTOV), a call to estimate the Jacobian can be added. Otherwise, the
Jacobian is estimated by pyOE using the standard ‘forward’ call. The
function is expected as self.userJacobian(xb, self.perturbation, self.y_vars, **self.forwardKwArgs): return jacobian
with xb = pd.concat((x, b)). Defaults to None.
x_truth (pd.Series or list or np.ndarray, optional) – If truth of state x is known, it can be added to the data object. If
provided, the value will be used for the routines linearityTest and
plotIterations, but _not_ by the retrieval itself. Defaults to None.
b_vars (list of str, optional) – names of the elements of parameter vector b. Defaults to [].
b_p (pd.Series or list or np.ndarray) – parameter vector b. Defaults to []. Note that defining b_p only makes
sense if S_b != 0. Otherwise it is easier (and cheaper) to
hardcode b into the forward operator.
S_b (pd.DataFrame or list or np.ndarray) – covariance matrix of parameter b. Defaults to [[]].
forwardKwArgs (dict, optional) – additional keyword arguments for forward function.
multipleForwardKwArgs (dict, optional) – additional keyword arguments for forward function in case multiple
profiles should be provided to the forward operator at once. If not
defined, forwardKwArgs is used instead and forward is called
for every profile separately
x_lowerLimit (dict, optional) – reset state vector x[key] to x_lowerLimit[key] in case x_lowerLimit is
undercut. Defaults to {}.
x_upperLimit (dict, optional) – reset state vector x[key] to x_upperLimit[key] in case x_upperLimit is
exceeded. Defaults to {}.
perturbation (float or dict of floats, optional) – relative perturbation of state vector x to estimate the Jacobian. Can
be specified for every element of x separately. Defaults to 0.1 of
prior.
disturbance (float or dict of floats, optional) – DEPRECATED: Identical to perturbation option. If both options are
provided, perturbation is used instead.
useFactorInJac (bool, optional) – True if disturbance should be applied by multiplication, False if it
should be applied by addition of fraction of prior. Defaults to False.
gammaFactor (list of floats, optional) – Use additional gamma parameter for retrieval, see [3].
convergenceTest ({'x', 'y', 'auto'}, optional) – Apply convergence test in x or y-space. If ‘auto’ is
selected, the test will be done in x-space if len(x) <= len(y) and in
y-space otherwise. Experience shows that in both cases convergence is
faster in x-space without impacting retrieval quality. Defaults to ‘x’.
convergenceFactor (int, optional) – Factor by which the convergence criterion needs to be smaller than
len(x) or len(y)
verbose (bool, optional) – True or not present: iteration, residual, etc. printed to screen during
normal operation. If False, it will turn off such notifications.
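The perturbation and useFactorInJac options describe how the Jacobian is estimated from repeated forward calls. The following is a minimal sketch of that idea in plain Python, not the package's implementation: the helper estimate_jacobian, the toy forward model, the use of dicts in place of pd.Series, and the reading of "multiplication" as x * (1 + perturbation) are all assumptions made for illustration.

```python
def estimate_jacobian(forward, x, x_a, y_vars, perturbation=0.1,
                      use_factor=False):
    """Finite-difference Jacobian dy/dx (hypothetical helper).

    x, x_a: dicts mapping state-variable names to current and prior values.
    perturbation: relative step size.
    use_factor: if True, perturb by multiplication, x * (1 + perturbation)
        (assumed reading); if False, add a fraction of the prior,
        x + perturbation * x_a.
    """
    y_ref = forward(x)
    jacobian = {y_name: {} for y_name in y_vars}
    for name in x:
        x_pert = dict(x)  # copy so only one element is perturbed
        if use_factor:
            x_pert[name] = x[name] * (1.0 + perturbation)
        else:
            x_pert[name] = x[name] + perturbation * x_a[name]
        step = x_pert[name] - x[name]
        y_pert = forward(x_pert)
        for y_name in y_vars:
            jacobian[y_name][name] = (y_pert[y_name] - y_ref[y_name]) / step
    return jacobian


def toy_forward(x):
    # Toy linear forward model (assumption): two measurements, two states.
    return {"y1": 2.0 * x["a"] + x["b"], "y2": x["a"] - 3.0 * x["b"]}


x_a = {"a": 1.0, "b": 2.0}
K = estimate_jacobian(toy_forward, x_a, x_a, ["y1", "y2"])
```

For this linear toy model the estimate matches the analytic derivatives up to rounding. Note that either variant produces a zero step, and hence a division by zero, when the value being scaled (the prior or the current state) is exactly zero.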
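References:
[1] Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific.
[2] Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G.
G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their
Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer.
Meteor. Soc., 101, E1512–E1523, https://doi.org/10.1175/BAMS-D-19-0027.1.
[3] Turner, D. D., and U. Löhnert, 2014: Information Content and
Uncertainties in Thermodynamic Profiles and Liquid Cloud Properties
Retrieved from the Ground-Based Atmospheric Emitted Radiance
Interferometer (AERI). Journal of Applied Meteorology & Climatology, 53,
752–771, doi:10.1175/JAMC-D-13-0126.1.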
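The convergenceTest and convergenceFactor options of the class can be read as a simple decision rule. The sketch below illustrates that rule under stated assumptions: the helper names choose_convergence_space and is_converged are hypothetical, and the scalar criterion is assumed to have been computed already (for example as a covariance-weighted squared step between two iterations).

```python
def choose_convergence_space(n_x, n_y, convergence_test="x"):
    """Return 'x' or 'y' (hypothetical helper mirroring the documented rule)."""
    if convergence_test not in ("x", "y", "auto"):
        raise ValueError("convergence_test must be 'x', 'y' or 'auto'")
    if convergence_test == "auto":
        # x-space if the state vector is not longer than the measurement vector
        return "x" if n_x <= n_y else "y"
    return convergence_test


def is_converged(criterion, n, convergence_factor):
    """Converged once the criterion is smaller than n / convergence_factor.

    n is len(x) or len(y), depending on the space chosen above.
    """
    return criterion < n / convergence_factor


space = choose_convergence_space(n_x=3, n_y=5, convergence_test="auto")
done = is_converged(criterion=0.2, n=3, convergence_factor=10)
```

With three state variables and five measurements, 'auto' selects x-space, and a criterion of 0.2 counts as converged because it is below 3 / 10.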
Test whether the solution is moderately linear following chapter
5.1 of Rodgers 2000.
Values lower than 1 indicate that the effect of linearization is
smaller than the measurement error and the problem is nearly linear.
Populates self.linearity.
Parameters:
maxErrorPatterns (int, optional) – maximum number of error patterns to return. Provide None to return
all.
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected. Only used when testing
against x_truth.
atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We
found that values should be larger than the numpy.isclose default value
of 1e-8.
Returns:
self.linearity (float) – ratio of error due to linearization to measurement error sorted by
size. Should be below 1 for all.
self.trueLinearityChi2 (float) – Chi2 value that model is moderately linear based on ‘self.x_truth’.
Must be smaller than critical value to conclude that model is
linear.
Test with significance level ‘significance’ whether
A) optimal solution agrees with observation in Y space
B) observation agrees with prior in Y space
C) optimal solution agrees with prior in Y space
D) optimal solution agrees with prior in X space
Parameters:
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
Returns:
Pandas Series (dtype bool) – True if test is passed
Pandas Series (dtype float) – Chi2 value for tests. Must be smaller than critical value to pass
tests.
Pandas Series (dtype float) – Critical Chi2 value for tests
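Each of these tests compares a χ² statistic against a critical value for the chosen significance level. The following is a minimal sketch of that comparison, not the package's code. It assumes a diagonal covariance matrix, takes the number of elements as the degrees of freedom, and uses the Wilson-Hilferty approximation for the critical value so that only the standard library is needed; in practice a statistics library's χ² quantile function would be the natural choice. The function names are illustrative.

```python
from statistics import NormalDist


def chi2_critical(dof, significance=0.05):
    """Approximate upper chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - significance)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def chi2_test(difference, variances, significance=0.05):
    """Test whether `difference` is consistent with zero.

    difference: list of residuals (e.g. observed minus modelled y).
    variances: diagonal of the covariance of that difference (assumed diagonal).
    Returns (passed, chi2, critical_value).
    """
    chi2 = sum(d * d / v for d, v in zip(difference, variances))
    critical = chi2_critical(len(difference), significance)
    return chi2 < critical, chi2, critical


passed, chi2, critical = chi2_test([0.5, -1.0, 0.2], [1.0, 1.0, 0.25])
```

The test passes when the χ² value stays below the critical value, meaning the null hypothesis of agreement is not rejected at the given significance level.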
Test with significance level ‘significance’ whether retrieval agrees
with measurements (see chapter 12.3.2 of Rodgers, 2000)
Parameters:
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We
found that values should be larger than the numpy.isclose default value
of 1e-8.
Returns:
chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with
measurements and null hypothesis is NOT rejected.
chi2 (real) – chi² value
chi2TestY (real) – chi² cutoff value with significance ‘significance’
Test with significance level ‘significance’ whether measurement agrees
with prior (see chapter 12.3.3.1 of Rodgers, 2000)
Parameters:
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We
found that values should be larger than the numpy.isclose default value
of 1e-8.
Returns:
YObservationPrior (bool) – True if chi² test passed, i.e. measurement agrees with
prior and null hypothesis is NOT rejected.
YObservationPrior (real) – chi² value
chi2TestY (real) – chi² cutoff value with significance ‘significance’
Test with significance level ‘significance’ whether retrieval result agrees
with prior in y space (see chapter 12.3.3.3 of Rodgers, 2000)
Parameters:
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We
found that values should be larger than the numpy.isclose default value
of 1e-8.
Returns:
chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with
prior and null hypothesis is NOT rejected.
chi2 (real) – chi² value
chi2TestY (real) – chi² cutoff value with significance ‘significance’
Test with significance level ‘significance’ whether retrieval agrees
with prior in x space (see chapter 12.3.3.3 of Rodgers, 2000)
Parameters:
significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We
found that values should be larger than the numpy.isclose default value
of 1e-8.
Returns:
chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with
prior and null hypothesis is NOT rejected.
chi2 (real) – chi² value
chi2TestX (real) – chi² cutoff value with significance ‘significance’
Plot the retrieval results using 4 panels: (1) iterations of x
(normalized to self.x_truth or x[0]), (2) iterations of y (normalized
to y_obs), (3) iterations of degrees of freedom, (4) iterations of
convergence criteria
Parameters:
fileName (str, optional) – plot is saved to fileName, if provided
cmap (str, optional) – colormap for 1st and 2nd panel (default ‘hsv’)
figsize (tuple, optional) – Figure size in inches (default (8, 10))
legend (bool, optional) – Add legend for X and Y (default True)
mode (str, optional) – plot ‘ratio’ or ‘difference’ to truth/prior/measurements
(default: ‘ratio’)
Provide a summary of the retrieval results as a dictionary.
Parameters:
returnXarray ({bool}, optional) – return xarray dataset instead of dict. Can be easily combined when
applying the retrieval multiple times. (the default is False)
combineXB ({bool}, optional) – append b parameter values to state vector X variables. Can be useful
for comparing runs with and without b parameters.
Wrapper function for np.linalg.inv, because the original function raises
LinAlgError if the array contains NaN for some numpy versions. We want the
retrieval to be robust with respect to that. Checks for singular
matrices were also added.
Parameters:
A ((..., M, M) array_like) – Matrix to be inverted.
raise_error ({bool}, optional) – ValueError is raised if A is singular (the default is True)
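A wrapper of this kind could look roughly like the sketch below. It is an illustration of the described behaviour rather than the package's actual code: the name invert_matrix_sketch, the condition-number threshold used to detect singular matrices, and the choice to return an all-NaN array are assumptions.

```python
import warnings

import numpy as np


def invert_matrix_sketch(A, raise_error=True):
    """Invert A; tolerate NaN input and flag singular matrices (sketch)."""
    A = np.asarray(A, dtype=float)
    if np.any(np.isnan(A)):
        # Do not let NaN trigger a LinAlgError; propagate NaN instead.
        warnings.warn("NaN found in matrix; returning NaN array")
        return np.full_like(A, np.nan)
    # A very large condition number indicates a numerically singular matrix
    # (threshold chosen here as an assumption).
    if np.linalg.cond(A) > 1.0 / np.finfo(A.dtype).eps:
        if raise_error:
            raise ValueError("Found singular matrix")
        warnings.warn("Found singular matrix; returning NaN array")
        return np.full_like(A, np.nan)
    return np.linalg.inv(A)


inv = invert_matrix_sketch([[2.0, 0.0], [0.0, 4.0]])
```

With raise_error=False the sketch only warns on a singular matrix, which lets a batch of retrievals continue past a single ill-conditioned case.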