pyOptimalEstimation Package

Python package to solve an inverse problem using Optimal Estimation and an arbitrary forward model, following Rodgers, 2000.

Download

The code is available at https://github.com/maahn/pyOptimalEstimation

Reference

More information about pyOptimalEstimation and examples can be found in:

Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., 101, E1512–E1523, https://doi.org/10.1175/BAMS-D-19-0027.1.

Please cite our publication if you use the pyOptimalEstimation package.

Examples

Please see pyOptimalEstimation/examples for a minimal working example. For more extensive, interactive examples, check out https://github.com/maahn/pyOptimalEstimation_examples and our paper.

Installation

Make sure you use Python 2.7, 3.6 or newer.

Change to the folder containing the project and do

python setup.py install

in the terminal. If you do not have root privileges, you can also do

python setup.py install --user

which will install pyOptimalEstimation in userbase/lib/pythonX.Y/site-packages or

python setup.py install --home=~

which will install pyOptimalEstimation in ~/lib/python.

API documentation

pyOptimalEstimation.pyOEcore.invertMatrix(A, raise_error=True)

Wrapper function for np.linalg.inv, because the original function raises a LinAlgError if nan values are in the array for some numpy versions. The retrieval should be robust with respect to that. Also, checks for singular matrices were added.

Parameters:
  • A ((.., M, M) array_like) – Matrix to be inverted.
  • raise_error ({bool}, optional) – If True, a ValueError is raised if A is singular (the default is True).
Returns:

Ainv – Inverse of the matrix A.

Return type:

(.., M, M) ndarray or matrix
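The failure mode this wrapper guards against can be reproduced with plain NumPy: np.linalg.inv raises a LinAlgError for singular input (and, depending on the numpy version, for arrays containing nan). A small sketch of the singular case, with an illustrative matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # singular: the second row is twice the first

try:
    np.linalg.inv(A)
    singular_detected = False
except np.linalg.LinAlgError:
    # invertMatrix detects this case and, with raise_error=True,
    # reports it as a ValueError instead of crashing the retrieval
    singular_detected = True
```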

class pyOptimalEstimation.pyOEcore.optimalEstimation(x_vars, x_a, S_a, y_vars, y_obs, S_y, forward, userJacobian=None, x_truth=None, b_vars=[], b_p=[], S_b=[[]], x_lowerLimit={}, x_upperLimit={}, useFactorInJac=False, gammaFactor=None, perturbation=0.1, disturbance=None, convergenceFactor=10, convergenceTest=u'x', forwardKwArgs={}, multipleForwardKwArgs=None, verbose=None)

Bases: object

The core optimalEstimation class, which contains all required parameters. See [1] for an extensive introduction to Optimal Estimation theory; [2] discusses this library.

Parameters:
  • x_vars (list of str) – names of the elements of state vector x.
  • x_a (pd.Series or list or np.ndarray) – prior information of state x.
  • S_a (pd.DataFrame or list or np.ndarray) – covariance matrix of state x.
  • y_vars (list of str) – names of the elements of measurement vector y.
  • y_obs (pd.Series or list or np.ndarray) – observed measurement vector y.
  • S_y (pd.DataFrame or list or np.ndarray) – covariance matrix of measurement y. If there is no b vector, S_y is equal to S_e.
  • forward (function) – forward model expected as forward(xb,**forwardKwArgs): return y with xb = pd.concat((x,b)).
  • userJacobian (function, optional) – For forward models that can calculate the Jacobian internally (e.g. RTTOV), a call to estimate the Jacobian can be added. Otherwise, the Jacobian is estimated by pyOE using the standard ‘forward’ call. The function is expected as self.userJacobian(xb, self.perturbation, self.y_vars, **self.forwardKwArgs): return jacobian with xb = pd.concat((x,b)). Defaults to None.
  • x_truth (pd.Series or list or np.ndarray, optional) – If the truth of state x is known, it can be added to the data object. If provided, the value will be used for the routines linearityTest and plotIterations, but _not_ by the retrieval itself. Defaults to None.
  • b_vars (list of str, optional) – names of the elements of parameter vector b. Defaults to [].
  • b_p (pd.Series or list or np.ndarray) – parameter vector b. Defaults to []. Note that defining b_p only makes sense if S_b != 0. Otherwise, it is easier (and cheaper) to hardcode b into the forward operator.
  • S_b (pd.DataFrame or list or np.ndarray) – covariance matrix of parameter b. Defaults to [[]].
  • forwardKwArgs (dict, optional) – additional keyword arguments for the forward function.
  • multipleForwardKwArgs (dict, optional) – additional keyword arguments for the forward function in case multiple profiles should be provided to the forward operator at once. If not defined, forwardKwArgs is used instead and forward is called for every profile separately.
  • x_lowerLimit (dict, optional) – reset state vector x[key] to x_lowerLimit[key] in case x_lowerLimit is undercut. Defaults to {}.
  • x_upperLimit (dict, optional) – reset state vector x[key] to x_upperLimit[key] in case x_upperLimit is exceeded. Defaults to {}.
  • perturbation (float or dict of floats, optional) – relative perturbation of state vector x used to estimate the Jacobian. Can be specified for every element of x separately. Defaults to 0.1 of the prior.
  • disturbance (float or dict of floats, optional) – DEPRECATED: Identical to the perturbation option. If both options are provided, perturbation is used.
  • useFactorInJac (bool, optional) – True if the perturbation should be applied by multiplication, False if it should be applied by addition of a fraction of the prior. Defaults to False.
  • gammaFactor (list of floats, optional) – Use additional gamma parameter for retrieval, see [3].
  • convergenceTest ({'x', 'y', 'auto'}, optional) – Apply convergence test in x or y-space. If ‘auto’ is selected, the test will be done in x-space if len(x) <= len(y) and in y-space otherwise. Experience shows that in both cases convergence is faster in x-space without impacting retrieval quality. Defaults to ‘x’.
  • convergenceFactor (int, optional) – Factor by which the convergence criterion needs to be smaller than len(x) or len(y).
  • verbose (bool, optional) – If True or not set, iteration number, residual, etc. are printed to screen during normal operation. If False, such output is suppressed.
converged

True if retrieval converged successfully

Type:boolean
x_op

optimal state given the observations, i.e. retrieval solution

Type:pd.Series
y_op

Optimal y, i.e. observation associated with retrieval solution

Type:pd.Series
S_op

covariance of x_op, i.e. solution uncertainty

Type:pd.DataFrame
x_op_err

1-sigma errors of x_op, derived with sqrt(diag(S_op))

Type:pd.Series
convI

iteration where convergence was achieved

Type:int
K_i

list of Jacobians for iteration i.

Type:list of pd.DataFrame
x_i

iterations of state vector x

Type:list of pd.Series
y_i

iterations of measurement vector y

Type:list of pd.Series
dgf_i

degrees of freedom for each iteration

Type:list of float
A_i

Averaging kernel for each iteration

Type:list of pd.DataFrame
d_i2

convergence criteria for each iteration

Type:list of float
S_aposteriori_i

a posteriori covariance matrix of x for each iteration

Type:list of pd.DataFrame
gam_i

gamma parameters used in retrievals, see also gammaFactor and [1].

Type:list of floats
dgf

total degrees of freedom for signal of the retrieval solution

Type:float
dgf_x

degrees of freedom for signal per state variable

Type:pd.Series
Returns:returns the pyOptimalEstimation object
Return type:pyOptimalEstimation object

References

[1] Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific Publishing Company, 240 pp. https://library.wmo.int/index.php?lvl=notice_display&id=12279.

[2] Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., 101, E1512–E1523, https://doi.org/10.1175/BAMS-D-19-0027.1.

[3] Turner, D. D., and U. Löhnert, 2014: Information Content and Uncertainties in Thermodynamic Profiles and Liquid Cloud Properties Retrieved from the Ground-Based Atmospheric Emitted Radiance Interferometer (AERI). Journal of Applied Meteorology & Climatology, 53, 752–771, https://doi.org/10.1175/JAMC-D-13-0126.1.

chiSquareTest(significance=0.05)

test with significance level ‘significance’ whether (A) the optimal solution agrees with the observation in Y space, (B) the observation agrees with the prior in Y space, (C) the optimal solution agrees with the prior in Y space, and (D) the optimal solution agrees with the prior in X space

Parameters:significance (real, optional) –
significance level, defaults to 0.05, i.e. the probability is 5% that a
correct null hypothesis is rejected.
Returns:
  • Pandas Series (dtype bool) – True if test is passed
  • Pandas Series (dtype float) – Chi2 value for tests. Must be smaller than the critical value to pass the tests.
  • Pandas Series (dtype float) – Critical Chi2 value for tests
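Conceptually, each of these tests compares a χ² statistic against the critical value of the χ² distribution at the chosen significance level. A minimal sketch of that comparison with illustrative numbers (the actual tests transform the vectors following Rodgers, 2000, chapter 12, which is not shown here):

```python
import numpy as np
from scipy.stats import chi2

# hypothetical residual vector and its covariance (illustrative values)
residual = np.array([0.5, -0.3])
S = np.diag([1.0, 1.0])

chi2_value = residual @ np.linalg.inv(S) @ residual
chi2_critical = chi2.isf(0.05, df=len(residual))  # cutoff at 5% significance
passed = chi2_value < chi2_critical  # True: null hypothesis is NOT rejected
```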
chiSquareTestXOptimalPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval agrees with prior in x space (see chapter 12.3.3.3 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. the probability is 5% that a
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We found that values should be larger than the numpy.isclose default value of 1e-8.
Returns:

  • chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestX (real) – chi² cutoff value with significance ‘significance’

chiSquareTestYObservationPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether measurement agrees with prior (see chapter 12.3.3.1 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. the probability is 5% that a
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We found that values should be larger than the numpy.isclose default value of 1e-8.
Returns:

  • YObservationPrior (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
  • YObservationPrior (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

chiSquareTestYOptimalObservation(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval agrees with measurements (see chapter 12.3.2 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. the probability is 5% that a
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We found that values should be larger than the numpy.isclose default value of 1e-8.
Returns:

  • chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

chiSquareTestYOptimalPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval result agrees with prior in y space (see chapter 12.3.3.3 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. the probability is 5% that a
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We found that values should be larger than the numpy.isclose default value of 1e-8.
Returns:

  • chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

doRetrieval(maxIter=10, x_0=None, maxTime=10000000.0)

run the retrieval

Parameters:
  • maxIter (int, optional) – maximum number of iterations, defaults to 10
  • x_0 (pd.Series or list or np.ndarray, optional) – first guess for x. If x_0 is None, x_a is taken as first guess.
  • maxTime (int, optional) – maximum runtime, defaults to 1e7 (~ 4 months). Note that the forward model is not killed if the time limit is exceeded.
Returns:

True if convergence was obtained.

Return type:

bool

getJacobian(xb, y)

Author: M. Echeverri, May 2021.

estimate Jacobian using the forward model and the specified perturbation

Parameters:
  • xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
  • y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns:

  • pd.DataFrame – Jacobian around x
  • pd.DataFrame – Jacobian around b
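The underlying one-sided finite-difference scheme can be sketched with numpy and pandas alone. The forward model below is a hypothetical stand-in, and details such as whether the relative perturbation refers to the prior or the current state are simplified here:

```python
import numpy as np
import pandas as pd

# hypothetical forward model: y1 = x1**2 + x2
def forward(xb):
    return pd.Series([xb["x1"] ** 2 + xb["x2"]], index=["y1"])

x = pd.Series([2.0, 1.0], index=["x1", "x2"])
perturbation = 0.1  # relative perturbation, as in the default

y0 = forward(x)
K = pd.DataFrame(index=y0.index, columns=x.index, dtype=float)
for var in x.index:
    x_pert = x.copy()
    dx = perturbation * x[var]        # perturb one element at a time
    x_pert[var] += dx
    K[var] = (forward(x_pert) - y0) / dx  # one-sided finite difference
```

For the quadratic term the one-sided difference overestimates the true derivative slightly (4.2 instead of 4), which is why the perturbation size matters.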

getJacobian_external(xb, y)

Author: M. Echeverri, June 2021.

estimate Jacobian using the external function provided by the user and the specified perturbation. This method has external dependencies.

Parameters:
  • xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
  • y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns:

  • pd.DataFrame – Jacobian around x
  • pd.DataFrame – Jacobian around b

linearityTest(maxErrorPatterns=10, significance=0.05, atol=1e-05)

test whether the solution is moderately linear following chapter 5.1 of Rodgers, 2000. Values lower than 1 indicate that the effect of linearization is smaller than the measurement error and the problem is nearly linear. Populates self.linearity.

Parameters:
  • maxErrorPatterns (int, optional) – maximum number of error patterns to return. Provide None to return all.
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. the probability is 5% that a
    correct null hypothesis is rejected. Only used when testing against x_truth.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigenvalues to zero. We found that values should be larger than the numpy.isclose default value of 1e-8.
Returns:

  • self.linearity (float) – ratio of error due to linearization to measurement error sorted by size. Should be below 1 for all.
  • self.trueLinearityChi2 (float) – Chi2 value that the model is moderately linear based on ‘self.x_truth’. Must be smaller than the critical value to conclude that the model is linear.
  • self.trueLinearityChi2Critical (float) – Corresponding critical Chi2 value.

plotIterations(cmap=u'viridis', figsize=(8, 10), legend=True, mode=u'ratio')

Plot the retrieval results using 4 panels: (1) iterations of x (normalized to self.x_truth or x[0]), (2) iterations of y (normalized to y_obs), (3) iterations of degrees of freedom, (4) iterations of convergence criteria

Parameters:
  • fileName (str, optional) – plot is saved to fileName, if provided
  • cmap (str, optional) – colormap for 1st and 2nd panel (default ‘viridis’)
  • figsize (tuple, optional) – Figure size in inch (default (8, 10))
  • legend (bool, optional) – Add legend for X and Y (default True)
  • mode (str, optional) – plot ‘ratio’ or ‘difference’ to truth/prior/measurements (default: ‘ratio’)
Returns:

The created figure.

Return type:

matplotlib figure object

saveResults(fname)

Helper function to save a pyOptimalEstimation object. The forward operator is removed from the pyOptimalEstimation object before saving.

Parameters:fname (str) – filename
Returns:
Return type:None

summarize(returnXarray=False, combineXB=False)

Provide a summary of the retrieval results as a dictionary.

Parameters:
  • returnXarray ({bool}, optional) – return xarray dataset instead of dict. Can be easily combined when applying the retrieval multiple times. (the default is False)
  • combineXB ({bool}, optional) – append b parameter values to state vector X variables. Can be useful for comparing runs with and without b parameters.
Returns:

Summary of retrieval results

Return type:

dict or xarray.Dataset

summary(*args, **kwargs)
y_a

Estimate the observations corresponding to the prior.

pyOptimalEstimation.pyOEcore.optimalEstimation_loadResults(fname, allow_pickle=True)

Helper function to load a saved pyOptimalEstimation object

Parameters:fname (str) – filename
Returns:pyOptimalEstimation obtained from file.
Return type:pyOptimalEstimation object