pyOptimalEstimation Package

Python package to solve an inverse problem using Optimal Estimation and an arbritrary Forward model following Rodgers, 2000.

Download

The code is available at https://github.com/maahn/pyOptimalEstimation

Reference

You find more information about pyOptimalEstimation and examples in:

Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., doi:https://doi.org/10.1175/BAMS-D-19-0027.1

Please reference to our publication if you use the pyOptimalEstimation package

Examples

Please see pyOptimalEstimation/examples for a minimal working example. For more extensive, interactive examples, check out https://github.com/maahn/pyOptimalEstimation_examples and our paper.

Installation

Make sure you use Python 2.7, 3.6 or newer.

Change to the folder containing the project and do

python setup.py install

in the terminal. If you do not have root privileges, you can also do

python setup.py install --user

which will install pyOptimalEstimation in userbase/lib/pythonX.Y/site-packages or

python setup.py install --home=~

which will install pyOptimalEstimation in ~/lib/python.

API documentation

class pyOptimalEstimation.pyOEcore.optimalEstimation(x_vars, x_a, S_a, y_vars, y_obs, S_y, forward, userJacobian=None, x_truth=None, b_vars=[], b_p=[], S_b=[[]], x_lowerLimit={}, x_upperLimit={}, useFactorInJac=False, gammaFactor=None, perturbation=0.1, disturbance=None, convergenceFactor=10, convergenceTest='x', forwardKwArgs={}, multipleForwardKwArgs=None, verbose=None)

Bases: object

The core optimalEstimation class, which contains all required parameters. See [1] for an extensive introduction into Optimal Estimation theory, [2] discusses this library

Parameters:
  • x_vars (list of str) – names of the elements of state vector x.
  • x_a (pd.Series or list or np.ndarray) – prior information of state x.
  • S_a (pd.DataFrame or list or np.ndarray) – covariance matrix of state x.
  • y_vars (list of str) – names of the elements of state vector x
  • y_obs (pd.Series or list or np.ndarray) – observed measurement vector y.
  • S_y (pd.DataFrame or list or np.ndarray) – covariance matrix of measurement y. If there is no b vector, S_y is sequal to S_e
  • forward (function) – forward model expected as forward(xb,**forwardKwArgs): return y with xb = pd.concat((x,b)).
  • userJacobian (function, optional) – For forwarld models that can calculate the Jacobian internally (e.g. RTTOV), a call to estiamte the Jacobian can be added. Otherwise, the Jacobian is estimated by pyOEusing the standard ‘forward’ call. The fucntion is expected as self.userJacobian(xb, self.perturbation, \ self.y_vars, **self.forwardKwArgs): return jacobian with xb = pd.concat((x,b)). Defaults to None
  • x_truth (pd.Series or list or np.ndarray, optional) – If truth of state x is known, it can added to the data object. If provided, the value will be used for the routines linearityTest and plotIterations, but _not_ by the retrieval itself. Defaults to None/
  • b_vars (list of str, optional) – names of the elements of parameter vector b. Defaults to [].
  • b_p (pd.Series or list or np.ndarray.) – parameter vector b. defaults to []. Note that defining b_p makes only sence if S_b != 0. Otherwise it is easier (and cheaper) to hardcode b into the forward operator.
  • S_b (pd.DataFrame or list or np.ndarray) – covariance matrix of parameter b. Defaults to [[]].
  • forwardKwArgs (dict,optional) – additional keyword arguments for forward function.
  • multipleForwardKwArgs (dict,optional) – additional keyword arguments for forward function in case multiple profiles should be provided to the forward operator at once. If not . defined, forwardKwArgs is used instead and forward is called for every profile separately
  • x_lowerLimit (dict, optional) – reset state vector x[key] to x_lowerLimit[key] in case x_lowerLimit is undercut. Defaults to {}.
  • x_upperLimit (dict, optional) – reset state vector x[key] to x_upperLimit[key] in case x_upperLimit is exceeded. Defaults to {}.
  • perturbation (float or dict of floats, optional) – relative perturbation of statet vector x to estimate the Jacobian. Can be specified for every element of x seperately. Defaults to 0.1 of prior.
  • disturbance (float or dict of floats, optional) – DEPRECATED: Identical to perturbation option. If both option are provided, perturbation is used instead.
  • useFactorInJac (bool,optional) – True if disturbance should be applied by multiplication, False if it should by applied by addition of fraction of prior. Defaults to False.
  • gammaFactor (list of floats, optional) – Use additional gamma parameter for retrieval, see [3].
  • convergenceTest ({'x', 'y', 'auto'}, optional) – Apply convergence test in x or y-space. If ‘auto’ is selected, the test will be done in x-space if len(x) <= len(y) and in y-space otherwise. Experience shows that in both cases convergence is faster in x-space without impacting retrieval quality. Defaults to ‘x’.
  • convergenceFactor (int, optional) – Factor by which the convergence criterion needs to be smaller than len(x) or len(y)
  • verbose (bool, optional) – True or not present: iteration, residual, etc. printed to screen during normal operation. If False, it will turn off such notifications.
converged

True if retriveal converged successfully

Type:boolean
x_op

optimal state given the observations, i.e. retrieval solution

Type:pd.Series
y_op

Optimal y, i.e. observation associated with retrieval solution

Type:pd.Series
S_op

covariance of x_op, i.e. solution uncertainty

Type:pd.DataFrame
x_op_err

1 sigma errors of x_op. derived with sqrt(diag(S_op))

Type:pd.Series
convI

iteration where convergence was achieved

Type:int
K_i

list of Jacobians for iteration i.

Type:list of pd.DataFrame
x_i

iterations of state vector x

Type:list of pd.Series
y_i

iterations of measurement vector y

Type:list of pd.Series
dgf_i

degrees of freedom for each iteration

Type:list of float
A_i

Averaging kernel for each iteration

Type:list of pd.DataFrame
d_i2

convergence criteria for each iteration

Type:list of float
S_aposteriori_i

a posteriori covariance matrix of x for each iteration

Type:list of pd.DataFrame
gam_i

gamma parameters used in retrievals, see also gammaFactor and [1].

Type:list of floats
dgf

total degrees of freedom for signal of the retrieval solution

Type:float
dgf_x

degrees of freedom for signal per state variable

Type:pd.Series
Returns:returns the pyOptimalEstimation object
Return type:pyOptimalEstimation object

References

[1](1, 2) Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding:

Theory and Practice. World Scientific Publishing Company, 240 pp. https://library.wmo.int/index.php?lvl=notice_display&id=12279.

[2]Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G.

G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., 101, E1512–E1523, https://doi.org/10.1175/BAMS-D-19-0027.1.

[3]Turner, D. D., and U. Löhnert, 2014: Information Content and

Uncertainties in Thermodynamic Profiles and Liquid Cloud Properties Retrieved from the Ground-Based Atmospheric Emitted Radiance Interferometer (AERI). Journal of Applied Meteorology & Climatology, 53, 752–771, doi:10.1175/JAMC-D-13-0126.1.

getJacobian(xb, y)

Author: M. Echeverri, May 2021.

estimate Jacobian using the forward model and the specified perturbation

Parameters:
  • xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
  • y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns:

  • pd.DataFrame – Jacobian around x
  • pd.DataFrame – Jacobian around b

getJacobian_external(xb, y)

Author: M. Echeverri, June 2021.

estimate Jacobian using the external function provided by user and the specified perturbation. This method has external dependencies

Parameters:
  • xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
  • y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns:

  • pd.DataFrame – Jacobian around x
  • pd.DataFrame – Jacobian around b

doRetrieval(maxIter=10, x_0=None, maxTime=10000000.0)

run the retrieval

Parameters:
  • maxIter (int, optional) – maximum number of iterations, defaults to 10
  • x_0 (pd.Series or list or np.ndarray, optional) – first guess for x. If x_0 == None, x_a is taken as first guess.
  • maxTime (int, optional) – maximum runTime, defaults to 1e7 (~ 4 months). Note that the forward model is not killed if time is exceeded
Returns:

True is convergence was obtained.

Return type:

bool

y_a

Estimate the observations corresponding to the prior.

linearityTest(maxErrorPatterns=10, significance=0.05, atol=1e-05)

test whether the solution is moderately linear following chapter 5.1 of Rodgers 2000. values lower than 1 indicate that the effect of linearization is smaller than the measurement error and problem is nearly linear. Populates self.linearity.

Parameters:
  • maxErrorPatterns (int, optional) – maximum number of error patterns to return. Provide None to return
  • all.
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. probability is 5% that
    correct null hypothesis is rejected. Only used when testing against x_truth.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns:

  • self.linearity (float) – ratio of error due to linearization to measurement error sorted by size. Should be below 1 for all.
  • self.trueLinearityChi2 (float) – Chi2 value that model is moderately linear based on ‘self.x_truth’. Must be smaller than critical value to conclude thast model is linear.
  • self.trueLinearityChi2Critical (float) – Corresponding critical Chi2 value.

chiSquareTest(significance=0.05)

test with significance level ‘significance’ whether A) optimal solution agrees with observation in Y space B) observation agrees with prior in Y space C) optimal solution agrees with prior in Y space D) optimal solution agrees with priot in X space

Parameters:significance (real, optional) –
significance level, defaults to 0.05, i.e. probability is 5% that
correct null hypothesis is rejected.
Returns:
  • Pandas Series (dtype bool) – True if test is passed
  • Pandas Series (dtype float) – Chi2 value for tests. Must be smaler than critical value to pass tests.
  • Pandas Series (dtype float) – Critical Chi2 value for tests
chiSquareTestYOptimalObservation(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval agrees with measurements (see chapter 12.3.2 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. probability is 5% that
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns:

  • chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

chiSquareTestYObservationPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether measurement agrees with prior (see chapter 12.3.3.1 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. probability is 5% that
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns:

  • YObservationPrior (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
  • YObservationPrior (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

chiSquareTestYOptimalPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval result agrees with prior in y space (see chapter 12.3.3.3 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. probability is 5% that
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns:

  • chi2Passe (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestY (real) – chi² cutoff value with significance ‘significance’

chiSquareTestXOptimalPrior(significance=0.05, atol=1e-05)

test with significance level ‘significance’ whether retrieval agrees with prior in x space (see chapter 12.3.3.3 of Rodgers, 2000)

Parameters:
  • significance (real, optional) –
    significance level, defaults to 0.05, i.e. probability is 5% that
    correct null hypothesis is rejected.
  • atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns:

  • chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
  • chi2 (real) – chi² value
  • chi2TestX (real) – chi² cutoff value with significance ‘significance’

saveResults(fname)

Helper function to save a pyOptimalEstimation object. The forward operator is removed from the pyOptimalEstimation object before saving.

Parameters:fname (str) – filename
Returns:
Return type:None
plotIterations(cmap='viridis', figsize=(8, 10), legend=True, mode='ratio')

Plot the retrieval results using 4 panels: (1) iterations of x (normalized to self.x_truth or x[0]), (2) iterations of y (normalized to y_obs), (3) iterations of degrees of freedom, (4) iterations of convergence criteria

Parameters:
  • fileName (str, optional) – plot is saved to fileName, if provided
  • cmap (str, optional) – colormap for 1st and 2nd panel (default ‘hsv’)
  • figsize (tuple, optional) – Figure size in inch (default (8, 10))
  • legend (bool, optional) – Add legend for X and Y (defualt True)
  • mode (str, optional) – plot ‘ratio’ or ‘difference’ to truth/prior/measurements (defualt: ratio)
Returns:

The created figure.

Return type:

matplotlib figure object

summary(*args, **kwargs)
summarize(returnXarray=False, combineXB=False)

Provide a summary of the retrieval results as a dictionary.

Parameters:
  • returnXarray ({bool}, optional) – return xarray dataset instead of dict. Can be easily combined when applying the retrieval multiple times. (the default is False)
  • combineXB ({bool}, optional) – append b parameter values to state vector X variables. Can be useful for comparing runs with and without b parameters.
Returns:

Summary of retrieval results

Return type:

dict or xarray.Dataset

pyOptimalEstimation.pyOEcore.optimalEstimation_loadResults(fname, allow_pickle=True)

Helper function to load a saved pyOptimalEstimation object

Parameters:fname (str) – filename
Returns:pyOptimalEstimation obtained from file.
Return type:pyOptimalEstimation object
pyOptimalEstimation.pyOEcore.invertMatrix(A, raise_error=True)

Wrapper funtion for np.linalg.inv, because original function reports LinAlgError if nan in array for some numpy versions. We want that the retrieval is robust with respect to that. Also, checks for singular matrices were added.

Parameters:
  • A ((.., M, M) array_like) – Matrix to be inverted.
  • raise_error ({bool}, optional) – ValueError is raised if A is singular (the default is True)
Returns:

Ainv – Inverse of the matrix A.

Return type:

(.., M, M) ndarray or matrix