pyOptimalEstimation
Package¶
Python package to solve an inverse problem using Optimal Estimation and an arbritrary Forward model following Rodgers, 2000.
Download¶
The code is available at https://github.com/maahn/pyOptimalEstimation
Reference¶
You find more information about pyOptimalEstimation and examples in:
Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., doi:https://doi.org/10.1175/BAMS-D-19-0027.1
Please reference to our publication if you use the pyOptimalEstimation package
Examples¶
Please see pyOptimalEstimation/examples for a minimal working example. For more extensive, interactive examples, check out https://github.com/maahn/pyOptimalEstimation_examples and our paper.
Installation¶
Make sure you use Python 2.7, 3.6 or newer.
Change to the folder containing the project and do
python setup.py install
in the terminal. If you do not have root privileges, you can also do
python setup.py install --user
which will install pyOptimalEstimation in userbase/lib/pythonX.Y/site-packages or
python setup.py install --home=~
which will install pyOptimalEstimation in ~/lib/python.
API documentation¶
-
class
pyOptimalEstimation.pyOEcore.
optimalEstimation
(x_vars, x_a, S_a, y_vars, y_obs, S_y, forward, userJacobian=None, x_truth=None, b_vars=[], b_p=[], S_b=[[]], x_lowerLimit={}, x_upperLimit={}, useFactorInJac=False, gammaFactor=None, perturbation=0.1, disturbance=None, convergenceFactor=10, convergenceTest='x', forwardKwArgs={}, multipleForwardKwArgs=None, verbose=None)¶ Bases:
object
The core optimalEstimation class, which contains all required parameters. See [1] for an extensive introduction into Optimal Estimation theory, [2] discusses this library
Parameters: - x_vars (list of str) – names of the elements of state vector x.
- x_a (pd.Series or list or np.ndarray) – prior information of state x.
- S_a (pd.DataFrame or list or np.ndarray) – covariance matrix of state x.
- y_vars (list of str) – names of the elements of state vector x
- y_obs (pd.Series or list or np.ndarray) – observed measurement vector y.
- S_y (pd.DataFrame or list or np.ndarray) – covariance matrix of measurement y. If there is no b vector, S_y is sequal to S_e
- forward (function) – forward model expected as
forward(xb,**forwardKwArgs): return y
with xb = pd.concat((x,b)). - userJacobian (function, optional) – For forwarld models that can calculate the Jacobian internally (e.g.
RTTOV), a call to estiamte the Jacobian can be added. Otherwise, the
Jacobian is estimated by pyOEusing the standard ‘forward’ call. The
fucntion is expected as
self.userJacobian(xb, self.perturbation, \ self.y_vars, **self.forwardKwArgs): return jacobian
with xb = pd.concat((x,b)). Defaults to None - x_truth (pd.Series or list or np.ndarray, optional) – If truth of state x is known, it can added to the data object. If provided, the value will be used for the routines linearityTest and plotIterations, but _not_ by the retrieval itself. Defaults to None/
- b_vars (list of str, optional) – names of the elements of parameter vector b. Defaults to [].
- b_p (pd.Series or list or np.ndarray.) – parameter vector b. defaults to []. Note that defining b_p makes only sence if S_b != 0. Otherwise it is easier (and cheaper) to hardcode b into the forward operator.
- S_b (pd.DataFrame or list or np.ndarray) – covariance matrix of parameter b. Defaults to [[]].
- forwardKwArgs (dict,optional) – additional keyword arguments for
forward
function. - multipleForwardKwArgs (dict,optional) – additional keyword arguments for forward function in case multiple
profiles should be provided to the forward operator at once. If not .
defined,
forwardKwArgs
is used instead andforward
is called for every profile separately - x_lowerLimit (dict, optional) – reset state vector x[key] to x_lowerLimit[key] in case x_lowerLimit is undercut. Defaults to {}.
- x_upperLimit (dict, optional) – reset state vector x[key] to x_upperLimit[key] in case x_upperLimit is exceeded. Defaults to {}.
- perturbation (float or dict of floats, optional) – relative perturbation of statet vector x to estimate the Jacobian. Can be specified for every element of x seperately. Defaults to 0.1 of prior.
- disturbance (float or dict of floats, optional) – DEPRECATED: Identical to
perturbation
option. If both option are provided,perturbation
is used instead. - useFactorInJac (bool,optional) – True if disturbance should be applied by multiplication, False if it should by applied by addition of fraction of prior. Defaults to False.
- gammaFactor (list of floats, optional) – Use additional gamma parameter for retrieval, see [3].
- convergenceTest ({'x', 'y', 'auto'}, optional) – Apply convergence test in x or y-space. If ‘auto’ is selected, the test will be done in x-space if len(x) <= len(y) and in y-space otherwise. Experience shows that in both cases convergence is faster in x-space without impacting retrieval quality. Defaults to ‘x’.
- convergenceFactor (int, optional) – Factor by which the convergence criterion needs to be smaller than len(x) or len(y)
- verbose (bool, optional) – True or not present: iteration, residual, etc. printed to screen during normal operation. If False, it will turn off such notifications.
-
converged
¶ True if retriveal converged successfully
Type: boolean
-
x_op
¶ optimal state given the observations, i.e. retrieval solution
Type: pd.Series
-
y_op
¶ Optimal y, i.e. observation associated with retrieval solution
Type: pd.Series
-
S_op
¶ covariance of x_op, i.e. solution uncertainty
Type: pd.DataFrame
-
x_op_err
¶ 1 sigma errors of x_op. derived with sqrt(diag(S_op))
Type: pd.Series
-
convI
¶ iteration where convergence was achieved
Type: int
-
K_i
¶ list of Jacobians for iteration i.
Type: list of pd.DataFrame
-
x_i
¶ iterations of state vector x
Type: list of pd.Series
-
y_i
¶ iterations of measurement vector y
Type: list of pd.Series
-
dgf_i
¶ degrees of freedom for each iteration
Type: list of float
-
A_i
¶ Averaging kernel for each iteration
Type: list of pd.DataFrame
-
d_i2
¶ convergence criteria for each iteration
Type: list of float
-
S_aposteriori_i
¶ a posteriori covariance matrix of x for each iteration
Type: list of pd.DataFrame
-
dgf
¶ total degrees of freedom for signal of the retrieval solution
Type: float
-
dgf_x
¶ degrees of freedom for signal per state variable
Type: pd.Series
Returns: returns the pyOptimalEstimation object Return type: pyOptimalEstimation object References
[1] (1, 2) Rodgers, C. D., 2000: Inverse Methods for Atmospheric Sounding: Theory and Practice. World Scientific Publishing Company, 240 pp. https://library.wmo.int/index.php?lvl=notice_display&id=12279.
[2] Maahn, M., D. D. Turner, U. Löhnert, D. J. Posselt, K. Ebell, G. G. Mace, and J. M. Comstock, 2020: Optimal Estimation Retrievals and Their Uncertainties: What Every Atmospheric Scientist Should Know. Bull. Amer. Meteor. Soc., 101, E1512–E1523, https://doi.org/10.1175/BAMS-D-19-0027.1.
[3] Turner, D. D., and U. Löhnert, 2014: Information Content and Uncertainties in Thermodynamic Profiles and Liquid Cloud Properties Retrieved from the Ground-Based Atmospheric Emitted Radiance Interferometer (AERI). Journal of Applied Meteorology & Climatology, 53, 752–771, doi:10.1175/JAMC-D-13-0126.1.
-
getJacobian
(xb, y)¶ Author: M. Echeverri, May 2021.
estimate Jacobian using the forward model and the specified perturbation
Parameters: - xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
- y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns: - pd.DataFrame – Jacobian around x
- pd.DataFrame – Jacobian around b
-
getJacobian_external
(xb, y)¶ Author: M. Echeverri, June 2021.
estimate Jacobian using the external function provided by user and the specified perturbation. This method has external dependencies
Parameters: - xb (pd.Series or list or np.ndarray) – combination of state vector x and parameter vector b
- y (pd.Series or list or np.ndarray) – measurement vector for xb
Returns: - pd.DataFrame – Jacobian around x
- pd.DataFrame – Jacobian around b
-
doRetrieval
(maxIter=10, x_0=None, maxTime=10000000.0)¶ run the retrieval
Parameters: - maxIter (int, optional) – maximum number of iterations, defaults to 10
- x_0 (pd.Series or list or np.ndarray, optional) – first guess for x. If x_0 == None, x_a is taken as first guess.
- maxTime (int, optional) – maximum runTime, defaults to 1e7 (~ 4 months). Note that the forward model is not killed if time is exceeded
Returns: True is convergence was obtained.
Return type: bool
-
y_a
¶ Estimate the observations corresponding to the prior.
-
linearityTest
(maxErrorPatterns=10, significance=0.05, atol=1e-05)¶ test whether the solution is moderately linear following chapter 5.1 of Rodgers 2000. values lower than 1 indicate that the effect of linearization is smaller than the measurement error and problem is nearly linear. Populates self.linearity.
Parameters: - maxErrorPatterns (int, optional) – maximum number of error patterns to return. Provide None to return
- all. –
- significance (real, optional) –
- significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected. Only used when testing against x_truth.
- atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns: - self.linearity (float) – ratio of error due to linearization to measurement error sorted by size. Should be below 1 for all.
- self.trueLinearityChi2 (float) – Chi2 value that model is moderately linear based on ‘self.x_truth’. Must be smaller than critical value to conclude thast model is linear.
- self.trueLinearityChi2Critical (float) – Corresponding critical Chi2 value.
-
chiSquareTest
(significance=0.05)¶ test with significance level ‘significance’ whether A) optimal solution agrees with observation in Y space B) observation agrees with prior in Y space C) optimal solution agrees with prior in Y space D) optimal solution agrees with priot in X space
Parameters: significance (real, optional) – - significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected.
Returns: - Pandas Series (dtype bool) – True if test is passed
- Pandas Series (dtype float) – Chi2 value for tests. Must be smaler than critical value to pass tests.
- Pandas Series (dtype float) – Critical Chi2 value for tests
-
chiSquareTestYOptimalObservation
(significance=0.05, atol=1e-05)¶ test with significance level ‘significance’ whether retrieval agrees with measurements (see chapter 12.3.2 of Rodgers, 2000)
Parameters: - significance (real, optional) –
- significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected.
- atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns: - chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
- chi2 (real) – chi² value
- chi2TestY (real) – chi² cutoff value with significance ‘significance’
- significance (real, optional) –
-
chiSquareTestYObservationPrior
(significance=0.05, atol=1e-05)¶ test with significance level ‘significance’ whether measurement agrees with prior (see chapter 12.3.3.1 of Rodgers, 2000)
Parameters: - significance (real, optional) –
- significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected.
- atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns: - YObservationPrior (bool) – True if chi² test passed, i.e. OE retrieval agrees with measurements and null hypothesis is NOT rejected.
- YObservationPrior (real) – chi² value
- chi2TestY (real) – chi² cutoff value with significance ‘significance’
- significance (real, optional) –
-
chiSquareTestYOptimalPrior
(significance=0.05, atol=1e-05)¶ test with significance level ‘significance’ whether retrieval result agrees with prior in y space (see chapter 12.3.3.3 of Rodgers, 2000)
Parameters: - significance (real, optional) –
- significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected.
- atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns: - chi2Passe (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
- chi2 (real) – chi² value
- chi2TestY (real) – chi² cutoff value with significance ‘significance’
- significance (real, optional) –
-
chiSquareTestXOptimalPrior
(significance=0.05, atol=1e-05)¶ test with significance level ‘significance’ whether retrieval agrees with prior in x space (see chapter 12.3.3.3 of Rodgers, 2000)
Parameters: - significance (real, optional) –
- significance level, defaults to 0.05, i.e. probability is 5% that
- correct null hypothesis is rejected.
- atol (float (default 1e-5)) – The absolute tolerance for comparing eigen values to zero. We found that values should be than the numpy.isclose defualt value of 1e-8.
Returns: - chi2Passed (bool) – True if chi² test passed, i.e. OE retrieval agrees with Prior and null hypothesis is NOT rejected.
- chi2 (real) – chi² value
- chi2TestX (real) – chi² cutoff value with significance ‘significance’
- significance (real, optional) –
-
saveResults
(fname)¶ Helper function to save a pyOptimalEstimation object. The forward operator is removed from the pyOptimalEstimation object before saving.
Parameters: fname (str) – filename Returns: Return type: None
-
plotIterations
(cmap='viridis', figsize=(8, 10), legend=True, mode='ratio')¶ Plot the retrieval results using 4 panels: (1) iterations of x (normalized to self.x_truth or x[0]), (2) iterations of y (normalized to y_obs), (3) iterations of degrees of freedom, (4) iterations of convergence criteria
Parameters: - fileName (str, optional) – plot is saved to fileName, if provided
- cmap (str, optional) – colormap for 1st and 2nd panel (default ‘hsv’)
- figsize (tuple, optional) – Figure size in inch (default (8, 10))
- legend (bool, optional) – Add legend for X and Y (defualt True)
- mode (str, optional) – plot ‘ratio’ or ‘difference’ to truth/prior/measurements (defualt: ratio)
Returns: The created figure.
Return type: matplotlib figure object
-
summary
(*args, **kwargs)¶
-
summarize
(returnXarray=False, combineXB=False)¶ Provide a summary of the retrieval results as a dictionary.
Parameters: - returnXarray ({bool}, optional) – return xarray dataset instead of dict. Can be easily combined when applying the retrieval multiple times. (the default is False)
- combineXB ({bool}, optional) – append b parameter values to state vector X variables. Can be useful for comparing runs with and without b parameters.
Returns: Summary of retrieval results
Return type: dict or xarray.Dataset
-
pyOptimalEstimation.pyOEcore.
optimalEstimation_loadResults
(fname, allow_pickle=True)¶ Helper function to load a saved pyOptimalEstimation object
Parameters: fname (str) – filename Returns: pyOptimalEstimation obtained from file. Return type: pyOptimalEstimation object
-
pyOptimalEstimation.pyOEcore.
invertMatrix
(A, raise_error=True)¶ Wrapper funtion for np.linalg.inv, because original function reports LinAlgError if nan in array for some numpy versions. We want that the retrieval is robust with respect to that. Also, checks for singular matrices were added.
Parameters: - A ((.., M, M) array_like) – Matrix to be inverted.
- raise_error ({bool}, optional) – ValueError is raised if A is singular (the default is True)
Returns: Ainv – Inverse of the matrix A.
Return type: (.., M, M) ndarray or matrix