# Confident off-policy evaluation and selection through self-normalized importance weighting

This package contains implementations of commonly used estimators for off-policy evaluation, the corresponding high-probability lower bounds on the value, and a new Efron-Stein type bound for the self-normalized estimator described in ["Kuzborskij, I., Vernade, C., Gyorgy, A., & Szepesvári, C. (2021, March). Confident off-policy evaluation and selection through self-normalized importance weighting. In International Conference on Artificial Intelligence and Statistics (pp. 640-648). PMLR."](https://arxiv.org/abs/2006.10460).

The package also contains the off-policy selection benchmark used in the paper above and a setup to reproduce most of the results.

## Setup

To install the necessary packages, execute:

```
python3 -m venv /tmp/eslb_venv
source /tmp/eslb_venv/bin/activate
pip3 install --upgrade pip setuptools wheel
pip3 install -r offpolicy_selection_eslb/requirements.txt
```

## Usage

To run the benchmark on the UCI datasets considered in the paper, execute:

```
offpolicy_selection_eslb/demo/run.sh
```

or

```
python3 -m offpolicy_selection_eslb.demo.benchmark --dataset_type=uci_all --n_trials=5 --delta=0.01
```

This runs the full benchmark. To run on a smaller subset of the UCI datasets, replace `uci_all` in the command above with `uci_medium` or `uci_small`.
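For orientation on what the benchmark compares, the two basic point estimators (importance weighted and self-normalized importance weighted) can be sketched in plain NumPy. This is a minimal illustration and does not use this package's API: the function names, the context-free policies, and the synthetic reward model below are all assumptions made for the example.

```python
import numpy as np


def importance_weights(target_probs, behavior_probs):
    """Per-sample weights w_i = pi(a_i | x_i) / pi_b(a_i | x_i)."""
    target_probs = np.asarray(target_probs, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)
    return target_probs / behavior_probs


def iw_estimate(weights, rewards):
    """Importance weighted (IW) value estimate: mean of w_i * r_i."""
    return float(np.mean(weights * rewards))


def sniw_estimate(weights, rewards):
    """Self-normalized IW estimate: sum(w_i * r_i) / sum(w_i)."""
    return float(np.sum(weights * rewards) / np.sum(weights))


# Synthetic logged data (hypothetical numbers, for illustration only).
rng = np.random.default_rng(0)
n, n_actions = 1000, 3
behavior = np.array([0.6, 0.3, 0.1])     # logging policy, context-free here
target = np.array([0.1, 0.3, 0.6])       # policy being evaluated
mean_reward = np.array([0.2, 0.5, 0.8])  # Bernoulli reward mean per action

# Actions are drawn from the behavior policy; rewards depend on the action.
actions = rng.choice(n_actions, size=n, p=behavior)
rewards = rng.binomial(1, mean_reward[actions]).astype(float)

w = importance_weights(target[actions], behavior[actions])
print("true value:   ", float(target @ mean_reward))
print("IW estimate:  ", iw_estimate(w, rewards))
print("SNIW estimate:", sniw_estimate(w, rewards))
```

Because the self-normalized estimate is a weighted average of the observed rewards, it always stays within the range of those rewards, whereas the plain IW estimate can fall outside it when a few weights are large.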
## Examples

`colabs/eslb_synthetic_example.ipynb` contains a standalone example demonstrating the usage of our estimator on synthetic data.

## Usage as a library

The `estimators` module contains several classes implementing estimators that can be used in a standalone fashion. In particular:

* `ESLB` implements an Efron-Stein high-probability bound for off-policy evaluation (Theorem 1 and Algorithm 1).
* `IWEstimator` implements the standard importance weighted estimator (IW).
* `SNIWEstimator` implements a self-normalized version of IW.
* `IWLambdaEmpBernsteinEstimator` implements a high-probability empirical Bernstein bound for λ-corrected IW (the estimator is stabilized by adding λ to the denominator) with appropriate tuning of λ (see Proposition 1).

See `colabs/eslb_synthetic_example.ipynb` for a usage example.

## Citing this work

If you use this code, please cite our work:

```
@InProceedings{pmlr-v130-kuzborskij21a,
  title = {Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting},
  author = {Kuzborskij, Ilja and Vernade, Claire and Gyorgy, Andras and Szepesvari, Csaba},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  year = {2021}
}
```

## Disclaimer

This is not an official Google product.