Title: Multi-Objective Reinforcement Learning for Optimizing Tolerant Dynamic Decision Rules
Time: Tuesday, July 22, 2025, 15:00-17:00
Venue: Third Conference Room, 1st Floor, Tower A, Feicui Science and Education Building
Speaker: Prof. Lu Wang (王璐)
Affiliation: University of Michigan, USA
Abstract:
Many real-world problems involve multiple competing priorities, and decision rules differ when trade-offs are present. Correspondingly, there may be more than one feasible decision that leads to empirically sufficient optimization. In this talk, we present the concept of a “tolerant regime,” which provides a set of individualized feasible decision rules under a prespecified tolerance rate. A multi-objective tree-based reinforcement learning (MOT-RL) method is developed to directly estimate the tolerant dynamic treatment regime (tDTR) that optimizes multiple objectives in a multistage, multi-treatment setting. At each stage, MOT-RL constructs an unsupervised decision tree by modeling the counterfactual mean outcome of each objective via semiparametric regression and maximizing a purity measure constructed from the scalarized augmented inverse probability weighted estimator (SAIPWE). The algorithm is implemented in a backward-inductive manner through multiple decision stages, and it estimates the optimal DTR and tDTR depending on the decision-maker’s preferences. Multi-objective tree-based reinforcement learning is robust, efficient, easy to interpret, and flexible across different settings.
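The “tolerant regime” idea can be illustrated with a minimal sketch: scalarize per-treatment value estimates for several objectives with preference weights, then keep every treatment whose scalarized value falls within a prespecified tolerance of the best. The function name, example data, and equal preference weights below are hypothetical illustrations, not the talk's actual estimator:

```python
import numpy as np

def tolerant_set(values, weights, tolerance):
    """Hypothetical sketch of a tolerant decision set.

    values:    (n_treatments, n_objectives) estimated mean outcomes
    weights:   preference weights over objectives
    tolerance: allowed gap from the best scalarized value
    """
    scalarized = values @ np.asarray(weights)   # weighted sum per treatment
    best = scalarized.max()
    # Every treatment within `tolerance` of the best is deemed feasible.
    return [int(a) for a in np.flatnonzero(scalarized >= best - tolerance)]

# Example: 3 treatments, 2 objectives (e.g., efficacy and safety)
vals = np.array([[0.80, 0.60],
                 [0.78, 0.70],
                 [0.50, 0.80]])
print(tolerant_set(vals, [0.5, 0.5], 0.05))  # → [0, 1]
```

Here treatments 0 and 1 are both retained because their scalarized values (0.70 and 0.74) lie within 0.05 of the maximum, giving the decision-maker a set of near-optimal options rather than a single rule.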
About the speaker:
Lu Wang is a Professor in the Department of Biostatistics at the University of Michigan, a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute (ISI). She received her bachelor's degree from Peking University in 2002 and her Ph.D. from Harvard University in 2008. Her research interests include statistical methods for evaluating dynamic treatment regimes, personalized medicine, causal inference, nonparametric and semiparametric regression, missing data analysis, and longitudinal (correlated/clustered) data analysis. She has published more than 180 papers in journals including JASA, Biometrika, Biometrics, and the Annals of Applied Statistics, and has co-authored a book chapter. She currently serves as an Associate Editor for JASA and Biometrics.