Publications
How Do LLMs Perform Two-Hop Reasoning in Context?
Tianyu Guo*, Hanlin Zhu*, Ruiqi Zhang, Jiantao Jiao, Song Mei, Michael I. Jordan, Stuart Russell
preprint, 2025
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
DiJia Su, Hanlin Zhu*, Yingchen Xu*, Jiantao Jiao, Yuandong Tian*, Qinqing Zheng*
preprint, 2025
Towards a Theoretical Understanding of the ‘Reversal Curse’ via Training Dynamics
Hanlin Zhu*, Baihe Huang*, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell
Conference on Neural Information Processing Systems (NeurIPS), 2024
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang, Kevin Yang, Hanlin Zhu, Xiaomeng Yang, Andrew Cohen, Lei Li, Yuandong Tian
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Starling-7B: Improving Helpfulness and Harmlessness with RLAIF
Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, Jiantao Jiao
Conference on Language Modeling (COLM), 2024 (Oral)
Avoiding Catastrophe in Continuous Spaces by Asking for Help
Benjamin Plaut, Hanlin Zhu, Stuart Russell
preprint, 2024
Efficient Prompt Caching via Embedding Similarity
Hanlin Zhu, Banghua Zhu, Jiantao Jiao
preprint, 2024
Towards Optimal Statistical Watermarking
Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael Jordan, Jason Lee, Jiantao Jiao
preprint, 2024
On Representation Complexity of Model-based and Model-free Reinforcement Learning
Hanlin Zhu*, Baihe Huang*, Stuart Russell
International Conference on Learning Representations (ICLR), 2024
End-to-end Story Plot Generator
Hanlin Zhu*, Andrew Cohen*, Danqing Wang, Kevin Yang, Xiaomeng Yang, Jiantao Jiao, Yuandong Tian
preprint, 2023
Hanlin Zhu, Amy Zhang
Conference on Neural Information Processing Systems (NeurIPS), 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Hanlin Zhu, Paria Rashidinejad, Jiantao Jiao
Conference on Neural Information Processing Systems (NeurIPS), 2023
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao
International Conference on Learning Representations (ICLR), 2023 (Spotlight)
Provably Efficient Reinforcement Learning via Surprise Bound
Hanlin Zhu, Ruosong Wang, Jason Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Average-Case Communication Complexity of Statistical Problems
Cyrus Rashtchian, David P. Woodruff, Peng Ye, Hanlin Zhu ($\alpha$-$\beta$ order)
Conference on Learning Theory (COLT), 2021
Vector-Matrix-Vector Queries for Solving Linear Algebra, Statistics, and Graph Problems
Cyrus Rashtchian, David P. Woodruff, Hanlin Zhu ($\alpha$-$\beta$ order)
International Conference on Randomization and Computation (RANDOM), 2020
Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
Ryuichi Takanobu, Hanlin Zhu, Minlie Huang
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 (Oral)