Andrew Tang

Hi! I'm a Computer Science PhD student at Columbia University, advised by Vishal Misra and Dan Rubenstein. I study retrieval-augmented LLMs—how models use retrieved evidence to reason and act under real-world constraints, and the retrieval systems that enable them.

Research interests: agentic LLMs, long-horizon planning, external tool use, and interpretability.

Email / Google Scholar / X / LinkedIn

Current Projects

Mosaic-Risk Modeling with Multi-Hop RAG
Andrew Tang, Aria Vikram, Columbia History Lab, Vishal Misra
NSF EAGER, WIP, Updated July 2025

We are building evaluation pipelines that chain open and declassified documents to estimate when sensitive facts can be inferred (“mosaic risk”). Our goal is to measure the capabilities and limits of RAG pipelines for mosaic risk prediction.

Budget-Constrained Promotions via Decision Transformers
Andrew Tang, Panos Karampourniotis, Brett Gohre, Vijay Pappu, Dan Rubenstein, Vishal Misra
Industry Collaboration (Dream Sports), WIP, Updated July 2025

We adapt Decision Transformers to long-horizon marketing actions under a strict global spending budget. We explore safe-RL-style preference tuning and counterfactual evaluation on logged trajectories to optimize retention per dollar.

TokenProbe: Visualizing LLM Learning in Real-Time
Andrew Tang, Amy Wu, Rashfiqur Rahman, Charlie Kerfoot, Vishal Misra
WIP, Updated June 2025
[website]

TokenProbe is a lightweight dashboard that streams token logits, probabilities, and entropy during inference to spot when and where models learn concepts, experience mode collapse, or forget. We study links to curriculum design and catastrophic-drift debugging.

Publications

ClusterSC: Advancing Synthetic Control with Donor Selection
Saeyoung Rho, Andrew Tang, Noah Bergam, Rachel Cummings, Vishal Misra
AISTATS, 2025
[arxiv] [code]

We propose ClusterSC to mitigate noise and the curse of dimensionality in disaggregate-level synthetic control by uncovering latent donor subgroups. We provide theoretical guarantees and show significant MSE improvements on synthetic and real-world datasets.

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi "Jim" Fan, Guanzhi Wang*, Yunfan Jiang*, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu†, Anima Anandkumar†
NeurIPS Datasets and Benchmarks Track, 2022 (Outstanding Paper Award)
[arxiv] [website] [code]

We introduced an open-ended benchmark suite for embodied agent research, built on Minecraft and backed by a web-scale knowledge base.
My role: I built the multimodal data pipeline for Minecraft Wiki and Reddit. I am deeply grateful for this early project, which inspired me to pursue agentic AI research.

Last updated July 2025. Design and source code from Jon Barron's website.