* Shared first authorship
Shitian Zhao*,
Haoquan Zhang*,
Shaoheng Lin*,
Ming Li*,
Qilong Wu*,
Kaipeng Zhang,
Chen Wei
PyVision is an interactive framework in which a multimodal large language model (MLLM) autonomously generates, executes, and iteratively refines Python code in response to multimodal queries.
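The generate, execute, and refine cycle described above can be illustrated with a minimal sketch. This is not PyVision's actual implementation: `run_code`, `interactive_loop`, and `stub_model` are hypothetical names, the model is replaced by a hard-coded stub, and the loop stops at the first error-free execution, which is a simplification. It only shows how execution feedback from one turn can inform the code proposed in the next.

```python
import contextlib
import io
import traceback
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (code, execution feedback) per turn


def run_code(code: str, namespace: dict) -> Tuple[bool, str]:
    """Run code in a persistent namespace; return (ok, stdout or traceback).

    Note: exec() is unsafe for untrusted code; a real system would sandbox it.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        return True, buffer.getvalue()
    except Exception:
        return False, buffer.getvalue() + traceback.format_exc()


def interactive_loop(
    query: str,
    propose_code: Callable[[str, History], str],
    max_turns: int = 3,
) -> Tuple[str, History]:
    """Ask the model for code, execute it, and feed the result back until it runs."""
    namespace: dict = {}   # variables persist across turns
    history: History = []
    for _ in range(max_turns):
        code = propose_code(query, history)
        ok, feedback = run_code(code, namespace)
        history.append((code, feedback))
        if ok:  # assumption: first clean run is treated as the answer
            return feedback.strip(), history
    return "", history


def stub_model(query: str, history: History) -> str:
    """Hypothetical stand-in for an MLLM: buggy first attempt, then a fix."""
    if not history:
        return "print(sum(pixels) / len(pixels))"  # fails: pixels is undefined
    return "pixels = [10, 20, 30]\nprint(sum(pixels) / len(pixels))"


answer, history = interactive_loop("What is the mean pixel value?", stub_model)
print(answer)
```

In this sketch the first turn fails with a `NameError`, the traceback is stored in `history`, and the stub uses the presence of that history to return corrected code on the second turn.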