Research Interests
- NLP for Low-Resource Languages: Enhancing the transparency, inclusivity, and efficiency of
language technology for underrepresented languages.
- LLMs for Language Science: Leveraging LLMs to analyze linguistic phenomena, such as language
contact and acquisition, for insights into how human languages function and evolve.
- Complex Reasoning with LLMs: Unlocking the potential of LLMs to tackle complex problems and
building trustworthy, socially beneficial systems for specialized domains.
News
- [NOV 2025] Involved in organizing the WiNLP 2025 workshop at EMNLP 2025.
- [MAY 2025] Our papers on low-resource languages were accepted to ACL 2025.
- [MAR 2025] Released MiLiC-Eval, an NLP evaluation suite for China's minority languages.
Contact
Wangxuan Institute of Computer Technology, Peking University
No. 128 Zhongguancun North Street
Haidian District, Beijing, 100871
zhangch [at] pku [dot] edu [dot] cn
* denotes equal contribution.
Following the #BenderRule, languages are specified for each work.
2025
Read it in Two Steps: Translating Extremely Low-Resource Languages with
Code-Augmented Grammar Books
ACL 2025
Chen Zhang*, Jiuheng Lin*, Xiao Liu, Zekai Zhang, Yansong Feng
[paper] [github]
Zhuang, Kalamang
Cross-Lingual Transfer of Cultural Knowledge: An Asymmetric Phenomenon
ACL 2025
Chen Zhang, Zhiyuan Liao, Yansong Feng
[paper] [github]
English, Chinese, Korean, Tibetan, Mongolian
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
ACL 2025 (Findings)
Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng
[paper] [github] [huggingface]
Tibetan, Uyghur, Kazakh, Mongolian
Eliciting and Improving the Causal Reasoning Abilities of Large Language Models
with Conditional Statements
Computational Linguistics 2025
Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, Dongyan Zhao
[paper]
English
2024
Unlocking the Potential of Model Merging for Low-Resource Languages
EMNLP 2024 (Findings)
Mingxu Tao*, Chen Zhang*, Quzhe Huang*, Tianyao Ma, Songfang Huang,
Dongyan Zhao, Yansong Feng
[paper] [huggingface]
Tibetan, Uyghur, Mongolian, Tamil, Telugu, Odia, Bengali
Teaching Large Language Models an Unseen Language on the Fly
ACL 2024 (Findings)
Chen Zhang, Xiao Liu, Jiuheng Lin, Yansong Feng
[paper] [github] [website]
Zhuang, Kalamang, and 7 other mid-resource languages
MC2: Towards Transparent and Culturally-Aware NLP for Minority
Languages in China
ACL 2024
Chen Zhang*, Mingxu Tao*, Quzhe Huang*, Jiuheng Lin*, Zhibin Chen,
Yansong Feng
[paper] [github] [website]
Tibetan, Uyghur, Kazakh, Mongolian
Harder Task Needs More Experts: Dynamic Routing in MoE Models
ACL 2024
Quzhe Huang*, Zhenwei An*, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Kun Xu,
Liwei Chen, Songfang Huang, Yansong Feng
[paper]
English
Can LLMs Learn a New Language on the Fly? A Case Study on Zhuang
ICLR 2024 Tiny Paper
Chen Zhang, Mingxu Tao, Quzhe Huang, Zhibin Chen, Yansong Feng
[paper]
Zhuang
Can Perplexity Reflect Large Language Model's Ability in Long Text
Understanding?
ICLR 2024 Tiny Paper
Yutong Hu, Quzhe Huang, Mingxu Tao, Chen Zhang, Yansong Feng
[paper]
English
2023
Lawyer LLaMA: Enhancing LLMs with Legal Knowledge
arXiv 2305.15062
Quzhe Huang*, Mingxu Tao*, Chen Zhang*, Zhenwei An*, Cong Jiang, Zhibin Chen, Zirui Wu,
Yansong Feng
[preprint] [github]
Chinese
How Many Answers Should I Give? An Empirical Study of Multi-Answer Reading
Comprehension
ACL 2023 (Findings)
Chen Zhang, Jiuheng Lin, Xiao Liu, Yuxuan Lai, Yansong Feng, Dongyan Zhao
[paper] [github]
English
The Magic of IF: Investigating Causal Reasoning Abilities in Large Language
Models of Code
ACL 2023 (Findings)
Xiao Liu, Da Yin, Chen Zhang, Yansong Feng, Dongyan Zhao
[paper] [github]
English
Relation-Aware Question Answering for Heterogeneous Knowledge Graphs
EMNLP 2023 (Findings)
Haowei Du, Quzhe Huang, Chen Li, Chen Zhang, Yang Li, Dongyan Zhao
[paper]
English
Cross-Lingual Question Answering over Knowledge Base as Reading
Comprehension
EACL 2023 (Findings)
Chen Zhang, Yuxuan Lai, Yansong Feng, Xingyu Shen, Haowei Du, Dongyan
Zhao
[paper] [github]
Chinese, Persian, German, Romanian, Italian, Russian, French, Dutch, Spanish,
Hindi, Portuguese
UnifEE: Unified Evidence Extraction for Fact Verification
EACL 2023
Nan Hu, Zirui Wu, Yuxuan Lai, Chen Zhang, Yansong Feng
[paper] [github]
English
2022 and before
Knowledge-Enhanced Iterative Instruction Generation and Reasoning for Knowledge
Base Question Answering
NLPCC 2022
Haowei Du, Quzhe Huang, Chen Zhang, Dongyan Zhao
[paper] [preprint]
English
Extract, Integrate, Compete: Towards Verification Style Reading
Comprehension
EMNLP 2021 (Findings)
Chen Zhang, Yuxuan Lai, Yansong Feng, Dongyan Zhao
[paper] [github]
Chinese
A review of deep learning in question answering over knowledge bases
AI Open 2021, Volume 2
Chen Zhang, Yuxuan Lai, Yansong Feng, Dongyan Zhao
[paper]
English
Why Machine Reading Comprehension Models Learn Shortcuts?
ACL-IJCNLP 2021 (Findings)
Yuxuan Lai, Chen Zhang, Yansong Feng, Quzhe Huang, Dongyan Zhao
[paper] [github]
English
Academic Service
- Area Chair: ACL Rolling Review, LREC
- Reviewer: ACL Rolling Review (Great Reviewer x3), ACL, EMNLP, LREC, COLING, NLPCC (Best
Reviewer Award 2025)
- Workshop Organizer: Widening NLP (WiNLP) 2025
- Student Volunteer: ACL 2025, ACL 2024, EMNLP 2021 (remote)
Open-Source Artifacts
- ZhuangBench: A benchmark consisting of a small set of Zhuang–Chinese parallel sentences and a
Zhuang dictionary, designed to evaluate whether LLMs can comprehend an unseen language on the fly.
Now part of LongBench v2.
[github] [paper]
- ZhuangRules: A collection of Zhuang grammar rules for evaluating whether LLMs can effectively
leverage grammar books in low-resource language understanding.
[github] [paper]
- MC2: A web-crawled corpus covering four minority languages in China:
Tibetan, Uyghur, Kazakh, and Mongolian.
[github] [huggingface] [paper]
- MiLiC-Eval: A multi-task evaluation benchmark for four minority languages in China: Tibetan,
Uyghur, Kazakh, and Mongolian.
[github] [huggingface] [paper]
- Lawyer LLaMA: A Chinese legal-domain LLM based on Llama 2, trained on synthetic legal consultation data.
[github] [huggingface] [report]
Teaching
Teaching Assistant @ Peking University
- Foundations of Natural Language Processing (Spring 2024, 2025)
- Empirical Methods for Natural Language Processing (Spring 2022)
- Data Structures and Algorithms (B) (Fall 2020, Spring 2021)
Honors & Awards
- President Scholarship (校长奖学金), Peking University (2025)
- Award for Scientific Research (优秀科研奖), Peking University (2024)
- Outstanding Graduate of Beijing (北京市优秀毕业生), Beijing Municipal Education Commission (2021)
- Outstanding Graduate (北京大学优秀毕业生), Peking University (2021)
- Best Project, Google ML Winter Camp (2020) [link]
- Founder Scholarship (方正奖学金), Peking University (2018, 2019)