Zeming Wei

Cited by

	All	Since 2019
Citations	194	194
h-index	7	7
i10-index	4	4

Citations per year: 45 (2023), 149 (2024)

Public access (based on funding mandates): 3 articles available, 0 articles not available

Co-authors

Yifei Wang	MIT	Verified email at mit.edu
Yihao Zhang	Peking University	Verified email at stu.pku.edu.cn
Yisen Wang	Assistant Professor, Peking University	Verified email at pku.edu.cn
Meng Sun	Professor, School of Mathematical Science, Peking University	Verified email at math.pku.edu.cn
Xiyue Zhang	University of Oxford	Verified email at cs.ox.ac.uk
Jingyu Zhu		Verified email at stu.pku.edu.cn
Chawin Sitawarin	Postdoctoral Researcher @ Meta	Verified email at meta.com
David Wagner	Professor of Computer Science, UC Berkeley	Verified email at cs.berkeley.edu
Yichuan Mo	Ph.D. Candidate, Peking University	Verified email at stu.pku.edu.cn
Huanran Chen	Undergraduate, Beijing Institute of Technology	Verified email at bit.edu.cn
Hangzhou He	Peking University	Verified email at stu.pku.edu.cn
Sun Jun	Professor of SCIS, SMU	Verified email at smu.edu.sg
Stefanie Jegelka	TUM and MIT	Verified email at mit.edu

Zeming Wei

Undergraduate, Peking University

Verified email at stu.pku.edu.cn

Research interests: Trustworthy AI, Adversarial Robustness, Explainability


Title	Authors	Venue	Cited by	Year
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations	Z Wei, Y Wang, A Li, Y Mo, Y Wang	arXiv preprint arXiv:2310.06387, 2023	82	2023
CFA: Class-wise Calibrated Fair Adversarial Training	Z Wei, Y Wang, Y Guo, Y Wang	CVPR 2023, 2023	32	2023
Jatmo: Prompt injection defense by task-specific finetuning	J Piet, M Alrashed, C Sitawarin, S Chen, Z Wei, B Alomair, D Wagner	ESORICS 2024, 2024	20	2024
Sharpness-Aware Minimization Alone can Improve Adversarial Robustness	Z Wei✉️, J Zhu, Y Zhang	ICML 2023 Workshop on New Frontiers in Adversarial Machine Learning, 2023	12*	2023
Fight back against jailbreaking via prompt adversarial tuning	Y Mo, Y Wang, Z Wei, Y Wang	ICLR 2024 Workshop on Secure and Trustworthy Large Language Models, 2024	8*	2024
Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages	Z Wei, X Zhang, M Sun	ICFEM 2022, 2022	8	2022
Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks	Z Wei, X Zhang, Y Zhang, M Sun	Journal of Logical and Algebraic Methods in Programming 136, 100907, 2023	7	2023
Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning	X Guo, Y Wang, Z Wei, Y Wang	NeurIPS 2023, 2023	6	2023
Using Z3 for Formal Modeling and Verification of FNN Global Robustness	Y Zhang, Z Wei, X Zhang, M Sun	SEKE 2023, 2023	6	2023
Boosting Jailbreak Attack with Momentum	Y Zhang, Z Wei✉️	ICLR 2024 Workshop on Reliable and Responsible Foundation Models, 2024	5	2024
On the Duality Between Sharpness-Aware Minimization and Adversarial Training	Y Zhang, H He, J Zhu, H Chen, Y Wang, Z Wei✉️	ICML 2024, 2024	4	2024
Exploring the Robustness of In-Context Learning with Noisy Labels	C Cheng, X Yu, H Wen, J Sun, G Yue, Y Zhang, Z Wei✉️	ICLR 2024 Workshop on Reliable and Responsible Foundation Models, 2024	3	2024
Characterizing Robust Overfitting in Adversarial Training via Cross-Class Features	Z Wei, Y Guo, Y Wang	OpenReview preprint, 2023	1	2023
Automata Extraction from Transformers	Y Zhang, Z Wei, M Sun	arXiv preprint arXiv:2406.05564, 2024		2024
A Theoretical Understanding of Self-Correction through In-context Alignment	Y Wang, Y Wu, Z Wei, S Jegelka, Y Wang	ICML 2024 Workshop on In-Context Learning, 2024		2024
Towards General Conceptual Model Editing via Adversarial Representation Engineering	Y Zhang, Z Wei, J Sun, M Sun	arXiv preprint arXiv:2404.13752, 2024		2024
