Zeming Wei
Undergraduate, Peking University
Verified email at stu.pku.edu.cn
Title
Cited by
Year
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Z Wei, Y Wang, A Li, Y Mo, Y Wang
arXiv preprint arXiv:2310.06387, 2023
Cited by: 56, Year: 2023
CFA: Class-wise Calibrated Fair Adversarial Training
Z Wei, Y Wang, Y Guo, Y Wang
CVPR 2023, 2023
Cited by: 27, Year: 2023
Jatmo: Prompt injection defense by task-specific finetuning
J Piet, M Alrashed, C Sitawarin, S Chen, Z Wei, B Alomair, D Wagner
ESORICS 2024, 2024
Cited by: 13, Year: 2024
Sharpness-Aware Minimization Alone can Improve Adversarial Robustness
Z Wei✉, J Zhu, Y Zhang
ICML 2023 Workshop on New Frontiers in Adversarial Machine Learning, 2023
Cited by: 9*, Year: 2023
Using Z3 for Formal Modeling and Verification of FNN Global Robustness
Y Zhang, Z Wei, X Zhang, M Sun
SEKE 2023, 2023
Cited by: 6, Year: 2023
Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages
Z Wei, X Zhang, M Sun
ICFEM 2022, 2022
Cited by: 6, Year: 2022
Fight back against jailbreaking via prompt adversarial tuning
Y Mo, Y Wang, Z Wei, Y Wang
ICLR 2024 Workshop on Secure and Trustworthy Large Language Models, 2024
Cited by: 5*, Year: 2024
Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks
Z Wei, X Zhang, Y Zhang, M Sun
Journal of Logical and Algebraic Methods in Programming 136, 100907, 2023
Cited by: 5, Year: 2023
Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning
X Guo, Y Wang, Z Wei, Y Wang
NeurIPS 2023, 2023
Cited by: 4, Year: 2023
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Y Zhang, H He, J Zhu, H Chen, Y Wang, Z Wei✉
ICML 2024, 2024
Cited by: 3, Year: 2024
Boosting Jailbreak Attack with Momentum
Y Zhang, Z Wei✉
ICLR 2024 Workshop on Reliable and Responsible Foundation Models, 2024
Cited by: 2, Year: 2024
Exploring the Robustness of In-Context Learning with Noisy Labels
C Cheng, X Yu, H Wen, J Sun, G Yue, Y Zhang, Z Wei✉
ICLR 2024 Workshop on Reliable and Responsible Foundation Models, 2024
Cited by: 2, Year: 2024
Characterizing Robust Overfitting in Adversarial Training via Cross-Class Features
Z Wei, Y Guo, Y Wang
OpenReview preprint, 2023
Cited by: 1, Year: 2023
A Theoretical Understanding of Self-Correction through In-context Alignment
Y Wang, Y Wu, Z Wei, S Jegelka, Y Wang
arXiv preprint arXiv:2405.18634, 2024
Year: 2024
Towards General Conceptual Model Editing via Adversarial Representation Engineering
Y Zhang, Z Wei, J Sun, M Sun
arXiv preprint arXiv:2404.13752, 2024
Year: 2024