Zejun Li
Verified email at fudan.edu.cn
Title
Cited by
Year
Tcic: Theme concepts learning cross language and vision for image captioning
Z Fan, Z Wei, S Wang, R Wang, Z Li, H Shan, X Huang
arXiv preprint arXiv:2106.10936, 2021
Cited by: 28 | Year: 2021
Mvptr: Multi-level semantic alignment for vision-language pre-training via multi-stage learning
Z Li, Z Fan, H Tou, J Chen, Z Wei, X Huang
Proceedings of the 30th ACM International Conference on Multimedia, 4395-4405, 2022
Cited by: 17 | Year: 2022
Mvp: Multi-stage vision-language pre-training via multi-level semantic alignment
Z Li, Z Fan, H Tou, Z Wei
arXiv preprint arXiv:2201.12596, 2022
Cited by: 13 | Year: 2022
Unifying cross-lingual and cross-modal modeling towards weakly supervised multilingual vision-language pre-training
Z Li, Z Fan, J Chen, Q Zhang, XJ Huang, Z Wei
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 11 | Year: 2023
Constructing phrase-level semantic labels to form multi-grained supervision for image-text retrieval
Z Fan, Z Wei, Z Li, S Wang, H Shan, X Huang, J Fan
Proceedings of the 2022 International Conference on Multimedia Retrieval …, 2022
Cited by: 10 | Year: 2022
Negative sample is negative in its own way: Tailoring negative sentences for image-text retrieval
Z Fan, Z Wei, Z Li, S Wang, J Fan
arXiv preprint arXiv:2111.03349, 2021
Cited by: 7 | Year: 2021
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
M Du, B Wu, Z Li, X Huang, Z Wei
arXiv preprint arXiv:2406.05756, 2024
Cited by: 5 | Year: 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Z Li, R Luo, J Zhang, M Qiu, Z Wei
arXiv preprint arXiv:2405.16919, 2024
Cited by: 5 | Year: 2024
Unifying Local and Global Knowledge: Empowering Large Language Models as Political Experts with Knowledge Graphs
X Mou, Z Li, H Lyu, J Luo, Z Wei
Proceedings of the ACM on Web Conference 2024, 2603-2614, 2024
Cited by: 4 | Year: 2024
A unified continuous learning framework for multi-modal knowledge discovery and pre-training
Z Fan, Z Wei, J Chen, S Wang, Z Li, J Xu, X Huang
arXiv preprint arXiv:2206.05555, 2022
Cited by: 4 | Year: 2022
An unsupervised sampling approach for image-sentence matching using document-level structural information
Z Li, Z Wei, Z Fan, H Shan, X Huang
Proceedings of the AAAI Conference on Artificial Intelligence 35 (15), 13324 …, 2021
Cited by: 4 | Year: 2021
Reform-eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks
Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang, C Zhou, Z Fan, J Fu, J Chen, ...
Proceedings of the 32nd ACM International Conference on Multimedia, 1971-1980, 2024
Cited by: 3 | Year: 2024
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
M Du, B Wu, J Zhang, Z Fan, Z Li, R Luo, X Huang, Z Wei
arXiv preprint arXiv:2404.01994, 2024
Cited by: 2 | Year: 2024
Continuous or Discrete, That Is the Question: A Survey on Large Multi-Modal Models from the Perspective of Input-Output Space Extension
Z Li, J Zhang, D Wang, Y Wang, X Huang, Z Wei
2024