mPLUG-Owl: Modularization empowers large language models with multimodality Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ... arXiv preprint arXiv:2304.14178, 2023 | 617 | 2023 |
X-CLIP: End-to-end multi-grained contrastive learning for video-text retrieval Y Ma, G Xu, X Sun, M Yan, J Zhang, R Ji Proceedings of the 30th ACM International Conference on Multimedia, 638-647, 2022 | 205 | 2022 |
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu, Q Qian, J Zhang, F Huang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 175 | 2024 |
mPLUG-2: A modularized multi-modal foundation model across text, image and video H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li, B Bi, Q Qian, W Wang, G Xu, ... International Conference on Machine Learning, 38728-38748, 2023 | 102 | 2023 |
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections C Li, H Xu, J Tian, W Wang, M Yan, B Bi, J Ye, H Chen, G Xu, Z Cao, ... arXiv preprint arXiv:2205.12005, 2022 | 97 | 2022 |
Semi-autoregressive neural machine translation C Wang, J Zhang, H Chen arXiv preprint arXiv:1808.08583, 2018 | 91 | 2018 |
mPLUG-DocOwl: Modularized multimodal large language model for document understanding J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ... arXiv preprint arXiv:2307.02499, 2023 | 76 | 2023 |
Evaluation and analysis of hallucination in large vision-language models J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ... arXiv preprint arXiv:2308.15126, 2023 | 70 | 2023 |
UReader: Universal OCR-free visually-situated language understanding with multimodal large language model J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ... arXiv preprint arXiv:2310.05126, 2023 | 69 | 2023 |
AliMeKG: Domain knowledge graph construction and application in e-commerce FL Li, H Chen, G Xu, T Qiu, F Ji, J Zhang, H Chen Proceedings of the 29th ACM International Conference on Information …, 2020 | 68 | 2020 |
A deep cascade model for multi-document reading comprehension M Yan, J Xia, C Wu, B Bi, Z Zhao, J Zhang, L Si, R Wang, W Wang, ... Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 7354-7361, 2019 | 61 | 2019 |
ROSITA: Enhancing vision-and-language semantic alignments via cross- and intra-modal knowledge integration Y Cui, Z Yu, C Wang, Z Zhao, J Zhang, M Wang, J Yu Proceedings of the 29th ACM International Conference on Multimedia, 797-806, 2021 | 58 | 2021 |
HiTeA: Hierarchical temporal-aware video-language pre-training Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 57 | 2023 |
Shifting more attention to visual backbone: Query-modulated refinement networks for end-to-end visual grounding J Ye, J Tian, M Yan, X Yang, X Wang, J Zhang, L He, X Lin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 54 | 2022 |
CValues: Measuring the values of Chinese large language models from safety to responsibility G Xu, J Liu, M Yan, H Xu, J Si, Z Zhou, P Yi, X Gao, J Sang, R Zhang, ... arXiv preprint arXiv:2307.09705, 2023 | 50 | 2023 |
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, M Yan, J Zhang, J Sang arXiv preprint arXiv:2311.07397, 2023 | 46 | 2023 |
CAT-MNER: multimodal named entity recognition with knowledge-refined cross-modal attention X Wang, J Ye, Z Li, J Tian, Y Jiang, M Yan, J Zhang, Y Xiao 2022 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2022 | 40 | 2022 |
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang arXiv preprint arXiv:2401.16158, 2024 | 31 | 2024 |
AdaVQA: Overcoming language priors with adapted margin cosine loss Y Guo, L Nie, Z Cheng, F Ji, J Zhang, A Del Bimbo arXiv preprint arXiv:2105.01993, 2021 | 29 | 2021 |
KACE: Generating knowledge aware contrastive explanations for natural language inference Q Chen, F Ji, X Zeng, FL Li, J Zhang, H Chen, Y Zhang Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021 | 27 | 2021 |