| Title | Authors | Venue | Cited by | Year |
|---|---|---|---:|---|
| Visual ChatGPT: Talking, drawing and editing with visual foundation models | C Wu, S Yin, W Qi, X Wang, Z Tang, N Duan | arXiv preprint arXiv:2303.04671 | 546 | 2023 |
| NÜWA: Visual synthesis pre-training for neural visual world creation | C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, N Duan | European Conference on Computer Vision, 720-736 | 285 | 2022 |
| GODIVA: Generating open-domain videos from natural descriptions | C Wu, L Huang, Q Zhang, B Li, L Ji, F Yang, G Sapiro, N Duan | arXiv preprint arXiv:2104.14806 | 176 | 2021 |
| TaskMatrix.AI: Completing tasks by connecting foundation models with millions of APIs | Y Liang, C Wu, T Song, W Wu, Y Xia, Y Liu, Y Ou, S Lu, L Ji, S Mao, ... | Intelligent Computing 3, 0063 | 141 | 2024 |
| ReCo: Region-controlled text-to-image generation | Z Yang, J Wang, Z Gan, L Li, K Lin, C Wu, N Duan, Z Liu, C Liu, M Zeng, ... | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern … | 97 | 2023 |
| NUWA-XL: Diffusion over diffusion for extremely long video generation | S Yin, C Wu, H Yang, J Wang, X Wang, M Ni, Z Yang, L Li, S Liu, F Yang, ... | arXiv preprint arXiv:2303.12346 | 64 | 2023 |
| DragNUWA: Fine-grained control in video generation by integrating text, image, and trajectory | S Yin, C Wu, J Liang, J Shi, H Li, G Ming, N Duan | arXiv preprint arXiv:2308.08089 | 56 | 2023 |
| Object-difference attention: A simple relational attention for visual question answering | C Wu, J Liu, X Wang, X Dong | Proceedings of the 26th ACM International Conference on Multimedia, 519-527 | 55 | 2018 |
| BridgeTower: Building bridges between encoders in vision-language representation learning | X Xu, C Wu, S Rosenman, V Lal, W Che, N Duan | Proceedings of the AAAI Conference on Artificial Intelligence 37 (9), 10637 … | 52 | 2023 |
| Chain of reasoning for visual question answering | C Wu, J Liu, X Wang, X Dong | Advances in Neural Information Processing Systems 31 | 52 | 2018 |
| Differential networks for visual question answering | C Wu, J Liu, X Wang, R Li | Proceedings of the AAAI Conference on Artificial Intelligence 33 (01), 8997-9004 | 44 | 2019 |
| VL-InterpreT: An interactive visualization tool for interpreting vision-language transformers | E Aflalo, M Du, SY Tseng, Y Liu, C Wu, N Duan, V Lal | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern … | 43 | 2022 |
| Low-code LLM: Visual programming over LLMs | Y Cai, S Mao, W Wu, Z Wang, Y Liang, T Ge, C Wu, W You, T Song, Y Xia, ... | arXiv preprint arXiv:2304.08103 | 39 | 2023 |
| NUWA-Infinity: Autoregressive over autoregressive generation for infinite visual synthesis | C Wu, J Liang, X Hu, Z Gan, J Wang, L Wang, Z Liu, Y Fang, N Duan | arXiv preprint arXiv:2207.09814 | 32 | 2022 |
| KD-VLP: Improving end-to-end vision-and-language pretraining with object knowledge distillation | Y Liu, C Wu, S Tseng, V Lal, X He, N Duan | arXiv preprint arXiv:2109.10504 | 25 | 2021 |
| NUWA-LIP: Language-guided image inpainting with defect-free VQGAN | M Ni, X Li, W Zuo | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern … | 22 | 2023 |
| NUWA-Infinity: Autoregressive over autoregressive generation for infinite visual synthesis | J Liang, C Wu, X Hu, Z Gan, J Wang, L Wang, Z Liu, Y Fang, N Duan | Advances in Neural Information Processing Systems 35, 15420-15432 | 22 | 2022 |
| Visual ChatGPT: Talking, drawing and editing with visual foundation models | C Wu, S Yin, W Qi, X Wang, Z Tang, N Duan | | 18 | 2023 |
| Learning to program with natural language | Y Guo, Y Liang, C Wu, W Wu, D Zhao, N Duan | arXiv preprint arXiv:2304.10464 | 18 | 2023 |
| DiVAE: Photorealistic images synthesis with denoising diffusion decoder | J Shi, C Wu, J Liang, X Liu, N Duan | arXiv preprint arXiv:2206.00386 | 18 | 2022 |