VisualGLM-6B is an open-source multimodal dialogue language model that supports images as well as Chinese and English. Its language model is based on ChatGLM-6B and has 6.2 billion parameters. On the image side, a BLIP2-Qformer is trained to bridge the visual model and the language model, bringing the overall model to 7.8 billion parameters. VisualGLM-6B is pre-trained on 30M high-quality Chinese image-text pairs from the CogView dataset together with 300M filtered English image-text pairs, with Chinese and English given equal weight. This training method better aligns visual information with ChatGLM's semantic space. In the subsequent fine-tuning stage, the model is in the long visual…
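The description says Chinese and English get equal weight even though the English pool is ten times larger, but it does not say how that weighting is implemented. A minimal sketch, assuming one plausible reading: choose the language with probability 0.5 each, then draw uniformly within that language's pool. The function name `make_equal_weight_sampler` and the toy pools are hypothetical illustrations, not VisualGLM-6B's actual data pipeline.

```python
import random
from collections import Counter


def make_equal_weight_sampler(zh_pairs, en_pairs, seed=0):
    """Hypothetical sampler: pick the Chinese or English pool with
    probability 0.5 each, regardless of pool size, then draw one pair
    uniformly from the chosen pool."""
    rng = random.Random(seed)
    pools = {"zh": zh_pairs, "en": en_pairs}

    def sample():
        lang = rng.choice(["zh", "en"])        # equal weight per language
        return lang, rng.choice(pools[lang])  # uniform within that pool

    return sample


# Toy pools with the same 1:10 size ratio as 30M Chinese vs 300M English pairs.
zh = [f"zh_pair_{i}" for i in range(30)]
en = [f"en_pair_{i}" for i in range(300)]

sample = make_equal_weight_sampler(zh, en)
counts = Counter(lang for lang, _ in (sample() for _ in range(10_000)))
print(counts["zh"] / 10_000, counts["en"] / 10_000)  # both close to 0.5
```

Under this scheme each language contributes about half of the training draws, so any individual Chinese pair is seen roughly ten times as often as any individual English pair. Simply concatenating the two pools would instead make English about 91% of the data.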
#Multimodal #Dialogue #Language #Model #VisualGLM6B