ChatGLM2-6B is the second-generation version of the open-source Chinese-English bilingual dialogue model ChatGLM-6B. While retaining many excellent features of the first-generation model, such as smooth dialogue and a low deployment threshold, ChatGLM2-6B introduces the following new features:
- More powerful performance: Building on the development experience of the first-generation ChatGLM model, the base model of ChatGLM2-6B has been fully upgraded. ChatGLM2-6B uses the hybrid objective function of GLM and has undergone pre-training on 1.4T Chinese and English tokens as well as human preference alignment training. Evaluation results show that, compared with the first-generation model, ChatGLM2-6B achieves substantial improvements on datasets such as MMLU (+23%), C-Eval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
- Longer context: Based on Flash Attention technology, the context length of the base model has been extended from 2K in ChatGLM-6B to 32K, and a context length of 8K is used for training in the dialogue stage, allowing more rounds of dialogue (see the usage sketch after this list). However, the current version of ChatGLM2-6B has limited ability to understand single-turn ultra-long documents; this will be a focus of optimization in subsequent iterations.
- More efficient inference: Based on Multi-Query Attention technology, ChatGLM2-6B has faster inference speed and lower GPU memory usage. Under the official model implementation, inference speed is 42% faster than the first generation, and under INT4 quantization the dialogue length supported by 6 GB of GPU memory increases from 1K to 8K (see the quantization option in the sketch after this list).
- More open license: The ChatGLM2-6B weights are fully open for academic research, and commercial use is also permitted after obtaining official written permission.
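The longer-context and quantized-inference points above translate into a simple usage pattern. Below is a minimal sketch using the Hugging Face transformers interface; the hub ID "THUDM/chatglm2-6b" and the chat()/quantize() helpers follow the pattern documented for ChatGLM-style models, but exact names and arguments are assumptions that should be checked against the official repository.

```python
# Minimal sketch: load ChatGLM2-6B and hold a multi-turn dialogue.
# Assumes the hub ID "THUDM/chatglm2-6b" and the chat()/quantize() helpers
# exposed by the model's remote code; verify against the official repo.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
# On a ~6 GB GPU, INT4 quantization can be applied instead (assumption, see the note above):
# model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()
model = model.eval()

# Multi-turn dialogue: the 8K dialogue-stage context allows longer histories than ChatGLM-6B.
history = []
for question in ["你好", "请用一句话介绍 Multi-Query Attention"]:
    response, history = model.chat(tokenizer, question, history=history)
    print(response)
```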
The ChatGLM2-6B open-source model aims to promote the development of large-model technology together with the open-source community. Developers and users are asked to abide by the open-source license and not to use the open-source model, code, or derivatives of this open-source project for any purpose that may cause harm to the country or society, or for any service that has not undergone security assessment and filing. Currently, the project team has not developed any applications based on ChatGLM2-6B, including web, Android, Apple iOS, or Windows apps.
Although the model strives to ensure the compliance and accuracy of the data at every stage of training, the accuracy of its output cannot be guaranteed due to the small scale of the ChatGLM2-6B model and its susceptibility to probabilistic randomness, and the model can easily be misled. This project does not assume the risks and liabilities of data security or public-opinion issues caused by the open-source model and code, nor any risks and liabilities arising from the model being misled, misused, abused, or improperly disseminated.
Evaluation Results
The following are the evaluation results of the ChatGLM2-6B model on MMLU (English), C-Eval (Chinese), GSM8K (mathematics), and BBH (English).
MMLU
Model | Average | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
ChatGLM-6B | 40.63 | 33.89 | 44.84 | 39.02 | 45.71 |
ChatGLM2-6B (base) | 47.86 | 41.20 | 54.44 | 43.66 | 54.46 |
ChatGLM2-6B | 45.46 | 40.06 | 51.61 | 41.23 | 51.24 |
The Chat model is tested using the zero-shot CoT (Chain-of-Thought) method, and the Base model is tested using the few-shot answer-only method.
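For readers unfamiliar with the two prompting styles, the sketch below contrasts them on a hypothetical multiple-choice question; these strings are only an assumed illustration of the general format, not the official evaluation templates.

```python
# Hypothetical prompt formats only; the official evaluation templates may differ.

# Zero-shot CoT (Chat model): no solved examples; the model is asked to reason step by step.
zero_shot_cot = (
    "Question: Which planet is closest to the Sun?\n"
    "A. Venus  B. Mercury  C. Earth  D. Mars\n"
    "Let's think step by step, then give the final answer."
)

# Few-shot answer-only (Base model): a few solved examples, then the test question;
# the model is expected to output only the answer choice.
few_shot_answer_only = (
    "Question: What is 2 + 2?\nA. 3  B. 4  C. 5  D. 6\nAnswer: B\n\n"
    "Question: Which planet is closest to the Sun?\n"
    "A. Venus  B. Mercury  C. Earth  D. Mars\n"
    "Answer:"
)
```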
C-Eval
Model | Average | STEM | Social Sciences | Humanities | Others |
---|---|---|---|---|---|
ChatGLM-6B | 38.9 | 33.3 | 48.3 | 41.3 | 38.0 |
ChatGLM2-6B (base) | 51.7 | 48.6 | 60.5 | 51.3 | 49.8 |
ChatGLM2-6B | 50.1 | 46.4 | 60.4 | 50.6 | 46.9 |
The Chat model is tested using the zero-shot CoT method, and the Base model is tested using the few-shot answer-only method.
GSM8K
Model | Accuracy | Accuracy (Chinese)* |
---|---|---|
ChatGLM-6B | 4.82 | 5.85 |
ChatGLM2-6B (base) | 32.37 | 28.95 |
ChatGLM2-6B | 28.05 | 20.45 |
All models are tested using the few-shot CoT method; the CoT prompt is from http://arxiv.org/abs/2201.11903
* We translated 500 questions and the CoT prompts in GSM8K using a translation API and proofread them manually
BBH
Model | Accuracy |
---|---|
ChatGLM-6B | 18.73 |
ChatGLM2-6B (base) | 33.68 |
ChatGLM2-6B | 30.00 |
All models are tested using the few-shot CoT method; the CoT prompt is from https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompt