Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
In natural language processing, pre-trained language models have become an essential piece of basic technology. To further advance research and development in Chinese information processing, the Joint Laboratory of HIT and iFLYTEK Research (HFL) released BERT-wwm, a Chinese pre-trained model based on Whole Word Masking, along with a series of closely related models: BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large, RBT3, and RBTL3.
Whole Word Masking (wwm), loosely translated into Chinese as 全词Mask or 整词Mask, is an upgraded version of BERT released by Google on May 31, 2019. It mainly changes the strategy for generating training samples during pre-training. In brief, the original WordPiece tokenization splits a complete word into several subwords, and these subwords are masked independently at random when training samples are generated. With Whole Word Masking, if any WordPiece subword of a complete word is masked, the remaining subwords of that word are masked as well; that is, the whole word is masked.
Note that "mask" here refers to masking in the generalized sense (replacing a token with [MASK], keeping the original token, or randomly replacing it with another token), not only the case where a token is replaced with the [MASK] label. For more detailed explanations and examples, see #4.
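As a rough illustration of how whole-word grouping interacts with this generalized masking policy, the sketch below applies a simplified version of the procedure to a WordPiece token list. The function name `whole_word_mask`, the toy vocabulary, and the per-word masking probability are illustrative assumptions; this is not HFL's or Google's actual data pipeline.

```python
import random

# Minimal sketch (assumption, not the official pipeline): select words to mask
# as whole units, then apply the generalized 80% [MASK] / 10% keep / 10% random
# replacement to each subword position of a selected word.

def whole_word_mask(tokens, vocab, mask_prob=0.15, seed=None):
    rng = random.Random(seed)

    # Group subword indices into whole words via the "##" continuation marker.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    output = list(tokens)
    for word in words:
        if rng.random() >= mask_prob:
            continue  # this whole word stays untouched
        for i in word:
            r = rng.random()
            if r < 0.8:
                output[i] = "[MASK]"        # 80%: replace with [MASK]
            elif r < 0.9:
                pass                         # 10%: keep the original token
            else:
                output[i] = rng.choice(vocab)  # 10%: replace with a random token
    return output

# Example with an English sentence tokenized by WordPiece:
tokens = ["predict", "the", "proba", "##bility", "of", "the", "next", "word"]
print(whole_word_mask(tokens, vocab=["the", "a", "of"], mask_prob=0.3, seed=1))
```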
Similarly, because Google's official BERT-base, Chinese tokenizes Chinese at character granularity and does not take Chinese Word Segmentation (CWS) from traditional NLP into account, HFL applied the Whole Word Masking method to Chinese: Chinese Wikipedia (both Simplified and Traditional) is used as the training corpus, and LTP from Harbin Institute of Technology is used as the word segmentation tool, so that all Chinese characters belonging to the same word are masked together.
The table below shows samples generated with Whole Word Masking. Note: for ease of understanding, only the case of replacement with the [MASK] label is considered in the following examples.
Description | Sample |
---|---|
Original text | 使用语言模型来预测下一个词的probability。 (Use a language model to predict the probability of the next word.) |
Segmented text | 使用 语言 模型 来 预测 下 一个 词 的 probability 。 |
Original Mask input | 使 用 语 言 [MASK] 型 来 [MASK] 测 下 一 个 词 的 pro [MASK] ##lity 。 |
Whole Word Mask input | 使 用 语 言 [MASK] [MASK] 来 [MASK] [MASK] 下 一 个 词 的 [MASK] [MASK] [MASK] 。 |
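The same idea can be sketched for Chinese, where BERT operates on individual characters and an external segmenter (LTP in HFL's setup) supplies the word boundaries. The snippet below is a simplified illustration rather than the actual pre-training pipeline: the `segmented` list stands in for segmenter output, and the English word "probability" from the table is replaced with its Chinese equivalent 概率 for simplicity.

```python
import random

# Simplified sketch of Chinese Whole Word Masking: BERT sees individual
# characters, but masking decisions are made per segmented word, so all
# characters of a chosen word are replaced with [MASK] together.
# `segmented` stands in for the output of a word segmenter such as LTP.

def chinese_whole_word_mask(segmented, mask_prob=0.15, seed=None):
    rng = random.Random(seed)
    masked = []
    for word in segmented:
        chars = list(word)  # character-level tokens as seen by BERT
        if rng.random() < mask_prob:
            masked.extend(["[MASK]"] * len(chars))  # mask the whole word
        else:
            masked.extend(chars)
    return masked

segmented = ["使用", "语言", "模型", "来", "预测", "下", "一个", "词", "的", "概率", "。"]
print(" ".join(chinese_whole_word_mask(segmented, mask_prob=0.3, seed=0)))
```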
Chinese model download
This directory mainly contains base-size models, so "base" is not marked in the model abbreviations. Models of other sizes are marked accordingly (e.g., large).
- BERT-large model: 24-layer, 1024-hidden, 16-heads, 330M parameters
- BERT-base model: 12-layer, 768-hidden, 12-heads, 110M parameters
Model abbreviation | Corpus | Google download | iFLYTEK Cloud download |
---|---|---|---|
RBT6, Chinese | EXT data[1] | – | TensorFlow (password XNMA) |
RBT4, Chinese | EXT data[1] | – | TensorFlow (password e8dN) |
RBTL3, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password vySW) |
RBT3, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password b9nx) |
RoBERTa-wwm-ext-large, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password u6gC) |
RoBERTa-wwm-ext, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password Xe1p) |
BERT-wwm-ext, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password 4cMG) |
BERT-wwm, Chinese | Chinese Wiki | TensorFlow PyTorch | TensorFlow (password 07Xj) |
BERT-base, Chinese Google | Chinese Wiki | Google Cloud | – |
BERT-base, Multilingual Cased Google | Multilingual Wiki | Google Cloud | – |
BERT-base, Multilingual Uncased Google | Multilingual Wiki | Google Cloud | – |
[1] EXT data includes: Chinese Wikipedia, other encyclopedias, news, Q&A and other data, with a total word count of 5.4B.
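For quick experimentation, the PyTorch weights can typically be loaded through the Hugging Face transformers library. The model identifier below (hfl/chinese-bert-wwm-ext) reflects checkpoints commonly published under the HFL organization on the Hugging Face Hub; treat it as an assumption and verify the exact name against the official release.

```python
# Minimal usage sketch with the Hugging Face `transformers` library.
# The model identifier is an assumption; check the official repository
# for the exact published checkpoint names.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")

inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
```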