Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm series models)
In natural language processing, pre-trained language models have become an essential piece of basic technology. To further advance research and development in Chinese information processing, the Joint Laboratory of HIT and iFLYTEK Research (HFL) released BERT-wwm, a Chinese pre-trained model based on Whole Word Masking, along with a series of closely related models: BERT-wwm-ext, RoBERTa-wwm-ext, RoBERTa-wwm-ext-large, RBT3, and RBTL3.
Whole Word Masking (wwm), loosely translated into Chinese as 全词Mask or 整词Mask, is an upgraded version of BERT released by Google on May 31, 2019. It mainly changes the strategy for generating training samples during pre-training. In brief, the original WordPiece tokenization splits a complete word into several subwords, and these subwords are masked independently at random when training samples are generated. With Whole Word Masking, if any WordPiece subword of a complete word is masked, the remaining subwords of that word are masked as well; that is, the whole word is masked.
Note that "mask" here refers to masking in the generalized sense (replacing a token with [MASK], keeping the original token, or randomly replacing it with another token), not only the case where a token is replaced with the [MASK] label. For more detailed explanations and examples, see #4.
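As a rough illustration of how whole-word grouping interacts with this generalized masking policy, the sketch below applies a simplified version of the procedure to a WordPiece token list. The function name `whole_word_mask`, the toy vocabulary, and the per-word masking probability are illustrative assumptions; this is not HFL's or Google's actual data pipeline.

```python
import random

# Minimal sketch (assumption, not the official pipeline): select words to mask
# as whole units, then apply the generalized 80% [MASK] / 10% keep / 10% random
# replacement to each subword position of a selected word.

def whole_word_mask(tokens, vocab, mask_prob=0.15, seed=None):
    rng = random.Random(seed)

    # Group subword indices into whole words via the "##" continuation marker.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    output = list(tokens)
    for word in words:
        if rng.random() >= mask_prob:
            continue  # this whole word stays untouched
        for i in word:
            r = rng.random()
            if r < 0.8:
                output[i] = "[MASK]"        # 80%: replace with [MASK]
            elif r < 0.9:
                pass                         # 10%: keep the original token
            else:
                output[i] = rng.choice(vocab)  # 10%: replace with a random token
    return output

# Example with an English sentence tokenized by WordPiece:
tokens = ["predict", "the", "proba", "##bility", "of", "the", "next", "word"]
print(whole_word_mask(tokens, vocab=["the", "a", "of"], mask_prob=0.3, seed=1))
```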
Similarly, because Google's official BERT-base, Chinese tokenizes Chinese at character granularity and does not take Chinese Word Segmentation (CWS) from traditional NLP into account, HFL applied the Whole Word Masking method to Chinese: Chinese Wikipedia (both Simplified and Traditional) is used as the training corpus, and LTP from Harbin Institute of Technology is used as the word segmentation tool, so that all Chinese characters belonging to the same word are masked together.
The table below shows samples generated with Whole Word Masking. Note: for ease of understanding, only the case of replacement with the [MASK] label is considered in the following examples.
Description | Sample |
---|---|
Original text | 使用语言模型来预测下一个词的probability。 (Use a language model to predict the probability of the next word.) |
Segmented text | 使用 语言 模型 来 预测 下 一个 词 的 probability 。 |
Original Mask input | 使 用 语 言 [MASK] 型 来 [MASK] 测 下 一 个 词 的 pro [MASK] ##lity 。 |
Whole Word Mask input | 使 用 语 言 [MASK] [MASK] 来 [MASK] [MASK] 下 一 个 词 的 [MASK] [MASK] [MASK] 。 |
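The same idea can be sketched for Chinese, where BERT operates on individual characters and an external segmenter (LTP in HFL's setup) supplies the word boundaries. The snippet below is a simplified illustration rather than the actual pre-training pipeline: the `segmented` list stands in for segmenter output, and the English word "probability" from the table is replaced with its Chinese equivalent 概率 for simplicity.

```python
import random

# Simplified sketch of Chinese Whole Word Masking: BERT sees individual
# characters, but masking decisions are made per segmented word, so all
# characters of a chosen word are replaced with [MASK] together.
# `segmented` stands in for the output of a word segmenter such as LTP.

def chinese_whole_word_mask(segmented, mask_prob=0.15, seed=None):
    rng = random.Random(seed)
    masked = []
    for word in segmented:
        chars = list(word)  # character-level tokens as seen by BERT
        if rng.random() < mask_prob:
            masked.extend(["[MASK]"] * len(chars))  # mask the whole word
        else:
            masked.extend(chars)
    return masked

segmented = ["使用", "语言", "模型", "来", "预测", "下", "一个", "词", "的", "概率", "。"]
print(" ".join(chinese_whole_word_mask(segmented, mask_prob=0.3, seed=0)))
```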
Chinese model download
This directory mainly contains base-size models, so "base" is not marked in the model abbreviations. Models of other sizes are marked accordingly (e.g., large).
- BERT-large model: 24-layer, 1024-hidden, 16-heads, 330M parameters
- BERT-base model: 12-layer, 768-hidden, 12-heads, 110M parameters
Model abbreviation | Corpus | Google download | iFLYTEK Cloud download |
---|---|---|---|
RBT6, Chinese | EXT data[1] | – | TensorFlow (password XNMA) |
RBT4, Chinese | EXT data[1] | – | TensorFlow (password e8dN) |
RBTL3, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password vySW) |
RBT3, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password b9nx) |
RoBERTa-wwm-ext-large, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password u6gC) |
RoBERTa-wwm-ext, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password Xe1p) |
BERT-wwm-ext, Chinese | EXT data[1] | TensorFlow PyTorch | TensorFlow (password 4cMG) |
BERT-wwm, Chinese | Chinese Wiki | TensorFlow PyTorch | TensorFlow (password 07Xj) |
BERT-base, Chinese Google | Chinese Wiki | Google Cloud | – |
BERT-base, Multilingual Cased Google | Multilingual Wiki | Google Cloud | – |
BERT-base, Multilingual Uncased Google | Multilingual Wiki | Google Cloud | – |
[1] EXT data includes: Chinese Wikipedia, other encyclopedias, news, Q&A and other data, with a total word count of 5.4B.
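For quick experimentation, the PyTorch weights can typically be loaded through the Hugging Face transformers library. The model identifier below (hfl/chinese-bert-wwm-ext) reflects checkpoints commonly published under the HFL organization on the Hugging Face Hub; treat it as an assumption and verify the exact name against the official release.

```python
# Minimal usage sketch with the Hugging Face `transformers` library.
# The model identifier is an assumption; check the official repository
# for the exact published checkpoint names.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")

inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, 768)
```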