Whisper is OpenAI's open source automatic speech recognition (ASR) system. OpenAI trained Whisper on 680,000 hours of multilingual (98 languages) and multitask supervised data collected from the web, on the premise that such a large and diverse dataset improves robustness to accents, background noise, and technical jargon. Beyond transcribing speech in those languages, Whisper can also translate it into English.

Setup

Python 3.9.9 and PyTorch 1.10.1 were used to train and test the models, but the codebase is expected to be compatible with Python 3.7 or later and recent PyTorch versions. The codebase also depends on a few Python packages; the following command pulls and installs the latest commit from the repository along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 
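To update an existing installation to the latest commit, the project README suggests forcing a reinstall of the package without touching its dependencies:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git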

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
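With the package and ffmpeg installed, transcription can be run directly from the command line. A minimal sketch (audio.mp3 and japanese.wav are placeholder filenames):

# transcribe with the base multilingual model
whisper audio.mp3 --model base

# translate non-English speech into English
whisper japanese.wav --language Japanese --task translate

The recognized text is printed to the console and also written to output files in the current directory.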

At present, Whisper ships nine models across five sizes; four of the sizes also come in English-only variants. Developers can trade off speed against accuracy according to their needs. The table below lists the available models along with their approximate memory requirements and relative speeds:

Size     Parameters   English-only model   Multilingual model   Required VRAM   Relative speed
tiny     39 M         tiny.en              tiny                 ~1 GB           ~32x
base     74 M         base.en              base                 ~1 GB           ~16x
small    244 M        small.en             small                ~2 GB           ~6x
medium   769 M        medium.en            medium               ~5 GB           ~2x
large    1550 M       N/A                  large                ~10 GB          1x
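The same models can be used from Python. A minimal usage sketch (audio.mp3 is a placeholder filename; any model name from the table above can be passed to load_model):

import whisper

# load one of the model sizes listed above
model = whisper.load_model("base")

# transcribe an audio file; ffmpeg is used internally to decode it
result = model.transcribe("audio.mp3")
print(result["text"])

transcribe() detects the spoken language automatically unless one is specified, and the result dictionary also contains per-segment timestamps alongside the full text.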

