TurboPilot is an open-source, large-language-model-based code completion engine that runs natively on the CPU.
Specifically, TurboPilot is a self-hosted GitHub Copilot clone that uses the library behind llama.cpp to run a 6-billion-parameter Salesforce CodeGen model in 4GiB of RAM. It is heavily based on and inspired by the FauxPilot project.
NOTE: The project is in a proof-of-concept stage, not a stable tool. In this version of the project, autocompletion is very slow.
Getting Started
The easiest way to try the project is to get the preprocessed model and run the server in docker.
Getting the Model
There are two options for getting the model.
Option A: Direct Download – Simple, Quick Start
Pre-transformed, pre-quantized models can be downloaded from Google Drive. The project team has produced multi-flavour models with 350M, 2B and 6B parameters, pre-trained on C, C++, Go, Java, JavaScript and Python.
Option B: Convert the Model Yourself – Difficult, More Flexible
If you want to try quantizing the model yourself, follow this guide.
Running TurboPilot Server
Download the latest binaries and extract them to the root project folder.
Then run:
./codegen-serve -m ./models/codegen-6B-multi-ggml-4bit-quant.bin
The application should now be listening on port 18080.
If you have a multi-core system, you can control how many CPUs are used when starting the server with the -t option:
./codegen-serve -t 6 -m ./models/codegen-6B-multi-ggml-4bit-quant.bin
Run from Docker
A pre-built Docker image for running TurboPilot can be pulled from the project's container registry.
The models still need to be downloaded separately, then you can run:
docker run --rm -it \
-v ./models:/models \
-e THREADS=6 \
-e MODEL="/models/codegen-2B-multi-ggml-4bit-quant.bin" \
-p 18080:18080 \
ghcr.io/ravenscroftj/turbopilot:latest
Using the API
Using the API with the FauxPilot plugin
To use the API from VSCode, the vscode-fauxpilot plugin is recommended. After installation, you need to change some settings in the settings.json file.
- Open Settings (CTRL/CMD + SHIFT + P) and select Preferences: Open User Settings (JSON)
- Add the following values:
{
... // other settings
"fauxpilot.enabled": true,
"fauxpilot.server": "http://localhost:18080/v1/engines",
}
FauxPilot can then be enabled by pressing CTRL/CMD + SHIFT + P and selecting Enable Fauxpilot.
When a completion is triggered, the plugin sends API calls to the running codegen-serve process and waits for each request to complete before sending further ones.
Calling the API Directly
You can make requests directly to http://localhost:18080/v1/engines/codegen/completions; it behaves just like the equivalent Copilot endpoint.
For example:
curl --request POST \
--url http://localhost:18080/v1/engines/codegen/completions \
--header 'Content-Type: application/json' \
--data '{
"model": "codegen",
"prompt": "def main():",
"max_tokens": 100
}'
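If you prefer to script against the endpoint, the same request can be made from Python. This is a minimal sketch using only the standard library; it assumes the response follows the OpenAI/FauxPilot-style completion shape (`{"choices": [{"text": ...}]}`), and the helper names are my own, not part of TurboPilot:

```python
import json
import urllib.request

TURBOPILOT_URL = "http://localhost:18080/v1/engines/codegen/completions"

def build_completion_request(prompt, max_tokens=100, model="codegen"):
    # Mirrors the fields used in the curl example above.
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def extract_completion(response_json):
    # Assumes an OpenAI-style response body: {"choices": [{"text": "..."}]}.
    return response_json["choices"][0]["text"]

def complete(prompt):
    # Requires a running codegen-serve (or Docker) instance on port 18080.
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        TURBOPILOT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_completion(json.load(resp))

# Example (needs the server running): print(complete("def main():"))
```

Because the server processes requests sequentially, a script like this should also send one request at a time, just as the VSCode plugin does.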
Known Limitations
As of v0.0.2:
- The models can be very slow – especially the 6B model. Generating suggestions on 4 CPU cores can take about 30–40 seconds.
- The system has only been tested on Ubuntu 22.04, but ARM docker images are now available, and ARM binary releases will be available soon.
- Sometimes completion suggestions are truncated at points that don’t make sense – for example in the middle of a variable name or a string. This is due to a hard limit of 2048 tokens on the context length (prompt plus suggestion).
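Until that limit changes, a client can reduce mid-identifier truncation by trimming its own prompt before sending it. A rough sketch, assuming a ~4-characters-per-token heuristic (TurboPilot does not expose a tokenizer over the API, so the constants here are approximations, not part of the project):

```python
CONTEXT_LIMIT_TOKENS = 2048  # hard limit noted above (prompt + suggestion)
CHARS_PER_TOKEN = 4          # rough heuristic, not an exact tokenizer

def trim_prompt(prompt, max_completion_tokens=100):
    # Leave room for the completion, then keep only the tail of the prompt,
    # since the most recent context matters most for code completion.
    budget_tokens = CONTEXT_LIMIT_TOKENS - max_completion_tokens
    budget_chars = budget_tokens * CHARS_PER_TOKEN
    if len(prompt) <= budget_chars:
        return prompt
    trimmed = prompt[-budget_chars:]
    # Drop the (probably partial) first line so the model sees whole lines.
    _, _, rest = trimmed.partition("\n")
    return rest or trimmed
```

A short prompt passes through unchanged; a long one is cut back to roughly the token budget, starting at a line boundary.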