Models
Browse our entire catalog of models.
Automatic Speech Recognition
Automatic speech recognition (ASR) models convert a speech signal, typically an audio input, to text.
Model | Description |
---|---|
whisper | Automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data |
Image Classification
Image classification models take an image input and assign it labels or classes.
Model | Description |
---|---|
resnet-50 | 50-layer-deep image classification CNN trained on more than 1M images from ImageNet |
Image-to-Text
Model | Description |
---|---|
uform-gen2-qwen-500m Beta | UForm-Gen is a small generative vision-language model primarily designed for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets. |
Object Detection
Object detection models can detect instances of objects like persons, faces, license plates, or others in an image. This task takes an image as input and returns a list of detected objects, each one containing a label, a probability score, and its surrounding box coordinates.
Model | Description |
---|---|
detr-resnet-50 Beta | DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images). |
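The result shape described above (a list of objects, each with a label, a probability score, and box coordinates) can be post-processed with a few lines of code. The following is a minimal sketch, not the catalog's actual response schema: the `Detection` and `Box` types, their field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    # Hypothetical field names; a real response may label coordinates differently.
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass
class Detection:
    label: str    # detected class, e.g. "person"
    score: float  # probability score between 0 and 1
    box: Box      # surrounding box coordinates


def filter_detections(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep detections at or above min_score, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up sample data standing in for a model response.
sample = [
    Detection("face", 0.88, Box(40.0, 35.0, 110.0, 120.0)),
    Detection("license plate", 0.41, Box(300.0, 410.0, 380.0, 440.0)),
    Detection("person", 0.97, Box(12.0, 30.0, 180.0, 400.0)),
]

confident = filter_detections(sample, min_score=0.5)
for d in confident:
    print(f"{d.label}: {d.score:.2f}")
```

Thresholding on the score is a common way to discard low-confidence boxes before drawing or counting objects; the right cutoff depends on the application.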
Summarization
Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.
Model | Description |
---|---|
bart-large-cnn Beta | BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization. |
Text Classification
Sentiment analysis or text classification is a common NLP task that classifies a text input into labels or classes.
Model | Description |
---|---|
distilbert-sst-2-int8 | Distilled BERT model that was finetuned on SST-2 for sentiment classification |
Text Embeddings
Feature extraction models transform raw data into numerical features that can be processed while preserving the information in the original dataset. These models are ideal as part of building vector search applications or Retrieval Augmented Generation workflows with Large Language Models (LLM).
Model | Description |
---|---|
bge-base-en-v1.5 | BAAI general embedding (bge) models transform any given text into a compact vector |
bge-large-en-v1.5 | BAAI general embedding (bge) models transform any given text into a compact vector |
bge-small-en-v1.5 | BAAI general embedding (bge) models transform any given text into a compact vector |
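To illustrate how embedding vectors support vector search, the sketch below ranks documents by cosine similarity to a query vector. It uses tiny hand-written 3-dimensional vectors in place of real model output, and the names `cosine_similarity`, `rank_by_similarity`, and the `doc-*` identifiers are hypothetical. A production system would typically use a vector database rather than a linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: List[float], docs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings produced by a model.
toy_docs = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.1, 0.0]

ranking = rank_by_similarity(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a Retrieval Augmented Generation workflow, the top-ranked documents would then be passed to a text generation model as context.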
Text Generation
Family of generative text models, such as large language models (LLM), that can be adapted for a variety of natural language tasks.
Model | Description |
---|---|
llama-2-7b-chat-fp16 | Full precision (fp16) generative text model with 7 billion parameters from Meta |
llama-2-7b-chat-int8 | Quantized (int8) generative text model with 7 billion parameters from Meta |
mistral-7b-instruct-v0.1 | Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters |
deepseek-coder-6.7b-base-awq Beta | Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. |
deepseek-coder-6.7b-instruct-awq Beta | Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. |
deepseek-math-7b-base Beta | DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training on math-related tokens sourced from Common Crawl, together with natural language and code data for 500B tokens. |
deepseek-math-7b-instruct Beta | DeepSeekMath-Instruct 7B is a mathematical instruction-tuned model derived from DeepSeekMath-Base 7B. DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training on math-related tokens sourced from Common Crawl, together with natural language and code data for 500B tokens. |
discolm-german-7b-v1-awq Beta | DiscoLM German 7b is a Mistral-based large language model with a focus on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. |
falcon-7b-instruct Beta | Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. |
llama-2-13b-chat-awq Beta | Llama 2 13B Chat AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Llama 2 variant. |
llamaguard-7b-awq Beta | Llama Guard is provided as-is without any representations, warranties, or guarantees. Any rules or examples contained in blogs, developer docs, or other reference materials are provided for informational purposes only. You acknowledge and agree that you are responsible for the results and outcomes of your use of Workers AI. Cloudflare has no control or authority over the third-party models, which are provided to you subject to separate third-party licenses between you and the model provider. |
mistral-7b-instruct-v0.1-awq Beta | Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant. |
neural-chat-7b-v3-1-awq Beta | This model is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1, using the open-source dataset Open-Orca/SlimOrca. |
openchat-3.5-0106 Beta | OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning. |
openhermes-2.5-mistral-7b-awq Beta | OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, which was trained on additional code datasets. |
phi-2 Beta | Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding. |
qwen1.5-0.5b-chat Beta | Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. |
qwen1.5-1.8b-chat Beta | Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. |
qwen1.5-14b-chat-awq Beta | Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. |
qwen1.5-7b-chat-awq Beta | Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. |
sqlcoder-7b-2 Beta | This model is intended to be used by non-technical users to understand data inside their SQL databases. |
tinyllama-1.1b-chat-v1.0 Beta | The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. |
zephyr-7b-beta-awq Beta | Zephyr 7B Beta AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Zephyr model variant. |
Text-to-Image
Generates images from input text. These models can be used to generate and modify images based on text prompts.
Model | Description |
---|---|
dreamshaper-8-lcm Beta | Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range. |
stable-diffusion-v1-5-img2img Beta | Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion. |
stable-diffusion-v1-5-inpainting Beta | Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask. |
stable-diffusion-xl-base-1.0 Beta | Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts. |
stable-diffusion-xl-lightning Beta | SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps. |
Translation
Translation models convert a sequence of text from one language to another.
Model | Description |
---|---|
m2m100-1.2b | Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation |