uform-gen2-qwen-500m
Model ID: @cf/unum/uform-gen2-qwen-500m
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering (VQA). The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and several VQA datasets.
Properties
Task Type: Image-to-Text
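To call the model over HTTP, the Model ID goes in the path of the Cloudflare Workers AI REST endpoint. The sketch below builds that URL and posts a JSON payload with a bearer token. The account ID, API token, and payload are placeholders you supply, and the helper names `runUrl` and `runModel` are hypothetical, not part of any Cloudflare SDK.

```typescript
// Model ID as listed on this page.
const MODEL_ID = "@cf/unum/uform-gen2-qwen-500m";

// Hypothetical helper: builds the Workers AI REST endpoint for one Cloudflare account.
function runUrl(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL_ID}`;
}

// Hypothetical helper: POSTs a JSON payload to the model and returns the parsed JSON reply.
// accountId and apiToken are placeholders; payload should follow the Input JSON Schema.
async function runModel(accountId: string, apiToken: string, payload: unknown): Promise<unknown> {
  const res = await fetch(runUrl(accountId), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    // Surface the HTTP status and body so schema mismatches are easy to diagnose.
    throw new Error(`Workers AI request failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Inside a Worker, the same Model ID would instead be passed to the AI binding, which avoids handling the API token yourself.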
API Schema
The following schema is based on JSON Schema.
Input JSON Schema
Output JSON Schema
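The schema bodies themselves are not reproduced here. As a rough guide only, the sketch below assumes the input carries the image as an array of byte values (integers 0 to 255) plus an optional `prompt` and `max_tokens`, and that the output is an object with a `description` string. These field names are assumptions; check them against the live Input and Output JSON Schemas before relying on them. `buildInput`, `isUFormGenOutput`, and the default prompt text are hypothetical.

```typescript
// ASSUMED input shape: verify against the Input JSON Schema.
interface UFormGenInput {
  image: number[];     // raw image bytes, each an integer from 0 to 255 (assumed)
  prompt?: string;     // captioning instruction or VQA question (assumed)
  max_tokens?: number; // upper bound on generated tokens (assumed)
}

// ASSUMED output shape: verify against the Output JSON Schema.
interface UFormGenOutput {
  description: string;
}

// Hypothetical helper: converts raw image bytes into the assumed JSON-friendly payload.
// The default prompt is illustrative only.
function buildInput(
  bytes: Uint8Array,
  prompt: string = "Generate a caption for this image",
  maxTokens: number = 512,
): UFormGenInput {
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}

// Hypothetical type guard: narrows an unknown JSON reply to the assumed output shape.
function isUFormGenOutput(value: unknown): value is UFormGenOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}
```

Passing a question such as "What is in this image?" as the prompt would exercise the VQA use case; the default prompt corresponds to plain captioning.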