Visualize Image
Answer a prompt based on the content of an image.
Description
This endpoint returns a response based on the content of an image and a base prompt.
The prompt can be a question, a statement, or any other text about the image. The API analyzes the content of the image and generates a response to the prompt using a pre-trained model.
Right now there are two models available for this endpoint:
- UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
- LLaVA: a large multimodal model that generates text from images and text prompts.
Test images may return inaccurate results because of the test watermarks applied to them. For better results, use live images. If you just want to test this feature, contact support to temporarily upgrade your account.
Request Parameters
The request path should contain the following parameters:
The ID of the image to analyze.
Request Body
The request body should contain a JSON object with the following fields:
The prompt to answer based on the content of the image.
The model to use to generate the response. The available models are uform-gen and llava.
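As a rough illustration of how the path parameter and request body fit together, the sketch below builds a request without sending it. Several details are assumptions that this section does not state: the base URL, the `/images/{image_id}/visualize` path layout, the POST method, and the JSON field names `prompt` and `model`. Authentication is omitted. Check the actual API reference before relying on any of them.

```python
import json
import urllib.parse
import urllib.request

# Model identifiers listed in this section.
AVAILABLE_MODELS = ("uform-gen", "llava")


def build_visualize_request(base_url, image_id, prompt, model):
    """Build (but do not send) a request for the Visualize Image endpoint.

    The path layout, HTTP method, and JSON field names are hypothetical.
    """
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"model must be one of {AVAILABLE_MODELS}, got {model!r}")

    # Hypothetical path: the image ID is carried in the request path.
    safe_id = urllib.parse.quote(str(image_id), safe="")
    url = f"{base_url.rstrip('/')}/images/{safe_id}/visualize"

    # Hypothetical body field names: "prompt" and "model".
    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # assumed; the section does not name the method
        headers={"Content-Type": "application/json"},
    )
```

Sending the request would be a matter of passing the returned object to `urllib.request.urlopen`, after adding whatever authentication header the API requires.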
Response
The API will return a JSON object with the following fields:
The response generated by the model based on the image and the prompt.
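A minimal sketch of reading the generated text out of the returned JSON object follows. The field name `response` is an assumption, since this section describes the field but does not name it. The helper takes the field name as a parameter so it can be adjusted.

```python
import json


def extract_answer(raw_body, field="response"):
    """Return the model's generated text from a JSON response body.

    `field` defaults to "response", which is a hypothetical field name.
    """
    payload = json.loads(raw_body)
    if not isinstance(payload, dict) or field not in payload:
        raise ValueError(f"response body has no {field!r} field")
    return payload[field]
```

For example, `extract_answer('{"response": "A dog on a beach."}')` returns the string `A dog on a beach.`.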