POST /v1/images/{imageId}/visualize

Description

This endpoint returns a response based on the content of an image and a base prompt.

The prompt can be a question, statement, or any text that you want to ask about the image. The API will analyze the content of the image and generate a response based on the prompt using a pre-trained model.

Right now there are two models available for this endpoint:

  • UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
  • LLaVA: a large multimodal model that generates text from images and text prompts.

Test images may return inaccurate results due to the test watermarks applied to them. For better results, use live images. If you just want to test this feature, contact support to temporarily upgrade your account.


Request Parameters

The request path should contain the following parameters:

imageId (string, required)

The ID of the image to analyze.

Request Body

The request body should contain a JSON object with the following fields:

prompt (string, required)

The prompt to answer based on the content of the image.

model (string, default: "uform-gen")

The model to use to generate the response.

The available models are uform-gen and llava.
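
As a minimal sketch of how such a request could be assembled, the Python below builds the POST request with the standard library without sending it. The base URL `https://api.example.com`, the bearer-token `Authorization` header, and the helper name `build_visualize_request` are assumptions made for illustration; only the path, the `prompt` and `model` fields, and the two model names come from this page.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"
# Model identifiers listed on this page.
ALLOWED_MODELS = ("uform-gen", "llava")


def build_visualize_request(image_id, prompt, model="uform-gen", api_key=None):
    """Build (but do not send) a POST /v1/images/{imageId}/visualize request."""
    if not prompt:
        raise ValueError("prompt is required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}")

    # Percent-encode the ID so it is safe to place in the URL path.
    encoded_id = urllib.parse.quote(image_id, safe="")
    url = f"{BASE_URL}/v1/images/{encoded_id}/visualize"

    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Assumption: bearer-token auth; this page does not describe authentication.
        headers["Authorization"] = f"Bearer {api_key}"

    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request; the helper stops short of that so the URL and body can be inspected first.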

Response

The API will return a JSON object with the following fields:

response (string)

The response generated by the model based on the image and the prompt.
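
A small sketch of reading that field from a response body, assuming the body is the JSON object described above. The helper name `extract_response_text` is hypothetical, and the sample body in the usage note is made up for illustration rather than real API output.

```python
import json


def extract_response_text(raw_body):
    """Return the `response` string from a visualize response body."""
    payload = json.loads(raw_body)
    text = payload.get("response") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        # Fail loudly instead of returning None for an unexpected shape.
        raise ValueError("expected a JSON object with a string 'response' field")
    return text
```

For example, `extract_response_text('{"response": "A dog on a beach."}')` returns the string `A dog on a beach.`, while a body without a `response` field raises `ValueError`.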