POST /v1/images/{imageId}/visualize

Description

This endpoint returns a response based on the content of an image and a base prompt.

The prompt can be a question, a statement, or any other text about the image. The API analyzes the content of the image and uses a pre-trained model to generate a response to the prompt.

Two models are currently available for this endpoint:

  • UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
  • LLaVA: a large multimodal model that generates text from images and text prompts.

Request Parameters

The request path should contain the following parameters:

imageId
string
required

The ID of the image to analyze.

Request Body

The request body should contain a JSON object with the following fields:

prompt
string
required

The prompt to answer based on the content of the image.

model
string
default: "uform-gen"

The model to use to generate the response.

The available models are uform-gen and llava.
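
As a minimal sketch of how the path parameter and body fields above fit together, the following Python builds the request without sending it. The host `https://api.example.com` and the helper name `build_visualize_request` are placeholders chosen for illustration, not part of the documented API. Authentication is not described in this section, so none is shown.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"

# Model identifiers listed in this section.
ALLOWED_MODELS = ("uform-gen", "llava")


def build_visualize_request(image_id, prompt, model="uform-gen"):
    """Build (but do not send) a POST request for the visualize endpoint.

    Hypothetical helper for illustration only.
    """
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {ALLOWED_MODELS}, got {model!r}")

    # Percent-encode the image ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/v1/images/{quote(image_id, safe='')}/visualize"

    # The body is a JSON object with the documented fields.
    body = json.dumps({"prompt": prompt, "model": model}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_visualize_request("abc123", "What is in this image?")
```

The request can then be sent with `urllib.request.urlopen(req)` or an HTTP client of your choice, once the real host and any required credentials are filled in.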

Response

The API will return a JSON object with the following fields:

response
string

The response generated by the model based on the image and the prompt.
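
A short sketch of reading the response body, assuming it arrives as a JSON string with the `response` field described above. The function name `extract_response_text` and the sample payload are illustrative, not real API output.

```python
import json


def extract_response_text(raw_body):
    """Return the 'response' string from a visualize response body.

    Hypothetical helper for illustration only.
    """
    payload = json.loads(raw_body)

    # The documented shape is a JSON object with a string 'response' field.
    if not isinstance(payload, dict) or not isinstance(payload.get("response"), str):
        raise ValueError("expected a JSON object with a string 'response' field")

    return payload["response"]


# Illustrative payload only; actual model output will differ.
sample_body = '{"response": "A dog running on a beach."}'
text = extract_response_text(sample_body)
```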