Description
This endpoint returns a response based on the content of an image and a base prompt. The prompt can be a question, a statement, or any other text about the image. The API analyzes the content of the image and generates a response to the prompt using a pre-trained model. Three models are currently available for this endpoint:
- Uform-Gen: UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering.
- Llava: LLaVA is a large multimodal model that can generate text based on images and text prompts.
- Gemini: Gemini is a multimodal model with advanced capabilities for understanding and generating text based on images.
Test images may return inaccurate results due to the test watermarks applied to them. For better results, use live images. If you just want to test this feature, contact support to temporarily upgrade your account.
Authorizations
API Key for authentication
Path Parameters
The unique identifier of the image. This identifier is used to reference the image in subsequent requests.
Body
application/json
The prompt to answer based on the content of the image. This is a natural language question or instruction that the model will respond to.
Required string length: 1 - 1000
The model to use for the visualization. Supported models are uform-gen, llava, and gemini. If not provided, the default model will be used.
Available options: uform-gen, llava, gemini
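The body constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: the base URL, the route, the POST method, the Bearer authorization scheme, and the JSON field names `prompt` and `model` are all assumptions, since this page does not state them. Only the length limit and the list of model options come from the text above.

```python
import json
import urllib.request

# Assumption: placeholder host; this page does not give the real base URL.
BASE_URL = "https://api.example.com"
# Taken from the documented "Available options".
ALLOWED_MODELS = {"uform-gen", "llava", "gemini"}


def build_visualize_payload(prompt, model=None):
    """Validate inputs against the documented limits and return the JSON body."""
    if not isinstance(prompt, str) or not 1 <= len(prompt) <= 1000:
        raise ValueError("prompt must be a string of 1 to 1000 characters")
    payload = {"prompt": prompt}  # field name is an assumption
    if model is not None:
        if model not in ALLOWED_MODELS:
            raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
        payload["model"] = model  # field name is an assumption
    # When model is omitted, the API is documented to use its default model.
    return payload


def build_visualize_request(image_id, api_key, prompt, model=None):
    """Build (but do not send) the HTTP request for one image."""
    body = json.dumps(build_visualize_payload(prompt, model)).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/images/{image_id}/visualize",  # hypothetical route
        data=body,
        method="POST",  # assumed HTTP method
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

Passing the resulting object to `urllib.request.urlopen` would send the request. Validating first means an over-long prompt or an unsupported model name fails locally with a clear error instead of costing a round trip.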
Response
The API will return the Image object in the response body.
Response object for the visualize endpoint.
The response from the AI model. This is the description of the image based on the prompt provided.
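Reading the model's answer out of the response body could look like the sketch below. The field name `response` is an assumption, as this page does not show the response schema, and the sample body is made up purely for illustration.

```python
import json


def extract_answer(body_text):
    """Return the model's text answer from a JSON response body."""
    data = json.loads(body_text)
    answer = data.get("response")  # field name is an assumption
    if not isinstance(answer, str):
        raise KeyError("no text answer found in the response body")
    return answer


# Illustrative body only; not captured from the real API.
sample = '{"response": "A dog running on a beach."}'
print(extract_answer(sample))
```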