classify

Classifies an image using a pre-trained model. Currently the only supported model is ResNet50, a deep convolutional neural network widely used for image classification.
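As a rough illustration, a client might package an image for this endpoint along the following lines. The request format is not specified here, so the JSON layout, the field names `model` and `image`, the model identifier `resnet50`, and the helper `build_classify_payload` are all assumptions made for this sketch, not the documented API.

```python
import base64
import json

# Hypothetical identifier: the text above only says ResNet50 is the sole supported model.
SUPPORTED_CLASSIFY_MODELS = {"resnet50"}


def build_classify_payload(image_bytes: bytes, model: str = "resnet50") -> str:
    """Return a JSON request body for a hypothetical classify call."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    if model not in SUPPORTED_CLASSIFY_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    # Base64 is an assumed way to carry binary image data inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"model": model, "image": encoded})
```

Validating the model name on the client side keeps the sketch honest about the single supported model; a real client would send the resulting body to whatever URL and with whatever authentication the service actually requires.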

visualize

Returns a response based on the content of an image and a base prompt.

The prompt can be a question, a statement, or any other text about the image. The API analyzes the image with a pre-trained model and generates a response to the prompt.

Two models are currently available for this endpoint:

  • UForm-Gen: a small generative vision-language model designed primarily for image captioning and visual question answering.
  • LLaVA: a large multimodal model that generates text from images and text prompts.
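To make the image-plus-prompt pairing concrete, here is a minimal sketch of how a request for this endpoint could be assembled. The JSON layout, the field names `model`, `prompt`, and `image`, the identifiers `uform-gen` and `llava`, and the helper `build_visualize_payload` are assumptions for illustration only; the real parameter names may differ.

```python
import base64
import json

# Hypothetical identifiers for the two models listed above.
VISUALIZE_MODELS = ("uform-gen", "llava")


def build_visualize_payload(image_bytes: bytes, prompt: str, *, model: str) -> str:
    """Return a JSON request body for a hypothetical visualize call."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    if not prompt.strip():
        raise ValueError("prompt must contain some text")
    if model not in VISUALIZE_MODELS:
        raise ValueError(f"model must be one of {VISUALIZE_MODELS}")
    # Base64 is an assumed way to carry binary image data inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"model": model, "prompt": prompt, "image": encoded})


# Example: ask a question about an image using the larger model.
body = build_visualize_payload(b"raw image bytes", "What is in this image?", model="llava")
```

The model is a required keyword argument here because the text does not say which of the two models, if any, is the default.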