The Gemini API can run inference on images and videos passed to it. When passed
an image, a series of images, or a video, Gemini can:
Describe or answer questions about the content
Summarize the content
Extrapolate from the content
This tutorial demonstrates some possible ways to prompt the Gemini API with
image and video input. All output is text-only.
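As a rough illustration of what an image prompt can look like, the sketch below builds a request body that pairs a text question with one inline, base64-encoded image. It assumes the REST-style `contents` / `parts` / `inline_data` layout; the helper name `build_image_prompt` and the placeholder bytes are hypothetical and exist only for this example. The sketch builds the body and prints it, and does not send anything to the API.

```python
import base64
import json


def build_image_prompt(prompt: str, image_bytes: bytes,
                       mime_type: str = "image/jpeg") -> dict:
    """Build a request body with one text part and one inline image part.

    Hypothetical helper: the field names assume the REST-style
    contents/parts/inline_data layout and are not taken from this page.
    """
    return {
        "contents": [
            {
                "parts": [
                    # The question or instruction about the image.
                    {"text": prompt},
                    # The image itself, base64-encoded so it fits in JSON.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    placeholder_image = b"placeholder bytes, not a real JPEG"
    body = build_image_prompt("Describe this image.", placeholder_image)
    print(json.dumps(body, indent=2))
```

Inline data of this kind is likely only practical for small images. For larger images and for video, uploading the file through the File API and referencing it in the prompt is the approach this guide describes.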
What's next
This guide shows how to upload image and video files using the File API and
then generate text outputs from image and video inputs. To learn more,
see the following resources:
File prompting strategies: The
Gemini API supports prompting with text, image, audio, and video data, also
known as multimodal prompting.
System instructions: System
instructions let you steer the behavior of the model based on your specific
needs and use cases.
Safety guidance: Sometimes generative AI
models produce unexpected outputs, such as outputs that are inaccurate,
biased, or offensive. Post-processing and human evaluation are essential to
limit the risk of harm from such outputs.