Gemini 2.0 Flash Thinking

The Gemini 2.0 Flash Thinking model is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, the Flash Thinking model is capable of stronger reasoning in its responses than the Gemini 2.0 Flash Experimental model.

Use thinking models

Flash Thinking models are available in Google AI Studio and through the Gemini API. The Gemini API doesn't return thoughts in the response.

Send a basic request

Python

This example uses the new Google Genai SDK and the v1alpha version of the API.

from google import genai

# Replace 'GEMINI_API_KEY' with your API key; this example targets the
# v1alpha version of the API.
client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version':'v1alpha'})

# Ask the Flash Thinking model a question and print the final answer.
response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents='Explain how RLHF works in simple terms.',
)

print(response.text)
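
Because the Gemini API doesn't return thoughts, the parts of the response contain only the final answer. The following is a minimal sketch of inspecting those parts, reusing the response object from the example above:

# Reuses the 'response' object from the previous example.
for candidate in response.candidates:
    for part in candidate.content.parts:
        # Only the final answer appears here; the model's thoughts are
        # visible only in Google AI Studio.
        print(part.text)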

Multi-turn thinking conversations

During multi-turn conversations, you pass the entire conversation history as input, so the model has access to its previous thoughts. The chat example below manages this history for you; a sketch of passing the history manually follows it.

Python

The new Google Genai SDK provides the ability to create a multi-turn chat session, which is helpful for managing the state of a conversation.

import asyncio

from google import genai

client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version':'v1alpha'})

async def main():
    # Create an async chat session that keeps track of the conversation history.
    chat = client.aio.chats.create(
        model='gemini-2.0-flash-thinking-exp',
    )
    response = await chat.send_message('What is your name?')
    print(response.text)
    response = await chat.send_message('What did you just say before this?')
    print(response.text)

asyncio.run(main())
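
If you prefer to manage the conversation history yourself instead of using a chat session, you can accumulate the turns and pass the whole list to generate_content on each call. The following is a minimal sketch, assuming the same client setup as above (the prompts are illustrative):

from google import genai
from google.genai import types

client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version':'v1alpha'})

# Start the history with the first user turn.
history = [types.Content(role='user', parts=[types.Part(text='What is your name?')])]
response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents=history,
)
print(response.text)

# Append the model's reply and the next user turn, then send the whole history again.
history.append(response.candidates[0].content)
history.append(types.Content(role='user', parts=[types.Part(text='What did you just say before this?')]))
response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents=history,
)
print(response.text)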

Limitations

The Flash Thinking model is an experimental model and has the following limitations:

  • No JSON mode or Search Grounding
  • Thoughts are only shown in Google AI Studio

What's next?