The Gemini 2.0 Flash Thinking model is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, the Flash Thinking model demonstrates stronger reasoning in its responses than the Gemini 2.0 Flash Experimental model.
Use thinking models
Flash Thinking models are available in Google AI Studio and through the Gemini API. The Gemini API doesn't return thoughts in the response.
Send a basic request
Python
This example uses the new Google Genai SDK and the v1alpha version of the API.
from google import genai

# The v1alpha API version is required to use the experimental thinking model.
client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version': 'v1alpha'})

response = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents='Explain how RLHF works in simple terms.',
)

print(response.text)
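Because the API doesn't return the model's thoughts, the response contains only the final answer. As a minimal sketch (reusing the client and response from the example above), you can iterate over the returned parts to see exactly what comes back:

# Minimal sketch: inspect the parts of the response from the example above.
for candidate in response.candidates:
    for part in candidate.content.parts:
        # Only the final answer text is returned; thoughts are not included.
        print(part.text)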
Multi-turn thinking conversations
During multi-turn conversations, you pass the entire conversation history as input, so the model has access to its previous thoughts.
Python
The new Google Genai SDK lets you create a multi-turn chat session, which is helpful for managing conversation state.
import asyncio

from google import genai

client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version': 'v1alpha'})

async def main():
    # The async chat session keeps track of the conversation history for you.
    chat = client.aio.chats.create(
        model='gemini-2.0-flash-thinking-exp',
    )

    response = await chat.send_message('What is your name?')
    print(response.text)

    response = await chat.send_message('What did you just say before this?')
    print(response.text)

asyncio.run(main())
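The chat session manages the history for you. If you'd rather manage it yourself, a rough equivalent (an illustrative sketch, not an official recipe) is to keep a contents list and resend the full history on every turn:

from google import genai
from google.genai import types

client = genai.Client(api_key='GEMINI_API_KEY', http_options={'api_version': 'v1alpha'})

# Illustrative sketch: keep the full conversation history and pass it back
# on every request so the model sees all previous turns.
history = [types.Content(role='user', parts=[types.Part(text='What is your name?')])]
first = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents=history,
)
history.append(first.candidates[0].content)

history.append(types.Content(role='user', parts=[types.Part(text='What did you just say before this?')]))
second = client.models.generate_content(
    model='gemini-2.0-flash-thinking-exp',
    contents=history,
)
print(second.text)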
Limitations
The Flash Thinking model is an experimental model and has the following limitations:
- No JSON mode or Search Grounding
- Thoughts are only shown in Google AI Studio
What's next?
- Try the Flash Thinking model in Google AI Studio.
- Try the Flash Thinking Colab.