Build a conversation, turn by turn.
Add a turn and write its text, then generate audio for it. Or let the model continue the conversation on its own. Every turn stays editable: change the text and re-run from there.