Text To Speech


The Text to Speech input lets you turn written text into natural, spoken audio. Instead of recording a voice yourself, you simply enter the text you want to hear, and Composer generates realistic speech automatically using ElevenLabs — a high-quality voice generation service known for clear, expressive voices.

Whether you need narration, voice prompts, or spoken messages, Text to Speech lets you generate clear, natural-sounding audio directly from text.

What you can use it for

Text to Speech is useful in many situations, for example:

  • Creating voice narration for videos or presentations

  • Adding spoken feedback or messages to applications

  • Improving accessibility by offering audio versions of text

  • Generating placeholder or prototype voice content quickly

Because the voice is generated from text, you can easily update or reuse it without recording anything again.

Requirements

To use the Text to Speech input, you need:

  • An active ElevenLabs subscription (required for commercial use).

  • A valid ElevenLabs API key.

Responsibility

We provide a technical integration with ElevenLabs. Users supply their own API key and are responsible for complying with ElevenLabs’ licensing and use policies.

See the ElevenLabs Terms of Use.

💡ElevenLabs credits

Usage is based on a credit system, which limits how much speech you can generate. The available credits and limits depend on the plan you choose.

For more information, please visit ElevenLabs.

Troubleshooting

Known ElevenLabs issues may occasionally affect Text to Speech output. These can include audio glitches, sharp breaths between paragraphs, pronunciation issues, unexpected audio artifacts, or small variations in voice quality. These behaviors originate from ElevenLabs’ voice generation system and are outside of Composer’s control.

If audio sounds distorted or unexpected, regenerating the affected or previous paragraph usually resolves the issue.

For more details, known limitations, and recommended workarounds, refer to ElevenLabs’ official troubleshooting documentation.

Configuration

To allow Composer to use ElevenLabs, you must provide your ElevenLabs API key as a system environment variable. Composer reads this key automatically when it starts.

Step 1: Get your ElevenLabs API key

  • Sign in to your ElevenLabs account.

  • Go to your account or profile settings.

  • Create and copy your API key. Keep this key private.

Step 2: Set API key permissions

  • In your ElevenLabs account, make sure your API key has permission to use the Text to Speech endpoint.

Step 3: Set the environment variable

Create a system environment variable:

  • Name: COMPOSER_ELEVENLABS_APIKEY

  • Value: your ElevenLabs API key

On Windows:

  1. Open Start and search for Environment Variables.

  2. Select Edit the system environment variables.

  3. Click Environment Variables.

  4. Under User variables, click New.

  5. Set:

    • Name: COMPOSER_ELEVENLABS_APIKEY

    • Value: your ElevenLabs API key

  6. Click OK to save

  7. Reboot your system

On macOS or Linux, add the variable to your shell configuration file to make it permanent. This ensures it is available every time you log in.

  1. Open a terminal.

  2. Open your shell configuration file (for most systems, this is ~/.bashrc or ~/.zshrc)

  3. Add the following line at the end of the file:

    export COMPOSER_ELEVENLABS_APIKEY=your_api_key_here
  4. Save the file.

  5. Restart your terminal, or log out and log back in.

  6. Restart Composer.

The API key will now be available automatically every time the system starts.
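To confirm the variable is actually visible to newly started processes, you can inspect the environment from any scripting runtime. The sketch below assumes Node.js is installed (Node is not required by Composer itself; it is just a convenient way to check what a fresh process sees):

```javascript
// Returns true only if the key is present and non-blank in the given
// environment object. Composer performs an equivalent lookup at startup.
function hasApiKey(env) {
  const key = env.COMPOSER_ELEVENLABS_APIKEY;
  return typeof key === "string" && key.trim().length > 0;
}

console.log(hasApiKey(process.env)
  ? "COMPOSER_ELEVENLABS_APIKEY is set"
  : "COMPOSER_ELEVENLABS_APIKEY is missing");
```

If the check reports the key as missing, restart your terminal (or reboot on Windows) so new processes inherit the updated environment.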

💡Important notes

  • Composer only reads environment variables when it starts.

  • If you change the API key, restart Composer (on Windows, reboot your computer) so the new value is picked up.

  • Keep your API key private and do not share it.

Text To Speech Options

  • Start when loaded – When enabled, the text starts playing automatically as soon as the Text to Speech input is loaded.

  • Show advanced options – Shows or hides additional advanced settings.

Configuration

  • Voice ID – Selects which voice will be used to speak the text. Different voices have different tones and speaking styles. Choose the voice that best matches your content.

    How to select a voice from ElevenLabs:

    • Visit the ElevenLabs Voice Library: https://elevenlabs.io/app/voice-library

    • Search for a voice you want to use and add it to “My Voices”.

    • In My Voices, locate the voice you want to use, copy its Voice ID, and paste it into Composer.

      💡Default vs Custom voices

      • All Default ElevenLabs voices are available automatically. They do not need to be added to My Voices and can be selected directly.

      • Custom or non-default voices must be added to one of your ElevenLabs voice slots in My Voices. If a custom voice is not added to a slot, it can’t be used via the API.

      The number of available voice slots depends on your ElevenLabs subscription.

  • Model ID – Defines which voice model is used to generate the speech. Different models may vary in quality, speed, or supported features.

    To learn more about available models and find a Model ID, see the ElevenLabs models page.

  • Language (ISO 639-1) – Optional. Explicitly sets the language of the input text using a two-letter language code. ElevenLabs detects the language automatically from the text, but setting it can improve pronunciation and consistency.
    Examples:

    • en for English

    • sv for Swedish

    💡 Language support

    Language support depends on the selected ElevenLabs model and the accent of the selected voice. If a language does not work as expected, try selecting a different model and/or voice that supports it.

    More about ElevenLabs language support.

Text to Speech

  • Text – The text that will be converted into spoken audio. Enter exactly what you want the voice to say.
    You can change the text at any time to generate new audio. The language is usually detected automatically from the text, unless a language is explicitly set in the configuration.

Voice Settings

  • Stability – Controls how consistent the voice sounds between each generation.

    Lower values make the voice more expressive and varied. Higher values make the voice more stable but can sound flatter and less emotional.

  • Speaker Boost – Makes the generated voice sound more similar to the original speaker.

    When enabled, the voice usually sounds more realistic, but audio generation may take slightly longer.

  • Style – Controls how strongly the voice’s speaking style is emphasized.

    Higher values exaggerate the voice’s personality and delivery. Using this setting may increase generation time.

  • Speed – Adjusts how fast the voice speaks.

    Lower values slow down the speech, while higher values make it speak faster. A value of 1.0 is the normal speaking speed.

  • Reset – Resets all voice settings back to their default values.
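As a rough sketch of how these settings relate, the snippet below normalizes a settings object into valid ranges. The ranges and defaults are assumptions based on ElevenLabs' public API, not Composer-specific values:

```javascript
// Assumed ranges: stability and style 0..1, speed roughly 0.7..1.2.
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

function normalizeVoiceSettings({ stability = 0.5, style = 0, speed = 1.0, speakerBoost = true } = {}) {
  return {
    stability: clamp(stability, 0, 1), // lower = expressive, higher = stable
    style: clamp(style, 0, 1),         // higher = stronger style emphasis
    speed: clamp(speed, 0.7, 1.2),     // 1.0 = normal speaking speed
    speakerBoost: Boolean(speakerBoost),
  };
}
```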

Commands

  • Playback state – Shows the current status of the audio playback.

  • Play – Sends the text to ElevenLabs, generates the audio, and starts playback.

    Use this to hear the current text with the selected voice and settings.

  • Stop – Stops the current audio playback or cancels an ongoing generation request.

Status

  • Status – Shows the current status of the Text to Speech input.

  • Response – Displays the response from ElevenLabs, including the status code of the request.

  • Response Time (ms) – Shows how long it took, in milliseconds, from sending the request until the first audio data was received.

  • Speech Duration – Shows the total length of the generated audio in seconds.

Cache Settings

  • Enable Cache – When enabled, generated audio is stored so it can be reused without generating it again. This reduces response time and saves ElevenLabs credits.

  • Keep Cache – Defines how long cached audio should be kept before it expires.

    Available options are Forever, Minutes, Hours, Days, and Months.

  • Cache Expiration – Sets how long cached audio is kept, based on the selected Keep Cache option.

    For example, if Keep Cache is set to Days and this value is 7, cached audio will be kept for 7 days.

  • Total in Cache – Shows how many cached audio files are currently stored.

  • Clear Cache – Deletes all cached audio files immediately.
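The interplay between Keep Cache and Cache Expiration can be sketched as below. This is an illustration only; Composer's internal cache implementation may differ:

```javascript
// Milliseconds per Keep Cache unit; a month is approximated as 30 days here.
const UNIT_MS = {
  Minutes: 60 * 1000,
  Hours: 60 * 60 * 1000,
  Days: 24 * 60 * 60 * 1000,
  Months: 30 * 24 * 60 * 60 * 1000,
};

// Expiry time for a cached file created at createdAtMs.
function cacheExpiresAt(createdAtMs, keepCache, expirationValue) {
  if (keepCache === "Forever") return Infinity;
  return createdAtMs + expirationValue * UNIT_MS[keepCache];
}
```

For example, with Keep Cache set to Days and Cache Expiration set to 7, a file generated now expires seven days later.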

Icon Text

  • Icon text – Short descriptive text used in the input list. Limited to 5 characters.

Audio mixer

  • Hide in the audio mixer – When enabled, the input will not appear in the audio mixer.

💡Where cached audio is stored

When caching is enabled, generated audio files are stored locally on the same machine running Composer. This allows Composer to reuse previously generated speech without requesting it again from ElevenLabs.

The cache is stored in Composer’s application directory, under Media Audio Cache.

Render Options

These options, together with Render Tuning (see Performance and Options), help optimize performance by letting Composer automatically decide whether the input should be rendered.

When your project’s Render Tuning option is active, Composer manages each input’s rendering automatically for best performance.

These options allow you to override that behavior and manually decide if this input should be rendered or excluded from the scene.

Render Tuning vs Render Options

  • Render Tuning works at the project level (see Performance and Options), automatically managing which inputs are rendered across all Scenes for optimal performance.

  • Render Options work at the input level, letting you manually override or fine-tune how a specific input behaves within Composer’s automatic rendering process.

Additional performance tips

Beyond the settings covered above, see Performance and Options for additional guidance on optimizing Composer's performance on your system.


Connectors API

Like all other components in the Composer ecosystem, Text to Speech can be integrated with external systems or devices via Connectors. This allows remote control of inputs, overlays, and transitions, enabling automation and integration with control boards, broadcast automation tools, or custom workflows. For more details on configuring and using Connectors, see the Connectors documentation.

Below is an example using Connectors to control Text-To-Speech.

Example - Setup

  • Create a new Connector, e.g. TTSConnector.

  • From the Target dropdown, select your Text To Speech Input.

  • Add new API commands:

    • VoiceId - set Value to @@VoiceId

    • ModelId - set Value to @@ModelId

    • Text - set Value to @@Text

    • PlayCommand

  • Each @@-parameter defined under Value (for example @@Text) becomes a query parameter that you can set when triggering the Connector.

Once configured, you can now trigger Text To Speech remotely using the Connector API.

Example Usage

To trigger the TTSConnector with a VoiceId, ModelId, and a Text message:

http://[YOUR_IP]:[PORT]/api/connector/trigger?name=TTSConnector&VoiceId=21m00Tcm4TlvDq8ikWAM&ModelId=eleven_flash_v2_5&Text=Hi%20from%20Composer!
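When constructing trigger URLs programmatically, it is safest to let the runtime encode the query values (spaces and punctuation in the Text parameter must be escaped). The host, port, and helper function below are illustrative, not part of Composer's API:

```javascript
// URLSearchParams encodes values automatically: spaces become "+",
// "!" becomes "%21", and so on.
function buildTriggerUrl(host, port, name, params) {
  const query = new URLSearchParams({ name, ...params });
  return `http://${host}:${port}/api/connector/trigger?${query}`;
}

const url = buildTriggerUrl("192.168.1.10", 8080, "TTSConnector", {
  VoiceId: "21m00Tcm4TlvDq8ikWAM",
  ModelId: "eleven_flash_v2_5",
  Text: "Hi from Composer!",
});
```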

Script Engine API

The Script Engine in Composer allows for advanced automation and customized workflows. By writing JavaScript functions, you can programmatically control Text to Speech. This is particularly useful for creating dynamic, event-driven workflows or integrating with external systems.

For more information, see the Script Engine documentation.

This example shows how to control the Text to Speech input using JSON data. By passing voice, model, and text information in a single payload, you can fully customize speech generation from scripts, connectors, or remote systems.

The example below shows a Script Engine function that receives JSON data and triggers speech generation automatically.

const ttsInput = Project.GetInputByName("Text To Speech (ElevenLabs)");

function SayThis(jsonData)
{
    // Parse the JSON data.
    const json = JSON.parse(jsonData);

    // Set the properties of the Text to Speech input.
    ttsInput.VoiceId = json.voiceId;
    ttsInput.ModelId = json.modelId;
    ttsInput.Text = json.text;

    // Generate and play the audio.
    Project.ExecuteInputCommand(ttsInput, "PlayCommand");
}

Example JSON input format

{
    "voiceId": "your-voice-id",
    "modelId": "your-model-id",
    "text": "Hi from Composer!"
}

Example API Request

http://[YOUR_IP]:[PORT]/api/scriptengine/execute?function=SayThis&parameter={"voiceId":"21m00Tcm4TlvDq8ikWAM","modelId":"eleven_flash_v2_5","text":"Hi from Composer!"}
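When sending this request from most HTTP clients, the JSON value of parameter must be URL-encoded. A sketch of building the request from another system (host and port are placeholders):

```javascript
// Build the payload expected by the SayThis function above.
const payload = {
  voiceId: "21m00Tcm4TlvDq8ikWAM",
  modelId: "eleven_flash_v2_5",
  text: "Hi from Composer!",
};

// encodeURIComponent escapes the braces and quotes in the JSON string.
const url = "http://192.168.1.10:8080/api/scriptengine/execute" +
  "?function=SayThis&parameter=" + encodeURIComponent(JSON.stringify(payload));

// fetch(url); // uncomment to actually send the request
```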