JS SDK

Overview

Embodia AI Embodied Driving upgrades AI expression from plain text to 3D multi-modal output. Based on text input, it generates voice, facial expressions, and actions in real time to drive 3D digital humans or humanoid robots, achieving natural expression comparable to a real person. Compared to traditional AI that only outputs text or voice, Embodia AI provides richer expressiveness and a more natural interactive experience.


Key Features

1. Real-time 3D Digital Human Rendering and Driving

2. Voice Synthesis (SSML Support) and Lip-sync

3. Multi-state Behavior Control (Idle / Listen / Speak, etc.)

4. Widget Component Display (Images, subtitles, videos, etc.)

5. Customizable Event Callbacks and Logging System


Environment Requirements

JS SDK

Browser Requirements

Android SDK

System Requirement: Android 11 and above

Chip Requirements:

| Chip Version | Recommended Resolution |
| --- | --- |
| RK3588 | 1080P |
| RK3566 | 720P |
| Other Chips | Please contact us for testing |


Quick Start

1.1 Preparation

(1) Include the following dependencies in your page:

<!DOCTYPE html>
<html lang="en">
<body>
  <div style="width: 400px; height: 600px">
    <div id="sdk"></div>
  </div>
  <script src="https://media.embodia.ai/xingyun3d/general/litesdk/xmovAvatar@latest.js"></script>
</body>
</html>

(2) Set the virtual human character and obtain credentials: Log in to Embodia AI (https://www.embodia.ai/) to create a driving application in the App Center. Select the character, voice, and performance style to obtain the App ID and App Secret.

1.2 Creating an Instance

const LiteSDK = new XmovAvatar({
  containerId: '#sdk',
  appId: '',
  appSecret: '',
  gatewayServer: 'https://nebula-agent.embodia.ai/user/v1/ttsa/session',
  // Custom renderer. When this method is provided, the SDK returns all events,
  // and the handling logic for every event type is defined by this method.
  onWidgetEvent(data) {
    // Handle widget events
    console.log('Widget event:', data)
  },
  // Proxy renderer. The SDK supports the subtitle_on, subtitle_off, and widget_pic events by default.
  // Through the proxy, default events can be overridden, and the business side can also implement other events.
  proxyWidget: {
    "widget_slideshow": (data) => {
      console.log("widget_slideshow", data)
    },
    "widget_video": (data) => {
      console.log("widget_video", data)
    },
  },
  onNetworkInfo(networkInfo) {
    console.log('networkInfo:', networkInfo)
  },
  onMessage(message) {
    console.log('SDK message:', message)
  },
  onStateChange(state) {
    console.log('SDK state change:', state)
  },
  onStatusChange(status) {
    console.log('SDK status change:', status)
  },
  onStateRenderChange(state, duration) {
    console.log('SDK state render change:', state, duration)
  },
  onVoiceStateChange(status) {
    console.log('SDK voice state:', status)
  },
  enableLogger: false, // Whether to print SDK logs; default is false
})

Initialization Parameters

| Parameter Name | Type | Value / Arguments | Mandatory | Description |
| --- | --- | --- | --- | --- |
| containerId | string | - | Yes | Container element ID |
| appId | string | - | Yes | Digital human app ID (obtained from the business system) |
| appSecret | string | - | Yes | Digital human app secret (obtained from the business system) |
| gatewayServer | string | https://nebula-agent.embodia.ai/user/v1/ttsa/session | Yes | Digital human service interface access path |
| onWidgetEvent | function | - | No | Widget event callback function |
| proxyWidget | Object | - | No | The SDK supports the subtitle_on, subtitle_off, and widget_pic events by default. You can override default events or support custom events in two ways: 1. Use onWidgetEvent to proxy all events (the SDK outputs all events). 2. Use proxyWidget to proxy individual events (implement custom events or override default events as needed). Event priority: onWidgetEvent > proxyWidget > default events |
| onVoiceStateChange | function | status: audio playback state. "start": playback started; "end": playback ended | No | Monitor the SDK audio playback state |
| onNetworkInfo | function | networkInfo: SDKNetworkInfo (see Parameter Definitions) | No | Current network status |
| onMessage | function | message: SDKMessage (see Parameter Definitions) | Yes | SDK message |
| onStateChange | function | state: string. See the digital human state description below | No | Monitor digital human state changes |
| onStatusChange | function | status: SDKStatus (see Parameter Definitions) | - | - |
| onStateRenderChange | function | - | No | Monitor the time from sending an action until the state animation is rendered when the digital human state changes |
| enableLogger | boolean | - | No | Whether to print SDK logs |

Parameter Definitions

enum EErrorCode {
  CONTAINER_NOT_FOUND = 10001, // Container does not exist
  CONNECT_SOCKET_ERROR = 10002, // Socket connection error
  START_SESSION_ERROR = 10003, // Session start error
  STOP_SESSION_ERROR = 10004, // Session stop error
  VIDEO_FRAME_EXTRACT_ERROR = 20001, // Video frame extraction error
  INIT_WORKER_ERROR = 20002, // Worker initialization error
  PROCESS_VIDEO_STREAM_ERROR = 20003, // Stream processing error
  FACE_PROCESSING_ERROR = 20004, // Facial processing error
  BACKGROUND_IMAGE_LOAD_ERROR = 30001, // Background load error
  FACE_BIN_LOAD_ERROR = 30002, // Face data load error
  INVALID_BODY_NAME = 30003, // Invalid body data
  VIDEO_DOWNLOAD_ERROR = 30004, // Video download error
  AUDIO_DECODE_ERROR = 40001, // Audio decode error
  FACE_DECODE_ERROR = 40002, // Face data decode error
  VIDEO_DECODE_ERROR = 40003, // Video decode error
  EVENT_DECODE_ERROR = 40004, // Event decode error
  INVALID_DATA_STRUCTURE = 40005, // Invalid data structure from TTSA
  TTSA_ERROR = 40006, // TTSA downlink error
  NETWORK_DOWN = 50001, // Offline mode
  NETWORK_UP = 50002, // Online mode
  NETWORK_RETRY = 50003, // Retrying network
  NETWORK_BREAK = 50004, // Network disconnected
}

interface SDKMessage {
  code: EErrorCode;
  message: string;
  timestamp: number;
  originalError?: string;
}

enum SDKStatus {
  online = 0,
  offline = 1,
  network_on = 2,
  network_off = 3,
  close = 4,
}
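
For reference, an onMessage handler might branch on these code ranges. This is an illustrative sketch, not part of the SDK: the grouping mirrors the error code table later in this document, and the handling logic is up to the integrator.

// Illustrative onMessage handler: group SDK messages by error code range.
function handleSdkMessage(message) {
  const { code, message: text, timestamp, originalError } = message
  if (code >= 50001) {
    // 5xxxx: network status (offline / online / retry / disconnected)
    console.warn('[network]', code, text, new Date(timestamp).toISOString())
  } else if (code >= 40001) {
    // 4xxxx: TTSA data acquisition and decoding errors
    console.error('[ttsa]', code, text, originalError)
  } else if (code >= 30001) {
    // 3xxxx: resource loading errors
    console.error('[resource]', code, text)
  } else if (code >= 20001) {
    // 2xxxx: frontend processing errors
    console.error('[processing]', code, text)
  } else {
    // 1xxxx: initialization and session errors
    console.error('[init]', code, text)
  }
}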

1.3 Initialization

Call the init method to initialize the SDK and load resources:

async init({
  onDownloadProgress?: (progress: number) => void,
}): Promise<void>

Parameter Description

  • onDownloadProgress: Progress callback (range: 0 to 100). If loading fails, stopSession is called automatically to prevent reconnection issues.
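
A minimal usage sketch, assuming the LiteSDK instance created in 1.2:

// Initialize the SDK and load resources, reporting download progress (0–100).
try {
  await LiteSDK.init({
    onDownloadProgress: (progress) => {
      console.log(`Resource loading: ${progress}%`)
    },
  })
} catch (err) {
  console.error('SDK initialization failed:', err)
}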

1.4 Driving Digital Human to Speak

speak: Drives the virtual human to speak:

speak(ssml: string, is_start: boolean, is_end: boolean): void

Parameter Description:

ssml: You can pass in the content you want the digital human to say directly, or pass in SSML markup to have the digital human perform KA actions. For details, see Advanced Integration.

is_start / is_end: Flags marking the first and last chunk of a call; for a non-streaming call, set both to true. See 1.2.1 Streaming Output.

Non-streaming Example:

speak("Welcome to Embodia AI", true, true);

1.5 Destroying the Instance

Destroy the SDK instance and disconnect:

destroy(): void
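
Per the best practices below, destroy the instance before the page unmounts; a minimal sketch:

// Clean up SDK resources and disconnect the session before the page unloads.
window.addEventListener('beforeunload', () => {
  LiteSDK.destroy()
})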


Advanced Integration

1 Digital Human State Switching

1.1 Overview

| State Name | English ID | Description | API Method |
| --- | --- | --- | --- |
| Offline Mode | offlineMode | No credits consumed in this state | offlineMode(): void |
| Online Mode | onlineMode | Return to online mode from offline | onlineMode(): void |
| Idle | idle | No user interaction for a long time | idle(): void |
| Interactive Idle | interactive_idle | Loop state used to interrupt the current behavior | interactiveidle(): void |
| Listen | listen | Digital human is in the listening state | listen(): void |
| Think | think | State before the response starts | think(): void |
| Speak | speak | Control the virtual human to speak | speak(...) |
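
These methods can be combined into a simple interaction loop. The following sketch uses only the method names from the table above; when to call each method is determined by your own ASR/LLM pipeline, which is not part of the SDK.

// One round of interaction driven by the state methods above.
LiteSDK.listen()    // the user starts talking
// ... speech recognition finishes and the request is sent to the LLM ...
LiteSDK.think()     // waiting for the response
LiteSDK.speak('Welcome to Embodia AI', true, true)   // play the response
// ... onVoiceStateChange reports that playback has ended ...
LiteSDK.listen()    // return to listening for the next turn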

1.2 Speak Details

1.2.1 Streaming Output

Call the speak method in streaming mode by managing the is_start and is_end flags (see the sketch after this list):

  • First chunk: is_start = true, is_end = false
  • Middle chunks: is_start = false, is_end = false
  • Last chunk: is_start = false, is_end = true
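
A minimal streaming sketch, assuming llmChunks is an async iterable of text chunks produced by your LLM (the chunk source and function name are illustrative, not part of the SDK):

// Stream LLM output into speak chunk by chunk.
// One chunk is held back so the final chunk can be sent with is_end = true.
async function speakStream(llmChunks) {
  let pending = null
  let isFirst = true
  for await (const chunk of llmChunks) {
    if (pending !== null) {
      LiteSDK.speak(pending, isFirst, false)   // first chunk: is_start = true; middle chunks: false / false
      isFirst = false
    }
    pending = chunk
  }
  if (pending !== null) {
    LiteSDK.speak(pending, isFirst, true)      // last chunk: is_end = true
  }
}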
1.2.2 SSML Samples

Semantic KA Instruction

<speak>
  Warmly
  <ue4event>
    <type>ka_intent</type>
    <data><ka_intent>Welcome</ka_intent></data>
  </ue4event>
  Welcome, distinguished guests, for taking the time from your busy schedules to visit and guide us! Your arrival is like a spring breeze, bringing us valuable experience and wisdom. This affirmation and support encourage us greatly. We look forward to gaining more growth and inspiration under your guidance and writing a more brilliant chapter together!
</speak>

Skill KA Instruction

<speak>
  <ue4event>
    <type>ka</type>
    <data><action_semantic>dance</action_semantic></data>
  </ue4event>
</speak>

Speak KA Instruction

<speak>
  <ue4event>
    <type>ka</type>
    <data><action_semantic>Hello</action_semantic></data>
  </ue4event>
  Welcome to the Embodia AI 3D digital human platform. There is a lot of exciting content waiting for you to discover~
</speak>

For detailed KA query interface instructions, please refer to the "Embodied Driving KA Query Interface Documentation."

Notes:

  1. To ensure optimal digital human presentation, it is recommended to accumulate a short segment of content before the first call in a streaming scenario. This ensures that the digital human's speaking speed (consumption rate of text) remains lower than the subsequent streaming output speed of the LLM.
  2. speak does not allow multiple consecutive calls (i.e., calling speak immediately after is_end=true in a previous call). It is recommended to use interactive_idle or listen to perform a state transition in between.
  3. During the speak process, onVoiceStateChange will trigger events: voice_start indicates the start of speech, and voice_end indicates the end. These can be used to manage the digital human's speaking status.

2. Other Methods

setVolume: Controls volume. Range: 0 to 1 (0: Mute, 1: Max).

setVolume(volume: number): void

showDebugInfo: Display debugging information.

hideDebugInfo: Hide debugging information.

showDebugInfo(): void
hideDebugInfo(): void
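
For example, assuming the LiteSDK instance from 1.2:

LiteSDK.setVolume(0.5)    // half volume
LiteSDK.showDebugInfo()   // overlay debugging information during development
LiteSDK.hideDebugInfo()   // remove the overlay again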

Precautions

1. Ensure the container element has explicit width and height; otherwise, the rendering effect may be affected.

2. All mandatory parameters must be provided during initialization.

3. Calling the destroy() method will clean up all resources and disconnect the session.

4. It is recommended to use showDebugInfo() in the development environment to assist with debugging.

Error Code Definition and Troubleshooting Suggestions

| Type | Error Code | Description |
| --- | --- | --- |
| Initialization Error | 10001 | Container does not exist |
| | 10002 | Socket connection error |
| | 10003 | Session error: start_session enters catch (abnormal interface data for /api/session; response.error_code is used in all cases) |
| | 10004 | Session error: stop_session enters catch |
| Frontend Processing Logic Error | 20001 | Video frame extraction error |
| | 20002 | Error initializing the video frame extraction worker |
| | 20003 | Error processing the extracted video stream |
| | 20004 | Expression processing error |
| Resource Management Error | 30001 | Background image loading error |
| | 30002 | Expression data loading error |
| | 30003 | Body data has no name |
| | 30004 | Video download error |
| SDK TTSA Data Acquisition & Decompression Error | 40001 | Audio decoding error |
| | 40002 | Expression decoding error |
| | 40003 | Body video decoding error |
| | 40004 | Event decoding error |
| | 40005 | TTSA returned data type error (not audio, body, face, event, etc.) |
| | 40006 | Abnormal information sent by TTSA downstream |
| Network Issues | 50001 | Offline mode |
| | 50002 | Online mode |
| | 50003 | Network retry |
| | 50004 | Network disconnected |

Best Practices

1. Call the destroy() method to clean up resources before the page unmounts

2. Use error callbacks properly to handle abnormal situations

3. Use SSML format to control the virtual human's speaking effect

4. Implement the onWidgetEvent callback when you need to customize Widget behavior

5. When using the KA plugin, it is recommended to test the action intent recognition effect first, and adjust the configuration parameters as needed

Frequently Asked Questions

Q: How can I avoid excessive credit consumption?
A: It is recommended to use a basic voice timbre during debugging; you can also switch to offline mode when there is no interaction for a long time.

Q: How do I switch digital humans?
A: You can create multiple embodied driving applications on the Mofa Nebula Platform and switch digital humans by destroying the current instance and reconnecting with the new application.

Q: Will starting the project via an IP address cause an error?
A: Some methods used in the SDK only support localhost or HTTPS calls.

Q: Can digital humans be customized?
A: Yes. If you have such needs, you can scan the QR code to contact us.