JS SDK

Overview

Embodia AI Embodied Driving upgrades AI expression from plain text to 3D multi-modal output. Based on text input, it generates voice, facial expressions, and actions in real time to drive 3D digital humans or humanoid robots, achieving natural expression comparable to a real person. Compared to traditional AI that only outputs text or voice, Embodia AI provides richer expressiveness and a more natural interactive experience.


Key Features

1. Real-time 3D Digital Human Rendering and Driving

2. Voice Synthesis (SSML Support) and Lip-sync

3. Multi-state Behavior Control (Idle / Listen / Speak, etc.)

4. Widget Component Display (Images, subtitles, videos, etc.)

5. Customizable Event Callbacks and Logging System


Environment Requirements

JS SDK

Browser Requirements

Android SDK

System Requirement: Android 11 and above

Chip Requirements:

| Chip Version | Recommended Resolution |
| --- | --- |
| RK3588 | 1080P |
| RK3566 | 720P |
| Other Chips | Please contact us for testing |


Quick Start

1.1 Preparation

(1) Include the following dependencies in your page:

<!DOCTYPE html>
<html lang="en">
<body>
  <div style="width: 400px; height: 600px">
    <div id="sdk"></div>
  </div>
  <script src="https://media.embodia.ai/xingyun3d/general/litesdk/xmovAvatar@latest.js"></script>
</body>
</html>

(2) Set the virtual human character and obtain credentials: Log in to Embodia AI (https://www.embodia.ai/) to create a driving application in the App Center. Select the character, voice, and performance style to obtain the App ID and App Secret.

1.2 Creating an Instance

const LiteSDK = new XmovAvatar({
  containerId: '#sdk',
  appId: '',
  appSecret: '',
  gatewayServer: 'https://nebula-agent.embodia.ai/user/v1/ttsa/session',
  // Custom renderer. When this method is provided, the SDK returns all events,
  // and the handling logic for every event type is defined by this method.
  onWidgetEvent(data) {
    // Handle widget events
    console.log('Widget event:', data)
  },
  // Proxy renderer. The SDK supports the subtitle_on, subtitle_off, and widget_pic events by default.
  // Through the proxy, default events can be overridden, and the business side can also implement other events.
  proxyWidget: {
    "widget_slideshow": (data) => {
      console.log("widget_slideshow", data)
    },
    "widget_video": (data) => {
      console.log("widget_video", data)
    },
  },
  onNetworkInfo(networkInfo) {
    console.log('networkInfo:', networkInfo)
  },
  onMessage(message) {
    console.log('SDK message:', message)
  },
  onStateChange(state) {
    console.log('SDK state change:', state)
  },
  onStatusChange(status) {
    console.log('SDK status change:', status)
  },
  onStateRenderChange(state, duration) {
    console.log('SDK state render change:', state, duration)
  },
  onVoiceStateChange(status) {
    console.log('SDK voice state:', status)
  },
  enableLogger: false, // Whether to print SDK logs; default is false
})

Initialization Parameters

| Parameter Name | Type | Value / Arguments | Mandatory | Description |
| --- | --- | --- | --- | --- |
| containerId | string | - | Yes | Container element ID |
| appId | string | - | Yes | Digital human app ID (obtained from the business system) |
| appSecret | string | - | Yes | Digital human app secret (obtained from the business system) |
| gatewayServer | string | https://nebula-agent.embodia.ai/user/v1/ttsa/session | Yes | Digital human service interface access path |
| onWidgetEvent | function | - | No | Widget event callback function |
| proxyWidget | Object | - | No | The SDK supports the subtitle_on, subtitle_off, and widget_pic events by default. You can override default events or support custom events in two ways: 1. Use onWidgetEvent to proxy all events (the SDK outputs all events). 2. Use proxyWidget to proxy individual events (implement custom events or override default events as needed). Event priority: onWidgetEvent > proxyWidget > default events |
| onVoiceStateChange | function | status: audio playback state. "start": playback started; "end": playback ended | No | Monitor the SDK audio playback state |
| onNetworkInfo | function | networkInfo: SDKNetworkInfo (see Parameter Definitions) | No | Current network status |
| onMessage | function | message: SDKMessage (see Parameter Definitions) | Yes | SDK message |
| onStateChange | function | state: string. See the digital human state description below | No | Monitor digital human state changes |
| onStatusChange | function | status: SDKStatus (see Parameter Definitions) | - | - |
| onStateRenderChange | function | - | No | Monitor the time from sending an action until the state animation is rendered when the digital human state changes |
| enableLogger | boolean | - | No | Whether to print SDK logs |

Parameter Definitions

enum EErrorCode {
  CONTAINER_NOT_FOUND = 10001, // Container does not exist
  CONNECT_SOCKET_ERROR = 10002, // Socket connection error
  START_SESSION_ERROR = 10003, // Session start error
  STOP_SESSION_ERROR = 10004, // Session stop error
  VIDEO_FRAME_EXTRACT_ERROR = 20001, // Video frame extraction error
  INIT_WORKER_ERROR = 20002, // Worker initialization error
  PROCESS_VIDEO_STREAM_ERROR = 20003, // Stream processing error
  FACE_PROCESSING_ERROR = 20004, // Facial processing error
  BACKGROUND_IMAGE_LOAD_ERROR = 30001, // Background load error
  FACE_BIN_LOAD_ERROR = 30002, // Face data load error
  INVALID_BODY_NAME = 30003, // Invalid body data
  VIDEO_DOWNLOAD_ERROR = 30004, // Video download error
  AUDIO_DECODE_ERROR = 40001, // Audio decode error
  FACE_DECODE_ERROR = 40002, // Face data decode error
  VIDEO_DECODE_ERROR = 40003, // Video decode error
  EVENT_DECODE_ERROR = 40004, // Event decode error
  INVALID_DATA_STRUCTURE = 40005, // Invalid data structure from TTSA
  TTSA_ERROR = 40006, // TTSA downlink error
  NETWORK_DOWN = 50001, // Offline mode
  NETWORK_UP = 50002, // Online mode
  NETWORK_RETRY = 50003, // Retrying network
  NETWORK_BREAK = 50004, // Network disconnected
}

interface SDKMessage {
  code: EErrorCode;
  message: string;
  timestamp: number;
  originalError?: string;
}

enum SDKStatus {
  online = 0,
  offline = 1,
  network_on = 2,
  network_off = 3,
  close = 4,
}
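
For reference, an onMessage handler might branch on these code ranges. This is an illustrative sketch, not part of the SDK: the grouping mirrors the error code table later in this document, and the handling logic is up to the integrator.

// Illustrative onMessage handler: group SDK messages by error code range.
function handleSdkMessage(message) {
  const { code, message: text, timestamp, originalError } = message
  if (code >= 50001) {
    // 5xxxx: network status (offline / online / retry / disconnected)
    console.warn('[network]', code, text, new Date(timestamp).toISOString())
  } else if (code >= 40001) {
    // 4xxxx: TTSA data acquisition and decoding errors
    console.error('[ttsa]', code, text, originalError)
  } else if (code >= 30001) {
    // 3xxxx: resource loading errors
    console.error('[resource]', code, text)
  } else if (code >= 20001) {
    // 2xxxx: frontend processing errors
    console.error('[processing]', code, text)
  } else {
    // 1xxxx: initialization and session errors
    console.error('[init]', code, text)
  }
}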

1.3 Initialization

Call the init method to initialize the SDK and load resources:

async init({
  onDownloadProgress?: (progress: number) => void,
}): Promise<void>

Parameter Description

  • onDownloadProgress: Progress callback (range: 0 to 100). If loading fails, stopSession is called automatically to prevent reconnection issues.
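
A minimal usage sketch, assuming the LiteSDK instance created in 1.2:

// Initialize the SDK and load resources, reporting download progress (0–100).
try {
  await LiteSDK.init({
    onDownloadProgress: (progress) => {
      console.log(`Resource loading: ${progress}%`)
    },
  })
} catch (err) {
  console.error('SDK initialization failed:', err)
}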

1.4 Driving Digital Human to Speak

speak: Drives the virtual human to speak:

speak(ssml: string, is_start: boolean, is_end: boolean): void

Parameter Description:

ssml: You can pass in the content you want the digital human to say directly, or pass in SSML markup to have the digital human perform KA actions. For details, see Advanced Integration.

is_start / is_end: Flags marking the first and last chunk of a call; for a non-streaming call, set both to true. See 1.2.1 Streaming Output.

Non-streaming Example:

speak("Welcome to Embodia AI", true, true);

1.5 Destroying the Instance

Destroy the SDK instance and disconnect:

destroy(): void
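
Per the best practices below, destroy the instance before the page unmounts; a minimal sketch:

// Clean up SDK resources and disconnect the session before the page unloads.
window.addEventListener('beforeunload', () => {
  LiteSDK.destroy()
})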


Advanced Integration

1 Digital Human State Switching

1.1 Overview

| State Name | English ID | Description | API Method |
| --- | --- | --- | --- |
| Offline Mode | offlineMode | No credits consumed in this state | offlineMode(): void |
| Online Mode | onlineMode | Return to online mode from offline | onlineMode(): void |
| Idle | idle | No user interaction for a long time | idle(): void |
| Interactive Idle | interactive_idle | Loop state used to interrupt the current behavior | interactiveidle(): void |
| Listen | listen | Digital human is in the listening state | listen(): void |
| Think | think | State before the response starts | think(): void |
| Speak | speak | Control the virtual human to speak | speak(...) |
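
These methods can be combined into a simple interaction loop. The following sketch uses only the method names from the table above; when to call each method is determined by your own ASR/LLM pipeline, which is not part of the SDK.

// One round of interaction driven by the state methods above.
LiteSDK.listen()    // the user starts talking
// ... speech recognition finishes and the request is sent to the LLM ...
LiteSDK.think()     // waiting for the response
LiteSDK.speak('Welcome to Embodia AI', true, true)   // play the response
// ... onVoiceStateChange reports that playback has ended ...
LiteSDK.listen()    // return to listening for the next turn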

1.2 Speak Details

1.2.1 Streaming Output

Call the speak method in streaming mode by managing the is_start and is_end flags (see the sketch after this list):

  • First chunk: is_start = true, is_end = false
  • Middle chunks: is_start = false, is_end = false
  • Last chunk: is_start = false, is_end = true
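
A minimal streaming sketch, assuming llmChunks is an async iterable of text chunks produced by your LLM (the chunk source and function name are illustrative, not part of the SDK):

// Stream LLM output into speak chunk by chunk.
// One chunk is held back so the final chunk can be sent with is_end = true.
async function speakStream(llmChunks) {
  let pending = null
  let isFirst = true
  for await (const chunk of llmChunks) {
    if (pending !== null) {
      LiteSDK.speak(pending, isFirst, false)   // first chunk: is_start = true; middle chunks: false / false
      isFirst = false
    }
    pending = chunk
  }
  if (pending !== null) {
    LiteSDK.speak(pending, isFirst, true)      // last chunk: is_end = true
  }
}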
1.2.2 SSML Samples

Semantic KA Instruction

<speak>
  Warmly
  <ue4event>
    <type>ka_intent</type>
    <data><ka_intent>Welcome</ka_intent></data>
  </ue4event>
  Welcome, distinguished guests, for taking the time from your busy schedules to visit and guide us! Your arrival is like a spring breeze, bringing us valuable experience and wisdom. This affirmation and support encourage us greatly. We look forward to gaining more growth and inspiration under your guidance and writing a more brilliant chapter together!
</speak>

Skill KA Instruction

<speak>
  <ue4event>
    <type>ka</type>
    <data><action_semantic>dance</action_semantic></data>
  </ue4event>
</speak>

Speak KA Instruction

<speak>
  <ue4event>
    <type>ka</type>
    <data><action_semantic>Hello</action_semantic></data>
  </ue4event>
  Welcome to the Embodia AI 3D digital human platform. There is a lot of exciting content waiting for you to discover~
</speak>

For detailed KA query interface instructions, please refer to the "Embodied Driving KA Query Interface Documentation."

Notes:

  1. To ensure optimal digital human presentation, it is recommended to accumulate a short segment of content before the first call in a streaming scenario. This ensures that the digital human's speaking speed (consumption rate of text) remains lower than the subsequent streaming output speed of the LLM.
  2. speak does not allow multiple consecutive calls (i.e., calling speak immediately after is_end=true in a previous call). It is recommended to use interactive_idle or listen to perform a state transition in between.
  3. During the speak process, onVoiceStateChange will trigger events: voice_start indicates the start of speech, and voice_end indicates the end. These can be used to manage the digital human's speaking status.

2. Other Methods

setVolume: Controls volume. Range: 0 to 1 (0: Mute, 1: Max).

setVolume(volume: number): void

showDebugInfo: Display debugging information.

hideDebugInfo: Hide debugging information.

showDebugInfo(): void
hideDebugInfo(): void
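
For example, assuming the LiteSDK instance from 1.2:

LiteSDK.setVolume(0.5)    // half volume
LiteSDK.showDebugInfo()   // overlay debugging information during development
LiteSDK.hideDebugInfo()   // remove the overlay again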

Precautions

1. Ensure the container element has explicit width and height; otherwise, the rendering effect may be affected.

2. All mandatory parameters must be provided during initialization.

3. Calling the destroy() method will clean up all resources and disconnect the session.

4. It is recommended to use showDebugInfo() in the development environment to assist with debugging.

Error Code Definition and Troubleshooting Suggestions

| Type | Error Code | Description |
| --- | --- | --- |
| Initialization Error | 10001 | Container does not exist |
| | 10002 | Socket connection error |
| | 10003 | Session error: start_session enters catch (abnormal interface data for /api/session; response.error_code is used in all cases) |
| | 10004 | Session error: stop_session enters catch |
| Frontend Processing Logic Error | 20001 | Video frame extraction error |
| | 20002 | Error initializing the video frame extraction worker |
| | 20003 | Error processing the extracted video stream |
| | 20004 | Expression processing error |
| Resource Management Error | 30001 | Background image loading error |
| | 30002 | Expression data loading error |
| | 30003 | Body data has no name |
| | 30004 | Video download error |
| SDK TTSA Data Acquisition & Decompression Error | 40001 | Audio decoding error |
| | 40002 | Expression decoding error |
| | 40003 | Body video decoding error |
| | 40004 | Event decoding error |
| | 40005 | TTSA returned data type error (not audio, body, face, event, etc.) |
| | 40006 | Abnormal information sent by TTSA downstream |
| Network Issues | 50001 | Offline mode |
| | 50002 | Online mode |
| | 50003 | Network retry |
| | 50004 | Network disconnected |

Best Practices

1. Call the destroy() method to clean up resources before the page unmounts

2. Use error callbacks properly to handle abnormal situations

3. Use SSML format to control the virtual human's speaking effect

4. Implement the onWidgetEvent callback when you need to customize Widget behavior

5. When using the KA plugin, it is recommended to test the action intent recognition effect first, and adjust the configuration parameters as needed

Frequently Asked Questions

Q: How can I avoid excessive credit consumption?
A: It is recommended to use a basic voice timbre during debugging; you can also switch to offline mode when there is no interaction for a long time.

Q: How do I switch digital humans?
A: You can create multiple embodied driving applications on the Mofa Nebula Platform and switch digital humans by destroying the current instance and reconnecting with the new application.

Q: Will starting the project via an IP address cause an error?
A: Some methods used in the SDK only support localhost or HTTPS calls.

Q: Can digital humans be customized?
A: Yes. If you have such needs, you can scan the QR code to contact us.