API

Overview

The Xmov Embodia Virtual Human Video Generation API converts text scripts and PPT files into virtual human videos, so developers can add video generation to their own applications.

Ⅰ API Documentation

1. Authentication

1.1 X-TOKEN Calculation

Inputs to the calculation:

a. Request body: data={"xxx":"xxx"} or {}

b. Secret key issued to the caller: secret="iamsecret"

c. Interface path, excluding the host: api_path="/xxx/xxxxxx"

Calculation steps:

1. Convert api_path to all lowercase: lower_api_path

2. Convert the request method to lowercase: lower_method (e.g., "delete"/"post"/...)

3. Convert data to a JSON string with keys sorted: sort_json_str

In Python, for example: json.dumps(dict(data), sort_keys=True).replace("'", "\"")

4. Concatenate strings in the following order: lower_api_path + lower_method + sort_json_str + secret + X-TIMESTAMP

a. X-TIMESTAMP: Unix timestamp in seconds; it must be within 60 seconds of the current time

b. Resulting sign: "/xxx/xxxxxx" + "post" + "{\"xxx\":\"xxx\"}" + "iamsecret" + "1489133053"

5. Encode the sign in UTF-8 and calculate MD5 to get X-TOKEN: ddc6457fd0b373475ac65912b797ef05
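
The steps above can be sketched as follows. This is a minimal illustration rather than the official client: serialize, build_sign_string, and x_token are hypothetical helper names, and the timestamp is passed in explicitly so the result is reproducible. One detail worth checking against real responses: Python's json.dumps emits ", " and ": " separators by default, while the sample string in step 4b shows no spaces. The document does not state which form the server expects, so the sketch exposes both through a compact flag.

```python
import hashlib
import json


def serialize(data, compact=False):
    """Step 3: JSON string with sorted keys.

    compact=True drops the spaces after ',' and ':' (an assumption, see lead-in).
    """
    if compact:
        return json.dumps(dict(data), sort_keys=True, separators=(",", ":"))
    return json.dumps(dict(data), sort_keys=True)


def build_sign_string(api_path, method, data, secret, timestamp, compact=False):
    """Steps 1-4: lower-cased path + lower-cased method + JSON + secret + timestamp."""
    return "{}{}{}{}{}".format(
        api_path.lower(), method.lower(), serialize(data, compact), secret, timestamp
    )


def x_token(sign_string):
    """Step 5: MD5 hex digest of the UTF-8 encoded signing string."""
    return hashlib.md5(sign_string.encode("utf-8")).hexdigest()


sign = build_sign_string(
    "/xxx/xxxxxx", "POST", {"xxx": "xxx"}, "iamsecret", 1489133053, compact=True
)
print(sign)
print(x_token(sign))
```

If a request is rejected with a signature error, comparing the printed signing string in both compact and default forms against what the server expects is a quick first check.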

1.2 Interface Call

The following header information should be added when requesting the interface:

X-APP-ID: Application AK (access key)
X-TIMESTAMP: Unix timestamp in seconds, the same value used in the signature
X-TOKEN: Signature calculated as described in 1.1

1.3 Demo Code

import time
import json
import hashlib
import requests
from urllib.parse import urljoin

def encode_with_md5(s):
    m = hashlib.md5()
    m.update(s.encode('utf-8'))
    return m.hexdigest()

def headers_need_sign(ak, secret, method, url, data):
    headers = {}
    t = int(time.time())
    data_str = json.dumps(dict(data), sort_keys=True).replace("'", "\"")
    ori_sign = "{}{}{}{}{}".format(url.lower(), method.lower(), data_str, secret, t)
    sign = encode_with_md5(ori_sign)
    headers["X-APP-ID"] = ak
    headers["X-TOKEN"] = sign
    headers["X-TIMESTAMP"] = str(t)
    return headers

if __name__ == '__main__':
    ak = "37514ac-3fce-4f4c-bc3f-86eba37da7dd"
    secret = 'bb81b786-ef1f-443e-9e86-9df8399f796b'
    method = 'POST'
    host = 'https://nebula-agent.xingyun3d.com'
    url = '/xxx/xxx?x=xx&z=22'
    req_data = {
        "data1": "data1",
        "data2": "data2"
    }
    # Calculate and get request headers
    req_headers = headers_need_sign(ak, secret, method, url, req_data)
    # Request the interface
    req_url = urljoin(host, url)
    resp = requests.request(method, req_url, json=req_data, headers=req_headers)

2. API Call

2.1 Initiate Rendering

2.1.1 Method 1: Initiate Rendering via segment

Host: https://nebula-agent.xingyun3d.com

Request Path: POST: /user/v1/video_synthesis_task/create_render_task

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
video_name	string	Video Name	No	Output video name, also used as the download file name; Length limit: 24 Chinese characters or 50 English characters; Default: name generated from the timestamp
look_name	string	Avatar ID	Yes	ID of the avatar used in the video
tts_vcn_name	string	Voice ID	Yes	ID of the voice used in the video
studio_name	string	Studio ID	Yes	ID of the studio used in the video
sub_title	string	Enable Subtitles	No	Enumerated values: on/off; Default: on
segment	JSON Array	SSML Script	No	-
output_resolution	string	Video Resolution	No	Enumerated values: 540P; 720P; 1080P; 2K; 4K; Default: 720P
if_aigc_mark	bool	AI Generation Mark	No	Enumerated values: true (displays "Xmov Embodia · AI Generated" in the lower right corner of the video) / false (removes the AI mark); Default: true
video_format	string	Video Format	No	Supported formats: mp4, mov, webm, mkv; Default: mp4; Format support by resolution: 540P: mp4; 720P: mp4; 1080P and above: mp4, mov, webm, mkv
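
The constraints in this table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not part of the API: validate_render_params is a hypothetical helper, and it reads "1080P and above" as covering 1080P, 2K, and 4K.

```python
ALL_FORMATS = ("mp4", "mov", "webm", "mkv")

# Per the table: 540P and 720P support mp4 only.
# Assumption: "1080P and above" means 1080P, 2K, and 4K.
FORMATS_BY_RESOLUTION = {
    "540P": ("mp4",),
    "720P": ("mp4",),
    "1080P": ALL_FORMATS,
    "2K": ALL_FORMATS,
    "4K": ALL_FORMATS,
}

REQUIRED = ("look_name", "tts_vcn_name", "studio_name")


def validate_render_params(params):
    """Return a list of problems; an empty list means the payload looks consistent."""
    problems = []
    for key in REQUIRED:
        if not params.get(key):
            problems.append("missing required parameter: " + key)

    # Fall back to the documented defaults when a parameter is omitted.
    resolution = params.get("output_resolution", "720P")
    video_format = params.get("video_format", "mp4")
    if resolution not in FORMATS_BY_RESOLUTION:
        problems.append("unsupported output_resolution: {!r}".format(resolution))
    elif video_format not in FORMATS_BY_RESOLUTION[resolution]:
        problems.append(
            "video_format {!r} is not listed for {}".format(video_format, resolution)
        )

    if params.get("sub_title", "on") not in ("on", "off"):
        problems.append("sub_title must be 'on' or 'off'")
    return problems


print(validate_render_params({
    "look_name": "WWJS_4p_9021_new",
    "tts_vcn_name": "XMOV_HN_TTS__236",
    "studio_name": "youling_2d_v",
    "output_resolution": "720P",
    "video_format": "mov",
}))
```

The server remains the authority on what is accepted; a local check like this only catches obvious mismatches earlier.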

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	Data	-
	task_id	int	Video Task ID	-

Use the segment Tag for Phonetic Notation and Pauses

Phonetic Notation Examples

Single-character Phonetic Notation Illustration

<phoneme contenteditable="false" data-text="认" py="ʐʅn4">认</phoneme>

Multi-character Phonetic Notation Illustration

<phoneme contenteditable="false" data-text="你们" py="nɪn3 haʊ3">你们</phoneme>

Pause Example

<break time="1000ms"></break>

Complete Example

[
  {
    "text": "Dear audience friends, hello everyone.<phoneme contenteditable=\"false\" data-text=\"认\" py=\"ʐʅn4\">认</phoneme>真收看以下内容哦。欢迎收看大众电视台社会民生栏目，我是主播小朱，现在为您带来最新的民生资讯。",
    "media": "https://media.yoyan.yyz/yoyan/user_upload.prod/9957_5c339a8b1914973a542c.png"
  },
  {
    "text": "The above is today's livelihood news report.<break time=\"1000ms\"></break> We will continue to follow social livelihood dynamics and bring you the latest information. Thank you for watching. See you next time!",
    "media": "https://media.yoyan.yyz/yoyan/user_upload.prod/9957_5c339a8b1914973a542c.png"
  }
]
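
The tags shown above are plain strings, so a segment array can be assembled with small helpers. The following is a sketch only: break_tag, phoneme_tag, and make_segment are hypothetical names, and the helpers do not escape quotes or angle brackets inside their arguments. The media key is left configurable because the example above uses "media" while the request body in the DEMO section uses "media_url".

```python
import json


def break_tag(milliseconds):
    """Pause of the given length, in the form shown in the Pause Example."""
    return '<break time="{}ms"></break>'.format(int(milliseconds))


def phoneme_tag(text, py):
    """Phonetic notation for text, in the form shown in the Phonetic Notation Examples."""
    return (
        '<phoneme contenteditable="false" data-text="{0}" py="{1}">{0}</phoneme>'
        .format(text, py)
    )


def make_segment(text, media=None, media_key="media"):
    """One entry of the segment array; the media link is optional in this sketch."""
    segment = {"text": text}
    if media is not None:
        segment[media_key] = media
    return segment


segments = [
    make_segment(phoneme_tag("认", "ʐʅn4") + "真收看以下内容哦。"),
    make_segment("Thank you for watching." + break_tag(1000) + " See you next time!"),
]
# ensure_ascii=False keeps the Chinese text readable when printed.
print(json.dumps(segments, ensure_ascii=False, indent=2))
```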

2.1.2 Method 2: Initiate Rendering via PPT

First call the PPT parsing interface, then call the create rendering task interface.

(1) PPT Parsing Interface

Host: https://nebula-agent.xingyun3d.com

Request Path: POST: /user/v1/video_synthesis_task/parse_ppt_file

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
ppt_file	binary	PPT File	Mandatory	This parameter is not included in X-TOKEN calculation

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	-	-
	parse_ppt_file_name	string	PPT File Parsing Name	-

(2) Create Rendering Task Interface

Host: https://nebula-agent.xingyun3d.com

Request Path: POST: /user/v1/video_synthesis_task/create_render_task

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
video_name	string	Video Name	Optional	Meaning: Output video name = download file name; Default: timestamp-generated name
look_name	string	Avatar ID	Mandatory	Meaning: Avatar ID used in the video
tts_vcn_name	string	Voice ID	Mandatory	Meaning: Voice ID used in the video
studio_name	string	Studio ID	Mandatory	Meaning: Studio ID used in the video
sub_title	string	Enable Subtitles	Optional	Meaning: Whether to enable subtitles; Enumerated values: on/off; Default: on
parse_ppt_file_name	string	PPT File Parsing Name	Mandatory	Obtained via the PPT parsing interface
output_resolution	string	Video Resolution	Optional	Meaning: Video resolution; Enumerated values: 540P; 720P; 1080P; 2K; 4K; Default: 720P
if_aigc_mark	bool	AI Generation Mark	Optional	Meaning: Whether to add AI generation mark; Enumerated values: true (displays "Xmov Embodia · AI Generated" in the video's lower right corner) / false (removes the AI mark); Default: true

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	-	-
	task_id	int	Video Task ID	-

2.2 Query Rendering Result

Host: https://nebula-agent.xingyun3d.com

Request Path: GET: /user/v1/video_synthesis_task/get_render_task

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
task_id	int	Video Task ID	Mandatory	-

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	-	-
	task_id	int	Video Task ID	-
	synth_state	string	Task Status	Enumerated values: not_send: Pending; waiting: Processing; processing: In Progress; finished: Completed; error: Synthesis Failed; cancel: Synthesis Cancelled
	render_image_oss	string	Rendered Image OSS	Meaning: OSS link of the rendered image; Valid only when the cloud task status is successful
	render_video_oss	string	Rendered Video OSS	Meaning: OSS link of the rendered video; Valid only when the cloud task status is successful
	amount	float	Points Consumed	-
	synth_start_time	datetime	Synthesis Start Time	Meaning: Video synthesis start time
	synth_finish_time	datetime	Synthesis Completion Time	Meaning: Video synthesis completion time; Valid only when the cloud task status is successful
	error_reason	string	Error Log	Meaning: Log info recorded after cloud task creation, e.g.: a. PPT file parsing stuck; b. Synthesis task creation failed; c. Limit check failed; d. Voice task creation failed
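
Because rendering is asynchronous, a client typically calls this interface repeatedly until synth_state settles. The sketch below shows one way to structure that loop. It assumes that finished, error, and cancel are final states (the table does not say so explicitly), wait_for_render is a hypothetical helper, and the polling interval and timeout are arbitrary. The HTTP call is injected as fetch_task so the loop can be exercised without network access; fake_fetch stands in for a signed GET request.

```python
import time

# Assumption: these states are final (Completed, Synthesis Failed, Synthesis Cancelled).
TERMINAL_STATES = {"finished", "error", "cancel"}


def wait_for_render(fetch_task, task_id, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_task(task_id) until synth_state is terminal; return the data dict."""
    deadline = clock() + timeout
    while True:
        body = fetch_task(task_id)
        if body.get("error_code") != 0:
            raise RuntimeError("query failed: {}".format(body.get("error_reason")))
        data = body.get("data") or {}
        if data.get("synth_state") in TERMINAL_STATES:
            return data
        if clock() >= deadline:
            raise TimeoutError("task {} still {!r} after {}s".format(
                task_id, data.get("synth_state"), timeout))
        sleep(interval)


# Stand-in for a real signed GET to get_render_task: replays canned states.
_states = iter(["waiting", "processing", "finished"])


def fake_fetch(task_id):
    state = next(_states)
    data = {"task_id": task_id, "synth_state": state}
    if state == "finished":
        data["render_video_oss"] = "https://example.invalid/video.mp4"
    return {"error_code": 0, "error_reason": "", "data": data}


result = wait_for_render(fake_fetch, 155, interval=0, sleep=lambda seconds: None)
print(result["synth_state"], result.get("render_video_oss"))
```

In real use, fetch_task would send the signed request and return resp.json(); the caller should still inspect synth_state on the returned dict, since error and cancel also end the loop.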

2.3 Cancel Rendering Task

Host: https://nebula-agent.xingyun3d.com

Request Path: POST: /user/v1/video_synthesis_task/cancel_render_task

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
task_id	int	Video Task ID	Mandatory	-

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-

2.4 Preview

Host: https://nebula-agent.xingyun3d.com

Request Path: GET: /user/v1/video_synthesis_task/get_render_task_preview_url

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
task_id	int	Video Task ID	Mandatory	-

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	-	-
	preview_url	string	Preview URL	-

3. Error Codes

Error Code	Description
20001	Application does not exist or is unavailable
30002	PPT file does not exist
30003	PPT file parsing error
30004	Video task not found
30005	Video task creation error
30006	Video task cancellation error
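
Every interface returns error_code and error_reason, so the check can be centralized. The sketch below is one possible wrapper, not an official client: ApiError and unwrap are hypothetical names, and falling back to the table's description when error_reason is empty is a design choice of this example.

```python
# Descriptions copied from the error code table.
ERROR_CODES = {
    20001: "Application does not exist or is unavailable",
    30002: "PPT file does not exist",
    30003: "PPT file parsing error",
    30004: "Video task not found",
    30005: "Video task creation error",
    30006: "Video task cancellation error",
}


class ApiError(Exception):
    """Raised when a response carries a non-zero error_code."""

    def __init__(self, code, reason):
        super().__init__("error_code {}: {}".format(code, reason))
        self.code = code
        self.reason = reason


def unwrap(body):
    """Return the data dict of a successful response, or raise ApiError."""
    code = body.get("error_code")
    if code == 0:
        return body.get("data") or {}
    # Prefer the server's message; otherwise use the documented description.
    reason = body.get("error_reason") or ERROR_CODES.get(code, "Unknown error")
    raise ApiError(code, reason)


print(unwrap({"error_code": 0, "error_reason": "", "data": {"task_id": 135}}))
```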

Ⅱ DEMO Example

1. Initiate Rendering

1.1 Method Ⅰ: Initiate Rendering via segment


POST: /user/v1/video_synthesis_task/create_render_task
Body:
{
  "look_name": "WWJS_4p_9021_new",
  "tts_vcn_name": "XMOV_HN_TTS__236",
  "studio_name": "youling_2d_v",
  "segment": [
    {
      "text": "Promote high-quality development of regional industries, advance proper application of digital intelligence technology, and enhance friendly exchanges and cooperation among enterprises.",
      "media_url": "https://media.xmov.ai/youyan/user_upload_qa/74275_4b733edeca68484e8940da.png"
    },
    {
      "text": "The 14th Five-Year Plan focuses on high-quality development, integrates multiple industrial resources, gathers development momentum, and will inject new impetus into regional industrial development in 2024.",
      "media_url": "https://media.xmov.ai/youyan/user_upload_qa/74275_1ca7a4a5407c43acb0da8b.mp4"
    },
    {
      "text": "Meanwhile, we will conduct multi-dimensional, multi-level, multi-field discussions, inviting entrepreneurs, well-known enterprises, and industry leaders to share the latest development trends and opportunities of the industry.",
      "media_url": "https://media.yoyan.yyz/yoyan/user_upload/prod/74275_1ca7a5647d3acb.mp4"
    }
  ]
  //  Either segment or a parsed PPT file is required.
  //  For the PPT route, pass "parse_ppt_file_name" as shown in Method Ⅱ.
}

Response:
{
  "error_code": 0,
  "error_reason": "",
  "data": {
    "task_id": 135
  }
}

1.2 Method Ⅱ: Initiate Rendering via PPT

# coding=utf-8
import hashlib
import json
import time
import requests

def generate_x_token(data, secret, api_path, method, timestamp=None):
    # Convert api_path to lowercase
    lower_api_path = api_path.lower()
    # Convert request method to lowercase
    lower_method = method.lower()
    # Convert data to a sorted JSON string
    sort_json_str = json.dumps(dict(data), sort_keys=True).replace("'", "\"")
    # Use the passed timestamp or get current timestamp
    x_timestamp = str(int(timestamp) if timestamp else int(time.time()))
    # Concatenate strings for signature calculation
    sign_str = f"{lower_api_path}{lower_method}{sort_json_str}{secret}{x_timestamp}"
    # Generate MD5 signature and return
    return hashlib.md5(sign_str.encode('utf-8')).hexdigest()

# Replace with your application's App ID
app_id = 'xxxxx'
# Replace with your application's App Secret
secret = 'xxxxx'
host = 'https://nebula-agent.xingyun3d.com'

# PPT Parsing Interface
data = {}
api_path = '/user/v1/video_synthesis_task/parse_ppt_file'
method = 'POST'
timestamp = int(time.time())

headers = {}
headers["X-APP-ID"] = app_id
# Generate X-TOKEN (the uploaded file itself is not part of the signature)
headers["X-TOKEN"] = generate_x_token(data, secret, api_path, method, timestamp)
headers["X-TIMESTAMP"] = str(timestamp)

# Send PPT parsing request; the with-block closes the file after the upload
with open('PPT模板.pptx', 'rb') as ppt:
    files = [
        ('ppt_file', ('PPT模板.pptx', ppt, 'application/vnd.openxmlformats-officedocument.presentationml.presentation'))
    ]
    resp = requests.request(method, f"{host}{api_path}", data=data, headers=headers, files=files, timeout=30)
res = resp.json()
print(res)
# Stop early if parsing failed, instead of failing later on a missing key
if res.get('error_code') != 0:
    raise SystemExit(f"PPT parsing failed: {res.get('error_reason')}")
# Get parsed PPT file name from response
parse_ppt_file_name = res['data']['parse_ppt_file_name']

# Create Render Task Interface
data = {
    "look_name": "caishengwei_3663_new",
    "tts_vcn_name": "XMW_FM_TTS_13",
    "if_aigc_mark": True,
    "studio_name": "telestudio_simple_red_01",
    "sub_title": "on",
    "video_name": "Test Video Generated by API",
    "parse_ppt_file_name": parse_ppt_file_name
}
api_path = '/user/v1/video_synthesis_task/create_render_task'
method = 'POST'
timestamp = int(time.time())

headers["Content-Type"] = "application/json"
headers["X-APP-ID"] = app_id
headers["X-TOKEN"] = generate_x_token(data, secret, api_path, method, timestamp)
headers["X-TIMESTAMP"] = str(timestamp)

# Send create render task request
resp = requests.request(method, f"{host}{api_path}", json=data, headers=headers, timeout=30)
res = resp.json()
print(res)
# Get task ID from response
task_id = res.get('data').get('task_id')
# Parameters for the follow-up get_render_task query (see "2. Get Video Result")
data = {
    "task_id": task_id
}

2. Get Video Result

GET: /user/v1/video_synthesis_task/get_render_task
Params: task_id: 155  # Task ID
Response:
{
  "error_code": 0,
  "error_reason": "",
  "data": {
    "task_id": 155,
    "error_reason": "Synthesis task failed, please re-initiate or contact customer service for handling",
    "create_time": "2025-07-17T11:30:08.637104800",
    "update_time": "2025-07-17T11:30:54.131848080",
    "enable": true,
    "name": "b5b7f20c0c64e72f9fda59a4a392667",
    "video_name": "20250717_11_30_07.833",
    "output_resolution": "540P",
    "look_name": "AM058_19518_new",
    "tts_vcn_name": "XMW_HM_TTS_6",
    "studio_name": "bust_chic_an_museum_01",
    "sub_title": "on",
    "synth_start_time": null,
    "synth_finish_time": null,
    "synth_state": "error",
    "segment": [
      {
        "text": "This is a piece of test data.",
        "media_id": 12338,
        "media_url": "https://media.xmov.ai/yoyuan/user_upload/qa/171998_b37eb4358c64bcf5fd4be.png"
      },
      {
        "text": "Test segment 1",
        "media_id": 12339,
        "media_url": "https://media.xmov.ai/yoyuan/user_upload/qa/171998_b3195995e6d4cfa380.png"
      },
      {
        "text": "Test segment 2",
        "media_id": 12340,
        "media_url": "https://media.xmov.ai/yoyuan/user_upload/qa/171998_3d1838496914b831ce6.png"
      },
      {
        "text": "Test segment 3",
        "media_id": 12341,
        "media_url": "https://media.xmov.ai/yoyuan/user_upload/qa/171998_0cefd3514347e4d7ad18b1.png"
      },
      {
        "text": "Test segment 4",
        "media_id": 12342,
        "media_url": "https://media.xmov.ai/yoyuan/user_upload/qa/171998_97acf1318a0489f5b6ae.png"
      }
    ]
  }
}

POST: /user/v1/video_synthesis_task/cancel_render_task
task_id: 155  # Task ID
Response:
{
  "error_code": 0,
  "error_reason": ""
}

3. Preview

Request Path: GET: /user/v1/video_synthesis_task/get_render_task_preview_url

Request Parameters

Parameter Name	Type	Name	Mandatory	Remarks
task_id	int	Video Task ID	Mandatory	-

Return Parameters

First-level Parameter Name	Second-level Parameter Name	Type	Name	Remarks
error_code	-	int	Error Code	0: Success; Others: Error
error_reason	-	string	Error Reason	-
data	-	dict	-	-
	preview_url	string	Preview URL	-


Embodia AI — Beyond Digital Humans， Empowering AI to think, express, and truly engage.
