Introduction to Data Transformation
Functional Overview
Data transformation gives data engineers and developers an efficient, professional, and intelligent data development platform. With capabilities such as script development, visual development, task orchestration, task publishing, and task operations and maintenance, it helps organizations and businesses build real-time data lakehouses efficiently.
Feature Details
Modeling
- Visual Mode (Recommended): You can create a new transformation model table through a graphical interface by navigating to: Data Source -> Output Source -> Transformation Warehouse -> ETL Layer -> New Table.
- DDL Mode: You can create a new table using SQL statements by navigating to: Data Source -> Output Source -> Transformation Warehouse -> ETL Layer -> Query.
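In DDL Mode, a table is defined with a standard CREATE TABLE statement. The sketch below is only illustrative and assumes a hypothetical order-summary table; the table name, columns, and the exact DDL syntax accepted by the transformation warehouse may differ.

```sql
-- Hypothetical DDL-mode table in the etl transformation layer.
-- Names and types are illustrative; check the Query panel for the
-- exact syntax your transformation warehouse accepts.
CREATE TABLE etl.dwd_order_summary (
    order_date    DATE,
    store_id      BIGINT,
    order_count   BIGINT,
    total_amount  DECIMAL(18, 2),
    PRIMARY KEY (order_date, store_id)
);
```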
Layer Concept
- All source database data is synchronized to the `input` layer of the data warehouse. All transformation tables are created in the `etl` transformation layer of the data warehouse.
- A task's level is determined by the maximum level of the transformation tasks that produce its input tables.
- If all input tables are from the `input` layer, the current task is Level 1.
- If the input tables include an output table from a Level n transformation task, the current task becomes Level n+1 (see the sketch after this list).
- Task levels ensure clear data transformation dependencies, enabling layered transformation and streamed triggering while preventing circular dependencies.
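As a sketch of how levels are derived (the table names and the `INSERT INTO ... SELECT` task form are assumptions for illustration), a task reading only `input`-layer tables is Level 1, while a task that reads that task's output table becomes Level 2:

```sql
-- Level 1 task: every input table comes from the input layer.
INSERT INTO etl.dwd_orders
SELECT order_id, store_id, amount, created_at
FROM input.ods_orders;

-- Level 2 task: reads etl.dwd_orders, the output of the Level 1 task above,
-- so its level is the producing task's level plus one.
INSERT INTO etl.dws_store_daily
SELECT store_id, CAST(created_at AS DATE) AS order_date, SUM(amount) AS total_amount
FROM etl.dwd_orders
GROUP BY store_id, CAST(created_at AS DATE);
```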
Scripting Guide
- To reduce code duplication, the platform supports using global variables `${var}` in SQL to replace repetitive code; see the sketch after this list.
- For a list of supported SQL functions, refer to the documentation: 👉 Yaoqing SQL
- For SQL transformation standards, see: 👉 Transformation Guidelines
- For SQL editor shortcuts, see: 👉 Keyboard Shortcuts
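A minimal sketch of `${var}` substitution, assuming a global variable named `biz_date` has been defined (the variable, table, and column names are hypothetical):

```sql
-- ${biz_date} is replaced with the variable's value before execution,
-- so the same date filter does not need to be repeated in every script.
SELECT store_id, SUM(amount) AS total_amount
FROM etl.dwd_orders
WHERE CAST(created_at AS DATE) = DATE '${biz_date}'
GROUP BY store_id;
```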
Task Details
- The platform processes data in real-time streams based on user-provided SQL. To prevent the real-time state from growing indefinitely, users must specify a time-based field in the `WHERE` clause to constrain the scope of real-time calculations (see the sketch at the end of this section):
  - Daily Job - Computes on the last 2 days of data, with support for hourly backfills of historical data.
  - Hourly Job - Computes on the last 3 hours of data, with support for backfills every 5 minutes.
  - Minute-level Job - Computes on the last 2 minutes of data, with support for backfills every 5 minutes.
- By default, when a task starts, it resumes incremental processing from the point where it last stopped or encountered an error. On its first run, it reads and processes data based on the `WHERE` clause.
- The platform provides two distinct runtime environments, optimized for small and large tasks respectively, and switches between them automatically based on the job's characteristics.
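For instance, a daily job might constrain its real-time state with a time-based filter like the one below. This is a sketch only; the `created_at` field, the interval syntax, and the table names are assumptions and may differ in Yaoqing SQL.

```sql
-- Daily job: limit streaming state to roughly the last 2 days of data.
-- On the first run the task reads data matching this WHERE clause;
-- afterwards it resumes incrementally from where it last stopped.
INSERT INTO etl.dws_store_daily
SELECT store_id, CAST(created_at AS DATE) AS order_date, SUM(amount) AS total_amount
FROM etl.dwd_orders
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '2' DAY
GROUP BY store_id, CAST(created_at AS DATE);
```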