Trace Completion API
This document was translated by GPT-4
# 1. Introduction
APM (Application Performance Management) focuses mainly on code and does not possess the ability to view issues from a full-stack multi-dimensional perspective without any blind spots. Additionally, due to the hindering effect of instrumentation, it often fails to cover all services. DeepFlow leverages the zero-instrumentation, full-coverage data collection of eBPF for distributed tracing, and associates this with the generation of call-chains. In scenarios where DeepFlow and APM are configured to operate independently of each other, they can function in harmony through the use of a loosely coupled methodology. This involves using DeepFlow's Trace Completion API to enhance APM's call chains, eliminating blind spots in APM for cloud-native infrastructure and non-instrumented services, considerably reducing the time for problem triage.
Before we start explaining the API, let's first illustrate how APM can supplement missing data after calling the DeepFlow API using a diagram.
Full Stack Distributed Tracing
- Span with prefix A denotes Application Span (from APM); Span with prefix S represents System Span (from DeepFlow); Span with prefix N refers to Network Span (from DeepFlow)
- The black part of the diagram displays the parameters when APM calls the DeepFlow API. DeepFlow uses these
Application Span
as search boundaries to supplement surroundingSystem/Network Span
, and reconstructs Parent-Child relationship - The blue part displays
System/Network Span
computed based onApplication Span
that has injected TraceID/SpanID from APM. It fills the gap between two services in APM regarding Syscall, Bridge, IPVS and other kernel system call and network transmission path - The green part denotes basic service calls that DeepFlow traces automatically based on its
system span
, for instance, non-instrumented DNS calls, and MySQL calls, Redis calls etc., whose TraceID/SpanID can't be injected - The red part summarizes up and down stream services, traced by DeepFlow automatically based on its system span, that aren't instrumented, such as non-instrumented ALB, NLB, Ingress etc., gateway services, and other services not instrumented by APM in business logic
# 2. API Description
Get DeepFlow service endpoint port number:
port=$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-app)
Trace Completion API call method:
curl -XPOST "http://${deepflow_server_node_ip}:${port}/v1/stats/querier/tracing-completion-by-external-app-spans"
# 2.1 Input Description
{
"max_iteration": 30,
"network_delay_us": 3000000,
"app_spans":[
{
"trace_id": "xxxx",
"span_id": "xxxx",
"parent_span_id": "xxxx",
"span_kind": 0,
"start_time_us": 1681960139619998,
"end_time_us": 1681960139620004,
},
...
]
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Field | Type | Required | Description |
---|---|---|---|
max_iteration | int | No | Depth of System Span tracking, default is 30, unit: layers |
network_delay_us | int | No | Time span for Network Span tracking, default is 3000000, unit: microseconds |
app_spans | array[AppSpans] | Yes | List of Application Span that you want to use to complete call chains, could be all Application Span from a complete Trace (not suggested though) |
App_spans normally include a part of Application Span from an APM's Trace, and DeepFlow completes it based on this. It's recommended to carry the following Span each time calling the API:
- The Application Span you care about the most (referred to as X), and the service it belongs to is called a.
- X's ancestor Span, until find the first one whose ancestor Span is not from service a, for instance, in SkyWalking, the first Exit Span.
- X's descendant Span, until find the first one whose ancestor Span is not from service a for every child branch, for instance, in SkyWalking, for each branch, find the first Entry Span.
The purpose of carrying these Span in the request is to tell DeepFlow to complete it with Span X being the core, and reconstruct the relationship between all Span in the return result with X's ancestors and descendants as boundaries. The parameters needed for each app_span are listed below:
Field | Type | Required | Description |
---|---|---|---|
trace_id | string | Yes | TraceID for Application Span |
span_id | string | Yes | SpanID for Application Span |
parent_span_id | string | Yes | ParentSpanID for Application Span |
span_kind | int | Yes | Type of Application Span meaning the same as OpenTelemetry, optional value:0: unspecified, 1: internal, 2: server, 3: client, 4: producer, 5: consumer |
start_time_us | int | Yes | Start time of Application Span ,unit: microseconds |
end_time_us | int | Yes | End time of Application Span ,unit: microseconds |
# 2.2 Output Description
{
"OPT_STATUS": "SUCCESS",
"DESCRIPTION": "",
"DATA": {
"tracing": [
{
"start_time_us": 1682216627824419,
"end_time_us": 1681960139620004,
"name": "querier_client",
"signal_source": 4,
"tap_side": "c-app",
"trace_id": "a03a848c3121b817b0e866fb71607bc2",
"span_id": "d5b574eb7ac48503",
"parent_span_id": "69cc875250b4043c",
"deepflow_span_id": "d5b574eb7ac48503",
"deepflow_parent_span_id": "69cc875250b4043c",
"_ids": ["7225065397752915120"],
"related_ids": [
"2-app-7225065397752915115"
],
"flow_id": "0",
"duration": 32219,
"req_tcp_seq": 0,
"resp_tcp_seq": 0,
"l7_protocol": 20,
"l7_protocol_str": "HTTP",
"request_type": "POST",
"request_resource": "xxxx",
"response_status": 2,
"request_id": "xxxx",
"endpoint": "querier_client",
"process_id": 1234,
"app_service": "deepflow-statistics",
"app_instance": "",
"x_request_id": "",
"syscall_trace_id_request": "0",
"syscall_trace_id_response": "0",
"syscall_cap_seq_0": 0,
"syscall_cap_seq_1": 0,
"vtap_id": 1,
},
...
]
}
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
The tracing in the returned result is a complete Span traced by DeepFlow, an array type. Each item in this array is a Span, which includes both Application Span from APM and System/Network Span from DeepFlow. Important attributes for each Span include:
Field | Type | Description |
---|---|---|
start_time_us | int | Span start time, unit: microseconds |
end_time_us | int | Span end time, unit: microseconds |
duration | int | Span execution time, unit: microseconds |
name | string | Span name, For System/Network Span, it's corresponding to DeepFlow's request_resource field description |
signal_source | int | Span source, corresponding to DeepFlow's signal_source field description |
tap_side | int | Span statistic location, corresponding to DeepFlow's tap_side field description |
trace_id | string | TraceID, if System/Network Span has corresponding Application Span , value should be the same; otherwise, it's empty |
span_id | string | Original Span ID, if System/Network Span has corresponding Application Span , value should be the same; otherwise, it's empty |
parent_span_id | string | Original Parent Span ID, if System/Network Span has corresponding Application Span , value should be the same; otherwise, it's empty |
deepflow_span_id | string | Re-calculated Span ID by DeepFlow |
deepflow_parent_span_id | string | Re-calculated Parent Span ID by DeepFlow |
Besides, API will return extra fields for each Span:
Field | Type | Description | Note |
---|---|---|---|
_ids | array | DipFlow call logs corresponding to Span | |
related_ids | int | Other DeepFlow call logs associated with Span | |
flow_id | string | DeepFlow flow logs corresponding to Span, no data for Application/ System Span | |
l7_protocol | int | Application protocol for Span, corresponding to DipFlow's l7_protocol field description | |
l7_protocol_str | string | Application protocol for Span | |
request_type | string | Request type for Span | |
request_id | string | Request ID for Span | |
endpoint | string | Request endpoint for Span | |
request_resource | string | Request resource for Span | |
response_status | int | Response status for Span, corresponding to DipFlow's response_status field description | |
process_id | int | Process ID for Span, only for System Span | |
app_service | string | Service where Span belongs, only for Application Span | |
app_instance | string | Instance where Span belongs, only for Application Span | |
vtap_id | int | Collection ID corresponding to Span | |
req_tcp_seq | int | TCP Seq for Span request, only for System/ Network Span | Used for trace calculation |
resp_tcp_seq | int | TCP Seq for Span response, only for System/ Network Span | Used for trace calculation |
x_request_id | string | X-Request-ID for Span request or response, only for System/ Network Span | Used for trace calculation |
syscall_trace_id_request | string | Syscall TraceID corresponding to Span request, only for System Span | Used for trace calculation |
syscall_trace_id_response | string | Syscall TraceID corresponding to Span response, only for System Span | Used for trace calculation |
syscall_cap_seq_0 | string | Syscall Seq corresponding to Span request, only for System Span | Used for trace calculation |
syscall_cap_seq_1 | string | Syscall Seq corresponding to Span response, only for System Span | Used for trace calculation |
Note:
- Use
deepflow_span_id
anddeepflow_parent_span_id
to construct the new parent-child relationship in the returned results. - TraceID/SpanID injected into protocol after application instrumentation can be automatically collected and parsed by agent in default OpenTelemetry and SkyWalking's Header format has been adapted. Please modify agent configuration for custom headers, for detailed instructions, please refer to Agent Advanced Configuration