v6.2 EE Release Notes
This document was translated by GPT-4
# 1. Applications, Networks, Metrics, Events
- AutoTracing
- Support for zero-code Golang application distributed tracing
- Added file IO event performance data and associated it with log calls
- Universal Service Map
- Support for zero-code automatic display of the universal application topology at the process granularity
- Support for using TOA (TCP Option Address) mechanism to calculate the real access relationship before and after NAT
- Aggregated OpenTelemetry Span data into service and path metrics
- Used the operating system's socket information to correct the direction of path data
- Added a
direction score
metric, the higher the score, the more accurate the direction of the client and server is, and when the score is 255, the direction is definitely correct. - Support for displaying application topology based on Custom Tag (such as k8s.label.xxx) as the grouping condition
- Private protocol parsing, business field parsing
- Added WebAssembly Plugin mechanism, supported adding private protocol parsing, and enhanced the field parsing capabilities of standard protocols, see specific document here
- New continuous profiling feature
- Supported the collection and display of Java, Golang Native Profile
- Network NAT tracing, flow logs, flow downloads
- Flow logs associated with PCAP policies are not rate-limited
- Supported viewing the flow logs of specific PCAP policies
- The status of non-TCP flow logs is uniformly set to normal
- Lowest time span for PCAP download reduced to 1s
- Supported direct traffic download based on PCAP policy
- Cloud network traffic distribution
- Supported traffic distribution using TCP tunnel
- Displayed traffic size matched by distribution strategy separately by distribution point
- Supported preconfiguring uncreated workloads and container services in distribution strategies
- Usability improvements
- Data correlation
- Supported switching grouping conditions in the metric tabs of right-slide frames for easy drilling
- Page operation
- Expanded the types of fields supported by template variables
- Added quick filter function on table page side
- Metric quantity advanced configuration dropdown supports search filtering
- Improved usability of table
column selection
. - Supported direct download of PCAP files from TCP time sequence diagram page
- Supported configuring metric templates to quickly select common metric sets in subviews
- Enhanced control of Tip display in subviews
- Tags in right-slide pages, flow log call log detail pages support shortcut copying and quick filtering
- Optimized the filter logic in the traffic relation cards of the right-slide frames to avoid ambiguity
- Supported setting historical search conditions as page default loading content
- Full-screen display of all subviews changed to pop-up display
- Supported direct modification of path options in search box
- Optimized the operation and display of distributed tracing flame graph, and supported filtering APP/SYS/NET Span
- New pages
- Distributed tracing added topology diagram display
- Added
service list
page, consolidated display of OTel, eBPF, Packet signal source service granularity metric data - Added
endpoint
monitoring page, supported related expansion from service list page
- Streamlined pages
- Merged path, flow log/call log pages
- Page display
- Optimized the display of the resource corresponding to the network card in the displayed network path, and displayed the uncollector when the corresponding resource could not be found
- Optimized the display of the knowledge graph information in the right-slide pages
- NAT tracing page used waterfall topology presentation and added physical topology display
- Optimized the display of network path in right-slide pages, supported default display physical network path
- Empty value tags are not displayed on flow log, call log and other pages
- Supported adding description information to historical search conditions
- Optimized the display of search box field text, provided more help information
- Displayed data table, enumeration value field values, and description information
- When predicting that the target page has no data, the jump button is disabled in advance
- Adjusted and optimized page layout, space utilization is more reasonable, and the amount of information displayed is larger
- Replaced the hard-to-understand resource_gl0/gl1/gl2 with auto_instance and auto_service
- Data correlation
# 2. Views, Alarms, Reports
- Optimized the display of
trigger value
in alarm events, displayed as current value instead of trigger value - Reviewed all system alarm texts
- Environment threshold alarm strategy for all packet loss of collectors and data nodes
# 3. Resources
- AutoTagging
- Supported the synchronization of process information of container nodes and cloud servers, and injected it as Universal Tag into all data, supporting filtering and grouping
- Supported adding custom metadata to processes, cloud servers, and K8S namespaces
- Supported automatic synchronization of K8S cluster information under AWS and Alibaba Cloud accounts
- Supported syncing information of Baidu Cloud's Cloud Intelligent Network (CSN) resources
- Custom Tag field k8s.label added support for K8S Service
- When K8S Pod acts as the backend of multiple K8S Services, only the service with the smallest dictionary order is associated with all tags
- Usability
- Supported batch entry of load balancer and listener information
- Supported batch entry of gateway type host and its network card
- Supported setting the
extra docking routing interface
for all hosted K8S clusters under the same public cloud account
- Compatibility
- Adapted to sync resource information of K8S 1.18, 1.20
- Huawei public cloud added two configurations of domain name and IAM authorization address to adapt to HCSO scenarios
# 4. System
- APIs
- Querier now supports PromQL
- Supported querying DeepFlow native metrics through PromQL
- Supported querying Prometheus RemoteWrite metrics through PromQL
- When querying Prometheus native metrics, supported using DeepFlow AutoTagging auto-injected tags
- Ingester now supports OTLP Export
- Supported exporting SYS/NET Span data (l7_flow_log) to otel-collector, see document of output fields here, FR-014-Tencent (opens new window)。
- Querier SQL API
- Provided Trace Completion API
tracing-completion-by-external-app-spans
: The caller passed in a set of APP Spans (no need to be stored in DeepFlow), got their upstream and downstream SYS/NET Spans, and added infrastructure and uncodified service tracing capabilities to traditional APM, see specific document here - Completed support for AS keyword and returned the original field name before AS in the result
- When getting optional values of enum type Tag fields, returned corresponding description information
- Added SLIMIT parameter to limit the number of Series in the return result
- The Category of custom type Tag (k8s.label/cloud.tag/os.app) is uniformly map_item
- Added three Universal Tags: tap_port_host, tap_port_chost, tap_port_pod_node, representing the host, cloud server, and container node of the collected network card, respectively
- Provided Trace Completion API
- Querier now supports PromQL
- Grafana
- Prometheus Dashboard can use DeepFlow as the data source without modification
- In Query Editor, the Math operator supports referencing the
$__interval
variable - Optimized Enum type Variable to avoid all candidate values being expanded in SQL when choosing All
- Added Grafana backend plugin module, supported standard Grafana alert strategy configuration
- Supported reference of another Variable variable when defining Variable
- Real-time display of the corresponding SQL statement when editing the content in Query Editor
- eBPF
- Avoid eBPF associating all requests to the same Trace when the application process only acts as a client
- Simplified protocol recognition logic in eBPF code
- Compatibility
- Collector supported running in Linux Kernel environments below 3.0
- Supported specifying the Hostname of the collector environment, suitable for environments with hostname conflicts
- Provided two deepflow-agent binary packages: dynamic linking and static linking, the former depends on glibc dynamic linking library, while the latter has obvious lock competition in malloc/free under multithreading
- eBPF program supported automatic kernel offset adjustment by using BTF files
- Controller and data node management
- Supported ClickHouse node number greater than deepflow-server replica number
- Supported specifying domain name form of controller or ingester address for deepflow-agent
- Supported configuring data storage duration at hour granularity
- Data time duration configuration API changed to asynchronous to avoid timeout
- Added self-monitoring of reporting service
- Collector management
- Supported specifying Group ID when creating collector group (agent-group)
- Supported deepflow-agent running on K8s Node as a regular process (not a Pod)
- When the ctrl_ip or ctrl_mac of the agent's operating environment changes, supported automatic update of the corresponding information of the agent to avoid disconnection
- Remote upgrade of deepflow-agent on cloud server can be completed entirely by deepflow-ctl, without manually mounting hostPath for deepflow-server
- Supported configuration of the interval to list k8s resources of Agent, reducing pressure on apiserver
- When only network monitoring authorization is available, added support for viewing HTTP and DNS capabilities under the application menu
- Automatically deleted Agents that have lost connection for more than vtap_auto_delete_interval, default deletion after one hour of disconnection
- Debug
- Collection interval of all DeepFlow self-monitoring metrics reduced to 10s
- Included eBPF data source information (syscall, go-tls, go-http2, openssl, io-event) in tap_port field to enhance Debug capabilities
- deepflow-server supported monitoring itself through continue profile
- A series of performance optimizations for various software components