v6.2 EE Release Notes


# 1. Applications, Networks, Metrics, Events

  • AutoTracing
    • Support for zero-code Golang application distributed tracing
    • Added file I/O event performance data and associated it with call logs
  • Universal Service Map
    • Support for zero-code automatic display of the universal application topology at the process granularity
    • Support for using the TOA (TCP Option Address) mechanism to calculate the real access relationship before and after NAT (see the parsing sketch after this list)
    • Aggregated OpenTelemetry Span data into service and path metrics
    • Used the operating system's socket information to correct the direction of path data
    • Added a direction score metric: the higher the score, the more reliable the inferred client/server direction; a score of 255 means the direction is definitely correct
    • Support for displaying application topology based on Custom Tag (such as k8s.label.xxx) as the grouping condition
  • Private protocol parsing, business field parsing
    • Added a WebAssembly plugin mechanism, supporting private protocol parsing and enhancing the field parsing capabilities of standard protocols; see the documentation for details and the plugin sketch after this list
  • New continuous profiling feature
    • Supported the collection and display of Java and Golang native profiles
  • Network NAT tracing, flow logs, flow downloads
    • Flow logs associated with PCAP policies are not rate-limited
    • Supported viewing the flow logs of specific PCAP policies
    • The status of non-TCP flow logs is uniformly set to normal
    • Reduced the minimum time span for PCAP downloads to 1s
    • Supported direct traffic download based on PCAP policy
  • Cloud network traffic distribution
    • Supported traffic distribution through TCP tunnels
    • Displayed the volume of traffic matched by each distribution policy, broken down by distribution point
    • Supported preconfiguring not-yet-created workloads and container services in distribution policies
  • Usability improvements
    • Data correlation
      • Supported switching grouping conditions in the metric tabs of right-side panels for easier drill-down
    • Page operation
      • Expanded the types of fields supported by template variables
      • Added a quick filter panel on the side of table pages
      • The advanced metric configuration dropdown supports search filtering
      • Improved usability of table column selection
      • Supported direct download of PCAP files from TCP time sequence diagram page
      • Supported configuring metric templates to quickly select common metric sets in subviews
      • Enhanced control of Tip display in subviews
      • Tags on right-side panels and on flow log / call log detail pages support one-click copying and quick filtering
      • Optimized the filter logic in the traffic relationship cards of right-side panels to avoid ambiguity
      • Supported setting historical search conditions as a page's default content on load
      • Full-screen display of all subviews changed to pop-up display
      • Supported direct modification of path options in search box
      • Optimized the operation and display of distributed tracing flame graph, and supported filtering APP/SYS/NET Span
    • New pages
      • Added a topology diagram display to distributed tracing
      • Added a service list page that consolidates service-granularity metric data from OTel, eBPF, and Packet signal sources
      • Added an endpoint monitoring page, reachable by drill-down from the service list page
    • Streamlined pages
      • Merged the path and flow log / call log pages
    • Page display
      • Optimized the display of the resource corresponding to a network card on network paths, marking the card as not collected when no corresponding resource can be found
      • Optimized the display of knowledge graph information in right-side panels
      • The NAT tracing page uses a waterfall topology presentation and adds a physical topology display
      • Optimized the display of network paths in right-side panels, which now show the physical network path by default
      • Tags with empty values are no longer displayed on flow log, call log, and other pages
      • Supported adding description information to historical search conditions
      • Optimized the display of search box field text, provided more help information
      • Displayed description information for data tables and enumeration-type field values
      • When the target page is predicted to have no data, its jump button is disabled in advance
      • Adjusted and optimized page layouts for more reasonable space utilization and higher information density
      • Replaced the hard-to-understand resource_gl0/gl1/gl2 with auto_instance and auto_service
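
A minimal Go sketch of the TOA mechanism referenced above: an L4 load balancer inserts the pre-NAT client address into a TCP option, which an observer can read back from the segment. The option kinds (200 and 254) and the kind/len/port/IPv4 layout follow common TOA implementations such as the LVS toa module; they are assumptions here, not necessarily the exact values DeepFlow matches.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// parseTOA walks raw TCP option bytes looking for a TOA option carrying the
// pre-NAT client address. Kinds 200 and 254 are common TOA conventions
// (an assumption; implementations vary). Payload layout assumed here:
// kind(1) + len(1)=8 + port(2, big endian) + IPv4(4).
func parseTOA(opts []byte) (net.IP, uint16, bool) {
	for i := 0; i < len(opts); {
		kind := opts[i]
		switch kind {
		case 0: // end of option list
			return nil, 0, false
		case 1: // NOP, single byte
			i++
			continue
		}
		if i+1 >= len(opts) {
			return nil, 0, false
		}
		length := int(opts[i+1])
		if length < 2 || i+length > len(opts) {
			return nil, 0, false // malformed option
		}
		if (kind == 200 || kind == 254) && length == 8 {
			port := binary.BigEndian.Uint16(opts[i+2 : i+4])
			ip := net.IPv4(opts[i+4], opts[i+5], opts[i+6], opts[i+7])
			return ip, port, true
		}
		i += length
	}
	return nil, 0, false
}

func main() {
	// Fabricated options blob: NOP, NOP, then a TOA option (kind 254)
	// carrying 203.0.113.7:443 as the real client address.
	opts := []byte{1, 1, 254, 8, 0x01, 0xBB, 203, 0, 113, 7}
	if ip, port, ok := parseTOA(opts); ok {
		fmt.Printf("real client: %s:%d\n", ip, port)
	}
}
```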
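
The WebAssembly plugin item above centers on two hooks: recognizing a payload as the private protocol, and extracting business fields from it. The interface below is invented purely for illustration; DeepFlow's real WASM SDK defines its own contract, so treat this as a conceptual sketch rather than the actual plugin API.

```go
package main

import "fmt"

// Hypothetical plugin contract, for illustration only. A protocol plugin
// typically answers two questions: "is this payload my protocol?" and
// "which business fields should be extracted from it?".
type ProtocolParser interface {
	// CheckPayload inspects the first bytes of a flow and reports whether
	// the payload matches the private protocol.
	CheckPayload(payload []byte) bool
	// ParsePayload extracts business fields (e.g. request type) that the
	// platform can then attach to call logs and metrics.
	ParsePayload(payload []byte) (map[string]string, error)
}

// demoParser recognizes a toy protocol whose requests start with "DEMO ".
type demoParser struct{}

func (demoParser) CheckPayload(p []byte) bool {
	return len(p) >= 5 && string(p[:5]) == "DEMO "
}

func (demoParser) ParsePayload(p []byte) (map[string]string, error) {
	return map[string]string{"request_type": string(p[5:])}, nil
}

func main() {
	var parser ProtocolParser = demoParser{}
	payload := []byte("DEMO list_orders")
	if parser.CheckPayload(payload) {
		fields, _ := parser.ParsePayload(payload)
		fmt.Println(fields) // map[request_type:list_orders]
	}
}
```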

# 2. Views, Alarms, Reports

  • Alarm events now display the current value rather than the trigger value
  • Reviewed all system alarm texts
  • Added built-in threshold alarm policies for complete packet loss on collectors and data nodes

# 3. Resources

  • AutoTagging
    • Supported synchronizing the process information of container nodes and cloud servers and injecting it into all data as Universal Tags, supporting filtering and grouping
    • Supported adding custom metadata to processes, cloud servers, and K8S namespaces
    • Supported automatic synchronization of K8S cluster information under AWS and Alibaba Cloud accounts
    • Supported syncing information of Baidu Cloud's Cloud Intelligent Network (CSN) resources
    • Custom Tag field k8s.label added support for K8S Service
    • When a K8S Pod serves as the backend of multiple K8S Services, all tags are associated only with the Service whose name is lexicographically smallest (see the sketch after this list)
  • Usability
    • Supported batch entry of load balancer and listener information
    • Supported batch entry of gateway-type hosts and their network cards
    • Supported setting the extra docking routing interface for all hosted K8S clusters under the same public cloud account
  • Compatibility
    • Adapted to sync resource information of K8S 1.18, 1.20
    • Added two configuration options for Huawei public cloud, domain name and IAM authorization address, to adapt to HCSO scenarios
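
A minimal sketch of the multi-Service tag association tie-break described above, with hypothetical Service names:

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	// All of these Services select the same Pod; per the rule above, the
	// Pod's tags are associated only with the lexicographically smallest name.
	services := []string{"svc-web", "svc-api", "svc-admin"}
	sort.Strings(services)
	fmt.Println("tags associated with:", services[0]) // svc-admin
}
```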

# 4. System

  • APIs
    • Querier now supports PromQL (see the PromQL sketch at the end of this section)
      • Supported querying DeepFlow native metrics through PromQL
      • Supported querying Prometheus RemoteWrite metrics through PromQL
      • When querying Prometheus native metrics, supported using tags auto-injected by DeepFlow AutoTagging
    • Ingester now supports OTLP Export
    • Querier SQL API (see the SQL sketch at the end of this section)
      • Provided a Trace Completion API, tracing-completion-by-external-app-spans: the caller passes in a set of APP Spans (which need not be stored in DeepFlow) and receives their upstream and downstream SYS/NET Spans, adding infrastructure and non-instrumented service tracing capabilities to traditional APM; see the documentation for details
      • Completed support for the AS keyword; results also return the original field name from before the alias
      • When getting optional values of enum type Tag fields, returned corresponding description information
      • Added SLIMIT parameter to limit the number of Series in the return result
      • The Category of custom type Tag (k8s.label/cloud.tag/os.app) is uniformly map_item
      • Added three Universal Tags: tap_port_host, tap_port_chost, tap_port_pod_node, representing the host, cloud server, and container node of the collected network card, respectively
  • Grafana
    • Dashboards built for Prometheus can use DeepFlow as the data source without modification
    • In Query Editor, the Math operator supports referencing the $__interval variable
    • Optimized Enum-type Variables to avoid expanding all candidate values into SQL when All is selected
    • Added a Grafana backend plugin module, supporting standard Grafana alert policy configuration
    • Supported referencing one Variable from the definition of another
    • The corresponding SQL statement is displayed in real time while editing in the Query Editor
  • eBPF
    • Avoided eBPF associating all requests with the same Trace when the application process acts only as a client
    • Simplified protocol recognition logic in eBPF code
  • Compatibility
    • The collector supports running in environments with Linux kernels below 3.0
    • Supported specifying the Hostname of the collector environment, suitable for environments with hostname conflicts
    • Provided two deepflow-agent binary packages, dynamically linked and statically linked: the former depends on the glibc dynamic library, while the latter shows noticeable lock contention in malloc/free under multithreading
    • The eBPF program supports automatic kernel offset adaptation using BTF files
  • Controller and data node management
    • Supported a ClickHouse node count greater than the deepflow-server replica count
    • Supported specifying the controller or ingester address for deepflow-agent in domain-name form
    • Supported configuring the data storage duration at hour granularity
    • Changed the data storage duration configuration API to asynchronous to avoid timeouts
    • Added self-monitoring of reporting service
  • Collector management
    • Supported specifying Group ID when creating collector group (agent-group)
    • Supported deepflow-agent running on K8s Node as a regular process (not a Pod)
    • When the ctrl_ip or ctrl_mac of an agent's runtime environment changes, the agent's corresponding information is updated automatically to avoid disconnection
    • Remote upgrades of deepflow-agent on cloud servers can be completed entirely with deepflow-ctl, without manually mounting a hostPath for deepflow-server
    • Supported configuring the interval at which the agent lists K8s resources, reducing pressure on the apiserver
    • When only network monitoring authorization is available, added support for viewing HTTP and DNS capabilities under the application menu
    • Automatically deleted agents that have been disconnected for longer than vtap_auto_delete_interval (default: one hour)
  • Debug
    • Reduced the collection interval of all DeepFlow self-monitoring metrics to 10s
    • Included eBPF data source information (syscall, go-tls, go-http2, openssl, io-event) in the tap_port field to enhance debugging capabilities
    • deepflow-server supports monitoring itself through continuous profiling
  • A series of performance optimizations for various software components
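
A sketch of the PromQL support described above, issued as a standard Prometheus HTTP API instant query. The host, port, endpoint path, metric, and label names are assumptions for illustration; substitute the values of your own deployment.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Host, port, and path are assumptions for illustration: point this at
	// the Prometheus-compatible query endpoint of your deployment.
	base := "http://deepflow-server:20416/prom/api/v1/query"

	// An instant PromQL query over a RemoteWrite metric, filtered by an
	// AutoTagging-injected label (the label name is hypothetical).
	params := url.Values{}
	params.Set("query", `node_cpu_seconds_total{k8s_namespace="default"}`)

	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // standard Prometheus API JSON response
}
```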
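
And a sketch of the Querier SQL API items, showing AS aliasing and SLIMIT together. The endpoint, database, table, and field names below are assumptions for illustration; consult the querier documentation for the actual schema.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Endpoint, database, table, and field names are assumptions for
	// illustration; they are not guaranteed to match a real deployment.
	endpoint := "http://deepflow-server:20416/v1/query/"

	form := url.Values{}
	form.Set("db", "flow_metrics")
	// AS aliases a field while the result also carries the original name;
	// SLIMIT caps the number of returned Series rather than rows.
	form.Set("sql", "SELECT pod, Avg(`byte_tx`) AS tx FROM `network.1m` GROUP BY pod SLIMIT 5")

	resp, err := http.PostForm(endpoint, form)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```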