Telemetry in cloud

Content

Metric

  • 24_FSE_MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models [paper]
  • 22_Cloud_MicroLens: A Performance Analysis Framework for Microservices Using Hidden Metrics With BPF [paper]
  • 21_ICSE_Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF [paper] [code]
  • 21_Experiences in Managing the Performance and Reliability of a Large-Scale Genomics Cloud Platform [paper]
  • 20_NSDI_Google_Meaningful Availability [paper]
  • 19_VLDB_DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees [paper] [code]

Log

  • 23_ISSRE_Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [paper] [code]
  • 23_ICSE_LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly [paper] [code]
  • 22_FSE_An Empirical Study of Log Analysis at Microsoft [paper]
  • 22_ICPC_QuLog: Data-Driven Approach for Log Instruction Quality Assessment [paper] [code]
  • 21_FAST_On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems [paper] [code] [ppt]
  • 21_SoCC_Cloud-Scale Runtime Verification of Serverless Applications [paper] [code]
  • 21_SRDS_What Distributed Systems Say: A Study of Seven Spark Application Logs [paper] [data]
  • 21_OSDI_CLP: Efficient and Scalable Search on Compressed Text Logs [paper] [code]
  • 21_TSE_A Qualitative Study of the Benefits and Costs of Logging From Developers’ Perspectives [paper]
  • 21_CSUR_A Survey on Automated Log Analysis for Reliability Engineering [paper]
  • 19_ASE_Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression [paper] [code]
  • 18_ICPE_Log4Perf: Suggesting Logging Locations for Web-based Systems’ Performance Monitoring [paper]
  • 18_OSDI_Capturing and Enhancing In Situ System Observability for Failure Detection [paper] [code]
  • 17_SOSP_Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold [paper]
  • 15_ATC_Log2: A Cost-Aware Logging Mechanism for Performance Diagnosis [paper]
  • 15_ICSE_Learning to Log: Helping Developers Make Informed Logging Decisions [paper]

Trace

  • 24_SIGCOMM_TraceWeaver: Distributed Request Tracing for Microservices Without Application Modification [paper]
  • 24_FSE_TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State [paper] [code]
  • 23_SIGCOMM_Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code [paper] [ppt] [code]
  • 23_SIGCOMM_Fathom: Understanding Datacenter Application Network Performance [paper]
  • 23_EuroSys_Foxhound: Server-Grade Observability for Network-Augmented Applications [paper]
  • 23_NSDI_The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems [paper]
  • 22_NSDI_Hubble: Performance Debugging with In-Production, Just-In-Time Method Tracing on Android [paper]
  • 22_OSDI_Upgradvisor: Early Adopting Dependency Updates Using Hybrid Program Analysis and Hardware Tracing [paper] [ppt] [code]
  • 22_AFETM: Adaptive Function Execution Trace Monitoring for Fault Diagnosis [paper]
  • 21_ICWS_Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice System [paper]
  • 21_SoCC_3MileBeach: A Tracer with Teeth [paper]
  • 19_Cloud_JCallGraph: Tracing Microservices in Very Large Scale Container Cloud Platforms [paper]
  • 16_SoCC_Principled workflow-centric tracing of distributed systems [paper]
  • 12_TOCS_Fay: Extensible Distributed Tracing from Kernels to Clusters [paper]
  • 10_Google_Dapper, a Large-Scale Distributed Systems Tracing Infrastructure [paper]
  • 07_NSDI_X-Trace: A Pervasive Network Tracing Framework [paper]
  • 04_OSDI_Using Magpie for request extraction and workload modelling [paper]
  • 02_DSN_Pinpoint: Problem Determination in Large, Dynamic Internet Services [paper]

Entropy

  • 12_ICSE_On the Naturalness of Software [paper]

Network

  • 23_EuroSys_Foxhound: Server-Grade Observability for Network-Augmented Applications [paper]
  • 22_NSDI_Closed-loop Network Performance Monitoring and Diagnosis with SpiderMon [paper]
  • 21_NSDI_Debugging Transient Faults in Data Centers using Synchronized Network-Wide Packet Histories [paper]
  • 20_SIGCOMM_PINT: Probabilistic In-Band Network Telemetry [paper] [code]