Content
- 24_FSE_MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models [paper]
- 22_Cloud_MicroLens: A Performance Analysis Framework for Microservices Using Hidden Metrics With BPF [paper]
- 21_ICSE_Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF [paper] [code]
- 21_Experiences in Managing the Performance and Reliability of a Large-Scale Genomics Cloud Platform [paper]
- 20_NSDI_Google_Meaningful Availability [paper]
- 19_VLDB_DDSketch: A Fast and Fully-Mergeable Quantile Sketch with Relative-Error Guarantees [paper] [code]
- 23_ISSRE_Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics [paper] [code]
- 23_ICSE_LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly [paper] [code]
- 22_FSE_An Empirical Study of Log Analysis at Microsoft [paper]
- 22_ICPC_QuLog: Data-Driven Approach for Log Instruction Quality Assessment [paper] [code]
- 21_FAST_On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems [paper] [code] [ppt]
- 21_SoCC_Cloud-Scale Runtime Verification of Serverless Applications [paper] [code]
- 21_SRDS_What Distributed Systems Say: A Study of Seven Spark Application Logs [paper] [data]
- 21_OSDI_CLP: Efficient and Scalable Search on Compressed Text Logs [paper] [code]
- 21_TSE_A Qualitative Study of the Benefits and Costs of Logging From Developers’ Perspectives [paper]
- 21_CSUR_A Survey on Automated Log Analysis for Reliability Engineering [paper]
- 19_ASE_Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression [paper] [code]
- 18_ICPE_Log4Perf: Suggesting Logging Locations for Web-based Systems’ Performance Monitoring [paper]
- 18_OSDI_Capturing and Enhancing In Situ System Observability for Failure Detection [paper] [code]
- 17_SOSP_Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold [paper]
- 15_ATC_Log2: A Cost-Aware Logging Mechanism for Performance Diagnosis [paper]
- 15_ICSE_Learning to Log: Helping Developers Make Informed Logging Decisions [paper]
- 24_SIGCOMM_TraceWeaver: Distributed Request Tracing for Microservices Without Application Modification [paper]
- 24_FSE_TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State [paper] [code]
- 23_SIGCOMM_Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code [paper] [ppt] [code]
- 23_SIGCOMM_Fathom: Understanding Datacenter Application Network Performance [paper]
- 23_EuroSys_Foxhound: Server-Grade Observability for Network-Augmented Applications [paper]
- 23_NSDI_The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems [paper]
- 22_NSDI_Hubble: Performance Debugging with In-Production, Just-In-Time Method Tracing on Android [paper]
- 22_OSDI_Upgradvisor: Early Adopting Dependency Updates Using Hybrid Program Analysis and Hardware Tracing [paper] [ppt] [code]
- 22_AFETM: Adaptive Function Execution Trace Monitoring for Fault Diagnosis [paper]
- 21_ICWS_Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice System [paper]
- 21_SoCC_3MileBeach: A Tracer with Teeth [paper]
- 19_Cloud_JCallGraph: Tracing Microservices in Very Large Scale Container Cloud Platforms [paper]
- 16_SoCC_Principled workflow-centric tracing of distributed systems [paper]
- 12_TOCS_Fay: Extensible Distributed Tracing from Kernels to Clusters [paper]
- 10_Google_Dapper, a Large-Scale Distributed Systems Tracing Infrastructure [paper]
- 07_NSDI_X-Trace: A Pervasive Network Tracing Framework [paper]
- 04_OSDI_Using Magpie for request extraction and workload modelling [paper]
- 02_DSN_Pinpoint: Problem Determination in Large, Dynamic Internet Services [paper]
- 12_ICSE_On the Naturalness of Software [paper]
- 22_NSDI_Closed-loop Network Performance Monitoring and Diagnosis with SpiderMon [paper]
- 21_NSDI_Debugging Transient Faults in Data Centers using Synchronized Network-Wide Packet Histories [paper]
- 20_SIGCOMM_PINT: Probabilistic In-Band Network Telemetry [paper] [code]