- APM (Application Performance Monitoring) - Tools and processes for tracking the performance of software applications.
- P95 (95th Percentile) - Value below which 95% of the observations fall.
- P99 (99th Percentile) - Value below which 99% of the observations fall.
- Flamegraph - Visual representation of hierarchical data, typically used to visualize stack traces.
- Gantt Chart - Bar chart that represents a schedule, showing the start and finish times of its individual tasks.
- Throughput - Measure of how many units of information a system can process in a given amount of time.
- Latency - Time delay between an action and its observed effect, such as the time a request spends in transit or waiting before processing begins.
- Response Time - Time taken for a system to respond to a request.
- Service Level Agreement (SLA) - Commitment between a service provider and a client.
- Service Level Objective (SLO) - Specific measurable characteristics of the SLA such as availability, throughput, frequency, response time, or quality.
- Service Level Indicator (SLI) - Metric used to measure the SLO.
- Error Rate - Percentage of all requests that result in an error.
- Availability - Proportion of time a system is in a functioning condition.
- Uptime - Amount of time a system is operational.
- Downtime - Amount of time a system is non-operational.
- Mean Time Between Failures (MTBF) - Average time between system failures.
- Mean Time To Repair (MTTR) - Average time required to repair a system.
- Root Cause Analysis (RCA) - Method of problem solving to identify the root causes of faults or problems.
- Transaction Tracing - Process of tracking the path of a single transaction through a system.
- Distributed Tracing - Method to track requests as they propagate through distributed systems.
- Heatmap - Data visualization technique that shows the magnitude of a phenomenon as color in two dimensions.
- Histogram - Graphical representation showing the frequency distribution of data points.
- Alerting - Notifying operators when a metric exceeds a predefined threshold.
- Anomaly Detection - Identifying data points that deviate significantly from the majority of the data.
- Baseline - Reference point used for comparison.
- Synthetic Monitoring - Using scripted transactions to simulate user interactions with a service.
- Real User Monitoring (RUM) - Passive monitoring that records all user interaction with a website or application.
- Log Aggregation - Collecting and storing log data from multiple sources in one place.
- Log Parsing - Extracting meaningful information from log files.
- Log Rotation - Renaming and compressing log files and creating new ones.
- Metrics - Quantitative measurements used to assess the performance and health of a system.
- Dashboards - Visual displays of key performance indicators and other metrics.
- Key Performance Indicator (KPI) - Measurable value that demonstrates how effectively an organization is achieving its key business objectives.
- Instrumentation - Adding code to an application to collect data about its behavior.
- Sampling - Technique of measuring a portion of events to infer the behavior of the entire system.
- Span - Unit of work in a trace, representing a single operation.
- Trace - Record of the series of operations (spans) performed while handling a single request.
- Metrics Store - Database or other storage system for metrics data.
- Time Series Database - Database optimized for time-stamped or time series data.
- Alert Thresholds - Predefined limits which, when exceeded, trigger an alert.
- Event Correlation - Identifying and linking related events to determine the underlying issue.
- Telemetry - Automated communications process by which measurements are collected.
- Health Check - Process to determine the status of a system.
- Capacity Planning - Process of determining the necessary resources to meet future demands.
- Load Testing - Testing to determine how a system behaves under a specific load.
- Stress Testing - Testing to determine the limits of a system under extreme conditions.
- End-to-End Monitoring - Comprehensive monitoring of the entire system or process.
- Service Map - Visual representation of how services interact within a system.
- Dependency Mapping - Identifying and documenting the dependencies between various components of a system.
- Throttling - Controlling the amount of resources used by an application or service.
- Rate Limiting - Restricting the number of requests a user can make to a service within a specified time period.
- Circuit Breaker - Pattern that detects failures and encapsulates the logic for preventing a failure from constantly recurring, typically by short-circuiting calls to a failing dependency.
- Retry Logic - Mechanism to handle transient errors by retrying the failed operation.
- Backoff Strategy - Gradually increasing the wait time between retries to prevent overwhelming a service.
- Chaos Engineering - Practice of experimenting on a system to build confidence in its ability to withstand turbulent conditions.
- Observability - Measure of how well you can understand a system's internal states from its external outputs.
- Span Context - Metadata that helps to link spans together in a trace.
- Sampling Rate - Frequency at which samples are collected, or the fraction of events that are recorded.
- Event Logging - Recording events that occur in a system.
- Tagging - Adding metadata to metrics, logs, or traces for easier identification and filtering.
- Service Discovery - Automatically detecting services in a network.
- Health Endpoint - Specific URL that returns the health status of an application.
- Red/Black Deployment - Deployment strategy similar to blue/green where the new version runs alongside the old.
- Instrumentation Library - Collection of tools for adding instrumentation to an application.
- Dependency Injection - Technique for achieving Inversion of Control (IoC) between classes and their dependencies.
- Rolling Deployment - Gradual release of a new version of software.
- Shadow Testing - Running a new version alongside the old version to compare performance.
- Error Budget - Amount of downtime or errors that an SLO permits over a period; for example, a 99.9% availability SLO leaves a budget of 0.1%.
- Service Topology - Diagram showing the relationships between services.
- User Journey - Path taken by a user through an application.
- Telemetry Pipeline - Sequence of processes for collecting, transmitting, and analyzing telemetry data.
- Monitoring Agent - Software that collects monitoring data.
- Resource Utilization - Proportion of available system resources (CPU, memory, disk, network) that is in use.
- Data Retention - Duration for which monitoring data is kept.
- Granularity - Level of detail in collected data.
- Workload - Amount of work that a system is performing.
- Benchmarking - Comparing system performance against a standard.
- Health Metrics - Metrics that indicate the health of a system.
- Transaction Time - Time taken to complete a transaction.
- Service Latency - Time taken for a service to respond.
- Error Code - Code that indicates the type of error encountered.
- Payload Size - Size of data sent in a request or response.
- Concurrent Users - Number of users simultaneously accessing a system.
- Data Sampling - Process of selecting a subset of data for analysis.
- Heap Dump - Snapshot of the memory of a process.
- Thread Dump - Snapshot of the active threads of a process.
- CPU Profiling - Analyzing the CPU usage of a process.
- Memory Profiling - Analyzing the memory usage of a process.
- Garbage Collection - Process of reclaiming unused memory.
- Leak Detection - Identifying memory leaks.
- Instrumentation API - Interface for adding instrumentation to an application.
- Service Mesh - Dedicated infrastructure layer for managing service-to-service communication.
- Circuit Breaker Pattern - Pattern that detects failures and encapsulates the logic for preventing cascading failures.
- Rate Limiting - Controlling the rate of requests sent to or from a service.
- Fault Injection - Introducing errors into a system to test its robustness.
- Synthetic User - Simulated user interactions used for testing.
- Transaction Volume - Number of transactions processed in a given period.
- Application Log - Log generated by an application.
- System Log - Log generated by the operating system.
- Request Trace - Detailed tracking of a request's journey through a system.
- Span ID - Unique identifier for a span in a trace.
- Parent Span ID - Identifier linking a span to its parent span.
- Log Level - Severity of a log message (e.g., DEBUG, INFO, WARN, ERROR).
- Throttling Policy - Rules for controlling the rate of requests.
- Jitter - Variation in latency.
- Heartbeat - Regular signal to indicate that a system is running.
- Event Stream - Continuous flow of events.
- Buffering - Temporarily storing data in memory while it's being transferred.
- Data Sharding - Partitioning data across multiple servers.
- Data Replication - Copying data across multiple nodes to improve availability and reliability.
- Apdex (Application Performance Index) - Standardized measure of user satisfaction with application performance, expressed as a score between 0 and 1.
- Synthetic Transaction - Predefined, automated interactions with an application to test its performance.
- Telemetry Data - Data collected for monitoring and analysis.
- Service Dependency - Relationship between services where one service relies on another.
- Error Budget Policy - Rules for managing error budgets.
- Usage Analytics - Analyzing how users interact with an application.
- Heat Map - Visual representation of data where values are depicted by color.
- Telemetry Client - Component that collects telemetry data.
- Trace Context - Information that allows linking spans across services.
- Performance Tuning - Adjusting a system to improve its performance.
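
Several of the terms above (P95/P99, Apdex, Error Budget, Backoff Strategy) are easier to grasp with numbers. The following is a minimal sketch, not a reference implementation: the function names, the nearest-rank percentile method, the 30-day window, and the "full jitter" variant of backoff are all assumptions chosen for illustration.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank percentile: the smallest observation such that
    at least p% of all observations are at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def apdex(response_times, threshold):
    """Apdex = (satisfied + tolerating / 2) / total.
    Satisfied: t <= threshold. Tolerating: threshold < t <= 4 * threshold."""
    if not response_times:
        raise ValueError("response_times must be non-empty")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


def error_budget_minutes(slo, window_days=30):
    """Downtime, in minutes, that an availability SLO permits over the window."""
    return (1 - slo) * window_days * 24 * 60


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5, rng=None):
    """Exponential backoff with full jitter (one common variant):
    delay n is drawn uniformly from [0, min(cap, base * factor**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * factor ** n)) for n in range(attempts)]


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))        # hypothetical sample: 1..100 ms
    print(percentile(latencies_ms, 95))       # P95 of the sample
    print(percentile(latencies_ms, 99))       # P99 of the sample
    print(apdex([0.1, 0.3, 0.4, 0.6, 1.2, 2.5], threshold=0.5))
    print(error_budget_minutes(0.999))        # budget for a 99.9% SLO
    print(backoff_delays(rng=random.Random(42)))
```

With a 99.9% availability SLO, the budget over 30 days comes to roughly 43 minutes of downtime. Randomizing each retry delay helps keep many clients from retrying at the same instant.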