Introduction
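Metrics gives you unparalleled insight into what your code does, and a powerful toolkit for measuring the behavior of critical components in your production environment.
With modules for common libraries such as Jetty, Logback, Log4j, Apache HttpClient, Ehcache, JDBI, and Jersey, and with reporting backends such as Graphite, Metrics gives you full-stack visibility.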
Metrics Core
- Metric registries
- 5 metric types: Gauges, Counters, Histograms, Meters, and Timers.
- Reporting metric values via JMX, the console, logs, and CSV
Metric Registries
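The main class for registering metrics is MetricRegistry, a registry of all metric instances. Usually an application has a single MetricRegistry. (Spark uses one registry per source, for example DAGSchedulerSource.)
    public class MetricRegistry implements MetricSet { ... }
The listener-related code is omitted because it is simple and not the focus here.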
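To make the idea of a registry concrete, here is a minimal sketch in plain Java. It is not the library's MetricRegistry: `SimpleRegistry` and its methods are hypothetical names chosen for illustration, and a metric is reduced to a `Supplier`. It only shows the core behavior: metrics are stored by unique name in a concurrent map, and registering a duplicate name is rejected.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical, simplified registry: maps a unique name to a value supplier.
class SimpleRegistry {
    private final ConcurrentHashMap<String, Supplier<?>> metrics = new ConcurrentHashMap<>();

    // Registers a metric under a name; a second registration with the same name fails.
    <T> Supplier<T> register(String name, Supplier<T> metric) {
        if (metrics.putIfAbsent(name, metric) != null) {
            throw new IllegalArgumentException("A metric named " + name + " already exists");
        }
        return metric;
    }

    // Removes a metric; returns true if it was present.
    boolean remove(String name) {
        return metrics.remove(name) != null;
    }

    // Read-only view of everything registered so far.
    Map<String, Supplier<?>> getMetrics() {
        return Collections.unmodifiableMap(metrics);
    }
}
```

Using `putIfAbsent` keeps registration atomic, so two threads racing to register the same name cannot both succeed.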
5 metric types
Gauge
An instantaneous reading of a particular value.
    public interface Gauge<T> extends Metric {
        /**
         * Returns the metric's current value.
         *
         * @return the metric's current value
         */
        T getValue();
    }
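Example:
    import com.codahale.metrics.Gauge;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    final Queue<String> queue = new ConcurrentLinkedQueue<String>();
    // The gauge reads the queue depth each time it is polled.
    final Gauge<Integer> queueDepth = new Gauge<Integer>() {
        @Override
        public Integer getValue() {
            return queue.size();
        }
    };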
Gauge has several implementations, such as RatioGauge, JmxAttributeGauge, and DerivativeGauge.
Take CachedGauge as an example; it is useful when reading the value is expensive.
    public abstract class CachedGauge<T> implements Gauge<T> {
        private final Clock clock;
        private final AtomicLong reloadAt;
        private final long timeoutNS;
        private volatile T value; // the cached value
        /**
         * Creates a new cached gauge with the given timeout period.
         *
         * @param timeout        the timeout
         * @param timeoutUnit    the unit of {@code timeout}
         */
        protected CachedGauge(long timeout, TimeUnit timeoutUnit) {
            this(Clock.defaultClock(), timeout, timeoutUnit);
        }
        /**
         * Creates a new cached gauge with the given clock and timeout period.
         *
         * @param clock          the clock used to calculate the timeout
         * @param timeout        the timeout
         * @param timeoutUnit    the unit of {@code timeout}
         */
        protected CachedGauge(Clock clock, long timeout, TimeUnit timeoutUnit) {
            this.clock = clock;
            this.reloadAt = new AtomicLong(0);
            this.timeoutNS = timeoutUnit.toNanos(timeout);
        }
        /**
         * Loads the value and returns it. This is what refreshes the cache.
         *
         * @return the new value
         */
        protected abstract T loadValue();
        @Override
        public T getValue() {
            if (shouldLoad()) {
                this.value = loadValue();
            }
            return value;
        }
        // Returns true for exactly one caller once the timeout has expired;
        // the CAS moves the next reload time forward by timeoutNS.
        private boolean shouldLoad() {
            for (; ; ) {
                final long time = clock.getTick();
                final long current = reloadAt.get();
                if (current > time) {
                    return false;
                }
                if (reloadAt.compareAndSet(current, time + timeoutNS)) {
                    return true;
                }
            }
        }
    }
Counter
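A counter is essentially a long value that can be incremented and decremented. The sketch below is a plain-JDK approximation, assuming JDK 8+ `LongAdder` for low-contention updates. `SimpleCounter` is a hypothetical name, not the library's class, although the method names mirror the usual inc/dec/getCount vocabulary.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a counter metric: a value that can go up and down.
class SimpleCounter {
    private final LongAdder count = new LongAdder();

    void inc() { inc(1); }

    void inc(long n) { count.add(n); }

    void dec() { dec(1); }

    void dec(long n) { count.add(-n); }

    // Sums the internal cells; cheap enough for periodic reporting.
    long getCount() { return count.sum(); }
}
```

A typical use is tracking in-flight work: call `inc()` when a job is queued and `dec()` when it completes.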
Histogram
A histogram measures the distribution of values in a stream of data.
    /**
     * A metric which calculates the distribution of a value.
     *
     * @see <a href="http://www.johndcook.com/standard_deviation.html">Accurately computing running
     *      variance</a>
     */
    public class Histogram implements Metric, Sampling, Counting {
        private final Reservoir reservoir;
        private final LongAdderAdapter count;
        /**
         * Creates a new {@link Histogram} with the given reservoir.
         *
         * @param reservoir the reservoir to create a histogram from
         */
        public Histogram(Reservoir reservoir) {
            this.reservoir = reservoir;
            this.count = LongAdderProxy.create();
        }
        /**
         * Adds a recorded value.
         *
         * @param value the length of the value
         */
        public void update(int value) {
            update((long) value);
        }
        /**
         * Adds a recorded value.
         *
         * @param value the length of the value
         */
        public void update(long value) {
            count.increment();
            reservoir.update(value);
        }
        /**
         * Returns the number of values recorded.
         *
         * @return the number of values recorded
         */
        public long getCount() {
            return count.sum();
        }
        public Snapshot getSnapshot() {
            return reservoir.getSnapshot();
        }
    }
    // Excerpt: a Snapshot exposes quantiles of the sampled values.
    public abstract class Snapshot {
        public abstract double getValue(double quantile);
        public double getMedian() { return getValue(0.5); }
        public double get75thPercentile() { return getValue(0.75); }
        // ...
    }
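A Histogram therefore combines a count (getCount(), as in Counter#getCount()) with a statistical snapshot of the data (Reservoir#getSnapshot()).
A histogram can measure the min, max, mean, and standard deviation, as well as quantiles such as the median or the 95th percentile.
Traditionally, computing a quantile means taking the entire data set, sorting it, and reading off the value at the desired position. That works for small data sets and batch systems, but not for high-throughput, low-latency services.
The solution is to sample the data as it passes through. By maintaining a small, manageable reservoir that is statistically representative of the whole data stream, we can quickly and easily compute quantiles that are valid approximations of the actual quantiles. This technique is called reservoir sampling.
Several kinds of reservoir are currently available.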
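As an illustration of reservoir sampling, the sketch below implements a uniform reservoir with Vitter's Algorithm R in plain Java. It is a simplified stand-in, not the library's code: `SampleReservoir`, the fixed seed, and the nearest-rank quantile rule are all assumptions made for the example. Every value in the stream ends up in the sample with equal probability, so quantiles of the sample approximate quantiles of the stream.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical uniform reservoir (Vitter's Algorithm R); not the library's class.
class SampleReservoir {
    private final long[] values;
    private final Random random;
    private long seen = 0; // total number of values offered so far

    SampleReservoir(int capacity, long seed) {
        this.values = new long[capacity];
        this.random = new Random(seed);
    }

    // Offers one value from the stream.
    synchronized void update(long value) {
        seen++;
        if (seen <= values.length) {
            values[(int) (seen - 1)] = value; // fill phase: keep everything
        } else {
            long r = (long) (random.nextDouble() * seen); // uniform in [0, seen)
            if (r < values.length) {
                values[(int) r] = value; // kept with probability capacity / seen
            }
        }
    }

    synchronized int size() {
        return (int) Math.min(seen, values.length);
    }

    // Approximate quantile (q in [0, 1]) using nearest rank on the sorted sample.
    synchronized double quantile(double q) {
        int n = size();
        if (n == 0) {
            return 0.0;
        }
        long[] sorted = Arrays.copyOf(values, n);
        Arrays.sort(sorted);
        int index = (int) Math.ceil(q * n) - 1;
        return sorted[Math.max(0, Math.min(n - 1, index))];
    }
}
```

Memory use is bounded by the capacity regardless of how many values pass through, which is the property that makes this approach suitable for long-running services. A uniform reservoir weights old and recent data equally; reservoirs biased toward recent data trade that property for freshness.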
Meter
A meter measures mean throughput, together with 1-, 5-, and 15-minute exponentially weighted moving average throughputs (rate * duration).
Just like the Unix load averages visible in uptime or top.
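The moving averages can be sketched with the standard exponentially weighted moving average formula. This is an illustrative approximation in plain Java, not the library's implementation: `MovingRate` is a hypothetical name, and the caller is assumed to invoke `tick()` at a fixed interval (5 seconds in the usage below).

```java
// Hypothetical exponentially weighted moving average of an event rate.
class MovingRate {
    private final double alpha;           // smoothing factor applied on each tick
    private final double intervalSeconds; // assumed fixed time between tick() calls
    private long uncounted = 0;           // events marked since the last tick
    private double rate = 0.0;            // smoothed rate in events per second
    private boolean initialized = false;

    // windowMinutes is the averaging window, e.g. 1, 5 or 15.
    MovingRate(double windowMinutes, double intervalSeconds) {
        this.intervalSeconds = intervalSeconds;
        this.alpha = 1 - Math.exp(-intervalSeconds / 60.0 / windowMinutes);
    }

    // Records n events.
    synchronized void mark(long n) {
        uncounted += n;
    }

    // Folds the events of the last interval into the moving average.
    synchronized void tick() {
        double instantRate = uncounted / intervalSeconds;
        uncounted = 0;
        if (initialized) {
            rate += alpha * (instantRate - rate);
        } else {
            rate = instantRate; // the first interval seeds the average
            initialized = true;
        }
    }

    synchronized double getRate() {
        return rate;
    }
}
```

For example, with `new MovingRate(1, 5)`, marking 50 events and ticking once gives a rate of 10 events per second; each further idle tick multiplies the rate by exp(-5/60), so the value decays smoothly instead of dropping to zero.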
Timers
A timer aggregates durations: it provides statistics on the durations (a histogram) as well as throughput (a meter).
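The following sketch shows the basic mechanics of timing a block of code, using only the JDK. `SimpleTimer` is a hypothetical, stripped-down class: it tracks only the count, mean, and maximum, whereas a real timer would feed the durations into a histogram and a meter.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical timer: counts calls and tracks total and maximum duration.
class SimpleTimer {
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    // Records an already measured duration.
    void update(long nanos) {
        count.increment();
        totalNanos.add(nanos);
        maxNanos.accumulateAndGet(nanos, Math::max);
    }

    // Runs the task and records how long it took, even if it throws.
    void time(Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            update(System.nanoTime() - start);
        }
    }

    long getCount() { return count.sum(); }

    long getMaxNanos() { return maxNanos.get(); }

    double getMeanNanos() {
        long n = count.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }
}
```

`System.nanoTime()` is used rather than wall-clock time because it is monotonic, so measured durations are not affected by clock adjustments.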
Reporters
Reporters export the results that Metrics has collected. metrics-core provides four of them: JMX, console, SLF4J, and CSV.
ScheduledReporter
Here, start and stop simply schedule the periodic reporting task and shut the scheduler down, respectively.
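The scheduling behavior can be sketched with a `ScheduledExecutorService` from the JDK. `PeriodicReporter` is a hypothetical class written for this example, not the library's ScheduledReporter; the daemon thread and the exception handling are design assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic reporter: start() schedules report(), stop() shuts the scheduler down.
abstract class PeriodicReporter {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "periodic-reporter");
                t.setDaemon(true); // do not keep the JVM alive just for reporting
                return t;
            });

    // Calls report() every period, with an initial delay of one period.
    void start(long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(this::safeReport, period, period, unit);
    }

    // Stops scheduling further reports.
    void stop() {
        executor.shutdown();
    }

    boolean isStopped() {
        return executor.isShutdown();
    }

    private void safeReport() {
        try {
            report();
        } catch (RuntimeException e) {
            // An uncaught exception would cancel the periodic task, so log it and carry on.
            System.err.println("report failed: " + e);
        }
    }

    // Subclasses decide where the values go (console, file, network, ...).
    abstract void report();
}
```

A console reporter would subclass this, print the current metric values in `report()`, and be started with something like `reporter.start(10, TimeUnit.SECONDS)`.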
Other Reporters
MetricsServlet: exposes health checks, a thread dump, and JVM-level and OS-level information.
GraphiteReporter: sends metrics to Graphite, which provides a graphical interface.
JVM Instrumentation
- Run count and elapsed times for all supported garbage collectors
- Memory usage for all memory pools, including off-heap memory
- Breakdown of thread states, including deadlocks
- File descriptor usage
- Buffer pool sizes and utilization
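Most of the readings listed above are available from the JDK's own `java.lang.management` MXBeans. The sketch below collects a few of them into a map; it only illustrates where such numbers come from. `JvmReadings` and the key names are hypothetical, and file descriptor and buffer pool readings are left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects a few JVM readings into a name -> value map.
class JvmReadings {
    static Map<String, Long> collect() {
        Map<String, Long> readings = new LinkedHashMap<>();

        // Run count and elapsed time for each garbage collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            readings.put("gc." + gc.getName() + ".count", gc.getCollectionCount());
            readings.put("gc." + gc.getName() + ".time-ms", gc.getCollectionTime());
        }

        // Heap usage plus usage per memory pool.
        readings.put("memory.heap.used",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                readings.put("memory.pools." + pool.getName() + ".used", usage.getUsed());
            }
        }

        // Thread count and monitor deadlocks.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        readings.put("threads.count", (long) threads.getThreadCount());
        long[] deadlocked = threads.findMonitorDeadlockedThreads(); // null when there are none
        readings.put("threads.deadlocked", deadlocked == null ? 0L : (long) deadlocked.length);

        return readings;
    }
}
```

Each entry could be wrapped in a gauge and registered, so that the JVM readings are reported alongside application metrics.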
Monitoring your JVM with Dropwizard Metrics