Go 可视化性能分析工具

pprof 是什么？

PProf是一个CPU分析器( cpu profiler)，它是gperftools工具的一个组件，由Google工程师为分析多线程的程序所开发。
Go标准库中的pprof package通过HTTP的方式为pprof工具提供数据。
(译者注：不止这个包，runtime/pprof还可以为控制台程序或者测试程序产生pprof数据)

既然pprof数据通过HTTP提供，所以它需要在你的应用中运行一个web服务器。可以通过 import pprof的副作用(这里副作用 side-effect 是指引入这个包让其初始化，不是贬义词)，这个包就可以在缺省的web服务器中注册它的handler，并补需要其它额外的操作。

在Mac上安装工具:

1 2	$ brew install gperftools $ brew install graphviz

pprof 文件分析
pprof 文件是二进制的，不是给人读的，需要翻译一下，而 golang 原生就给我们提供了分析工具，直接执行下面命令即可，会生成一张很直观的 svg 图片，直接用 chrome 就可以打开，当然也可以生成别的格式（pdf，png 都可以），可以用 go tool pprof -h 命令查看支持的输出类型

1.创建性能单元测试脚本

string_test.go

package awesomeProject

import (
   "strings"
   "testing"
)

func BenchmarkStringJoin1(b *testing.B) {
   b.ReportAllocs()
   input := []string{"Hello", "World"}
   for i := 0; i < b.N; i++ {
      result := strings.Join(input, " ")
      if result != "Hello World" {
         b.Error("Unexpected result: " + result)
      }
   }
}

2.执行性能测试benchmark，生成二进制out文件

go test -run string_test.go -bench=. -benchtime="3s" -cpuprofile profile_cpu.out

3.把二进制文件转换成图片或pdf

$ go tool pprof -pdf ./profile_cpu.out >cpu.pdf

Go语言性能测试

对于一些服务来说，性能是极其重要的一环，事关系统的吞吐、访问的延迟，进而影响用户的体验。

写性能测试在Go语言中是很便捷的，go自带的标准工具链就有完善的支持，下面我们来从Go的内部和系统调用方面来详细剖析一下Benchmark这块儿。

Benchmark

Go做Benchmar只要在目录下创建一个_test.go后缀的文件，然后添加下面函数：

func BenchmarkStringJoin1(b *testing.B) {
    b.ReportAllocs()
    input := []string{"Hello", "World"}
    for i := 0; i < b.N; i++ {
        result := strings.Join(input, " ")
        if result != "Hello World" {
            b.Error("Unexpected result: " + result)
        }
    }
}

调用以下命令:

# go test -run=xxx -bench=. -benchtime="3s" -cpuprofile profile_cpu.out

该命令会跳过单元测试，执行所有benchmark,同时生成一个cpu性能描述文件.

这里有两个注意点:

-benchtime 可以控制benchmark的运行时间
b.ReportAllocs() ，在report中包含内存分配信息，例如结果是:

BenchmarkStringJoin1-4 300000 4351 ns/op 32 B/op 2 allocs/op

-4表示4个CPU线程执行；300000表示总共执行了30万次；4531ns/op，表示每次执行耗时4531纳秒；32B/op表示每次执行分配了32字节内存；2 allocs/op表示每次执行分配了2次对象。

根据上面的信息，我们就能对热点路径进行内存对象分配的优化,例如针对上面的程序我们可以进行小小的优化:

func BenchmarkStringJoin2(b *testing.B) {
    b.ReportAllocs()
    input := []string{"Hello", "World"}
    join := func(strs []string, delim string) string {
        if len(strs) == 2 {
            return strs[0] + delim + strs[1];
        }
        return "";
    };
    for i := 0; i < b.N; i++ {
        result := join(input, " ")
        if result != "Hello World" {
            b.Error("Unexpected result: " + result)
        }
    }
}

新的Benchmark结果是：

BenchmarkStringJoin2-4 500000 2440 ns/op 16 B/op 1 allocs/op

可以看出来，在减少了内存分配后，性能提升了60%以上！

Cpu Profile

上一节的benchmark结果，我们只能看到函数的整体性能，但是如果该函数较为复杂呢？然后我们又想知道函数内部的耗时，这时就该Cpu Profile登场了。

Cpu profile是Go语言工具链中最闪耀的部分之一，掌握了它以及memory、block profile,那基本上就没有你发现不了的性能瓶颈了。

之前的benchmark同时还生成了一个profile_cpu.out文件，这里我们执行下面的命令:

# go tool pprof app.test profile_cpu.out
Entering interactive mode (type "help" for commands)
(pprof) top10
8220ms of 10360ms total (79.34%)
Dropped 63 nodes (cum <= 51.80ms)
Showing top 10 nodes out of 54 (cum >= 160ms)
      flat  flat%   sum%        cum   cum%
    2410ms 23.26% 23.26%     4960ms 47.88%  runtime.concatstrings
    2180ms 21.04% 44.31%     2680ms 25.87%  runtime.mallocgc
    1200ms 11.58% 55.89%     1200ms 11.58%  runtime.memmove
     530ms  5.12% 61.00%      530ms  5.12%  runtime.memeqbody
     530ms  5.12% 66.12%     2540ms 24.52%  runtime.rawstringtmp
     470ms  4.54% 70.66%     2420ms 23.36%  strings.Join
     390ms  3.76% 74.42%     2330ms 22.49%  app.BenchmarkStringJoin3B
     180ms  1.74% 76.16%     1970ms 19.02%  runtime.rawstring
     170ms  1.64% 77.80%     5130ms 49.52%  runtime.concatstring3
     160ms  1.54% 79.34%      160ms  1.54%  runtime.eqstring

上面仅仅展示部分函数的信息，并没有调用链路的性能分析，因此如果需要完整信息，我们要生成svg或者pdf图。

# go tool pprof -svg profile_cpu.out > profile_cpu.svg
# go tool pprof -pdf profile_cpu.out > profile_cpu.pdf

可以看到图里包含了多个benchmark的合集(之前的两段benmark函数都在同一个文件中)，但是我们只关心性能最差的那个benchmark,因此需要过滤：

go test -run=xxx -bench=BenchmarkStringJoin2B$ -cpuprofile profile_2b.out
go test -run=xxx -bench=BenchmarkStringJoin2$ -cpuprofile profile_2.out
go tool pprof -svg profile_2b.out > profile_2b.svg
go tool pprof -svg profile_2.out > profile_2.svg

Go自带了测试框架和工具，在testing包中，以便完成单元测试（T类型）和性能测试（B类型）。

一般测试代码放在*_test.go文件中，与被测代码放于同一个包中。

单元测试

测试函数名称格式是：Test[^a-z]，即以Test开头，跟上非小写字母开头的字符串。每个测试函数都接受一个*testing.T类型参数，用于输出信息或中断测试。

测试方法有：

Fail: 标记失败，但继续执行当前测试函数
FailNow: 失败，立即终止当前测试函数执行
Log: 输出错误信息
Error: Fail + Log
Fatal: FailNow + Log
Skip: 跳过当前函数，通常用于未完成的测试用例

测试代码：

// filename: add_test.go

package test
import (
    "testing"
)

func Add(a, b int) int {
    return a + b
}

func TestAdd1(t *testing.T) {
    if Add(2, 3) != 5 {
         t.Error("result is wrong!")
    } else {
         t.Log("result is right")
    }
}

func TestAdd2(t *testing.T) {
    if Add(2, 3) != 6 {
         t.Fatal("result is wrong!")
    } else {
         t.Log("result is right")
    }
}

运行以下命令，自动搜集所有的测试文件（*_test.go），提取全部测试函数。

$ go test
输出：
--- FAIL: TestAdd2 (0.00s)
    add_test.go:20: result is wrong!
FAIL
exit status 1
FAIL    _/Users/golang_learning/testTB  0.006s

输出结果包括：出错的测试函数名称，执行时长和错误信息。

go test还有以下参数：

-v：显示所有测试函数运行细节

$ go test -v
输出：
=== RUN   TestAdd1
--- PASS: TestAdd1 (0.00s)
    add_test.go:14: result is right
=== RUN   TestAdd2
--- FAIL: TestAdd2 (0.00s)
    add_test.go:20: result is wrong!
FAIL
exit status 1
FAIL    _/Users/golang_learning/testTB  0.007s

-run regex：指定要执行的测试函数

$ go test -run TestAdd2
输出：
--- FAIL: TestAdd2 (0.00s)
    add_test.go:20: result is wrong!
FAIL
exit status 1
FAIL    _/Users/golang_learning/testTB  0.006s

性能测试

性能测试函数以Benchmark 开头，参数类型是 *testing.B，可与Test函数放在同个文件中。默认情况下，go test不执行Benchmark测试，必须用“-bench <pattern>”指定性能测试函数。

测试代码：

// filename: add_test.go
package test
import (
    "testing"
)

func Add(a, b int) int {
    return a + b
}

func Benchmark(b *testing.B) {
    for i := 0; i < b.N; i++ { // b.N，测试循环次数
        Add(4, 5)
    }
}

执行命令测试：

$ go test -bench=.
输出：
goos: darwin
goarch: amd64
Benchmark-4     2000000000           0.35 ns/op
PASS
ok      _/Users/golang_learning/testTB  0.753s

B类型也有以下参数：

benchmem：输出内存分配统计
benchtime：指定测试时间
cpu：指定GOMAXPROCS
timeout：超市限制

$ go test -v -bench=. -cpu=8 -benchtime="3s" -timeout="5s" -benchmem
输出：
goos: darwin
goarch: amd64
Benchmark-8     5000000000           0.34 ns/op        0 B/op          0 allocs/op
PASS
ok      _/Users/golang_learning/testTB  1.766s

Benchmark-8：-cpu参数指定，-8表示8个CPU线程执行
5000000000：表示总共执行了5000000000次
0.34 ns/op：表示每次执行耗时0.34纳秒
0 B/op:表示每次执行分配的内存（字节）
0 allocs/op：表示每次执行分配了多少次对象

pprof

go tools继承了pprof，以便进行性能测试并找出瓶颈。

测试数据文件生成有两种方法：命令行和代码

1、命令行生成测试数据文件

$ go test -bench=. -cpuprofile cpu.out
输出：
goos: darwin
goarch: amd64
Benchmark-4     2000000000           0.35 ns/op
PASS
ok      _/Users/golang_learning/testTB  0.911s
$ ls 
输出：（上一条命令生成cpu.out和testTB.test）
add_test.go     cpu.out     testTB.test

用命令行分析

$ go tool pprof -text mem.out 
输出：
Main binary filename not available.
Type: inuse_space
Time: May 22, 2018 at 3:36pm (CST)
Showing nodes accounting for 1.16MB, 100% of 1.16MB total
      flat  flat%   sum%        cum   cum%
    1.16MB   100%   100%     1.16MB   100%  runtime/pprof.StartCPUProfile
         0     0%   100%     1.16MB   100%  main.main
         0     0%   100%     1.16MB   100%  runtime.main

pprof交互模式分析

$ go tool pprof testTB.test cpu.out 
输出：
File: testTB.test
Type: cpu
Time: May 22, 2018 at 3:08pm (CST)
Duration: 906.10ms, Total samples = 670ms (73.94%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
输出：
Showing nodes accounting for 670ms, 100% of 670ms total
      flat  flat%   sum%        cum   cum%
     670ms   100%   100%      670ms   100%  _/Users/golang_learning/testTB.Benchmark
         0     0%   100%      670ms   100%  testing.(*B).launch
         0     0%   100%      670ms   100%  testing.(*B).runN
(pprof) quit

以上性能指标含义：

函数本地采样数量（不包含其调用的其他函数）
函数本地采样数量所占百分比
前几个函数（包括当前函数）本地采样总和所占百分比
函数（包括其调用的函数）采样总数量
函数采样总数量所占百分比

pprof交互模式命令：

top、top10: 显示前几条信息；
web：以svg文件展示。

pdf或者svg分析

$ go tool pprof -svg cpu.out > cpu.svg
$ go tool pprof -pdf cpu.out > cpu.pdf
$ ls（以上命令生成cpu.pdf和cpu.svg文件）
add_test.go   cpu.out    cpu.pdf        cpu.svg     testTB.test

2、代码生成测试数据文件

用runtime/pprof包生成cpu和mem文件：

// filename: pprof.go
package main
import (
    "os"
    "runtime/pprof"
)

func main() {
    w, _ := os.Create("cpu.out")
    defer w.Close()
    pprof.StartCPUProfile(w)
    defer pprof.StopCPUProfile()

    w2, _ := os.Create("mem.out")
    defer w2.Close()
    defer pprof.WriteHeapProfile(w2)

    Sum(3, 5)

}

func Sum(a, b int) int {
    return a + b
}

编译并执行以上文件：

$ go build pprof.go 
$ ./pprof 
$ ls（./pprof命令生成cpu.out和mem.out文件）
cpu.out     mem.out     pprof       pprof.go

生成pdf文件：

$ go tool pprof -pdf cpu.out > cpu.pdf
$ go tool pprof -pdf mem.out > mem.pdf

net/http/pprof

若是测量http server的性能，则需要使用net/http/pprof包。只需要在源文件中加入代码：

import (
    _ "net/http/pprof"
)

CPU分析

启动http server，然后在浏览器中访问链接：http://localhost:【port】/debug/pprof/
就能看到prof信息了：

image

若要生成CPU状态分析图，可使用以下命令：

$ go tool pprof http://localhost:port/debug/pprof/profile

就会进入30秒的profile收集时间，在这段事件内向服务器连续发送多次请求，尽量让cpu占用性能产生数据。

$ go tool pprof http://localhost:8080/debug/pprof/profile
输出：
Fetching profile over HTTP from http://localhost:8080/debug/pprof/profile
// 这里会阻塞30s收集profile，在这段时间应多发请求
Saved profile in /Users/pprof/pprof.samples.cpu.002.pb.gz
Type: cpu
Time: May 22, 2018 at 7:16pm (CST)
Duration: 30s, Total samples = 330ms ( 1.10%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
输出：（前10条信息）
Showing nodes accounting for 330ms, 100% of 330ms total
Showing top 10 nodes out of 39
      flat  flat%   sum%        cum   cum%
     100ms 30.30% 30.30%      100ms 30.30%  runtime.mach_semaphore_signal
      60ms 18.18% 48.48%       60ms 18.18%  runtime.mach_semaphore_timedwait
      60ms 18.18% 66.67%       60ms 18.18%  runtime.mach_semaphore_wait
      ... ... 
(pprof) web
// 在浏览器中以svg文件的方式展示cpu占用图

内存分析

启动服务器，访问链接http://localhost:port/debug/pprof/heap，就会自动下载heap文件，调用以下命令，进入pprof交互模式：

$ go tool pprof heap 
Main binary filename not available.
Type: inuse_space
Time: May 24, 2018 at 11:30pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top5
Showing nodes accounting for 62.02MB, 98.20% of 63.16MB total
Showing top 5 nodes out of 26
      flat  flat%   sum%        cum   cum%
   21.67MB 34.31% 34.31%    21.67MB 34.31%  bytes.makeSlice
   20.50MB 32.46% 66.77%    20.50MB 32.46%  reflect.unsafe_New
      15MB 23.75% 90.52%       15MB 23.75%  github.com/golang/protobuf/proto.(*word32Slice).Append
    2.85MB  4.51% 95.03%     2.85MB  4.51%  github.com/golang/protobuf/proto.(*structPointerSlice).Append
       2MB  3.17% 98.20%        2MB  3.17%  github.com/golang/protobuf/proto.word32_Set
(pprof) web

常用命令

go test -run main2_test.go -bench=. benchtime="3s" -test.cpuprofile cpu.out

-memprofile string
write memory profile to this file
-test.bench regexp
run only benchmarks matching regexp
-test.benchmem
print memory allocations for benchmarks
-test.benchtime d
run each benchmark for duration d (default 1s)
-test.blockprofile file
write a goroutine blocking profile to file
-test.blockprofilerate rate
set blocking profile rate (see runtime.SetBlockProfileRate) (default 1)
-test.count n
run tests and benchmarks n times (default 1)
-test.coverprofile file
write a coverage profile to file
-test.cpu list
comma-separated list of cpu counts to run each test with
-test.cpuprofile file
write a cpu profile to file
-test.failfast
do not start new tests after the first test failure
-test.list regexp
list tests, examples, and benchmarks matching regexp then exit
-test.memprofile file
write an allocation profile to file
-test.memprofilerate rate
set memory allocation profiling rate (see runtime.MemProfileRate)
-test.mutexprofile string
write a mutex contention profile to the named file after execution
-test.mutexprofilefraction int
if >= 0, calls runtime.SetMutexProfileFraction() (default 1)
-test.outputdir dir
write profiles to dir
-test.parallel n
run at most n tests in parallel (default 12)
-test.run regexp
run only tests and examples matching regexp
-test.short
run smaller test suite to save time
-test.testlogfile file
write test action log to file (for use only by cmd/go)
-test.timeout d
panic test binary after duration d (default 0, timeout disabled)
-test.trace file
write an execution trace to file
-test.v
verbose: print additional output

版权声明：本文来源CSDN，感谢博主原创文章，遵循 CC 4.0 by-sa 版权协议，转载请附上原文出处链接和本声明。
原文链接：https://blog.csdn.net/u012014531/article/details/103528604
站方申明：本站部分内容来自社区用户分享，若涉及侵权，请联系站方删除。

发表于 2021-12-11 18:17:06
阅读 ( 3385 )
分类：Go

Go 可视化性能分析工具

pprof 是什么？

Go语言性能测试

Benchmark

Cpu Profile

单元测试

性能测试

pprof

1、命令行生成测试数据文件

用命令行分析

pprof交互模式分析

pdf或者svg分析

2、代码生成测试数据文件

net/http/pprof

CPU分析

内存分析

你可能感兴趣的文章

精选的优质文章

0 条评论

官方社群

GO教程

推荐文章

猜你喜欢

随便看看