|
17 | 17 |
|
18 | 18 | PyProf - PyTorch Profiling tool |
19 | 19 | =============================== |
20 | | - |
21 | | - **NOTE: You are currently on the r20.09 branch which tracks |
22 | | - stabilization towards the release. This branch is not usable |
23 | | - during stabilization.** |
24 | 20 |
|
25 | 21 | .. overview-begin-marker-do-not-remove |
26 | 22 |
|
| 23 | +PyProf is a tool that profiles and analyzes the GPU performance of PyTorch |
| 24 | +models. PyProf aggregates kernel performance from `Nsight Systems |
| 25 | +<https://developer.nvidia.com/nsight-systems>`_ or `NvProf |
| 26 | +<https://developer.nvidia.com/nvidia-visual-profiler>`_. |
| 27 | + |
| 28 | +What's New in 3.4.0 |
| 29 | +------------------- |
| 30 | + |
| 31 | +* README and User Guide documentation has been updated with more installation |
| 32 | + options and pointers |
| 33 | + |
| 34 | +Known Issues |
| 35 | +------------ |
| 36 | + |
| 37 | +* Forward-Backward kernel correlation heuristics do not work correctly with |
| 38 | + PyTorch 1.6. Recommended workarounds include: |
| 39 | + |
| 40 | + * Use with PyTorch 1.5 |
| 41 | + * Use DLProf in the `20.09 NGC PyTorch container <https://ngc.nvidia.com/catalog/containers/nvidia:pytorch>`_ |
| 42 | + |
| 43 | +Features |
| 44 | +-------- |
| 45 | + |
| 46 | +* Identifies the layer that launched a kernel: e.g. the association of |
| 47 | + `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious. |
| 48 | + |
| 49 | +* Identifies the tensor dimensions and precision: without knowing the tensor |
| 50 | + dimensions and precision, it's impossible to reason about whether the actual |
| 51 | + (silicon) kernel time is close to maximum performance of such a kernel on |
| 52 | + the GPU. Knowing the tensor dimensions and precision, we can figure out the |
| 53 | + FLOPs and bandwidth required by a layer, and then determine how close to |
| 54 | + maximum performance the kernel is for that operation. |
| 55 | + |
| 56 | +* Forward-backward correlation: PyProf determines which forward pass step |
| 57 | + produced a particular weight or data gradient (wgrad, dgrad). This makes |
| 58 | + it possible to recover the tensor dimensions used by these backprop steps |
| 59 | + and to assess their performance. |
| 60 | + |
| 61 | +* Determines Tensor Core usage: PyProf can highlight the kernels that use |
| 62 | + `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_. |
| 63 | + |
| 64 | +* Identifies the line in the user's code that launched a particular kernel (program trace). |
| 65 | + |
27 | 66 | .. overview-end-marker-do-not-remove |
28 | 67 |
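The reasoning behind the tensor dimensions and precision feature can be sketched with a small calculation. The example below is an illustration only and is not part of PyProf: the function names ``gemm_cost`` and ``fraction_of_peak`` are hypothetical, the cost model is the textbook one for a dense matrix multiply, and the peak throughput and bandwidth values are placeholders to be replaced with the figures for your GPU.

```python
from dataclasses import dataclass

# Bytes per element for a few common precisions.
BYTES_PER_ELEMENT = {"fp16": 2, "fp32": 4, "fp64": 8}


@dataclass
class GemmCost:
    flops: int        # floating-point operations required
    bytes_moved: int  # minimum bytes read and written

    @property
    def arithmetic_intensity(self) -> float:
        """FLOPs per byte of memory traffic."""
        return self.flops / self.bytes_moved


def gemm_cost(m: int, n: int, k: int, precision: str = "fp16") -> GemmCost:
    """Idealized cost of C[m, n] = A[m, k] @ B[k, n].

    Assumes each matrix is read or written exactly once, so the byte
    count is a lower bound on real memory traffic.
    """
    elem = BYTES_PER_ELEMENT[precision]
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = elem * (m * k + k * n + m * n)
    return GemmCost(flops, bytes_moved)


def fraction_of_peak(cost: GemmCost, kernel_time_s: float,
                     peak_flops: float, peak_bandwidth: float) -> float:
    """Ratio of the ideal (roofline) time to the measured kernel time.

    peak_flops is in FLOP/s and peak_bandwidth in bytes/s; both are
    hardware figures supplied by the caller.
    """
    ideal_time_s = max(cost.flops / peak_flops,
                       cost.bytes_moved / peak_bandwidth)
    return ideal_time_s / kernel_time_s


# Hypothetical usage: a 1024 x 1024 x 1024 fp16 matrix multiply on a GPU
# with placeholder peaks of 100 TFLOP/s and 1 TB/s.
cost = gemm_cost(1024, 1024, 1024, "fp16")
efficiency = fraction_of_peak(cost, kernel_time_s=50e-6,
                              peak_flops=100e12, peak_bandwidth=1e12)
```

A value of ``efficiency`` close to 1.0 would suggest the kernel runs near the hardware limit for that operation, while a small value points to room for optimization. Without the dimensions and precision, neither ``flops`` nor ``bytes_moved`` can be computed, which is why the kernel time alone is hard to interpret.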
|
| 68 | +The current release of PyProf is 3.4.0 and is available in the 20.09 release of |
| 69 | +the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The |
| 70 | +branch for this release is `r20.09 |
| 71 | +<https://github.com/NVIDIA/PyProf/tree/r20.09>`_. |
| 72 | + |
| 73 | +Quick Installation Instructions |
| 74 | +------------------------------- |
| 75 | + |
29 | 76 | .. quick-install-start-marker-do-not-remove |
30 | 77 |
|
| 78 | +* Clone the git repository :: |
| 79 | + |
| 80 | + $ git clone https://github.com/NVIDIA/PyProf.git |
| 81 | + |
| 82 | +* Navigate to the top level PyProf directory |
| 83 | + |
| 84 | +* Install PyProf :: |
| 85 | + |
| 86 | + $ pip install . |
| 87 | + |
| 88 | +* Verify installation is complete with pip list :: |
| 89 | + |
| 90 | + $ pip list | grep pyprof |
| 91 | + |
| 92 | +* The output should display :: |
| 93 | + |
| 94 | + pyprof 3.4.0 |
| 95 | + |
31 | 96 | .. quick-install-end-marker-do-not-remove |
32 | 97 |
|
| 98 | +Quick Start Instructions |
| 99 | +------------------------ |
| 100 | + |
33 | 101 | .. quick-start-start-marker-do-not-remove |
34 | 102 |
|
| 103 | +* Add the following lines to the PyTorch network you want to profile: :: |
| 104 | + |
| 105 | + import torch.cuda.profiler as profiler |
| 106 | + import pyprof |
| 107 | + pyprof.init() |
| 108 | + |
| 109 | +* Profile with NvProf or Nsight Systems to generate a SQLite database. The example below uses Nsight Systems. :: |
| 110 | + |
| 111 | + $ nsys profile -f true -o net --export sqlite python net.py |
| 112 | + |
| 113 | +* Run the parse.py script (the ``pyprof.parse`` module) to generate the dictionary. :: |
| 114 | + |
| 115 | + $ python -m pyprof.parse net.sqlite > net.dict |
| 116 | + |
| 117 | +* Run the prof.py script (the ``pyprof.prof`` module) to generate the reports. :: |
| 118 | + |
| 119 | + $ python -m pyprof.prof --csv net.dict |
| 120 | + |
35 | 121 | .. quick-start-end-marker-do-not-remove |
36 | 122 |
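The three command-line steps of the quick start can also be chained from a small driver script. The following is a minimal sketch, not something shipped with PyProf: the helper names ``build_commands`` and ``run_pipeline`` are hypothetical, and the commands and flags are copied from the quick start steps without additions. It assumes ``nsys`` and ``python`` are on the ``PATH`` and that PyProf is installed.

```python
import subprocess


def build_commands(script, name="net"):
    """Return (argv, stdout_path) pairs for the three quick start steps.

    stdout_path is the file that receives the command's standard output,
    or None when the output is left on the terminal.
    """
    return [
        # 1. Profile the script with Nsight Systems and export SQLite.
        (["nsys", "profile", "-f", "true", "-o", name,
          "--export", "sqlite", "python", script], None),
        # 2. Parse the SQLite database into a dictionary file.
        (["python", "-m", "pyprof.parse", f"{name}.sqlite"], f"{name}.dict"),
        # 3. Generate the CSV report from the dictionary file.
        (["python", "-m", "pyprof.prof", "--csv", f"{name}.dict"], None),
    ]


def run_pipeline(script, name="net"):
    """Run the steps in order, stopping at the first failure."""
    for argv, stdout_path in build_commands(script, name):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            # Equivalent to the shell redirection "> net.dict".
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling ``run_pipeline("net.py")`` would run the same commands as the steps above. Because ``check=True`` raises on a non-zero exit code, a failed profiling run is not followed by a parse of a stale or missing ``net.sqlite``.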
|
| 123 | +Documentation |
| 124 | +------------- |
| 125 | + |
| 126 | +The User Guide can be found in the |
| 127 | +`documentation for current release |
| 128 | +<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and |
| 129 | +provides instructions on how to install and profile with PyProf. |
| 130 | + |
| 131 | +A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_ |
| 132 | +provides step-by-step instructions to get you quickly started using PyProf. |
| 133 | + |
| 134 | +An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides |
| 135 | +answers for frequently asked questions. |
| 136 | + |
| 137 | +The `Release Notes |
| 138 | +<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_ |
| 139 | +indicate the required versions of the NVIDIA Driver and CUDA, and also describe |
| 140 | +which GPUs are supported by PyProf. |
| 141 | + |
| 142 | +Presentation and Papers |
| 143 | +^^^^^^^^^^^^^^^^^^^^^^^ |
| 144 | + |
| 145 | +* `Automating End-to-End PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_. |
| 146 | + * `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_. |
| 147 | + |
| 148 | +Contributing |
| 149 | +------------ |
| 150 | + |
| 151 | +Contributions to PyProf are more than welcome. To |
| 152 | +contribute, make a pull request and follow the guidelines outlined in |
| 153 | +the `Contributing <CONTRIBUTING.md>`_ document. |
| 154 | + |
| 155 | +Reporting problems, asking questions |
| 156 | +------------------------------------ |
| 157 | + |
| 158 | +We appreciate any feedback, questions or bug reporting regarding this |
| 159 | +project. When help with code is needed, follow the process outlined in |
| 160 | +the Stack Overflow `minimal reproducible example guide <https://stackoverflow.com/help/mcve>`_. |
| 161 | +Ensure posted examples are: |
| 162 | + |
| 163 | +* minimal – use as little code as possible that still produces the |
| 164 | + same problem |
| 165 | + |
| 166 | +* complete – provide all parts needed to reproduce the problem. Check |
| 167 | + if you can strip external dependencies and still show the problem. The |
| 168 | + less time we spend on reproducing problems, the more time we have to |
| 169 | + fix them. |
| 170 | + |
| 171 | +* verifiable – test the code you're about to provide to make sure it |
| 172 | + reproduces the problem. Remove all other problems that are not |
| 173 | + related to your request/question. |
| 174 | + |
37 | 175 | .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg |
38 | 176 | :target: http://www.apache.org/licenses/LICENSE-2.0 |