Commit a775248

Update README and NGC versions post-20.09 release
1 parent ab4dc5a commit a775248

1 file changed: +142 -4 lines changed

README.rst

Lines changed: 142 additions & 4 deletions
@@ -17,22 +17,160 @@
 
 PyProf - PyTorch Profiling tool
 ===============================
-
-**NOTE: You are currently on the r20.09 branch which tracks
-stabilization towards the release. This branch is not usable
-during stabilization.**
 
 .. overview-begin-marker-do-not-remove
 
+PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
+models. PyProf aggregates kernel performance from `Nsight Systems
+<https://developer.nvidia.com/nsight-systems>`_ or `NvProf
+<https://developer.nvidia.com/nvidia-visual-profiler>`_.
+
+What's New in 3.4.0
+-------------------
+
+* README and User Guide documentation has been updated with more installation
+  options and pointers
+
+Known Issues
+------------
+
+* Forward-Backward kernel correlation heuristics do not work correctly with
+  PyTorch 1.6. Recommended workarounds include:
+
+  * Use PyProf with PyTorch 1.5
+  * Use DLProf in the `20.09 NGC PyTorch container <https://ngc.nvidia.com/catalog/containers/nvidia:pytorch>`_
+
+Features
+--------
+
+* Identifies the layer that launched a kernel: e.g. the association of
+  `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious.
+
+* Identifies the tensor dimensions and precision: without knowing the tensor
+  dimensions and precision, it's impossible to reason about whether the actual
+  (silicon) kernel time is close to maximum performance of such a kernel on
+  the GPU. Knowing the tensor dimensions and precision, we can figure out the
+  FLOPs and bandwidth required by a layer, and then determine how close to
+  maximum performance the kernel is for that operation.
+
+* Forward-backward correlation: PyProf determines which forward pass step
+  produced a particular weight or data gradient (wgrad, dgrad), which makes
+  it possible to determine the tensor dimensions required by these backprop
+  steps and to assess their performance.
+
+* Determines Tensor Core usage: PyProf can highlight the kernels that use
+  `Tensor Cores <https://developer.nvidia.com/tensor-cores>`_.
+
+* Correlates a kernel with the line in the user's code that launched it (program trace).
+
 .. overview-end-marker-do-not-remove
 
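The Features list reasons that tensor dimensions and precision are enough to estimate the FLOPs and bandwidth a layer needs. The following is a minimal sketch of that arithmetic for a single matrix multiply, not PyProf's own implementation. The function names, the ``GemmEstimate`` type, and the peak throughput figures in the usage lines are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Bytes per element for a few common precisions.
BYTES_PER_ELEMENT = {"fp16": 2, "fp32": 4, "fp64": 8}


@dataclass
class GemmEstimate:
    flops: int          # multiply-add counted as 2 floating point operations
    bytes_moved: int    # read A and B once, write C once
    min_time_s: float   # lower bound on kernel time
    bound: str          # "compute" or "memory"


def estimate_gemm(m, n, k, precision, peak_flops, peak_bandwidth):
    """Lower-bound the time of C[m, n] = A[m, k] @ B[k, n].

    peak_flops is in FLOP/s and peak_bandwidth in bytes/s; both are
    assumptions supplied by the caller, not values read from a GPU.
    """
    elem = BYTES_PER_ELEMENT[precision]
    flops = 2 * m * n * k
    bytes_moved = elem * (m * k + k * n + m * n)
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    bound = "compute" if compute_time >= memory_time else "memory"
    return GemmEstimate(flops, bytes_moved, max(compute_time, memory_time), bound)


def efficiency(measured_time_s, estimate):
    """Fraction of the estimated peak that a measured kernel time achieves."""
    return estimate.min_time_s / measured_time_s


if __name__ == "__main__":
    # Hypothetical peaks: 100 TFLOP/s and 1 TB/s.
    est = estimate_gemm(1024, 1024, 1024, "fp16",
                        peak_flops=100e12, peak_bandwidth=1e12)
    print(est)
    print(efficiency(measured_time_s=2 * est.min_time_s, estimate=est))
```

Comparing the measured kernel time against ``min_time_s`` gives a rough sense of how close a kernel runs to the hardware limit. Real kernels may move more data than this idealized count, so the estimate is best read as an upper bound on achievable efficiency.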
+The current release of PyProf is 3.4.0 and is available in the 20.09 release of
+the PyTorch container on `NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The
+branch for this release is `r20.09
+<https://github.com/NVIDIA/PyProf/tree/r20.09>`_.
+
+Quick Installation Instructions
+-------------------------------
+
 .. quick-install-start-marker-do-not-remove
 
+* Clone the git repository ::
+
+    $ git clone https://github.com/NVIDIA/PyProf.git
+
+* Navigate to the top-level PyProf directory
+
+* Install PyProf ::
+
+    $ pip install .
+
+* Verify installation is complete with pip list ::
+
+    $ pip list | grep pyprof
+
+* The output should include ::
+
+    pyprof 3.4.0
+
 .. quick-install-end-marker-do-not-remove
 
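The ``pip list | grep pyprof`` check above can also be done from Python. This is a small sketch using the standard library's ``importlib.metadata``; the helper name ``installed_version`` is hypothetical, and the distribution name ``pyprof`` is assumed from the ``pip list`` output shown in the instructions.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pyprof" is the distribution name assumed from the pip list output.
    version = installed_version("pyprof")
    print(f"pyprof {version}" if version else "pyprof is not installed")
```

Returning ``None`` instead of raising keeps the check easy to use in a setup script that wants to fall back to installing the package.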
+Quick Start Instructions
+------------------------
+
 .. quick-start-start-marker-do-not-remove
 
+* Add the following lines to the PyTorch network you want to profile: ::
+
+    import torch.cuda.profiler as profiler
+    import pyprof
+    pyprof.init()
+
+* Profile with NvProf or Nsight Systems to generate a SQLite file. ::
+
+    $ nsys profile -f true -o net --export sqlite python net.py
+
+* Run the parse.py script to generate the dictionary. ::
+
+    $ python -m pyprof.parse net.sqlite > net.dict
+
+* Run the prof.py script to generate the reports. ::
+
+    $ python -m pyprof.prof --csv net.dict
+
 .. quick-start-end-marker-do-not-remove
 
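The three Quick Start commands can be chained from a small driver script. The sketch below only assembles the commands exactly as they appear in the instructions and runs them in order; the helper names ``build_commands`` and ``run_pipeline`` are hypothetical, and it assumes ``nsys`` and ``python`` are on the ``PATH`` and that PyProf is installed.

```python
import subprocess


def build_commands(script, name, python="python"):
    """Return the profile, parse and report commands as argument lists."""
    profile = ["nsys", "profile", "-f", "true", "-o", name,
               "--export", "sqlite", python, script]
    parse = [python, "-m", "pyprof.parse", f"{name}.sqlite"]
    report = [python, "-m", "pyprof.prof", "--csv", f"{name}.dict"]
    return profile, parse, report


def run_pipeline(script, name, python="python"):
    """Run the three steps in order, stopping at the first failure."""
    profile, parse, report = build_commands(script, name, python)
    subprocess.run(profile, check=True)
    # The shell version redirects stdout to <name>.dict; do the same here.
    with open(f"{name}.dict", "w") as dict_file:
        subprocess.run(parse, check=True, stdout=dict_file)
    # The report is written to stdout, as in the shell version.
    subprocess.run(report, check=True)


if __name__ == "__main__":
    run_pipeline("net.py", "net")
```

Passing argument lists rather than a shell string avoids quoting problems when the script path contains spaces, and ``check=True`` stops the pipeline before parsing a profile that was never written.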
+Documentation
+-------------
+
+The User Guide can be found in the
+`documentation for the current release
+<https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/index.html>`_, and
+provides instructions on how to install and profile with PyProf.
+
+A complete `Quick Start Guide <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/quickstart.html>`_
+provides step-by-step instructions to get you quickly started using PyProf.
+
+An `FAQ <https://docs.nvidia.com/deeplearning/frameworks/pyprof-user-guide/faqs.html>`_ provides
+answers to frequently asked questions.
+
+The `Release Notes
+<https://docs.nvidia.com/deeplearning/frameworks/pyprof-release-notes/index.html>`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also describe
+which GPUs are supported by PyProf.
+
+Presentation and Papers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* `Automating End-to-End PyTorch Profiling <https://developer.nvidia.com/gtc/2020/video/s21143>`_.
+* `Presentation slides <https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21143-automating-end-to-end-pytorch-profiling.pdf>`_.
+
+Contributing
+------------
+
+Contributions to PyProf are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing <CONTRIBUTING.md>`_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0
