# -*- coding: utf-8 -*-
"""
Forward-mode Automatic Differentiation (Beta)
=============================================

**Translated by**: `BcKmini <https://github.com/BcKmini>`_

This tutorial demonstrates how to use forward-mode AD to compute
directional derivatives (or equivalently, Jacobian-vector products).

The tutorial below uses some APIs only available in versions >= 1.11
(or nightly builds).

Also note that forward-mode AD is currently in beta. The API is
subject to change and operator coverage is still incomplete.

Basic Usage
--------------------------------------------------------------------
Unlike reverse-mode AD, forward-mode AD computes gradients eagerly
alongside the forward pass. We can use forward-mode AD to compute a
directional derivative by performing the forward pass as before,
except we first associate our input with another tensor representing
the direction of the directional derivative (or equivalently, the ``v``
in a Jacobian-vector product). When an input, which we call "primal", is
associated with a "direction" tensor, which we call "tangent", the
resultant new tensor object is called a "dual tensor" for its connection
to dual numbers[0].

As the forward pass is performed, if any input tensors are dual tensors,
extra computation is performed to propagate this "sensitivity" of the
function.
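
A helpful way to see why this works is through dual numbers: writing an input as
``primal + tangent * eps`` where ``eps ** 2 = 0``, a function evaluates as
``f(primal + tangent * eps) = f(primal) + f'(primal) * tangent * eps``, so the
coefficient of ``eps`` carried through the forward pass is exactly the
directional derivative.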

"""

import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(10, 10)
tangent = torch.randn(10, 10)

def fn(x, y):
    return x ** 2 + y ** 2

# All forward AD computation must be performed in the context of
# a ``dual_level`` context. All dual tensors created in such a context
# will have their tangents destroyed upon exit. This is to ensure that
# if the output or intermediate results of this computation are reused
# in a future forward AD computation, their tangents (which are associated
# with this computation) won't be confused with tangents from the later
# computation.
with fwAD.dual_level():
    # To create a dual tensor we associate a tensor, which we call the
    # primal, with another tensor of the same size, which we call the tangent.
    # If the layout of the tangent is different from that of the primal,
    # the values of the tangent are copied into a new tensor with the same
    # metadata as the primal. Otherwise, the tangent itself is used as-is.
    #
    # It is also important to note that the dual tensor created by
    # ``make_dual`` is a view of the primal.
    dual_input = fwAD.make_dual(primal, tangent)
    assert fwAD.unpack_dual(dual_input).tangent is tangent

    # To demonstrate the case where the copy of the tangent happens,
    # we pass in a tangent with a layout different from that of the primal
    dual_input_alt = fwAD.make_dual(primal, tangent.T)
    assert fwAD.unpack_dual(dual_input_alt).tangent is not tangent

    # Tensors that do not have an associated tangent are automatically
    # considered to have a zero-filled tangent of the same shape.
    plain_tensor = torch.randn(10, 10)
    dual_output = fn(dual_input, plain_tensor)

    # Unpacking the dual returns a ``namedtuple`` with ``primal`` and ``tangent``
    # as attributes
    jvp = fwAD.unpack_dual(dual_output).tangent

assert fwAD.unpack_dual(dual_output).tangent is None
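
# As a quick sanity check: since ``fn`` is ``x ** 2 + y ** 2`` and ``plain_tensor``
# carries a zero tangent, the JVP extracted above should simply equal
# ``2 * primal * tangent``.
assert torch.allclose(jvp, 2 * primal * tangent)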

######################################################################
# Usage with Modules
# --------------------------------------------------------------------
# To use ``nn.Module`` with forward AD, replace the parameters of your
# model with dual tensors before performing the forward pass. At the
# time of writing, it is not possible to create dual tensor
# `nn.Parameter`s. As a workaround, one must register the dual tensor
# as a non-parameter attribute of the module.

import torch.nn as nn

model = nn.Linear(5, 5)
input = torch.randn(16, 5)

params = {name: p for name, p in model.named_parameters()}
tangents = {name: torch.rand_like(p) for name, p in params.items()}

with fwAD.dual_level():
    for name, p in params.items():
        delattr(model, name)
        setattr(model, name, fwAD.make_dual(p, tangents[name]))

    out = model(input)
    jvp = fwAD.unpack_dual(out).tangent

######################################################################
# Using the functional Module API (beta)
# --------------------------------------------------------------------
# Another way to use ``nn.Module`` with forward AD is to utilize
# the functional Module API (also known as the stateless Module API).

from torch.func import functional_call

# We need a fresh module because the functional call requires the
# model to have parameters registered.
model = nn.Linear(5, 5)

dual_params = {}
with fwAD.dual_level():
    for name, p in params.items():
        # Using the same ``tangents`` from the above section
        dual_params[name] = fwAD.make_dual(p, tangents[name])
    out = functional_call(model, dual_params, input)
    jvp2 = fwAD.unpack_dual(out).tangent

# Check our results
assert torch.allclose(jvp, jvp2)

######################################################################
# Custom autograd Function
# --------------------------------------------------------------------
# Custom Functions also support forward-mode AD. To create a custom Function
# supporting forward-mode AD, register the ``jvp()`` static method. It is
# possible, but not mandatory for custom Functions to support both forward
# and backward AD. See the
# `documentation <https://pytorch.org/docs/master/notes/extending.html#forward-mode-ad>`_
# for more information.

class Fn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, foo):
        result = torch.exp(foo)
        # Tensors stored in ``ctx`` can be used in the subsequent forward grad
        # computation.
        ctx.result = result
        return result

    @staticmethod
    def jvp(ctx, gI):
        gO = gI * ctx.result
        # If the tensor stored in ``ctx`` will not also be used in the backward pass,
        # one can manually free it using ``del``
        del ctx.result
        return gO

fn = Fn.apply

# ``gradcheck`` (used below) expects double-precision inputs that require grad.
primal = torch.randn(10, 10, dtype=torch.double, requires_grad=True)
tangent = torch.randn(10, 10, dtype=torch.double)

with fwAD.dual_level():
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = fn(dual_input)
    jvp = fwAD.unpack_dual(dual_output).tangent

# It is important to use ``autograd.gradcheck`` to verify that your
# custom autograd Function computes the gradients correctly. By default,
# ``gradcheck`` only checks the backward-mode (reverse-mode) AD gradients. Specify
# ``check_forward_ad=True`` to also check forward grads. If you did not
# implement the backward formula for your function, you can also tell ``gradcheck``
# to skip the tests that require backward-mode AD by specifying
# ``check_backward_ad=False``, ``check_undefined_grad=False``, and
# ``check_batched_grad=False``.
torch.autograd.gradcheck(Fn.apply, (primal,), check_forward_ad=True,
                         check_backward_ad=False, check_undefined_grad=False,
                         check_batched_grad=False)

######################################################################
# Functional API (beta)
# --------------------------------------------------------------------
# We also offer a higher-level functional API in functorch
# for computing Jacobian-vector products that you may find simpler to use
# depending on your use case.
#
# The benefit of the functional API is that there isn't a need to understand
# or use the lower-level dual tensor API and that you can compose it with
# other `functorch transforms (like vmap) <https://pytorch.org/functorch/stable/notebooks/jacobians_hessians.html>`_;
# the downside is that it offers you less control. (A short sketch of composing
# ``jvp`` with ``vmap`` appears after the examples below.)
#
# Note that the remainder of this tutorial will require functorch
# (https://github.com/pytorch/functorch) to run. Please find installation
# instructions at the specified link.

import functorch as ft

primal0 = torch.randn(10, 10)
tangent0 = torch.randn(10, 10)
primal1 = torch.randn(10, 10)
tangent1 = torch.randn(10, 10)

def fn(x, y):
    return x ** 2 + y ** 2

# Here is a basic example to compute the JVP of the above function.
# The ``jvp(func, primals, tangents)`` returns ``func(*primals)`` as well as the
# computed Jacobian-vector product (JVP). Each primal must be associated with
# a tangent of the same shape.
primal_out, tangent_out = ft.jvp(fn, (primal0, primal1), (tangent0, tangent1))

# ``functorch.jvp`` requires every primal to be associated with a tangent.
# If we only want to associate certain inputs to ``fn`` with tangents,
# then we'll need to create a new function that captures inputs without tangents:
primal = torch.randn(10, 10)
tangent = torch.randn(10, 10)
y = torch.randn(10, 10)

import functools
new_fn = functools.partial(fn, y=y)
primal_out, tangent_out = ft.jvp(new_fn, (primal,), (tangent,))

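
# As noted above, ``ft.jvp`` composes with other functorch transforms such as
# ``ft.vmap``. The sketch below (``x_small`` and ``unit_vectors`` are just
# illustrative names) pushes each row of an identity matrix through ``torch.sin``
# as a tangent, recovering the (diagonal) Jacobian of an elementwise function,
# one column per tangent vector.
x_small = torch.randn(3)
unit_vectors = torch.eye(3)
jac = ft.vmap(lambda v: ft.jvp(torch.sin, (x_small,), (v,))[1])(unit_vectors)
assert torch.allclose(jac, torch.diag(torch.cos(x_small)))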
######################################################################
# Using the functional API with Modules
# --------------------------------------------------------------------
# To use ``nn.Module`` with ``functorch.jvp`` to compute Jacobian-vector products
# with respect to the model parameters, we need to reformulate the
# ``nn.Module`` as a function that accepts both the model parameters and inputs
# to the module.

model = nn.Linear(5, 5)
input = torch.randn(16, 5)
tangents = tuple([torch.rand_like(p) for p in model.parameters()])

# Given a ``torch.nn.Module``, ``ft.make_functional_with_buffers`` extracts the state
# (``params`` and buffers) and returns a functional version of the model that
# can be invoked like a function.
# That is, the returned ``func`` can be invoked like
# ``func(params, buffers, input)``.
# ``ft.make_functional_with_buffers`` is analogous to the ``nn.Module`` stateless API
# that you saw previously and we're working on consolidating the two.
func, params, buffers = ft.make_functional_with_buffers(model)

# Because ``jvp`` requires every input to be associated with a tangent, we need to
# create a new function that, when given the parameters, produces the output
def func_params_only(params):
    return func(params, buffers, input)

model_output, jvp_out = ft.jvp(func_params_only, (params,), (tangents,))
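
# A quick check of what ``jvp_out`` contains (``tangent_W`` and ``tangent_b`` are
# just illustrative names for the two entries of ``tangents``): for a linear layer
# ``y = x @ W.T + b``, the JVP in the direction ``(tangent_W, tangent_b)`` is
# ``x @ tangent_W.T + tangent_b``.
tangent_W, tangent_b = tangents
assert torch.allclose(jvp_out, input @ tangent_W.T + tangent_b)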

######################################################################
# [0] https://en.wikipedia.org/wiki/Dual_number