Accelerated_training / accelerated_deeplearning_training · Merge requests · !12: Torch anderson
Merged: Reshniak, Viktor requested to merge torch_Anderson into master 4 years ago

Overview 0 · Commits 3 · Pipelines 0 · Changes 1
The list of changes:

- reimplemented `utils/anderson_acceleration.py`
- removed `modules/AccelerationModule.py`:
  - in my opinion, it is unnecessary
  - moved all of its logic to `modules/optimizers.py`
- updated `modules/optimizers.py`:
  - removed the abstract `Optimizers` class
  - the training loop is now implemented only in the `FixedPointIteration` class
  - `DeterministicAcceleration` now inherits from `FixedPointIteration` and only reimplements its `accelerate` method
  - acceleration is now performed every time the parameters are updated, not just once per epoch (not sure about this)
  - added `with torch.no_grad()` blocks to avoid leaking memory
  - the history of updates is now a `collections.deque` instead of a `list` (a rough sketch of how these pieces fit together follows this list)
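
For orientation, here is a minimal sketch of the pattern described above: a bounded `collections.deque` of parameter iterates, an `accelerate` step wrapped in `torch.no_grad()`, and a call to the updated `anderson` routine after each parameter update. The class below and its method names are illustrative assumptions, not the actual `DeterministicAcceleration` code in `modules/optimizers.py`.

```python
import collections

import torch

from utils.anderson_acceleration import anderson  # the function updated in this MR


class AccelerationSketch:
    """Illustrative sketch only; not the DeterministicAcceleration class from this MR."""

    def __init__(self, model, history_depth=5, relaxation=1.0):
        self.model = model
        self.relaxation = relaxation
        # bounded history: the oldest iterate is dropped automatically once maxlen is reached
        self.history = collections.deque(maxlen=history_depth)

    def store_iterate(self):
        # flatten the current parameters into one vector and append it to the history
        with torch.no_grad():
            x = torch.cat([p.detach().flatten() for p in self.model.parameters()])
        self.history.append(x)

    def accelerate(self):
        # intended to run after every parameter update; needs >= 3 iterates to form DX and DR
        if len(self.history) < 3:
            return
        with torch.no_grad():  # no autograd graph is built over the history, so nothing leaks
            X = torch.stack(list(self.history), dim=1)  # X[:, i] is the i-th stored iterate
            x_acc = anderson(X, relaxation=self.relaxation)
            # write the extrapolated vector back into the model parameters
            offset = 0
            for p in self.model.parameters():
                n = p.numel()
                p.copy_(x_acc[offset:offset + n].view_as(p))
                offset += n
```

Using `maxlen` on the deque keeps only the most recent iterates, so memory use stays constant no matter how many updates have been performed.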
Edited 4 years ago by Reshniak, Viktor
Merge request reports
Viewing commit d69be7d8
1 file changed: +26 −64
d69be7d8 · update anderson_acceleration.py · Reshniak, Viktor authored 4 years ago
utils/anderson_acceleration.py (+26 −64)
The previous NumPy implementation (removed, −64 lines):

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 4 11:52:48 2020

@author: 7ml
"""

import math
import numpy as np


def anderson(X, reg=0):
    # Anderson Acceleration
    # Take a matrix X of iterates, where X[:,i] is the difference between the {i+1}th and the ith
    # iterations of the fixed-point operation
    #     x_i = g(x_{i-1})
    #
    #     r_i = x_{i+1} - x_i
    #     X[:,i] = r_i
    #
    # reg is the regularization parameter used for solving the system
    #     (F'F + reg I)z = r_{i+1}
    # where F is the matrix of differences in the residual, i.e. R[:,i] = r_{i+1} - r_i

    # Recovers parameters, ensure X is a matrix
    (d, k) = np.shape(X)
    k = k - 1
    X = np.asmatrix(X)  # check if necessary

    # Compute the matrix of residuals
    DX = np.diff(X)
    DR = np.diff(DX)

    projected_residual = DX[:, k - 1]
    DX = DX[:, :-1]

    # Solve (R'R + lambda I)z = 1
    (extr, c) = anderson_precomputed(DX, DR, projected_residual, reg)

    # Compute the extrapolation / weigthed mean "sum_i c_i x_i", and return
    return extr, c


def anderson_precomputed(DX, DR, residual, reg=0):
    # Regularized Nonlinear Acceleration, with RR precomputed
    # Same than rna, but RR is computed only once

    # Recovers parameters
    (d, k) = DX.shape

    RR = np.matmul(np.transpose(DR), DR)

    if math.sqrt(np.linalg.cond(RR, 'fro')) < 1e5:
        # In case of singular matrix, we solve using least squares instead
        q, r = np.linalg.qr(DR)
        new_residual = np.matmul(np.transpose(q), residual)
        z = np.linalg.lstsq(r, new_residual, reg)
        z = z[0]

        # Recover weights c, where sum(c) = 1
        if np.abs(np.sum(z)) < 1e-10:
            z = np.ones((k, 1))

        alpha = np.asmatrix(z / np.sum(z))
    else:
        alpha = np.zeros((DX.shape[1], 1))

    # Compute the extrapolation / weigthed mean "sum_i c_i x_i", and return
    extr = np.matmul(DX, alpha)
    return np.array(extr), alpha
```

The new PyTorch implementation (added, +26 lines):

```python
import torch


def anderson(X, relaxation=1.0):
    # Take a matrix X of iterates such that X[:,i] = g(X[:,i-1])
    # Return acceleration for X[:,-1]

    assert X.ndim == 2, "X must be a matrix"

    # Compute residuals
    DX = X[:, 1:] - X[:, :-1]    # DX[:,i] = X[:,i+1] - X[:,i]
    DR = DX[:, 1:] - DX[:, :-1]  # DR[:,i] = DX[:,i+1] - DX[:,i] = X[:,i+2] - 2*X[:,i+1] + X[:,i]

    # # use QR factorization
    # q, r = torch.qr(DR)
    # gamma, _ = torch.triangular_solve((q.t() @ DX[:, -1]).unsqueeze(1), r)
    # gamma = gamma.squeeze(1)

    # solve unconstrained least-squares problem
    gamma, _ = torch.lstsq(DX[:, -1].unsqueeze(1), DR)
    gamma = gamma.squeeze(1)[:DR.size(1)]

    # compute acceleration
    extr = X[:, -2] + DX[:, -1] - (DX[:, :-1] + DR) @ gamma

    if relaxation != 1:
        assert relaxation > 0, "relaxation must be positive"
        # compute solution of the contraint optimization problem s.t. gamma = X[:,1:]@alpha
        alpha = torch.zeros(gamma.numel() + 1)
        alpha[0] = gamma[0]
        alpha[1:-1] = gamma[1:] - gamma[:-1]
        alpha[-1] = 1 - gamma[-1]
        extr = relaxation * extr + (1 - relaxation) * X[:, :-1] @ alpha

    return extr
```
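
For reference, here is a small, self-contained sketch of how the updated `anderson` routine could be exercised on a toy linear fixed-point problem. Everything below (the map `g`, the history length, the comparison against the exact fixed point) is an illustrative assumption rather than part of this MR, and it presumes a PyTorch version old enough to still provide `torch.lstsq`, which the new implementation calls.

```python
import torch

from utils.anderson_acceleration import anderson

# Toy linear fixed-point problem x = g(x) = A x + b (illustrative only).
torch.manual_seed(0)
d = 10
A = torch.diag(torch.linspace(0.1, 0.9, d))   # contraction with distinct eigenvalues
b = torch.randn(d)

# Collect a short history of iterates as columns of X, i.e. X[:, i] = x_i.
x = torch.zeros(d)
history = [x]
for _ in range(5):
    x = A @ x + b                             # fixed-point iteration x_{k+1} = g(x_k)
    history.append(x)
X = torch.stack(history, dim=1)

x_acc = anderson(X)                           # Anderson extrapolation of the last iterate
x_star = torch.inverse(torch.eye(d) - A) @ b  # exact fixed point, for comparison
print("plain iteration error :", (x - x_star).norm().item())
print("accelerated error     :", (x_acc - x_star).norm().item())
```

As implemented above, the `relaxation` argument blends the extrapolated point with an affine combination (weights summing to one) of the stored iterates; the default of 1.0 returns the pure Anderson update, which is what this sketch uses.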