Hier neighbor allreduce #58

Merged 46 commits into master from hier_neighbor_allreduce on Nov 5, 2020
Commits
aeb3b6a
Add hierachical_neighbor_allreduce python interface
BichengYing Oct 23, 2020
84e6975
Pass down is_hierachical down to perform
BichengYing Oct 23, 2020
cc4fb1d
Build local nccl comm
BichengYing Oct 23, 2020
2df368a
add ihierarchical neighbor allreduce impl in NCCL
BichengYing Oct 23, 2020
1e61f3c
export hierachical_neighbor_allreduce
BichengYing Oct 23, 2020
aafdf11
Update mpi_ops.py
BichengYing Oct 23, 2020
d7f1d01
remove extra avg_computation in hierachical nar
BichengYing Oct 23, 2020
b4dd476
fix the typo of hierarchical
BichengYing Oct 23, 2020
85d9e97
build DistributedHierarchicalNeighborAllreduceOptimizer
BichengYing Oct 23, 2020
f80cabf
Add GetExp2DynamicSendRecvMachineRanks
BichengYing Oct 24, 2020
bb9fa00
Add hierarchical_neighbor_allreduce
BichengYing Oct 24, 2020
aa53d06
Update pytorch_cifar10_resnet.py
BichengYing Oct 24, 2020
869f609
Copy input for hier nar due to in-place allreduce
BichengYing Oct 24, 2020
2e04cbf
Update pytorch_cifar10_resnet.py
BichengYing Oct 24, 2020
b88dde3
Copy fused input back to input for fused hier n_ar
BichengYing Oct 24, 2020
96214e5
Use local rank 0 to recv and local rank 1 to send in h n_ar
BichengYing Oct 24, 2020
95fd66c
Add h n_ar to other examples and enable dynamic => disable args
BichengYing Oct 25, 2020
2c65cef
fix the directly typo
BichengYing Oct 25, 2020
5c73e70
Fix the mesh topo usage
BichengYing Oct 25, 2020
f6df8e1
Update the comments for hier n_ar
BichengYing Oct 26, 2020
f5d5f65
Add is_hierarchical to request message
BichengYing Oct 26, 2020
4489a9d
fix the is_hierarchical usage in request
BichengYing Oct 26, 2020
4344a10
Update GetExp2DynamicSendRecvMachineRanks in 2 machines case
BichengYing Oct 26, 2020
510210a
Update hier n_ar comments
BichengYing Oct 26, 2020
726b06e
Use local rank 0 to send/recv for h n_ar
BichengYing Oct 26, 2020
8650da5
Add allreduce inplace API
BichengYing Oct 28, 2020
cc05047
Update pytorch_benchmark.py
BichengYing Oct 28, 2020
1a67e28
Add hierarchical local allreduce
BichengYing Oct 29, 2020
5b95719
Add use_empty_function_in_communication in dist optimizer
BichengYing Oct 31, 2020
e869fc0
Add Hierarchical ops explanation in docs
BichengYing Oct 31, 2020
87b8a16
Update the document and remove som API
BichengYing Oct 31, 2020
2fbab44
Simplify the examples folder
BichengYing Oct 31, 2020
a6228c2
Add hier n_ar for mpi case
BichengYing Oct 31, 2020
c77cb00
Update mpi h_nar
BichengYing Oct 31, 2020
d85766c
Remove the modula device count since we support only at most one devi…
BichengYing Nov 1, 2020
e266ca6
Add h_nar for fused case
BichengYing Nov 1, 2020
38db5b5
Add the missing MemcpyOutFusionBufferForInputs for h_nar
BichengYing Nov 1, 2020
cf481e1
Remove mesh/star topo in examples
BichengYing Nov 1, 2020
854e99b
Minor style update
BichengYing Nov 1, 2020
cb619f7
Merge branch 'master' into hier_neighbor_allreduce
BichengYing Nov 1, 2020
aa4094a
PowerGraph => ExponentialGraph
BichengYing Nov 2, 2020
af9bf12
Merge branch 'master' into hier_neighbor_allreduce
BichengYing Nov 2, 2020
42c3f95
Update install and ops documents
BichengYing Nov 2, 2020
618a2be
Merge Imagenet and Cifar 10 example together and remove pair_gossip
BichengYing Nov 3, 2020
d771ad0
Update performance and API documents
BichengYing Nov 4, 2020
fc4701e
Add neighbor averaging doc
BichengYing Nov 4, 2020
Files changed
bluefog/common/basics.py (10 changes: 5 additions & 5 deletions)
@@ -51,7 +51,7 @@ def init(self, topology_fn: Optional[Callable[[int], networkx.DiGraph]] = None,
Args:
topology_fn: A callable function that takes size as input and return
networkx.DiGraph object to decide the topology. If not provided
- a default power graph (base 2) structure is called.
+ a default exponential graph (base 2) structure is called.
is_weighted: If set to true, the neighbor ops like (win_update, neighbor_allreduce) will
execute the weighted average instead, where the weight is the value used in
topology matrix (including self).
@@ -60,7 +60,7 @@ def init(self, topology_fn: Optional[Callable[[int], networkx.DiGraph]] = None,
if topology_fn:
topo = topology_fn(self.size())
else:
- topo = topology_util.PowerGraph(self.size())
+ topo = topology_util.ExponentialGraph(self.size())
self.set_topology(topo, is_weighted)
atexit.register(self.shutdown)

@@ -191,7 +191,7 @@ def set_topology(self, topology: Optional[networkx.DiGraph] = None,

Args:
Topo: A networkx.DiGraph object to decide the topology. If not provided
- a default power graph (base 2) structure is used.
+ a default exponential graph (base 2) structure is used.
is_weighted: If set to true, the win_update and neighbor_allreduce will execute the
weighted average instead, where the weights are the value used in topology matrix
(including self weight). Note win_get/win_put/win_accumulate do not use this weight
@@ -207,10 +207,10 @@ def set_topology(self, topology: Optional[networkx.DiGraph] = None,
>>> bf.set_topology(topology_util.RingGraph(bf.size()))
"""
if topology is None:
- topology = topology_util.PowerGraph(size=self.size())
+ topology = topology_util.ExponentialGraph(size=self.size())
if self.local_rank() == 0:
logger.info(
"Topology is not specified. Default Power Two topology is used.")
"Topology is not specified. Default Exponential Two topology is used.")

if not isinstance(topology, networkx.DiGraph):
raise TypeError("topology must be a networkx.DiGraph obejct.")
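
The basics.py change is purely a rename along the default-topology path: PowerGraph becomes ExponentialGraph, with no behavioral difference. A short sketch of the two ways to pick a topology, using only names visible in the diff above (and assuming the usual bluefog.torch entry point):

    import bluefog.torch as bf
    from bluefog.common import topology_util

    bf.init()  # no topology_fn given: base-2 ExponentialGraph is used

    # Or set any networkx.DiGraph explicitly, e.g. a ring:
    bf.set_topology(topology_util.RingGraph(bf.size()), is_weighted=False)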
bluefog/common/common.h (3 changes: 3 additions & 0 deletions)
@@ -278,6 +278,9 @@ struct TensorTableEntry {
// Boolean value for enabling topology check.
bool enable_topo_check = false;

+ // Boolean value for hierarchical operation or not.
+ bool is_hierarchical = false;

// The ops requires the mutex.
bool require_mutex = false;

bluefog/common/message.cc (6 changes: 6 additions & 0 deletions)
@@ -96,6 +96,10 @@ int32_t Request::device() const { return device_; }

void Request::set_device(int32_t value) { device_ = value; }

+ bool Request::is_hierarchical() const { return is_hierarchical_; }
+
+ void Request::set_is_hierarchical(bool value) { is_hierarchical_ = value; }

const std::vector<int64_t>& Request::tensor_shape() const {
return tensor_shape_;
}
@@ -118,6 +122,7 @@ void Request_ParseFromWire(Request& request,
request.set_tensor_name(obj->tensor_name()->str());
request.set_root_rank(obj->root_rank());
request.set_device(obj->device());
+ request.set_is_hierarchical(obj->is_hierarchical());
request.set_tensor_shape(std::vector<int64_t>(obj->tensor_shape()->begin(),
obj->tensor_shape()->end()));
}
@@ -137,6 +142,7 @@ void Request_SerializeToWire(const Request& request,
request_builder.add_tensor_name(tensor_name_wire);
request_builder.add_root_rank(request.root_rank());
request_builder.add_device(request.device());
+ request_builder.add_is_hierarchical(request.is_hierarchical());
request_builder.add_tensor_shape(tensor_shape_wire);
obj = request_builder.Finish();
}
bluefog/common/message.h (4 changes: 4 additions & 0 deletions)
@@ -70,6 +70,9 @@ class Request {
int32_t device() const;
void set_device(int32_t value);

+ bool is_hierarchical() const;
+ void set_is_hierarchical(bool value);

const std::vector<int64_t>& tensor_shape() const;
void set_tensor_shape(const std::vector<int64_t>& value);
void add_tensor_shape(int64_t value);
@@ -84,6 +87,7 @@
DataType tensor_type_ = DataType::BLUEFOG_UINT8;
int32_t root_rank_ = 0;
int32_t device_ = 0;
+ bool is_hierarchical_ = false;
std::string tensor_name_;
std::vector<int64_t> tensor_shape_;
};
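
The message.cc and message.h changes mirror each other: the new is_hierarchical flag is serialized and parsed symmetrically in Request_SerializeToWire and Request_ParseFromWire, so the negotiation layer can presumably distinguish hierarchical from flat neighbor-allreduce requests for the same tensor name. From user code the distinction reduces to which op is called; a rough sketch (again, hierarchical_neighbor_allreduce's signature is an assumption based on the commit log):

    import torch
    import bluefog.torch as bf

    bf.init()
    w = torch.ones(4)
    flat = bf.neighbor_allreduce(w)               # request carries is_hierarchical = false
    hier = bf.hierarchical_neighbor_allreduce(w)  # request carries is_hierarchical = true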