@@ -257,8 +257,18 @@ be replicated transparently across sites.
The multi-site solution described here depends upon the combined use of different
components:

- - **multi-site plugin**: Enables the replication of Gerrit _indexes_, _caches_,
-   and _stream events_ across sites.
+ - **multi-site libModule**: exports interfaces as DynamicItems to plug in specific
+   implementations of `Brokers` and `Global Ref-DB` plugins.
+
+ - **broker plugin**: an implementation of the broker interface, which enables the
+   replication of Gerrit _indexes_, _caches_, and _stream events_ across sites.
+   When no specific implementation is provided, the libModule interfaces are mapped
+   to the internal no-op implementation (see [Broker Noop implementation](#broker-noop-implementation)).
+
+ - **Global Ref-DB plugin**: an implementation of the Global Ref-DB interface,
+   which enables the detection of out-of-sync refs across Gerrit sites.
+   When no specific implementation is provided, the libModule interfaces are mapped
+   to the internal no-op implementation (see [Global Ref-DB Noop implementation](#global-ref-db-noop-implementation)).

- **replication plugin**: enables the replication of the _Git repositories_ across
  sites.
@@ -277,10 +287,78 @@ The interactions between these components are illustrated in the following diagr
## Implementation Details

- ### Message brokers
- The multi-site plugin adopts an event-sourcing pattern and is based on an
- external message broker. The current implementation uses Apache Kafka.
- It is, however, potentially extensible to others, like RabbitMQ or NATS.
+ ### Multi-site libModule
+ As mentioned earlier, there are different components behind the overarching architecture
+ of this distributed multi-site Gerrit installation, each one fulfilling
+ a specific goal. However, whilst the goal of each component is well-defined, the
+ mechanics of how each component achieves that goal are not: the choice of which
+ specific message broker or which Ref-DB to use can depend on different factors,
+ such as scalability, maintainability, business standards and costs, to name a few.
+
+ For this reason the multi-site component is designed to be explicitly agnostic to
+ specific choices of broker and Global Ref-DB implementations, and it does
+ not care how they, specifically, fulfill their task.
+
+ Instead, this component takes on only three responsibilities:
+
+ * Wrapping the GitRepositoryManager so that every interaction with git can be
+   verified by the Global Ref-DB plugin.
+
+ * Exposing DynamicItem bindings onto which concrete _Broker_ and _Global Ref-DB_
+   plugins can register their specific implementations.
+   When no such plugins are installed, the initial binding points to no-ops.
+
+ * Detecting out-of-sync refs across multiple Gerrit sites:
+   each change attempting to mutate a ref is checked against the Ref-DB to
+   guarantee that each node has an up-to-date view of the repository state.
+
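The binding mechanism described above can be illustrated with a minimal, self-contained sketch. It does not use Gerrit's real `DynamicItem` class: `MessageBroker`, `NoopBroker`, `RecordingBroker` and `BrokerBinding` are hypothetical names chosen for illustration, and the actual libModule interfaces may look different. The sketch only shows the idea of a binding that starts out pointing at a no-op and can later be overridden by a plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical broker interface; the real libModule interface may differ.
interface MessageBroker {
  /** Returns true if the event was actually handed over to a broker. */
  boolean publish(String topic, String event);
}

// Default binding: accepts every event and silently drops it.
final class NoopBroker implements MessageBroker {
  @Override
  public boolean publish(String topic, String event) {
    return false;
  }
}

// Stand-in for a concrete broker plugin; it only records events in memory.
final class RecordingBroker implements MessageBroker {
  final List<String> sent = new ArrayList<>();

  @Override
  public boolean publish(String topic, String event) {
    sent.add(topic + ":" + event);
    return true;
  }
}

// Simplified stand-in for a DynamicItem-style binding (assumption, not Gerrit's API):
// it initially points to the no-op and a plugin may replace it at load time.
final class BrokerBinding {
  private final AtomicReference<MessageBroker> current =
      new AtomicReference<>(new NoopBroker());

  void set(MessageBroker implementation) {
    current.set(implementation);
  }

  MessageBroker get() {
    return current.get();
  }
}
```

Callers always go through `BrokerBinding.get()`, so they never need to know whether a real broker plugin is installed or the no-op is still in place.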
+ ### Message brokers plugin
+ Each Gerrit node in the cluster needs to inform, and be informed by, all other nodes
+ about fundamental events, such as indexing of new changes, cache evictions and
+ stream events. This component provides a specific pub/sub broker implementation
+ that is able to do so.
+
+ When provided, the message broker plugin overrides the DynamicItem binding exposed
+ by the multi-site module with a specific implementation, such as Kafka, RabbitMQ, NATS, etc.
+
+ #### Broker Noop implementation
+ The default `Noop` implementation provided by the `Multi-site` libModule does nothing
+ upon publishing and consuming events. This is useful for setting up a test environment
+ and allows the multi-site library to be installed independently from any additional
+ plugins or the existence of a specific broker installation.
+ The Noop implementation can also be useful when there is no need for coordination
+ with remote nodes, since it avoids maintaining an external broker altogether:
+ for example, using the multi-site plugin purely for the purpose of replicating the Git
+ repository to a disaster-recovery site and nothing else.
+
+ ### Global Ref-DB plugin
+ Whilst the replication plugin allows the propagation of the Git repositories across
+ sites and the broker plugin provides a mechanism to propagate events, the Global
+ Ref-DB ensures correct alignment of refs of the multi-site nodes.
+
+ It is the responsibility of this plugin to atomically store key/value pairs of refs in
+ order to allow the libModule to detect out-of-sync refs across multiple sites
+ (aka split brain). This is achieved by storing the most recent `sha` for each
+ specific mutable `ref`, using some sort of atomic _Compare and Set_ operation.
+
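The compare-and-set check described above can be sketched with an in-memory map standing in for the shared store. This is only an illustration under stated assumptions: `GlobalRefDb`, `InMemoryRefDb` and `compareAndPut` are hypothetical names, not the plugin's actual interface, and a real implementation would delegate to an external store rather than a local `ConcurrentHashMap`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical interface; the real Global Ref-DB interface may differ.
interface GlobalRefDb {
  /**
   * Atomically moves a ref from expectedSha to newSha.
   * Returns false when the stored value differs, i.e. the caller is out of sync.
   */
  boolean compareAndPut(String project, String ref, String expectedSha, String newSha);
}

// In-memory stand-in for a shared store; keeps only the latest sha per mutable ref.
final class InMemoryRefDb implements GlobalRefDb {
  private final ConcurrentMap<String, String> latestSha = new ConcurrentHashMap<>();

  @Override
  public boolean compareAndPut(String project, String ref, String expectedSha, String newSha) {
    String key = project + ":" + ref;
    if (expectedSha == null) {
      // Assumption of this sketch: a null expectedSha means "the ref must not exist yet".
      return latestSha.putIfAbsent(key, newSha) == null;
    }
    // ConcurrentMap.replace(key, old, new) only succeeds if the current value equals old.
    return latestSha.replace(key, expectedSha, newSha);
  }
}
```

A node whose local view of a ref is stale passes an outdated `expectedSha`, so the call returns `false` and the update can be rejected before the sites diverge.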
+ We mentioned earlier the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem),
+ which in a nutshell states that a distributed system can only provide two of these
+ three properties: _Consistency_, _Availability_ and _Partition tolerance_. The Global
+ Ref-DB helps achieve _Consistency_ and _Partition tolerance_ (thus sacrificing
+ _Availability_).
+
+ See [Prevent split brain thanks to Global Ref-DB](#prevent-split-brain-thanks-to-global-ref-db)
+ for a thorough example of this.
+
+ When provided, the Global Ref-DB plugin overrides the DynamicItem binding
+ exposed by the multi-site module with a specific implementation, such as ZooKeeper,
+ etcd, MySQL, Mongo, etc.
+
+ #### Global Ref-DB Noop implementation
+ The default `Noop` implementation provided by the `Multi-site` libModule accepts
+ any refs without checking for consistency. This is useful for setting up a test environment
+ and allows the multi-site library to be installed independently from any additional
+ plugins or the existence of a specific Ref-DB installation.

### Eventual consistency on Git, indexes, caches, and stream events
@@ -408,22 +486,10 @@ detail below.
**NOTE**: The two options are not exclusive.

- #### Introduce a `DfsRefDatabase`
-
- An implementation of the out-of-sync detection logic could be based on a central
- coordinator holding the _last known status_ of a _mutable ref_ (immutable refs won't
- have to be stored here). This would be, essentially, a DFS base `RefDatabase` or `DfsRefDatabase`.
-
- This component would:
-
- - Contain a subset of the local `RefDatabase` data:
-   - Store only _mutable_ `refs`
-   - Keep only the most recent `sha` for each specific `ref`
- - Require that atomic _Compare and Set_ operations can be performed on a
-   key -> value storage. For example, it could be implemented using `Zookeeper`. (One implementation
-   was done by Dave Borowitz some years ago.)
+ #### Prevent split brain thanks to Global Ref-DB

- This interaction is illustrated in the diagram below:
+ The above scenario can be prevented by using an implementation of the Global Ref-DB
+ interface, which will operate as follows:

![Split Brain Prevented](images/git-replication-split-brain-detected.png)
@@ -469,23 +535,10 @@ sent the request to the Ref-DB but before persisting this request into its `git`
able to differentiate the type of traffic and, thus, is forced always to use the
RW site, even though the operation is RO.

- - **Support for different brokers**: Currently, the multi-site plugin supports Kafka.
-   More brokers need to be supported in a fashion similar to the
-   [ITS-* plugins framework](https://gerrit-review.googlesource.com/admin/repos/q/filter:plugins%252Fits).
-   Explicit references to Kafka must be removed from the multi-site plugin. Other plugins may contribute
-   implementations to the broker extension point.
-
- - **Split the publishing and subscribing**: Create two separate
-   plugins. Combine the generation of the events into the current kafka-events
-   plugin. The multi-site plugin will focus on
-   consumption of, and sorting of, the replication issues.
-
## Step-2: Move to multi-site Stage #8.

- Auto-reconfigure HAProxy rules based on the projects sharding policy

- Serve RW/RW traffic based on the project name/ref-name.

- Balance traffic with "locally-aware" policies based on historical data
-
- - Preventing split-brain in case of temporary sites isolation