V3 Multi-Datacenter Replication
cluster_mgr settingThe cluster_mgr setting must be set in order for version 3 replication to run.
The configuration for Multi-Datacenter (MDC) Replication is kept in
both the riak_core and riak_repl sections of the app.config
configuration file.
If you are using Riak Enterprise version 2.0, configuration is managed
using the advanced.config files on
each node. The semantics of the advanced.config file are similar to
the formerly used app.config file. For more information and for a list
of configurable parameters, see our documentation on Advanced Configuration.
Here is a sample of the syntax:
{riak_core, [
    %% Every *node* runs one cluster_mgr
    {cluster_mgr, {"0.0.0.0", 9080 }},
    % ...
]},
{riak_repl, [
    %% Pick the correct data_root for your platform
    %% Debian/Centos/RHEL:
    {data_root, "/var/lib/riak/data/riak_repl"},
    %% Solaris:
    %% {data_root, "/opt/riak/data/riak_repl"},
    %% FreeBSD/SmartOS:
    %% {data_root, "/var/db/riak/riak_repl"},
    {max_fssource_cluster, 5},
    {max_fssource_node, 2},
    {max_fssink_node, 2},
    {fullsync_on_connect, false},
    % ...
]}
Settings
Riak MDC configuration is set using the standard Erlang config file
syntax {Setting, Value}. For example, if you wished to set
fullsync_on_connect to false, you would insert this line into the
riak_repl section (appending a comma if you have more settings to
follow):
{fullsync_on_connect, false}
Once your configuration is set, you can verify its correctness by
running the riak command-line tool:
riak chkconfig
riak_repl Settings
| Setting | Options | Default | Description | 
|---|---|---|---|
| cluster_mgr | {ip_address, port} | REQUIRED | The cluster manager will listen for connections from remote clusters on this ip_addressandport. Every node runs one cluster manager, but only the cluster manager running on thecluster_leaderwill service requests. This can change as nodes enter and leave the cluster. The value is a combination of an IP address (not hostname) followed by a port number. | 
| max_fssource_cluster | nodes(integer) | 5 | The hard limit on the number of workers which will participate in the source cluster during a fullsync replication. This means that if one has configured fullsync for two different clusters, both with a max_fssource_clusterof 5, 10 fullsync workers can be in progress. Only affects nodes on the source cluster on which this parameter is defined via the configuration file or command line. | 
| max_fssource_node | nodes(integer) | 1 | Limits the number of fullsync workers that will be running on each individual node in a source cluster. This is a hard limit for all fullsyncs enabled; additional fullsync configurations will not increase the number of fullsync workers allowed to run on any node. Only affects nodes on the source cluster on which this parameter is defined via the configuration file or command line. | 
| max_fssink_node | nodes(integer) | 1 | Limits the number of fullsync workers allowed to run on each individual node in a sink cluster. This is a hard limit for all fullsync sources interacting with the sink cluster. Thus, multiple simultaneous source connections to the sink cluster will have to share the sink nodes number of maximum connections. Only affects nodes on the sink cluster on which this parameter is defined via the configuration file or command line. | 
| fullsync_on_connect | true,false | true | Whether to initiate a fullsync on initial connection from the secondary cluster | 
| data_root | path(string) | data/riak_repl | Path (relative or absolute) to the working directory for the replication process | 
| fullsync_interval | minutes(integer) OR[{sink_cluster, minutes(integer)}, ...] | 360 | A single integer value representing the duration to wait in minutes between fullsyncs, or a list of {"clustername", time_in_minutes}pairs for each sink participating in fullsync replication. | 
| rtq_overload_threshold | length(integer) | 2000 | The maximum length to which the realtime replication queue can grow before new objects are dropped. Dropped objects will need to be replicated with a fullsync. | 
| rtq_overload_recover | length(integer) | 1000 | The length to which the realtime replication queue, in an overload mode, must shrink before new objects are replicated again. | 
| rtq_max_bytes | bytes(integer) | 104857600 | The maximum size to which the realtime replication queue can grow before new objects are dropped. Defaults to 100MB. Dropped objects will need to be replicated with a fullsync. | 
| proxy_get | enabled,disabled | disabled | Enable Riak CS proxy_getand block filter. | 
| rt_heartbeat_interval | seconds(integer) | 15 | A full explanation can be found below. | 
| rt_heartbeat_timeout | seconds(integer) | 15 | A full explanation can be found below. | 
riak_core Settings
| Setting | Options | Default | Description | 
|---|---|---|---|
| keyfile | path(string) | undefined | Fully qualified path to an ssl .pemkey file | 
| cacertdir | path(string) | undefined | The cacertdiris a fully-qualified directory containing all the CA certificates needed to verify the CA chain back to the root | 
| certfile | path(string) | undefined | Fully qualified path to a .pemcert file | 
| ssl_depth | depth(integer) | 1 | Set the depth to check for SSL CA certs. See [^1]). | 
| ssl_enabled | true,false | false | Enable SSL communications | 
| peer_common_name_acl | cert(string) | "*" | Verify an SSL peer’s certificate common name. You can provide an ACL as a list of common name patterns, and you can wildcard the leftmost part of any of the patterns, so *.basho.comwould matchsite3.basho.combut notfoo.bar.basho.comorbasho.com. See [^2]. | 
Heartbeat Settings
There are two realtime-replication-related settings in the riak_repl
section of advanced.config related to the periodic "heartbeat" that is sent
from the source to the sink cluster to verify the sink cluster's
liveness. The rt_heartbeat_interval setting determines how often the
heartbeat is sent (in seconds). If a heartbeat is sent and a response is
not received, Riak will wait rt_heartbeat_timeout seconds before
attempting to re-connect to the sink; if any data is received from the
sink, even if it is not heartbeat data, the timer will be reset. Setting
rt_heartbeat_interval to undefined will disable the heartbeat.
One of the consequences of lowering the timeout threshold arises when connections are working properly but are slow to respond (perhaps due to heavy load). In this case, shortening the timeout means that Riak may attempt to re-connect more often that it needs to. On the other hand, lengthening the timeout will make Riak less sensitive to cases in which the connection really has been compromised.
[^1]: SSL depth is the maximum number of non-self-issued
intermediate certificates that may follow the peer certificate in a valid
certificate chain. If depth is 0, the PEER must be signed by the trusted
ROOT-CA directly; if 1 the path can be PEER, CA, ROOT-CA; if depth is 2
then PEER, CA, CA, ROOT-CA and so on.
[^2]: If the ACL is specified and not the special value *,
peers presenting certificates not matching any of the patterns will not be
allowed to connect.
If no ACLs are configured, no checks on the common name are done, except
as described for Identical Local and Peer Common Names.
Default Bucket Properties
Riak KV version 2.2.0 changed the values of the default bucket properties hash. This will cause an issue replicating between Riak KV clusters with versions 2.2.0 or greater and Riak KV clusters with versions less than 2.2.0.
To replicate between Riak KV versions 2.2.0 or greater and Riak KV clusters less than version 2.2.0, add the necessary override in the advanced.config file:
{riak_repl, [
    {override_capability, [
        {default_bucket_props_hash, [{use, [consistent, datatype, n_val, allow_mult, last_write_wins]}] }
    ]}
]}
If all of the Replication clusters are running Riak KV 2.2.0 or greater, this override is no longer necessary and should be removed.