Version: 2.0.7

Errors & Messages

This is not a comprehensive listing of every error that Riak may encounter: screws fall out all of the time, the world is an imperfect place. It is an attempt to capture the most common recent errors that users encounter, and to describe the non-critical error atoms you may find in the logs.

Discovering the source of an error can take some detective work, since one error can cause a cascade of errors.

The tables in this document do not specify which logs these error messages may appear in. Depending on your log configuration, some may appear more often (e.g., if you set the log level to debug), while others may be output to your console (e.g., if you tee'd your output or started with riak console).

You can optionally customize your log message format via the lager_default_formatter field under lager in app.config. If you do, your messages will look different from those shown in this document.

Finally, this document is organized so that you can look up portions of a log message, since printing every variation would be unwieldy. For example, this message:

12:34:27.999 [error] gen_server riak_core_capability terminated with reason:
no function clause matching orddict:fetch('riak@192.168.2.81', []) line 72

starts with a timestamp (12:34:27.999), followed by the log severity ([error]), and then a message formatted by lager (found in the Lager table below as gen_server <Mod> terminated with reason: <Reason>).
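
As a rough illustration of that breakdown, the sketch below splits a log line into its timestamp, severity, and message parts. The function name parse_lager_line is hypothetical (it is not part of Riak or lager), and the pattern assumes the default format shown in this document; a customized lager_default_formatter would need a different pattern.

```shell
# parse_lager_line is a hypothetical helper, not part of Riak or lager.
# It assumes the default layout: TIMESTAMP [SEVERITY] MESSAGE
parse_lager_line() {
  printf '%s\n' "$1" |
    sed -n 's/^\([0-9:. -]*\) \[\([a-z]*\)] \(.*\)$/time=\1 severity=\2 message=\3/p'
}

parse_lager_line '12:34:27.999 [error] gen_server riak_core_capability terminated with reason:'
# prints: time=12:34:27.999 severity=error message=gen_server riak_core_capability terminated with reason:
```

The message part is what you would then look up in the tables that follow.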

Lager Formats

Riak's main logging mechanism is the Lager project, so it's good to note some of the more common message formats. In almost every case the reasons for the error are described as variables, such as Reason or Mod (meaning the Erlang module which is generally the source of the error).

Riak does not format all error messages that it receives into human-readable sentences. However, it does output errors as objects.

The above example error message corresponds with the first message in this table, where the Erlang Mod value is riak_core_capability and the reason was an Erlang error: no function clause matching orddict:fetch('riak@192.168.2.81', []) line 72.

| Error | Message |
|---|---|
| | gen_server <Mod> terminated with reason: <Reason> |
| | gen_fsm <Mod> in state <State> terminated with reason: <Reason> |
| | gen_event <ID> installed in <Mod> terminated with reason: <Reason> |
| badarg | bad argument in call to <Mod1> in <Mod2> |
| badarith | bad arithmetic expression in <Mod> |
| badarity | fun called with wrong arity of <Ar1> instead of <Ar2> in <Mod> |
| badmatch | no match of right hand value <Val> in <Mod> |
| bad_return | bad return value <Value> from <Mod> |
| bad_return_value | bad return value: <Val> in <Mod> |
| badrecord | bad record <Record> in <Mod> |
| case_clause | no case clause matching <Val> in <Mod> |
| emfile | maximum number of file descriptors exhausted, check ulimit -n |
| function_clause | no function clause matching <Mod> |
| function not exported | call to undefined function <Func> from <Mod> |
| if_clause | no true branch found while evaluating if expression in <Mod> |
| noproc | no such process or port in call to <Mod> |
| {system_limit, {erlang, open_port}} | maximum number of ports exceeded |
| {system_limit, {erlang, spawn}} | maximum number of processes exceeded |
| {system_limit, {erlang, spawn_opt}} | maximum number of processes exceeded |
| {system_limit, {erlang, list_to_atom}} | tried to create an atom larger than 255, or maximum atom count exceeded |
| {system_limit, {ets, new}} | maximum number of Erlang Term Storage (ETS) tables exceeded |
| try_clause | no try clause matching <Val> in <Mod> |
| undef | call to undefined function <Mod> |

Error Atoms

Since Erlang programming favors a "happy path/fail fast" style, one of the more common error log strings you might encounter contains {error,{badmatch,{.... This is Erlang's way of telling you that a pattern match failed on an unexpected value, so these errors can prefix the more descriptive parts. In the example below, {error,{badmatch,{... prefixes the more interesting insufficient_vnodes_available error, which can be found in the Riak Core table later on in this document.

2012-01-13 02:30:37.015 [error] <0.116.0> webmachine error: path="/riak/contexts"\
{error,{error,{badmatch,{error,insufficient_vnodes_available}},\
[{riak_kv_wm_keylist,produce_bucket_body,2},{webmachine_resource,resource_call,3},\
{webmachine_resour,resource_call,1},{webmachine_decision_core,decision,1},\
{webmachine_decision_core,handle_request,2},\
{webmachine_mochiweb,loop,1},{mochiweb_http,headers,5}]}}
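
When scanning logs for these wrapped errors, it can help to strip the badmatch prefix and keep only the inner atom. The following is a minimal sketch, assuming the error is wrapped exactly as {badmatch,{error,Atom}}; inner_error_atom is a hypothetical helper name.

```shell
# inner_error_atom is a hypothetical helper, not part of Riak.
# It prints the atom wrapped inside {badmatch,{error,Atom}}, if present.
inner_error_atom() {
  printf '%s\n' "$1" |
    sed -n 's/.*{badmatch,{error,\([a-z_]*\)}.*/\1/p'
}

inner_error_atom '{error,{error,{badmatch,{error,insufficient_vnodes_available}},'
# prints: insufficient_vnodes_available
```

The printed atom is the term to look up in the tables below. Lines that do not contain the wrapper produce no output.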

Erlang Errors

Although relatively rare once a Riak cluster is running in production, users new to Riak or Erlang occasionally encounter errors on initial installation. These spring from a setup Erlang does not expect, generally due to network, permission, or configuration problems.

| Error | Description | Resolution |
|---|---|---|
| {error,duplicate_name} | You are trying to start a new Erlang node, but another node with the same name is already running | You might be attempting to start multiple nodes on the same machine with the same vm.args -name value; or if Riak is already running, check for beam.smp; or epmd thinks Riak is running, check/kill epmd |
| {error,econnrefused} | Remote Erlang node connection refused | Ensure your cluster is up and nodes are able to communicate with each other. See Step 1. |
| {error,ehostunreach} | Remote node cannot be connected to | Ensure that nodes are able to communicate with each other. See Step 1. |
| {error,eacces} | Cannot write a given file | Ensure the Riak beam process has permission to write to all *_dir values in app.config, for example, ring_state_dir, platform_data_dir, and others |
| {error,enoent} | Missing an expected file or directory | Ensure all *_dir values in app.config exist, for example, ring_state_dir, platform_data_dir, and others |
| {error,erofs} | An attempt was made to write a file or directory on a read-only filesystem | Place Riak directories only on read/write filesystems |
| system_memory_high_watermark | Often a sign that an ETS table has grown too large | Check that you are using a backend appropriate for your needs (LevelDB for very large key counts) and that your vnode count is reasonable (measured in dozens per node rather than hundreds) |
| temp_alloc | Erlang attempting to allocate memory | Often associated with Cannot allocate X bytes of memory, which means that you're either creating too large an object or that you simply don't have enough RAM. Base minimum suggested RAM per node is 4GB. |
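
For the enoent and eacces cases, a quick check of each configured directory can narrow things down. The sketch below is only an illustration: check_riak_dir is a hypothetical helper, the path in the example is a placeholder for whatever your *_dir values point to, and it should be run as the same user that runs Riak.

```shell
# check_riak_dir is a hypothetical helper, not part of Riak.
# It reports whether a directory is missing (enoent) or not writable (eacces)
# for the user running this script.
check_riak_dir() {
  if [ ! -d "$1" ]; then
    echo "enoent: $1 does not exist"
  elif [ ! -w "$1" ]; then
    echo "eacces: $1 is not writable by $(id -un)"
  else
    echo "ok: $1"
  fi
}

# Placeholder path; substitute the *_dir values from your own app.config.
check_riak_dir /var/lib/riak/ring
```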

Riak Errors and Messages

Many KV errors have prescriptive messages. For such cases we leave it to Riak to explain the correct course of action. For example, the map/reduce parse_input phase will respond like this when it encounters an invalid input:

Note on inputs

Inputs must be a binary bucket, a tuple of bucket and key-filters, a list of target tuples, a search index, or modfun tuple: INPUT.

The remaining common error codes are often marked by Erlang atoms (and quite often wrapped within an {error,{badmatch,{... tuple, as described in the Error Atoms section above). The tables below lay out those terse error codes and related log messages, if they exist.

Riak Core

Riak Core is the underlying implementation for KV. These are errors originating from that framework, and can appear whether you use KV, Search, or any Core implementation.

| Error | Message | Description | Resolution |
|---|---|---|---|
| behavior | | Attempting to execute an unknown behavior | Ensure that your configuration file choices (e.g. backends) support the behaviors you're attempting to use, such as configuring LevelDB to use secondary indexes |
| already_leaving | Node is already in the process of leaving the cluster | An error marking a node to leave when it is already leaving | No need to duplicate the leave command |
| already_replacement | | This node is already in the replacements request list | You cannot replace the same node twice |
| {different_owners, N1, N2} | | Two nodes list different partition owners, meaning the ring is not ready | When the ring is ready, the status should be ok |
| different_ring_sizes | | The joining ring is a different size from the existing cluster ring | Don't join a node already joined to a cluster |
| insufficient_vnodes_available | | When creating a query coverage plan, not enough vnodes are available | Check the riak-admin ring-status and ensure all of your nodes are healthy and connected |
| invalid_replacement | | A new node is currently joining from a previous operation, so a replacement request is invalid until it is no longer joining | Wait until the node is finished joining |
| invalid_ring_state_dir | Ring state directory <RingDir> does not exist, and could not be created: <Reason> | The ring directory does not exist and no new dir can be created in expected location | Ensure that the Erlang proc can write to ring_state_dir or has permission to create that dir |
| is_claimant | | A node cannot be the claimant of its own remove request | Remove/replace nodes from another node |
| is_up | | Node is expected to be down but is up | When a node is downed, it should be down |
| legacy | | Attempting to stage a plan against a legacy ring | Staging is a feature only of Riak versions 1.2.0+ |
| max_concurrency | Handoff receiver for partition <Partition> exited abnormally after processing <Count> objects: <Reason> | Disallow more handoff processes than the riak_core handoff_concurrency setting (defaults to 2) | If this routinely kills vnodes, this issue has been linked to LevelDB compactions which can build up and block writing, which will also be accompanied by LevelDB logs saying Waiting... or Compacting |
| {nodes_down, Down} | | All nodes must be up to check | |
| not_member | | This node is not a member of the ring | Cannot leave/remove/down when this is not a ring member |
| not_reachable | | Cannot join unreachable node | Check your network connections, ensure Erlang cookie setting vm.args -setcookie |
| {not_registered, App} | | Attempting to use an unregistered process | Ensure that your app.config choices contain the app you're attempting to use {riak_kv_stat, true} |
| not_single_node | | There are no other members to join | Join with at least one other node |
| nothing_planned | | Cannot commit a plan without changes | Ensure at least one ring change is planned before running commit |
| only_member | | This is the only member of the ring | Cannot leave/remove/down when this is the only member of the ring |
| ring_not_ready | | Ring not ready to perform command | Attempting to plan a ring change before the ring is ready to do so |
| self_join | | Cannot join node with itself | Join another node to form a valid cluster |
| timeout | <Type> transfer of <Module> from <SrcNode> <SrcPartition> to <TargetNode> <TargetPartition> failed because of TCP recv timeout | | Ensure that ports chosen in your configuration files do not overlap with ports being used by your system, or with each other |
| unable_to_get_join_ring | | Cannot access cluster ring to join | Possible corrupted ring |
| {unknown_capability, Capability} | | Attempting to use a capability unsupported by this implementation | Ensure that your configuration choices support the capability you're attempting to use, such as Pipe MapReduce (setting a mapred_2i_pipe value in app.config) |
| vnode_exiting | <Mod> failed to store handoff obj: <Err> | A vnode fails to hand off data because the handoff state is deleted | |
| vnode_shutdown | | The vnode worker pool is shutting down | Various reasons can cause a shutdown, check other log messages |
| | Bucket validation failed <Detail> | | Only set valid bucket properties |
| | set_recv_data called for non-existing receiver | Cannot connect to receiver during handoff | Ensure receiver node is still up and running, and that the standard |
| | An <Dir> handoff of partition <M> was terminated because the vnode died | Handoff stopped because the vnode was DOWN and the sender must be killed | An expected message if a vnode dies during handoff. Check the logs for other causes. |
| | status_update for non-existing handoff <Target> | Cannot get the status of a handoff Target module that doesn't exist | An expected message. Check the logs for other causes. |
| | SSL handoff config error: property <FailProp>: <BadMat>. | The receiver may reject the sender's attempt to start a handoff | Ensure your SSL settings and certificates are proper |
| | Failure processing SSL handoff config <Props>:<X>:<Y> | | Ensure your SSL settings and certificates are proper |
| | <Type> transfer of <Module> from <SrcNode> <SrcPartition> to <TargetNode> <TargetPartition> failed because of <Reason> | Nodes cannot hand off data | Ensure that your cluster is up and nodes are able to communicate with each other. See Step 1. |
| | Failed to start application: <App> | Expected application cannot load | This relates to an Erlang application, and not necessarily the Riak application in general. The app may fail to load for many reasons, such as a missing native library. Read other log messages for clues |
| | Failed to read ring file: <Reason> | Gives a reason why the ring file cannot be read on startup | The reason given explains the problem, such as eacces meaning the Erlang process does not have permission to read |
| | Failed to load ring file: <Reason> | Gives a reason why the ring file cannot be loaded on startup | The reason given explains the problem, such as enoent meaning the expected file cannot be found |
| | ring_trans: invalid return value: <Other> | Transferring ring data between nodes received an invalid value | Often associated with ring corruption, or an unexpected exit from the transferring node |
| | Error while running bucket fixup module <Fixup> from application <App> on bucket <BucketName>: <Reason> | | Various sources for a fixup error, read associated errors |
| | Crash while running bucket fixup module <Fixup> from application <App> on bucket <BucketName> : <What>:<Why> | | Various sources for a fixup error, read associated errors |
| | <Index> <Mod> worker pool crashed <Reason> | | Various reasons can be the source of a worker pool crash, read associated errors |
| | Received xfer_complete for non-existing repair: <ModPartition> | Unexpected repair message | Not much to do here, but a node did not expect to receive a xfer_complete status |

Riak KV

Riak KV is the key/value implementation, generally just considered to be Riak proper. This is the source of most of the code, and consequently, most of the error messages.

| Error | Message | Description | Resolution |
|---|---|---|---|
| all_nodes_down | | No nodes are available | Check riak-admin member-status and ensure that all expected nodes in the cluster are of valid Status |
| {bad_qterm, QueryTerm} | | Bad query when performing MapReduce | Fix your MapReduce query |
| {coord_handoff_failed, Reason} | Unable to forward put for <Key> to <CoordNode> - <Reason> | Vnodes unable to communicate | Check that coordinating vnode is not down. Ensure your cluster is up and nodes are able to communicate with each other. See Step 1. |
| {could_not_reach_node, Node} | | Erlang process was not reachable | Check network settings; ensure remote nodes are running and reachable; ensure all nodes have the same Erlang cookie setting vm.args -setcookie. See Step 1. |
| {deleted, Vclock} | | The value was already deleted, includes the current vector clock | Riak will eventually clean up this tombstone |
| {dw_val_violation, DW} | | Same as w_val_violation but concerning durable writes | Set a valid DW value |
| {field_parsing_failed, {Field, Value}} | Could not parse field <Field>, value <Value>. | Could not parse an index field | Most commonly an _int field which cannot be parsed. For example a query like this is invalid: /buckets/X/index/Y_int/BADVAL, since BADVAL should instead be an integer |
| {hook_crashed, {Mod, Fun, Class, Exception}} | Problem invoking pre-commit hook | Precommit process exited due to some failure | Fix the precommit function code, follow the message's exception and stacktrace to help debug |
| {indexes_not_supported, Mod} | | The chosen backend does not support indexes (only LevelDB currently supports secondary indexes) | Set your configuration to use the LevelDB backend |
| {insufficient_vnodes, NumVnodes, need, R} | | R was set greater than the total vnodes | Set a proper R value; or too many nodes are down; or too many nodes are unavailable due to crash or network partition. Ensure all nodes are available by running riak-admin ring-status. |
| {invalid_hook_def, HookDef} | Invalid post-commit hook definition <Def> | No Erlang module and function or JavaScript function name | Define the hook with the correct settings |
| {invalid_inputdef, InputDef} | | Bad inputs definitions when running MapReduce | Fix inputs settings; set mapred_system from legacy to pipe |
| invalid_message | | Unknown event sent to module | Ensure you're running similar versions of Riak (and specifically poolboy) across all nodes |
| {invalid_range, Args} | | Index range query has Start > End | Fix your query |
| {invalid_return, {Mod, Fun, Result}} | Problem invoking pre-commit hook <Mod>:<Fun>, invalid return <Result> | The given precommit function gave an invalid return for the given Result | Ensure your pre-commit functions return a valid result |
| invalid_storage_backend | storage_backend <Backend> is non-loadable. | Invalid backend choice when starting up Riak | Set a valid backend in your configuration files |
| key_too_large | | The key was larger than 65536 bytes | Use a smaller key |
| local_put_failed | | A local vnode PUT operation failed | This has been linked to a LevelDB issue related to restricted memory usage and inability to flush a write to disk. If this happens repetitively, stop/start the riak node, forcing a memory realloc |
| {n_val_violation, N} | | (W > N) or (DW > N) or (PW > N) or (R > N) or (PR > N) | No W or R values may be greater than N |
| {nodes_not_synchronized, Members} | | Rings of all members are not synchronized | Backups will fail if nodes are not synchronized |
| {not_supported, mapred_index, FlowPid} | | Index lookups for MapReduce are only supported with Pipe | Set mapred_system from legacy to pipe |
| notfound | | No value found | Value was deleted, or was not yet stored or replicated |
| {pr_val_unsatisfied, PR, Primaries} | | Same as r_val_unsatisfied but only counts Primary node replies | Too many primary nodes are down or the PR value was set too high |
| {pr_val_violation, R} | | Same as r_val_violation but concerning Primary reads | Set a valid PR value |
| precommit_fail | Pre-commit hook <Mod>:<Fun> failed with reason <Reason> | The given precommit function failed for the given Reason | Fix the precommit function code |
| {pw_val_unsatisfied, PR, Primaries} | | Same as w_val_unsatisfied but only counts Primary node replies | Too many primary nodes are down or the PW value was set too high |
| {pw_val_violation, PW} | | Same as w_val_violation but concerning primary writes | Set a valid PW value |
| {r_val_unsatisfied, R, Replies} | | Not enough nodes replied to satisfy the R value, contains the given R value and the actual number of Replies | Too many nodes are down or the R value was set too high |
| {r_val_violation, R} | | The given R value was non-numeric and not a valid setting (one, all, quorum) | Set a valid R value |
| receiver_down | | Remote process failed to acknowledge request | Can occur when listkeys is called |
| {rw_val_violation, RW} | | The given RW property was non-numeric and not a valid setting (one, all, quorum) | Set a valid RW value |
| {siblings_not_allowed, Object} | Siblings not allowed: <Object> | The hook to index cannot abide siblings | Set the bucket's allow_mult property to false |
| timeout | | The given action took too long to reply | Ensure your cluster is up and nodes are able to communicate with each other. See Step 1. Or check you have a reasonable ulimit size. Note that listkeys commands can easily time out and shouldn't be used in production. |
| {too_few_arguments, Args} | | Index query requires at least one argument | Fix your query format |
| {too_many_arguments, Args} | | Index query is malformed with more than 1 (exact) or 2 (range) values | Fix your query format |
| too_many_fails | | Too many write failures to satisfy W or DW | Try writing again. Or ensure your nodes/network is healthy. Or set a lower W or DW value |
| too_many_results | | The query attempted to return too many results | This is a protective error. Either change your query to return fewer results, or change your max_search_results setting in app.config (it defaults to 100,000) |
| {unknown_field_type, Field} | Unknown field type for field: <Field>. | Unknown index field extension (begins with underscore) | The only valid field types are _int and _bin |
| {w_val_unsatisfied, RepliesW, RepliesDW, W, DW} | | Not enough nodes replied to satisfy the W value, contains the given W value and the actual number of Replies* for either W or DW | Too many nodes are down or the W or DW value was set too high |
| {w_val_violation, W} | | The given W property was non-numeric and not a valid setting (one, all, quorum) | Set a valid W value |
| | Invalid equality query <SKey> | Equality query is required and must be binary for an index call | Pass in an equality value when performing a 2i equality query |
| | Invalid range query: <Min> -> <Max> | Both range query values are required and must be binary for an index call | Pass in both range values when performing a 2i range query |
| | Failed to start <Mod> <Reason>:<Reason> | Riak KV failed to start for given Reason | Several possible reasons for failure, read the attached reason for insight into resolution |
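
The n_val_violation condition listed above is simple enough to check on the client side before a request is sent. The following is a minimal sketch under stated assumptions: check_n_val is a hypothetical helper (not a Riak command), and it handles numeric values only, not the symbolic settings one, all, or quorum.

```shell
# check_n_val is a hypothetical client-side pre-check, not a Riak command.
# Usage: check_n_val N VALUE...   (numeric R, W, DW, PR, or PW values only)
check_n_val() {
  n=$1
  shift
  for v in "$@"; do
    if [ "$v" -gt "$n" ]; then
      echo "n_val_violation: $v > $n"
      return 0
    fi
  done
  echo "ok"
}

check_n_val 3 2 2    # N=3, R=2, W=2: prints "ok"
check_n_val 3 2 4    # N=3, R=2, W=4: prints "n_val_violation: 4 > 3"
```

Passing such a check does not guarantee that a request succeeds; the *_val_unsatisfied errors depend on how many nodes actually reply.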

Backend Errors

These errors tend to stem from server-based problems. Backends are sensitive to low or corrupt disk or memory resources, native code, and configuration differences between nodes. Conversely, a network issue is unlikely to affect a backend.

| Error | Message | Description | Resolution |
|---|---|---|---|
| data_root_not_set | | Same as data_root_unset | Set the data_root directory in config |
| data_root_unset | Failed to create bitcask dir: data_root is not set | The data_root config setting is required | Set data_root as the base directory in which to store bitcask data, under the bitcask section |
| {invalid_config_setting, multi_backend, list_expected} | | Multi backend configuration requires a list | Wrap multi_backend config value in a list |
| {invalid_config_setting, multi_backend, list_is_empty} | | Multi backend configuration requires a value | Configure at least one backend under multi_backend in app.config |
| {invalid_config_setting, multi_backend_default, backend_not_found} | | | Must choose a valid backend type to configure |
| multi_backend_config_unset | | No configuration for Multi backend | Configure at least one backend under multi_backend in app.config |
| not_loaded | | Native driver not loading | Ensure your native drivers exist (.dll or .so files under lib/project/priv, where project is most likely eleveldb). |
| {riak_kv_multi_backend, undefined_backend, BackendName} | | Backend defined for a bucket is invalid | Define a valid backend before using this bucket |
| reset_disabled | | Attempted to reset a Memory backend in production | Don't use this in production |

JavaScript

These are some errors related to JavaScript pre-commit functions, MapReduce functions, or simply the management of the pool of JavaScript VMs. If you do not use JavaScript, these should not be encountered. If they are, check your configuration for high *js_vm* values, or treat them as a symptom of an underlying issue such as low resources.

| Error | Message | Description | Resolution |
|---|---|---|---|
| no_vms | JS call failed: All VMs are busy. | All JavaScript VMs are in use | Wait and run again; increase JavaScript VMs in app.config (map_js_vm_count, reduce_js_vm_count, or hook_js_vm_count) |
| bad_utf8_character_code | Error JSON encoding arguments: <Args> | A UTF-8 character given was badly formatted | Only use correct UTF-8 characters for JavaScript code and arguments |
| bad_json | | Bad JSON formatting | Only use correctly formatted JSON for JavaScript command arguments |
| | Invalid bucket properties: <Details> | Listing bucket properties will fail if invalid | Fix bucket properties |
| {load_error, "Failed to load spidermonkey_drv.so"} | | The JavaScript driver is corrupted or missing | In OS X you may have compiled with llvm-gcc rather than gcc. |

MapReduce

These are possible errors logged by Riak's MapReduce implementation, both legacy as well as Pipe. If you never use or call MapReduce, you should not run across these.

| Error | Message | Description | Resolution |
|---|---|---|---|
| bad_mapper_props_no_keys | | At least one property should be found by default. Unused in Riak 1.3+ | Set mapper properties, or don't use it |
| bad_mapred_inputs | | A bad value sent to MapReduce. Unused in Riak 1.3+ | When using the Erlang client interface, ensure all MapReduce and search queries are correctly binary |
| bad_fetch | | An expected local query was not retrievable. Unused in Riak 1.3+ | JavaScript MapReduce query code placed in a Riak value must be stored before execution |
| {bad_filter, <Filter>} | | An invalid keyfilter was used | Ensure your MapReduce keyfilter is correct |
| {dead_mapper, <Stacktrace>, <MapperData>} | | Getting a reply from a mapper for a job that has already exited. Unused in Riak 1.3+ | Check for a stuck Erlang process, or if using legacy MR ensure map_cache_size is set (both issues may require a node restart) |
| {inputs, Reason} | An error occurred parsing the "inputs" field. | MapReduce request has invalid input field | Fix MapReduce fields |
| {invalid_json, Message} | The POST body was not valid JSON. The error from the parser was: <Message> | Posting a MapReduce command requires correct JSON | Format MapReduce requests correctly |
| javascript_reduce_timeout | | JavaScript reduce function taking too long | For large numbers of objects, your JavaScript functions may become bottlenecks. Decrease the quantity of values being passed to and returned from the reduce functions, or rewrite as Erlang functions |
| missing_field | The post body was missing the "inputs" or "query" field. | Either an inputs or query field is required | Post MapReduce request with at least one |
| {error,notfound} | | Used in place of a RiakObject in the mapping phase | Your custom Erlang map function should deal with this type of value |
| not_json | The POST body was not a JSON object. | Posting a MapReduce command requires correct JSON | Format MapReduce requests correctly |
| {no_candidate_nodes, exhausted_prefist, <Stacktrace>, <MapperData>} | | Some map phase workers died | Possibly a long running job hitting MapReduce timeout, upgrade to Pipe |
| {<query>, Reason} | An error occurred parsing the "query" field. | MapReduce request has invalid query field | Fix MapReduce query |
| {unhandled_entry, Other} | Unhandled entry: <Other> | The reduce_identity function is unused | If you don't need reduce_identity, just don't set reduce phase at all |
| {unknown_content_type, ContentType} | | Bad content type for MapReduce query | Only application/json and application/x-erlang-binary are accepted |
| | Phase <Fitting>: <Reason> | A general error when something happens using the Pipe MapReduce implementation with a bad argument or configuration | Can happen with a bad map or reduce implementation, most recent known gotcha is when a JavaScript function improperly deals with tombstoned objects |
| | riak_kv_w_reduce requires a function as argument, not a <Type> | Reduce requires a function object, not any other type | This shouldn't happen |

Specific messages

Although you can put together many error causes with the tables above, here are some common yet esoteric messages with known causes and solutions.

| Message | Resolution |
|---|---|
| gen_server riak_core_capability terminated with reason: no function clause matching orddict:fetch('Node', []) | The Node has been changed, either through change of IP or vm.args -name without notifying the ring. Either use the riak-admin cluster replace command, or remove the corrupted ring files rm -rf /var/lib/riak/ring/* and rejoin to the cluster |
| gen_server <PID> terminated with reason: no function clause matching riak_core_pb:encode(Args) line 40 | Ensure you do not have different settings on different nodes (for example, a ttl mem setting on one node's mem backend, and another without) |
| monitor busy_dist_port Pid [...{almost_current_function,...] | This message means distributed Erlang buffers are filling up. Try setting zdbbl higher in vm.args, such as +zdbbl 16384. Or check that your network is not slow. Or ensure you are not slinging large values. If a high bandwidth network is congested, try setting RTO_min down to 0 msec (or 1msec). |
| <PID>@riak_core_sysmon___handler:handle_event:89 Monitor got {suppressed,port_events,1} | Logged as info, you can add +swt very_low to your vm.args |
| (in LevelDB LOG files) Compaction error | Turn off the node and run repair on the LevelDB partition. See Step 2. |
| enif_send: env==NULL on non-SMP VM /usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed. | Riak's Erlang VM is built with SMP support and if Riak is started on a non-SMP system, an error like this one is logged. This is commonly seen in virtualized environments configured for only one CPU core. |
| exit with reason bad return value: {error,eaddrinuse} in context start_error | An error like this example can occur when another process is already bound to the same address as the process being started is attempting to bind to. Use operating system tools like netstat, ps, and lsof to determine the root cause for resolving this kind of errors; check for existence of stale beam.smp processes. |
| exited with reason: eaddrnotavail in gen_server:init_it/6 line 320 | An error like this example can result when Riak cannot bind to the addresses specified in the configuration. In this case, you should verify HTTP and Protocol Buffers addresses in app.config and ensure that the ports being used are not in the privileged (1-1024) range as the riak user will not have access to such ports. |
| gen_server riak_core_capability terminated with reason: no function clause matching orddict:fetch('riak@192.168.2.2', []) line 72 | Error output like this example can indicate that a previously running Riak node with an original -name value in vm.args has been modified by simply changing the value in vm.args and not properly through riak-admin cluster replace. |
| ** Configuration error: [FRAMEWORK-MIB]: missing context.conf file => generating a default file | This error is commonly encountered when starting Riak Enterprise without prior SNMP configuration. |
| RPC to 'node@example.com' failed: {'EXIT', {badarg, [{ets,lookup, [schema_table,<<"search-example">>], []} {riak_search_config,get_schema,1, [{file,"src/riak_search_config.erl"}, {line,69}]} ... | This error can be caused when attempting to use Riak Search without first enabling it in each node's app.config. See the configuration files documentation for more information on enabling Riak Search. |
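
For the eaddrnotavail entry above, the port-range part of the advice can be checked mechanically. This is a small sketch only: is_privileged_port is a hypothetical helper, the 1-1024 range is the one quoted in that entry, and the ports in the loop are example values (8087 and 8098 are assumed here to be your Protocol Buffers and HTTP ports; check your own app.config).

```shell
# is_privileged_port is a hypothetical helper; it uses the 1-1024 range
# quoted in the eaddrnotavail entry.
is_privileged_port() {
  [ "$1" -ge 1 ] && [ "$1" -le 1024 ]
}

# Example ports only; substitute the ports from your own configuration.
for port in 80 8087 8098; do
  if is_privileged_port "$port"; then
    echo "$port: privileged, the riak user cannot bind to it"
  else
    echo "$port: ok"
  fi
done
```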

More

  1. Ensure node inter-communication
  • Check riak-admin member-status and ensure the cluster is valid.
  • Check riak-admin ring-status and ensure the ring and vnodes are communicating as expected.
  • Ensure your machine does not have a firewall or other issue that prevents traffic to the remote node.
  • Your remote vm.args -setcookie must be the same value for every node in the cluster.
  • The vm.args -name value must not change after joining the node (unless you use riak-admin cluster replace).
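
The cookie requirement in Step 1 can be verified by comparing the vm.args files of several nodes. The sketch below assumes you have copied each node's vm.args to the machine where you run it; cookie_of and same_cookie are hypothetical helper names, and the file paths in the usage note are placeholders.

```shell
# cookie_of prints the -setcookie value found in one vm.args file.
# (Hypothetical helper, not part of Riak.)
cookie_of() {
  sed -n 's/^-setcookie  *\([^ ]*\).*/\1/p' "$1"
}

# same_cookie succeeds only if every given vm.args file carries the same cookie.
same_cookie() {
  first=$(cookie_of "$1")
  for f in "$@"; do
    if [ "$(cookie_of "$f")" != "$first" ]; then
      echo "cookie mismatch: $f"
      return 1
    fi
  done
  echo "cookies match"
}
```

A call such as same_cookie node1/vm.args node2/vm.args node3/vm.args prints "cookies match" when all files agree, or names the first file that differs.
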
  2. Run LevelDB compaction

    1. Find the affected vnodes: find . -name "LOG" -exec grep -l 'Compaction error' {} \; (Finding one compaction error is interesting; more than one might be a strong indication of a hardware or OS bug.)

    2. Stop Riak on the node: riak stop

    3. Start an Erlang session (do not start riak, we just want Erlang).

    4. From the Erlang console, run the following command to set the LevelDB options used when opening the database:

    [application:set_env(eleveldb, Var, Val) || {Var, Val} <-
        [{max_open_files, 2000},
         {block_size, 1048576},
         {cache_size, 20*1024*1024*1024},
         {sync, false},
         {data_root, "/var/db/riak/leveldb"}]].

    5. For each of the corrupted LevelDB databases (found by find . -name "LOG" -exec grep -l 'Compaction error' {} \;) run this command, substituting in the proper vnode number:

    eleveldb:repair("/var/db/riak/leveldb/442446784738847563128068650529343492278651453440", []).

    6. When all have finished successfully, you may restart the node: riak start

    7. Check for proper operation by looking at log files in /var/log/riak and in the LOG files in the affected LevelDB vnodes.
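
To collect the vnode directories that need repair in Step 2, the find command can be wrapped so that it prints the directory of each matching LOG file rather than the file itself. This is a minimal sketch; corrupted_vnodes is a hypothetical helper name, and the data root you pass in is whatever your LevelDB data_root is set to.

```shell
# corrupted_vnodes is a hypothetical helper, not part of Riak.
# It prints the directory of every LOG file under the given root
# that contains the string 'Compaction error'.
corrupted_vnodes() {
  find "$1" -name LOG -exec grep -l 'Compaction error' {} \; |
    while read -r f; do
      dirname "$f"
    done
}
```

For example, corrupted_vnodes /var/db/riak/leveldb would list one directory per affected vnode, and each printed path is a candidate argument for eleveldb:repair.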