Version: 2.0.8

HyperLogLogs

The examples in this section will show you how to use hyperloglogs on their own.

Set Up a Bucket Type

If you've already created and activated a bucket type with the datatype parameter set to hll, skip to the next section.

Start by creating a bucket type with the datatype parameter set to hll:

riak-admin bucket-type create hlls '{"props":{"datatype":"hll"}}'
Note: The hlls bucket type name used above is only an example. You are free to name bucket types whatever you like, with the exception of default.

After creating a bucket type with a Riak data type, confirm that the bucket property configuration associated with that type is correct:

riak-admin bucket-type status hlls

This returns a list of bucket properties and their values in the form of property: value.

If our hlls bucket type has been set properly we should see the following pair in our console output:

datatype: hll

Once we have confirmed the bucket type is properly configured, we can activate the bucket type to be used in Riak KV:

riak-admin bucket-type activate hlls

We can check if activation has been successful by using the same bucket-type status command shown above:

riak-admin bucket-type status hlls

After creating and activating our new hlls bucket type, we can set up our client to start using the bucket type as detailed in the next section.

Client Setup

First, we need to direct our client to the bucket type/bucket/key location that contains our hyperloglog.

For this example we'll use the hlls bucket type created and activated above and a bucket called hlls:

%% Buckets are simply named binaries in the Erlang client. See the
%% examples below for more information
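
As a minimal sketch of what such a location looks like in the Erlang client, a typed bucket can be written as a tuple of two binaries, the same shape used in the update example later in this section. The variable name HllsBucket is illustrative only and is not used elsewhere in this section.

```erlang
%% A typed bucket is a {BucketType, BucketName} tuple of binaries.
%% Here both the bucket type and the bucket are named hlls.
%% HllsBucket is a hypothetical variable name chosen for this sketch.
HllsBucket = {<<"hlls">>, <<"hlls">>}.

%% Pattern matching pulls the two parts back out of the tuple.
{HllsType, HllsName} = HllsBucket.
```

The key is supplied separately, as another binary, when the hyperloglog is written or fetched.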

Create a HyperLogLog data type

To create a hyperloglog data structure, you need to specify a bucket/key pair to hold that hyperloglog. Here is the general syntax for doing so:

HLL = riakc_hll:new().

%% Hyperloglogs in the Erlang client are opaque data structures that
%% collect operations as you mutate them. We will associate the data
%% structure with a bucket type, bucket, and key later on.

Upon creation, our hyperloglog data structure is empty:

HLL.

%% which will return:
%% {hll,0,[]}

Add elements to a HyperLogLog data type

HLL1 = riakc_hll:add_element(<<"Jokes">>, HLL),
RepeatHLL1 = riakc_hll:add_element(<<"Jokes">>, HLL),
HLL2 = riakc_hll:add_elements([<<"Are">>, <<"Better">>, <<"Explained">>], HLL1),

HLL2.

%% which will return:
%% {hll,0,[<<"Are">>,<<"Better">>,<<"Explained">>, <<"Jokes">>]}

However, when using a non-HTTP client, the locally reported approximate cardinality (the value) of our data structure remains 0 until it's pushed to the server and then fetched back from the server.

riakc_hll:value(HLL2) == 0.

%% which will return:
%% true

Port = 8087,
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", Port),
Key = <<"Holy Diver">>,
BucketType = <<"hlls">>,
Bucket = {BucketType, <<"rainbow in the dark">>},

ok = riakc_pb_socket:update_type(Pid, Bucket, Key, riakc_hll:to_op(HLL2)).
ok = riakc_pb_socket:update_type(Pid, Bucket, Key, riakc_hll:to_op(RepeatHLL1)).

Retrieve a HyperLogLog data type

Now we can check the approximate count (that is, the cardinality of the elements added to our hyperloglog data structure):

{ok, HLL3} = riakc_pb_socket:fetch_type(Pid, Bucket, Key),
riakc_hll:value(HLL3) == 4.

%% which would return:
%% true

%% We added <<"Jokes">> twice, but, remember, the algorithm only counts the
%% unique elements we've added to the data structure.
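
To make the point about unique elements concrete, here is a small sketch that computes the exact distinct count of the same elements in plain Erlang, without Riak. It is only a comparison: a hyperloglog estimates this number without keeping the elements themselves. The variable names Added and Distinct are illustrative.

```erlang
%% Every element sent to the server above, including the repeated <<"Jokes">>.
Added = [<<"Jokes">>, <<"Jokes">>, <<"Are">>, <<"Better">>, <<"Explained">>].

%% lists:usort/1 sorts the list and drops duplicates, leaving the distinct elements.
Distinct = lists:usort(Added).

length(Added).     %% 5 elements were added in total
length(Distinct).  %% 4 of them are unique
```

The exact distinct count of 4 agrees with the value fetched from the server in the example above.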