Step 1. Add the JitPack repository to your build file
Add it in your root settings.gradle at the end of repositories:
```groovy
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url 'https://jitpack.io' }
    }
}
```
Or, if you use the Kotlin DSL, add it in your settings.gradle.kts at the end of repositories:

```kotlin
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}
```
Add it to your pom.xml:

```xml
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
```
Add it in your build.sbt at the end of resolvers:

```scala
resolvers += "jitpack" at "https://jitpack.io"
```
Add it in your project.clj at the end of repositories:

```clojure
:repositories [["jitpack" "https://jitpack.io"]]
```
Step 2. Add the dependency
Gradle (Groovy DSL):

```groovy
dependencies {
    implementation 'com.github.mfaulk:hbase-object-mapper:v1.2-mf-0.1'
}
```

Gradle (Kotlin DSL):

```kotlin
dependencies {
    implementation("com.github.mfaulk:hbase-object-mapper:v1.2-mf-0.1")
}
```

Maven:

```xml
<dependency>
    <groupId>com.github.mfaulk</groupId>
    <artifactId>hbase-object-mapper</artifactId>
    <version>v1.2-mf-0.1</version>
</dependency>
```

sbt:

```scala
libraryDependencies += "com.github.mfaulk" % "hbase-object-mapper" % "v1.2-mf-0.1"
```

Leiningen:

```clojure
:dependencies [[com.github.mfaulk/hbase-object-mapper "v1.2-mf-0.1"]]
```
This compact utility library is an annotation-based object mapper for HBase (written in Java) that helps you map bean-like classes to HBase rows and vice-versa.
Let's say you have an HBase table `citizens` with row-key format `country_code#UID`, created with two column families, `main` and `optional`, which may have columns like `uid`, `name`, `salary` etc.
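To make that layout concrete, here is a hypothetical, HBase-free sketch (class name and values are ours) that pictures one such row as nested maps, family → column → value:

```java
import java.util.Map;
import java.util.TreeMap;

// HBase-free sketch: one "citizens" row pictured as nested maps
// (column family -> column -> value), with made-up values.
public class CitizensRowSketch {

    // Build a sample row; in HBase the row itself is keyed like country_code#UID
    static Map<String, Map<String, Object>> sampleRow() {
        Map<String, Object> main = new TreeMap<>();
        main.put("name", "John Doe");
        Map<String, Object> optional = new TreeMap<>();
        optional.put("age", (short) 30);
        optional.put("salary", 50000);
        Map<String, Map<String, Object>> row = new TreeMap<>();
        row.put("main", main);
        row.put("optional", optional);
        return row;
    }

    public static void main(String[] args) {
        String rowKey = "IND#1"; // row-key format: country_code#UID
        System.out.println(rowKey + " -> " + sampleRow());
    }
}
```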
This library enables you to represent your HBase row as a class like below:
```java
@HBTable("citizens")
public class Citizen implements HBRecord {

    @HBRowKey
    private String countryCode;

    @HBRowKey
    private Integer uid;

    @HBColumn(family = "main", column = "name")
    private String name;

    @HBColumn(family = "optional", column = "age")
    private Short age;

    @HBColumn(family = "optional", column = "salary")
    private Integer sal;

    @HBColumn(family = "optional", column = "flags")
    private Map<String, Integer> extraFlags;

    @HBColumn(family = "optional", column = "dependents")
    private Dependents dependents;

    // Multi-versioned column. This annotation enables you to fetch multiple versions of column values
    @HBColumnMultiVersion(family = "optional", column = "phone_number")
    private NavigableMap<Long, Integer> phoneNumber;

    public String composeRowKey() {
        return String.format("%s#%d", countryCode, uid);
    }

    public void parseRowKey(String rowKey) {
        String[] pieces = rowKey.split("#");
        this.countryCode = pieces[0];
        this.uid = Integer.parseInt(pieces[1]);
    }
}
```
(see Citizen.java for a detailed example with more data types)
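The composite row-key logic above can be exercised on its own. Below is a minimal, dependency-free sketch (the class name is ours) that mirrors `composeRowKey()` and `parseRowKey()` and shows they round-trip:

```java
// Dependency-free sketch of Citizen's composite row-key logic
// (mirrors composeRowKey()/parseRowKey() above; class name is hypothetical).
public class RowKeyRoundTrip {

    // Compose country_code#UID, as Citizen.composeRowKey() does
    static String composeRowKey(String countryCode, Integer uid) {
        return String.format("%s#%d", countryCode, uid);
    }

    // Split back into pieces; Citizen.parseRowKey() then applies Integer.parseInt
    static String[] parseRowKey(String rowKey) {
        return rowKey.split("#");
    }

    public static void main(String[] args) {
        String rowKey = composeRowKey("IND", 1);
        System.out.println(rowKey); // IND#1
        String[] pieces = parseRowKey(rowKey);
        System.out.println(pieces[0] + " / " + Integer.parseInt(pieces[1])); // IND / 1
    }
}
```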
Now, for the above definition of your `Citizen` class, this library provides:

- an `HBObjectMapper` class to convert `Citizen` objects to HBase's `Put` and `Result` objects and vice-versa
- an `AbstractHBDAO` abstract class that contains methods like `get` (for random single/bulk/range access of rows), `persist` (for writing rows) and `delete` (for deleting rows)

In your map()
HBase's `Result` object can be converted to your bean-like object using the below method:

```java
<T extends HBRecord> T readValue(ImmutableBytesWritable rowKey, Result result, Class<T> clazz)
```

For example:

```java
Citizen e = hbObjectMapper.readValue(key, value, Citizen.class);
```
See file CitizenMapper.java for full sample code.
In your reduce()
Your bean-like object can be converted to HBase's `Put` (for row contents) and `ImmutableBytesWritable` (for row key) using the below methods:

```java
ImmutableBytesWritable getRowKey(HBRecord obj)
Put writeValueAsPut(HBRecord obj)
```

For example, the below code in a reducer writes your object as one HBase row with appropriate column families and columns:

```java
Citizen citizen = new Citizen(/*details*/);
context.write(hbObjectMapper.getRowKey(citizen), hbObjectMapper.writeValueAsPut(citizen));
```
See file CitizenReducer.java for full sample code.
Unit-testing your map()

Your bean-like object can be converted to HBase's `Result` (for row contents) and `ImmutableBytesWritable` (for row key) using the below methods:

```java
ImmutableBytesWritable getRowKey(HBRecord obj)
Result writeValueAsResult(HBRecord obj)
```

Below is an example of a unit test of a mapper using MRUnit:

```java
Citizen citizen = new Citizen(/*params*/);
mapDriver
    .withInput(
        hbObjectMapper.getRowKey(citizen),
        hbObjectMapper.writeValueAsResult(citizen)
    )
    .withOutput(Util.strToIbw("key"), new IntWritable(citizen.getAge()))
    .runTest();
```
See file TestCitizenMapper.java for full sample code.
Unit-testing your reduce()

HBase's `Put` object can be converted to your bean-like object using the below method:

```java
<T extends HBRecord> T readValue(ImmutableBytesWritable rowKeyBytes, Put put, Class<T> clazz)
```

Below is an example of a unit test of a reducer using MRUnit:

```java
Pair<ImmutableBytesWritable, Writable> reducerResult = reducerDriver
        .withInput(Util.strToIbw("key"), Arrays.asList(new IntWritable(1), new IntWritable(5)))
        .run()
        .get(0);
Citizen citizen = hbObjectMapper.readValue(reducerResult.getFirst(), (Put) reducerResult.getSecond(), Citizen.class);
```

See file TestCitizenReducer.java for full sample code that unit-tests a reducer using MRUnit.
Since we're dealing with HBase (and not an OLTP system), fitting an ORM paradigm may not make sense. Nevertheless, you can use this library as an HBase ORM too!

This library provides an abstract class to define your own data access object. For example, you can create a data access object for the `Citizen` class in the above example as follows:
```java
import org.apache.hadoop.conf.Configuration;

public class CitizenDAO extends AbstractHBDAO<Citizen> {
    public CitizenDAO(Configuration conf) throws IOException {
        super(conf);
    }
}
```
(see CitizenDAO.java)
Once defined, you can access, manipulate and persist rows of the `citizens` HBase table as below:

```java
Configuration configuration = getConf(); // this is org.apache.hadoop.conf.Configuration

// Create a data access object:
CitizenDAO citizenDao = new CitizenDAO(configuration);

// Fetch a row from the "citizens" HBase table with row key "IND#1":
Citizen pe = citizenDao.get("IND#1");

List<Citizen> lpe = citizenDao.get("IND#1", "IND#5"); // range get
Citizen[] ape = citizenDao.get(new String[] {"IND#1", "IND#2"}); // bulk get

pe.setPincode(560034); // change a field
citizenDao.persist(pe); // save it back to HBase

citizenDao.delete(pe); // delete a row by its object reference
citizenDao.delete("IND#2"); // delete a row by its row key
```
(see TestsAbstractHBDAO.java for a more detailed example)
Add the below entry within the `dependencies` section of your `pom.xml`:

```xml
<dependency>
    <groupId>com.flipkart</groupId>
    <artifactId>hbase-object-mapper</artifactId>
    <version>1.2</version>
</dependency>
```
(See artifact details for com.flipkart:hbase-object-mapper:1.2 on Maven Central)
To build this project, follow the below steps:

1. `git clone` this repository
2. `git checkout v1.2`
3. Run `mvn clean install` from the shell

Currently, this library depends on Hadoop and HBase from Cloudera version 4. If you're using a different version (or even a different distribution, like Hortonworks), change the versions in `pom.xml` to the desired ones and do a `mvn clean install`.
Please note: the test cases are very comprehensive; they even spin up an in-memory HBase test cluster to run data-access-related test cases (a near-real-world scenario). So, build times can sometimes be long.
The change log can be found in the releases section.
If you intend to request a feature or report a bug, you may use Github Issues for hbase-object-mapper.
Copyright 2016 Flipkart Internet Pvt Ltd.
Licensed under the Apache License, version 2.0 (the "License"). You may not use this product or its source code except in compliance with the License.