Monday, 11 July 2011

Thrift as an alternative to Web Services

Thrift is a software library for exchanging data. It is used to build RPC clients and servers that communicate seamlessly across programming languages. All of this is achieved through code generated by the Thrift compiler.

History:
It was created by a team at Facebook to exchange data between modules written in different languages (C++, Java, Python, PHP, ...).

They considered several existing alternatives: SOAP (excessive XML parsing), CORBA (heavyweight, over-designed), COM by Microsoft (not open source), Pillar (missing versioning and abstraction), and Protocol Buffers (closed source at the time, owned by Google).

Because of these drawbacks they started the Thrift project.
In April 2007 they released it as open source, and in May 2008 it moved to the Apache Incubator.
It is now used by Facebook, Twitter, Amazon and reCAPTCHA.

Thrift is written in C++. The team did not find a suitable C++ library for thread management, so they built that part into the library as well, using the concepts of the Java thread library as a template.

Protocols:
Thrift supports the following protocols: binary, dense encoding, and JSON, which is useful for debugging. The protocol sits behind an abstraction, so it can be swapped transparently without changing other parts of the system.
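
To illustrate what swapping the protocol transparently means, here is a minimal Python sketch. It does not use the Thrift library: the classes `BinaryProtocol` and `JsonProtocol` and the function `write_book` are hypothetical names made up for this example, and the byte layouts are simplified. The point is only that the application code talks to one interface and does not care which encoding sits behind it.

```python
import json
import struct


class BinaryProtocol:
    """Hypothetical binary encoder: fixed-width big-endian values."""

    def __init__(self):
        self.buf = bytearray()

    def write_i32(self, value):
        self.buf += struct.pack(">i", value)

    def write_string(self, value):
        data = value.encode("utf-8")
        # Length prefix first, then the raw UTF-8 bytes.
        self.buf += struct.pack(">i", len(data)) + data

    def getvalue(self):
        return bytes(self.buf)


class JsonProtocol:
    """Hypothetical text encoder: human-readable, handy for debugging."""

    def __init__(self):
        self.items = []

    def write_i32(self, value):
        self.items.append(value)

    def write_string(self, value):
        self.items.append(value)

    def getvalue(self):
        return json.dumps(self.items).encode("utf-8")


def write_book(protocol, name, pages):
    # Application code only calls the protocol interface,
    # so the wire format can be replaced without touching it.
    protocol.write_string(name)
    protocol.write_i32(pages)
    return protocol.getvalue()


binary_bytes = write_book(BinaryProtocol(), "Dune", 412)
json_bytes = write_book(JsonProtocol(), "Dune", 412)
```

In real Thrift code the same idea shows up as choosing a different protocol object when the client or server is constructed.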

Transport:
At the transport level we can use blocking or non-blocking sockets, files, or shared memory.

Servers:
We have blocking, non-blocking, single-threaded and multi-threaded servers.

Supported languages:
C++, C#, Erlang, Haskell, Java, Objective C/Cocoa, OCaml, Perl, PHP, Python, Ruby, Squeak.

Types:
Thrift does not introduce its own types. It uses the key types found in most languages and maps them to the corresponding types of the target language.

The base types are: bool, byte, i16 (signed 16-bit integer), i32, i64, double (64-bit floating point value), string (UTF-8 encoded), and binary (a sequence of unencoded bytes).

There are also container types: list, an ordered list that allows duplicates (ArrayList in Java); set, an unordered set of unique elements (HashSet in Java); and map, a mapping of unique keys to values. Compiler directives let us change the container types used in the target languages.

We can declare structures, which are mapped to classes in the target languages. Every structure has a list of fields. Every field has a unique name and a numeric identifier, and it can also have a default value. If we do not specify an identifier, Thrift assigns one. Identifiers are used for versioning. The generated structure contains flags that tell whether a value was set by the other side of the communication or not. The developer can decide whether to treat a missing value as an error or to resolve the inconsistency.

For example:
struct Book
{
1: string name,
2: i32 pages = 100,
3: Author op,
4: optional string comment,
}
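
To show why numeric identifiers help with versioning, here is a small Python sketch. It is not the Thrift library: `encode_fields` and `decode_book` are hypothetical helpers, and the layout (one type byte, a two-byte field id, then the value, with a stop byte at the end) is a simplified version modelled on Thrift's binary protocol. The reader knows only fields 1 and 2 of Book, skips any identifier it does not recognise, and keeps track of which fields were actually set.

```python
import struct

# Type codes modelled on Thrift's binary protocol.
T_STOP, T_I32, T_STRING = 0, 8, 11


def encode_fields(fields):
    """Encode (field_id, type_code, value) triples, ending with a stop byte."""
    out = bytearray()
    for field_id, type_code, value in fields:
        out += struct.pack(">bh", type_code, field_id)  # 1-byte type, 2-byte id
        if type_code == T_I32:
            out += struct.pack(">i", value)
        elif type_code == T_STRING:
            data = value.encode("utf-8")
            out += struct.pack(">i", len(data)) + data
        else:
            raise ValueError("unsupported type in this sketch")
    out += struct.pack(">b", T_STOP)
    return bytes(out)


def decode_book(data):
    """Decode a Book as seen by a reader that knows only fields 1 and 2."""
    book = {"name": None, "pages": 100}  # 100 is the default from the struct
    isset = {"name": False, "pages": False}
    known = {1: "name", 2: "pages"}
    pos = 0
    while True:
        type_code = data[pos]
        pos += 1
        if type_code == T_STOP:
            break
        (field_id,) = struct.unpack_from(">h", data, pos)
        pos += 2
        if type_code == T_I32:
            (value,) = struct.unpack_from(">i", data, pos)
            pos += 4
        elif type_code == T_STRING:
            (length,) = struct.unpack_from(">i", data, pos)
            pos += 4
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unsupported type in this sketch")
        name = known.get(field_id)
        if name is not None:  # unknown ids are read and then ignored
            book[name] = value
            isset[name] = True
    return book, isset


# A newer writer sends field 4 (comment) and omits field 2 (pages).
wire = encode_fields([(1, T_STRING, "Dune"), (4, T_STRING, "classic")])
book, isset = decode_book(wire)
```

Here the old reader still gets the name, falls back to the default for pages, and can see from `isset` that pages was never sent.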

We can also define enums and exceptions. Exceptions are compiled to classes that extend the exception base class of the target language.

We declare services, which is equivalent to defining interfaces. The compiler generates stubs that implement the interface. In the servers and clients we then implement the communication.

We can declare methods as void or async void (the async keyword is spelled oneway in current Thrift releases). The first guarantees that the method has finished executing on the server; the second guarantees only that the request to execute it was sent.
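
The difference between the two kinds of call can be sketched in plain Python. This is only a simulation with a queue and a thread, not Thrift code: `server_loop`, `call_void` and `call_oneway` are hypothetical names, and the queue stands in for the transport.

```python
import queue
import threading

log = []                      # what the server has executed so far
requests = queue.Queue()      # stands in for the transport


def server_loop():
    while True:
        message, reply = requests.get()
        if message is None:   # shutdown signal
            break
        log.append(message)   # "execute" the method
        if reply is not None:
            reply.set()       # send the response for a regular void call


def call_void(message):
    """Regular void call: block until the server reports completion."""
    done = threading.Event()
    requests.put((message, done))
    done.wait()


def call_oneway(message):
    """One-way call: return as soon as the request is handed to the transport."""
    requests.put((message, None))


server = threading.Thread(target=server_loop)
server.start()

call_oneway("log_event")      # may or may not have run when this returns
call_void("save_book")        # has definitely run when this returns
executed_after_void = list(log)

requests.put((None, None))    # ask the server thread to stop
server.join()
```

After `call_void` returns, the caller knows the method ran. After `call_oneway` returns, it knows only that the request was queued, which is why one-way methods suit fire-and-forget work such as logging.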

The workflow for using Thrift is as follows:
write a Thrift definition file, generate code with the compiler, then write servers and clients using the library objects.

Thrift does not support: cyclic structs, struct inheritance, overloading, heterogeneous containers, null return values.

It offers: lower overhead thanks to the binary format, simpler usage (no framework to code to, no XML), a clean separation between the application-level wire format and the serialization-level wire format, soft versioning of the protocol, and no build dependencies.

It can be recommended for building high-performance services that are called from multiple languages, where speed is a concern and clients and servers are co-located.

http://thrift.apache.org/
