Posts Tagged ‘linkedin’


The paper can be found here: http://research.microsoft.com/apps/pubs/default.aspx?id=64551

I’ve read a number of Jim Gray’s papers before & always found them to be incredibly insightful & useful. Typically I’ve also been almost shocked that they could have been written so long ago, and yet be so relevant now, for example ‘The Transaction Concept: Virtues and Limitations‘ published back in 1981 and which next time I re-read it I’ll try & get something up here about. So coming to this paper, published just recently in 2004, I was very much looking forward to seeing what I might get from it.

The abstract sets out the scene as Jim Gray saw it in 2004, some of which remains very reasonable as a statement of current affairs, for example, the movement of intelligence to the periphery of the network, the integration of queues in to relational database platforms, the expectations we have that RDMSs are highly available and low cost to maintain, the rise of column oriented storage (vNext of SQL Server for example has an engine which provides a column oriented store, not too mention the various BigTable like stores out there now). In fact the only prediction/statement of his which I can draw serious contention with, having the obvious advantage of hindsight, is that ‘XML and xQuery will be the main data structure and access pattern’. I can’t express just how glad I am that this isn’t the case, but that’s for another post (or more typically a boozy twrant).

For me though, the abstract is where Gray stops being right. The revolution which Gray describes is I think perhaps only a revolution for the manufacturer of the RDBMS, but is very little one for its user, and is not perhaps a revolution caused by being in touch with their market, but from losing contact with it. And yet taken a little differently, his analysis seems, perhaps unsurprisingly, almost spot-on.

These are not the revolutions you’re looking for

Firstly let me explain why I feel that Grays revolution is not in fact the revolution at all. Gray’s revolution is:

  1. the ability to execute ‘OO’ code in the RDBMS (so Java in Oracle, & C# in SQL Server);
  2. relational databases presenting services that are accessible from the web;
  3. the inclusion of queuing systems inside the RDBMS platform;
  4. the arrival of cubes as a way of managing & modeling aggregations;
  5. the arrival of data mining;
  6. the ‘rebirth’ of column stores;
  7. that RDBMs now deal with ‘messy’ data better, such as text, temporal, and spatial data;
  8. RDBMs working with semi-structured data, in particular Gray points to the integration of the RDBMS and the filesystem (so, I imagine that he’s thinking here of something like SQL Server’s Filestream attribute & the capability that this enables). He also mentions XML.
  9. The requirement for Stream Processing – as he puts it in the abstact, that now the data finds millions of queries, rather than the queries acting over millions of rows of data.
  10. A movement towards pub-sub styles of replication;
  11. A need for query plans which take into account changing load on the system, skews in the distribution of the data, and changing statistics about the data;
  12. The substantial changes in terms of the size of available storage, both on disk & in memory, and the consequential shift in the latency of reads from each of these. The relative increase in the cost of random access reads over sequential reads.
  13. The possibility to move the RDBMS platform down to the disk, so that instead of a disk being organised around files, it becomes organised as a relational database.
  14. ‘Self-managing and always up’ is how he puts it, and we are all familiar with that idea now surely.

In all of these areas I think, in terms of the change in capabilities, Gray is clearly correct. However if I look at that list I think I could split it into those areas where, although capability has been added the existence of the capability has had little relative uptake or impact; those areas where other technologies than the relational database are dominating, or look set to; and, those areas that whilst the capability has been added, and has had uptake, it has not had a revolutionary impact.

The ability to execute imperative (or OO if you like) code inside of the database has had little uptake, and in general has been rejected by both the DBA community and the programmer communities. Similarly, the inclusion of messsage queues inside the RDBMS has had little uptake or impact, with most organisations which choose to adopt message queuing approaches preferring technologies which exist outside of their RDBMS (such as WebSphereMQ, Tibco, ActiveMQ, RabbitMQ, etc…) I’m not going to go into why I think this has happened here, and I don’t have any figures to back this up – but I’ve certainly not witnessed any uptake of these things, nor widespread discussion of & interest in them. I’m not suggesting that these capabilities are never used, not that they are never appropriate nor useful, just that they haven’t had anything like a revolutionary impact – unless you’re a company/engineer concerned with making RDBMSs. This said, I’ve certainly heard of platforms like Redis being used as high performance queues, and I’ve used CouchDB myself as a distributed, durable queue. So I think Jim Gray is correct that there is a union between database platforms and queuing technologies that is happening to an extent (though I’m not sure it constitutes a revolution), but despite these capabilities being bolted onto RDBMSs, they’re not where its happening.

Similarly, though relational database platforms have now enabled the exposing of data as ‘services’ through the web for a number of years, I have not seen this become a popular thing. Risk averse organisations are typically unwilling to risk exposing there data platforms in these ways (perhaps because they have been trained to see the DB as the king in a chess game of security), and the capabilities which these platforms expose (& I’m thinking specifically of SQL Server Astoria here). I could suggest that a lot of webservices which I’ve seen do very little except to directly expose CRUD like operations onto databases, but these remain an external wrapper around the database. If the relational database platform has been unsuccessful in promoting this approach then perhaps the NoSQL community has been more successful. Products like CouchDB take exposing the database as a web service to their core, and other products such as Neo4J, Riak, and HBase have followed in providing various interpretations of RESTful webservices. Perhaps this could be because the kind of organisation which might adopt NoSql stores assesses & manages risks in different ways to many which will not adopt these. So in one sense here Jim Gray is absolutely correct, it’s just that this hasn’t really happened for the RDBMS.

Perhaps paralleling this, Complex Event Processing definitely seems to have gained momentum over the last few years with platforms & products growing in capability & number which enable the concurrent, near-Real-Time processing of vast streams of events. Again however the RDBMS platform has failed to achieve much traction in this space, regardless of the capabilities which have been shoe-horned on to these products. Even the languages being used in many of the CEP products are clearly inspired by SQL, but this is where the closeness between the platforms perhaps end. Jim Gray is clearly correct about there being a revolution, perhaps still on the way, which will place stream processing far more centrally in how IT can enable, but it has failed, at least to date, to be reflected in the usage, or demands, of the RDBMS.

Cubes and Data mining are an area where clearly the RDBMS market has done very well, and can be viewed as an area where, in terms of usage & demand a revolution might be seen to have occurred in the market, rather than purely inside the vendors. It is also an area where the RDBMS vendors face a lot of competition as a plethora of other approaches and platforms have exploded on to the market competing in this space, such as the Map-Reduce implementations found in products like Hadoop and Greenplum.

When it comes to the changes in the hardware-scape onto which RDBMS vendors must prepare their products for deployment the revolution that Gray describes has most definitely occurred & is still occurring in every sense, and with the growth also of flash memory & SSDs perhaps more so than he anticipated. The challenges which he describes the engineers of RDBMS platforms as facing seem as relevant as when he wrote this piece, if not more so.

In conclusion then, I think that if the revolution is viewed as a revolution in the skills & projects of the teams working on Oracle or SQL Server then maybe Jim Gray is right in practically every aspect. If it is viewed as a list of capabilities that various different DBMSs may become popular for providing, then again, he is broadly correct. If though it is to be viewed as a revolution in what the market will demand and use in an all singing, all dancing, RDBMS then I think not only is Jim Gray wrong, but that this thinking which has clearly imbued places like Oracle and SQL Server over the last few years (decades?) is perhaps why these RDBMSs have become the hulking great behemoths that they now are, so overladen with features and capabilities, that both they & the organisations that place them at their core may find it difficult to maintain a level of organisational agility which, at least through technology, might allow them to achieve a competitive advantage. In particular the paper sets me thinking (& perhaps I’m echoing them already) about the papers by Michael Stonebraker on ‘The End of an Architectural Era’.

Read Full Post »


I’ve spent a lot of the last month or so learning Erlang in my free time. It’s been a very rewarding experience.

I wanted to look at Erlang because for a number of years now I’ve been coming across very impressive systems (amongst others RabbitMQ, CouchDB & Riak) written using this language and its OTP framework. A number of the core ideas in Erlang also aligned very well with a number of my recent interests in IT, including distributed systems design, building reliable systems, functional approaches to programming, messaging, and building systems that can support working with ‘big data’. Additionally I’ve noticed over the past few years the growing influence of the ideas that Erlang has at its core, in particular the Actor approach to concurrency (which now has a firm place in the Scala/Java world thanks to the Akka framework).

So enough background to why I have found myself ever more enamoured with Erlang, what have I learned which may help you to get coding in Erlang?

Well, for starters code is organised into modules, the modules have a name (which you give them), and the file which they’re saved in will share this name, but have the extension ‘.erl’.

So your first move might be to create a file ‘hello_erlang.erl’. In that add the code:

-module(hello_erlang).

Note two things about this code straight away:

  1. The text ‘hello_erlang’ is not in quotes, this is because it’s not a string. Erlang has the concept of an atom. More on these later, but be aware that this is an atom;
  2. The full stop at the end of the line. Erlang writes a lot like English does, so as you finish a sentence with a full-stop in English, so you finish a statement in Erlang with a full-stop.

Straight away you have something which, once saved, can be compiled. So let’s do that. Save it, and then start the erlang console.

Couple of things at this point.

  1. I’m going to assume that you have erlang installed. If you don’t then I recommend the instructions I found on the Basho site, they’ve got me going on both Ubuntu & MacOS very nicely. You can find them here: http://wiki.basho.com/Installing-Erlang.html (brew FTW on the MacOS). If using Windows I’d recommend running up a VM to use and installing something like Ubuntu or CentOS on it, but JIC you don’t want to do that you can find it here http://www.erlang.org/download.html.
  2. I’m assuming that, like me, you’re doing your editing in emacs & happy at a command prompt. If neither of these things are true then I’d strongly recommend that you make them true. Getting the basics of emacs only takes about an hour, but Erlang has some integration with emacs out of the box, has other projects which offer further integration, and it’ll be worth the initial WTFness of it I think.

Ok, so you’ve got erlang installed now (and hopefully emacs too), now navigate (in the command prompt/bash) to the folder where you saved hello_erlang.erl and start erlang (type erl & hit enter). What this will give you is the erlang repl console (Read Evaluate Print Loop), in many ways like the ones you may have come across if using like Ruby or F#. From here you can straight away try some things & see what happens, like:

  1. Fred = “Fred”.
    • note the full stop to complete the statement
  2. Fred == “Fred”.
    • should evaluate to true
  3. Fred = “Bert”.
    1. should complain that ‘no match of right hand side value “Bert”‘. This is because once a variable has been assigned a value, it cannot be assigned another value. Like you’ll find in F#, variables are immutable.
  4. fred == Fred.
    • will give false, you just compared an atom to a variable that holds the value of a string
  5. Bob = bob.
    • assigns the atom bob to the variable Bob
  6. Bob == bob.
    • will evaluate to true as the variable Bob holds the atom bob, which is clearly the same as the atom bob
  7. true == false.
    • will evaluate to false because the atom true is not equal to the atom false.

More of the basic types in Erlang in a future post, but worth noting here that:

  • Variables always start with a capital letter;
  • An atom always starts with a lower case letter;
  • Variables can point to both atoms & strings (& other things, but we haven’t seen them yet).

I tend to think of atoms as kind of like powerful enums, but this comparison isn’t really fair. They’re so much more than that. Every function name, module name, true, false, etc… they’re all atoms. You don’t have to declare them up front, at the point where they are used, they exist, and they continue to exist from then forwards. (there are some gotchas around this, it’s not quite that simple, but we’ll worry about that in the future)

Anyhow, back to that module. Still in the erlang console write

c(hello_erlang).

This will compile your module and you should get this response:

{ok,hello_erlang}

Which gives us a good moment to introduce another important type in Erlang, the tuple. .NET had a tuple type introduced (IIRC) in version 4 of the framework which allows you to write things like

var myTuple = new Tuple<string,int>();

or even,

var myTuple = Tuple.New("Fred",42).

Well, erlang lets you simply write:

{"Fred",42}

for exactly the same effect. Tuples are an incredibly important thing in erlang & we’ll see a lot more of them, but for now I think it’s enough just to be aware that the response you just got by compiling your module was a tuple consisting of two atoms, and it tells you that the module was compiled ‘ok’. Which is nice.

So anyway, lets add some code to that module, and like all good devs (cough, cough) let’s start with a test.

Erlang ships with a JUnit like (so, NUnit like for those of us of a .NET persuasion) testing framework called EUnit (strange that isn’t it). To use EUnit we’re going to add some code that’s the equivalent of a using/import statement and then we’ll be able to write our test.

With the hello_erlang.erl file back open again in the editor of your choice (which I trust is emacs – though did I mention that there is Eclipse integration available for Erlang too? No? Well, move along, nothing to see here,…) add this line of code below the -module(hello_erlang). line.

-include_lib("eunit/include/eunit.hrl").

With that in place we can now write our first erlang function, here it is:

greeting_should_be_hello_erlang_test() ->
?assertEqual("Hello Erlang!", greeting()).

Few things to note here:

  • That the function name is an atom, and that the empty brackets signify that it expects no arguments to be supplied when it’s called – basically it looks a lot like a C# or Java method signature, but with out the superflous crap like access modifiers, or having to declare a return type.
  • That appending _test to the test name is mandatory. It’s how the test library will be able to recognise this function as one to execute as a test. There are no attributes/annotations available in Erlang, so functions can’t have metadata attached to them, this means that conventions such as this are necessary. Java people, you may remember this sort of thing from working with JUnit in pre 1.5 versions of Java. If I’m honest this is a bit disappointing to me, but really and truly I don’t thing it matters much.
  • The -> bit is basically identical to => in C#. It denotes that a function definition is about to be provided. Java people, apologies, but if you’re going to use a dead language then this may look unfamiliar, get on the scala train & then come back :p
  • the ? tells us that assertEqual is a macro. More on them later, a lot later.
  • We expect that the function greeting() will be equal the string “Hello Erlang”. Note, I didn’t say that the result of the function should equal the string “Hello Erlang”, especially as the greeting function takes no arguments, we should expect referential transparency!
  • No curly brace crap, but meaningful indentation.
  • We’ve embedded the test in the same file as the code we’re writing. We don’t have to do this, but I find it works quite well when doing TDD. Compiler directives can be used to ensure that the tests don’t make their way into the released versions of the compiled code.
So, now we need to create our greeting function (yes I know I’m skipping the red in red, green, refactor). This is really trivial:
greeting() ->
"Hello Erlang!".
With that added to your file (personally I’d put it between the module declaration & the include_lib statement), go back to the erlang console, recompile your code and the run your tests by writing:
hello_erlang:test().
This should result in a message that tell’s you your tests passed (all one of them). That colon is very similar to how we use a full stop to separate the name of a class from the name of one of its methods when coding in C# or Java. So where we would have written hello_erlang.test in one of those languages, here we use a colon. Simple.
Try this though to execute the greeting function:
hello_erlang:greeting().
You should get this message:
** exception error: undefined function hello_erlang:greeting/0
It fails because we haven’t exported the function, or in terms of Java/C# made it public. By default all functions are not exported (available outside the module that they’re declared in). The test method becomes available because of our use of the ?assertEqual macro, and so we are not required to export that.
To export the function add this line of code after the module declaration statement:
-export([greeting/0]).
Couple of things to note here:
  • The square brackets signify a list [] (definitely more to be said about lists, but not in this post).
  • The /0 following on from greeting signify that this is a function with 0 arguments expected. You saw it earlier in the error message that erlang gave when we tried to execute the function before we had exported it.
And for now, that’s it. I hope from this, if you’ve never seen erlang before you’ve been able to get going. But you haven’t really seen anything of either the beauty & power of the language, the OTP framework, or the other libraries & tools available for it (such as the incomparable WebMachine). In future posts I’ll try & cover these too.

Read Full Post »