Thursday, January 19, 2012

Windows Azure and Cloud Computing Posts for 1/19/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.


Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue and Hadoop Services

Mariano Vazquez described Running MongoDB on Azure and connect from a node web app in a 1/19/2012 post to the Node On Azure blog:

This post explains how to use MongoDB Replica Sets from a node.js app, all hosted on Windows Azure. For this, we'll use the new Windows Azure tools for MongoDB and Node.js, which contain some useful PowerShell CmdLets that will save valuable time.

We will also explain how the integration between MongoDB, node and Azure works.

How it works

This is how MongoDB works with Azure and node.js:

  • MongoDB runs its native binaries on a worker role and stores the data in Windows Azure storage using Windows Azure Drive (basically a hard disk mounted on Azure page blobs).
  • The good thing about using Azure Storage is that the data is geo-replicated. It also makes backups easier because of the snapshot feature of blob storage (a snapshot is not a copy but a diff).
  • It will use the local hard disk in the VM (local resources in the Azure jargon) to store the log files and a local cache.
  • You can scale out the MongoDB replica set by increasing the instance count of the MongoDB role.
  • So how does the application that connects to the Mongo replica set know the IP address of each replica set member? The way it works is that a startup task runs a small executable every time a new instance of your application starts. That executable gathers the IP address of each instance running the replica set using RoleEnvironment.Roles["ReplicaSetRole"].Instances[i].InstanceEndpoints.IPEndpoint and writes them to a json file in the root folder of the role. A node.js module then listens for changes to that file and provides a method to obtain the replica set addresses to use with the mongo driver. If the instance count is increased or decreased, the executable handles the RoleEnvironment.Changed event and rewrites the json file with the new info (see the sketch after this list). They had to do all this because it is not yet possible to access the RoleEnvironment API from node.
  • And last but not least, it all works in the emulator
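To make the discovery mechanism above more concrete, here is a minimal sketch of a node module watching an endpoints file written by such a startup task. The file name and its shape are assumptions for illustration only; the azureEndpoints module described below wraps this behavior for you.

var fs = require('fs');
var path = require('path');

// Path and format of the file are assumed for this sketch; the real agent defines its own.
var endpointsFile = path.join(__dirname, 'endpoints.json');
var replicaSetAddresses = [];

function reload() {
  fs.readFile(endpointsFile, 'utf8', function (err, data) {
    if (err) return; // the agent may not have written the file yet
    try {
      replicaSetAddresses = JSON.parse(data); // e.g. [{ host: '10.0.0.1', port: 27017 }, ...]
    } catch (e) { /* ignore partial writes */ }
  });
}

reload();
fs.watchFile(endpointsFile, reload); // re-read whenever the agent rewrites the file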

If you want to read more about this go to Getting Started Guide - Node.js with Storage on MongoDB and the documentation from 10gen.

How to configure Mongo with Windows Azure

The first step is to create the MongoDB role (it is a worker role) that will run the Replica Sets. Open Windows PowerShell for MongoDB Node.js and navigate to the folder where you have your azure-node application. Type the following command:

Add-AzureMongoWorkerRole ReplicaSetRole 3

This will create a worker role named ReplicaSetRole with 3 instances. You can use any number you want, but in production it is recommended to use at least 3 for failover; 1 instance is the equivalent of a stand-alone server.

Next, we will link the node app (in this case sample-web) and the mongo role, using the following command:

Join-AzureNodeRoleToMongoRole sample-web ReplicaSetRole

This is what the CmdLet will do:

  • Add two configuration settings named RoleName & EndpointName.
  • Add a startup task that launches AzureEndpointsAgent.exe, which does all the work we described in the first section.
  • Install the azureEndpoints module (which will read the json file and provide the replica set info).

Now that we have both roles linked, let's add some code to connect to the replica set.

// Create mongodb azure endpoint
// TODO: Replace 'ReplicaSetRole' with your MongoDB role name (ReplicaSetRole is the default)
var mongoEndpoints = new AzureEndpoint('ReplicaSetRole', 'MongodPort');

// Watch the endpoint for topologyChange events
mongoEndpoints.on('topologyChange', function() {
  // Close the current connection (if any) before reconnecting with the new topology
  if (self.db) {
    self.db.close();
    self.db = null;
  }

  var mongoDbServerConfig = mongoEndpoints.getMongoDBServerConfig();
  self.db = new mongoDb('test', mongoDbServerConfig, {native_parser: false});

  self.db.open(function() {});
});

mongoEndpoints.on('error', function(error) {
  throw error;
});

The mongoEndpoints object listens for the running MongoDB Replica Set nodes and is updated automatically if one of the nodes comes on or off line (either because the instance count of the replica set role was increased/decreased or because the VM is being patched).

And that's it! You can publish this app to Windows Azure and wait for the instances to start.

You can download the node example app from here (run npm install after you extract the code to download the necessary modules).

NOTE: The MongoDB nodes take some time to initialize. If you test the application in the local emulator using the -launch option, you will probably get a no primary server found error. If this happens, wait a few seconds and try again.

Note: Matias Woloski (@woloski) described the Azure-hosted Node on Azure blog in a Welcome post on 1/2/2012:

nodeblog.cloudapp.net is a website that features articles related to node.js + Windows + Windows Azure. The blog itself runs on top of wheat on a free (for 3 months) extra small Windows Azure web role. It runs iisnode, an open source project from Microsoft that integrates node with Windows' de facto web server: IIS. This first article talks a bit about our goal for this website and how you can contribute to it...

A bit of history

I started working with node last year together with Angel Lopez at Southworks while helping Microsoft implement the backend of an online HTML5 game, tankster.net. We started reading about node and immediately got hooked; for good or bad, this is a game-changing technology. For that project in particular we used the socket.io library. With just 10 lines of code we were able to relay commands between different browsers with WebSockets or long polling. That's powerful :)

Since then I got interested in node.

Contributing to the community

The node community is vibrant. There are more than 6000 packages in npm, dozens of blogs, questions on Stack Overflow, and more than 7000 repos on GitHub. That much contribution drives more contribution; it's a virtuous loop. And we want to be part of that loop.

The goal of this site

The intent of this website is to share anything related to node in the context of Windows and in the context of the Microsoft cloud platform, Windows Azure. Node on Windows is in its infancy, so it's a good opportunity to share what we are learning throughout the process! This is by no means an official Microsoft blog; it's just a bunch of guys who happen to work together on this subject and decided to share their knowledge.


Scott M. Fulton, III (@SMFulton3, pictured below right) posted MapR CEO: Hadoop Will Be Less About NoSQL, More About Parity to the ReadWriteCloud blog on 1/19/2012:

Last month, veteran IDC analyst Dan Vesset predicted that while Hadoop will become a standard component of the modern data center, by 2015 the market around Hadoop will have matured at such a rate that the major players we recognize today probably would no longer exist. MapR - a commercial Hadoop provider whose name was inspired by the MapReduce programming model for Hadoop - was one of the companies on Vesset's target list for acquisition, and perhaps a ceremonial asterisk for history once Wikipedia emerges from blackout.

So you might expect the predictions of MapR CEO John Schroeder (pictured below) for the year 2012 would not include obscurity for his own company. But Schroeder makes at least an arguable case: The difference, he says, between the database market in 2012 versus the one from 1992 has to do with the customer's preference to refrain from vendor lock-in, and that customer's newfound ability to ensure against it.

The portability play

"Multiple vendors competing in the marketplace brings out the best," Schroeder tells ReadWriteWeb. "If you look at the early '90s, with Oracle, Sybase, and Informix slugging it out for building a world-class relational database engine, it was all based on ANSI-standard SQL. I'd argue that Hadoop interfaces are even more standard and portable than the interfaces were across those relational databases, because those vendors had [their own proprietary] extensions. There's more to the platform than just the programming language of SQL."

By way of a strategic partnership with EMC, MapR has quickly evolved into a first-order player in this new market. This partnership, Schroeder implies, could help serve as MapR's insurance policy against oblivion.

But more importantly, he believes, Hadoop's APIs are strictly standardized, so that more components of the platform are portable than for an RDBMS. "Customers could move between distributions fairly easily with fairly low switching costs," he tells us. And future innovations in the emerging big data market, he believes, can and will only happen so long as the other players in MapR's category - most prominently Cloudera and Hortonworks - work in cooperation with MapR to maintain that platform portability, and ensure their mutual plurality.

"I think having multiple vendors in the space advances the technology," the CEO remarks. This way, if some developers write an application using HBase as the interface, others use Hive, and others use Pig, while still more choose to stick with the basic MapReduce API, the application itself is still portable between the various distributions.

The beta test phase is over

Schroeder perceives Hadoop implementations in enterprises as moving past the experimental, embryonic phase and finally entering the mission-critical stage. But wasn't the fact that mission-critical applications started using data sets too huge for SQL relational engines the trigger that sparked Hadoop in the first place?

"In cases where you've got very large, unstructured data sets that are not feasible for being processed using traditional data warehouses, companies will move forward with these implementations," MapR's Schroeder admits to believing. "They have applications that they wouldn't have been able to implement before, so they could be critical to their business. But the state of the Hadoop distributions a couple of years ago really wasn't a reliable compute and data store. Just eighteen months ago, if you put data in Hadoop, it was subject to data loss; and if you were running production applications, you would encounter cluster crashes. The distributions hadn't matured enough to be reliable compute and data stores. That limited the applications to being more experimental, and less business critical."

That's changing, he continues, as the commercial Hadoop providers implement the same class of features customers expect from their SQL engines, such as business continuity and data protection.

Is SaaS a threat or a blessing?

As cloud service providers find new and more clever ways to provide database services through the cloud (Amazon's Elastic MapReduce and DynamoDB, the latter just announced today, being two examples), some believe that small and medium businesses will sign on to cloud service providers for remote big data storage and management, rather than implement their own deployments on-premise. Could this possibly threaten the status of the new, commercial on-premise brands like MapR?

No, not so long as MapR gets a chance to be the engine inside these brands. One example John Schroeder provided was a defense contractor that resells its own implementation of MapR as a turnkey app for companies doing business with, or at, the Pentagon. Maybe those customers don't recognize Hadoop as the engine, but who cares? Perhaps IDC's Vesset was partly right in that the brands could fade into obscurity, but the companies behind those brands' shared technology at least have one formula for continued survival.

To enhance, not replace

Early on, the future success of the so-called "NoSQL" movement was predicted on the basis of how soon unstructured data models could take over the enterprise. Now MapR CEO John Schroeder believes that success for Hadoop and big data systems depends on how soon software developers like his own take full advantage of the new class of applications beyond the maximum reach of SQL scalability.

"From working in this market for over two-and-a-half years, there isn't much evangelism required. There's a pretty strong market pull right now, and the integrators see that market pull, so they have to integrate that in. That said, I don't see customers initially unplugging their data warehouses and replacing them with Hadoop. They augment."

One example Schroeder provided was a credit card company working to implement fraud detection functionality. A traditional SQL data warehouse is more than likely already in place, and it may work well enough but without enough granularity for an analysis system to accurately capture or isolate the sequence of events that may lead up to a fraud incident. So one smart strategy he suggested was for that same warehouse to begin storing a supplemental stream of raw transactional data, perhaps several years' worth, through Hadoop. That way, when a potential fraud incident is isolated using SQL, rapid analytics over billions of transactions may become available through Hadoop. From those analytics, a model for predicting future fraud events can be constructed that benefits both SQL and Hadoop engines.

"I think [enterprises] are introducing the Hadoop framework as a way to augment their data warehouses; and I think in the future, there'll be much greater growth in the unstructured world than in the structured world. Why would you flatten and summarize data if you could keep the raw, transactional, log data online? You're limiting the types of analytics you can do when you summarize, structure, and flatten the data."


Josh Holmes described Cloud Cover's Using Windows Azure Storage from the Windows Phone presentation in a 1/18/2012 post:

If you haven't found Cloud Cover, it's a great series on Channel 9 that covers a lot of great Azure topics. What I particularly like is that it shows a lot of very practical things and doesn't assume a tremendous amount of knowledge.

I found this one particularly useful with the amount of mobile development that I’ve been doing recently.

Join Wade and Steve each week as they cover the Windows Azure Platform. You can follow and interact with the show at @CloudCoverShow.

In this episode, Wade walks through the NuGet packages for Windows Azure storage and Windows Phone, highlighting how easy it is to interact with blobs, tables, and queues, both directly against storage and securely through proxy services.


<Return to section navigation list>

SQL Azure Database, Federations and Reporting

No significant articles today.


<Return to section navigation list>

MarketPlace DataMarket, Social Analytics and OData

Ahmed Moustafa posted Announcing OData T4 for C#, Preview 1 to the WCF Data Services blog on 1/19/2012:

I'm very excited to announce the release of OData T4 for C# Preview 1, for the October 2011 CTP of the next version of WCF Data Services libraries, with support for code generation of service operations. The goal of this T4 preview and subsequent ones will be to get community feedback on the templates before having "Add Service Reference" natively generate T4 templates out of the box in a future release.

Note that support for service operations was added to the T4 because many customers already use it and it has been a regularly requested feature by the community. Having said that, some support for actions/functions (which are the way forward for custom operations) will be introduced in an upcoming refresh to the template.

We are looking to refresh the template by the first week of February, incorporating feedback from the community as well as any reported bug fixes, and to refresh it once more by the time WCF Data Services V3 ships.

So please give your feedback at our team forum.

How do I Use it?

- Install the NuGet package manager

- Install October 2011 CTP

- Add a Service Reference to an OData service

  - This downloads the metadata

- Right-click References -> "Manage NuGet packages"


- Search for OdataT4 template


- Install OdataT4-CS package

- This will add a Reference.tt file to your project


Known Issues, Limitations, and Workarounds

- I get the following error after adding the NuGet package: "Running transformation: System.IO.DirectoryNotFoundException: Could not find a part of the path '<projectDirectoryRoot>\Service References\ServiceReference1\service.edmx'." (The T4 template has an incorrect edmx file path.)

The current template assumes a path of '$projectRoot\ServiceReference1\service.edmx'. If your edmx file is in a custom location, change the MetadataFilepath property in Reference.tt to point to the correct path.

- There is no T4 for VB

We are looking to release the first preview of the VB template with the next refresh.

- There is no support for spatial

Spatial support will be included in the end-of-January refresh.


Pablo Castro (@pmc) posted Format efficiency take 2: really clean JSON to the OData.org blog on 1/19/2012:

It took me longer than expected to write again about this, but I have another round of measurements and another proposal that goes with it. Since folks want to close on the next version of OData soon, it would be great to iterate quickly on this one so if we all agree we can include it in this version.

Back then we started with some discussion about pros/cons of various options and about what to optimize for (see this thread and this post if you want to see some of the original content). I proposed a JSON-encoded-in-JSON approach that had some fans but also some folks were worried that we might be optimizing for the wrong thing. Based on that I started to look at alternate approaches so I could put more options on the table, and I ended up with something that I think has a lot of potential.

I showed a somewhat half-baked version of this at //BUILD back in September, you can see it by skipping to 00:41:55 in this video.

Trying a different angle

This time I started asking "what if we could serve really clean JSON, just the kind of JSON you'd have in a custom service, but still keep all the richness in semantics of OData?" Control information (particularly URLs) adds lots of bloat to existing JSON format payloads. If you remove it, how do things change? Check out the following chart for a typical OData feed:

So if you remove every bit of control information (e.g. those “__metadata” properties) you end up with a JSON “light” format that’s very close to the “dense” format I was exploring before, but without all the weirdness of a custom encoding. In fact, you get very nice and clean JSON, pretty much as if you built a custom endpoint. Now, there is still a small but non-zero difference between “light” and “dense”…what if we combine this with compression? The difference is even smaller:

Now the question is: is it possible to actually describe a format that’s clean JSON with no extra stuff in it that still fully maintains OData semantics? I think we can get close enough.

Approach

To put this approach in context I need to establish a key assumption I'm making: there are two big buckets of OData clients, those that just don't care about metadata (because they are too simple to, or because they use out-of-band knowledge and are hardcoded to a particular service) and those that use service metadata in order to maintain decoupling or provide richer functionality.

For the first set, the less stuff we put in our JSON payloads the better, and they’ve hardcoded knowledge about everything else anyway, so why include it? They can derive URLs from IDs, know when to expect a list versus a single object, etc. Whether hardwiring these assumptions into your application is a good idea depends on the context, I’m not judging here :)

The second set is the interesting one then. The approach for this set of clients can be summarized as follows:

  1. All OData clients need to know about two content types, OData metadata and OData data [1]
  2. All resources contain a pointer to metadata, so a link to any part of an OData service namespace is fully self-contained and requires no out-of-band knowledge
  3. All control information that’s uniform enough (most of it) is captured as patterns in metadata
  4. Control information that doesn’t follow the pattern can be included in any instance, overriding any metadata-described value

This turns this 2-row response (from: http://services.odata.org/OData/OData.svc/Products?$top=2&$inlinecount=allpages&$format=json):

(screenshot of the verbose JSON response, with "__metadata" and other control information repeated in every object)

Into this (note that we also propose we drop the “d” wrapper):

(screenshot of the proposed "light" JSON response, with the control information and the "d" wrapper removed)

In the best case all control information goes away. In order to be able to reestablish it, we put one URL per response (in "__servicemetadata") that contains a link to where to find instructions if you want to interpret this document as an OData response with full fidelity. A client can follow the metadata link and using patterns described there reconstruct all URLs, ETags, types, etc. If a given object has something different, e.g. a link that doesn’t follow the pattern, or it’s an instance of a subtype, then you just add that piece of data (e.g. “__metadata”: { “type”: “some.subtype” }).
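To make that concrete, a "light" response for the request above might look roughly like the following sketch. The property names (__servicemetadata, __count, results) and the sample values are illustrative assumptions based on the description in this post, not the exact payload.

{
  "__servicemetadata": "http://services.odata.org/OData/OData.svc/$metadata",
  "__count": 9,
  "results": [
    { "ID": 0, "Name": "Bread", "Price": 2.5, "Rating": 4 },
    { "ID": 1, "Name": "Milk", "Price": 3.5, "Rating": 3 }
  ]
}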

Capturing control information as patterns

I mentioned patterns several times already. Let me make this more concrete. As we discussed before in the OData mailing list, we’re adding support for annotations to metadata using vocabularies. In order to support this JSON-based “light” format we introduce a vocabulary that captures how to derive all bits of control information from the regular object data. We’ll have the details of every pattern documented in the official spec, but here are a few to show what they look like.

This one shows the base URL for the service, and is used for all relative URLs in other patterns:

<ValueAnnotation Term="odata.urls.baseurlexpression" Target="ODataDemo.DemoService">
  <String>http://services.odata.org/OData/OData.svc/</String>
</ValueAnnotation>

These two show two URL construction rules, one to obtain the URL of a collection (a set) and one to obtain the URL of an individual element within that collection:

<ValueAnnotation Term="odata.urls.setexpression" Target="ODataDemo.DemoService.Products"
                 String="Products/"/>

<ValueAnnotation Term="odata.urls.keylookupexpression" Target="ODataDemo.DemoService.Products">
  <Apply Function="KeyConcat">
    <String>(</String>
    <Path>ID</Path>
    <String>)/</String>
  </Apply>
</ValueAnnotation>

Finally, here’s one that’s not a URL but a plain value, in this case the ETag for each element (doesn’t apply to the “Product” type, but included here as an example):

<ValueAnnotation Term="odata.json.etagexpression" Target="ODataDemo.DemoService.Products">
  <Apply Function="Concat">
    <String>W/"</String>
    <Apply Function="RawValue">
      <Path>Version</Path>
    </Apply>
    <String>"</String>
  </Apply>
</ValueAnnotation>
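As a sketch of how a client could evaluate such patterns against plain object data (the helper functions below are mine, not part of any OData library; only the expressions themselves come from the annotations above):

// Derive control information from regular object data using the metadata patterns.
function etagFor(entity) {
  // odata.json.etagexpression: Concat('W/"', RawValue(Version), '"')
  return 'W/"' + entity.Version + '"';
}

function keyLookupFor(entity) {
  // odata.urls.keylookupexpression: KeyConcat('(', ID, ')/'), resolved against the set and base URLs
  return '(' + entity.ID + ')/';
}

console.log(etagFor({ ID: 1, Version: 4 }));      // W/"4"
console.log(keyLookupFor({ ID: 1, Version: 4 })); // (1)/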

Note that an interesting side-effect of this approach is that it removes any knowledge of the server URL namespace from clients. In the past OData clients had to choose between the higher coupling that came from knowing the URL conventions of the server and losing the query capabilities. Now that patterns are captured in metadata, a client that knows about both data and metadata content types can derive all URLs from patterns. This removes the coupling and makes it possible for servers to have their own URL conventions as long as they can be represented with annotations (yes, it means you can have a server that uses "/" instead of "(" and ")" if you want, for those that were always unhappy with parentheses :) )

Summary

We discussed a JSON format that’s clean and lean and doesn’t need a special coding/decoding step and still preserves a lot of the compactness. We achieve this by moving control information that’s regular enough to metadata in the form of patterns, and by linking data and metadata so clients need no out-of-band knowledge. The approach also allows servers to have different URL conventions without causing OData clients to lose any functionality.

What do you think? As usual, the OData mailing list is the best place for debate.


Ryan Duclos (@rduclos) described the Windows Azure Marketplace in a 1/19/2012 post:

The Windows Azure Marketplace: why would I want to use that? There are several reasons why you may want to use this Marketplace. If the core value of my business is the data I capture, an API, or an application I want to offer on a subscription basis, why wouldn't I want a marketplace? Look at it this way: you most likely have a smart phone, whether it be a Windows Phone/iPhone/Android/…, and you have a Marketplace to purchase the applications you use every day. I think it is great to have the same options available to a broader audience.

The Windows Azure Marketplace has a couple of options for exposing your core business to the world, whether it be data or applications. I want to look at each option separately.

First let's look at the Data side of the Marketplace. Suppose I was a company whose core business was the data I capture/create, and I wanted to expose that data to anyone who wants to consume it. A marketplace is a lot easier for potential consumers to trust and navigate to make sure they're getting what they want. I still see lots of companies that distribute their data via CD/DVD without any way to offer real-time or daily updates. Having a way to expose your data globally without having to build the API to manage subscriptions and distribution of the data is a win/win to me. As a Content Provider you own the data/store the data/choose the price/specify the terms of use; the Marketplace just acts as your broker. All you have to do is meet the Marketplace Data Publishing SLA.

Now let's look at the Application side of the Marketplace. If I was a company wanting to offer SaaS or cloud-based solutions, I would be very interested in another way to market and build a client base. Your application offerings don't have to be available solely via the Marketplace; it just adds another option for your clients to find and subscribe to them. All you have to do is meet the Marketplace Application Publishing SLA.

What's not to like about another way to generate revenue from your core business? I invite you to take a look at the Windows Azure Marketplace.


Paul Miller (@paulmiller) posted Data Market Chat: Tyler Bell discusses Factual to his Cloud of Data blog on 1/19/2012:

Having received some $27 million in investment from big names like Andreessen Horowitz, LA-based Factual is one of the better funded examples of a 'data marketplace.' But Tyler Bell, the company's Director of Product, is not sure that Factual necessarily fits most people's perception of what a data marketplace should be.

Focussed, for now, upon aggregating location data, Factual provides access by API or download to a pool of over 55 million places in the US and other territories. A key differentiator for the company is their investment in cleaning and harmonising information drawn from multiple sources. API-based services such as Crosswalk and Resolve enable developers to cope with the very different ways in which third party services like Yelp, Foursquare and Gowalla reference a single restaurant or coffee shop.

Tyler suggests, though, that location data may just be the start:

“Factual doesn’t necessarily want to be a location-only company. Really what we’re doing is we’re cutting our teeth on location now, and places… It’s just a wonderful way to learn how to refine your business and of course how to refine your technology stack… But for the immediate future, you’ll see us focus primarily on places.”

Have a listen to learn more about Factual, and to hear some of Tyler's perspectives on the utility of good, comprehensive data. And check back on Tuesday for the next podcast in the series: Chris Hathaway of AggData.

Following up on a blog post that I wrote at the start of 2012, this is the first in a series of podcasts with key stakeholders in the emerging category of Data Markets. Future conversations, all of which will be published here, have been scheduled with AggData, BuzzData, Datamarket.com, Infochimps, Kasabi, and Microsoft. I am still adding conversations to the series, and intend to talk with more companies and with analysts and investors with insight to share.


<Return to section navigation list>

Windows Azure Access Control, Service Bus and Workflow

Dario Renzulli (@darrenzully) described Implementing Windows Azure ACS with everyauth in a 1/20/2012 post to the Node on Azure blog:

In this article we will walk you through the implementation of the Windows Azure ACS module for everyauth.

Adding Windows Azure ACS module to everyauth

We forked the everyauth git repo. Then we created a new module, called azureacs, following the design guidelines suggested by Brian Noguchi. We did a quick and dirty implementation just to see if the whole flow would work. Once we had it working, we refactored it and created two independent modules: node-wsfederation and node-swt.

The token format: parsing and validating SimpleWebTokens with node-swt

SimpleWebTokens are really simple :). Windows Azure ACS can issue SimpleWebTokens as well as SAML 1.1 or 2.0 tokens. We decided to implement SWT because it is a very simple format and it's based on HMAC SHA-256 signatures, which are ubiquitous on every platform.

The key method, where we validate the token is this one:

  isValid: function(rawToken, audienceUri, swtSigningKey) {
    var chunks = rawToken.split(hmacSHA256);
    if (chunks.length < 2)
      return false;

    if (this.isExpired())
      return false;

    if (this.audience !== audienceUri)
      return false;

    var hash = crypto.createHmac('RSA-SHA256', new Buffer(swtSigningKey, 'base64').toString('binary')).update(new Buffer(chunks[0], 'utf8')).digest('base64');

    return (hash === decodeURIComponent(chunks[1]));
  }

The logic basically checks that:

  1. There is an HMAC hash
  2. The token has not expired
  3. The audience URI (the target application for this token) matches the one in the configuration
  4. The HMAC calculated from the signing key set in the configuration matches the one in the token

The protocol: implementing the basic ws-federation protocol with node-wsfederation

Ws-Federation is a very simple protocol. It expects an HTTP GET against the identity provider endpoint and it will produce an HTTP POST against the application with an envelope that contains the token (swt, saml, custom, etc.).

These are the key methods:

  getRequestSecurityTokenUrl: function () {
    if (this.homerealm !== '') {
      return this.identityProviderUrl + "?wtrealm=" + this.realm + "&wa=wsignin1.0&whr=" + this.homerealm;   
    }
    else {
      return this.identityProviderUrl + "?wtrealm=" + this.realm + "&wa=wsignin1.0";
    } 
  },

  extractToken: function(res) {
    var promise = {};
    var parser = new xml2js.Parser();
    parser.on('end', function(result) {
      promise = result['t:RequestedSecurityToken'];
    });

    parser.parseString(res.req.body['wresult']);
    return promise;
  }

The getRequestSecurityTokenUrl method builds the url that will be used for the redirect following the protocol (wtrealm to specify the application, wa to specify that this is a sign-in, and optionally whr to specify the identity provider if there is more than one possible).
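For example, with made-up values for the identity provider URL, realm and home realm, the URL returned by getRequestSecurityTokenUrl would look something like this:

https://contoso.accesscontrol.windows.net/v2/wsfederation?wtrealm=http://myapp.cloudapp.net/&wa=wsignin1.0&whr=http://adfs.fabrikam.com/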

The extractToken method simply parses the response and extracts the RequestedSecurityToken element from the XML. Inside that element we will find the token.

The glue: putting it all together in everyauth

everyauth uses an interesting model for defining the whole sequence of steps so that you don't have to nest callbacks inside callbacks. Basically you define the flow like this, and then create each function that will be called.

  .get('entryPath', 
     'the link a user follows, whereupon you redirect them to ACS url- e.g., "/auth/facebook"')          
    .step('redirectToIdentityProviderSelector')
      .accepts('req res')
      .promises(null)

  .post('callbackPath',
       'the callback path that the ACS redirects to after an authorization result - e.g., "/auth/facscallback"')
    .step('getToken')
      .description('retrieves a verifier code from the url query')
      .accepts('req res')
      .promises('token')
      .canBreakTo('notValidTokenCallbackErrorSteps')
      .canBreakTo('authCallbackErrorSteps')
    .step('parseToken')
      .description('retrieves a verifier code from the url query')
      .accepts('req res token')
      .promises('claims')
      .canBreakTo('notValidTokenCallbackErrorSteps')
    .step('fetchUser')
      .accepts('claims')
      .promises('acsUser')
    .step('getSession')
      .accepts('req')
      .promises('session')      
    .step('findOrCreateUser')
      .accepts('session acsUser')
      .promises('user')
    .step('addToSession')
      .accepts('session acsUser token')
      .promises(null)
    .step('sendResponse')
      .accepts('res')
      .promises(null)

Here are the most important steps

  .redirectToIdentityProviderSelector( function (req, res) {
    var identityProviderSelectorUri = this.wsfederation.getRequestSecurityTokenUrl();

    res.writeHead(303, {'Location': identityProviderSelectorUri});
    res.end();
  })

  .getToken( function (req, res) {
    var token = this.wsfederation.extractToken(res);

    if (this.tokenFormat() === 'swt') {
      var str = token['wsse:BinarySecurityToken']['#'];
      var result = new Buffer(str, 'base64').toString('ascii'); 
    }
    else {
      return this.breakTo('protocolNotImplementedErrorSteps', this.tokenFormat());
    }

    if (this._authCallbackDidErr(req)) {
      return this.breakTo('authCallbackErrorSteps', req, res);
    }

    return result;
  })

  .parseToken( function (req, res, token) {
    if (this.tokenFormat() === 'swt') {
      var swt = new Swt(token);
      if (!swt.isValid(token, this.realm(), this.signingKey())) {
        return this.breakTo('notValidTokenCallbackErrorSteps', token);
      }
      return swt.claims;
    }

    return this.breakTo('protocolNotImplementedErrorSteps', this.tokenFormat());
  })

Conclusion

Integrating with everyauth was simple once we understood how it works. Along the way, we created two reusable modules, node-swt and node-wsfederation, that can be used to implement support for connect-auth or passport. By using the azureacs module you will be able to provide single sign-on for multiple applications across different domains and platforms, as well as the ability to integrate with enterprise customers that use ADFS, SiteMinder or any other ws-federation identity provider.

I would like to thank my co-workers @jpgd and @woloski from Southworks because they helped shape this package.


Rick Garibay (@rickggaribay) explained Common Service Bus Queue Operations with the REST API in a 1/19/2012 post:

Azure Service Bus Brokered Messaging provides durable pull-based pub-sub, complementing its older sibling Relay Messaging, which uses a push messaging model. While both enable hybrid composition across traditional business, trust and network boundaries, they provide unique capabilities in and of themselves.

As with Relay Messaging, Brokered Messaging provides first class support for WCF with the NetMessagingBinding, but expands the developer surface to general .NET and cross-platform/mobility scenarios by offering the .NET Client and REST APIs respectively.

Of the 3 APIs, the .NET Client API is the most robust and seems to be the most documented.

The simplicity of the WCF programming model (the illusion that messages are being pushed to your endpoint) is balanced with some restrictions that naturally fall out of the scope of one-way messaging including queue/topic/subscription/rule creation and support for peek lock.

In this regard, while not as robust as the .NET Client API, the REST API offers a more comprehensive feature set and when working on solutions that must be interoperable across client platforms or due to other restrictions, the REST API is a great choice.

Microsoft has documented the REST API in the Service Bus REST API Reference, but there are not a ton of imperative examples out there that show WebClient or HttpWebRequest, so the purpose of this post is to share some nitty gritty examples of how to get some of the most common operations done in C#.

Please note that my goal is not to be elegant or use the tersest or most fluid syntax possible in these samples, but rather to get some quick and dirty examples out there, well, quickly.

As such, the unit tests should be self explanatory, but if you have any questions, please don’t hesitate to ask. …

Rick continues with 202 lines of C# source code.
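For a flavor of what these operations look like outside of C#, here is a rough node.js sketch of the send operation. This is not Rick's code: the namespace, queue name and token are placeholders, the WRAP token acquisition from ACS is omitted, and the URL and header details should be verified against the Service Bus REST API Reference.

var https = require('https');

var wrapToken = '...WRAP access token acquired from ACS...'; // placeholder

var options = {
  host: 'yournamespace.servicebus.windows.net',
  path: '/yourqueue/messages',            // POST {queuePath}/messages sends a message
  method: 'POST',
  headers: {
    'Authorization': 'WRAP access_token="' + wrapToken + '"',
    'Content-Type': 'application/json'
  }
};

var req = https.request(options, function (res) {
  console.log('Send returned HTTP ' + res.statusCode); // expect 201 Created on success
});
req.on('error', console.error);
req.end(JSON.stringify({ orderId: 42 })); // the brokered message body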

 

 


<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

Avkash Chauhan (@avkashchauhan) described Windows Azure application VM and (virtual) IP Address in a 1/19/2012 post:

From time to time, I get involved with our Windows Azure partners in discussions about IP address configuration in Windows Azure virtual machines, so I decided to write this article to point out the most common points of configuration:

  • When you have your application running in Windows Azure, it gets a virtual IP address from a pool of available virtual IP addresses. This IP address is what you see when you ping your service, and it is the one single IP address used by all of your instances sitting behind the load balancer.
  • For example, let's assume you have two instances of your application.
    • When both instances start, each one gets its own internal IP address
      • Let's assume 10.0.0.1 and 10.0.0.2
    • These internal IP addresses for each instance are linked to your application's virtual IP address, i.e. 65.52.14.112
    • Finally, both the 10.0.0.1 and 10.0.0.2 IP addresses are bound to the load balancer over the VIP 65.52.14.112
    • So when any outside request comes to your application, it first comes to the Windows Azure load balancer. The load balancer knows all the instances related to your application and, depending on the load balancing algorithm, routes the outside connection to the appropriate instance.
  • In a nutshell, when your VM starts, the VIP is bound to the load balancer. So whether you have only 1 instance or multiple instances of the same service, the load balancer knows how to route your call to the appropriate instance. To the outside world it does not matter which internal IP address was used by which specific instance.
  • Even when you have more than 1 instance, the VIP associated with your service will be one single virtual IP address. This is the same address that will be available for each input endpoint configured for your service. For example:
    • If your service shows the VIP 65.52.14.112
    • Then if you have a web role enabled on port 80, you will see the input endpoint as 65.52.14.112:80
    • For an SSL-enabled web role, the input endpoint will be listed as 65.52.14.112:443
    • An RDP-enabled virtual machine will have the input endpoint 65.52.14.112:3389
  • If you have RDP enabled in your application and log into your virtual machine, the IP address you will see is the internal IP address. If you have more than 1 instance, each instance will have its own internal IP address. But be aware that the IP address you see inside your virtual machine is not accessible to the outside world. The outside world connects to your service only through the virtual IP address.
  • If you need your application's IP address, whether to add it to a firewall exception list or for any other reason, you can just use the VIP. This also remains true for SQL Azure.
  • As long as you don't delete your service, the VIP will remain the same. So if you have a requirement to keep your VIP intact, be sure not to delete your deployment while updating, and the IP address is guaranteed to stay the same.

Avkash has a new Twitter avatar. I like it better than the one it replaced.


<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Market Wire reported Geminare Announces a Strategic Alliance With Microsoft to Bring Recovery as a Service to the Windows Azure Cloud in a 1/19/2012 press release (via the Azure Cloud on Ulitzer blog):

Geminare, a leader in the Recovery as a Service (RaaS) industry, today announced a strategic alliance with Microsoft Corp., a worldwide leader in software, services and solutions, whereby Geminare will deliver its award-winning RaaS Solution Suite through the Windows Azure cloud from Microsoft.

Geminare enables the transition of premises-based software to cloud-enabled solutions through its patented Cloud CORE Platform, a proven, mature, multi-tiered service delivery vehicle that is the foundation of Geminare's entire RaaS data protection suite. Beginning with a scheduled 2012 release of its Cloud Storage Assurance (CSA) email and file archiving service, Geminare will provide access to its RaaS data protection suite within the Windows Azure Marketplace, rendering it widely available through the extensive Microsoft Partner Network community.

"To serve our mutual customers, strategic leaders in emerging markets are creating innovative solutions on Windows Azure. One such leader is Geminare, which is widely acknowledged as a leader in the RaaS market. Its rich portfolio, coupled with the strengths and reach of Windows Azure, represents enormous potential for customers and is a welcome addition to the Windows Azure partner community," said Walid Abu-Hadba, Corporate Vice President, Developer and Platform Evangelism, at Microsoft Corp. "We are proud to have this alliance with Geminare."

According to Gartner, "By 2014, 30 percent of midsize companies will have adopted recovery-in-the-cloud, also known as recovery-as-a-service (RaaS), to support IT operations recovery, up from just over 1 percent today."

John Morency, Gartner's Research Vice President covering the management of disaster recovery and IT resiliency, stated, "RaaS has been hailed as a 'killer' app for DR in the Cloud, but the true 'killer' app stems from rapidly enabling the global partner community to provide RaaS solutions to their customers, easily and efficiently." Morency added, "The Geminare and Microsoft Strategic Alliance does just that. Geminare's RaaS portfolio delivered from the Windows Azure cloud eliminates the barriers to entry to the emerging RaaS market for resellers worldwide, setting the stage for the explosive growth we project."

Joshua Geist, Geminare's CEO, said, "Geminare is ecstatic to have been chosen by Microsoft for Recovery as a Service enablement on the Windows Azure cloud. There is no better cloud provider than Microsoft to optimize the power of RaaS through its comprehensive cloud strategy, massive partner network and best-in-class technology and market experience." Geist added, "This alliance delivers the foundation through which RaaS can scale to the market penetration levels projected by Gartner and beyond." Geist further added, "The public cloud marketplace has exploded virtually overnight, offering businesses the choice of low-cost compute and storage services never before seen in the industry and signalling a shift in the underlying requirements for the delivery and support infrastructure." Geist continued, "However, in order for businesses to truly capitalize on this evolution, there is a need for access to a wide range of applications and services which provide businesses with direct support and functionality."

Geminare's Recovery as a Service Suite will be delivered through the global Windows Azure marketplace as a highlighted solution and will allow OEM channel partners to offer Windows Azure-backed RaaS solutions directly to their customers and partners. Geminare's underlying Cloud CORE platform delivers the provisioning, billing, licensing, support and management capabilities for the RaaS suite from a single hosted platform, allowing partners to enter the RaaS market immediately and with ease.

About Geminare
Geminare enables ISVs to transition their products into Cloud-based offerings with a focus on the Recovery as a Service (RaaS) market. Geminare's award-winning patented Cloud CORE Platform, a proven, mature, multi-tiered service delivery vehicle that is the foundation of Geminare's entire RaaS data protection suite, has allowed leading and innovative companies such as CA Technologies, OpSource, Arrow, Iron Mountain, CenturyLink, Hosting.com, Bell, Allstream, Ingram Micro, LexisNexis, Long View Systems and many others, to enter the RaaS market with their own suite of data protection Cloud offerings. Geminare is headquartered in Mountain View, CA, with additional operations in Toronto, Canada.

www.geminare.com


Eric Nelson (@ericnel) reported Just published my first node.js Windows Azure application in a 1/19/2012 post:

Turned out to be simplicity itself (thanks to this sweet tutorial). Great work, team, on the Windows Azure SDK for node.js and some lovely PowerShell integration.


Which in the management portal shows up as:

(screenshot of the hosted service in the Windows Azure management portal)

Currently running (Thursday 19th Jan 2012) at http://ericnelnode1.cloudapp.net/ with the exciting output of:

(screenshot of the application's output)
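For reference, the server that the tutorial scaffolds is only a few lines of node. This is a sketch from memory of the generated server.js, not necessarily the exact code deployed here:

var http = require('http');
var port = process.env.port || 1337; // the Windows Azure node tooling supplies the port via an environment variable

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World\n');
}).listen(port);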


<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

No significant articles today.


Return to section navigation list>

Windows Azure Infrastructure and DevOps

Wely Lau (@wely_live) started An Introduction to Windows Azure (Part 1) series for Red Gate Software’s ACloudyPlace blog on 1/19/2012:

Windows Azure is the Microsoft cloud computing platform which enables developers to quickly develop, deploy, and manage their applications hosted in a Microsoft data center. As a PaaS provider, Windows Azure not only takes care of the infrastructure, but will also help to manage higher level components including operating systems, runtimes, and middleware.

This article will begin by looking at the Windows Azure data centers and will then walk through each of the available services provided by Windows Azure.

Windows Azure Data Centers

Map showing global location of datacenters

Slide 17 of WindowsAzureOverview.pptx (Windows Azure Platform Training Kit)

Microsoft has invested heavily in Windows Azure over the past few years. Six data centers across three continents have been developed to serve millions of customers. They have been built with an optimized power efficiency mechanism, self-cooling containers, and hardware homogeneity, which differentiates them from other data centers.

The data centers are located in the following cities:

  • US North Central – Chicago, IL
  • US South Central – San Antonio, TX
  • West Europe – Amsterdam
  • North Europe – Dublin
  • East Asia – Hong Kong
  • South-East Asia – Singapore

Windows Azure Datacenters- aerial and internal views

Windows Azure data centers are vast and intricately sophisticated. Images courtesy of Microsoft http://azurebootcamp.com

Windows Azure Services

Having seen the data centers, let’s move on to discuss the various services provided by Windows Azure.

Microsoft has previously categorized the Windows Azure Platform into three main components: Windows Azure, SQL Azure, and Windows Azure AppFabric. However, with the recent launch of the Metro-style Windows Azure portal, there are some slight changes to the branding, but the functionality has remained similar. The following diagram illustrates the complete suite of Windows Azure services available today.


The complete suite of Windows Azure services available today …

Wely continues with descriptions of Windows Azure’s Core Services, and concludes:

Coming up in my next article, I will carry on the discussion with the additional services that Windows Azure offers, including 'Building Block Services', Data Services, Networking and more, so make sure you keep an eye out for it because it's coming soon!

Full disclosure: I’m a paid contributor to ACloudyPlace.


Ernest Mueller (@ernestmueller) asked Why Does Cloud Load Balancing Suck? in a 1/19/2012 post to The Agile Admin blog:

Back in the old world of real infrastructure, we used Netscalers or F5s and we were happy. Now in the cloud, you have several options, all of which seem to have problems.

1. Open source. But once you want SSL, and redundancy, and HTTP compression, you get people saying with a straight face "nginx (for HTTP compression) -> Varnish cache (for caching) -> HTTP level load balancer (HAProxy, or nginx, or the Varnish built-in) -> webservers." (Quoted from Server Fault). Like four levels, often with the same software twice in it. And don't forget some kind of heartbeat between the two front-ends. Oh look, I've spent $150/mo on just machines to run my load balancing. And I really want to load balance/failover between all my tiers, not just the front end. It's a lot of software parts to go wrong.

2. Zeus. For some reason none of the other LB vendors have gotten off their happy asses and delivered a good software load balancer you can use in Amazon. I got tired of talking to our Netscaler reps about it after the first couple of years. They're more interested in selling their hardware to the cloud data centers than helping real people load balance their apps. Zeus is the only one, and it's really quite expensive.

3. Amazon ELBs. These just have a lot of problems under the hood. We've been engaged with Amazon ELB product management on them: large files serve out super slow; users get hits refused due to throttling/changes during ELB scaling; basically, if you want 100% of your hits to come through, you can't use them. [Emphasis added.]

4. Geo-IP load balancing, through Dyn or whoever. They claim to have the failover problem fixed, but it still only works for the front end tier of what is a multitier architecture. I certainly don’t want to have to advertise every internal IP in external DNS to make load balancing work.

And really the frustrating part is there seems to have been no headway on any of this stuff in a decade. Same old open source options, same old techniques. Can someone come up with a way to load balance on the cloud that a) doesn’t lose any hits, b) is one thing not 4 things, and c) is useful for front and back end balancing? Seems like a necessary part of oh say every system ever, why is it still so hard?


<Return to section navigation list>

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

Kristian Nese (@KristianNese) announced Sessions from NIC 2012 – [App Controller] now available on 1/19/2012:

As I wrote earlier this week, the NIC conference was held for the first time in January 2012 here in Norway, in Oslo Spektrum.

I had two sessions: one where I explained Cloud Computing and especially the Private Cloud, and one where I introduced System Center App Controller with the cloud, explaining the service concept in both VMM 2012 (Private Cloud) and Windows Azure (Public Cloud).

You can watch the “App Controller session” here: http://vimeo.com/nicconf/review/35056290/3bbb35aab9

I will post my Private Cloud session once it's available.


Kevin Remde (@KevinRemde) continued his SCVMM in System Center 2012 and managing Citrix XenServer (So Many Questions. So Little Time. Part 5.) series on 1/19/2012:

This question came from Randy at our TechNet Event in Saint Louis:

“Are the VMM management capabilities for a Citrix XenServer based VM the same as for a Hyper-V based VM? (example: can you still live migrate, change RAM, etc?)”

The simple answer is: The capabilities are pretty much the same. Live Migration, for example, will simply drive VMs in a managed pool through XenMotion. There are a couple of considerations in some areas (such as XenServer Templates and networking), but being able to build a Cloud of Citrix XenServer resources right alongside Hyper-V or VMware based clouds is pretty amazing.

For a full overview on Managing Citrix XenServer using System Center Virtual Machine Manager, CLICK HERE.

For system requirements, CLICK HERE.

Note that managing Citrix XenServer in SCVMM requires Citrix’s “Microsoft System Center Integration Pack”, which can be ACQUIRED HERE.


<Return to section navigation list>

Cloud Security and Governance

No significant articles today.


<Return to section navigation list>

Cloud Computing Events

Bruno Terkaly (@brunoterkaly) announced on 1/19/2012 the availability of a Web Cast - Building a Massively Scalable Platform for Consumer Devices on Windows Azure:

Welcome to our first web cast

This morning on http://livestream.com/clouduniversity Bret Stateham and I delivered our first pilot webcast. We're very proud of the way it went. It pretty much went without a hitch. Must be beginner's luck.

The webcast is based on a talk I did at the AT&T Developer Summit in Las Vegas last week. If you've been following the previous posts, this represents the live presentation of that material, where I illustrate how to build a RESTful service and deploy it to the Microsoft cloud. I also illustrate how to consume this RESTful service from multiple clients, including mobile clients such as Windows Phone.

Because REST is an open standard based on HTTP, almost any device can consume and interact with this RESTful service, including clients built with jQuery, HTML5, WPF, Java - you name it.
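As a rough illustration (the URL and route below are made up, not the actual service from the talk), here is how a jQuery client might consume such a service; a WPF, Java or Windows Phone client would issue the same HTTP GET and parse the same JSON:

// Any HTTP-capable client can call the RESTful endpoint the same way.
$.getJSON('http://yourservice.cloudapp.net/api/products', function (products) {
  $.each(products, function (i, p) {
    console.log(p.name + ': ' + p.price);
  });
});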

Building a Massively Scalable Platform for Consumer Devices on Windows Azure.


Jim O’Neil (@jimoneil, pictured below) announced on 1/19/2012 Satya Nadella at MassTLC Cloud Mixer on 1/26/2012:

MassTLC and the New England Research & Development (NERD) Center are hosting a free mixer featuring Satya Nadella, the President of Microsoft's Server and Tools Business, at NERD next Thursday, January 26th, commencing with networking from 5 to 6:15 p.m., and continuing with Satya's presentation and Q&A from 6:15 to 7:15.

Cloud Optimizing Every Business

The transition to cloud computing has been talked about as one of the most profound shifts occurring in technology in decades. With the huge growth and broad range of computing devices increasingly available, we see a shift in "design point" to a world of connected devices and continuous services. In this talk, Satya Nadella, President, Server and Tools Business at Microsoft, will share what he learned running a global online service, Bing, and how these lessons are informing the direction of Microsoft's cloud strategy.

Nadella is one of very few people in the world who can speak first-hand about running an extremely large-scale cloud computing business. Having previously led R&D for Microsoft’s Online Services Division, which includes Bing and MSN, Nadella has practical and deep experience with cloud both from a technical and business perspective. Today, Nadella brings these experiences to bear in his current role, which includes accountability for the overall business and technical vision, strategy, operations, engineering and marketing for Microsoft's $17+ billion Server and Tools Business.

Join us to hear how Microsoft is focused on building a platform that spans public and private clouds to enable businesses to take advantage of this new design point.


Travis Wright described Upcoming Learning Opportunities for System Center 2012 in a 1/18/2012 post:

So, now that the Release Candidate of System Center 2012 is out and general availability is fast approaching, you may be starting to get more serious about getting up to speed on System Center 2012. Am I right?

Don't even worry! We are here to help you get up to speed fast with lots of different opportunities to learn from Microsoft presenters, MVPs, and other experts in System Center.

Here is a list of some of the upcoming events:

System Center Universe

January 19th in Austin, TX and webcast live around the world. That's tomorrow!!

We have a great lineup of speakers from Microsoft, MVPs, and other experts.

This event was sponsored by Microsoft and some of our partners and is the first of its kind.

Check out the Agenda and Speakers. While you are at it check out the Sponsors!

Register here: http://www.systemcenteruniverse.com/UserGroupViewings

There is also a version of it in Asia which you can attend in person or watch the live stream:

http://www.systemcenteruniverse.asia/

Microsoft Jump Start - Creating and Managing a Private Cloud with System Center 2012

This is a Microsoft-produced, two-day training presented for free by our Technical Product Managers as a live virtual classroom.

February 21-22, 2012 9:00 AM - 5:00 PM PST

You can see the course outline, speakers, and register at the site:

http://mctreadiness.com/MicrosoftCareerConferenceRegistration.aspx?pid=298

Microsoft Management Summit 2012

Last, but certainly not least is the Microsoft Management Summit. This is the big daddy. An entire week of nothing but System Center and management! There are literally hundreds of sessions, self-paced labs, instructor-led labs, birds of a feather sessions, etc.

It will be held in Vegas at the Venetian again this year.

April 16-20

You can see the agenda, sponsors, and register at the MMS site:

http://mms-2012.com

Hurry: early-bird registration, which saves you $275, ends on January 27th!


<Return to section navigation list>

Other Cloud Computing Platforms and Services

Simon Munro (@simonmunro) explained What DynamoDB tells us about the future of cloud computing in a 1/19/2012 post:

Beyond the real-world behaviour of DynamoDB and its technical comparison to Riak/Cassandra/others, there is something beneath the technical documentation that gives a clue to the future of cloud computing. AWS is the market leader, and its actions indicate what customers are asking for, what is technically possible, and what makes a good model for cloud computing. So, here are some of my thoughts on what DynamoDB means in the broader cloud computing market.

Scalability is a service

The most interesting part of DynamoDB has to be the pricing, which allows you to pay for the capacity you need (not what you consume). If you want things to run faster (higher throughput), you buy extra units of capacity. This means that the scalability is wrapped up in the service rather than the infrastructure: it is not equally fast or slow for everyone; it is faster for those who pay more.
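To make the "buy capacity, not consumption" point concrete, here is a hedged sketch using the Python SDK (boto3, shown purely for illustration; it postdates this post). The table name and capacity figures are assumptions:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# You declare the throughput you want up front; the provisioned capacity is
# what you pay for, whether or not you consume it.
dynamodb.create_table(
    TableName="game-scores",  # hypothetical table
    AttributeDefinitions=[{"AttributeName": "player_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "player_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
)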

High performance is commodity

One of the fundamental principles of cloud computing is that the base compute units are commodity devices. In cloud computing there is no option for custom high-performance infrastructure that solves particular problems, as we often see with on-premises SQL databases. But it is inevitable that these simple commodity units will become higher performing over time, and the SSD basis of DynamoDB illustrates this trend. A service with single-digit-millisecond response times reframes ‘commodity’.

IaaS is dead

I have pointed out before that AWS is not IaaS and every service that they add seems to push them further up the abstraction stack (towards PaaS). If you want low latency in your stack, pay for a (platform) service, not infrastructure. Which leads to mentioning the EBS infrastructure…

Virtualised storage is too generic and slow

It is interesting that SSDs have been skipped for the barrel-of-laughs that is EBS and allocated instead to a less infrastructure-oriented storage mechanism. A lot of the ‘AWS sucks’ rhetoric has arisen because people run a database of sorts on EBS-backed storage and suffer the inevitable performance knock – DynamoDB starts pointing clearly towards an architecturally meaningful and technically viable alternative.

Pricing is complex

While consumption-based cost generally works out better in the long run, it makes working out and optimising costs really difficult. Unfortunately this makes the cloud computing benefit difficult to understand and articulate, as risk-averse buyers stick with the ‘cost of hosted machine’ model that they are familiar with. There are now so many dimensions to optimising costs (including the problem, presented by DynamoDB, of having to change your capacity requirements based on demand), and stable, complete cost models don’t exist, so working out how much things are going to cost over the lifecycle of an application is really hard.
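The difficulty is easy to demonstrate even with a toy model. The sketch below uses entirely made-up placeholder rates (not actual AWS prices) simply to show how many dimensions feed the estimate and how it moves as provisioned capacity tracks demand:

# Toy cost model for a provisioned-capacity service. All rates are PLACEHOLDERS,
# not actual AWS prices; the point is how many dimensions feed the estimate.
HOURS_PER_MONTH = 730

def monthly_cost(read_units, write_units, storage_gb,
                 read_rate=0.00013, write_rate=0.00065, storage_rate=0.25):
    """Estimate a monthly bill from provisioned throughput and storage (assumed rates)."""
    throughput = (read_units * read_rate + write_units * write_rate) * HOURS_PER_MONTH
    storage = storage_gb * storage_rate
    return throughput + storage

# As demand changes, the capacity you provision changes, and so does the estimate.
for read_units, write_units in [(50, 10), (500, 100), (5000, 1000)]:
    print(f"{read_units} RCU / {write_units} WCU: ${monthly_cost(read_units, write_units, 100):,.2f}")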

Competitors are stuck

AWS continues to beat the drum on cloud computing innovation and competitors are left languishing. At some point you almost have to stop counting the persistence options available on AWS (EBS, S3, RDS, SimpleDB, DynamoDB, etc.), while competitors have far less to offer. Windows Azure Table storage, the Microsoft equivalent key-value store, has barely changed in two years despite desperate pleas for product advancement (secondary indexes, order by).

Vendor lock-in is compelling

As much as there may be a fear of being locked in to the AWS platform, in many cases using DynamoDB is a lot easier than the alternatives. Trying to get Riak set up on AWS (or on premises) to offer the same functionality, performance and ease of use may be so much hassle, and require such specialised skills, that you may be happy to be locked in.

NoSQL gains ground

DynamoDB seems to offer a credible shared-state solution that allows for high write throughput, something that SQL is traditionally good at. The option to set a parameter for strongly or eventually consistent reads is a cheeky acknowledgement that your CAP-theorem bias is a runtime choice. I don’t see DynamoDB replacing RDS, but it does add more credibility to, and acceptance of, NoSQL/SQL hybrid models within applications.
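That runtime consistency choice is literally a flag on each read. A minimal sketch with the Python SDK (boto3, shown for illustration; the table and key names are hypothetical):

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Eventually consistent read (the default): cheaper, may lag recent writes.
item = dynamodb.get_item(
    TableName="sessions",                      # hypothetical table
    Key={"session_id": {"S": "abc-123"}},
)

# Strongly consistent read: reflects all acknowledged writes, consumes more capacity.
item = dynamodb.get_item(
    TableName="sessions",
    Key={"session_id": {"S": "abc-123"}},
    ConsistentRead=True,
)

Strongly consistent reads draw more of the table's provisioned read capacity, so the consistency dial is also a cost dial.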


Ted Samson (@tsamson_IW) asserted “The hosted database service builds on the strengths of SimpleDB by boosting flexibility and reducing latency” in a deck for his Amazon DynamoDB brings speedier NoSQL to the cloud article of 1/19/2012 for InfoWorld’s Tech Watch blog:

Amazon today announced DynamoDB, a fully managed, cloud-based NoSQL database service that builds on the company's SimpleDB service by delivering faster, more consistent database performance to keep pace with the demands of ever-scaling cloud apps.


The secret sauce here is Amazon's homegrown Dynamo nonrelational database architecture, which the company built to suit the demands of its complex, service-oriented e-commerce architecture. Designed to be a highly reliable, ultrascalable key/value database, Dynamo has inspired such offerings as Red Hat's Infinispan data grid technology and Apache Cassandra.

But Dynamo, despite being more robust than SimpleDB, hasn't enjoyed broader adoption because "it did nothing to reduce the operational complexity of running large database systems," according to Amazon CTO Werner Vogels.

Indeed, SimpleDB's strength is its simplicity, as its moniker implies: It provides a straightforward table interface and a flexible data model while eliminating headaches associated with configuration, patching, replication, or scaling.

With DynamoDB, Amazon has attempted to bring together the best of both worlds: Dynamo's superior scalability, performance, and consistency delivered as an easy-as-pie service, effectively eliminating the complexity of forecasting and planning database deployments. Adding capacity takes a few clicks via the management console.

DynamoDB addresses four of SimpleDB's more significant shortcomings, according to Vogels:

  • With SimpleDB, users need to add dataset containers, called domains, in increments of 10GB.
  • SimpleDB indexes all attributes to each item stored in a domain, which means that every database write results in an update of not just the basic record, but all attribute indices. This can result in performance hiccups due to latency, especially as a dataset increases in size.
  • As is the tendency among NoSQL databases, SimpleDB takes an "eventually consistent" approach to data presentation; the consistency window can last up to a second.
  • SimpleDB's pricing, based on "machine hours," has proven complex. …



Steven O’Grady (@sogrady) posted Amazon DynamoDB: First Look to his RedMonk Tecosystems blog on 1/19/2012:

This paper described Dynamo, a highly available and scalable data store, used for storing state of a number of core services of Amazon.com’s e-commerce platform. Dynamo has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures and network partitions. Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.

Dynamo allows service owners to customize their storage system to meet their desired performance, durability and consistency SLAs by allowing them to tune the parameters N, R, and W.

- “Dynamo: Amazon’s Highly Available Key-value Store [PDF],” Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels


In October 2007, Amazon published a paper describing an internal data store called Dynamo. Incorporating ideas from both the database and key-value store worlds, the paper served as the inspiration for a number of open source projects, Cassandra and Riak being perhaps the most visible of the implementations. Until yesterday, these and other derivative projects were the only Dynamo implementations available to the public, because Amazon did not expose the internally developed database as an external service. With Wednesday’s launch of Amazon DynamoDB, however, that is no longer true. Customers can now add Amazon to their potential list of NoSQL suppliers, although to be fair Amazon has technically been in this market with SimpleDB already.

The following are some points of consideration regarding the release, its impact on the market and likely customer questions.

AWS versus Hosted

The most obvious advantage of DynamoDB versus its current market competition is the fact that it’s already in the cloud, managed, and offering consolidated billing for AWS customers. Because it requires minimal setup and configuration compared with native tooling, a subset of the addressable market is likely to share the mindset of this commenter on the DataStax blog:

“Cassandra’s tech is superior, as far as I can tell. But we’ll probably be using DynamoDB until there is an equivalent managed host service for Cassandra. Moving to Cassandra is simply too expensive right now.

All those are clearly better served by a service like DynamoDB than trying to run their own Cassandra clusters unless they happen to be very proficient in Cassandra administration and want to dedicate precious human resources to administration. That takes a lot of the benefits of “cloud” away from small and mid-sized companies where cost and management are the limiting factors.”

For many, outsourcing the installation, configuration and ongoing management of a data infrastructure is a major attraction, one that easily offsets a reduced featureset. Like Platform-as-a-Service (PaaS) offerings, DynamoDB offers time to market and theoretical cost advantages when required capital expense and resource loading are factored in.

Like the initial wave of PaaS platforms, however, DynamoDB is available only through a single provider. Unlike Amazon’s RDS, which is essentially compatible with MySQL, DynamoDB offers users no seamless migration path off of the service. The featureset can be replicated using externally available code – via the projects that were originally inspired by the Dynamo paper, for example – but you cannot at this time download, install and run DynamoDB locally.

It’s true that the practical implications of this lack of availability are uncertain. Netflix’s Adrian Cockcroft, for example, asserts that migration between NoSQL stores is less problematic than between equivalent relational alternatives because of the lower complexity of the storage, saying “it doesn’t take a year to move between NoSQL, takes a week or so.” It remains true, however, that there are customers that postpone upgrades to newer versions of the same database because of the complexity involved. And that’s without considering the skills question. Given the uncertainty involved, then, it seems fair to conclude that the proprietary nature of DynamoDB and the potential switching costs will be – at least in some contexts – a barrier to entry.

The question for users is then similar to that facing would-be adopters of first-generation PaaS solutions: is the featureset sufficient to compel the jeopardizing of later substitutability? Amazon clearly believes that it is; its competitors, less so. EMC’s Mark Chmarny, additionally, notes that Amazon’s pricing model may advantage adoption at the expense of migration.

Competition

DynamoDB clearly has the attention of competitive projects. Basho – the primary authors of Riak – welcomed DynamoDB in this post while pointing out the primary limitation, and DataStax wasted little time spinning up a favorable comparison table. One interesting aside: the Hacker News discussion of the launch mentioned Riak 23 times to Cassandra’s three.

Basho and DataStax are right to be concerned, because the combination of Amazon’s increasingly powerful branding and the managed nature of the product makes it formidable competition indeed. The question facing both Amazon and its competitors is to what extent substitutability matters within the database space. Proprietary databases have had a role in throttling the adoption of PaaS services like Force.com and Google App Engine in the past, but we have very few market examples of standalone, proprietary Database-as-a-Service (DaaS) offerings from which to forecast. Will DaaS, or more properly NoSQL-as-a-Service, be amenable to single-vendor products, or will buyers favor, as they have in the PaaS space, standardized platforms that permit vendor choice?

The answer to that is unclear at present, but in the meantime expect Amazon to highlight the ease of adoption and vendors like Basho and DataStax to emphasize the potential difficulties in exiting, while aggressively exploring deeper cloud partnerships.

NoSQL Significance

It’s being argued in some quarters that DynamoDB is the final, necessary validation of the NoSQL market. I do not subscribe to this viewpoint. By our metrics, the relevance of distinctly non-relational datastores has been apparent for some years now. Hadoop’s recent commercial surge alone should have been sufficient to convince even the most skeptical relational orthodoxies that traditional databases will be complemented or in limited circumstances replaced by non-relational alternatives in a growing number of enterprises.

Throughput Reservation

Perhaps the most compelling new feature of Amazon’s new offering isn’t, technically speaking, a feature. Functionally, the product is (yet) another implementation of the ideas in the Dynamo paper; Alex Popescu has comprehensive notes on the feature list. What is receiving the most attention isn’t a technical capability like range queries but rather the concept of provisioned throughput, levels of which can be dynamically adjusted up or down.

This type of atomic service-level provisioning is both differentiating and compelling for certain customer types. Promising single-digit-millisecond latency at a selected throughput level, with zero customer effort required, is likely to be attractive for customers that require – or think they require – a particular service level. And by requiring customers to manually determine their required provisioning level, Amazon stands to benefit from customer over-provisioning; customers will feel pain if they’re under-provisioned and react, but conversely may fail to observe that they’re over-provisioned. Much like mobile carriers, Amazon wins in both scenarios.
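For a sense of what that dial looks like in practice, here is a hedged sketch of nudging provisioned throughput up and back down with the Python SDK (boto3, used for illustration; the table name and figures are assumptions):

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Scale provisioned throughput up ahead of expected load (hypothetical table).
dynamodb.update_table(
    TableName="orders",
    ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 500},
)

# ...and back down afterwards. Under-provisioning throttles requests;
# over-provisioning silently costs money, which is the asymmetry noted above.
dynamodb.update_table(
    TableName="orders",
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
)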

Timing

With the underlying Dynamo technology having been extant in some form since at least 2007, one logical question is: why now? Amazon did not detail its intent with respect to timing when it prebriefed us last week, but its track record demonstrates a willingness to be first to market, balanced with an understanding of timing.

In 2006, Amazon launched EC2 and S3, effectively creating the cloud market. This entrance, however, was built in part from the success of the Software-as-a-Service (SaaS) market that preceded it; Salesforce, remember, went public in 2004. With enterprises now acclimated to renting software via the network, the market could be considered primed for similar consumption models oriented around hardware and storage.

Three years after the debut of EC2 and S3, and one year after MySQL had achieved ubiquity sufficient to realize a billion-dollar valuation from Sun [coverage], Amazon launched the first cloud-based MySQL-as-a-Service offering [coverage]. That same year, the first year that Hadoop was mainstream enough to justify its own HadoopWorld conference, Amazon launched Elastic MapReduce.

The pattern is clear: Amazon is unafraid to create a market, but attempts to temper the introductions with market readiness. Logic suggests that the same tactic is at work here.

NoSQL has, as a category, crossed the chasm from interesting science project to alternative data persistence mechanism. But while NoSQL tools like Cassandra and Riak are available in managed form via providers like Joyent and Heroku, DynamoDB is, in Popescu’s words: “the first managed NoSQL databases that auto-shards.”

It is also possible that SSD pricing contributed directly to the launch timing, with pricing for the drive type down to levels where the economics of a low cost shared service finally make sense.

SSDs

One underdiscussed aspect of the DynamoDB launch is the underlying physical infrastructure, which consists solely of SSDs. This is likely one of the major contributing factors to the performance of the system, and in some cases it will be another incentive to use Amazon’s platform, as many traditional datacenters will not have equivalent SSD hardware available to them.

The Net

While discussion of the DynamoDB offering will necessarily focus on functional differentiation between it and competitive projects, it is likely that initial adoption and uptake will be primarily a function of attitudes regarding lock-in. For customers that want to run the same NoSQL store on premise and in the cloud, DynamoDB will be a poor fit. Those who are optimizing for convenience and cost predictability, however, may well prefer Amazon’s offering.

Amazon would clearly prefer the latter outcome, but both are likely acceptable. Amazon’s history is built on releasing products early and often, adjusting both offerings and pricing based on adoption and usage.

In any event, this is a notable launch and one that will continue to drive competition on and off the cloud in the months ahead.

Disclosure: Basho is a RedMonk client, while Amazon and DataStax are not.


Todd Hoff (@toddhoffious) asked Is it time to get rid of the Linux OS model in the cloud? in a 1/19/2012 post to his High Scalability blog:

You program in a dynamic language, that runs on a JVM, that runs on an OS designed 40 years ago for a completely different purpose, that runs on virtualized hardware. Does this make sense? We've talked about this idea before in Machine VM + Cloud API - Rewriting The Cloud From Scratch, where the vision is to treat cloud virtual hardware as a compiler target and convert high-level language source code directly into kernels that run on it.

As new technologies evolve, the friction created by our old tool chains and architecture models becomes ever more obvious. Take, for example, what a team at UCSD is releasing: a phase-change memory (PCM) prototype - a solid-state storage device that provides performance thousands of times faster than a conventional hard drive and up to seven times faster than current state-of-the-art solid-state drives (SSDs). However, PCM has access latencies several times slower than DRAM.

This technology has obvious mind blowing implications, but an interesting not so obvious implication is what it says about our current standard datacenter stack. Gary Athens has written an excellent article, Revamping storage performance, spelling it all out in more detail:

Computer scientists at UCSD argue that new technologies such as PCM will hardly be worth developing for storage systems unless the hidden bottlenecks and faulty optimizations inherent in storage systems are eliminated.

Moneta bypasses a number of functions in the operating system (OS) that typically slow the flow of data to and from storage. These functions were developed years ago to organize data on disk and manage input and output (I/O). The overhead introduced by them was so overshadowed by the inherent latency in a rotating disk that they seemed not to matter much. But with new technologies such as PCM, which are expected to approach dynamic random-access memory (DRAM) in speed, the delays stand in the way of the technologies' reaching their full potential. Linux, for example, takes 20,000 instructions to perform a simple I/O request.

By redesigning the Linux I/O stack and by optimizing the hardware/software interface, researchers were able to reduce storage latency by 60% and increase bandwidth as much as 18 times.

The I/O scheduler in Linux performs various functions, such as assuring fair access to resources. Moneta bypasses the scheduler entirely, reducing overhead. Further gains come from removing all locks from the low-level driver, which block parallelism, by substituting more efficient mechanisms that do not.

Moneta performs I/O benchmarks 9.5 times faster than a RAID array of conventional disks, 2.8 times faster than a RAID array of flash-based solid-state drives (SSDs), and 2.2 times faster than fusion-io's high-end, flash-based SSD.

The next step in the evolution is to reduce latency by removing the standard I/O calls completely and:

Address non-volatile storage directly from my application, just like DRAM. That's the broader vision—a future in which the memory system and the storage system are integrated into one.

A great deal of the complexity in database management systems lies in the buffer management and query optimization to minimize I/O, and much of that might be eliminated.

But there's a still problem in the latency induced by the whole datacenter stack (paraphrased):

This change in storage performance is going to force us to look at all the different aspects of computer system design: low levels of the OS, through the application layers, and on up to the data center and network architectures. The idea is to attack all these layers at once.

In Revisiting Network I/O APIs: The netmap Framework, written by Luigi Rizzo, the theme of a mismatch between our tools and technology continues:

Today 10-gigabit interfaces are used more and more in datacenters and servers. On these links, packets flow as fast as one every 67.2 nanoseconds, yet modern operating systems can take 10-20 times longer just to move one packet between the wire and the application. We can do much better, not with more powerful hardware but by revising architectural decisions made long ago regarding the design of device drivers and network stacks.

In current mainstream operating systems (Windows, Linux, BSD and its derivatives), the architecture of the networking code and device drivers is heavily influenced by design decisions made almost 30 years ago. At the time, memory was a scarce resource; links operated at low (by today's standards) speeds; parallel processing was an advanced research topic; and the ability to work at line rate in all possible conditions was compromised by hardware limitations in the NIC (network interface controller) even before the software was involved.

There's a whole "get rid of the layers" meme here based on the idea that we are still using monolithic operating systems from a completely different age of assumptions. Operating systems aren't multi-user anymore, they aren't even generalized containers for running mixed workloads, they are specialized components in an overall distributed architecture running on VMs. And all that overhead is paid for by the hour to a cloud provider, by greater application latencies and by the means required to overcome them (caching, etc).

Scalability is often associated with specialization. We create something specialized in order to achieve the performance and scale that we can't get from standard tools. Perhaps it's time to see the cloud not as a hybrid of the past, but as something that should be specialized, something that is different by nature. We are already seeing networking transform away from the former canonical hardware-driven models to embrace radical new ideas such as virtual networking.

Your mission, should you choose to accept it, is to rethink everything. Do we need a device driver layer? Do we need processes? Do we need virtual memory? Do we need a different security model? Do we need a kernel? Do we need libraries? Do we need installable packages? We are stuck in the past. We can hear the creakiness of the edifice we've built, layer by creaky layer, all around us. How will we build applications in the future, and what kind of stack will help us get there faster?


Joe Brockmeier (@jzb) reported Red Hat Goes After VMware Hard with Red Hat Enterprise Virtualization 3.0 in a 1/18/2012 post to the ReadWriteCloud:

Red Hat Enterprise Virtualization (RHEV) 3.0 has been in the works for some time. Today Red Hat took the wraps off the release. Red Hat boasts more than 1,000 new features in RHEV 3.0, including a new user portal for self-provisioning, local storage support, and a management application converted to Java and running on JBoss. With RHEV 3.0, Red Hat is going straight after VMware for customers.

RHEV 3.0 has been in beta since last August, and in an open beta, available to anyone with a Red Hat Network account, since September of last year.

If you look at the major features in RHEV 3.0, you'll see that many come directly from improvements to the Linux kernel and KVM. RHEV 3.0 now has support for up to 160 logical CPUs and 2TB of RAM. The KVM networking stack has moved out of userspace and into the Linux kernel itself for better performance. RHEV 3.0 now supports memory overcommitment, which allows allocating more RAM to VMs than is physically present on the host.

Red Hat has also beefed up its scheduler, live migration, desktop management, storage management, reports and migration tools. But where Red Hat is really getting aggressive is pricing and messaging targeted at VMware's vSphere Enterprise and VMware View.

RHEV Pricing

Red Hat offers pricing guides for RHEV for Servers and RHEV for Desktops that compare pricing between RHEV and VMware's products. According to Red Hat's guides, a scenario with 100 virtual guests running on six servers (each with two sockets and 400GB of RAM) will cost nearly $50,000 the first year with VMware vSphere Enterprise Edition. The same setup with RHEV 3.0 for Servers runs just under $9,000.

six-server-pricing.png

The big difference in pricing, of course, is licensing. Red Hat doesn't charge for licensing – it charges for annual subscriptions and support. The licensing cost for VMware vSphere in this scenario is nearly $40,000. The annual support/subscription costs for Red Hat and VMware are fairly close: $8,988 for Red Hat and $9,877 for VMware. Red Hat is still cheaper than VMware on that front, but not by much; the first-year totals work out to roughly $40,000 in licenses plus $9,877 in support for VMware, versus the $8,988 subscription alone for Red Hat.

11-server-pricing.png

Another scenario, with 11 servers for 250 guests, is priced at $16,478 (Red Hat) versus $189,742 (VMware) for the first year. Red Hat continues to close the feature gap between RHEV and vSphere while maintaining a very wide gap in price. The question is, who's buying? Is RHEV good enough to start displacing VMware vSphere and VMware View?


<Return to section navigation list>
