Wednesday, July 24, 2013

Machine Learning as a Service gaining in popularity among those who matter: the implementers

I had the opportunity to speak on a panel last night about Machine Learning as a Service. My fellow panelists were from BigML, Snap Analytx, and Grok.
One of the questions I was asked before the event was “What needs to happen for broader acceptance of Machine Learning as a Service?” My answer is below:
“I think this hinges on a broader understanding of how intelligent algorithms like machine learning can add value. This means more discussion of successful use cases that show how these algorithms have a meaningful impact, especially compared to traditional approaches. I’m looking forward to the day when there are so many compelling stories that Amazon and Netflix aren’t among the first examples people use to talk about Machine Learning.
I have noticed that an increasing number of people we talk to have spent some time educating themselves on this concept, and it’s not all techies. Many are being exposed to the topic as part of the broader discussion around Big Data. Their analysis is evolving from “How would I use this in my organization?” to “How do I fit this within my organization’s budget and constraints?” People are wondering whether to build it out internally or whether there is a way to access the technology without building out internal resources. Those are the people we really enjoy talking to.”

Saturday, July 20, 2013

Two Technology Hurdles We’ve Overcome

I was recently interviewed, and one of the questions asked was “What were some of the biggest technology challenges you had to overcome?” My answer is below:

First off, what we’re doing today is really enabled by the latest advances in cloud computing. Even 18 months ago it would have been nearly impossible to build this, and all of the people able to do it would have been locked up at one of the large technology companies.

That said, there were two really difficult technical challenges to overcome: abstracting algorithms into reusable components and building a cross-cloud orchestration layer.

Abstracting and normalizing algorithms written in different programming languages into a standard format was an interesting challenge. We can currently take in algorithms written in R, Java, PHP, Python, and C, as well as MapReduce and Pig functions for Hadoop, and chain them all together without writing custom glue code. This approach makes our system infinitely customizable and lets our customers combine the best open source code with their own proprietary algorithms. The net impact is that companies can get more leverage out of their current data science team, or, for those without a data scientist, start using these tools with limited ramp-up.
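To make the idea of "chaining without glue code" concrete, here is a minimal sketch assuming one hypothetical contract: every algorithm, whatever its language, is exposed as a function from JSON text to JSON text. The names `wrap_python`, `wrap_command`, and `chain` are illustrative only, not our actual platform code; a separate Python process stands in for an algorithm written in another language.

```python
import json
import subprocess
import sys
from typing import Any, Callable, List

# Hypothetical uniform contract: every algorithm, whatever language it is
# written in, is exposed as a function from JSON text to JSON text.
Component = Callable[[str], str]


def wrap_python(fn: Callable[[Any], Any]) -> Component:
    """Adapt an in-process Python function to the JSON-in/JSON-out contract."""
    def component(payload: str) -> str:
        return json.dumps(fn(json.loads(payload)))
    return component


def wrap_command(cmd: List[str]) -> Component:
    """Adapt any executable that reads JSON on stdin and writes JSON on stdout."""
    def component(payload: str) -> str:
        done = subprocess.run(cmd, input=payload, capture_output=True,
                              text=True, check=True)
        return done.stdout
    return component


def chain(components: List[Component]) -> Component:
    """Compose components so each one's output is the next one's input."""
    def pipeline(payload: str) -> str:
        for component in components:
            payload = component(payload)
        return payload
    return pipeline


def scale_to_max(xs):
    top = max(xs)
    return [x / top for x in xs]


def summarize_fn(xs):
    return {"mean": sum(xs) / len(xs), "n": len(xs)}


# A separate process stands in for an algorithm written in another language.
double = wrap_command([
    sys.executable, "-c",
    "import json, sys; print(json.dumps([2 * x for x in json.load(sys.stdin)]))",
])
normalize = wrap_python(scale_to_max)
summarize = wrap_python(summarize_fn)

pipeline = chain([double, normalize, summarize])
print(pipeline(json.dumps([1, 2, 4, 1])))  # {"mean": 0.5, "n": 4}
```

The design choice is that the pipeline never inspects what is inside a component; once each algorithm honors the same input/output format, adding a new language is a matter of writing one more adapter.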

Our cross-cloud orchestration layer is another useful piece of technology that’s fairly unique. The platform currently runs across Rackspace, Amazon Web Services, Google Compute Engine, and HP Cloud, and Microsoft Azure is on our product roadmap. This allows us to process our customers’ data in whichever cloud environment they already use. We can plug in new database and execution environments and dynamically spin execution resources (e.g., a Hadoop cluster, an R server, a Java server) up and down in each of those clouds. With our alpha system alone, that is over 55,000 unique data analysis combinations to manage PER CLOUD. We do all of this programmatically today, and it’s getting better all the time.
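The pattern of plugging in clouds and spinning resources up and down per job can be sketched with a simple adapter interface. This is a hypothetical illustration, not our orchestration code: `CloudProvider`, `FakeProvider`, `Orchestrator`, and environment names such as `"r-server"` are invented for the example, and the fake provider only tracks resources in memory where a real adapter would call each cloud's own API.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class CloudProvider(ABC):
    """Hypothetical adapter interface: one subclass per supported cloud."""

    @abstractmethod
    def start(self, env_type: str) -> str:
        """Provision an execution environment and return its resource ID."""

    @abstractmethod
    def stop(self, resource_id: str) -> None:
        """Release a previously provisioned environment."""


class FakeProvider(CloudProvider):
    """In-memory stand-in; a real adapter would call that cloud's own API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def start(self, env_type: str) -> str:
        resource_id = f"{self.name}-{env_type}-{next(self._ids)}"
        self.running[resource_id] = env_type
        return resource_id

    def stop(self, resource_id: str) -> None:
        self.running.pop(resource_id, None)


class Orchestrator:
    """Runs each job in the customer's cloud, spinning resources up and down."""

    def __init__(self) -> None:
        self.providers: Dict[str, CloudProvider] = {}

    def register(self, cloud: str, provider: CloudProvider) -> None:
        # Adding a cloud means registering one more adapter; run() is unchanged.
        self.providers[cloud] = provider

    def run(self, cloud: str, env_type: str, job: Callable[[str], T]) -> T:
        provider = self.providers[cloud]
        resource_id = provider.start(env_type)  # spin up
        try:
            return job(resource_id)
        finally:
            provider.stop(resource_id)  # spin down even if the job fails


orchestrator = Orchestrator()
orchestrator.register("aws", FakeProvider("aws"))
orchestrator.register("rackspace", FakeProvider("rackspace"))
print(orchestrator.run("rackspace", "r-server", lambda rid: f"ran on {rid}"))
# ran on rackspace-r-server-1
```

The `try`/`finally` is the important part of the sketch: resources are released whether the job succeeds or fails, which matters when you pay for every minute an execution environment stays up.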

Friday, July 12, 2013

We Now Support Async Calls

We have just released new functionality that lets you make an asynchronous call to any of our algorithms via our API. Previously, every algorithm’s API call was synchronous, which means that if an algorithm takes 5 minutes to run, the API call holds the connection open for 5 minutes. This is not ideal in some situations. The new asynchronous call functionality lets you run the same algorithm, but instead of waiting for it to finish, the system returns a job ID that you can use to query the status of the job.
Let me show you how this works.  It is not too different from before.
Making an asynchronous call: the only change to the call is that the parameter “method” is now set to “async”.
curl -X POST \
-d 'method=async' \
-d 'outputType=json' \
-d 'train=3339' \
-d 'test=3340' \
-d 'dependentVariable=closed' \
-H "authToken: <AUTH_TOKEN>" \
"<ALGORITHM_URL>"
This call returns immediately. Use the returned “job_id” to query the status of the job.
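For readers calling the API from Python rather than curl, here is a minimal sketch that builds the same asynchronous POST with the standard library's `urllib`. The endpoint URL below is a placeholder (the real algorithm URL is not shown here), and the function name `build_async_request` is hypothetical; the form fields mirror the curl example.

```python
import urllib.parse
import urllib.request


def build_async_request(url: str, auth_token: str,
                        params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that asks for asynchronous execution."""
    form = dict(params, method="async")  # the one change from a synchronous call
    body = urllib.parse.urlencode(form).encode("ascii")
    # urllib normalizes the header name to "Authtoken"; HTTP header names
    # are case-insensitive, so this matches the curl example's "authToken".
    return urllib.request.Request(
        url,
        data=body,
        headers={"authToken": auth_token},
        method="POST",
    )


req = build_async_request(
    "https://example.com/your-algorithm",  # placeholder, not the real endpoint
    "<AUTH_TOKEN>",
    {"outputType": "json", "train": "3339", "test": "3340",
     "dependentVariable": "closed"},
)
# Sending it: urllib.request.urlopen(req).read() returns the response body,
# which should contain the "job_id" described above.
```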
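curl -X GET \
-H "authToken: <AUTH_TOKEN>" \
"<JOB_STATUS_URL>"
The response looks like this (abridged; omitted fields are shown as ...):
{
    ...
    "additional_info": {
        "final": {
            "output": {
                ...
            }
        }
    },
    "datasource": {
        ...
    },
    "created": {
        "date": "2013-07-09 22:20:29",
        ...
    },
    "last_modified": {
        "date": "2013-07-09 22:20:29",
        ...
    }
}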
The job moves through various states depending on the algorithm and ends in one of two final states: “ERROR” or “COMPLETED”. Once you see either one, the output is final. The algorithm’s results are placed into a datasource for you. The datasource ID appears in two places in the job query response, and both contain the same ID.
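A client typically polls the status query until one of the two final states appears. The sketch below is hypothetical: `get_job` stands for whatever performs the GET request and parses its JSON, the field name `"status"` is an assumption (it is configurable via `status_key`), and `"RUNNING"` is an invented intermediate state used only to simulate responses.

```python
import time
from typing import Callable, Dict

# The two final states named above.
FINAL_STATES = {"COMPLETED", "ERROR"}


def wait_for_job(get_job: Callable[[], Dict], interval: float = 5.0,
                 timeout: float = 600.0, status_key: str = "status") -> Dict:
    """Poll until the job reaches a final state; raise TimeoutError otherwise.

    get_job is a hypothetical callable that performs the status query and
    returns the parsed JSON response. status_key is an assumed field name.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_job()
        if job.get(status_key) in FINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a final state in time")
        time.sleep(interval)


# Simulated responses in place of real GET requests ("RUNNING" is made up).
responses = iter([{"status": "RUNNING"}, {"status": "RUNNING"},
                  {"status": "COMPLETED"}])
final = wait_for_job(lambda: next(responses), interval=0)
print(final["status"])  # COMPLETED
```

Once `wait_for_job` returns a job in the “COMPLETED” state, the datasource ID in that response is where the results live.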
That’s about it. Asynchronous calls made easy!

Saturday, July 6, 2013

The increasing importance of Algorithms: Article by The Guardian's Leo Hickman

The Guardian’s Leo Hickman published a new article this week about the increasingly pervasive use of algorithms in today’s data-driven world.
The full article is available on The Guardian’s website.
The article included an interesting quote from Chris Steiner, author of Automate This: How Algorithms Came to Rule Our World.
Steiner argues that we should not automatically see algorithms as a malign influence on our lives, but we should debate their ubiquity and their wide range of uses. “We’re already halfway towards a world where algorithms run nearly everything. As their power intensifies, wealth will concentrate towards them. They will ensure the 1%-99% divide gets larger. If you’re not part of the class attached to algorithms, then you will struggle.”
In today’s digital world, more businesses than ever before have access to data about their customers and operations. Much more is to come: Silicon Valley has invested over $1B in Big Data since Q2 2011. The opportunities to turn this resource into revenue will continue to evolve. The businesses that can do this with algorithms will get the most leverage and, as recommendations have demonstrated in e-commerce, will set a new standard that all others must follow or fail.