Mosami: providing sophisticated, complex video processing via a simple, transparent API

“APIs are really about dramatically scaling the number of other people who can make use of your core idea.”

Francis Zane (fzane@mosami.com) is a co-founder of Mosami, who has in past lives worked on complexity theory, web content delivery, and venture incubation. To see what he works on now, see http://blog.mosami.com


Tell us more about the Mosami API
Video conferencing has been shifting from something only a few experts could build to something integrated into tons of applications, and even appearing as a feature in web browsers, through Flash and WebRTC.

With that shift, we see an opportunity to not just transport the video, but to let developers shape real-time video in the way that TV and movie producers and desktop video editing programs have done with offline video. We want to give people the ability to create custom video experiences for different use cases: social gaming, tech support, social content, and more.

To do that, we provide sophisticated, complex video processing via a simple, transparent API. We created a set of media processing building blocks—algorithms to detect speech, faces and movement, to remove the background, to compose videos together, to overlay graphics, and more. And we’ve constructed a platform which developers can use to plug together all these advanced tools in novel ways to create their own video interactions.

What does the Mosami API offer?
Our API lets developers create imaginative and useful video experiences. We use REST because it integrates easily into many platforms and many languages. The platform hides much of the complexity of real-time video processing, and gives just the right amount of flexibility so it can be used in interesting and novel ways. We also provide code samples for sending video between web browsers and the cloud that can be embedded in our customers’ web pages.
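
To make the idea of plugging building blocks together over REST a little more concrete, here is a minimal sketch in Python. It is an illustration only and not Mosami's actual API: the base URL, the `/pipelines` resource, the operation names, and the bearer-token header are all hypothetical, chosen to mirror the building blocks mentioned above (background removal, face detection, graphic overlays). The request is built but never sent.

```python
import json
from urllib.request import Request

# Hypothetical base URL; not taken from Mosami's documentation.
BASE_URL = "https://api.example.com/v1"


def build_pipeline_request(stream_id, operations, api_key):
    """Build (but do not send) a POST request that chains operations on one stream.

    The payload shape and the /pipelines resource are assumptions for illustration.
    """
    payload = {"stream": stream_id, "operations": operations}
    return Request(
        url=f"{BASE_URL}/pipelines",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Chain three hypothetical building blocks on a single webcam stream.
request = build_pipeline_request(
    stream_id="webcam-1",
    operations=[
        {"type": "remove_background"},
        {"type": "detect_faces"},
        {"type": "overlay", "image": "logo.png"},
    ],
    api_key="YOUR_API_KEY",
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The point of the sketch is the shape of the interaction: the developer describes which operations to apply to which stream in a plain JSON document, and everything hardware-sensitive stays on the server side.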

Why did we choose a platform-as-a-service model? Real-time video is still a hard problem that requires a specialized set of skills. Multimedia code is complex, hardware-sensitive, difficult to configure and tricky to manage. If the barrier to entry is being able to deal with that, we felt it would leave a lot of creative people and ideas on the table. Our API is designed to remove all the technical difficulties that have made multimedia programming so complex, and present to developers a simple, lightweight, yet powerful model that fits into their own development patterns and needs.

When did Mosami realize that it needed an API?
From very early on, we were convinced that the technology could enable many, many different kinds of new video applications. At the same time, there were only a handful of us working on it, and only so many applications we could develop ourselves. So the question for us became, how do we get lots more people involved? The answer was to make something that’s unbelievably easy to use, seamlessly scales from garage projects to enterprise deployments, and is loaded with cool features. So we’re chasing that goal. Running as a service in the cloud with APIs, rather than as software libraries or applications to install, really let us simplify the process of building your own video application. That opens up to us a world of people who would not have become involved before, including people who want to focus more on the experience than the technology.

What recommendations and tips would you give to a company planning to launch an API?
The best way (really, the only way) to understand your API is to build apps yourself. At the beginning, you’ve probably thought more about your capabilities and how to use them than anyone else, and you can help your developers get started by sharing that insight through sample applications. Never underestimate the power of simplifying your API, and then simplifying it again. Stick to standards when they make sense, because it’s easier to find support for existing tools than to build your own knowledge base. Make sure the API is explicit and transparent; by that I mean the developer should intuitively understand how and why the API is designed the way it is. Document everything, not just because your users will appreciate it, but because it forces you to smooth out any inconsistencies, and to see the API from the user’s perspective. It’s often easier to fix the API than to describe why it’s quirky.

Why and how are you using 3scale API Management Platform today?
We’re leveraging the 3scale platform to do monitoring, limiting, and billing. We provide a rich API for a fairly complex service that includes management of real-time streams, and a large library of operations a developer can apply to those streams in unlimited combinations. So we need to track multiple categories of resources carefully, and the 3scale monitoring gives us and our developers a good top-level view of what’s happening.

What is your vision for your API?
We want to provide an essential building block for real-time video composition on the web. We’ve put a lot of effort into tuning our system so that large numbers of complex algorithms snap together as easily as Legos. Looking ahead, we have lots of new features in the pipeline, including 3D composition and rendering, object detection and tracking, richer support for gestures and motion, and more, and we see many more areas to expand our functionality over time. Our vision is that the API makes new functionality simple for our customers to use, so they already understand how to integrate something new into their product as soon as we release it. Our API is the part of the product our customer touches, so it’s the best place to communicate that we really have something simple and powerful they can build great things with.


Mosami’s API gives you easy access to a whole suite of real-time video processing algorithms – face detection, background extraction, graphic overlays, programmable composition, and more – running on our servers in the cloud, plus easy tools for your users to connect their webcams to these applications right from their browsers.