Unlocking mass adoption of high-end graphics through cloud computing


GPUs are the key component behind high-end graphics applications: AI, high-end gaming, CAD and rendering software, XR and much more. Usually, this means users need to own high-end GPUs in order to access these kinds of applications from their devices.

But GPUs tend to be pricey, power-hungry pieces of hardware that not every device can accommodate. Requiring a high-end GPU in every one of our users’ devices would restrict our total addressable market to a very narrow segment.

Cloud computing to the rescue

With the advent of public clouds like AWS, software makers (i.e. us!) can now offer users cloud-based products where advanced hardware requirements are lifted from client devices and GPU workloads are largely offloaded to the cloud. Pretty neat, huh?

Even though our experience building these kinds of applications has been extremely positive, this architectural style does come with some pitfalls.

What follows is an inside look at some of the challenges we ran into as well as our overall outlook on this very promising technology.

(Spoiler alert: we love it ❤️).

A primer on cloud-based GPU architecture

In a nutshell, the way we offload computer graphics onto the cloud is by:

  1. We deploy our software to a cluster of GPU-enabled servers in the cloud.
  2. When a user starts a session, we spin up the app on one of those servers on demand.
  3. The server receives the user’s commands over the internet and updates the application accordingly.
  4. We capture the application’s video buffer in real time.
  5. We stream that video buffer back to the user’s device over a websocket.

Technical note: This works very differently from other streaming architectures (OTT platforms like YouTube, for example). Since latency is a critical factor, there’s no video buffering in place. That also means there’s no adaptive bitrate streaming, since that would complicate the stack (although it could certainly be implemented on top). There’s also no need for a DRM scheme to protect content, since the websocket connection is already encrypted at the transport level (TLS, i.e. the wss:// protocol).
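
To make the flow above a bit more concrete, here’s a minimal browser-side sketch of the client end of that pipeline. The endpoint URL, the canvas id and the very naive “one JPEG frame per websocket message” protocol are assumptions made for illustration, not our actual production setup.

```typescript
// Minimal client sketch: receive frames over a secure websocket, paint them
// onto a canvas, and forward user input back to the remote application.
// The endpoint URL, the "#stream" canvas id and the one-JPEG-frame-per-message
// protocol are illustrative assumptions, not our real implementation.
const canvas = document.querySelector<HTMLCanvasElement>('#stream')!;
const ctx = canvas.getContext('2d')!;

const socket = new WebSocket('wss://gpu-host.example.com/stream'); // placeholder endpoint
socket.binaryType = 'arraybuffer';

socket.onmessage = async (event: MessageEvent) => {
  if (typeof event.data === 'string') return; // ignore non-frame (control) messages
  // Each binary message is assumed to carry one encoded frame of the app's video buffer.
  const bitmap = await createImageBitmap(new Blob([event.data], { type: 'image/jpeg' }));
  ctx.drawImage(bitmap, 0, 0, canvas.width, canvas.height);
  bitmap.close();
};

// User commands travel in the other direction as small JSON messages.
canvas.addEventListener('pointermove', (e) => {
  socket.send(JSON.stringify({ type: 'pointer', x: e.offsetX, y: e.offsetY }));
});
```

A real client would add frame pacing, reconnection logic and a proper video codec instead of per-frame JPEGs, but even this naive version shows why the client device needs no GPU muscle of its own.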

Enabling the metaverse: A case study of what cloud GPUs can help accomplish

The metaverse has been all the rage lately. Still, a lot of popular metaverses are extremely lacking in graphics quality, and for good reason: they are trying to cast the widest possible net by making sure their platforms can run on severely underpowered devices. The result often looks like an ’80s video game: accessible, sure… but kinda underwhelming.

For our first metaverse project we took a different approach. Our client required exciting, console-level graphics, so we decided to develop it using Unreal Engine given its superb world-building features and its unparalleled graphics quality.

The main goal – besides knock-your-socks-off graphics – was to make the experience as frictionless as possible for users.

This meant:

  • No powerful GPU required.
  • No need to download a huge 10GB+ game binary.
  • No installation required.
  • Transparent software updates with no downtime.

By running the entire multiplayer experience in the cloud, we remove all of this friction: all users have to do is go online on a consumer-grade device and connect to a server.

Easy peasy, right? Well, not exactly…

The cloud is not all sunshine and rainbows (yet)

The following are some of the main challenges we faced during development. The good news is they were all solvable!

Fast or furious?

One of the main concerns we had going in was network latency. Would users be happy with running a game remotely on a server given the latency introduced by the internet?

To our surprise, our tests were incredibly encouraging. We experienced near-zero perceived latency despite having our servers in the US and our development team in Buenos Aires; input feedback felt practically instantaneous.
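
For the curious, here’s roughly how you can sanity-check that feeling with numbers, reusing the websocket from the earlier sketch. The “ping”/“pong” message types are a hypothetical protocol detail; any request/response pair over the same connection works the same way.

```typescript
// Rough round-trip time probe over the same websocket used for streaming.
// The ping/pong message types are an assumption made for this example.
function measureRtt(socket: WebSocket): Promise<number> {
  return new Promise((resolve) => {
    const sentAt = performance.now();
    const onMessage = (event: MessageEvent) => {
      if (typeof event.data !== 'string') return;       // skip binary video frames
      if (JSON.parse(event.data).type !== 'pong') return; // wait for the echo
      socket.removeEventListener('message', onMessage);
      resolve(performance.now() - sentAt); // elapsed milliseconds
    };
    socket.addEventListener('message', onMessage);
    socket.send(JSON.stringify({ type: 'ping' }));
  });
}

// Usage: log the round trip every few seconds during a session.
setInterval(async () => {
  console.log(`RTT: ${(await measureRtt(socket)).toFixed(1)} ms`);
}, 5000);
```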

We truly are living in the future!

Not all networks are created equal

One of the main issues we faced is that under some network conditions, the streaming traffic can be blocked by NATs and restrictive subnet configurations. On top of that, a lot of mobile networks block WebRTC traffic outright. Needless to say, this was far from ideal.

The solution to these issues comes in the form of STUN and TURN servers. These help negotiate a path between our servers and client devices: STUN lets clients discover their public address so a direct connection can be attempted, while TURN relays the traffic when a direct connection isn’t possible, so streaming sessions work as expected.

Unfortunately, this comes at the cost of additional cloud infrastructure complexity, but it’s a must if we want to guarantee service availability under the widest possible range of network conditions.
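
As an illustration, here’s what wiring STUN and TURN into the client connection looks like in the browser. The server URLs and credentials are placeholders only; in a real deployment they come from your infrastructure, and TURN credentials are usually short-lived.

```typescript
// Client-side connection setup with STUN/TURN fallback. All URLs and
// credentials below are placeholders for illustration only.
const peer = new RTCPeerConnection({
  iceServers: [
    // STUN lets the client discover its public address so a direct path is tried first.
    { urls: 'stun:stun.example.com:3478' },
    // TURN relays the traffic when NATs or firewalls block a direct connection.
    {
      urls: 'turn:turn.example.com:3478',
      username: 'session-user',        // placeholder credential
      credential: 'session-password',  // placeholder credential
    },
  ],
});

// Signaling (offers, answers, ICE candidates) still travels over a websocket.
const signaling = new WebSocket('wss://gpu-host.example.com/signal'); // placeholder endpoint
peer.onicecandidate = (event) => {
  if (event.candidate) {
    signaling.send(JSON.stringify({ type: 'ice-candidate', candidate: event.candidate }));
  }
};
```

The extra moving part is the TURN relay itself, which is exactly the infrastructure complexity mentioned above.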

Cloud compute costs

Another rather obvious question has to do with the cost of running high-end applications on the cloud. Wouldn’t it be prohibitively expensive to do so?

As of this writing, the price of a dedicated cloud server boasting an NVIDIA T4 GPU, 4 CPU cores and 16 GB of RAM is $1.58 per hour (not including other infrastructure costs like data transfer, storage, etc.). Better still, compute costs keep going down over time.

So depending on the product’s pricing strategy, this makes it pretty darn affordable on a per-user basis!
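
To put that figure in per-user terms, here’s a back-of-the-envelope model. The idea that one GPU server can be shared by a given number of concurrent sessions is our assumption for the sake of illustration; the $1.58/hour rate is the one quoted above and excludes data transfer and storage.

```typescript
// Back-of-the-envelope session cost, based on the on-demand rate quoted above.
// Data transfer, storage and idle capacity are deliberately left out.
const HOURLY_RATE_USD = 1.58;

function sessionCostUsd(minutes: number, sessionsPerGpu = 1): number {
  // Assumes one dedicated GPU server is shared by `sessionsPerGpu` concurrent sessions.
  return (HOURLY_RATE_USD / sessionsPerGpu) * (minutes / 60);
}

console.log(sessionCostUsd(30).toFixed(2));    // "0.79" – a 30-minute single-user session
console.log(sessionCostUsd(60, 2).toFixed(2)); // "0.79" – one hour shared by two sessions
```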

Will it scale?

Finally, the issue of scale. Can this type of architecture scale to thousands, or even millions of users?

Reaching this level of concurrency has many architectural implications beyond just cloud server availability. But leaving those aside, getting enough cloud servers would most likely require a multi-cloud approach spanning more than one provider. And even then, we would probably need to schedule user sessions so they don’t all happen at the same time.

As an example, AWS puts a limit on how many dedicated GPU-enabled servers an account can spin up. We don’t really know how many total GPUs AWS has (although we’re pretty sure it’s not an infinite number!).
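
One pragmatic way to live within those limits is to gate new sessions on the capacity we can actually provision and queue the overflow instead of failing. Here’s a toy sketch; the capacity number and the single in-process gate are simplifications (a real deployment would track capacity across regions and providers).

```typescript
// Toy session gate: admit users up to the number of GPU servers we can
// actually provision, and queue the rest until a slot frees up.
class SessionGate {
  private active = 0;
  private waiting: Array<() => void> = [];

  constructor(private readonly maxConcurrentSessions: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.maxConcurrentSessions) {
      this.active++;
      return;
    }
    // Capacity exhausted: park this user until release() hands a slot over.
    await new Promise<void>((resolve) => this.waiting.push(resolve));
  }

  release(): void {
    const next = this.waiting.shift();
    if (next) {
      next(); // hand the freed slot directly to the longest-waiting user
    } else {
      this.active--;
    }
  }
}

// Usage with an illustrative account limit of 32 GPU instances.
const gate = new SessionGate(32);
async function startSession(runSession: () => Promise<void>) {
  await gate.acquire();
  try {
    await runSession(); // spin up the app on a GPU server, stream, etc.
  } finally {
    gate.release();
  }
}
```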

Another possible strategy would be to offer a downloadable version so that not all user sessions run in the cloud, alleviating some of this pressure and, of course, bringing down infrastructure costs.

Summary

In this blog post we wanted to share what moving GPU computing to the cloud looks like in practice. We found that it’s not only possible but also cost-effective to do so.

We really believe this is just the start. Cloud-based solutions like these can go well beyond this particular case study: countless other GPU-centric applications could benefit from this architecture, pushing into domains like artificial intelligence, video editing, 3D authoring software and so much more.

We are incredibly excited about all the possibilities this will open up in the near future.

And you should be too!

Need help getting your product launched? Connect with our team of experts today to fast track your project.