How we use Amazon & Google experience to develop Horizon, a Ukrainian SaaS iGaming platform for 45 million users

How Horizon was created

Back in 2016, when we started working on a SaaS iGaming platform, it was a serious challenge. Our company had to compete with the dinosaurs of the niche, if not in terms of budgets, then in terms of technology.

VeliTech was a small startup then with an energised team. “I don’t know how we were so lucky to gather such motivated and highly skilled people focused on results,” says our Technical Lead, Serhii Volokh. We all immediately tried to choose technologies for the platform that would allow it to grow and scale quickly.

It turned out to be the right decision: the SaaS platform developed quite rapidly. A year later, our team received a Maltese gaming licence from the MGA, the platform went into production, and we signed our first customers. As early as 2017, the platform handled 300,000 financial transactions and 1,000 registrations per day. Today, there are more than 45 million users, up to 50,000 registrations and 30 million transactions daily, and the platform has grown to 350 services and 30,000 games.

And we laid the technological foundation for this development at the very beginning of the project.

How we formed the stack

Before starting work on the platform, we considered how its architecture should be designed to be as flexible as possible, easily adapt to changes (both business and technical), and support scalability, at least in the most loaded parts.

Serhii Maiakov
CTO at VeliTech

We laid the foundation for development and scaling from the very beginning of the project. It rests on three fundamental things: microservices, DDD, and a reactive approach. We decided on microservices immediately, and event-driven architecture is the main thing that allowed us to scale and actively develop the project rather than drown in monotonous refactoring. Nowadays, this is hardly surprising, but only a few specialists knew how to do it correctly back then.

I won’t say everything worked out perfectly right away, but we quickly tested ideas with this approach. For example, if we needed to create a new feature, we created a separate service or a functional decomposition of an existing one, and tested it immediately.

There is an alternative approach, with detailed calculations, documentation at every step, and extra safeguards. But with flexible processes (Extreme Programming (XP), iterative processes, agile architecture) we have been able to, and continue to, launch and test new features and receive feedback quickly. This is important for a platform with hundreds of microservices.

Horizon maintained the spirit of a startup: we were always experimenting and choosing technologies that were both in demand and best suited to our tasks. We are still doing this today. Our most complex critical services are written in Scala, Node.js is widely used, and micro-frontends are built on React. Everything related to recommendation systems and reporting is in Python.

This experimental approach also meant that we tested and rejected specific ideas. It worked like this:

– We started with a Java-Scala hybrid, but now only a few services are based on it, and we are rewriting them. And it’s not because Java is bad in itself; it’s a purely practical decision: our main framework is Akka, and it’s based on Scala. Plus, Scala is very convenient for implementing powerful DDD units and pure event-driven services.

– We experimented with different databases, eventually rejecting Mongo and MariaDB and choosing ElasticSearch, Postgres, ClickHouse, and BigQuery.

– We started out running everything on Docker Swarm while only evaluating K8s, but five years ago we switched to K8s completely. This added many useful features related to scaling, CD processes, and reliability to our servers and the platform as a whole.

Initially, the platform was cloud agnostic, and some services were kept locally. However, as the platform grew, we moved everything to AWS infrastructure, which positively impacted performance and significantly improved availability. Now, we use AWS to the maximum: spot instances, RDS, balancers, new Graviton processors, Serverless approaches, and more.

Due to this flexible approach, we have never had a technological bottleneck: We either immediately selected the right stack for the tasks and goals or changed it quite quickly. For example, the transition to Scala (about 60% of the entire platform now) gave us access to a powerful Scala community, so we always have interesting tasks on the one hand and enthusiastic people who are ready to think about them on the other.

In addition to a flexible approach to the technologies that power the platform, our team applies project and management practices from the experience of Google, Amazon, and Netflix.

How to develop a million-dollar project based on Google and Amazon practices

While the team used to focus only on the result, with the development of the platform and the company’s growth, the very concept of “result” has become blurred. So the focus shifted from “just the result” to “the result and the process of achieving it.”

To better organise the processes, we used the practices of FAANG companies, removing all the bureaucracy and leaving only practical things, for example:

Site reliability engineering (SRE) is an approach used by Google and Amazon. It is a set of engineering principles that allow companies to deliver reliable, highly available online services.

“SRE approaches are like a pyramid with the product at the top, but everything starts from the bottom, with Monitoring and Alerting, Root Cause Analysis, Technologies and Best Practices, Capacity Planning and Development. For all of these levels to work effectively, you need a broad vision of the entire system, an understanding of all levels of infrastructure, transitive dependencies, and the nature and specifics of the load.

We are also completely changing the focus of error analysis. For example, we have an SRE board where you can see the state of the entire platform, no matter how huge it is. Whereas our support team used to respond to incidents that had already occurred, we now try to act before they happen. Each incident goes through a rigorous Correction of Errors review (analogous to Amazon’s post-mortem), which analyses the causes and consequences of what happened,” explains Serhii.

Such an analysis does not just show the causes of a problem. It shows its impact, which every team member can see and understand. It also formulates action items – the steps needed to prevent the problem from happening again. All action items are converted into tasks in Jira.

The team also actively uses the 5 Whys approach: five successive “why?” questions that strip away superficial explanations layer by layer and get to the heart of the problem.

The company has drawn up a huge roadmap on how to reach the required SLA (Service Level Agreement) for the most critical services.

All of this gives each team member a common understanding of where the company is heading and how its services should develop, and provides a shared vision of what everyone is working on and why.

What challenges the team faces

The load on the iGaming platform can fluctuate sevenfold, and the number of requests to the platform reaches 10,000 per second. For the team, this means non-trivial technical and business challenges that it has to adapt to.

Serhii Volokh,
Technical Lead, explains how the team solves complex problems:

When we were just starting to build the platform, the only way to differentiate ourselves from huge but sluggish competitors was to make a product in a year that would take our competitors 5 years to build. To do this, we chose approaches and a stack that were non-standard at the time and hired top engineers. This also meant quite serious technological challenges for us.

The load on the platform at peak times reaches 6,000 to 10,000 requests per second, our critical services process up to 30 million financial transactions daily, and we break records in terms of traffic and finances every month. Of course, such dynamics imply constant challenges.

The company and the business are completely tech-driven, and tech forms the structure and determines the business’s ability to scale. At VeliTech, our engineers focus on both the tech and business parts of product development. I, as a tech and team leader, am involved in the product oversight of my domain and form a vision for its development. This is a completely different level of involvement and motivation.

For example, one of our clients wanted to build large-scale marketing campaigns on our platform so that players could receive a reward after fulfilling certain conditions and move directly to another campaign. These would have had to be separate entities: connections to 10 campaigns, 10 screens and configurations, sometimes even more.

Given the thousands of player segments and campaigns, it would have been extremely difficult to manage. I came up with a different concept, “Multi-level campaigns”: one screen and many configurable levels. This dramatically reduced the operational complexity of using the product. In effect, I, as an engineer, delivered business value: I simplified the client’s requirements as much as possible and implemented them in tech. This is our approach to complex tasks.
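To make the “one screen and many configurable levels” idea concrete, here is a minimal Python sketch. It is not Horizon’s actual data model: the names `Level`, `MultiLevelCampaign`, `required_deposits` and `rewards_for` are hypothetical, and a deposit count stands in for whatever conditions a real campaign would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    required_deposits: int  # hypothetical condition: deposits needed to clear this level
    reward: str             # hypothetical reward label


@dataclass(frozen=True)
class MultiLevelCampaign:
    """One campaign entity holding an ordered list of levels,
    instead of one separate campaign per step."""
    name: str
    levels: List[Level]

    def rewards_for(self, deposits: int) -> List[str]:
        """Return the rewards of every consecutive level the player has cleared."""
        earned = []
        for level in self.levels:
            if deposits < level.required_deposits:
                break  # levels are ordered, so stop at the first unmet condition
            earned.append(level.reward)
        return earned


campaign = MultiLevelCampaign(
    name="welcome",
    levels=[
        Level(required_deposits=1, reward="10 free spins"),
        Level(required_deposits=3, reward="20 free spins"),
        Level(required_deposits=5, reward="cashback"),
    ],
)

print(campaign.rewards_for(3))
```

The point of the sketch is that adding a step means appending one more `Level` to a single configuration rather than creating and linking another campaign.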

Another huge challenge for us was to build the platform on a microservice architecture. In 2016, when microservices were just starting to be talked about in Kyiv, we all actively followed foreign tech blogs of FAANG companies, Twitter and Reddit, so we were aware of all the trends.

We borrowed many microservice architecture ideas from Netflix. We started using Kafka and event-driven design, which allowed us to scale the business and build a platform on hundreds of microservices. This allowed us to create a highly customisable product in the UI and backend.
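The decoupling that event-driven design provides can be illustrated with a small Python sketch. It deliberately does not use Kafka: `InMemoryBus` is a hypothetical in-process stand-in for a broker, and the topic name `payment.completed` and the two handlers are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    topic: str     # hypothetical topic name, e.g. "payment.completed"
    payload: dict


class InMemoryBus:
    """In-process stand-in for a message broker; delivers events synchronously."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # The publisher does not know who consumes the event.
        for handler in self._handlers[event.topic]:
            handler(event)


bus = InMemoryBus()
bonus_log: List[str] = []
report_log: List[int] = []

# Two independent "services" react to the same event without calling each other.
bus.subscribe("payment.completed", lambda e: bonus_log.append(e.payload["player_id"]))
bus.subscribe("payment.completed", lambda e: report_log.append(e.payload["amount"]))

bus.publish(Event("payment.completed", {"player_id": "p-1", "amount": 25}))
```

A new consumer can be added with one more `subscribe` call and no change to the publisher, which is the property that lets a platform grow to hundreds of services.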

However, the most difficult challenge initially was an essential shift in mindset: replacing the outsourcing approach, where an engineer does only what they are told, with a conscious one, where an end-to-end engineer is responsible for their work, and sometimes for the entire product, and so understands their value and importance to the product more than ever. It took us 9 to 12 months to make this change, but we succeeded, and at the same time we moved from MVP to production.

Thanks to interesting technological challenges and this kind of awareness, the company has an incredible retention rate, and we have many people in the team who have been working with us for 6 years or more. Our product is constantly growing, and we as engineers need to grow with it, breaking through the ceiling of our knowledge and capabilities every time.

Why people stay in the company

Vadym Doloka
Technical Lead

Vadym joined the company 8 years ago; he started as a Frontend developer and worked on the iGaming back office.

Firstly, almost from the very beginning, we adopted the in-demand technologies that everyone wanted to work with. But we didn’t just pick whatever was “on the radar”; we chose technologies based on how they would apply to our project. We have had freedom of action and freedom of choice from the very beginning, and nothing has changed since.

Let me give you an example. We started with a monolithic UI on React + Redux. Then we wrote our own Backend For Frontend (BFF), a GraphQL server built on Apollo Server, and began to move away from Redux in favour of Apollo Client with its state management and caching. When the monolith became too big, we split it into many small UI components or routes: we switched to micro-frontends built on Web Components (when we started, there was no Webpack Module Federation). We also started splitting our GraphQL layer into microservices using Apollo Federation.

Another important thing is that everyone can influence how the entire product works. For example, we have a version-based approach, and each environment consists of versions of different services. We have hundreds of services, and at some point it became difficult to move between environments to check and reconcile different versions. I don’t like routine tasks, so I made a UI divided into team domains, where anyone can see the version differences. At first, only my team used this UI, but later the entire product team started using it. It’s incredibly cool to see the real impact of your work.
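The core of such a comparison can be sketched in a few lines of Python. This is only an assumption about how a version diff might be computed, not the tool described above: `version_diff`, the environment names, the service names and the version strings are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Versions = Dict[str, str]  # service name -> deployed version


def version_diff(
    env_a: Versions, env_b: Versions
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the services whose versions differ between two environments.

    A value of None means the service is not deployed in that environment.
    """
    diff = {}
    for service in sorted(set(env_a) | set(env_b)):
        a, b = env_a.get(service), env_b.get(service)
        if a != b:
            diff[service] = (a, b)
    return diff


# Hypothetical environments and versions, for illustration only.
staging = {"wallet": "1.4.0", "bonus": "2.1.3", "reports": "0.9.0"}
production = {"wallet": "1.3.2", "bonus": "2.1.3"}

for service, (a, b) in version_diff(staging, production).items():
    print(f"{service}: staging={a} production={b}")
```

Grouping the resulting entries by owning team would give the per-domain view mentioned in the text.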

Accordingly, we hire people who also care about this. In interviews, I care less about how much a person knows about theory; I care about how they think, and we focus on how the candidate approaches problem solving.

We are a good fit for cheerful people who thirst for new knowledge and are interested in developing and influencing the entire product. We always welcome suggestions that could optimise our work. Most of the ideas for developing our back office come from our front office team. Everyone tries to offer more convenient and elegant solutions, which ultimately affect how pleasant it is for the user to use our product.

Because of this constant involvement and concern, people are growing rapidly: three of my team started as coders and became Frontend developers.

Another interesting thing about our team is that whether you are a JavaScript, Scala, or React developer, you have to understand infrastructure at least a little bit. Each of our front-end developers knows a little about K8s and can read logs on a GraphQL server. You can learn many things here that DevOps simply wouldn’t let you touch in a regular company.

But here, we do it this way: Are you interested in understanding? Here’s a green light for you to do so, study, and attend conferences. In other words, every engineer can improve both as a system architect and designer. We hire people who love to learn and are not afraid to experiment. The company always supports this.
