Spotlight: Meet Juraj, our CTO
In our latest Spotlight interview, we sat down with Juraj to learn about his role, his passion for technology and the challenges of being the CTO at datasapiens.
You are the CTO at datasapiens; tell us, what does a CTO do?
I do not know what a generic CTO does, but I aim to be useful 😀.
In our tech team, I fill the roles of data engineer and DevOps engineer. I also investigate and test new tech stacks, and together with my teammates I shape the tech roadmap for our team.
Each team member has a domain of expertise within software engineering and takes ownership of that domain whenever a relevant problem or feature needs to be addressed. We aim to be as independent and self-managed as possible, working as a ‘distributed mind’ in which each person tackles a different aspect of a topic. Keeping team and company hierarchy to a minimum is effective if we want to scale up the volume of features we deliver.
How do you keep up with the fast-changing world of technology?
Together with colleagues, I follow several blogs and news channels covering modern technologies, software stacks, architectural design ideas, and so on.
We watch the news from our cloud provider especially closely, since a new feature can sometimes fix an existing problem, reduce our infrastructure costs, or help us replace another tool or tech stack while also bringing more features.
Since we have adopted the Hadoop ecosystem (HDFS, Hive, Spark, Alluxio, Trino, Pinot and others) as our core stack, we stay connected with the tech blogs of the well-established companies using this ecosystem (Google, Amazon, Netflix, Uber, LinkedIn, Lyft). They operate, of course, at a different data scale, but the problems we face now have typically already been solved and properly documented there. And most projects in the Hadoop ecosystem come with an active community that you can reach out to for help if needed.
Beyond the infrastructure and the full software stacks, we also keep up with the various frameworks and libraries for the programming languages we use to develop our internal tools.
What do you like to do in your free time?
In my free time, I do most of my keeping up with technology news 😊. Apart from that, I socialize with friends and squeeze in some minimum viable physical activity, like running and gym training.
Also, I read books on topics like economics, machine learning, global trends, and self-development.
What do you find most challenging in your role?
Currently, the biggest challenge for me, and for the whole team, is prioritization. Three essential ‘item buckets’ require prioritization: the new technologies bucket, the features bucket, and the issues bucket.
The new technologies bucket is, unsurprisingly, the most exciting set of items; here we always want to try out new and popular software tools. However, this bucket always carries some risk that a newly tested tool will not bear the expected fruit: after initial testing, we may find that it doesn’t add value to our current tech scope. But that is expected in any trial-and-error process, so we regularly reserve a small part of each sprint for innovation and investigation.
The features bucket forms a vital part of our tech roadmap. Its features are well defined and their solutions thoroughly described, yet we often end up deprioritizing them in favor of more pressing items from the issues bucket.
The issues bucket is addressed most often, since there is a constant inflow of new issues. It might be the least exciting bucket to prioritize, but it is the most inspiring source of new features and of reasons to probe modern technologies.
So the question is what to prioritize: solve a specific issue directly and quickly, or invest time and bear the risk of hitting a roadblock with a new technology? Develop a high-value but customer-specific feature, or fix a low-value but generic bug?
We constantly work to find a balance between these buckets. We don’t always succeed, but we have made progress, and there is still a lot to improve 😊.
What are you passionate about in the world of technology?
I would highlight two things here: the variability of technologies and the open-source communities.
The variability of technologies often means there are several workable solutions for a given problem. The task then is to choose the technology most suitable for the problem, which is the most exciting, and sometimes most challenging, part of the process, since each technology comes with its own trade-offs: advantages and shortcomings.
Quite recently, we were deciding on an OLAP stack to which we would gradually migrate our data retrieval workloads. The candidates were Apache Druid, ClickHouse, Apache Pinot and Apache Kylin. After a period of testing, and after carefully going through each candidate’s documentation, roadmap, and developer-community activity, we concluded that Apache Pinot was the most suitable. Reaching such a conclusion takes time, though, since one must consider the implications over medium- and long-term horizons.
I like how vibrant open-source communities can be and how critical their role is in software development. From our experience, an open-source software project is 1/3 the software code and 2/3 the community behind that code. Thus, the community is far more critical than the current state of the code itself.
For example, some time ago we investigated a promising technology for our use cases. It had quite an intriguing set of features; had we adopted it, it would have accelerated our data retrieval processes tremendously. However, it had a very isolated community, poor documentation, and little integration with other stacks. Fortunately, we found a direct competitor to this technology: one with a very vibrant community that is eager to collaborate, has well-written documentation, and aims to be as open and accessible as possible. Both technologies contain many innovative ideas, but it is clear that the second is poised for long-term success and growth.
Another aspect of open-source communities is the importance of taking part in them. One can start quickly by reporting bugs and issues or proposing features and improvements. In an active community, when you post a bug, one of the project’s committers (the people actively writing code for that project) will pick it up and investigate within a few days, and will try to incorporate a fix into the nearest version or patch release as soon as possible.
Our own participation in open-source communities started small: reporting a few specific low-level issues within projects such as Alluxio and Trino.
For Alluxio, we pinpointed a simple use case that can reduce cloud infrastructure costs under conditions that frequently appear in cloud data lakes.
In the case of Trino, we want to integrate Pinot with Trino fully. This is challenging due to the significant differences between the two systems, but several people within the community aim to create this integration, and we are in close contact with them.
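To give a flavor of what the integration looks like from the user side, here is a minimal sketch of querying Pinot through Trino using the trino Python client. It assumes a Trino catalog named pinot has already been configured on the cluster; the host, table, and column names are purely hypothetical.

```python
# Minimal sketch: querying Apache Pinot through Trino's Pinot connector.
# Assumes a Trino catalog named "pinot" is already configured on the
# cluster; the host, table, and column names below are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # hypothetical Trino coordinator
    port=8080,
    user="analyst",
    catalog="pinot",
    schema="default",
)

cur = conn.cursor()
# Trino parses and plans the SQL, pushing down to Pinot what it can.
cur.execute("""
    SELECT store_id, SUM(amount) AS revenue
    FROM transactions
    GROUP BY store_id
    ORDER BY revenue DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```

The appeal of a full integration is exactly this: analysts keep writing standard SQL against Trino while Pinot serves the low-latency scans and aggregations underneath.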
Where do you stand on cybersecurity? Coding? Programming languages?
Good cybersecurity is necessary irrespective of the size or stage of the company you work for. We are undergoing a small transformation within the company to become compliant with ISO 27001, one of the most widely adopted information security standards. It has many aspects, and obtaining the certification is not trivial: among other things, you need to undergo penetration testing and be reviewed by an external security auditor. But obtaining it is a sign that the company has become more mature and well established.
Regarding coding and programming languages: in our company, we use many well-known languages, namely JavaScript, Golang, Python, Scala, and Java. For our internally developed stack, we use JavaScript and Golang; for the Hadoop stack, Python, Scala, and Java. And since Python has become the lingua franca of many ML frameworks, we also use it in our internal ML initiative projects.
I code almost daily, mainly in Python and Scala, since I often work with Apache Spark. I also write a lot of configuration files (I am a DevOps guy, after all 😊). At my previous company, I coded in Golang as a backend developer.
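For a flavor of that daily Spark work, here is a minimal PySpark sketch of a routine data-engineering task: read raw data, aggregate it, and write the result back to the lake. The paths, dataset, and column names are purely hypothetical.

```python
# Minimal PySpark sketch of a routine data-engineering job:
# read raw events, aggregate them, write the result back out.
# All paths and column names here are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-revenue").getOrCreate()

# Read raw transaction data from the data lake.
raw = spark.read.parquet("s3a://datalake/raw/transactions/")

# Aggregate revenue per store and day.
daily = (
    raw.groupBy("store_id", F.to_date("ts").alias("day"))
       .agg(F.sum("amount").alias("revenue"))
)

# Write back, partitioned by day for efficient downstream reads.
daily.write.mode("overwrite").partitionBy("day").parquet(
    "s3a://datalake/agg/daily_revenue/"
)

spark.stop()
```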