A cover of The Economist announced: “The world’s most valuable resource is no longer oil, but data.” Analytical insights are a critical competitive advantage for any company. The most valuable companies harness massive unique datasets or provide the capabilities to process them.
Data keeps growing faster than budgets. To accommodate that, data processing prices should decrease every year. Thanks to Moore’s law, hardware enjoys this dynamic, but software licenses stay the same price or get more expensive.
Data has gravity. Once a company adopts a data warehouse, it will often use it for many years. Data warehouse vendors therefore have an incentive to be conservative with critical changes and no incentive to decrease prices yearly.
Data infrastructure is a vibrant ecosystem, yet the core principles of query engines stay mostly the same. Still, every few years, there is a breakthrough:
- Google needed an order of magnitude lower cost to process web data and provide search. The MapReduce paper in 2004 and the Bigtable paper in 2006 inspired the tech world. Open-source implementations such as the Apache Hadoop ecosystem and Apache Cassandra® took over the world.
- The Dutch research institute CWI released cutting-edge data processing implementations: MonetDB in 2002 and VectorWise, based on X100, in 2008. VectorWise was the fastest in the TPC-H benchmark. A sophisticated venture capital investor, Mike Speiser, recruited Marcin Zukowski, one of the creators of X100 and VectorWise, as a co-founder of Snowflake.
- Though the Russian business ecosystem is challenging, many of its engineers and scientists are world-class and grind for years on the most demanding problems. Yandex released ClickHouse in 2016, and it was several times faster in the Star Schema Benchmark than alternatives (e.g. Druid). In just a year, it was adopted by industry leaders such as Cloudflare, and it has continued to raise the bar in benchmarks for many years.
Each of these revolutions gave birth to multibillion-dollar companies.
There is a lot of temptation to claim breakthrough performance based on gimmicks (e.g. an edge-case operation, a different data schema or different indexing). A material breakthrough, though, requires years of hard work with rusty, unsexy tech.
When Adam Szymanski told me in 2021 that he had started working on a several-times-faster data warehouse, our conversation centred on three questions:
- Will it scale up and down rapidly?
  - Most database architectures assume static provisioning, where scaling is infrequent.
  - Today, load spikes are frequent, and the ability to scale is crucial. Stateless applications running in the cloud on Kubernetes can do that, and databases must match that agility.
- Can you do massive joins effectively?
  - Big JOINs are the most demanding operation. Many architects prefer to avoid them, paying the cost and feeling the pain of duplicating data instead. That is going backwards to support a bigger scale; we should solve the root cause and enjoy a clean schema.
- Do you have the stamina for this ultra-triathlon race?
  - All great data warehouses require years of hard work before landing a first customer. The hardest thing about this category is to have patience and an elite group of top engineers.
  - You go against a principal piece of startup wisdom: launching a compelling data warehouse later is better than launching a minimum viable product early.
  - You need to be capital efficient. Building the product takes years, and enterprise adoption takes years, but eventually, you can become an “overnight” success.
In our first conversation, I got great answers. I said it sounded like a great idea and that I would love to invest. I rarely do angel investing in startups. It is a challenging game, and I only do it directly if I have personal information and a skillset edge. Oxla is unique for me, as I was its first investor.
It is about time to launch
Finally, three years of hard work by the team have come to fruition. The benchmark data is juicy: five times faster JOINs than ClickHouse and two times faster GROUP BYs. The technical preview is open. You can try it for free.
ClickHouse will be hitting more headwinds in its quest to morph into a serverless database. Moreover, raising $300 million at a unicorn valuation creates plenty of dilemmas about what to release as open source and what to keep behind your cloud.
For Oxla, the massively distributed world is a tailwind. The humble beginning gives plenty of options for what to do next.
In a stagnant economy, many leading data warehouse vendors may covertly raise prices while data infrastructure budgets are shrinking. That makes it an excellent time to try something new.
LLM-based AI is another tailwind. We will need more data processing, as an LLM can run 10+ times more queries for data retrieval before arriving at a decent answer.