Modern Data Stack Landscape
Problems, evolving solutions and future outlook of modern data infrastructure industry
Before 2010, Oracle, IBM, and SAP were the top companies dominating as vendors in providing data infrastructure solutions (Warehouse, Storage, and Pipelines) for on-premise servers. As the cloud started to gain prominence, there were endless possibilities to understand customers, markets, and industries due to tremendous analytical and big data capabilities. Post-2012, enterprises started to realize the potential advantages of cloud in their income statement and balance sheet, as well as its true capabilities, data infrastructure seen as Capex investment, can be an Opex improving the EBITDA significantly. According to McKinsey, 80%+ senior leadership in companies have shifted their focus to adopting cloud solutions and transforming them into data-driven companies. Data infra landscape started to boom with startups flouting with their products solving problems related to cloud-native pipelines, analytics, big data operations, and so on...
Data flow in infrastructure can be viewed in 5 steps:
Each step has a different process and has its own complexity in adhering to the business requirements. For example, Uber has redefined their architecture and frameworks twice in 4 years as there were new requirements like latency for big data<1 hr, real-time data for city ops team, limitations in Kafka framework, etc...
This is a B2B industry and products are all global in nature. So, any startup in this space has competition at a global level and market size opportunity is also at the global level.
Before dwelling deeper into the problems and opportunities in the landscape, there are 5 key parameters that influence the enterprises in deciding the tools and frameworks for their data stack and the problem for the company revolves around these factors.
Limitations across each sector:
Market Size:
According to Gartner, the current spend on data center systems is around $228Bn. But a major chunk of data infrastructure spending is on storage systems/services. Currently, the storage sector is dominated by few companies like Amazon, Microsoft, Google, IBM, Oracle, and Snowflake. There is very limited room for growth for any startups in this storage sector. But the major areas of opportunities are Ingestions and Processing. According to IDC, spending on AI/ML-based services is currently around $50Bn which includes AI-based ops, SaaS solutions, and AI models. According to a market research firm(marketsandmarkets) current data prep market(a subset of the Ingestion sector) is around $4Bn.
Overall, the current market size for each sector/sub-sector can be calculated by estimating the spending budget of the ideal customers for the particular service. For example, AI ops for healthcare will target big hospitals and manufacturers)
Landscape:
Note: This is a sample representation of some startups in each subsector(yellow box). For a detailed understanding of startups in this space, please refer to Matt Duck landscapes of 2020, 2019, 2018, 2017
Customers:
Unlike B2C, where consumers’ acquisition and retention dynamics are entirely different and have numerous factors but in B2B a formal, defined, and structured enterprise sales approach is required. So startups with a strong sales team have a clear advantage.
This infrastructure market caters to a wide variety of industries, but tools/services adopted can change at an industry level, for example, financial services industries must adopt a hybrid cloud model due to regulatory concerns and their infrastructure has to be designed to suffice the hybrid needs.
Startups with industry-specific products have their specific customer segment, for example, AI ops in healthcare startups like Mendel will target hospitals, health tech, and Medtech companies
Products/Trends to watch
Indian Startup Landscape:
Sadly, there are very few Indian startups building or providing services in this market. Traditionally, India is an IT servicing land but product building ability is still a lagging factor. There are only 2 great companies from Indian soil – Appdynamics(acquired by Cisco) and Cohesity. But in recent years there has been a rise in AI/ML ops startups providing specific use cases or covering broad solutions. So, processing, visualization, and Ingestion are the top three sectors where Indian startups are building their solutions. Some of the Indian startups in this space are Scribble Data, Couture Ai, Predera, Auto Intelli, Dataiken, Deepbrainz, etc.
For further readings:
A16Z blogs and podcasts related to the data stack
Basic understanding of IT systems - Check: IT k funde channel
Mckinsey articles related to cloud adoptions, digital transformations, Data analytics, etc.
Architecture understandings - Engineering blogs of Uber, Netflix, Spotify, Facebook, and other big tech companies
Most important - Connecting with relevant people
Final thoughts:
Unlike B2C-related products, this is a difficult industry to unlock values due to decade-old systems and increasing complexities. But with rising adoptions of cloud, there is a huge demand for cloud-native tools and solutions as explained before which makes this an interesting space to look out for. More AI-based models evolve, more the requirement for AI-based ops tools for efficient processing.
Thank you for reading this and I hope I covered the important aspects here. It will be super helpful if you could provide me with feedback in comments or feel free to reach out to me on Linkedin or Twitter