The Great Data Closure: Why Databricks and Snowflake Are Hitting Their Ceiling

Introduction

How does a data company actually grow?

This week, what would have been news a year ago was not news. Snowflake invested in AtScale, a provider of semantic layer services, in a strategic investment in the waning company’s history. An odd move, given the commitment to the Open Semantic Interchange or “OSI” (yet another acronym, or .yaa), which appears to be MetricFlow masquerading as something else.

Meanwhile, Databricks, the AI and Data company, invested in AI winner and all-round VC paramour Lovable, the rapidly growing vibe-coding company from Sweden.

Starting a venture arm is a tried-and-tested route for enterprises. Everybody from Walmart and Hitachi to banks like JPMorgan and Goldman Sachs, and of course the hyperscalers themselves (MSFT, GOOG), has a venture arm (though strangely not AWS).

The benefits are clear. An investment into a round can give the right of first refusal. It gives both parties influence around complementary roadmap features as well as clear distribution advantages. “Synergy” is the word used in boardrooms, though it’s the less insidious, friendlier younger brother of the central cost cutting so prevalent in PE-backed rather than venture-backed businesses.

It should therefore come as no surprise to see that Databricks are branching out beyond Data. After all (and Ali has been very open about this), the team understands the way to grow the company is through new use cases, most notably AI. While Dolly was a flop, the jury is out on the partnership with OpenAI. AI/BI, as well as Databricks Applications, are promising initiatives designed to bring more friends into the tent, outside of the core SYSADMIN cluster administrators.

Snowflake meanwhile may be attempting a similar tack but with differing levels of success. Aside from Streamlit, it’s not clear what value its acquisitions are really bringing. Openflow, Neolithic NiFi under the hood, is not well received. Rather, it’s the internal developments such as the embedding of dbt Core into the Snowflake platform that appear to be gaining more traction.

In this article, we’ll dive into the different factors at play and make some predictions for 2026. Let’s get stuck in!

Growth through use cases

Databricks has a problem. A big problem. And that’s equity.

As the fourth-largest privately held company in the world, at the tender age of 12 its employees require liquidity. And liquidity is expensive (see this excellent article).

To make good on its internal commitments, Databricks needed perhaps $5bn+ when it did this raise. The amount it needs per year is significant. It’s therefore simply not an option to stop raising money without firing employees and cutting costs.

The growth is staggering. In the latest Series L (!) the company cites 55% year-on-year growth, resulting in a valuation of over $130bn. The company must continue to raise money to pay its opex and equity, but there is another constraint, which is valuation. At this point Databricks’ ability to raise money is practically a bellwether for the industry, and so there’s a vested interest for everybody involved (the list is huge) to keep things up.

Source: previous article

The dream is to continue growing the company as this will sustain the valuation; valuations are tied to revenue growth. Which brings us back to use cases.

The clear use cases, as shown here, are roughly:

Big data processing and Spark
Within this, Machine Learning workloads
AI workloads
Data warehousing
Ingestion or Lakeflow (Arcion we suspect was perhaps a bit early)
Business Intelligence
Applications

It’s worth noting these sectors are all forecast to grow at around 15–30% all in, per the vast majority of market reports (an example here). This reflects the underlying demand for more data, more automation, and more efficiency, which I believe is ultimately justified, especially in the age of AI.

Sources like Technavio, Mordor Intelligence, and often just plain old Commercial Due Diligence Reports rarely disagree with each other, with almost all of them placing the safe 15–30% range on sectors people think should be invested in. The point is: this isn’t a stagnant or a rocketing market, and they all agree. In contrast to, for example, AI.

It would seem to indicate, therefore, that the bottom or “floor” for Databricks would be growth of around 15–30%, and with it perhaps a 40% haircut to valuation multiples (assuming linear correlation; yes, yes, assumptions, assumptions; some more information here), barring of course any exogenous shocks to the system such as OpenAI going out of business or war.
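To make the “linear correlation” caveat concrete, here is a minimal sketch. It assumes the revenue multiple scales in direct proportion to the growth rate, which is only one simple reading of that caveat and not a valuation model. The function name `multiple_haircut` is hypothetical; the 55% figure is the growth rate cited earlier in this article.

```python
def multiple_haircut(current_growth: float, floor_growth: float) -> float:
    """Fractional cut to the valuation multiple, ASSUMING the multiple
    is directly proportional to revenue growth (an illustrative assumption)."""
    if current_growth <= 0:
        raise ValueError("current_growth must be positive")
    return 1.0 - floor_growth / current_growth


current = 0.55  # year-on-year growth cited in the article

# Evaluate both ends of the 15-30% market growth range.
for floor in (0.15, 0.30):
    cut = multiple_haircut(current, floor)
    print(f"growth falls to {floor:.0%}: multiple haircut of {cut:.0%}")
```

Under this proportional assumption, growth settling at the top of the range implies a cut in the mid-40s percent, in the same ballpark as the “perhaps 40%” figure; the bottom of the range implies a much deeper cut.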

This is hardly concerning as a bear case, which makes me wonder: what’s the bull?

The bull lies in the two A’s: AI use cases and Applications.

AI as a way out

If Databricks can successfully partner with the model providers and become the de facto engine for hosting models and running the associated workflows, it could be huge.

Handkerchief maths: the revenue is $4.8bn run rate, growing at 55%. Say we’re growing at 30% in steady state; we’re missing 25%. 25% of $4.8bn is $1.2bn. Where can this come from? Supposedly existing AI products and existing warehousing are already over $2bn (see here). What happens next year when Databricks is at $6bn and we need to grow 50% and therefore need $3bn? Is the business going to double the AI part?
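The same handkerchief maths as a few lines of Python, purely as a sketch. The inputs are the rounded figures quoted in this article, and the helper names `growth_gap_bn` and `new_revenue_needed_bn` are hypothetical.

```python
def growth_gap_bn(run_rate_bn: float, target_growth: float, organic_growth: float) -> float:
    """Extra revenue ($bn) needed on top of organic growth to hit the target rate."""
    return run_rate_bn * (target_growth - organic_growth)


def new_revenue_needed_bn(run_rate_bn: float, target_growth: float) -> float:
    """Total new revenue ($bn) needed in a year to grow at the target rate."""
    return run_rate_bn * target_growth


# This year: $4.8bn run rate, 55% target, 30% assumed steady-state growth.
print(round(growth_gap_bn(4.8, 0.55, 0.30), 2))    # 1.2

# Next year, on the article's rounded figures: $6bn base, 50% target.
print(round(new_revenue_needed_bn(6.0, 0.50), 2))  # 3.0
```

The point of writing it out is that the gap scales with the base: holding a high growth rate means finding a larger absolute amount of new revenue every year.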

Confluent is a benchmark. It’s the largest Kafka/stream processing company, with revenue of about $1.1bn annualised. It grows about 25% y-o-y but traded at about 8x revenue and sold to IBM for $11bn, so about 10x revenue. Even with its loyal fanbase and strong adoption for AI use cases (see for example marketecture from Sean Falconer), it would still struggle to put another $250m of annual growth on every year.
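As a quick sanity check on the benchmark, the sketch below works only from the rounded figures quoted here (about $1.1bn of revenue, an $11bn sale price, roughly 25% growth). The function names are hypothetical and the output is no more precise than those inputs.

```python
def revenue_multiple(price_bn: float, revenue_bn: float) -> float:
    """Deal price expressed as a multiple of annualised revenue."""
    return price_bn / revenue_bn


def annual_increment_m(revenue_bn: float, growth: float) -> float:
    """New revenue added in one year at the given growth rate, in $m."""
    return revenue_bn * growth * 1000


# Rounded inputs as quoted in the article.
print(round(revenue_multiple(11.0, 1.1), 1))  # 10.0
print(round(annual_increment_m(1.1, 0.25)))   # 275
```

On these inputs the sale price is roughly ten times revenue, and 25% growth adds a little over a quarter of a billion dollars a year, which is the scale of increment the paragraph above is talking about.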

Applications are another story. People who build data-intensive applications are not those who typically build internal-facing products, a task usually borne by in-house teams of software engineers or consultants. These are teams that already know how to do this, and know how to do it well, with existing technology specifically designed for the purpose, namely core engineering primitives like React, Postgres (self-hosted) and FastAPI.

A data engineer could log in to Lovable, spin up Neon Postgres, a declarative Spark ETL pipeline, and a front end in Databricks. They could. But will they want to add this to their ever-increasing backlog? I’m not sure.

The point is the core business is not growing fast enough to sustain the current valuation, so additional lines of business are required. Databricks is like a golden goose at the craps table, who continues to avoid rolling the unutterable number. They can now continue making more and more bets, while all those around the table continue to benefit.

Databricks is topped out as a data-only company.

We’ve written before about ways they could have moved out of this. Spark Structured Streaming was an obvious choice, but the ship has sailed, and it’s companies like Aiven and Ververica that are now in pole position for the Flink race.

📚 Read: What not to miss in Real-time Data and AI in 2025 📚

To become a model-serving company or an ‘AI Cloud’ also seems a tall order. CoreWeave, Lambda, and of course Nebius are all on track to really challenge the hyperscalers here.

An AI cloud is fundamentally driven by a high availability of GPU-optimised compute. This doesn’t just mean leasing EC2 instances from Jeff Bezos. It means sliding into Jensen Huang’s DMs and buying a ton of GPUs.

Nebius has about 20,000, with another 30,000 on the way; this Yahoo report thinks the numbers are higher. All of the AI Clouds lease space in data centres as well as building their own. Inference, unlike Spark, is not a commodity because of the immense software, hardware, and logistical challenges involved.

Let us not forget that Nebius owns just over 25% of ClickHouse, with both teams being very software-engineering-led and Russian: the Yandex Alumni Club.

If there is one thing we have learned, it’s that it’s easier to go up the value chain than down it. I wrote about this funnel perhaps two years ago now, but it seems truer than ever.

*Who remembers this from my early blogging days? Read: Unstructured Data funnel*

Snowflake easily eats into dbt. Databricks has easily eaten into Snowflake’s warehouse revenue. Microsoft will eat into Databricks’. And in turn, with raw data centre power, NVIDIA and Meta partnerships, and an army of the best developers in the business, Nebius can eat into the hyperscalers.

Data warehousing under attack

With every passing day, proprietary data warehousing platforms seem more and more unlikely to be the technical end point for AI and Data infrastructure.

Salesforce are increasing levies, databases are supporting cross-query capabilities, CDOs are running DuckDB in Snowflake itself.

Even Bill Inmon acknowledges warehousing corporations missed the warehousing!

While these platforms are convenient, there’s a scale at which enterprises and even late-stage start-ups are demanding greater openness, greater flexibility and cheaper compute.

At Orchestra we’ve seen this first-hand. The companies looking at technologies such as Iceberg are overwhelmingly large. From the largest telecom providers to the Booking.coms of this world (who happen to use and love Snowflake; more on this later), traditional data warehousing is unlikely to continue dominating the share of budget it has done for the last decade.

There are a few ways Snowflake has also tried to expand its core offering:

Support for managed Iceberg; open compute engine
Data cataloging (Select *)
Applications (Streamlit)
Spark and other forms of compute like containers
AI agents for analysts, AKA Snowflake Intelligence
Transformation (i.e. dbt)

Ironically for a proprietary engine provider, it would appear that Iceberg is a huge growth avenue, as well as AI. See more from TT here.

Snowflake customers love it.

Data Pangea

I think the definitions of the pioneers, early adopters, late adopters, and laggards are changing.

Pioneers now include a heavy real-time component and an AI-first approach to the stack. This is likely to revert to Machine Learning as people realise AI is not a hammer for every nail.

These companies like to partner with a few large vendors, and have a high appetite for building as well as buying software. They will have at least one vendor in the streaming/AI, query engine and analytics space. A good example is Booking.com, or perhaps Fresha, who use Snowflake, StarRocks, and Kafka (I loved the article below).

📚 Read: Exploring how modern streaming tools power the next generation of analytics with StarRocks. 📚

Early Adopters will have the typical analytics stack and then one other area. They lack the scale to fully buy in to an enterprise-wide data and AI strategy, so they focus on the use cases they know work: automation and reporting.

The old “early adopters” would have had the Andreessen Horowitz data stack. That, I’m afraid, is no longer cool, or in. That was the old architecture. The late adopters have the last stack.

The laggards? Who knows. They will probably go with whoever their CTO knows the most. Be it Informatica (see this incredible reddit post), Fabric, or perhaps even GCP!

The next step: chaos for smaller vendors

A lot of companies are changing tack. Secoda were acquired by Atlassian, Select Star were acquired by Snowflake. Arch.dev, the creators of Meltano, shut down and passed the project to Matatika. From the big companies to the small, slowing revenue growth combined with huge pressure from bloated VC rounds makes building a “Modern Data Stack”-style company an untenable approach.

📚 Read: The Final Voyage of the Modern Data Stack | Can the Context Layer for AI provide catalogs with the last chopper out of Saigon? 📚

What would happen when the Databricks and Snowflake growth numbers finally start to slow, as we argue they should here?

What would happen if there was a significant exogenous market shock or OpenAI ran out of money faster than expected?

What happens as Salesforce increase taxes and hence tools like Fivetran and dbt increase in price even more?

A perfect storm for migrations and re-architecting is brewing. Data infrastructure is extremely sticky, which means in difficult times, companies raise prices. EC2 spot instances have not really changed much in price over time, and so neither has data infra compute, and yet even AWS are raising prices of GPUs.

The marginal cost of onboarding an additional tool is becoming very high. We used to build everything ourselves because it was the only way. But having one tool for every problem doesn’t work either.

We should not forget that Parkinson’s law applies to IT budgets too. Whatever the budget is, the budget gets spent. Imagine you had a tool that helped you automate more things with AI while reducing your warehouse bill and reducing your BI licenses (typically a large 25–50% P&L budget line). What do you do?

You don’t pat yourself on the back; you spend it. You spend it on more stuff, doing more stuff. You’ll probably push your Databricks and Snowflake bill back up. But you will have more to show for it.

Consolidation is driving budget back into centres of gravity. These are Snowflake, Databricks, GCP, AWS and Microsoft (and to a lesser extent, Palantir). This spells chaos for many smaller vendors.

Conclusion: brace for simpler architecture

The Salesforce Tax is a pivotal moment in our industry. Companies like Salesforce, SAP, and ServiceNow all have an immense amount of data and enough clout to keep it there.

As Data People, anyone who has done a migration from Salesforce to NetSuite knows that migrating these tools is probably the biggest, most expensive, and most painful move anyone faces in their professional careers.

Salesforce charging infrastructure service providers fees will increase prices, which in turn, combined with the increasingly precarious house of cards we see in AI and Data, points towards huge consolidation.

ServiceNow’s acquisition of data.world, I think, gives some clarity into why we’ll see data teams make more use of existing tooling, simplifying architecture in the process. data.world is a provider of knowledge graphs and ontologies. By mapping the ServiceNow data schema to an ontology, a gargantuan task, ServiceNow could end up with half-decent AI and agents running within ServiceNow.

AgentForce and Data360 are Salesforce’s attempt, and supposedly already have $1.4bn in revenue, though we suspect that figure includes a lot of legacy too.

These providers do not really want data running around as AI use cases in Snowflake or Databricks. They want the Procurement Specialists, Finance Professionals, and Marketing Gurus staying in their platforms, and they have the means to make them stay.

This isn’t financial advice and this isn’t a crazy prediction. To predict that Snowflake and Databricks will end up growing more in line with the analyst consensus is hardly challenging.

But the idea that the biggest data companies’ growth is on the verge of slowing is challenging. It challenges the rhetoric. It challenges the AI maximalist discourse.

We’re entering the era of the Great Data Closure. While the AI maximalists dream of a borderless future, the reality is a heavy ceiling built by the incumbents’ gravity. In this new landscape, the winner isn’t the one with the best set of tools, but those who make the most of what they have.

About Me

I’m the CEO of Orchestra. We help Data People build, run and monitor their pipelines easily.

You can find me on LinkedIn here.
