Is the Universal Semantic Layer the Next Big Data Battleground?



We’ve entered a period of punctuated equilibrium in the evolution of big data over the past month, thanks to the community congregating around open table formats and metadata catalogs for storing data and enabling processing engines to access that data. Now attention is shifting to another element of the stack that has been living quietly in the shadows: the semantic layer.

The semantic layer is an abstraction that sits between an organization’s data and the business metrics it has chosen as its standard units of measurement. It’s a critical layer for ensuring correctness.

For instance, while various departments in a company may have different opinions about the best way to measure “revenue,” the semantic layer defines the correct way to measure revenue for that company, thereby eliminating (or at least greatly reducing) the chance of getting bad analytic output.
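To make the idea concrete, here is a minimal, hypothetical sketch in Python. The `revenue` function, the order fields, and the business rule (completed orders only, net of refunds) are all invented for illustration; the point is simply that every downstream tool calls one governed definition instead of computing its own.

```python
# Hypothetical sketch: one canonical definition of "revenue" that every
# downstream tool reuses, instead of each tool computing its own version.

def revenue(orders: list[dict]) -> float:
    """Canonical company-wide revenue: completed orders only, net of refunds."""
    return sum(
        o["amount"] - o.get("refunded", 0.0)
        for o in orders
        if o["status"] == "completed"
    )

orders = [
    {"amount": 100.0, "status": "completed"},
    {"amount": 50.0, "status": "completed", "refunded": 10.0},
    {"amount": 75.0, "status": "cancelled"},  # excluded from revenue
]

# Two different "tools" (say, a dashboard and a weekly report) call the
# same definition, so their numbers can never disagree.
dashboard_revenue = revenue(orders)
report_revenue = revenue(orders)
print(dashboard_revenue, report_revenue)
```

A universal semantic layer plays the role of this shared function, but as a service that sits outside any single BI tool.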

Traditionally, the semantic layer has traveled with the business intelligence or data analytics tool. If you were a Tableau shop or a Qlik shop or a Microsoft Power BI shop or a ThoughtSpot shop or a Looker shop, you used the semantic layer provided by that vendor to define your business metrics.

Without a semantic layer, trust in analytic query results declines (NicoElNino/Shutterstock)

This approach works well for smaller companies, but it creates problems for larger enterprises that use two or more BI and analytics tools. Such an enterprise is faced with the task of hardwiring two or more semantic layers together to ensure they’re pulling data from the correct tables and applying the right transformations, so that their reports and dashboards continue producing accurate information.

In recent years, the concept of a universal semantic layer has started to bubble up. Instead of defining business metrics in a semantic layer tied directly to the BI or analytics tool, the universal semantic layer lives outside the BI and analytics tools, thereby providing a semantic service that any BI or analytics tool can tap into to ensure accuracy.

As companies’ cloud data estates have grown over the past five years, even smaller companies have started dealing with the increased complexity that comes from using multiple data stacks. That has helped drive some interest in the universal semantic layer.

Natural Language AI

More recently, another factor has driven a surge of interest in the semantic layer: generative AI. Large language models (LLMs) like ChatGPT are leading many companies to experiment with natural language as an interface for a range of applications. LLMs have shown an ability to generate text in any number of languages, including English, Spanish, and SQL.

While the English generated by LLMs is often quite good, the SQL is usually fairly poor. In fact, a recent paper found that LLMs generate accurate SQL on average only about one-third of the time, said Tristan Handy, the CEO of dbt Labs, the company behind the popular dbt tool and the purveyor of a universal semantic layer.

“A lot of people experimenting in this space are AI engineers or software engineers who don’t actually have knowledge of how BI works,” Handy told Datanami in an interview at Snowflake’s Data Cloud Summit last month. “And so they’re just like, ‘I don’t know, let’s have the model write SQL for me.’ It just doesn’t happen to work that well.”

The good news is that it’s not difficult to introduce a semantic layer into the GenAI call stack. Using a tool like LangChain, one can simply instruct the LLM to use a universal semantic layer to generate the SQL query that will fetch the data from the database, instead of letting the LLM write the SQL itself, Handy said. After all, this is exactly what semantic layers were created for, he pointed out. Using this approach increases the accuracy of natural language queries against LLMs to about 90%, Handy said.
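The pattern Handy describes can be sketched as follows. This is a self-contained toy, not real LangChain or LLM code: the `SEMANTIC_LAYER` mapping, the metric names, and the `pick_metric` stub (which stands in for a constrained LLM call) are all assumptions made for illustration. The key idea is that the model only chooses a governed metric, and the semantic layer emits pre-vetted SQL.

```python
# Toy "universal semantic layer": metric name -> governed, pre-vetted SQL.
SEMANTIC_LAYER = {
    "revenue": (
        "SELECT SUM(amount - refunded) FROM orders "
        "WHERE status = 'completed'"
    ),
    "active users": (
        "SELECT COUNT(DISTINCT user_id) FROM events "
        "WHERE ts > now() - interval '30 days'"
    ),
}

def pick_metric(question: str) -> str:
    """Stand-in for the LLM call: map a natural-language question to a
    metric name. A real system would use a constrained prompt here."""
    q = question.lower()
    for metric in SEMANTIC_LAYER:
        if metric in q:
            return metric
    raise ValueError(f"no governed metric matches: {question!r}")

def answer_sql(question: str) -> str:
    """The LLM never writes SQL; it only selects a metric, and the
    semantic layer supplies the governed query."""
    return SEMANTIC_LAYER[pick_metric(question)]

print(answer_sql("What was our revenue last quarter?"))
```

Because the SQL comes from the governed catalog rather than free-form generation, the failure mode shifts from “plausible but wrong SQL” to “metric not found,” which is far easier to detect and handle.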

“We’re having a lot of conversations about the semantic layer, and a lot of them are driven by the natural language interface question,” he said.

Not Just Semantics

Dbt Labs isn’t the only vendor plying the universal semantic layer waters. Two other vendors have staked a flag in this space: AtScale and Cube.


AtScale recently announced that its Semantic Layer Platform is now available on the Snowflake Marketplace. This support ensures that Snowflake customers can continue to rely on the data they’re producing, no matter which AI or BI tool they’re using in the Snowflake cloud, the company said.

“The semantic models you define in AtScale represent the metrics, calculated measures, and dimensions your business users need to analyze to achieve their business goals,” AtScale Vice President of Growth Cort Johnson wrote in a recent blog post. “After your semantics are defined in AtScale, they can be consumed by every BI tool, AI/ML tool, or LLM in your organization.”

Databricks is also getting into the semantic game. At its recent Data + AI Summit, it announced that it has added first-class support for metrics in Unity Catalog, its data catalog and governance tool.

“The idea here is that you can define metrics within Unity Catalog and manage them together with all the other assets,” Databricks CTO Matei Zaharia said during his keynote address two weeks ago. “We want you to be able to use the metrics in any downstream tool. We’re going to expose them to a number of BI tools, so you can pick the BI tool of your choice. … And you’ll be able to just use them through SQL, through table functions that you can compute on.”

Databricks also announced that it is partnering with dbt, Cube, and AtScale as “external metrics providers,” to make it easy to bring in and manage metrics from those vendors’ tools within Unity Catalog, Zaharia said.

Cube, meanwhile, last week launched a couple of new products, including a new Semantic Catalog, which is designed to give users “a comprehensive, unified view of connected data assets,” wrote David Jayatillake, the VP of AI at Cube, in a recent blog post.

“Whether you’re looking for modeled data in Cube Cloud, downstream BI content, or upstream tables, you can now find all of it within a single, cohesive interface,” he continued. “This reduces the time spent jumping between different data sources and platforms, offering a more streamlined and efficient data discovery process for both engineers and users.”

The other new product announced by Cube, which recently raised $25 million from Databricks and other venture firms, is an AI Assistant. The new offering is designed to “empower non-technical users to ask questions in natural language and receive trusted answers based on your existing investment in Cube’s universal semantic layer,” Jayatillake wrote in a blog post.

Opening More Data

GenAI may be the biggest factor driving interest in a universal semantic layer today, but the need for it predates GenAI.

According to dbt Labs’ Handy, who is a 2022 Datanami Person to Watch, the rise of the universal semantic layer is happening for the same reason the database is being decomposed into its constituent parts.

Dbt Labs originally got into the universal semantic layer space because the company saw it as “a cross-platform source of truth,” Handy said.

“It should be across your different data tools, it should be across your BI tools,” he said. “In the same way that you govern your data transformation in this independent way, you should be governing your business metrics that way, too.”

The rise of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake, together with open metadata catalogs like Snowflake Polaris and Databricks Unity Catalog, shows that there is an appetite for dismantling the traditional monolithic database and its data structures into a collection of independent components connected by a federated architecture.

At the moment, all of the universal semantic layers are proprietary, unlike what is happening at the table format and metastore layers, where open standards reign, Handy pointed out. Eventually, the market will settle on a standard, but it’s still very early days, he said.

“Semantic layers used to be kind of a niche thing,” he said, “and now it’s becoming a hot topic.”

Related Items:

Cube Secures $25M to Advance Its Semantic Layer Platform

AtScale Announces Major Upgrade to Its Semantic Layer Platform

Semantic Layer Belongs in Middleware, and dbt Wants to Deliver It