Overcoming Data Integration Challenges to Unleash Massive Potential: Knowledge Graphs 101



Integrating data in the storage layer is a bit like asking the first person you ever date to marry you. It’s not necessarily a bad idea, but it is risky and it requires a big upfront commitment. Moving and copying data in order to integrate it may well work out, but if anything changes (and something always changes), you have to rerun all the jobs, make new copies, and move the data all over again.

Because you’ve committed so early to a particular viewpoint and then consolidated it at the storage layer, you’ve also aggressively excluded other possibilities. Early binding and tight coupling are all fun and games until things don’t work out and then where are we?

While an enterprise data set might seem like objective fact, a hard, fixed, immutable thing that represents the world exactly as it is, in reality it’s more instrumental than that. Enterprise data represents something; it re-presents some part of the world, and we manipulate the data largely to manipulate the world. That means data is full of subjective human choices and human values. Data is really composed of answers to very human questions: Which data should I collect? What data needs to be transformed? Which data needs to be summarized or aggregated, and what data counts? What matters, and what are we trying to accomplish? Every one of these choices becomes or influences a modeling decision, a transformation, an invariant, or a business rule. The technical apparatus of data integration and analytics is shot through with human values.

Two facts follow: first, that integrating data at the storage layer excludes possibilities; and, second, that data is a function of human choices. The consequence is that when an analyst has a new idea about how to organize and understand data, a strategic initiative is mooted by a regulatory ruling, or a competitor zigs instead of zags, organizations may have to throw it all away and start over from scratch.


Suddenly data teams have to create a new version of the data and form a new data set. That means going through the process of remodeling, transforming, and summarizing the data all over again, including re-running weeks-long ELT jobs and blowing up schedules, budgets, bandwidth, and storage.

But what if they didn’t have to?

Leveraging Knowledge Graphs to Accelerate Insight

Whether called data sprawl or data silos, data resides in lots of places. In its natural state, data is disconnected, both from other data it should be connected to and from the business context that makes it meaningful. The natural disconnectedness of enterprise data presents a challenge for organizations that need to drive business transformation with data, which is just to say that it’s a challenge for everyone. Data management practices that limit range of motion and increase inflexibility, including integrating data solely at the storage layer, hamper everything from app development, data science and analytics, and process automation to reporting and compliance.

However, there is an alternative to integrating data eagerly at the storage layer: connecting data lazily at the compute layer. Late binding and loose coupling in data architectures increase flexibility and range of motion. Enterprises are increasingly adopting new data management techniques, including data fabrics and knowledge graphs (KGs), to unify their data. Knowledge graphs offer a flexible, reusable data layer that enables organizations to answer complex queries across data sources and offers unprecedented connectedness with contextualized data, represented and organized in the form of smart graphs.

Built to capture the ever-changing nature of information, knowledge graphs accept new data, definitions, and requirements fluidly, easily, and in a way that promotes radical reuse of data across large organizations. This means that as the enterprise evolves and greater volumes of data, sources, and use cases emerge, they can be absorbed without loss of manageability or accessibility, while fully representing the current expanse of what the enterprise knows.

Dissecting the Components of a Knowledge Graph

An Enterprise Knowledge Graph is a technology that combines the capabilities of a graph database with a knowledge toolkit, including AI, ML, data quality, and reusable smart graph models, for the purpose of large-scale data unification. Put simply, KGs know everything the business knows because they can re-present data sprawl and silos in a connected data fabric.

Because knowledge graphs are built on graph database technologies, they natively represent and store data as entities (also called nodes) connected to other entities by relationships called edges. Like traditional graph databases, knowledge graphs quickly navigate chains of these edges to find relationships between various pieces of data. Following numerous chains of edges at the same time, they can identify many-to-many interrelationships at multiple levels of granularity, from summary rollups to a record’s smallest details, so relevant data can be retrieved through a single query. Unlike plain graph databases, knowledge graph platforms query connected data using data virtualization and query federation techniques, moving data integration from the storage layer to the compute layer.
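To make the idea of nodes, edges, and chains of edges concrete, here is a minimal sketch in plain Python. It is not how any particular graph database is implemented; the entity names, the relationship labels, and the `find_path` helper are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy graph: each edge is (source, relationship, target).
EDGES = [
    ("alice", "worksFor", "acme"),
    ("acme", "subsidiaryOf", "globex"),
    ("globex", "headquarteredIn", "berlin"),
    ("bob", "worksFor", "globex"),
]


def build_adjacency(edges):
    """Index edges by source node so neighbors can be looked up quickly."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_path(edges, start, goal):
    """Breadth-first search returning the chain of edges from start to goal."""
    adjacency = build_adjacency(edges)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None  # no chain of edges connects the two nodes


# Follow the chain of edges that links a person to a city.
path = find_path(EDGES, "alice", "berlin")
```

A single call walks three edges (person to employer, employer to parent company, parent company to city), which is the kind of multi-hop question that would otherwise require several joins.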

As data and queries become more complex, the benefits of a knowledge graph’s smart data model increase, as it can connect data silos into facts that constitute contextualized knowledge. A knowledge graph also contains tools that allow enterprises to add a layer of richer semantics to support knowledge representation in the graph and strengthen machine understanding, which is something that plain graph databases do not offer.

For instance, where a plain graph database knows there is an interrelationship between a person node in silo A and an organization node in silo B, a knowledge graph also understands the nature of that interrelationship, and it can query that relationship without first moving or copying data from silos A and B into silo C (i.e., a plain graph database).
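The silo A and silo B example can be sketched as follows. This is a simplified illustration of late binding at the compute layer, not a real virtualization or federation engine: the two dictionaries stand in for separate systems, and the `VirtualGraph` class, its methods, and the `worksFor` label are assumptions made for the example.

```python
# Silo A: a hypothetical HR system keyed by person id.
SILO_A_PEOPLE = {
    "p1": {"name": "Alice", "employer_id": "o7"},
    "p2": {"name": "Bob", "employer_id": "o9"},
}

# Silo B: a hypothetical CRM keyed by organization id.
SILO_B_ORGS = {
    "o7": {"name": "Acme", "industry": "manufacturing"},
    "o9": {"name": "Globex", "industry": "energy"},
}


class VirtualGraph:
    """Resolves typed edges at query time by reading the silos in place."""

    def __init__(self, people, orgs):
        self.people = people  # reference to silo A, not a copy
        self.orgs = orgs      # reference to silo B, not a copy

    def works_for_edges(self):
        """Yield (person, 'worksFor', organization) edges computed on demand."""
        for person in self.people.values():
            org = self.orgs.get(person["employer_id"])
            if org is not None:
                yield (person["name"], "worksFor", org["name"])

    def people_in_industry(self, industry):
        """Answer a cross-silo question without materializing a merged copy."""
        return sorted(
            person["name"]
            for person in self.people.values()
            if self.orgs.get(person["employer_id"], {}).get("industry") == industry
        )


graph = VirtualGraph(SILO_A_PEOPLE, SILO_B_ORGS)
before = graph.people_in_industry("energy")

# A change in silo B is visible to the next query; nothing has to be re-copied.
SILO_B_ORGS["o7"]["industry"] = "energy"
after = graph.people_in_industry("energy")
```

Because the relationship is computed when the question is asked, there is no silo C to rebuild when either source changes. The trade-off, in this sketch as in real systems, is that query-time work replaces load-time work.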

Combatting Big Data’s 3Vs: How KGs Unveil Hidden Insights

Stepping back and looking at the bigger picture of the modern data analytics stack, there are plenty of tools and techniques for addressing the volume and velocity challenges of big data. The cloud means, for example, never running out of storage again and it makes distributed systems easier to operate, even if they’re still really hard to build and maintain.

But the big data challenge of variety has mostly been ignored until recently. Perhaps knowledge graphs’ biggest contribution is to solve the variety problem by providing a consistent view across heterogeneous data. Notice, however, that the view is homogeneous and consistent while the underlying data remains heterogeneous and even physically separate.

Knowledge graphs encompass the large, diverse, and constantly evolving data found in modern enterprises using a comprehensive abstraction (that is, semantic graphs) based on declarative data structures and query languages. They combine key technologies that work together to unify data on a massive scale, including a reusable, powerful data model; virtual graph capabilities to manage structured, semi-structured, and unstructured data; and inference and reasoning services.
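As a rough illustration of what an inference service does, the sketch below derives new facts from stored ones by applying rules until nothing new appears. The facts, the two rules, and the `infer` function are hypothetical; production reasoners work from declarative ontologies and rule languages rather than hard-coded Python.

```python
# Facts as (subject, predicate, object) triples; all names are hypothetical.
FACTS = {
    ("alice", "worksFor", "acme"),
    ("acme", "subsidiaryOf", "globex"),
    ("globex", "subsidiaryOf", "initech"),
}


def infer(facts):
    """Apply two illustrative rules repeatedly until no new facts appear.

    Rule 1: subsidiaryOf is transitive.
    Rule 2: working for a subsidiary implies affiliation with its parent.
    """
    known = set(facts)
    while True:
        derived = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue  # the two facts do not chain together
                if p1 == "subsidiaryOf" and p2 == "subsidiaryOf":
                    derived.add((a, "subsidiaryOf", d))
                if p1 == "worksFor" and p2 == "subsidiaryOf":
                    derived.add((a, "affiliatedWith", d))
        if derived <= known:
            return known  # fixed point reached
        known |= derived


# Facts that were never stored in any source but follow from the rules.
inferred = infer(FACTS) - FACTS
```

Here the affiliation between alice and initech is never recorded anywhere; it is derived at query time from facts that could live in different sources.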

Given the importance and potential of data, enterprises can’t ignore the costs that arise from not being able to access or apply the knowledge accumulated across the enterprise. In today’s hybrid multicloud world of increasing complexity and specialization, data sprawl and data silos aren’t really avoidable, but they are manageable so long as data can be unified across them. Applied to leverage what the enterprise knows, knowledge graphs can grow in parallel with the business and enable users to draw on this untapped data and insight to innovate and achieve a true competitive advantage.

About the Author: Kendall Clark is founder and CEO of Stardog, an Enterprise Knowledge Graph (EKG) platform provider. You can follow the company on Twitter @StardogHQ.
