The Difference between DIKW Hierarchy and Data-Centricity

Knowledge, information, and data are fundamental concepts in knowledge management. These concepts are often presented as the DIKW (Data, Information, Knowledge, Wisdom) hierarchy, where they are viewed as separate concepts that build upon each other. The hierarchical structure shows in the definitions, each of which refers to the concept below it and expands on it. For example, information is defined as data with a context.

The definitions of the DIKW hierarchy, as used in the Knowledge Management community, have been summarized as follows (Zins, 2007):

  • Data: The things we sense, like sounds we hear, or sets of signs that represent empirical stimuli.
  • Information: The meaning we give to these sounds, like recognizing a car engine running, or sets of signs that represent empirical knowledge.
  • Knowledge: A thought in an individual’s mind that they justifiably believe to be true. It can be based on empirical evidence or be of a non-empirical nature, such as logical, mathematical, religious, or philosophical knowledge. Knowledge is the content of a thought believed to be true, while “knowing” is the state of mind in which the individual believes it is true.

The Critique of DIKW Hierarchy

Although the DIKW hierarchy is a foundational concept in the knowledge management community, it has been critiqued both from within the community and from outside perspectives.

Frické (2009) argues that the hierarchy is unsound and methodologically undesirable. His paper identifies a central logical error that DIKW makes, and it points to the dated and unsatisfactory philosophical positions of operationalism and inductivism as the philosophical backdrop to the hierarchy.

In his article, Zins (2007) mapped 130 definitions formulated by 45 scholars into several categories. These definitions do not agree even on whether the three concepts are internal human properties or external human-made artefacts, or on whether they are nouns or verbs. They also contain many redundancies, circular definitions, and logical conflicts between the concepts.

For example, information might be described as signs representing empirical knowledge, i.e. information is the same as explicit knowledge. At the same time, knowledge is described as a set of signs that represent the meaning of thoughts, which is a typical definition of information. On top of that, a typical definition of data is that it consists of raw signs about empirical or artificial realities, which resembles the previous definitions of information and knowledge. For a more thorough overview of the criticism, see Adrião and Filho (2021), who identified and described six broad themes of critique of the DIKW hierarchy: 1) Linearization, Hierarchies and Value Judgments; 2) Teleology; 3) Outdated conception of science; 4) Uncritical view of technology; 5) Unclear transitions and transformations; 6) Not all knowledge is born from data.

Rather than collecting a vast amount of criticism against the DIKW hierarchy, this article focuses on explaining an alternative way of thinking to the more widely known but problematic hierarchy.

The Origin of Data-Centric Thinking

For decades, the Data Management Association International (DAMA-I) community and aligned data-centric thinkers have criticized the DIKW hierarchy and preferred alternative ways of talking about data and related issues. According to the DAMA Data Management Body of Knowledge (DMBOK), one of the main issues with the DIKW hierarchy is that “it implies that data and information are separate things, when the two concepts are intertwined with and dependent on each other. Data is a form of information and information is a form of data.” A similar stance has a long history in data and information quality research, which typically uses “information interchangeably with data” (Wang, 1998). In the data-centric thinking community, professionals have emphasized the importance of scientific principles like simplicity and clarity, preferring straightforward language over the complex insider jargon found in some academic fields and industry hype.

To date, this alternative data-centric stance has not been thoroughly analyzed and described elsewhere, although many statements supporting it appear in DAMA-aligned scientific and practitioner publications and commentaries. Given the numerous anecdotes surrounding this issue, the DAMA community and other data-centric thinkers can be seen as part of a group that is critical of the DIKW hierarchy.

The positions of the DAMA community and the DIKW community compare as follows.

Statement on conceptual relations
  • DAMA community: Data and information are the same thing, and the terms can be used interchangeably for the same object.
  • DIKW community: Data, information, knowledge, and wisdom are distinct concepts forming a cumulative hierarchy.

Expression
  • DAMA community: Data = Information (= Explicit Knowledge). Note: the author of this article extrapolates that explicit knowledge is also the same, hence it is included in parentheses.
  • DIKW community: Data + x = Information, Information + y = Knowledge, Knowledge + z = Wisdom. Note: x, y, and z are “something more in addition to the earlier concept”.

Reasoning behind the conceptual relationships
  • DAMA community: Data, information, and knowledge are perspectives on the same thing, and all of them can exist in the same object, i.e. representation, at the same time. These terms can be used interchangeably depending on the specific issues you wish to highlight in your communication.
  • DIKW community: Both activities and situations generate information (i.e. ‘relevant meaning’ to someone) that is either captured, thus becoming data, or lost, or learned in the human mind as knowledge.

Justification of the reasoning
  • DAMA community: The above conclusion results from a vast amount of DIKW hierarchy criticism. For example, Zins (2007) documented “130 definitions of data, information, and knowledge formulated by 45 scholars”. The collection of definitions provides a lot of evidence of significant logical and empirical problems.
  • DIKW community: The above was interpreted and summarized from Liew (2007).

Finnish Perspective

In Finnish, there is only a single word, ‘tieto’, to describe all these concepts, and a parallel verb, ‘tietää’, to describe the act of knowing.

Examples:
  • Database = tietokanta
  • Information system = tietojärjestelmä
  • Knowledge = tieto

In this way, the Finnish language is fundamentally data-centric, or “tietokeskeinen”. All issues related to ‘data/information/knowledge’ are described by specifying semantic variations and pragmatic considerations explicitly and by using narrative structures, rather than by creating artificial hidden distinctions or parallel synonyms for the same object.

A Practical Example of DIKW Problems

One core problem is that people use the DIKW hierarchy all the time, yet nobody has been able to provide an example of information or explicit knowledge that is not just another set of data. This indicates that the DIKW hierarchy is not a valid hierarchy of different concepts and that it does not fit our empirical reality well. It is more like a redundant collection of synonyms for the same artefact, or a collection of logically incoherent assumptions.

Here’s an example for further analysis – what is this set of symbols?

TestCount  PositiveCount  Days  Population  City        Country
773        195            30    20M         Lagos       Nigeria
55211      1900           30    400k        Wellington  New Zealand

WHO’s Perspective – Data: The World Health Organization (WHO) states that this table of facts is their raw data used to compute their international Covid statistics.

Hospitals’ Perspective – Information: Hospitals in Lagos and Wellington consider this to be contextual information, as it is derived from their raw data and produced specifically for the context of Covid reporting.

Citizen Perspective – Knowledge: This is a typical example of “know-why” knowledge embodied in the statistics, which informs people why one might want to travel to Wellington rather than to Lagos. A knowledgeable person can draw actionable insight from these statistics and form a justified opinion (knowledge in many DIKW hierarchy definitions) to make informed travel decisions. For instance, although Nigeria shows a low number of Covid positives, one might avoid travelling there and instead consider New Zealand, which reports a higher number of positives. This decision is based on the facts that Nigeria conducts few tests and that the tests that are done often return positive results, indicating widespread Covid prevalence. Conversely, New Zealand conducts extensive testing, including of asymptomatic cases, and catches nearly all infections. Therefore, despite the higher number of positive diagnoses in a significantly smaller population, New Zealand is a much safer area in terms of Covid risk.

In this way, the exact same collection of database objects, or set of artificial symbols, can be considered data, information, and knowledge by different parties at the same time, without any changes, additions, or removals. It is all the same data. The only difference is in the perspective of each party.

A Practical Example of DIKW Automation

The example can also be expanded with additional columns to make it usable as training data for a new artificial intelligence solution.

FACTS IN TRAINING DATA
TestCount  PositiveCount  Days  Population  City        Country      Accuracy  Risk  Decision
773        195            30    20M         Lagos       Nigeria      Low       High  Risky
55211      1900           30    400k        Wellington  New Zealand  High      Low   Safe

LABELS IN TRAINING DATA
  • HIGH ACCURACY: A high TestCount / Population ratio means that the Covid statistics reflect reality more accurately. This is a positive feature, and a low risk level is more trustworthy.
  • LOW ACCURACY: A low TestCount / Population ratio means that the Covid statistics are anecdotal samples, and reality might differ significantly. This is a negative feature, and the risk levels cannot be trusted.
  • LOW RISK: A high TestCount / PositiveCount ratio means that there is less Covid risk. This is a positive feature.
  • HIGH RISK: A low TestCount / PositiveCount ratio means that there is more Covid risk. This is a negative feature.

DECISION-MAKING RULES AND GOALS IN TRAINING DATA
  • If High Accuracy and Low Risk: travelling is safe.
  • If High Accuracy and High Risk: travelling is not recommended.
  • If Low Accuracy: travelling is risky.

By extending the original data set with these labels and decision-making rules, the so-called “knowledge”, i.e. know-how and know-why, can be codified more explicitly in the same data set. At the same time, one should recognize that a knowledgeable actor would have been able to determine these quality issues, risk categories, and travel implications even from the original raw data, as I did earlier. These additional columns were added to make the point that a few more columns of data make the explicit “data set” match even the most advanced definitions of explicit “knowledge”. Yet it is just a slightly more comprehensive data set with additional data relevant for travel planning.
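The labels and decision-making rules listed above can also be written as a few lines of code. The sketch below is one possible encoding in Python. The article defines the labels only as "high" or "low" ratios, so the numeric cut-offs `ACCURACY_THRESHOLD` and `RISK_THRESHOLD` are hypothetical values chosen only so that the two example rows reproduce the Accuracy, Risk, and Decision columns of the training data. The function names are likewise illustrative.

```python
# Hypothetical cut-offs: the article gives no numeric thresholds,
# so these values are assumptions for illustration only.
ACCURACY_THRESHOLD = 0.01  # minimum TestCount / Population for "High" accuracy
RISK_THRESHOLD = 10.0      # minimum TestCount / PositiveCount for "Low" risk


def label_accuracy(test_count, population):
    """High TestCount / Population: statistics reflect reality more accurately."""
    return "High" if test_count / population >= ACCURACY_THRESHOLD else "Low"


def label_risk(test_count, positive_count):
    """High TestCount / PositiveCount: less Covid risk."""
    return "Low" if test_count / positive_count >= RISK_THRESHOLD else "High"


def decide(accuracy, risk):
    """Apply the three decision-making rules from the training data."""
    if accuracy == "Low":
        return "Risky"
    if risk == "Low":
        return "Safe"
    return "Not recommended"


def extend(row):
    """Return a copy of the row with the three derived columns added."""
    accuracy = label_accuracy(row["TestCount"], row["Population"])
    risk = label_risk(row["TestCount"], row["PositiveCount"])
    return {**row, "Accuracy": accuracy, "Risk": risk,
            "Decision": decide(accuracy, risk)}


# Facts from the example table (population shorthand expanded to integers).
facts = [
    {"City": "Lagos", "TestCount": 773, "PositiveCount": 195,
     "Population": 20_000_000},
    {"City": "Wellington", "TestCount": 55211, "PositiveCount": 1900,
     "Population": 400_000},
]

for row in map(extend, facts):
    print(row["City"], row["Accuracy"], row["Risk"], row["Decision"])
```

Under these assumed thresholds, Lagos is labelled Low accuracy, High risk, and Risky, and Wellington is labelled High accuracy, Low risk, and Safe, matching the table. The derived columns are computed entirely from the original columns, which illustrates the argument that the codified "knowledge" is simply more data in the same data set.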

In addition, a modern AI system could learn the logical patterns of successful reasoning, i.e. the relationships between accuracy, risks, and decisions, and make better-informed travel recommendations than most people were able to make during the Covid pandemic. In this way, an AI system could become a more knowledgeable decision-maker than a large part of the population ever was during the pandemic. This leads to logical and practical problems for DIKW hierarchy definitions that claim knowledge to be a human property, only to find that machines do the same things. This is the same century-old problem of trying to distinguish knowledge and intelligence between humans and machines, as described in our earlier article “Artificial Intelligence: Mathematical Imitator or Genuine Thinker?”.

In DIKW hierarchy terminology, this kind of AI capability would be described with three different terms: “using data to make pandemic information products and then making knowledgeable travelling decisions”.

According to data-centric thinking, it is equally fine to say that this is ‘data-driven travel decision-making with hospital data’. This is a simpler way to say the same thing, with fewer hidden semantic assumptions about data, information, and knowledge.

The core point of data-centric thinking is that there is no need to create a complex and redundant vocabulary with distinct layered concepts, like information or knowledge in the DIKW hierarchy. In fact, creating such language constructs can even be harmful to precise and effective communication when the artificial and nuanced definitions are not understood in exactly the same way by all stakeholders, or when they do not match real-world experiences.

The Reasoning behind the Data-Centric Perspective

According to the transition criticism, the same set of symbols can simultaneously be considered data, information, and explicit knowledge, without any transformation or processing between them. No conversion process is required to transform data into information or knowledge. Similarly, nothing needs to be removed from explicit knowledge to use it as data for machine learning. It is the same set of signs or symbols, as in the example of the Covid statistics.

According to the empirical criticism, no one has ever demonstrated the existence of additional symbols that would differentiate explicit knowledge, information, and data. In fact, it is now widely understood that practically any set of symbols can serve as training data for modern AI systems. All previous examples of information and knowledge have so far been nothing more than potential training data for such systems. Likewise, it is quite common to hear a data scientist say, “it’s all data to us”. They do not care whether you call your set of symbols data, metadata, information, knowledge, or just encrypted gibberish: hand it to them and it is their data. They can derive new meanings from it and use it for new purposes.

Considering the above, it is logical to replace the redundant terms “information” and “explicit knowledge” with a single simple term: data, or ‘tieto’ in Finnish. In scientific communities, simplicity is a virtue: if redundant terms can be replaced with a single term, they should be. This data-centric view aligns with that of the artificial intelligence community, which treats everything as data as long as it can be stored in one form or another.

In the data-centric community, it is perfectly fine, or even preferable, to say that “an organization has sales data sets that are processed into data products, which are then repurposed to make automated data-driven marketing decisions without any human intervention”. There is no need to use a more complex terminological hierarchy and say that “we have sales data sets that are processed into information products, which are then repurposed to make automated knowledge operations without any human intervention”. This is particularly so because the original sales data sets are also contextual data (i.e. so-called information) about sales activities, and the latent data patterns embodying know-why and know-how (i.e. so-called knowledge) can be codified into knowledgeable AI agents that perform fully automated marketing to improve purchase conversions.

In business reality, by making a distinction between these three words and choosing one of them for your set of symbols, such as the sales data, you are always using the wrong term from someone else’s perspective. Any data set is always contextual information to someone else, and it might even have unforeseeable uses (knowledge) due to its inherent characteristics. The differences between data, information, and knowledge should be understood as different perspectives on exactly the same object or artefact.

The Core Ideas of Data-Centric Thinking

  1. Anything can be Data: Empirical and Artificial.
    • Data encompasses all forms of representations, including unstructured, structured, binary, empirical, artificial, digital, analogue, and any other physical form of signs.
    • Any kind of physical representation can be used as data for various purposes.
  2. Data is always Contextual: Contextual and Meaningful.
    • Any data can contain numerous explicit and latent meanings, complex reasoning paths, and valuable decision insights.
    • The quality of data is dependent on how well it depicts these meanings, patterns, and contexts and distinguishes them from random meaningless entropy.
  3. Data can be Actionable: Pragmatic and Valuable.
    • Ideas and patterns embodied in data can be turned into data-driven actions performed by various actors, such as algorithms or humans.
    • The quality of data also depends on its fitness for a specific pragmatic purpose that leads to desirable results and outcomes in a specific context. In this way, quality and actionability are contextual properties or variables rather than a single property or factual state of an object.

Conclusions

Instead of creating artificial synonyms for a single concept like data, it is better to use a single simple term and discuss its syntactic, semantic, and pragmatic aspects explicitly. These aspects are more rigorous concepts and generally better understood, while the DIKW terminology often leads to problematic interpretations, even within the knowledge management community itself.

Preferring simple terms and explicit narratives, i.e. data storytelling, reduces misunderstandings and improves the effectiveness of real-life communication. Artificial terminologies with latent subcultural meanings should be avoided when aiming for effective communication. Therefore, use simple and effective language, state things explicitly, and employ convincing narrative structures. Do not assume or create latent meanings or conceptual dependencies that might be missed or understood differently by others.

In general, it is acceptable to use DIKW terms occasionally to make a point, but in the DAMA community, as in the CDOIQ community, we have used them interchangeably for a long time. Even in these cases, it is better to treat data, information, and knowledge as perspectives on the same object rather than as different types of objects or activities. Therefore, it is valid and even preferable to say that we use data to make data-driven decisions, and that any so-called information or explicit knowledge is merely more data for our data processing and machine-learning operations. In the end, anything can be our data, all data is always contextual, and it can be actionable even for unforeseen purposes.

References

Adrião, M. C., & Filho, E. R. (2021). “The Pyramid of Information – criticism and opportunity”. International Journal of Advanced Engineering Research and Science, 8 (5).

Frické, M. (2009). “The Knowledge Pyramid: A Critique of the DIKW Hierarchy”. Journal of Information Science, 35 (2).

Liew, A. (2007). “Understanding Data, Information, Knowledge and Their Inter-Relationships”. Journal of Knowledge Management Practice, 8 (2).

Zins, C. (2007). “Conceptual Approaches for Defining Data, Information, and Knowledge”. Journal of the American Society for Information Science and Technology, 58 (4).


Author: Sami Laine is a Senior Data Advisor at Tietoevry Tech Services and a Senior Advisor at Aalto EE, where he is currently spearheading the mission to bring the world-renowned CDOIQ Symposium to the Nordics in 2025. During his career of over 20 years, he has worked as a data management practitioner, consultant, researcher, and teacher in several business sectors. Throughout his career, Sami has been an active advocate for quality and ethical perspectives in data management and business decision-making, as recognized by his nomination for the DAIR Awards in 2022. He has been a long-time president and board member of DAMA Finland ry and a program committee member of the MIT CDOIQ Symposium.