What Is Data? | FunDataMentals

The Problem

Generally, we talk about computers as if they operate on perfect 0s and 1s. In code, a condition is true or false. In textbook diagrams, digital signals jump cleanly from 0 to 1 or 1 to 0. The abstraction works so well that it is tempting to believe it describes reality.

Hmmm...but does it?

At the physical level, a digital signal is a voltage, and voltage does not teleport from low to high. It takes time to settle. Sometimes the signal has not finished settling when we sample it. When that happens, the value is neither cleanly 0 nor cleanly 1. In hardware-description-land, unresolved, unknown, or unsafe-to-classify states are often modeled as X. I am borrowing that symbol here, but this is not a lesson in full Verilog/SystemVerilog semantics. In this post, X simply means: under this observer's reference frame, the signal cannot honestly be resolved as 0 or 1 yet.

That little X is where things get interesting. Because once X exists, the naive binary model is no longer enough. We cannot just chant "bits, bits, bits" and pretend the physical world politely obeys our abstraction. To model X, we have to rebuild the idea of "data" from closer to absolute zero.

The Question
Can we build a first-principles model of how signals become data without starting from a textbook definition? Let's try. If we hit a wall, good. That means the model is touching something real.

The Derivation

Let's proceed by way of a thought experiment. When we observe a system, we never access the system "directly". Instead, we interact with signals produced by, emitted by, or mediated through the system: voltages, photons, pressure waves, whatever. Observation therefore requires a signal.

Terminology Clarification to Un-Melt Brains
Signal: A physical manifestation of a system. Examples: voltages, photons, pressure waves.
Observation: An act of measuring, sampling, or interacting with a signal.
Data: One or more recorded outcomes of observation.
Information: Data interpreted through an observer's reference frame. Information is context-dependent; this is where meaning begins to show up.

Let $S$ denote a signal. To extract any datum $D$ from $S$, we need at least two things: distinguishability and a reference frame.

Axiom 1: Distinguishability
For information to exist, there must be contrast. If all states are indistinguishable, no information can be conveyed or extracted. Thus, observation requires at least two distinguishable states. In the minimal case, we label them with symbols such as 0 and 1. Example: in digital electronics, these correspond to Low vs High voltage levels.

Axiom 2: Reference Frame ($F$)
Measurement is impossible without a reference frame. A reference frame defines how a signal is sampled, compared, and interpreted: the "ruler" against which $S$ is evaluated. In digital systems, the reference frame is often temporal and defined by a clock. If we assume an ideal digital clock, then observations are expected to land inside a clean symbol set:

$$F = \{0, 1\}$$

The Trap: because our symbols are discrete, we often assume the underlying reality being measured is also discrete. We assume that at any moment $t$, the state is either 0 or 1. But that assumption lives in the logical model, not necessarily in the physical signal.

The Physical Crack in the Binary Model

When a voltage transitions from low (0) to high (1), it does not jump instantly. It passes through intermediate values. In a clean logical model, every sampled value is assumed to resolve into one of two symbolic classes: 0 or 1. But physical observation is messier. During transitions, the observer may not be able to classify the signal safely under the current reference frame.

A simple example is a flip-flop sampling an asynchronous input. If the input changes too close to the clock edge, the circuit can enter a metastable region. It may eventually settle to 0 or 1, but at the moment of observation, the system may not be safely classifiable as either.

That is the key point. The physical signal still has a voltage. The world did not disappear. But relative to the observer's digital reference frame, the observation is unresolved. The model must admit:

$$X$$

Not because X is a magical third bit, but because the observer cannot honestly map the signal to 0 or 1 at that moment.
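To make the unresolved band concrete, here is a minimal Python sketch of a digital observer sampling a low-to-high transition. The thresholds `V_IL` and `V_IH` are illustrative assumptions loosely modeled on TTL-style input levels, not values from any specific part, and the ramp is idealized as linear (metastability itself is not modeled):

```python
# A sketch, assuming a hypothetical 0-5 V logic family with illustrative
# input thresholds. Anything between them is unsafe to classify.
V_IL = 0.8  # at or below this, the observer calls the sample 0
V_IH = 2.0  # at or above this, the observer calls the sample 1


def classify(voltage: float) -> str:
    """Map one sampled voltage to '0', '1', or 'X' under this reference frame."""
    if voltage <= V_IL:
        return "0"
    if voltage >= V_IH:
        return "1"
    return "X"  # between thresholds: cannot honestly be called 0 or 1


def sample_ramp(v_start: float, v_end: float, n_samples: int) -> list[float]:
    """Sample an idealized linear transition at n evenly spaced instants."""
    step = (v_end - v_start) / (n_samples - 1)
    return [v_start + i * step for i in range(n_samples)]


voltages = sample_ramp(0.0, 5.0, 11)
print("".join(classify(v) for v in voltages))  # 00XX1111111
```

The samples that land between the two thresholds come back as X: the voltage exists, but this observer declines to call it.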

So What Is a Datum?

In the clean binary model, a datum $D$, with respect to reference frame $F$, can take two values:

$$D_F = \{0, 1\}$$

That abstraction works when the signal stabilizes. But once we admit the physical unresolved state, the minimal observer-facing model becomes:

$$D_F = \{0, 1, X\}$$

This X is not a trit. A trit has three logical values. Here, we have two clean logical values plus one "I can't safely call this either one yet" state. That is the whole point. X is the model admitting uncertainty instead of lying.

Set, Multiset, or Sequence?

Now we need to be careful. Once we admit that observations can produce 0, 1, or X, what kind of object is a stream of data?

A set is not enough. If I write:

$$\{0,1,X\}$$

I know which symbols are possible, but I lose repetition and order. I cannot distinguish 0011 from 1100, because both collapse to the same set of symbols. That is way too much information loss.

A multiset is better because it preserves counts. It can tell me that I observed two 0s, two 1s, and one X. But it still loses order. And order matters. A signal sampled over time is not just a pile of outcomes. It is a history.

So for actual observed data, the natural object is a sequence. A sequence preserves both repetition and order:

$$D = \langle D_0, D_1, \dots, D_{n-1} \rangle$$

This is the point where data starts to look like memory. A single datum is one observed outcome. A data stream is an ordered history of observed outcomes. That history is what lets us compare, detect patterns, estimate probabilities, and eventually talk about meaning.
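A quick Python check of the three container types, using the 0011 vs 1100 example from above. Python's `collections.Counter` stands in for a multiset here:

```python
from collections import Counter

a, b = "0011", "1100"

# Set: which symbols occurred. Order and repetition are lost.
print(set(a) == set(b))          # True
# Multiset (Counter): counts survive, order does not.
print(Counter(a) == Counter(b))  # True
# Sequence (tuple): order and repetition both survive.
print(tuple(a) == tuple(b))      # False

# The multiset does separate streams with different counts.
print(Counter("0011X") == Counter("0001X"))  # False
```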

If we accept that data — observational outcomes — are sequences formed from sampling symbols from $\{0, 1, X\}$, we can ask ourselves: how much uncertainty can we handle while still treating different observations of a signal as meaningfully equivalent?

For example, consider the following observation of some signal — a sequence of six time-ordered samples:

Figure 1. A sequence of six time-ordered samples (10X1X0), where X marks an unresolved state.

Strictly speaking, 10X1X0 is not the equivalence class. It is the observed representative, or label, for a whole set of possibilities that the observer’s reference frame did not distinguish.

10X1X0 represents: {100100, 100110, 101100, 101110}
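Expanding a label into its possibilities is mechanical if we assume each X hides exactly one clean 0 or 1, which is the reading used in the set above. A small sketch with `itertools.product`; `expand` is a hypothetical helper name:

```python
from itertools import product


def expand(observed: str) -> set[str]:
    """Every clean 0/1 history consistent with an observation containing X.

    Assumption: each X stands for exactly one bit that is either 0 or 1.
    """
    choices = [("0", "1") if s == "X" else (s,) for s in observed]
    return {"".join(bits) for bits in product(*choices)}


print(sorted(expand("10X1X0")))
# ['100100', '100110', '101100', '101110']
```

An observation with $k$ X's expands to $2^k$ candidate histories, so a few unresolved samples can grow the class quickly.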

The above set, where any of its members could be a representative, is an equivalence class...well, not quite. In math, an equivalence class is a clean bucket: each object belongs to one bucket under the rule you chose. But physical observation is messier. Sometimes the observer cannot resolve the signal. Sometimes the observer ignores differences that do not matter for the job. And sometimes what looks like the "same" observed pattern can belong to a different practical bucket once the surrounding situation changes. Different physical histories, or observed sequences, can look like the same data stream because the observer’s reference frame cannot, or does not need to, tell them apart.

Funny enough, as I was typing this post, I wrote uncertian instaed of uncertain. If we view the 26 letters of the English alphabet as our symbol set, then, given that I am using a QWERTY keyboard, I can construct a pseudo-equivalence class:

"uncertain" is just one representative of: {"uncertain", "uncertian", "ucertain", "umcertain"}

And if you are really paying attention, you may have noticed I just made another typo in that paragraph: instaed (did you spot it?). Perhaps you glossed over it, which is fine, because it means you effortlessly constructed an equivalence-like class for instead in real time.

So why am I stressing the difference between a mathematical equivalence class and an equivalence-like class in practice? Here is the problem:

"cat" is just one representative of: {"ccat", "caat", "catt", "vat", "c t"}

Notice vat. On my QWERTY keyboard, v sits near c, so given the surrounding context (sentence, paragraph, etc.), a reader might infer that vat was intended to be cat.

But now consider another pseudo-equivalence class:

"bat" is just one representative of: {"bbat", "baat", "batt", "vat"}

Now vat appears in both groupings. But clearly a cat is not a bat. That is exactly why I am calling these pseudo-equivalence classes. In the clean math version, the buckets do not overlap. In the messy observer version, they can. The observer needs context, memory, and judgment to decide which bucket is the better fit.
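Here is a toy version of the keyboard-slip idea. The `NEIGHBORS` table is a hypothetical, deliberately partial QWERTY adjacency map covering only the keys needed here, and the model only handles single-key substitutions, not the doubled letters, dropped letters, or swaps shown in the other typos above:

```python
# Hypothetical, partial QWERTY adjacency: only the keys this example needs.
NEIGHBORS = {
    "c": "xdfv",
    "b": "vghn",
    "a": "qwsz",
    "t": "rfgy",
}


def one_slip_variants(word: str) -> set[str]:
    """Strings reachable by hitting one neighboring key instead of the intended one."""
    variants = set()
    for i, ch in enumerate(word):
        for nb in NEIGHBORS.get(ch, ""):
            variants.add(word[:i] + nb + word[i + 1:])
    return variants


def candidates(observed: str, vocabulary: list[str]) -> list[str]:
    """Intended words whose pseudo-class contains the observed string."""
    return [w for w in vocabulary if observed == w or observed in one_slip_variants(w)]


print(candidates("vat", ["cat", "bat", "hat"]))  # ['cat', 'bat']
```

The observed string "vat" lands in two pseudo-classes at once. The code can report the overlap, but picking the better bucket still needs context.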

A Compression Thought Hiding in Plain Sight

There is a compression idea hiding here. Once an observer treats many concrete observations as "the same enough", the observer can replace the whole messy set with one representative.

That representative might be the cleanest version, the shortest version, the safest version, the most likely version, or simply the version the system knows how to act on.

In some systems, the useful move is to find a compact representative of the whole class: the simplest form that preserves what the observer cares about. That is the bridge to compression. Not every detail survives. Only the distinctions the system needs are kept.

Think about an image of a cat. A PNG tries to preserve the image exactly, so that the original pixel data can be recovered. A JPEG is willing to throw away details the human eye probably will not care about. The cat still looks like a cat, even if the exact pixel values changed.

That is the point. Lossless compression tries to preserve enough structure to recover the original. Lossy compression intentionally throws away distinctions the observer or application is willing to ignore. Either way, compression is not magic. It is a decision about which differences matter.
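A minimal sketch of both sides of that decision. Run-length encoding is a simple lossless scheme; `quantize` is a toy lossy step. Real JPEG quantizes frequency (DCT) coefficients rather than raw pixel values, so treat this only as an illustration of deliberately discarding distinctions:

```python
def rle_encode(s: str) -> list[tuple[str, int]]:
    """Lossless run-length encoding: (symbol, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Rebuild the exact original string from its runs."""
    return "".join(ch * n for ch, n in runs)


def quantize(values: list[int], step: int) -> list[int]:
    """Lossy: snap each value down to a multiple of `step`; fine detail is gone."""
    return [v // step * step for v in values]


stream = "0000111100"
print(rle_encode(stream))                        # [('0', 4), ('1', 4), ('0', 2)]
print(rle_decode(rle_encode(stream)) == stream)  # True: nothing lost

pixels = [200, 203, 198, 201]
print(quantize(pixels, 8))  # [200, 200, 192, 200]: 200, 203, 201 now look identical
```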

Robustness as Tolerance

Why do we care? Because one useful way to think about robustness is the size of a system's acceptable equivalence class.

Some systems tolerate huge variation. Others tolerate almost none.

Example 1: The JPEG Image (High Tolerance)
Context: Rendering a cat photo.
Observation: <1, 0, X, 1, X, ...>
Equivalence Class: Large. The decoder may tolerate missing or corrupted pixel-level details, interpolate or smooth over them, and still render something recognizable.
Result: The user still sees a cat. The system survives ambiguity.

Example 2: The Rocket Controller (Low Tolerance)
Context: Firing a thruster.
Observation: <1, 0, X, 1>
Equivalence Class: Tiny. The command must resolve cleanly under the system's safety rules.
Result: If an X appears in a safety-critical command path, the system should reject, retry, or trigger a failsafe.

Same abstract issue. Completely different tolerance profile. This is why "data quality" is not universal. Data quality is always relative to the observer, the reference frame, and the system's tolerance for ambiguity.
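One way to sketch tolerance in code is as a threshold on the fraction of unresolved samples. The function name `accept` and the thresholds below are hypothetical policy choices, not what any real decoder or flight controller does:

```python
def accept(observation: list[str], max_x_fraction: float) -> bool:
    """Accept an observation if its share of unresolved X samples is within tolerance."""
    if not observation:
        return False  # nothing observed: nothing to accept
    x_fraction = observation.count("X") / len(observation)
    return x_fraction <= max_x_fraction


obs = ["1", "0", "X", "1"]
print(accept(obs, max_x_fraction=0.5))  # True: tolerant, image-like consumer
print(accept(obs, max_x_fraction=0.0))  # False: strict, safety-critical consumer
```

Same observation, different verdict. Only the consumer's tolerance changed. In the strict case, a real system would then reject, retry, or fail safe.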

The Observation Function

Now we can make the model slightly more general.

Let $U$ be the universe of possible signal values. Let $R$ be the set of values the observer knows how to reason about. Let $F \subseteq R$ be the observer's resolved reference frame — the values the observer can map into clean categories like 0 and 1.

Domain Side (the Input Sets)
$U$: all possible signal values, known and unknown
$R$: observer-known values
$F \subseteq R$: resolved reference frame
$U \setminus R$: values outside the observer's known set

The codomain of the observation function can contain several categories:

$D_{resolved}$: values the observer can resolve, such as 0 or 1
$X_{meta}$: known-but-unresolved values, like metastable or unsafe transition states
$X_{unknown}$: values outside the observer's known set
$X_{dc}$: "don't care" values, where the observer intentionally ignores the distinction

That gives us an observation map:

$$ O_F : U \to \{D_{resolved}, X_{meta}, X_{unknown}, X_{dc}\} $$

The key point: the map is observer-relative. Change the observer, change the reference frame, change the sampling rate, change the tolerated error, and you may change the datum produced by the same underlying signal.
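A minimal Python sketch of $O_F$, assuming a hypothetical observer defined by two voltage thresholds plus a known range $R = [0, 5]$ volts. `make_observer`, the thresholds, and the `dont_care` flag are illustrative choices, not a standard API:

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    RESOLVED = "D_resolved"  # cleanly 0 or 1
    META = "X_meta"          # inside R, but unsafe to classify
    UNKNOWN = "X_unknown"    # outside the observer's known set R
    DONT_CARE = "X_dc"       # observer chose to ignore this sample


Datum = Tuple[Outcome, Optional[int]]


def make_observer(v_il: float, v_ih: float,
                  v_min: float = 0.0, v_max: float = 5.0) -> Callable[..., Datum]:
    """Build a hypothetical O_F from thresholds and the voltage range the observer knows."""
    def observe(v: float, dont_care: bool = False) -> Datum:
        if dont_care:
            return (Outcome.DONT_CARE, None)
        if not v_min <= v <= v_max:
            return (Outcome.UNKNOWN, None)  # v lies in U \ R
        if v <= v_il:
            return (Outcome.RESOLVED, 0)
        if v >= v_ih:
            return (Outcome.RESOLVED, 1)
        return (Outcome.META, None)  # known value, between thresholds
    return observe


strict = make_observer(v_il=0.8, v_ih=2.0)
lenient = make_observer(v_il=1.5, v_ih=1.5)

print(strict(1.2))   # META: inside the strict observer's unresolved band
print(lenient(1.2))  # RESOLVED, 0: the lenient observer calls it low
print(strict(7.0))   # UNKNOWN: outside the known range
```

The same 1.2 V sample is $X_{meta}$ for one observer and a resolved 0 for the other. Nothing about the signal changed; only the frame did.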

Data is therefore not merely "what happened". Data is what an observer can record, resolve, ignore, or admit ignorance about.

Meaning as an Equivalence-Like Class

Now we can talk about meaning without pretending a single isolated observation magically contains everything.

A single voltage measurement, bit, or X has limited meaning by itself. Meaning emerges when the observer treats different concrete outcomes as "the same enough" under some context.

Examples:

Many physical voltages are interpreted as logic 1.
Different bit patterns may decode to the same message.
Saying "turn on the light" and flipping the switch may both map to the same action.

That means meaning is not simply "inside" the datum. Meaning is assigned through the observer's reference frame. The observer says: these different physical or symbolic outcomes count as the same thing for what I care about right now.
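A concrete case of "different bit patterns may decode to the same message", using Python's built-in codecs. Here the encoding name plays the role of the reference frame:

```python
# Two different byte patterns (concrete outcomes)...
utf8_bytes = "hi".encode("utf-8")       # b'hi'
utf16_bytes = "hi".encode("utf-16-le")  # b'h\x00i\x00'
print(utf8_bytes == utf16_bytes)        # False

# ...that map to the same message when each is read in its own frame.
print(utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le"))  # True

# Same bytes, wrong frame: a different "message" entirely.
print(utf8_bytes.decode("utf-16-le") == "hi")  # False
```

Read as UTF-16, the two UTF-8 bytes for "hi" decode to a single unrelated character. The bytes did not change; the frame did.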

In the cleanest case, this behaves like an equivalence class. But real meaning can be messier than that. Meanings can overlap. Context can shift. A word like "bank" can point toward money in one frame and a river in another. So no, I am not saying meaning always partitions the world into perfect, non-overlapping boxes. That would be too clean. Reality, as usual, refuses to be that polite.

Second Definition
Meaning, in this simplified model, is an observer-relative grouping over observations. Different concrete outcomes can share the same meaning if the observer treats them as interchangeable enough under the current context.

Why Memory Is Required for Meaning

Here is where the model starts getting spicy.

To determine what something means, the observer usually needs comparison. And comparison requires memory, even if only temporarily. To compare two numbers, a machine must hold both somewhere: registers, memory, a buffer, a latch, something. Same with meaning.

If a symbol $w_i$ has meaning because of its relationship to other symbols $w_j$, then the observer must store enough observations to estimate those relationships. Otherwise, there is no pattern. No distribution. No association. Just isolated events floating in the void.

So yes: memory is required for discernible meaning. Not necessarily human memory. Storage. State. History. Some mechanism that lets the observer compare "now" against "before" or "this" against "that".

Let $M_F(w_i)$ be the meaning of $w_i$ relative to observer frame $F$. Not a dictionary definition. A relationship map. What tends to show up with it? What does it point toward? What does the observer associate it with?

$$ M_F(w_i) = \{(w_j, p_F(w_j \mid w_i)) \mid j \neq i\} $$

Read $p_F(w_j \mid w_i)$ as: "under observer frame $F$, given that I observed $w_i$, how likely am I to also observe or associate it with $w_j$?" That vertical bar means "given". Nothing mystical — just conditional probability.
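To make $p_F(w_j \mid w_i)$ concrete, here is a sketch that estimates it by counting. The frame $F$ is an assumption: "$w_j$ immediately follows $w_i$ in one token stream". A co-occurrence window or a document-level frame would produce a different map. `meaning_map` is a hypothetical helper:

```python
from collections import Counter, defaultdict


def meaning_map(tokens: list[str]) -> dict[str, dict[str, float]]:
    """Estimate p_F(w_j | w_i) from how often w_j immediately follows w_i.

    Assumed frame F: adjacency within a single token stream.
    Self-transitions are skipped to mirror the j != i condition in M_F.
    """
    follows = defaultdict(Counter)
    for w_i, w_j in zip(tokens, tokens[1:]):
        if w_j != w_i:
            follows[w_i][w_j] += 1
    # Normalize each row of counts into conditional frequencies.
    return {
        w_i: {w_j: n / sum(counts.values()) for w_j, n in counts.items()}
        for w_i, counts in follows.items()
    }


stream = "the cat sat the cat ran the bat sat".split()
M = meaning_map(stream)
print(M["the"])  # cat: 2/3, bat: 1/3
print(M["cat"])  # sat: 1/2, ran: 1/2
```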

This is not me claiming I invented semantics, graph theory, probability, or information theory. The point is simpler: if we start from observation and ask what is needed for meaning to become discernible, we naturally end up needing stored associations, frequencies, and probabilities.

Recall Figure 1. If we take 100 or 1000 measurements of a signal, we can count the observed symbols and build a frequency distribution. That frequency distribution is the bridge from raw observation to probability and statistics. Again, nothing mystical. Just counting.
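Counting really is all it takes to get a first distribution. A sketch over a few hypothetical, made-up repeated observations of the same six-sample signal; `frequency_distribution` is an illustrative helper name:

```python
from collections import Counter


def frequency_distribution(observations: list[str]) -> dict[str, float]:
    """Count every sampled symbol across all observations and normalize to frequencies."""
    counts = Counter(symbol for obs in observations for symbol in obs)
    total = sum(counts.values())
    return {symbol: counts[symbol] / total for symbol in sorted(counts)}


# Hypothetical repeated observations, invented for illustration only.
runs = ["10X1X0", "100110", "10X110", "101100"]
print(frequency_distribution(runs))  # 0: 10/24, 1: 11/24, X: 3/24
```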

Meaning as a Graph

Once the observer has memory, meanings can be represented as a weighted graph. Nodes are symbols or observations. Edges represent association. Edge weights represent empirical frequency, conditional probability, or strength of relationship.

Meaning Graph
Nodes: symbols $w_0, w_1, \dots, w_{n-1}$
Edges: associations between symbols
Weights: $p_F(w_j \mid w_i)$ or another observer-defined association score

This is why meaning feels circular. The meaning of $w_i$ depends on its relationship to other symbols, but those symbols also get their meaning from the relationships around them. There is no lonely symbol floating in space carrying divine context on its back.

That is not a bug. That is the structure.

$$ A^F_{ij} = p_F(w_j \mid w_i) $$

Each matrix entry $A^F_{ij}$ stores the association strength from $w_i$ to $w_j$ under observer frame $F$. If $A^F_{ij}$ is large, $w_j$ commonly shows up with or after $w_i$. If it is small, the association is weak.

Once you have a weighted adjacency matrix $A$, you are no longer just waving your hands about "meaning". You can actually compute over the relationships: clusters, transitions, central nodes, PageRank-style propagation, whatever model fits the observer's frame.
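As one example of computing over the relationships, here is a sketch that builds a row-normalized $A^F$ from a token stream, using the hypothetical frame "$w_j$ immediately follows $w_i$", and runs a PageRank-style power iteration on it. The damping factor 0.85 is the commonly quoted default, used here as an assumption, and spreading a dangling node's rank evenly is one standard choice among several:

```python
def adjacency(tokens: list[str]) -> tuple[list[str], list[list[float]]]:
    """Row-normalized A[i][j], an estimate of p_F(w_j | w_i) under the 'follows' frame."""
    vocab = sorted(set(tokens))
    index = {w: k for k, w in enumerate(vocab)}
    n = len(vocab)
    A = [[0.0] * n for _ in range(n)]
    for w_i, w_j in zip(tokens, tokens[1:]):
        if w_i != w_j:
            A[index[w_i]][index[w_j]] += 1.0
    for row in A:
        total = sum(row)
        if total:
            for j in range(n):
                row[j] /= total
    return vocab, A


def pagerank(A: list[list[float]], damping: float = 0.85, iters: int = 100) -> list[float]:
    """Power iteration: each node passes its rank along its outgoing association weights."""
    n = len(A)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if sum(A[i]) == 0:  # dangling node: spread its rank evenly
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    new[j] += damping * rank[i] * A[i][j]
        rank = new
    return rank


vocab, A = adjacency("the cat sat the cat ran the bat sat".split())
scores = dict(zip(vocab, pagerank(A)))
print(max(scores, key=scores.get))  # the
```

In this toy stream, "the" ends up most central because both "sat" and "ran" send all of their weight back to it.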

This is how first-principles thinking unfolds: intuition -> model -> formalism. The math comes to you rather than you forcing it.

Probability as the Observer's Model

Let me be precise because this is where people love to argue with ghosts. I am not trying to settle whether the underlying system is deterministic or probabilistic here. That is a deeper physics argument. In this post, I am saying something narrower: meaning is probabilistic from the observer's perspective because meaning is inferred from aggregated observations, and aggregated observations naturally produce distributions.

Side Thought for a Later Philosophical Post
There is a deeper thought experiment hiding here that I have been playing with, because apparently I cannot leave observer problems alone: the self-observing universe. If the universe contains its own observers, then any observer inside the universe is trying to model the whole from within the whole. That feels like a dog chasing its own tail.

If we think of the universe as the set of everything, then an observer-element inside that set cannot step outside it to inspect the whole system from nowhere. At best, the observer works with a model: a compressed, partial, frame-dependent approximation of the system it is inside. Whether that approximation is discovered, generated, or selected depends on the deeper physics story one believes. Block universe? Dynamical universe? Something else? Fine. The point here is narrower: the approximation is still not outside the universe-set. It belongs to the system it tries to describe.

But the "moment" we introduce observation, we introduce a boundary: observer versus observed. I put "moment" in quotes because I am not trying to smuggle in a full theory of time or causality here. My point is this: without a boundary, there is no distinction, no measurement, no data. With a boundary, the observer gets only a partial view. That is why probability keeps showing up. Not necessarily because I have settled the physics of reality, but because observation from inside a system is incomplete.

And yes, this immediately raises the next annoying question: where does the frame itself live? Is the frame outside the observation, or is the frame also part of what must be observed? Exactly. That is why I am saving this for a later post. I am not solving that here. I am just pointing at the edge of the cliff. For as long as there are boundaries, the universe may remain unknowable in full.

Physics, as humans practice it, is model-building over observed signals. We infer patterns from observations, compress those patterns into laws, and then test whether those laws keep predicting what we observe. That does not mean our models are the system itself. It means our models are the best structure we currently have over the signals available to us.

Different observers with different sensors, frames, scales, or access to signals may build different models. That does not mean "truth is whatever you want". It means the observer never gets the system raw. The observer gets signals, measurements, limits, and a reference frame.

So What?
Data is observer-relative.
Meaning is observer-relative.
Probability enters because observers aggregate observations over time.

This does not make the system fake. It means the observer never gets the system unfiltered.

Bridge Back to SPPARRS

This post is the lower-level version of the Relativity idea in SPPARRS.

In SPPARRS, I use Relativity to mean observer-context: relative to which clock, frame, system boundary, sampling rule, or point of view? Here, we see the same idea at the level of a single datum. A datum is not just "out there". It is produced when an observer applies a reference frame to a signal and decides what resolves, what remains unresolved, what is unknown, and what can be ignored.

A lot of textbook models quietly drop the observer because they assume the reference frame is fixed. Same ruler. Same clock. Same sampling rule. Same measurement procedure. In that cleaned-up world, everyone is assumed to be looking at the same thing the same way.

Useful? Sure. Reality? Not always. In real engineering, observability is incomplete. Sensors are limited. Clocks drift. Samples are missed. Systems fail. Two observers may not actually see the same thing. So the observer-context is not philosophical decoration. It is part of the system boundary.

So when I say data is observer-relative, I am not being mystical. I am saying something practical: change the observer, change the frame, change the sampling rule, or change the tolerance threshold, and you may change the data. That is SPPARRS-style Relativity showing up before we even get to full systems design.

Conclusion

So what is data?

Data is not merely a bit. A bit is already a cleaned-up abstraction. Before the bit, there is a signal. Before interpretation, there is observation. Before meaning, there is memory, comparison, and reference frame.

The naive model says:

$$D_F = \{0,1\}$$

The physical model forces us to admit:

$$D_F = \{0,1,X\}$$

And the broader observer model says data is produced by an observation function mapping possible signals into resolved, unresolved, unknown, or ignored outcomes.

Final Compression
Signal -> Observation -> Data -> Memory -> Association -> Meaning

That is the journey. From voltages to symbols. From symbols to uncertainty. From uncertainty to memory. From memory to meaning.

That is data, at least from this observer's reference frame.