The Problem
Generally, we talk about computers as if they operate on perfect 0s and 1s. In code, a condition is true or false. In textbook diagrams, digital signals jump cleanly from 0 to 1 or 1 to 0. The abstraction works so well that it is tempting to believe it describes reality.
Hmmm...but does it?
At the physical level, a digital signal is a voltage, and voltage does not teleport from low to high. It takes time to settle. Sometimes the signal has not finished settling when we sample it. When that happens, the value is neither cleanly 0 nor cleanly 1. In hardware-description-land, unresolved, unknown, or unsafe-to-classify states are often modeled as X. I am borrowing that symbol here, but not using it as a full Verilog/SystemVerilog semantics lesson. In this post, X simply means: under this observer's reference frame, the signal cannot honestly be resolved as 0 or 1 yet.
That little X is where things get interesting. Because once X exists, the naive binary model is no longer enough. We cannot just chant "bits, bits, bits" and pretend the physical world politely obeys our abstraction. To model X, we have to rebuild the idea of "data" from closer to absolute zero.
The Derivation
Let's proceed by way of a thought experiment. When we observe a system, we never access the system "directly". Instead, we interact with signals produced by, emitted by, or mediated through the system: voltages, photons, pressure waves, whatever. Observation therefore requires a signal.
Observation: An act of measuring, sampling, or interacting with a signal.
Data: One or more recorded outcomes of observation.
Information: Data interpreted through an observer's reference frame. Information is context-dependent; this is where meaning begins to show up.
Let $S$ denote a signal. To extract any datum $D$ from $S$, we need at least two things: distinguishability and a reference frame.
The minimal case of distinguishability is two symbols: 0 and 1.
Example: in digital electronics, these correspond to Low vs High voltage levels.
The Trap: because our symbols are discrete, we often assume the underlying reality being measured is also discrete. We assume that at any moment $t$, the state is either 0 or 1. But that assumption lives in the logical model, not necessarily in the physical signal.
The Physical Crack in the Binary Model
When a voltage changes from 0 to 1, it does not jump instantly. It passes through intermediate values. In a clean logical model, every sampled value is assumed to resolve into one of two symbolic classes: 0 or 1. But physical observation is messier. During transitions, the observer may not be able to classify the signal safely under the current reference frame.
A simple example is a flip-flop sampling an asynchronous input. If the input changes too close to the clock edge, inside the flip-flop's setup or hold window, the circuit can enter a metastable region. It may eventually settle to 0 or 1, but at the moment of observation, the system may not be safely classifiable as either.
That is the key point. The physical signal may have a voltage. The world did not disappear. But relative to the observer's digital reference frame, the observation is unresolved. The model must admit a third observed outcome: X.
Not because X is a magical third bit, but because the observer cannot honestly map the signal to 0 or 1 at that moment.
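Here is a minimal sketch of that idea, assuming a made-up observer with two voltage thresholds. The names `V_LOW_MAX`, `V_HIGH_MIN`, `classify`, and `sample_ramp` are mine, and the threshold values are illustrative, not taken from any datasheet. Anything between the thresholds is reported as X instead of being forced into 0 or 1.

```python
from typing import List

# Hypothetical thresholds for an illustrative 3.3 V logic family.
# These numbers are assumptions for the sketch, not datasheet values.
V_LOW_MAX = 0.8   # at or below this, the observer calls the signal 0
V_HIGH_MIN = 2.0  # at or above this, the observer calls the signal 1


def classify(voltage: float) -> str:
    """Map a sampled voltage to '0', '1', or 'X' under this observer's frame."""
    if voltage <= V_LOW_MAX:
        return "0"
    if voltage >= V_HIGH_MIN:
        return "1"
    return "X"  # the voltage exists, but this frame cannot resolve it


def sample_ramp(v_start: float, v_end: float, steps: int) -> List[str]:
    """Sample a linear transition at evenly spaced instants (steps >= 2)."""
    samples = []
    for i in range(steps):
        v = v_start + (v_end - v_start) * i / (steps - 1)
        samples.append(classify(v))
    return samples


if __name__ == "__main__":
    # A slow low-to-high edge sampled 12 times.
    print("".join(sample_ramp(0.0, 3.3, 12)))
```

A real edge is not a straight line, and real metastability is not just "voltage between thresholds", so treat this as a cartoon of the classification step only.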
So What Is a Datum?
In the clean binary model, a datum $D$, with respect to reference frame $F$, can take two values: $D \in \{0, 1\}$.
That abstraction works when the signal stabilizes. But once we admit the physical unresolved state, the minimal observer-facing model becomes: $D \in \{0, 1, X\}$.
This X is not a trit. A trit has three logical values. Here, we have two clean logical values plus one "I can't safely call this either one yet" state. That is the whole point. X is the model admitting uncertainty instead of lying.
Set, Multiset, or Sequence?
Now we need to be careful. Once we admit that observations can produce 0, 1, or X, what kind of object is a stream of data?
A set is not enough. If I write $\{0, 1, X\}$:
I know which symbols are possible, but I lose repetition and order. I cannot distinguish 0011 from 1100, because both collapse to the same set of symbols. That is way too much information loss.
A multiset is better because it preserves counts. It can tell me that I observed two 0s, two 1s, and one X. But it still loses order. And order matters. A signal sampled over time is not just a pile of outcomes. It is a history.
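So for actual observed data, the natural object is a sequence. A sequence preserves both repetition and order: <0, 0, 1, 1, X> is a different history from <1, 1, 0, 0, X>, even though both contain the same symbols the same number of times.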
This is the point where data starts to look like memory. A single datum is one observed outcome. A data stream is an ordered history of observed outcomes. That history is what lets us compare, detect patterns, estimate probabilities, and eventually talk about meaning.
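The set, multiset, sequence distinction is easy to check in a few lines. This is just a sketch using Python's built-in `set`, `collections.Counter` as a stand-in for a multiset, and a plain list as the sequence. The two sample histories `a` and `b` are made up for illustration.

```python
from collections import Counter

# Two hypothetical observation histories with the same symbols and counts,
# but in a different order.
a = ["0", "0", "1", "1", "X"]
b = ["1", "1", "0", "0", "X"]

same_set = set(a) == set(b)               # order and repetition both lost
same_multiset = Counter(a) == Counter(b)  # counts kept, order lost
same_sequence = a == b                    # counts and order both kept

if __name__ == "__main__":
    print("set equal:     ", same_set)
    print("multiset equal:", same_multiset)
    print("sequence equal:", same_sequence)
```

Only the sequence comparison tells the two histories apart, which is the whole argument for treating observed data as ordered.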
If we accept that data — observational outcomes — are sequences formed from sampling symbols from $\{0, 1, X\}$, we can ask ourselves: how much uncertainty can we handle while still treating different observations of a signal as meaningfully equivalent?
For example, consider the following observation of some signal, a sequence of six time-ordered samples: 10X1X0
X marks an unresolved state.
Strictly speaking, 10X1X0 is not the equivalence class. It is the observed representative, or label, for a whole set of possibilities that the observer’s reference frame did not distinguish. If each X would eventually resolve to 0 or 1, that set is $\{100100, 100110, 101100, 101110\}$.
The above set, where any of its members could be a representative, is an equivalence class...well, not quite. In math, an equivalence class is a clean bucket: each object belongs to one bucket under the rule you chose. But physical observation is messier. Sometimes the observer cannot resolve the signal. Sometimes the observer ignores differences that do not matter for the job. And sometimes what looks like the "same" observed pattern can belong to a different practical bucket once the surrounding situation changes. Different physical histories, or observed sequences, can look like the same data stream because the observer’s reference frame cannot, or does not need to, tell them apart.
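To make "a whole set of possibilities" concrete, here is a small sketch that expands an observed pattern into every fully resolved sequence it could stand for. It assumes each X would eventually settle to either 0 or 1, which is my simplification. The function name `expansions` is hypothetical.

```python
from itertools import product
from typing import List


def expansions(pattern: str) -> List[str]:
    """List every 0/1 sequence consistent with a pattern containing X marks.

    Assumption: each X stands for a sample that would resolve to 0 or 1.
    """
    slots = [("0", "1") if ch == "X" else (ch,) for ch in pattern]
    return ["".join(combo) for combo in product(*slots)]


if __name__ == "__main__":
    for candidate in expansions("10X1X0"):
        print(candidate)
```

A pattern with $k$ unresolved samples stands for $2^k$ candidates under this assumption, so the bucket grows fast as uncertainty accumulates.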
Funny enough, as I was typing this post, I wrote uncertian instaed of uncertain. If we view the 26 letters of the English alphabet as our symbol set, then, given that I am using a QWERTY keyboard, I can construct a pseudo-equivalence class:
And if you are really paying attention, you may have noticed I just made another typo in that paragraph: instaed (did you spot it?). Perhaps you glossed over it, which is fine, because it means you effortlessly constructed an equivalence-like class for instead in real time.
So why am I stressing the difference between a mathematical equivalence class and an equivalence-like class in practice? Here is the problem:
Notice vat. On my QWERTY keyboard, v sits near c, so given the surrounding context (sentence, paragraph, etc.), a reader might infer that vat was intended to be cat.
But now consider another pseudo-equivalence class:
Now vat appears in both groupings. But clearly a cat is not a bat. That is exactly why I am calling these pseudo-equivalence classes. In the clean math version, the buckets do not overlap. In the messy observer version, they can. The observer needs context, memory, and judgment to decide which bucket is the better fit.
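The overlap is easy to reproduce. The sketch below builds a typo bucket for a word by swapping its first letter with an adjacent QWERTY key. The neighbor map is a partial one I wrote by hand for just the two keys needed, and `typo_class` is a hypothetical helper, not anything standard.

```python
from typing import Dict, Set

# Partial, hand-written QWERTY neighbor map (only the keys used below).
NEIGHBORS: Dict[str, Set[str]] = {
    "c": {"x", "v", "d", "f"},
    "b": {"v", "n", "g", "h"},
}


def typo_class(word: str) -> Set[str]:
    """Strings reachable by mistyping the first letter as an adjacent key."""
    first, rest = word[0], word[1:]
    return {word} | {key + rest for key in NEIGHBORS.get(first, set())}


cat_like = typo_class("cat")
bat_like = typo_class("bat")
overlap = cat_like & bat_like

if __name__ == "__main__":
    print(sorted(cat_like))
    print(sorted(bat_like))
    print("shared:", sorted(overlap))
```

A true partition would never produce a non-empty intersection between two different buckets. Here it does, and resolving it takes context the buckets themselves do not contain.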
A Compression Thought Hiding in Plain Sight
There is a compression idea hiding here. Once an observer treats many concrete observations as "the same enough", the observer can replace the whole messy set with one representative.
That representative might be the cleanest version, the shortest version, the safest version, the most likely version, or simply the version the system knows how to act on.
In some systems, the useful move is to find a compact representative of the whole class: the simplest form that preserves what the observer cares about. That is the bridge to compression. Not every detail survives. Only the distinctions the system needs are kept.
Think about an image of a cat. A PNG tries to preserve the image exactly, so that the original pixel data can be recovered. A JPEG is willing to throw away details the human eye probably will not care about. The cat still looks like a cat, even if the exact pixel values changed.
That is the point. Lossless compression tries to preserve enough structure to recover the original. Lossy compression intentionally throws away distinctions the observer or application is willing to ignore. Either way, compression is not magic. It is a decision about which differences matter.
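A toy version of that contrast, with loud caveats: the lossless half uses Python's standard `zlib` module, which is a general-purpose compressor and not the PNG format itself, and the lossy half is plain quantization that I am using as a stand-in for the kind of detail-dropping JPEG does, not the JPEG algorithm. The `pixels` data and the `quantize` helper are made up.

```python
import zlib

# Hypothetical grayscale pixel values: a bright region and a dark region,
# each with a little noise.
pixels = bytes([200, 201, 199, 200, 50, 52, 51, 50] * 16)

# Lossless: every original byte can be recovered.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)


def quantize(data: bytes, step: int = 8) -> bytes:
    """Lossy: snap each value down to a multiple of `step`.

    Distinctions smaller than `step` are deliberately thrown away.
    """
    return bytes((value // step) * step for value in data)


lossy = quantize(pixels)

if __name__ == "__main__":
    print("lossless round trip exact:", restored == pixels)
    print("distinct values before:", len(set(pixels)))
    print("distinct values after: ", len(set(lossy)))
```

The quantized data still has a bright region and a dark region. What it lost is the small differences inside each region, which is exactly the "decision about which differences matter".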
Robustness as Tolerance
Why do we care? Because one useful way to think about robustness is the size of a system's acceptable equivalence class.
Some systems tolerate huge variation. Others tolerate almost none.
Observation: <1, 0, X, 1, X, ...>
Equivalence Class: Large. The decoder may tolerate missing or corrupted pixel-level details, interpolate, compress, smooth, or still render something recognizable.
Result: The user still sees a cat. The system survives ambiguity.
Observation: <1, 0, X, 1>
Equivalence Class: Tiny. The command must resolve cleanly under the system's safety rules.
Result: If an X appears in a safety-critical command path, the system should reject, retry, or trigger a failsafe.
Same abstract issue. Completely different tolerance profile. This is why "data quality" is not universal. Data quality is always relative to the observer, the reference frame, and the system's tolerance for ambiguity.
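One rough way to sketch a tolerance profile is a single knob: what fraction of unresolved samples is the system willing to accept? The function `accept` and the parameter `max_unresolved_fraction` are hypothetical, and a real system would use richer rules than one ratio.

```python
from typing import Sequence


def accept(samples: Sequence[str], max_unresolved_fraction: float) -> bool:
    """Accept an observation if its share of X samples is within tolerance."""
    if not samples:
        return False  # nothing observed, nothing to act on
    unresolved = sum(1 for s in samples if s == "X")
    return unresolved / len(samples) <= max_unresolved_fraction


if __name__ == "__main__":
    image_like = ["1", "0", "X", "1", "X"]
    command = ["1", "0", "X", "1"]

    # A forgiving consumer: up to half the samples may be unresolved.
    print("image-like data accepted:", accept(image_like, 0.5))
    # A safety-critical consumer: no unresolved samples allowed.
    print("command accepted:        ", accept(command, 0.0))
```

Same kind of input, different threshold, different verdict. The data did not change; the observer's tolerance did.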
The Observation Function
Now we can make the model slightly more general.
Let $U$ be the universe of possible signal values. Let $R$ be the set of values the observer knows how to reason about. Let $F \subseteq R$ be the observer's resolved reference frame — the values the observer can map into clean categories like 0 and 1.
The codomain of the observation function can contain several categories:
- $D_{resolved}$: values the observer can resolve, such as 0 or 1
- $X_{meta}$: known-but-unresolved values, like metastable or unsafe transition states
- $X_{unknown}$: values outside the observer's known set
- $X_{dc}$: "don't care" values, where the observer intentionally ignores the distinction
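That gives us an observation map from $U$ into the union of those categories: $U \to D_{resolved} \cup X_{meta} \cup X_{unknown} \cup X_{dc}$.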
The key point: the map is observer-relative. Change the observer, change the reference frame, change the sampling rate, change the tolerated error, and you may change the datum produced by the same underlying signal.
Data is therefore not merely "what happened". Data is what an observer can record, resolve, ignore, or admit ignorance about.
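Here is one way the observation map could look in code. Everything in it is an assumption for illustration: the `Outcome` enum, the `observe` function, the voltage thresholds, the idea that "unknown" means a missing or out-of-range reading, and the `care` flag standing in for "don't care".

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ZERO = "0"
    ONE = "1"
    META = "X_meta"        # known range, but not safely resolvable
    UNKNOWN = "X_unknown"  # outside what this observer can reason about
    DONT_CARE = "X_dc"     # observer chooses to ignore the value


def observe(
    voltage: Optional[float],
    care: bool = True,
    v_low_max: float = 0.8,
    v_high_min: float = 2.0,
    v_min: float = 0.0,
    v_max: float = 3.3,
) -> Outcome:
    """Map a raw reading to an outcome under one observer's frame."""
    if not care:
        return Outcome.DONT_CARE
    if voltage is None or not (v_min <= voltage <= v_max):
        return Outcome.UNKNOWN
    if voltage <= v_low_max:
        return Outcome.ZERO
    if voltage >= v_high_min:
        return Outcome.ONE
    return Outcome.META


if __name__ == "__main__":
    # Same physical reading, two observers with different thresholds.
    print(observe(1.5))                  # default frame
    print(observe(1.5, v_high_min=1.4))  # more permissive frame
```

The two calls at the bottom feed the same reading through two frames and get two different data, which is the observer-relative point in miniature.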
Meaning as an Equivalence-Like Class
Now we can talk about meaning without pretending a single isolated observation magically contains everything.
A single voltage measurement, bit, or X has limited meaning by itself. Meaning emerges when the observer treats different concrete outcomes as "the same enough" under some context.
Examples:
- Many physical voltages are interpreted as logic 1.
- Different bit patterns may decode to the same message.
- Saying "turn on the light" and flipping the switch may both map to the same action.
That means meaning is not simply "inside" the datum. Meaning is assigned through the observer's reference frame. The observer says: these different physical or symbolic outcomes count as the same thing for what I care about right now.
In the cleanest case, this behaves like an equivalence class. But real meaning can be messier than that. Meanings can overlap. Context can shift. A word like "bank" can point toward money in one frame and a river in another. So no, I am not saying meaning always partitions the world into perfect, non-overlapping boxes. That would be too clean. Reality, as usual, refuses to be that polite.
Why Memory Is Required for Meaning
Here is where the model starts getting spicy.
To determine what something means, the observer usually needs comparison. And comparison requires memory, even if only temporarily. To compare two numbers, a machine must hold both somewhere: registers, memory, a buffer, a latch, something. Same with meaning.
If a symbol $w_i$ has meaning because of its relationship to other symbols $w_j$, then the observer must store enough observations to estimate those relationships. Otherwise, there is no pattern. No distribution. No association. Just isolated events floating in the void.
So yes: memory is required for discernible meaning. Not necessarily human memory. Storage. State. History. Some mechanism that lets the observer compare "now" against "before" or "this" against "that".
Let $M_F(w_i)$ be the meaning of $w_i$ relative to observer frame $F$. Not a dictionary definition. A relationship map. What tends to show up with it? What does it point toward? What does the observer associate it with?
Read $p_F(w_j \mid w_i)$ as: "under observer frame $F$, given that I observed $w_i$, how likely am I to also observe or associate it with $w_j$?" That vertical bar means "given". Nothing mystical — just conditional probability.
This is not me claiming I invented semantics, graph theory, probability, or information theory. The point is simpler: if we start from observation and ask what is needed for meaning to become discernible, we naturally end up needing stored associations, frequencies, and probabilities.
Recall Figure 1. If we take 100 or 1000 measurements of a signal, we can count the observed symbols and build a frequency distribution. That frequency distribution is the bridge from raw observation to probability and statistics. Again, nothing mystical. Just counting.
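"Just counting" can be shown directly. The sketch below builds a frequency distribution from a stored history, then estimates a conditional probability by counting adjacent pairs. Reading "associated with" as "immediately followed by" is my assumption, one choice among many. The helper names `frequencies` and `conditional` are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def frequencies(samples: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each observed symbol."""
    total = len(samples)
    if total == 0:
        return {}
    return {sym: n / total for sym, n in Counter(samples).items()}


def conditional(samples: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Estimate p(next = w_j | current = w_i) from adjacent pairs."""
    pair_counts = Counter(zip(samples, samples[1:]))
    first_counts = Counter(samples[:-1])
    return {
        (w_i, w_j): n / first_counts[w_i]
        for (w_i, w_j), n in pair_counts.items()
    }


if __name__ == "__main__":
    history = list("10X1X0")
    print(frequencies(history))
    for (w_i, w_j), p in sorted(conditional(history).items()):
        print(f"p({w_j} | {w_i}) = {p:.2f}")
```

Neither function works without the stored `history`. Six samples is far too few to trust the estimates, but the mechanism is the same at 100 or 1000.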
Meaning as a Graph
Once the observer has memory, meanings can be represented as a weighted graph. Nodes are symbols or observations. Edges represent association. Edge weights represent empirical frequency, conditional probability, or strength of relationship.
This is why meaning feels circular. The meaning of $w_i$ depends on its relationship to other symbols, but those symbols also get their meaning from the relationships around them. There is no lonely symbol floating in space carrying divine context on its back.
That is not a bug. That is the structure.
Each matrix entry $A^F_{ij}$ stores the association strength from $w_i$ to $w_j$ under observer frame $F$. If $A^F_{ij}$ is large, $w_j$ commonly shows up with or after $w_i$. If it is small, the association is weak.
Once you have a weighted adjacency matrix $A$, you are no longer just waving your hands about "meaning". You can actually compute over the relationships: clusters, transitions, central nodes, PageRank-style propagation, whatever model fits the observer's frame.
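As a sketch of what "compute over the relationships" can mean, the code below builds a row-normalized adjacency matrix from an observed sequence and pushes a distribution one step through it. It uses plain lists to stay dependency-free. Using "followed by" as the association, and the names `adjacency` and `step`, are my assumptions.

```python
from typing import List, Sequence, Tuple


def adjacency(samples: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Build A[i][j] = estimated p(w_j follows w_i) from adjacent pairs."""
    symbols = sorted(set(samples))
    index = {sym: i for i, sym in enumerate(symbols)}
    n = len(symbols)
    counts = [[0] * n for _ in range(n)]
    for w_i, w_j in zip(samples, samples[1:]):
        counts[index[w_i]][index[w_j]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return symbols, matrix


def step(dist: List[float], matrix: List[List[float]]) -> List[float]:
    """Propagate a distribution over symbols one step along the edges."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    symbols, A = adjacency(list("10X1X0"))
    print(symbols)
    for sym, row in zip(symbols, A):
        print(sym, row)
    # Start fully on "1" and ask where the associations point next.
    start = [1.0 if sym == "1" else 0.0 for sym in symbols]
    print(step(start, A))
```

Repeating `step` many times is the basic move behind random-walk and PageRank-style propagation, though a real version would need to handle dead ends and damping.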
This is how first-principles thinking unfolds: intuition -> model -> formalism. The math comes to you rather than you forcing it.
Probability as the Observer's Model
Let me be precise because this is where people love to argue with ghosts. I am not trying to settle whether the underlying system is deterministic or probabilistic here. That is a deeper physics argument. In this post, I am saying something narrower: meaning is probabilistic from the observer's perspective because meaning is inferred from aggregated observations, and aggregated observations naturally produce distributions.
Physics, as humans practice it, is model-building over observed signals. We infer patterns from observations, compress those patterns into laws, and then test whether those laws keep predicting what we observe. That does not mean our models are the system itself. It means our models are the best structure we currently have over the signals available to us.
Different observers with different sensors, frames, scales, or access to signals may build different models. That does not mean "truth is whatever you want". It means the observer never gets the system raw. The observer gets signals, measurements, limits, and a reference frame.
Meaning is observer-relative.
Probability enters because observers aggregate observations over time.
This does not make the system fake. It means the observer never gets the system unfiltered.
Bridge Back to SPPARRS
This post is the lower-level version of the Relativity idea in SPPARRS.
In SPPARRS, I use Relativity to mean observer-context: relative to which clock, frame, system boundary, sampling rule, or point of view? Here, we see the same idea at the level of a single datum. A datum is not just "out there". It is produced when an observer applies a reference frame to a signal and decides what resolves, what remains unresolved, what is unknown, and what can be ignored.
A lot of textbook models quietly drop the observer because they assume the reference frame is fixed. Same ruler. Same clock. Same sampling rule. Same measurement procedure. In that cleaned-up world, everyone is assumed to be looking at the same thing the same way.
Useful? Sure. Reality? Not always. In real engineering, observability is incomplete. Sensors are limited. Clocks drift. Samples are missed. Systems fail. Two observers may not actually see the same thing. So the observer-context is not philosophical decoration. It is part of the system boundary.
So when I say data is observer-relative, I am not being mystical. I am saying something practical: change the observer, change the frame, change the sampling rule, or change the tolerance threshold, and you may change the data. That is SPPARRS-style Relativity showing up before we even get to full systems design.
Conclusion
So what is data?
Data is not merely a bit. A bit is already a cleaned-up abstraction. Before the bit, there is a signal. Before interpretation, there is observation. Before meaning, there is memory, comparison, and reference frame.
The naive model says: $D \in \{0, 1\}$.
The physical model forces us to admit: $D \in \{0, 1, X\}$.
And the broader observer model says data is produced by an observation function mapping possible signals into resolved, unresolved, unknown, or ignored outcomes.
That is the journey. From voltages to symbols. From symbols to uncertainty. From uncertainty to memory. From memory to meaning.
That is data, at least from this observer's reference frame.