Data Models

A data model is a mental model of the nature of some data. It answers such questions as the following:

What values can the data take?

Are they all numeric? Are they all integers? Is the set of possible values finite? What are the minimum and maximum possible values? Are ∞ and NaN possible values? Do some values have special meanings, such as indicating undefined or missing data?
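
As a rough illustration, some of these questions can be answered mechanically by inspecting the values. The sketch below is plain Python and not part of NAP; the function name `describe_values` and the sentinel `-999` are hypothetical choices made for this example only.

```python
import math

def describe_values(values, missing_value=None):
    """Summarise basic properties of a sequence of numbers.

    `missing_value` is a hypothetical sentinel marking missing data.
    """
    # Separate ordinary values from the sentinel that marks missing data.
    present = [v for v in values
               if missing_value is None or v != missing_value]
    # Infinities and NaN are excluded before computing the range.
    finite = [v for v in present if math.isfinite(v)]
    return {
        "missing_count": len(values) - len(present),
        "all_integers": all(float(v).is_integer() for v in finite),
        "has_nan": any(math.isnan(v) for v in present),
        "has_infinity": any(math.isinf(v) for v in present),
        "minimum": min(finite) if finite else None,
        "maximum": max(finite) if finite else None,
    }

summary = describe_values(
    [3, 7.0, -999, float("inf"), float("nan"), 12],
    missing_value=-999,
)
```

Note that NaN cannot serve as the sentinel here, because NaN never compares equal to anything, itself included.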

What is the measurement level?

Data is often classified as follows according to measurement level:

Name of   Description                   Valid                 Appropriate measure   Examples
level                                   operations            of central tendency
--------  ----------------------------  --------------------  --------------------  ------------------------------
nominal   values denote categories      = ≠                   mode                  color, postal code (e.g. zip),
          which have no order                                                       chemical species (e.g. CO2)
ordinal   values are ordered;           = ≠ < ≤ > ≥           median                Richter earthquake scale,
          differences meaningless                                                   dates in form YYYYMMDD
interval  differences are valid;        = ≠ < ≤ > ≥ + −       arithmetic mean       temperature in °C
          quotients meaningless
ratio     quotients are valid           = ≠ < ≤ > ≥ + − × ÷   geometric mean        temperature in K
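
The measures of central tendency listed for each level can be sketched with Python's standard `statistics` module. The sample values are invented for illustration. For the ordinal case the sketch uses `median_low`, which always returns one of the data values; the ordinary median averages the two middle values of an even-sized sample, and that relies on arithmetic an ordinal scale does not support.

```python
import statistics

# Nominal: only equality is meaningful, so report the most common category.
colors = ["red", "blue", "red", "green"]
nominal_centre = statistics.mode(colors)

# Ordinal: order is meaningful but differences are not, so pick a middle
# element. median_low returns an actual data value, never an average.
dates = [20020807, 20011231, 20020101, 20020315]  # dates in form YYYYMMDD
ordinal_centre = statistics.median_low(dates)

# Interval: differences are meaningful, so the arithmetic mean is valid.
temperatures_c = [10.0, 20.0, 30.0]
interval_centre = statistics.mean(temperatures_c)

# Ratio: quotients are meaningful, so the geometric mean is valid too.
temperatures_k = [250.0, 300.0, 360.0]
ratio_centre = statistics.geometric_mean(temperatures_k)
```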

How accurate are the values?

A measurement error is the difference between the true value and the measured value. Measured values can differ from true values for a variety of reasons.

It is desirable to include error estimates with data.
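
One simple way to keep an error estimate with each value is to store the two together. The following is a minimal Python sketch, not a NAP facility; the class `Measurement`, its field names, and the sign convention (measured minus true) are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """A value paired with an estimate of its absolute error (hypothetical)."""
    value: float
    error_estimate: float  # same units as value

    def interval(self):
        """Range within which the true value is estimated to lie."""
        return (self.value - self.error_estimate,
                self.value + self.error_estimate)

def measurement_error(true_value, measured_value):
    """Signed error; taking measured minus true is an assumed convention."""
    return measured_value - true_value

reading = Measurement(value=20.5, error_estimate=0.5)
low, high = reading.interval()
error = measurement_error(true_value=20.25, measured_value=20.5)
```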

Are the data located in some space?

A time series consists of values located along the time dimension. Geographic data is located along spatial dimensions such as latitude, longitude and altitude, and may also have a time dimension. Note that longitude is cyclic: 180°E and 180°W denote the same meridian.
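
The cyclic nature of longitude matters whenever longitudes are compared or subtracted. A minimal Python sketch follows; the function names and the choice of the range [-180, 180) are assumptions for illustration, and other conventions such as [0, 360) are equally common.

```python
def normalize_longitude(degrees):
    """Map any longitude in degrees onto the half-open range [-180, 180)."""
    return (degrees + 180.0) % 360.0 - 180.0

def longitude_difference(start, end):
    """Smallest signed step in degrees from `start` to `end`."""
    return normalize_longitude(end - start)

# 190 degrees east is the same meridian as 170 degrees west.
wrapped = normalize_longitude(190.0)

# Crossing the 180-degree meridian: 170E to 170W is a 20-degree step
# eastward, not a 340-degree step westward.
step = longitude_difference(170.0, -170.0)
```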

The dimensions of the space can have a measurement level of nominal. For example, an accounting spreadsheet might have columns corresponding to charge codes and rows corresponding to company divisions.
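
A space with nominal dimensions can be sketched in Python as a mapping keyed by labels rather than by numeric coordinates. The division names, charge codes and amounts below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical labels and amounts, for illustration only.
divisions = ["Research", "Sales"]                    # row labels (nominal)
charge_codes = ["travel", "salaries", "equipment"]   # column labels (nominal)

# Each cell is located by a pair of labels. The labels have no inherent
# order, so cells are found by equality of labels, never by position.
amounts = {
    ("Research", "travel"): 1200.0,
    ("Research", "salaries"): 54000.0,
    ("Research", "equipment"): 8300.0,
    ("Sales", "travel"): 4100.0,
    ("Sales", "salaries"): 39000.0,
    ("Sales", "equipment"): 900.0,
}

def total_for_division(division):
    """Sum one row of the spreadsheet."""
    return sum(amounts[(division, code)] for code in charge_codes)

def total_for_charge_code(code):
    """Sum one column of the spreadsheet."""
    return sum(amounts[(division, code)] for division in divisions)

research_total = total_for_division("Research")
travel_total = total_for_charge_code("travel")
```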

Data located in a continuous space can be either gridded or scattered. Both types are discussed in NAP Grids.
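
The distinction can be sketched in plain Python as follows. This is not NAP's own representation; the variable names and sample numbers are assumptions for illustration. Gridded data needs only one coordinate vector per dimension, whereas each scattered value must carry its own coordinates.

```python
# Gridded: values lie on a regular lattice defined by coordinate vectors.
# grid_values[j][i] is the value at (grid_x[i], grid_y[j]).
grid_x = [0.0, 1.0, 2.0]
grid_y = [10.0, 20.0]
grid_values = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Scattered: every value carries its own (x, y) location.
scattered = [
    (0.3, 12.0, 1.7),
    (1.9, 18.5, 4.2),
    (0.8, 15.1, 2.9),
]

def grid_to_points(xs, ys, values):
    """Express gridded data in the scattered (x, y, value) form."""
    return [(x, y, values[j][i])
            for j, y in enumerate(ys)
            for i, x in enumerate(xs)]

points = grid_to_points(grid_x, grid_y, grid_values)
```

Converting gridded data to the scattered form is always possible, as shown; going the other way generally requires interpolation.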

Author: Harvey Davies       © 2002, CSIRO Australia.
CVS Version Details: $Id: model.html,v 1.1 2002/08/07 08:09:24 dav480 Exp $