Data modeling
This chapter describes how to model cross-linguistic data using the core resources available in the clld framework. While it is possible to extend the core data model in various ways, sticking to core resources for comparable concepts will ensure re-usability of the data, because all of the data publication mechanisms implemented in clld will be available.
Dataset
Each clld app is assumed to serve a cross-linguistic dataset. The clld.db.models.common.Dataset object holds metadata about the dataset, e.g. the publisher, the license, and relations to editors.
Languages
Languages are the core objects described in datasets served by clld apps. clld.db.models.common.Language objects, like most other objects, are described at the most basic level by a name, an optional description, and an optional geographical coordinate.
To allow identification of languages across apps or even domains, languages can be associated with any number of alternative clld.db.models.common.Identifier objects, typically Glottocodes, ISO 639-3 codes, or alternative names.
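The relation between a language and its identifiers can be sketched in plain Python. The dataclasses below are simplified stand-ins written for illustration, not the clld model classes; the attribute names, the `identifier` helper, the identifier type labels, and the coordinate are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identifier:
    # Hypothetical stand-in: `type` names the scheme, `name` holds the code.
    type: str
    name: str


@dataclass
class Language:
    # A name is required; description and coordinate are optional.
    id: str
    name: str
    description: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    identifiers: List[Identifier] = field(default_factory=list)

    def identifier(self, type_: str) -> Optional[str]:
        """Return the first identifier of the given type, or None."""
        for ident in self.identifiers:
            if ident.type == type_:
                return ident.name
        return None


german = Language(
    id="deu",
    name="German",
    latitude=50.0,   # illustrative coordinate only
    longitude=10.0,  # illustrative coordinate only
    identifiers=[
        Identifier(type="glottolog", name="stan1295"),
        Identifier(type="iso639-3", name="deu"),
        Identifier(type="name", name="Deutsch"),
    ],
)

# Look a language up by scheme, e.g. to link it to another app.
glottocode = german.identifier("glottolog")
```

Keeping identifiers in a separate list, rather than as fixed columns on the language, is what allows any number of schemes to coexist for one language.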
Parameters
clld.db.models.common.Parameter objects are used to model language parameters, i.e. phenomena (aka features) which can be measured across languages. Single datapoints, i.e. measurements of the parameter for a single language, are modeled as instances of clld.db.models.common.Value. To support multiple measurements for the same (language, parameter) pair, values are grouped in a clld.db.models.common.ValueSet, and it is the valueset that is related to language and parameter.
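A minimal sketch of this grouping, using plain dataclasses rather than the clld models: the class fields, the `group_into_valuesets` helper, and the sample rows are all hypothetical and only illustrate that the (language, parameter) relation lives on the valueset, not on the individual value.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Value:
    # A single measurement; it knows nothing about language or parameter.
    id: str
    name: str


@dataclass
class ValueSet:
    # The valueset carries the relation to language and parameter.
    language_id: str
    parameter_id: str
    values: List[Value] = field(default_factory=list)


def group_into_valuesets(
    rows: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], ValueSet]:
    """Group (language, parameter, measurement) rows into one valueset per pair."""
    valuesets: Dict[Tuple[str, str], ValueSet] = {}
    for language_id, parameter_id, name in rows:
        key = (language_id, parameter_id)
        if key not in valuesets:
            valuesets[key] = ValueSet(language_id, parameter_id)
        vs = valuesets[key]
        # Hypothetical id scheme: running number within the valueset.
        value_id = f"{language_id}-{parameter_id}-{len(vs.values) + 1}"
        vs.values.append(Value(id=value_id, name=name))
    return valuesets


rows = [
    ("lang1", "word-order", "SOV"),
    ("lang1", "word-order", "SVO"),  # second measurement for the same pair
    ("lang2", "word-order", "VSO"),
]
valuesets = group_into_valuesets(rows)
```

Here the two measurements for `lang1` end up in the same valueset, while `lang2` gets its own.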
Enumerated domain
clld supports enumerated domains. Elements of the domain of a parameter can be modeled as clld.db.models.common.DomainElement instances, and each value must then be related to one domain element.
The clld framework will then use the domain property of a parameter to select behaviour suitable for enumerated domains only, e.g. loading the values associated with each domain element as a separate layer when displaying a parameter map.
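The idea of one map layer per domain element can be illustrated as follows. This is a sketch under stated assumptions, not how clld builds its maps: the dataclasses are simplified stand-ins, and `layers_by_domain_element` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class DomainElement:
    # One admissible value of an enumerated parameter domain.
    id: str
    name: str


@dataclass
class Value:
    # With an enumerated domain, each value points at exactly one element.
    language_id: str
    domainelement: DomainElement


def layers_by_domain_element(
    domain: Iterable[DomainElement], values: Iterable[Value]
) -> Dict[str, List[str]]:
    """Return one layer (a list of language ids) per domain element."""
    domain = list(domain)
    allowed = {de.id for de in domain}
    # Start with an empty layer for every element, so unused ones still appear.
    layers: Dict[str, List[str]] = {de.name: [] for de in domain}
    for value in values:
        if value.domainelement.id not in allowed:
            raise ValueError(f"{value.domainelement.id} is not in the domain")
        layers[value.domainelement.name].append(value.language_id)
    return layers


domain = [DomainElement("de-1", "SOV"), DomainElement("de-2", "SVO")]
values = [
    Value("lang1", domain[0]),
    Value("lang2", domain[1]),
    Value("lang3", domain[0]),
]
layers = layers_by_domain_element(domain, values)
```

Rejecting values whose element is outside the domain mirrors the requirement that each value be related to one domain element of its parameter.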
Typed values
The clld framework is agnostic with regard to the types of values, i.e. as far as default functionality is concerned, the only properties required of a value are a name and an id (and optionally a description). To store typed data for values, multiple mechanisms are available:
Storing typed data in the jsondata dictionary: This accommodates all data types which can be serialized as JSON, such as numbers, booleans, arrays, and dictionaries.
If the data for a value comes as a list or dictionary of strings, it can also be stored as clld.db.models.common.Value_data instances.
Finally, there is the option to store data related to a value as files, i.e. as instances of clld.db.models.common.Value_files.
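The first mechanism, a JSON-serializable dictionary attached to a value, might look like the sketch below. The `Value` dataclass is a simplified stand-in rather than the clld model, and the keys stored in `jsondata` are made up for illustration; the round trip through the standard `json` module shows which types survive serialization.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Value:
    # Only id and name are required; description is optional.
    id: str
    name: str
    description: Optional[str] = None
    # Free-form typed data; must stay JSON-serializable.
    jsondata: Dict[str, Any] = field(default_factory=dict)


value = Value(
    id="v1",
    name="3 tones",
    jsondata={
        "count": 3,                        # number
        "contrastive": True,               # boolean
        "levels": ["high", "mid", "low"],  # array
        "source": {"page": 42},            # nested dictionary
    },
)

# Serialize as it would be stored, then load it back.
serialized = json.dumps(value.jsondata, sort_keys=True)
restored = json.loads(serialized)
```

Types without a JSON representation, such as dates or sets, would have to be converted to strings or lists before being placed in the dictionary.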