Domain vs. Dataset: What's the Difference?

The terms “domain” and “dataset” are commonly used in CDISC’s nomenclature and found frequently in the Study Data Tabulation Model (SDTM). For example, the SDTM v1.8 includes 134 instances of "domain" and says "A collection of observations on a particular topic is considered a domain." The Model includes 78 instances of dataset and certain structures in the model are called "datasets" rather than "domains." Is there a difference between a domain and a dataset?

The CDISC Glossary defines these terms as follows:

  • Domain: A collection of logically related observations with a common, specific topic that are normally collected for all subjects in a clinical investigation. NOTE: The logic of the relationship may pertain to the scientific subject matter of the data or to its role in the trial. Example domains include laboratory test results (LB), adverse events (AE), concomitant medications (CM). [After SDTM Implementation Guide version 3.2,] See also general observation class.
  • Dataset: A collection of structured data in a single file. [CDISC, ODM, and SDS] Compare to analysis dataset, tabulation dataset.

In plainer terms, a domain is a grouping of observations that are related while a dataset is the data structure associated with that grouping of observations. Both domains and datasets use the same nomenclature, which is why they are often confused.

The distinction between domain and dataset is most clearly seen in cases where a general observation class domain is split into multiple datasets in a submission. Common examples are splitting the Laboratory Test Results (LB) domain due to size, splitting the Questionnaires (QS) domain by questionnaire, and splitting the Findings About Events or Interventions (FA) domain by parent domain.

However, since in most cases there is a one-to-one relationships between a conceptual domain and a dataset based on that conceptual domain, the words are used interchangeably in the standards and, therefore, by most users. The structures called “relationship datasets” were given that name because they are mechanisms for connecting information represented in different datasets rather than observations about study subjects. Note that none of the relationship datasets includes the variable DOMAIN. However, in a submission, these datasets need dataset names, and character strings used in those names are included in the CDISC Codelist called "SDTM Domain Abbreviations."

In conclusion, there is a clear distinction between the meaning of "domain" and "dataset" but given that the naming conventions are the same across both terms, in many cases they can be considered interchangeable.