Submissions Data Domain Models
Version 2.0

The CDISC Submissions Data Domain Models have been prepared by the CDISC Submissions Data Standards (SDS) team to guide the organization, content, and form of submission datasets for the 12 safety-related domains listed in the FDA guidance documents. In the future, additional models will be provided for common analysis formats and to describe other types of data such as pharmacokinetics and pharmacodynamics, as well as efficacy data for certain therapeutic areas. The focus of the SDS team has been on case report tabulation (CRT) datasets; analysis metadata models are being developed separately by the CDISC Analysis Data Modeling (ADaM) team.

While the CDISC models are under consideration within the FDA as a reference standard, sponsors should always check with their review Division before making any electronic submissions of data. 

Please read the CDISC Submission Metadata Model before reviewing these data domain models.  The primary goal of the Metadata Model is to provide regulatory reviewers with a clear understanding of the datasets provided in a submission by communicating clear descriptions of the structure, purpose, attributes, and contents of each dataset and dataset variable. 

The CDISC Submission Metadata Model allows for differences in the data sponsors must collect for an individual trial and how they collect it. No attempt has been made to define all possible variables or data structures in the domain models. Rather, the SDS team has adopted an ‘80% rule’, identifying those variables commonly used by most sponsors. The proposed model places considerable emphasis on the importance of providing detailed examples to illustrate concepts, particularly when multiple approaches are possible.  Each SDS domain model provides a description of variables in a CRT domain, including the following:

  • The CDISC-suggested variable name (which may be used as an alias by sponsors who choose other variable names)
  • A sample variable label (which should be adjusted by sponsors to describe the data contents)
  • Suggested values for data type (character, numeric) and decodes or formats
  • The origin or source of the raw data (e.g., CRF or derived)
  • An optional column for the role of the variable in the dataset
  • A column for comments (provided by the sponsor)
  • Two columns of “Data Preparation Comments” provided by the CDISC SDS team: a set of Notes relevant to each variable, and a column that indicates whether a variable is a core CDISC variable (as defined below).

CDISC defines a core variable as one that would normally be present in a typical submission by any sponsor. In some cases, CDISC indicates when a choice can be made in the selection of a core variable. In addition to the general CDISC usage notes, general notes applicable to the entire domain are presented with the Assumptions for each model.
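
As an illustration only, the columns listed above could be held in a small record type. This is a minimal sketch: the class name, field names, and the EXAMPLE1 variable are hypothetical and are not part of the CDISC models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableMetadata:
    """One row of a domain model's variable table (field names are hypothetical)."""
    name: str                               # CDISC-suggested variable name
    label: str                              # sample label, adjusted by the sponsor
    data_type: str                          # "character" or "numeric"
    origin: str                             # e.g. "CRF" or "derived"
    core: bool                              # True if designated a core variable
    decode_or_format: Optional[str] = None  # decodes or formats, if any
    role: Optional[str] = None              # optional role in the dataset
    sponsor_comments: str = ""              # comments provided by the sponsor
    cdisc_notes: str = ""                   # Notes provided by the SDS team


# EXAMPLE1 is a made-up variable; none of these values come from a published model.
example = VariableMetadata(
    name="EXAMPLE1",
    label="Example variable",
    data_type="character",
    origin="CRF",
    core=True,
)
```

Keeping the optional role and the two comment fields as defaults mirrors the description above, in which only some columns are expected to be filled in for every variable.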

Conformance with the SDS models is indicated by:

1.      Following the complete metadata structure for data domains and variables

2.      Using CDISC-recommended names for data domains

3.      Including all core variables identified for each model when collected

4.      Using CDISC-standard variable names – especially for core variables

5.      Using recommended data types for all variables

6.      Using CDISC-recommended formats and codes for dates, times and certain other variables.
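
Some of these criteria lend themselves to an automated check. The sketch below covers only criteria 3 and 5 (core variables included when collected, and recommended data types); the function name, the dictionary layout, and the variable names VAR_A to VAR_C are assumptions made for illustration, not part of the SDS models.

```python
def check_conformance(model, dataset_types, collected):
    """Return a list of conformance problems for one domain.

    model:         {variable name: {"core": bool, "type": "character" | "numeric"}}
    dataset_types: {variable name: data type used in the submitted dataset}
    collected:     set of model variable names the sponsor actually collected
    """
    problems = []
    for name, spec in model.items():
        if name not in dataset_types:
            # Criterion 3: core variables must be included when collected.
            if spec["core"] and name in collected:
                problems.append(f"missing core variable: {name}")
            continue
        # Criterion 5: the recommended data type should be used.
        if dataset_types[name] != spec["type"]:
            problems.append(
                f"{name}: type {dataset_types[name]!r}, expected {spec['type']!r}"
            )
    return problems


# Hypothetical model and dataset, for illustration only.
model = {
    "VAR_A": {"core": True, "type": "character"},
    "VAR_B": {"core": True, "type": "numeric"},
    "VAR_C": {"core": False, "type": "numeric"},
}
problems = check_conformance(
    model,
    dataset_types={"VAR_A": "numeric"},
    collected={"VAR_A", "VAR_B"},
)
```

Here VAR_A is reported for its data type and VAR_B as a missing core variable, while the non-core VAR_C may be omitted without a finding.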

Sponsors should always supply their own labels, origins, roles (especially for key variables), and comments, plus any additional decodes or format information required for the FDA reviewer to properly interpret the data.  Sponsors may choose to use or ignore non-core variables and to add variables as necessary, depending on what data was actually collected. Since most regulatory submissions involve data that has been collected over many years, CDISC recognizes that full conformance with the SDS model may not be immediately achievable and expects it to be reached gradually.

Version 2.0 of the CDISC models, published in November 2001, incorporates several improvements to the Version 1.0 models published in October 2000, and the Version 1.1 revision published in June 2001.  As in prior versions, the Domain Definitions document describes the data domain metadata, which includes domain names, descriptions, locations, structures, purpose, keys, and comments.  Changes since Version 1.0 include: 

  • The addition of an introductory Assumptions page for each domain, documenting many of the assumptions, conventions, and decisions made by the CDISC SDS team
  • Consistency edits to variable names, labels, decode/formats, and notes, and the inclusion of an asterisk (*) in the CDISC Notes column for any variable that has a specific assumption listed on the Assumptions page
  • Revisions to the core-variable designator and additions or deletions of individual variables (such as the addition of DMREFDT in Demographics).  These content changes affected fewer than 25% of the variables included in prior versions of the models, and were made in response to industry feedback.
  • Standardization of decodes and formats:  Beginning with Version 1.1, SDS adopted the ISO 8601 date and time formats, E2B codes for adverse-event data, and standard codes for Sex and Yes/No variables; comments regarding codes for other variables are included in the CDISC notes column.
  • Improvements in presentation format for the model.
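
For readers unfamiliar with ISO 8601, the following minimal sketch shows the general shape of ISO 8601 dates and date-times using Python's standard library. The sample date and variable names are arbitrary, and the models themselves remain the authority on which ISO 8601 representations to use for a given variable.

```python
from datetime import date, datetime

# Arbitrary sample values, chosen only to show the format.
sample_date = date(2001, 11, 21)
sample_datetime = datetime(2001, 11, 21, 14, 30)

print(sample_date.isoformat())      # 2001-11-21
print(sample_datetime.isoformat())  # 2001-11-21T14:30:00

# An ISO 8601 calendar date can be parsed back into a date object.
parsed = date.fromisoformat("2001-11-21")
```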

Version 2.0 builds further upon Version 1.1 by addressing additional consistency issues (especially in Labs, ECG, and Vitals). A small number of core variables have been added, especially in Demographics, ECG, Labs, and Vitals, and variable naming conventions have been improved.  Version 2.0 is also packaged as a single, bookmarked PDF file for easier distribution and printing.

One of the most significant differences in Version 2.0 is the addition of an alternative representation of ECG and Vitals in a vertical (“tall, skinny”), more normalized format, in order to support FDA activities to pilot new database and data-viewing technologies.  The vertical representation allows greater flexibility in terms of data storage. However, it does restrict the ability to provide metadata about individual measurements, since variables (such as SYSBP in Vital Signs) would be values for the VSTEST variable rather than separate variables themselves (as they would be in the horizontal rendition). Sponsors should check with their review Divisions before deciding whether to use the new Version 2.0 vertical format for ECG and/or Vitals or the original horizontal format.  The vertical models led to some improvements in the respective horizontal models.  Version 2.0 added a non-core variable for LOINC codes for Lab, ECG, and Vitals measurements.  Information on LOINC can be obtained at http://www.regenstrief.org/loinc/loinc.htm.
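
The difference between the horizontal and vertical renditions can be sketched in a few lines. This is an illustration under stated assumptions, not the published Vitals model: SYSBP and VSTEST come from the text above, while SUBJECT, VISIT, DIABP, VALUE, the function name, and all data values are hypothetical.

```python
def to_vertical(rows, id_vars, measure_vars, test_var="VSTEST", value_var="VALUE"):
    """Turn one-row-per-visit records into one-row-per-measurement records."""
    vertical = []
    for row in rows:
        for measure in measure_vars:
            if measure not in row:
                continue  # this measurement was not recorded for the row
            record = {key: row[key] for key in id_vars}
            record[test_var] = measure        # the former variable name becomes a value
            record[value_var] = row[measure]  # the measured result
            vertical.append(record)
    return vertical


# Made-up horizontal records: one row per subject visit, one column per measurement.
horizontal = [
    {"SUBJECT": "001", "VISIT": 1, "SYSBP": 120, "DIABP": 80},
    {"SUBJECT": "001", "VISIT": 2, "SYSBP": 118, "DIABP": 76},
]
vertical = to_vertical(horizontal, ["SUBJECT", "VISIT"], ["SYSBP", "DIABP"])
```

Each horizontal row yields one vertical row per measurement, so SYSBP no longer exists as a variable that could carry its own label or format. That is the metadata restriction described above.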

Using the Models

After reading the CDISC Submission Metadata Model, please examine the Demographics domain model before proceeding to other domains.  Demographics provides many of the common core selection variables used in other datasets (shaded in light gray in each model), and was used by CDISC as the template for all other domains. 

Upon reviewing these models, please submit your comments via our Public Discussion Forum.

  • Submission Data Domain Model V 2.0, Nov 21, 2001 (pdf)
  • CDISC Submission Metadata Model V 2.0, Nov 26, 2001 (pdf)



CDISC Inc., 15907 Two Rivers Cove, Austin, Texas 78717
© 2007 Clinical Data Interchange Standards Consortium, Inc. All rights reserved