Submissions Data Domain Models
Version 2.0
The CDISC Submissions Data Domain Models have
been prepared by the CDISC Submissions Data
Standards (SDS) team to guide the organization,
content, and form of submission datasets for
the 12 safety-related domains listed in the
FDA guidance documents. In the future, additional
models will be provided for common analysis
formats and to describe other types of data
such as pharmacokinetics and pharmacodynamics,
as well as efficacy data for certain therapeutic
areas. The focus of the SDS Group has been on
case report tabulation (CRT) datasets; analysis
metadata models are being developed separately
by the CDISC Analysis Data Modeling (ADaM) team.
While the CDISC models are under consideration
within the FDA as a reference standard, sponsors
should always check with their review Division
before making any electronic submissions of
data.
Please read the CDISC
Submission Metadata Model before reviewing
these data domain models. The primary
goal of the Metadata Model is to give regulatory
reviewers a clear understanding of the
datasets in a submission by describing the
structure, purpose, attributes, and contents
of each dataset and dataset variable.
The CDISC Submission Metadata Model allows
for differences in the data sponsors must collect
for an individual trial and how they collect
it. No attempt has been made to define all possible
variables or data structures in the domain models.
Rather, the SDS team has adopted an ‘80%
rule’, identifying those variables commonly
used by most sponsors. The proposed model places
considerable emphasis on providing detailed
examples to illustrate concepts, particularly
when multiple approaches are possible.
Each SDS domain model provides a description
of variables in a CRT domain, including the
following:
- The CDISC-suggested variable name (which sponsors who choose other variable names may use as an alias)
- A sample variable label (which sponsors should adjust to describe the data contents)
- Suggested values for data type (character, numeric) and decodes or formats
- The origin or source of the raw data (e.g., CRF or derived)
- An optional column for the role of the variable in the dataset
- A column for comments (provided by the sponsor)
- Two columns of “Data Preparation Comments” provided by the CDISC SDS team: a column of notes relevant to each variable, and a column that indicates whether the variable is a core CDISC variable (as defined below)
CDISC defines a core variable as one that would
normally be present in a typical submission
by any sponsor. In some cases, CDISC indicates
when a choice can be made in the selection of
a core variable. In addition to the general
CDISC usage notes, general notes applicable
to the entire domain are presented with the
Assumptions for each model.
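The variable table described above can be pictured as one record per variable. The sketch below assumes Python field names of our own choosing (`VariableMetadata`, `core_variables`, and every attribute name are hypothetical, not part of the CDISC models). The labels, origins, and core flags in the usage lines are placeholders, not the values in the actual Demographics model.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical names: the domain models describe these columns in prose only.
ALLOWED_TYPES = ("character", "numeric")


@dataclass
class VariableMetadata:
    """One row of a domain model's variable table."""
    name: str                    # CDISC-suggested variable name
    label: str                   # sample label; sponsors adjust it to the data
    data_type: str               # "character" or "numeric"
    decode_or_format: str = ""   # suggested decodes or formats
    origin: str = ""             # source of the raw data, e.g. "CRF" or "Derived"
    role: Optional[str] = None   # optional role of the variable in the dataset
    sponsor_comments: str = ""   # comments supplied by the sponsor
    cdisc_notes: str = ""        # CDISC SDS data preparation notes
    core: bool = False           # True if CDISC designates it a core variable

    def __post_init__(self) -> None:
        # Reject data types other than the two the models suggest.
        if self.data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type!r}")


def core_variables(variables: List[VariableMetadata]) -> List[str]:
    """Return the names of the variables flagged as core, in table order."""
    return [v.name for v in variables if v.core]


# Placeholder rows for illustration only.
table = [
    VariableMetadata("SEX", "Sex", "character", origin="CRF", core=True),
    VariableMetadata("DMREFDT", "Reference Date", "character", origin="Derived"),
]
print(core_variables(table))  # ['SEX']
```

Keeping the core flag on each row lets a sponsor derive the list of required variables from the same table reviewers read, rather than maintaining a separate list.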
Conformance with the SDS models is indicated
by:
1. Following the complete metadata structure for data domains and variables
2. Using CDISC-recommended names for data domains
3. Including all core variables identified for each model, when collected
4. Using CDISC-standard variable names, especially for core variables
5. Using recommended data types for all variables
6. Using CDISC-recommended formats and codes for dates, times, and certain other variables
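Two of the conformance points in the list above, numbers 3 (core variables present when collected) and 5 (recommended data types), are mechanical enough to check automatically. The following is a minimal sketch under our own assumptions. The function name `conformance_issues` and its input shapes are hypothetical. The Demographics excerpt (variable names, types, and core flags) is a placeholder, not the published model.

```python
from typing import Dict, List, Set, Tuple


def conformance_issues(
    model: Dict[str, Tuple[str, bool]],
    collected: Set[str],
    submitted: Dict[str, str],
) -> List[str]:
    """Report missing collected core variables and data-type mismatches.

    model:     variable name -> (recommended data type, is_core)
    collected: names of the variables the trial actually collected
    submitted: variable name -> data type used in the submitted dataset
    """
    issues = []
    for name, (dtype, is_core) in sorted(model.items()):
        if name not in submitted:
            # Core variables are required only when they were collected.
            if is_core and name in collected:
                issues.append(f"missing core variable {name}")
        elif submitted[name] != dtype:
            issues.append(f"{name}: expected {dtype}, got {submitted[name]}")
    return issues


# Hypothetical Demographics excerpt; names, types, and core flags are placeholders.
model = {
    "SEX": ("character", True),
    "AGE": ("numeric", True),
    "DMREFDT": ("character", False),
}
print(conformance_issues(model, collected={"SEX", "AGE"}, submitted={"AGE": "character"}))
# ['AGE: expected numeric, got character', 'missing core variable SEX']
```

Non-core variables that are absent are simply skipped, which matches the text's allowance for sponsors to use or ignore them.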
Sponsors should always supply their own labels,
origins, roles (especially for key variables),
and comments, plus any additional decodes or
format information required for the FDA reviewer
to properly interpret the data. Sponsors
may choose to use or ignore non-core variables
and to add other variables as necessary,
depending on what data was actually collected.
Since most regulatory submissions involve data
that has been collected over many years, CDISC
recognizes that full conformance with the SDS
model may not be immediately achievable and
will be reached only gradually over time.
Version 2.0 of the CDISC
models, published in November 2001, incorporates
several improvements to the Version 1.0 models
published in October 2000 and the Version 1.1
revision published in June 2001. As in
prior versions, the Domain Definitions document
describes the data domain metadata, which includes
domain names, descriptions, locations, structures,
purposes, keys, and comments. Changes since
Version 1.0 include:
- The addition of an introductory Assumptions page for each domain, documenting many of the assumptions, conventions, and decisions made by the CDISC SDS team
- Consistency edits to variable names, labels, decodes/formats, and notes, and the inclusion of an asterisk (*) in the CDISC Notes column for any variable that has a specific assumption listed on the Assumptions page
- Revisions to the core-variable designator and additions or deletions of individual variables (such as the addition of DMREFDT in Demographics). These content changes affected fewer than 25% of the variables included in prior versions of the models and were made in response to industry feedback.
- Standardization of decodes and formats: beginning with Version 1.1, SDS adopted the ISO 8601 date and time formats, E2B codes for adverse-event data, and standard codes for Sex and Yes/No variables; comments regarding codes for other variables are included in the CDISC Notes column.
- Improvements to the presentation format of the models
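The ISO 8601 adoption noted in the list above affects how every date and time is written. This page does not say which ISO 8601 profile the models use. The sketch below therefore makes three assumptions: it uses the extended format (hyphens and colons), it writes times to the minute, and it represents partially known dates with ISO 8601 reduced precision (for example `2001-11` for a month-only date). The function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def iso8601_date(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Format a collected date, keeping only the parts that are known.

    ISO 8601 allows reduced precision, so a date known only to the month
    is written YYYY-MM and a date known only to the year is written YYYY.
    """
    if month is None:
        if day is not None:
            raise ValueError("day given without month")
        return f"{year:04d}"
    if day is None:
        if not 1 <= month <= 12:
            raise ValueError(f"invalid month: {month}")
        return f"{year:04d}-{month:02d}"
    # date() validates the full calendar date before it is formatted
    return date(year, month, day).isoformat()


def iso8601_datetime(value: datetime) -> str:
    """Format a date and time to the minute in extended format."""
    return value.isoformat(timespec="minutes")


print(iso8601_date(2001, 11, 21))                        # 2001-11-21
print(iso8601_date(2001, 11))                            # 2001-11
print(iso8601_datetime(datetime(2001, 11, 21, 14, 30)))  # 2001-11-21T14:30
```

Because ISO 8601 strings in this form sort lexically in chronological order, character date variables remain easy to order without conversion.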
Version 2.0 builds further upon Version 1.1
by addressing additional consistency issues
(especially in Labs, ECG, and Vitals). A small
number of core variables have been added,
especially in Demographics, ECG, Labs, and Vitals,
and variable naming conventions have
been improved. Version 2.0 is also packaged
as a single, bookmarked PDF file for easier
distribution and printing.
One of the most significant differences in
Version 2.0 is the addition of an alternative
representation of ECG and Vitals in a vertical,
more normalized “tall, skinny” format
to support FDA efforts to pilot new database
and data-viewing technologies.
The vertical representation allows greater flexibility
in data storage. However, it restricts
the ability to provide metadata about individual
measurements, since measurements such as SYSBP (in
Vital Signs) become values of the VSTEST
variable rather than separate variables
(as they are in the horizontal rendition).
Sponsors should check with their review Divisions
before deciding whether to use the new Version
2.0 vertical format for ECG and/or Vitals or
the original horizontal format. The vertical
models also led to some improvements in the respective
horizontal models. Version 2.0 adds a
non-core variable for LOINC codes for Lab, ECG,
and Vitals measurements. Information on
LOINC can be obtained at http://www.regenstrief.org/loinc/loinc.htm.
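The difference between the horizontal and vertical Vital Signs layouts is essentially a reshape. In the following sketch, only SYSBP and VSTEST come from the text. The key columns SUBJID and VISIT, the DIABP column, the result column VSVALUE, and the function name are hypothetical, and the values are toy data.

```python
from typing import Any, Dict, Iterable, List, Sequence


def horizontal_to_vertical(
    rows: Iterable[Dict[str, Any]],
    keys: Sequence[str],
    tests: Sequence[str],
) -> List[Dict[str, Any]]:
    """Turn one-row-per-visit records into one-row-per-measurement records."""
    vertical = []
    for row in rows:
        for test in tests:
            if row.get(test) is None:           # measurement not taken at this visit
                continue
            record = {k: row[k] for k in keys}  # carry the identifying keys over
            record["VSTEST"] = test             # the former column name becomes a value
            record["VSVALUE"] = row[test]       # hypothetical result-variable name
            vertical.append(record)
    return vertical


horizontal = [{"SUBJID": "001", "VISIT": 1, "SYSBP": 120, "DIABP": 80}]
for rec in horizontal_to_vertical(horizontal, keys=["SUBJID", "VISIT"], tests=["SYSBP", "DIABP"]):
    print(rec)
# {'SUBJID': '001', 'VISIT': 1, 'VSTEST': 'SYSBP', 'VSVALUE': 120}
# {'SUBJID': '001', 'VISIT': 1, 'VSTEST': 'DIABP', 'VSVALUE': 80}
```

The sketch makes the metadata trade-off concrete. In the horizontal form, SYSBP is a column that can carry its own label, type, and units in the variable metadata. After the reshape it is only a string stored in VSTEST, so any per-measurement metadata must be attached to VSTEST values instead.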
Using the Models
After reading the CDISC
Submission Metadata Model, please examine
the Demographics domain model before proceeding
to other domains. Demographics provides
many of the common core selection variables
used in other datasets (shaded in light gray
in each model), and was used by CDISC as the
template for all other domains.
Upon reviewing these models, please submit
your comments via our Public
Discussion Forum.
- Submission Data Domain Model V 2.0, Nov 21, 2001 (pdf)
- CDISC Submission Metadata Model V 2.0, Nov 26, 2001 (pdf)