UCUM and CDISC Codelists

The Unified Code for Units of Measure (UCUM) was developed by the Regenstrief Institute and the UCUM Organization as an unambiguous system of units and their combinations. UCUM is intended to include all units of measure currently used internationally in science, engineering, and business. It has been adopted internationally by IEEE, DICOM, LOINC, and HL7, and it is also used in the ISO 11240:2012 standard.

Nomenclature Versus Codelists

To understand why CDISC uses codelists rather than a nomenclature such as UCUM, it helps to understand the difference between a nomenclature and a codelist.

  • A nomenclature is a system for expressing concepts in a content area (for UCUM, the content area is units of measure). The nomenclature defines the building blocks (UCUM "atoms") for describing concepts and the rules for combining them. For example, "g" and "L" are symbols for "gram" and "liter", respectively, and they can be combined in the expression "g/L". One advantage of a nomenclature is that a computer can read a character string and determine both whether it is a valid expression in the nomenclature and what a valid expression means.

  • A codelist is, as the name implies, a list of codes (terms) with definitions of what they mean. For example, "g/L" is a term in the CDISC "UNIT" codelist. In the "UNIT" codelist, as in all CDISC codelists, each term has its own definition, an NCI Thesaurus code (often called a C-code), a CDISC submission value, and synonyms.
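To make the distinction concrete, here is a minimal Python sketch contrasting the two ideas. The nomenclature side is a toy checker for a tiny, hypothetical subset of UCUM (a few atoms and prefixes, the "." and "/" operators, and curly-brace annotations); it is not the real UCUM grammar, which also covers exponents, parentheses, numeric factors, and many more atoms. The codelist side is a plain dictionary lookup over a toy "UNIT" codelist whose C-codes are placeholders and whose definitions are paraphrased, not real CDISC content.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# --- Nomenclature side: a toy, heavily simplified subset of UCUM -----------
# Hypothetical subset for illustration; real UCUM has many more atoms and rules.
PREFIXES = {"k", "c", "m", "u"}                      # kilo, centi, milli, micro
METRIC_ATOMS = {"g", "L", "m", "s", "mol", "m[Hg]"}  # atoms that accept a prefix
OTHER_ATOMS = {"[in_i]", "[ELU]"}                    # atoms used without a prefix
# A component is an optional unit followed by an optional {annotation}.
COMPONENT = re.compile(r"^(?P<unit>[^{}]*)(?P<note>\{[^{}]*\})?$")


def is_valid_component(text: str) -> bool:
    """Check one factor such as 'mg', 'mm[Hg]', '[in_i]' or '{dose}'."""
    match = COMPONENT.match(text)
    if match is None:
        return False
    unit, note = match.group("unit"), match.group("note")
    if unit == "":
        # A bare annotation like '{dose}' is allowed; an empty factor is not.
        return note is not None
    if unit in METRIC_ATOMS or unit in OTHER_ATOMS:
        return True
    # Otherwise accept a one-letter prefix followed by an atom that takes prefixes.
    return unit[0] in PREFIXES and unit[1:] in METRIC_ATOMS


def is_valid_expression(expr: str) -> bool:
    """Split on UCUM's '.' (multiply) and '/' (divide) and check each factor."""
    return bool(expr) and all(is_valid_component(p) for p in re.split(r"[./]", expr))


# --- Codelist side: a finite list of terms, each with its own metadata -----
@dataclass(frozen=True)
class CodelistTerm:
    c_code: str                    # NCI Thesaurus code (placeholders below)
    submission_value: str          # the value used in submitted data
    definition: str
    synonyms: Tuple[str, ...] = ()


# Toy "UNIT" codelist: placeholder C-codes and paraphrased text, NOT CDISC data.
TOY_UNIT_CODELIST = {
    term.submission_value: term
    for term in (
        CodelistTerm("C_PLACEHOLDER_1", "g/L", "Grams per liter.", ("gram per liter",)),
        CodelistTerm("C_PLACEHOLDER_2", "mmHg", "Millimeters of mercury."),
    )
}


def lookup_term(value: str) -> Optional[CodelistTerm]:
    """A codelist can only answer: is this exact string a listed term?"""
    return TOY_UNIT_CODELIST.get(value)


print(is_valid_expression("mg/mL"), lookup_term("mg/mL"))             # True None
print(is_valid_expression("mmHg"), lookup_term("mmHg") is not None)   # False True
```

The two toy checks disagree in instructive ways: "mg/mL" is accepted by the rule-based checker but is not in the toy codelist, while "mmHg" is in the toy codelist but is rejected by the toy checker.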

A string of characters that is a valid UCUM expression may not be present in the CDISC "UNIT" codelist for various reasons:

  • CDISC has not received a request for the concept represented by the expression.
  • Many unit expressions are mathematically equivalent, but the CDISC "UNIT" codelist includes only one of the equivalent expressions as a submission value. For example, the codelist includes the submission value "g/L" but not equivalent expressions such as "mg/mL" or "ug/uL", both of which are valid UCUM expressions. Restricting each unit to a single submission value makes review easier: reviewers always see the same expression and do not have to mentally convert between mathematically equivalent ones.
  • To make its expressions unambiguous and to cover many units outside the SI system, UCUM uses symbols that are unfamiliar to most lay users. The CDISC codelist is intended for a broad audience and therefore uses expressions without these special and potentially confusing symbols. For example:
    • Many non-SI units, such as the customary units used in the US, are represented by character strings that include suffixes and are enclosed in square brackets. For example, "inch" is represented as "[in_i]", where the suffix "_i" marks the international customary inch (UCUM uses other suffixes, such as "_us" and "_br", for the US survey and British imperial variants).
    • UCUM expressions often contain text in curly braces, which UCUM treats as annotations. For example, the unit with the submission value "ELISA unit/dose" in the CDISC "UNIT" codelist is represented as "[ELU]/{dose}" in UCUM.
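The arithmetic behind the single-submission-value rule is easy to check. The sketch below assumes only the standard decimal prefixes milli (1/1000) and micro (1/1,000,000) and computes, with exact fractions, the factor that converts each mass-per-volume expression to g/L. "mg/mL" and "ug/uL" both come out to exactly 1, so they denote the same unit as "g/L"; listing all three would only give reviewers more expressions to reconcile.

```python
from fractions import Fraction

# Standard decimal prefixes used in this example (unprefixed, milli, micro).
PREFIX_FACTOR = {"": Fraction(1), "m": Fraction(1, 10**3), "u": Fraction(1, 10**6)}


def factor_to_g_per_L(mass_prefix: str, volume_prefix: str) -> Fraction:
    """Return f such that 1 <mass_prefix>g/<volume_prefix>L = f g/L."""
    return PREFIX_FACTOR[mass_prefix] / PREFIX_FACTOR[volume_prefix]


for mass, volume in [("", ""), ("m", "m"), ("u", "u"), ("m", "")]:
    print(f"1 {mass}g/{volume}L = {factor_to_g_per_L(mass, volume)} g/L")
# 1 g/L = 1 g/L
# 1 mg/mL = 1 g/L
# 1 ug/uL = 1 g/L
# 1 mg/L = 1/1000 g/L
```

The last line is included as a contrast: "mg/L" is a different unit, not an equivalent expression of "g/L".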

Mapping between UCUM and CDISC codelist representations of units can be done with the Unit-UCUM Codetable, which includes all CDISC unit-of-measure codelists along with UCUM representations for each unit.

The table below compares these unit code systems at a high level.

UCUM | CDISC UNIT Codelists
Nomenclature for constructing machine-readable unit representations from a set of basic building blocks | Codelists of unit representations
Contains mathematically equivalent representations of a unit (e.g., "g/L", "mg/mL") | Includes only one representation of each unit (e.g., "g/L" but not "mg/mL")
Includes brackets and braces for machine interpretation (e.g., "[in_i]" for "in", "{Capsule}" for "CAPSULE") | Does not assume that systems are programmed to recognize unit synonyms

Getting Started

Finding the CDISC Submission Value for a UCUM expression

This can be done manually, using the Unit-UCUM Codetable, by following these steps:

  1. Open the Unit-UCUM Codetable in Excel and enable editing.
  2. Choose the unit codelist you are interested in (there are seven: the general Unit codelist, the Age Unit codelist, and five PK unit codelists), and filter the table on that codelist.
  3. Open the Find and Replace dialog (Ctrl+F).
  4. Type or paste the UCUM expression into the "Find what:" field.
  5. Click "Find All". Several results may contain the text you are searching for; look for the one that is an exact match. (Alternatively, click "Options" and select "Match entire cell contents" so that only exact matches are listed.)
  6. The UCUM match will be in one of the "UCUM Expression" columns. The value in the CDISC Submission Value column of that row is the value you want.
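The steps above can be sketched in Python against a CSV export of the code table. The column names ("Codelist Name", "CDISC Submission Value", and columns whose headers start with "UCUM Expression") and the rows below are assumptions for illustration, not copied from the actual file; the "mmHg/s" row in particular is hypothetical. The exact flag shows why step 5 asks for an exact match: a substring search, like Excel's "Find All", also returns rows in which the expression is only part of a longer one.

```python
import csv
import io
from typing import Dict, List

# Hypothetical CSV export of the code table; headers and rows are illustrative.
SAMPLE_CSV = """Codelist Name,CDISC Submission Value,UCUM Expression 1,UCUM Expression 2
Unit,mmHg,mm[Hg],
Unit,mmHg/s,mm[Hg]/s,
Unit,g/L,g/L,mg/mL
Age Unit,YEARS,a,
"""


def find_submission_values(rows: List[Dict[str, str]], codelist_name: str,
                           ucum: str, exact: bool = True) -> List[str]:
    """Filter on the codelist (step 2), search the UCUM columns (steps 3-5),
    and return the CDISC Submission Value of each matching row (step 6)."""
    hits = []
    for row in rows:
        if row["Codelist Name"] != codelist_name:
            continue
        cells = [value for header, value in row.items()
                 if header and header.startswith("UCUM Expression") and value]
        if exact:
            matched = ucum in cells                         # whole-cell match
        else:
            matched = any(ucum in cell for cell in cells)   # substring, like "Find All"
        if matched:
            hits.append(row["CDISC Submission Value"])
    return hits


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(find_submission_values(rows, "Unit", "mm[Hg]", exact=False))  # ['mmHg', 'mmHg/s']
print(find_submission_values(rows, "Unit", "mm[Hg]"))               # ['mmHg']
```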

For example, to find the CDISC Submission Value for the UCUM expression "mm[Hg]" in the Unit codelist:

  • In the filter for the Codelist Name, select "Unit"
  • In the "Find and Replace" dialog, enter "mm[Hg]" and click "Find All". Several results that include "mm[Hg]" appear.
  • Select the occurrence that contains only "mm[Hg]". This takes you to the row with that content.
  • The CDISC Submission Value in that row is "mmHg".

This process could, of course, be automated.
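One way to automate it, sketched under the same kind of assumptions (a CSV export with hypothetical column names and illustrative rows), is to index the table once by codelist name and UCUM expression and then map many expressions at a time. Inputs with no exact match, or with more than one candidate submission value, are reported rather than guessed, so they can be reviewed by hand.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical CSV export; column names and rows are illustrative only.
SAMPLE_CSV = """Codelist Name,CDISC Submission Value,UCUM Expression 1,UCUM Expression 2
Unit,mmHg,mm[Hg],
Unit,g/L,g/L,mg/mL
Age Unit,YEARS,a,
"""

Index = Dict[Tuple[str, str], Set[str]]


def build_index(csv_text: str) -> Index:
    """Map (codelist name, UCUM expression) to the submission values it matches."""
    index: Index = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for header, cell in row.items():
            if header and header.startswith("UCUM Expression") and cell and cell.strip():
                index[(row["Codelist Name"], cell.strip())].add(row["CDISC Submission Value"])
    return index


def map_units(index: Index, codelist_name: str, ucum_values: Iterable[str]):
    """Return (mapped, problems); only unambiguous exact matches are mapped."""
    mapped: Dict[str, str] = {}
    problems: Dict[str, str] = {}
    for ucum in ucum_values:
        hits = index.get((codelist_name, ucum), set())
        if len(hits) == 1:
            mapped[ucum] = next(iter(hits))
        elif hits:
            problems[ucum] = "ambiguous: " + ", ".join(sorted(hits))
        else:
            problems[ucum] = "unmapped"
    return mapped, problems


index = build_index(SAMPLE_CSV)
mapped, problems = map_units(index, "Unit", ["mm[Hg]", "mg/mL", "[in_i]"])
print(mapped)    # {'mm[Hg]': 'mmHg', 'mg/mL': 'g/L'}
print(problems)  # {'[in_i]': 'unmapped'}
```

Building the index once keeps each lookup a single dictionary access, which matters when mapping every unit value in a large dataset rather than one expression at a time.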