OAK is an open-source community project to evolve software that uses agnostic transformation logic to map CDASH to SDTM and that can also generate raw synthetic data.


CDISC Platinum Member Roche approached COSA with a proposal to turn over OAK, one of their key software platforms, to the open-source community.  The proposal included three elements:

  1. {oak} as an open-source solution
  2. PoC to automate SDTM based on CDASH standards
  3. {raw.synthetic.data} as an open-source solution

Project Proposal

OAK is part of the "next-generation" solution for Roche’s data and analytics platforms to move towards increased automation. It will be used by the CDISC Data Science team to produce SDTM. OAK contributes to the prospective FAIRification of clinical trial data by creating SDTM datasets that are integrated with global data standards, which ensures interoperability of the data.

Please see the complete proposal here.

Further Description

Proof of Concept to automate SDTM based on CDASH standards.

OAK is an R-based solution that Roche successfully developed to automate the creation of SDTM domains. Closely linked to data standards, OAK is a metadata-driven solution that can automate ~80% of the SDTM domains based on ~22 reusable algorithms. SDTM mappings are defined as algorithms that transform the collected (eCRF, non-CRF) source data into the target tabulation data model. Mapping algorithms are the backbone of SDTM automation.

Algorithms can be reused across domains and can be pre-specified for data standards. Users can reuse the algorithms for extensions to data standards or for new data types. Algorithms are programming-language agnostic; that is, the concept does not rely on a specific programming language for implementation.
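To make the idea of a reusable mapping algorithm concrete, here is a minimal sketch in Python. It is not the {oak} API: the function names `assign_no_ct` and `assign_ct`, the raw variable names, and the codelist are all hypothetical and chosen only for illustration. The point is that one algorithm, written once, can serve several SDTM domains.

```python
from typing import Optional

# A raw record is a mapping from collected variable name to collected value.
Record = dict[str, Optional[str]]


def assign_no_ct(raw: list[Record], raw_var: str) -> list[Optional[str]]:
    """Hypothetical algorithm: copy the collected value to the target unchanged."""
    return [rec.get(raw_var) for rec in raw]


def assign_ct(
    raw: list[Record], raw_var: str, codelist: dict[str, str]
) -> list[Optional[str]]:
    """Hypothetical algorithm: copy the collected value, recoding it through
    a controlled-terminology codelist. Values not in the codelist pass through."""
    out: list[Optional[str]] = []
    for rec in raw:
        value = rec.get(raw_var)
        out.append(codelist.get(value, value) if value is not None else None)
    return out


# Illustrative raw data (variable names and values are assumptions).
raw_cm = [
    {"MDRAW": "Aspirin", "MDROUTE": "By mouth"},
    {"MDRAW": "Ibuprofen", "MDROUTE": None},
]
raw_mh = [{"MHRAW": "Asthma"}]

# Illustrative codelist mapping collected text to a submission value.
ROUTE_CT = {"By mouth": "ORAL"}

# The same algorithms are reused for two different domains.
cm = {
    "CMTRT": assign_no_ct(raw_cm, "MDRAW"),
    "CMROUTE": assign_ct(raw_cm, "MDROUTE", ROUTE_CT),
}
mh = {"MHTERM": assign_no_ct(raw_mh, "MHRAW")}
```

Because the algorithms take the raw variable name and codelist as parameters, extending a standard or adding a new data type would mean supplying new parameters rather than writing new mapping code.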

In collaboration with CDISC and Pfizer, Roche invites you to participate in an open-source proof of concept to enable pharmaceutical companies to automate SDTM when following CDASH standards.

Proof of Concept Vision and Scope


  • Develop an open-source, metadata-driven SDTM solution that enables users (data scientists, statistical programmers, data managers) to automate SDTM datasets in R.
  • Enable SDTM automation when CDASH standards are adopted from CDISC Library.
  • Follow ODM standard and remain an EDC-agnostic solution.
  • Completely leverage OAK Algorithms, CDISC Library, and CDASH eCRFs.
  • Provide a framework for automation when CDASH standards are extended to meet study or company needs.


  • Pick domains like DM, CM, MH, VS, EC, EX, and DS for the PoC.
  • Add algorithms and associated metadata to CDISC Library for CDASH standards (similar to what the Roche team did in Roche’s MDR).
  • Modify the Roche version of the {oak} package to work with CDISC Library and the ODM clinical data format to enable metadata-driven automation. This might be an extension of the {oak} package, something like {oak.cdash}, or could be a new package by itself.
  • Use the {oak.cdash} package to automate SDTM. If successful, expand to all CDASH standards and develop the {oak.cdash} package to support all algorithms.
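The metadata-driven approach described in the lists above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the layout of `MAPPING_SPEC`, the algorithm names, the raw variable names, and the `build_domain` helper are hypothetical and do not reflect the {oak} package, CDISC Library, or ODM. The sketch only shows how mapping metadata, rather than domain-specific code, can drive the transformation.

```python
from typing import Callable, Optional

Value = Optional[str]

# Hypothetical per-value algorithms; each receives the collected value
# and the metadata rule that invoked it.
ALGORITHMS: dict[str, Callable[[Value, dict], Value]] = {
    # Copy the collected value unchanged.
    "copy": lambda v, rule: v,
    # Recode through a codelist carried in the metadata.
    "recode": lambda v, rule: rule["codelist"].get(v, v) if v is not None else None,
    # Assign a fixed value whenever the raw variable is populated.
    "hardcode": lambda v, rule: rule["value"] if v is not None else None,
}

# Illustrative mapping metadata: one rule per target SDTM variable.
MAPPING_SPEC = [
    {"domain": "VS", "target": "VSORRES", "raw_var": "SYSBP", "algorithm": "copy"},
    {"domain": "VS", "target": "VSTESTCD", "raw_var": "SYSBP",
     "algorithm": "hardcode", "value": "SYSBP"},
    {"domain": "VS", "target": "VSORRESU", "raw_var": "SYSBP_UNIT",
     "algorithm": "recode", "codelist": {"mm Hg": "mmHg"}},
]


def build_domain(domain: str, raw: list[dict], spec: list[dict]) -> list[dict]:
    """Apply every metadata rule for `domain` to each raw record."""
    rows: list[dict] = [{} for _ in raw]
    for rule in spec:
        if rule["domain"] != domain:
            continue
        algorithm = ALGORITHMS[rule["algorithm"]]
        for row, rec in zip(rows, raw):
            row[rule["target"]] = algorithm(rec.get(rule["raw_var"]), rule)
    return rows


# Illustrative raw records; the second has no collected blood pressure.
raw_vs = [
    {"SYSBP": "120", "SYSBP_UNIT": "mm Hg"},
    {"SYSBP": None, "SYSBP_UNIT": None},
]
vs = build_domain("VS", raw_vs, MAPPING_SPEC)
```

In this design, supporting another domain or a company-specific CDASH extension would mean adding rows to the metadata rather than changing `build_domain`.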



How to Participate

We invite you and your organization to participate in this exciting new project.

Please complete the volunteer form to engage in this project: Become a Volunteer

Note: COSA is currently using the CDISC Volunteer onboarding form. Anything in the form that is expressly standards related (like following COP-001) does not apply to this OSS project.

For more about COSA, please read the inclusion criteria under About COSA.


Additional Resources