Clinical Metadata Standard

GSCID/BRC Clinical Metadata Standard
v1.5
Finalized by the GSCID/BRC Clinical Metadata Working Group

How to interpret the document:

BOLD: Field name 
ITALICS: Attributes of the field

Clinical metadata may include elements that pose a risk of patient identification. These have been marked for each entry in an attribute labeled 'Risk'. Individuals handling these data should familiarize themselves with the relevant regulations regarding the sharing and use of protected health information. As specified in the NIAID contracts to the GSCs and BRCs, it is the institutions' responsibility to provide the appropriate privacy training for handling this information. Fields marked as HIPAA data elements are based on the information at the HIPAA web site which describes what protected health information includes. For dates: "All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older." For geographic location: "All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes" are considered potentially identifiable.
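
The two HIPAA rules quoted above can be sketched in code. This is a minimal illustration and not part of the standard: the function names and the location keys ("country", "state", and so on) are hypothetical, and a real pipeline would need review against the applicable regulations.

```python
from datetime import date
from typing import Optional

AGE_CAP = 89  # ages above this are aggregated, per the HIPAA text quoted above


def generalize_age(age_years: int) -> str:
    """Return the age as text, aggregating all ages over 89 into one category."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    return "90 or older" if age_years > AGE_CAP else str(age_years)


def generalize_date(d: date, age_years: Optional[int] = None) -> Optional[str]:
    """Keep only the year of a date.

    If the subject's age is known to be over 89, the year is suppressed too,
    because the quoted rule treats such years as indicative of age.
    """
    if age_years is not None and age_years > AGE_CAP:
        return None
    return f"{d.year:04d}"


def generalize_location(location: dict) -> dict:
    """Drop every geographic subdivision smaller than a state.

    The key names are assumptions made for this sketch.
    """
    allowed = ("country", "state")
    return {key: value for key, value in location.items() if key in allowed}
```

For instance, a record with city and zip code keys would keep only its country and state entries after `generalize_location`.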

Several required fields are dates (e.g., sample collection date), which should be supplied in the international standard (ISO 8601) format (yyyy-mm-dd). However, these dates, when combined, may lead to patient identification. To avoid this possible breach of patient health information, the dates should be collected but censored from public access. In place of dates, time relative to the sample collection date should be provided. When multiple samples are collected from the same individual on different dates, the initial sample collection date should be used as the reference point. For example, a patient with a birth date of 2010-01-01 who was hospitalized on 2010-10-01 and had blood drawn for a follow-up study on 2011-01-01 would be shown as a 1-year-old patient who was hospitalized 3 months prior to blood collection for the study.
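
One possible way to derive such relative times from the censored dates is sketched below. The standard does not prescribe units or rounding, so counting complete calendar months is an assumption, and the names `whole_months_between` and `relative_record` and the dictionary keys are hypothetical.

```python
from datetime import date


def whole_months_between(earlier: date, later: date) -> int:
    """Count the complete calendar months from `earlier` to `later`."""
    if earlier > later:
        raise ValueError("earlier must not be after later")
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month has not been completed yet
    return months


def relative_record(birth: date, hospitalized: date, collected: date) -> dict:
    """Express birth and hospitalization relative to the sample collection date."""
    return {
        "age_years": whole_months_between(birth, collected) // 12,
        "months_before_collection": whole_months_between(hospitalized, collected),
    }


# The dates from the example in the text above.
record = relative_record(date(2010, 1, 1), date(2010, 10, 1), date(2011, 1, 1))
```

Only the values in `record` would be released; the three input dates stay in the censored data set.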

However, it is also recognized that dates and geographical location can be critical for useful interpretation of the data in some circumstances. For example, the season or proximity to spraying of insecticides can have a significant impact on collected data. To minimize risk yet still provide useful data, obfuscation of dates and generalization of locations are recommended as an alternative. An example of date obfuscation is adding or subtracting a random number of days to or from the actual date. The goal of obfuscation and generalization is that the number of people who match all the information provided, including the obfuscated/generalized values, is large enough to minimize the risk of identification. In general, a minimum of 5-10 people should be targeted, with higher numbers needed for higher-risk studies (e.g., those involving HIV). When calculating these numbers, close correspondence to other collected fields should be examined; a correlation analysis is recommended. Note that to reach this minimum group size, other fields can also be obfuscated or removed if they are not critical (e.g., age, gender, race/ethnicity). For additional information on this topic, see El Emam, Rodgers, & Malin. "Anonymising and sharing individual patient data." BMJ 2015 Mar 20;350:h1139. doi: 10.1136/bmj.h1139.
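
The date shifting and the group-size target described above could be sketched as follows. Several choices here are assumptions and not requirements of the standard: one shared offset is used for all of a subject's dates (so intervals between events are preserved), the offset is capped at 30 days by default, and the group-size check counts records in the data set itself, which only approximates the number of matching people in the underlying population. All function names are hypothetical.

```python
import random
from collections import Counter
from datetime import date, timedelta


def shift_dates(dates, max_shift_days=30, rng=None):
    """Shift all of one subject's dates by the same random, nonzero offset."""
    if max_shift_days < 1:
        raise ValueError("max_shift_days must be at least 1")
    rng = rng or random.Random()
    offset = 0
    while offset == 0:  # an offset of zero would leave the true dates exposed
        offset = rng.randint(-max_shift_days, max_shift_days)
    return [d + timedelta(days=offset) for d in dates]


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing all quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def meets_minimum(records, quasi_identifiers, k=5):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k
```

If `meets_minimum` returns False, the fields listed in `quasi_identifiers` are candidates for further generalization or removal, as the paragraph above suggests.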

Some fields such as those pertaining to Physical Exams, Diagnosis, Lab Tests, and Treatments may have multiple related entries. Preferred approaches for capturing multiple entries are described in the 'Multiple Sets' attribute.

We understand that LOINC codes provide information that covers multiple required fields (e.g., Lab Test Method; Body Site). However, not all data contributors will start with LOINC codes, and we want to make sure that we capture the needed information in a manner that can be easily mapped to LOINC codes to be used as standard values.
 

1. Category:
Field ID: CS1
Field Name: Specimen Source ID
Multiple Sets:
Risk:
Definition: This is a reminder that the clinical data collected for the fields below are in relation to CS1 from the Core Sample Data Dictionary.
NA allowed:
Data Type:
Accepted Values:
Examples:
Xref_id*:
Notes:
OBI/OBO terms:
OBO Foundry ID:

2. Category: Project
Field ID: CC1
Field Name: Subject/Sample Selection Criteria
Multiple Sets: This would be captured at either the project or study level and should be characteristics that are in common for all enrolled participants and selected samples evaluated
Risk: Some risk if criteria are extremely restrictive.
Definition: The criteria used to identify and select human subject participants and their samples for the clinical research study, including both the inclusion and exclusion enrollment criteria, which describes the common characteristics of the base human population participating in the research study, and the common characteristics of the samples used for the research study.
NA allowed: No
Data Type: text
Accepted Values: free text
Examples: (For single study collections) Whole blood from adults between the ages of 18 and 80 without evidence of previous exposure to the influenza A H5N1 virus; (For multiple study meta-collections) Nasal swab and sputum samples from children between the ages of 6 weeks to 2 years with a diagnosis of pneumonia associated with the detection of respiratory syncytial virus from multiple independent studies
Xref_id*:
Notes:
OBI/OBO terms: eligibility criterion
OBO Foundry ID: http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_0500026

3. Category: General
Field ID: CC2
Field Name: Race/Ethnicity
Multiple Sets:
Risk:
Definition: Biological race, group or cultural background with which the subject most identifies.
NA allowed: Yes
Data Type: text
Accepted Values: Anthro-Ethnicity-Groups list with codes.
Examples: Mixed, of predominantly African Ancestry 1301900
Xref_id*:
Notes: Derived from - SJ Mack, A. Sanchez-Mazas, D Meyer, RM Single, Y Tsai, HA Erlich. Anthropology/Human Genetic Diversity Joint Report. Chapter 2: Methods Used in the Generation and Preparation of Data for Analysis in the 13th International Histocompatibility Workshop, in Immunobiology of the Human MHC: Proceedings of the 13th International Histocompatibility Workshop and Conference, J. Hansen, Editor. 2007, IHWG Press: Seattle. p. 564-579.
OBI/OBO terms: ethnic group
OBO Foundry ID: http://www.ebi.ac.uk/efo/EFO_0001799

4. Category: General
Field ID: CC3
Field Name: Study type
Multiple Sets:
Risk:
Definition: Term(s) that describe the type or design of the conducted study.
NA allowed: No
Data Type: text
Accepted Values: cross-sectional survey, cohort (prospective or retrospective), health facility, case-control, other
Examples: prospective cohort
Xref_id*:
Notes:
OBI/OBO terms: study design
OBO Foundry ID: http://www.ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_0500000
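
A submission tool might check CC3 values along the following lines. This is a sketch under stated assumptions: expanding "cohort (prospective or retrospective)" into three accepted spellings is an interpretation based on the example value, treating the literal text "NA" as a missing value is a guess at how contributors would mark it, and the function handles a single term only even though the definition allows for more than one.

```python
# Interpretation of the CC3 Accepted Values list (an assumption, see above).
ACCEPTED_STUDY_TYPES = frozenset({
    "cross-sectional survey",
    "cohort",
    "prospective cohort",
    "retrospective cohort",
    "health facility",
    "case-control",
    "other",
})


def validate_study_type(value):
    """Return the normalized study type, or raise ValueError if it is invalid."""
    if value is None:
        raise ValueError("CC3 Study type is required (NA not allowed)")
    # Lowercase and collapse runs of whitespace before comparing.
    normalized = " ".join(value.lower().split())
    if normalized in ("", "na"):
        raise ValueError("CC3 Study type is required (NA not allowed)")
    if normalized not in ACCEPTED_STUDY_TYPES:
        raise ValueError(f"unrecognized study type: {value!r}")
    return normalized
```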

5. Category: Physical Exam
Field ID: CC4
Field Name: Comorbidity
Multiple Sets: Multiple Comorbidities may be recorded during sample collection: comma delimited
Risk:
Definition: Additional disorders or diseases co-occurring with the primary disease; or the effect of such additional disorders or diseases. Primary disease: the disease caused by the sampled organism
NA allowed: Yes
Data Type: text
Accepted Values: Disease Terms ICD9/10; Disease Ontology
Examples: acquired immunodeficiency
Xref_id*:
Notes:
OBI/OBO terms: comorbidity
OBO Foundry ID: http://www.ontobee.org/ontology/OMIABIS?iri=http://purl.obolibrary.org/obo/OMIABIS_0001008
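
The comma-delimited convention named in the 'Multiple Sets' attribute could be read and written as sketched below. The function names are hypothetical, and representing a missing value as the literal text "NA" is an assumption. Terms that themselves contain a comma cannot be represented this way, so the writer rejects them.

```python
def parse_multiple_set(raw):
    """Split a comma-delimited field into a list of trimmed, non-empty terms."""
    if raw is None or raw.strip().upper() == "NA":
        return []
    return [term.strip() for term in raw.split(",") if term.strip()]


def format_multiple_set(terms):
    """Join terms into one comma-delimited field value."""
    cleaned = [term.strip() for term in terms if term.strip()]
    for term in cleaned:
        if "," in term:
            raise ValueError(f"term contains the delimiter: {term!r}")
    return ", ".join(cleaned) if cleaned else "NA"
```

The same two helpers would apply to any other field whose 'Multiple Sets' attribute specifies comma-delimited entries.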

6. Category: Physical Exam
Field ID: CC5
Field Name: Concomitant Medication
Multiple Sets: Multiple Concomitant Medications may be recorded during sample collection: comma delimited
Risk:
Definition: Medications being administered to or taken by the patient that are unrelated to the suspected pathogenic organism.
NA allowed: Yes
Data Type: text
Accepted Values: PubChem
Examples: Ciprofloxacin
Xref_id*: http://www.ncbi.nlm.nih.gov/pccompound
Notes:
OBI/OBO terms: drug product (medication material)
OBO Foundry ID: http://www.ontobee.org/ontology/DRON?iri=http://purl.obolibrary.org/obo/DRON_00000005

7. Category: Physical Exam
Field ID: CC6
Field Name: Measured Attribute
Multiple Sets: Multiple Measured Attributes may be collected in the same Physical Exam process. Each Measured Attribute may have a Measurement Method and Measured Value
Risk:
Definition: Quantitative measurements of the subject during the physical exam.
NA allowed: Yes
Data Type: text
Accepted Values: Vital signs terms in LOINC, qualities in PATO
Examples: temperature
Xref_id*: https://raw.githubusercontent.com/pato-ontology/pato/master/pato.owl; https://loinc.org/
Notes: Additional values can be defined via project-specific data dictionary
OBI/OBO terms: quality
OBO Foundry ID: http://www.ontobee.org/ontology/PATO?iri=http://purl.obolibrary.org/obo/PATO_0000001

8. Category: Physical Exam
Field ID: CC7
Field Name: Measurement Method
Multiple Sets: One Measurement Method is associated with one Measured Attribute
Risk:
Definition: Method used to capture the quantitative measurements during the physical exam.
NA allowed: Yes
Data Type: text
Accepted Values: Taking patient vital sign terms in LOINC, Vital Sign Ontology, OGMS
Examples: oral temperature method
Xref_id*: http://www.ontobee.org/ontology/OGMS; https://loinc.org/; vital sign ontology: https://code.google.com/archive/p/vital-sign-ontology/
Notes: Some of these can be put in OBI
OBI/OBO terms: physical