AI-Technologies for Healthcare

Neel K.
Published in Analytics Vidhya
Dec 10, 2019


Artificial Intelligence is rapidly spreading across public sectors thanks to its wide applicability and algorithms capable of deep understanding. The current decade has seen tremendous research breakthroughs in machine learning for the healthcare sector. These developments can be broadly categorized into two areas: image-based (computer vision) AI and text-based (natural language processing) AI.

MIMIC (Medical Information Mart for Intensive Care): This is a database of 61,523 patients who stayed in the intensive care units of Beth Israel Deaconess Medical Center between 2001 and 2012. It comprises 53,432 adults and 8,100 newborn children. This large, publicly available data set (6.2 GB) includes lab measurements, caregiver notes, procedures, medications, mortality reports and chest X-rays.

Upgrade (MIMIC-II vs MIMIC-III):

Because MIMIC was one of the first databases of its kind to become available, many research publications are based on MIMIC-II. To relate them to the current data, we need to understand how the database was upgraded.

A website view of MIMIC on PhysioNet

MIMIC-III is an extension of the older MIMIC-II (formerly Multiparameter Intelligent Monitoring in Intensive Care). MIMIC-II was a collection of data from 2001–08, which was later extended with further data from 2008–12. The transition involved several changes: item tables such as D_MEDITEMS, D_IOITEMS and D_CHARTITEMS were merged into D_ITEMS, and admissions and discharges were labelled with a time component. Moreover, CENSUSEVENTS was replaced by TRANSFERS, DEMOGRAPHIC_DETAIL was merged into ADMISSIONS, DRGEVENTS was renamed DRGCODES, ICD9 was renamed DIAGNOSES_ICD, and so on.
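When reading older MIMIC-II publications, it can help to translate the table names they mention into their MIMIC-III equivalents. The sketch below encodes only the renames and merges listed above; the function name `to_mimic3_table` is a hypothetical helper, and any table not in the mapping is assumed to have kept its name.

```python
# Table renames and merges between MIMIC-II and MIMIC-III,
# limited to the ones named in this article.
MIMIC2_TO_MIMIC3 = {
    "D_MEDITEMS": "D_ITEMS",
    "D_IOITEMS": "D_ITEMS",
    "D_CHARTITEMS": "D_ITEMS",
    "CENSUSEVENTS": "TRANSFERS",
    "DEMOGRAPHIC_DETAIL": "ADMISSIONS",
    "DRGEVENTS": "DRGCODES",
    "ICD9": "DIAGNOSES_ICD",
}


def to_mimic3_table(name: str) -> str:
    """Return the MIMIC-III table name for a MIMIC-II table name.

    Names missing from the mapping are returned upper-cased and otherwise
    unchanged, on the assumption that the table kept its name.
    """
    key = name.strip().upper()
    return MIMIC2_TO_MIMIC3.get(key, key)


if __name__ == "__main__":
    for old in ["d_meditems", "CENSUSEVENTS", "ICD9", "LABEVENTS"]:
        print(f"{old} -> {to_mimic3_table(old)}")
```

Because three MIMIC-II item tables map to the single D_ITEMS table, the mapping is many-to-one and cannot be reversed without extra information.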

How to Obtain Access?

The MIMIC database is updated regularly with recent data. MIMIC-III v1.4 is the latest version and can be obtained from PhysioNet. These electronic health records (EHR) are de-identified to protect patient confidentiality. The procedure to acquire the database and use it for research is as follows.

1. Complete the CITI “Data or Specimens Only Research” course, which covers the data regulation laws that apply to research.

2. Register for an account at https://physionet.org.

3. Submit your application for credentialed access. Remember to provide your CITI completion report.

4. After access is granted, you can download the 26 comma-separated values (CSV) files you need for your purpose. You can read about the content of each file here before downloading it.

5. If you are unable to obtain a CITI completion report, you can access a demo repository of 100 patients here.
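Once a file has been downloaded, a quick look at its first few rows is usually the next step. The following is a minimal sketch using only the Python standard library; `read_rows` is a hypothetical helper, and the file name, column names and values in the demo are made up for illustration rather than taken from the real data. It accepts gzip-compressed files as well, in case your download is compressed.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_rows(path, limit=5):
    """Return up to `limit` rows of a CSV file as dictionaries.

    Handles plain `.csv` files and gzip-compressed `.csv.gz` files.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    rows = []
    with opener(path, mode="rt", newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if index >= limit:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # Write a tiny stand-in file so the example runs without MIMIC access.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "PATIENTS.csv"
        sample.write_text("SUBJECT_ID,GENDER\n1,F\n2,M\n", encoding="utf-8")
        for row in read_rows(sample):
            print(row)
```

Reading only a handful of rows keeps memory use low, which matters because some of the event tables are very large.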

International Classification of Diseases (ICD):

ICD is a healthcare classification system maintained by the World Health Organization (WHO). It maps various health conditions, such as signs, symptoms, abnormal circumstances and injuries, to a short code. The coding system is hierarchical, with a major code for each disease and child codes for its variants. ICD-9 is used in MIMIC to label conditions. ICD-11 has been approved and will take effect in January 2022. ICD-10 is widely used in many countries because it is available in multiple languages.

The most frequently used codes in MIMIC are 427.31 (atrial fibrillation), 584.9 (acute kidney failure), 428.0 (congestive heart failure) and 401.9 (unspecified essential hypertension).

An Excerpt from ICD-9 Codes.
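The hierarchical structure described above can be illustrated by splitting a dotted ICD-9 code into its major (parent) code and its child part. This is a minimal sketch that assumes codes are written with a decimal point, as they are in this article; `split_icd9` and `group_by_category` are hypothetical helper names, and 427.32 is added only as a sibling of 427.31 to show the grouping.

```python
from collections import defaultdict


def split_icd9(code: str):
    """Split a dotted ICD-9 code into (category, subclassification).

    For example, "427.31" becomes ("427", "31"). A code without a dot
    is treated as a bare category with an empty subclassification.
    """
    category, _, detail = code.strip().partition(".")
    return category, detail


def group_by_category(codes):
    """Group dotted ICD-9 codes under their parent category."""
    groups = defaultdict(list)
    for code in codes:
        category, _ = split_icd9(code)
        groups[category].append(code)
    return dict(groups)


if __name__ == "__main__":
    # 427.32 is included only to show two child codes under one parent.
    codes = ["427.31", "427.32", "584.9", "401.9"]
    for category, members in group_by_category(codes).items():
        print(category, members)
```

Grouping by the parent code is a common way to reduce the number of distinct labels when the child codes are too fine-grained for the task at hand.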

Hierarchical Identifiers:

This health information is generally maintained in two forms: static and dynamic.

Static:

SUBJECT_ID: This identifies a patient. The patient's main information is recorded in fields such as Date of Birth (DOB), Date of Death (DOD), DOD_HOSP and DOD_SSN, which are listed in the Patients table.

HADM_ID: This identifies a patient's admission to the hospital. The associated data comprises admission time, death time, discharge time and admission type.

ICUSTAY_ID: This identifies a stay in the ICU, during which the patient underwent procedures. The information is reported as ICU in time, ICU out time, first care unit and last care unit.
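The three identifiers nest inside one another: a patient can have several hospital admissions, and each admission can contain several ICU stays. The sketch below makes that nesting explicit. The rows are invented for illustration and are not real MIMIC records, and `build_hierarchy` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows for illustration only; not real MIMIC records.
ICU_STAYS = [
    {"SUBJECT_ID": 1, "HADM_ID": 100, "ICUSTAY_ID": 1000},
    {"SUBJECT_ID": 1, "HADM_ID": 100, "ICUSTAY_ID": 1001},
    {"SUBJECT_ID": 1, "HADM_ID": 101, "ICUSTAY_ID": 1002},
    {"SUBJECT_ID": 2, "HADM_ID": 200, "ICUSTAY_ID": 2000},
]


def build_hierarchy(rows):
    """Nest ICU stays under admissions, and admissions under patients."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        tree[row["SUBJECT_ID"]][row["HADM_ID"]].append(row["ICUSTAY_ID"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {subject: dict(admissions) for subject, admissions in tree.items()}


if __name__ == "__main__":
    for subject, admissions in build_hierarchy(ICU_STAYS).items():
        print(f"patient {subject}")
        for hadm_id, stays in admissions.items():
            print(f"  admission {hadm_id}: ICU stays {stays}")
```

Keeping this hierarchy in mind helps when splitting data for machine learning, since records from the same patient should normally stay on the same side of a train/test split.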

Dynamic:

These are records that are updated periodically, such as blood pressure, drugs, procedures, lab events and microbiology reports. This hospital-acquired and ICU-acquired data has been used as features in many ML algorithms. You can find a few listed publications here.
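One simple way to turn such periodically updated records into model features is to summarize each measurement per ICU stay. The following is only a sketch of that idea, not the method of any particular publication: the measurement labels and values are invented, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up measurements: (ICUSTAY_ID, measurement label, value).
MEASUREMENTS = [
    (1000, "systolic_bp", 118.0),
    (1000, "systolic_bp", 126.0),
    (1000, "heart_rate", 80.0),
    (1001, "systolic_bp", 140.0),
    (1001, "heart_rate", 95.0),
    (1001, "heart_rate", 105.0),
]


def summarize(measurements):
    """Return one feature dictionary per ICU stay.

    Each measurement label contributes three features:
    "<label>_mean", "<label>_min" and "<label>_max".
    """
    values = defaultdict(lambda: defaultdict(list))
    for stay_id, label, value in measurements:
        values[stay_id][label].append(value)

    features = {}
    for stay_id, by_label in values.items():
        row = {}
        for label, series in by_label.items():
            row[f"{label}_mean"] = mean(series)
            row[f"{label}_min"] = min(series)
            row[f"{label}_max"] = max(series)
        features[stay_id] = row
    return features


if __name__ == "__main__":
    for stay_id, row in summarize(MEASUREMENTS).items():
        print(stay_id, row)
```

Summary statistics discard the timing of each measurement, so models that depend on trends over time would need a different representation.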

MIMIC-CXR-JPG:

Computer vision has a long history in medical disease classification based on cellular images and molecular structures. X-rays in particular play a major role in diagnosing bone, cardiac, reproductive and pulmonary issues. The purpose of this database is to help automate chest radiography analysis using images labelled with cardiopulmonary diseases. This is another important MIMIC dataset, de-identified in accordance with HIPAA rules. The dataset contains 377,110 JPG images and structured labels derived from the 227,827 free-text radiology reports associated with these images.

Relevant Datasets:

Biochemistry and genetics also have a long history of medically grounded research. Around 221 datasets are available at health-data.

Special thanks to Giuseppe Rizzo for advisory support.

References:

https://mimic.physionet.org/about/publications/

https://github.com/MIT-LCP/mimic-code

https://it.wikipedia.org/wiki/Classificazione_ICD
