Computational Healthcare Library

  • Platform used for developing Computational Healthcare

  • Load data from large HCUP and State healthcare datasets

  • Compute aggregate statistics in a privacy-preserving manner

  • Build machine learning models for predicting outcomes

  • Benchmark & interpret model performance

import chlib
NRD ='../config.json','HCUPNRD')
for p_key,patient in NRD.iter_patients(): # patients & visits
    print p_key,patient
    for v in patient.visits:
        print v # print visits

# Retrieve patients / visits by a specific diagnosis/procedure
print len(list(NRD.iter_patients_by_code('D486')))

# Compute aggregate statistics
policy = chlib.entity.aggregate.Policy(min_count=20,min_hospital=5)
aggregate = chlib.entity.aggregate.Aggregate()

Architecture & Features

Protobuf & LevelDB

Protocol Buffers are used to represent patients, visits, and aggregate statistics. Because LevelDB stores its data in an ordinary directory on disk, the database can be kept on a removable encrypted drive for at-rest security.


A local Django server allows quick inspection of patient- and visit-level data, and visualization of aggregate statistics using pre-defined charts and tables.

Supported Datasets

We currently support HCUP & State databases. However, the protocols can be extended in the future to support other formats such as the OHDSI CDM.

Docker container

We provide a Docker container image with a docker-compose file. The compose file also defines the Postgres & RabbitMQ services used by the local Django server.

Installation & Supported databases

Docker installation

To run Computational Healthcare, clone the repo and start the containers using "docker-compose up".

# clone the repo
git clone
cd ComputationalHealthcare/docker
# launch containers
docker-compose up  -d

Loading & Exploring data

# make sure the containers are running by going to localhost:8111
open localhost:8111
# To load NRD data edit with the path to NRD 2013 data files
# To load Texas data edit with the path to Texas data files
# You can load either one or both datasets
# The image contains jupyter notebook server
# Launch ipython/jupyter notebooks
open localhost:8188
# To stop and remove containers
docker-compose down
# The data volumes are named, retained and automatically attached when started again
docker volume ls
# The volume chdata contains processed data; the others are used by Postgres & RabbitMQ

Texas Public Use Inpatient

Nationwide Readmissions Database (NRD)

State Inpatient, ED & SASD

Please note that this repository does not contain any data, nor do we provide any data. You should acquire the datasets from AHRQ and/or the relevant state agencies.

Quick examples

Retrieving patients & visits

  • Iterate over all patients or quickly retrieve patients who underwent a specific procedure

  • LevelDB & Protocol Buffers allow the data to be accessed from any programming language that has bindings for both.

# Get dataset object
NRD ='../config.json','HCUPNRD')
# Path to the levelDB directory with serialized patients objects as values
print NRD.db
# Iterate over all patients & visits
for p_key,patient in NRD.iter_patients():
    print p_key,patient
# Retrieve all patients by specific diagnosis or procedure code
patients  = list(NRD.iter_patients_by_code('D486'))
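Retrieval by code is fast when an index maps each code to the keys of the patients who carry it, so that a lookup does not have to scan every patient. The following is a minimal in-memory sketch of such an inverted index. It is an illustration under assumed data structures (plain dicts and lists), not a description of how chlib organizes its data in LevelDB; `build_code_index` and the sample records are hypothetical.

```python
from collections import defaultdict

def build_code_index(patients):
    """Map each code to the sorted list of patient keys that carry it.

    `patients` maps a patient key to a list of visits, where each visit
    is a list of prefixed codes (a hypothetical layout for illustration).
    """
    index = defaultdict(set)
    for p_key, visits in patients.items():
        for codes in visits:
            for code in codes:
                index[code].add(p_key)
    return {code: sorted(keys) for code, keys in index.items()}

# Hypothetical sample data: two patients, three visits in total.
patients = {
    'p1': [['D486', 'P9971'], ['D486']],
    'p2': [['D99681']],
}
index = build_code_index(patients)
matching = index.get('D486', [])  # keys of patients with code D486
```

Using a set per code means a patient with the same code on several visits is listed only once.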

Text description of codes

  • Codes are prefixed with a unique character per code type, e.g.:

  • ICD-9 procedure codes are prepended with 'P'.

  • ICD-9 diagnosis codes are prepended with 'D'.

  • You can also print the string representation of enums

coder =
print 'D486',coder['D486']
# output: D486 Pneumonia, organism unspecified
print 'P9971',coder['P9971']
# output: P9971 Therapeutic plasmapheresis
print coder[chlib.entity.enums.D_AMA]
# output: Against medical advice
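Because the code type is carried by the first character, it can be recovered from the code string alone. Here is a minimal sketch, assuming only the two prefixes listed above; `CODE_TYPES` and `split_code` are hypothetical helpers written for illustration and are not part of chlib.

```python
# Hypothetical helper, not part of chlib: maps the one-character prefix
# to a human-readable code type. Only the two documented prefixes appear.
CODE_TYPES = {
    'D': 'ICD-9 diagnosis',
    'P': 'ICD-9 procedure',
}

def split_code(code):
    """Split a prefixed code such as 'D486' into (code type, raw code)."""
    if len(code) < 2 or code[0] not in CODE_TYPES:
        raise ValueError('unknown code prefix: %r' % (code,))
    return CODE_TYPES[code[0]], code[1:]

code_type, raw = split_code('P9971')
```

Raising on an unknown prefix, rather than guessing, keeps a mistyped or unprefixed code from silently matching the wrong code type.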

Compute aggregate statistics

  • Built-in primitives for computing aggregate statistics on "bag of visits/patients"

  • Aggregation policy for specifying parameters such as minimum number of visits etc.

  • Statistics and policies are also represented using protocol buffers.

  • Computed aggregate statistics can be quickly examined using a local Django server.
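One common way to keep aggregate statistics privacy preserving is small-cell suppression: a number is released only when it is backed by enough visits from enough distinct hospitals. The sketch below illustrates that idea in plain Python. It is an assumption about what parameters such as `min_count` and `min_hospital` could mean, not chlib's implementation; `summarize_visits` and the tuple layout are hypothetical.

```python
from collections import Counter

def summarize_visits(visits, min_count=20, min_hospital=5):
    """Count visits per discharge status, suppressing small cells.

    `visits` is an iterable of (hospital_id, status) tuples (a hypothetical
    layout). Returns None when the whole bag is too small to report.
    """
    visits = list(visits)
    hospitals = set(h for h, _ in visits)
    if len(visits) < min_count or len(hospitals) < min_hospital:
        return None  # too few visits or hospitals to report anything
    counts = Counter(status for _, status in visits)
    # Replace any cell below min_count with None instead of the exact count.
    return {s: (n if n >= min_count else None) for s, n in counts.items()}

# Hypothetical data: 30 'home' visits across 5 hospitals, 3 'died' visits.
sample = [(h, 'home') for h in range(5) for _ in range(6)] + [(0, 'died')] * 3
result = summarize_visits(sample)
```

In this example the 'home' count is reported and the 'died' count is suppressed. A production system would also need to prevent a suppressed cell from being recovered by subtracting the reported cells from a published total.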

Aggregate statistics on set of visits

# Aggregate statistics for all inpatient visits in Texas
# dataset where patient underwent Therapeutic plasmapheresis
code = 'P9971'
TX ='../config.json','TX')
aggregate = chlib.entity.aggregate.Aggregate()
policy = chlib.entity.aggregate.Policy(min_count=20)
aggregate.init_compute('Test key',"Test dataset",policy)
for _,p in TX.iter_patients_by_code(code):
    for v in p.visits:
visualizer_url = aggregate.visualize(host='')
print visualizer_url # you can also copy and paste this URL into a browser

Visualizing aggregate statistics on set of visits

Aggregate statistics on set of patients

# Aggregate statistics for all patients in the HCUP Nationwide
# Readmissions Database who had complications of a transplanted kidney
from chlib.entity.aggregate import PatientAggregate,Policy
HCUPNRD ='config.json','HCUPNRD')
pagg = PatientAggregate()
for _,p in HCUPNRD.iter_patients_by_code('D99681'):
url = pagg.visualize(host="",port=8000,prefix='local/')
print url

Visualizing aggregate statistics on set of patients

Contact & Issues

To minimize the chance of visit- or patient-level information leaking via error messages or tracebacks, we have not enabled issues on the GitHub repo. If you find a bug, make sure that your bug report or question does not contain any visit- or patient-level information. To file bugs or comments, or if you plan on citing the Computational Healthcare library, please contact Akshay Bhat at the email address below.

© 2017 Akshay U Bhat, Peter M. Fleischut & Ramin Zabih, Cornell University.
All rights reserved. At this time, we are pursuing the patent process to protect this software.