
An architecture for research computing in health to support clinical and translational investigators with electronic patient data.

Thomas R Campion^1,2,3,4, Evan T Sholle^1,3,4, Jyotishman Pathak^1,4, Stephen B Johnson^5, John P Leonard^6, Curtis L Cole^1,3,4,6.

Abstract

OBJECTIVE: Obtaining electronic patient data, especially from electronic health record (EHR) systems, for clinical and translational research is difficult. Multiple research informatics systems exist but navigating the numerous applications can be challenging for scientists. This article describes Architecture for Research Computing in Health (ARCH), our institution's approach for matching investigators with tools and services for obtaining electronic patient data.
MATERIALS AND METHODS: Supporting the spectrum of studies from populations to individuals, ARCH delivers a breadth of scientific functions (including but not limited to cohort discovery, electronic data capture, and multi-institutional data sharing) that manifest in specific systems, such as i2b2, REDCap, and PCORnet. Through a consultative process, ARCH staff align investigators with tools with respect to study design, data sources, and cost. Although most ARCH services are available free of charge, advanced engagements require fee for service.
RESULTS: Since 2016 at Weill Cornell Medicine, ARCH has supported over 1200 unique investigators through more than 4177 consultations. Notably, ARCH infrastructure enabled critical coronavirus disease 2019 response activities for research and patient care.
DISCUSSION: ARCH has provided a technical, regulatory, financial, and educational framework to support the biomedical research enterprise with electronic patient data. Collaboration among informaticians, biostatisticians, and clinicians has been critical to rapid generation and analysis of EHR data.
CONCLUSION: A suite of tools and services, ARCH helps match investigators with informatics systems to reduce time to science. ARCH has facilitated research at Weill Cornell Medicine and may provide a model for informatics and research leaders to support scientists elsewhere.

Keywords: CTSA; EHR; data collection; data warehouse; secondary use

Year: 2022 PMID: 34850911 PMCID: PMC8690260 DOI: 10.1093/jamia/ocab266

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

INTRODUCTION

Obtaining electronic patient data, especially from electronic health record (EHR) systems, for clinical and translational research is difficult. Challenges include repurposing transactional (eg, care, billing) data for analytical purposes, finding and using the right electronic tools, understanding strengths and limitations of underlying data, obtaining regulatory approval, and maintaining compliance. Together, these factors constitute a complex socio-technical problem, and optimal approaches are unknown. At Weill Cornell Medicine (WCM), the Research Informatics division of the Information Technologies & Services Department has operational responsibility for supporting the research enterprise with electronic patient data and tests hypotheses about how to best deliver service. Specifically, Research Informatics helps investigators obtain EHR data, collect novel measures, and integrate data from multiple sources. Through our experience supporting scientific workflows (eg, cohort discovery) with specific informatics tools (eg, i2b2), we have observed that science occurs not within informatics systems but rather in statistical software packages (eg, SAS, Stata, R). With the goal of delivering to investigators data sets that are immediately amenable to statistical analysis, Research Informatics has established Architecture for Research Computing in Health (ARCH), a suite of tools and services for obtaining electronic patient data. Navigating the numerous informatics software systems commonly available in academic medical centers (i2b2, REDCap, EHR reporting, PCORnet, and OpenSpecimen, among others) can be challenging for investigators, and ARCH staff align scientists with the right tools with respect to study design, source systems, and cost so that researchers can accelerate data collection and reduce time to science.
Although scholars have criticized academic medical centers as “all breakthrough and no follow-through” for failing to change patient care based on clinical and translational research findings, the EHR provides a platform for investigators to translate novel models from the laboratory into clinical care through interventions, such as alerts and order sets, and subsequently collect data from the EHR to measure effects. Biomedical informatics is a critical component of this virtuous data-driven feedback loop known as the learning health system, and our institution has successfully deployed ARCH in support. To the best of our knowledge, the literature does not describe a comprehensive suite of tools and services to support investigators with electronic patient data. In this article, we describe ARCH to inform efforts at other institutions.

MATERIALS AND METHODS

Setting

WCM is a multispecialty group practice based on the Upper East Side of Manhattan in New York City. Consisting of more than 1600 physicians across 50 practice locations throughout the metropolitan area, WCM sees 3 million annual patient visits. WCM physicians are faculty members of Weill Medical College of Cornell University and hold admitting privileges to NewYork-Presbyterian (NYP), a long-standing clinical affiliate. A quaternary care institution, NYP has multiple hospital campuses where WCM attending physicians admit patients and educate medical trainees. Across outpatient, inpatient, and emergency settings, WCM and NYP personnel document care using the Epic EHR system. In addition to WCM, doctors from Columbia University Vagelos College of Physicians and Surgeons have admitting privileges to other NYP facilities and share the same Epic EHR system across all of WCM and NYP. Of note, prior to 2020, NYP used the Allscripts Sunrise Clinical Manager EHR system in inpatient and emergency settings with multiple interfaces exchanging data with Epic, which WCM physicians have used in outpatient areas since 2000. Through the Tripartite Request Assessment Committee comprised of WCM, NYP, and Columbia representatives, investigators can obtain data from the shared Epic enterprise EHR, the legacy Allscripts system, and other clinical and billing systems across the 3 institutions. In addition to patient care, WCM serves education and research missions. Multiple WCM core facilities, institutes, and support units enable all phases of biomedical research. A National Institutes of Health (NIH)-funded Clinical and Translational Science Award (CTSA) hub, the WCM Clinical and Translational Science Center (CTSC) provides biomedical research and education infrastructure, including support for biostatistics and informatics among other activities. Additionally, the Joint Clinical Trials Office (JCTO) of WCM and NYP supports research conducted between the 2 partner institutions. 
Financial support for Research Informatics includes subsidy from the CTSC and JCTO along with grants (eg, PCORnet, All of Us Research Program). Except where noted, Research Informatics services are made available free of charge to WCM investigators. As detailed in Figure 1, the WCM Information Technologies and Services Department (ITS) provides foundational IT services (eg, infrastructure, project management) in support of the college’s tripartite mission along with 3 specialized divisions that provide services spanning the spectrum of research activities from conduct to administration. Notably, Scientific Computing provides high-performance computing for “omics” analyses and other “big data” challenges typically pursued by basic scientists and translational researchers, and Research Administrative Computing supports compliance and planning activities such as grants and contracts, Institutional Review Board, and clinical trials enrollment and compliance. Research Informatics brings together data and processes supported by these divisions as well as the patient care enterprise to enable the conduct of clinical and translational research.

Figure 1.

Weill Cornell Medicine Information Technologies and Services support for research computing.

Undergirding Research Informatics efforts to support investigators is an enterprise data warehouse for research called Secondary Use of Patients’ Electronic Records (SUPER). SUPER automates the acquisition and refresh of data from EHR systems maintained by clinical information technology groups including but not limited to Epic used across WCM, NYP, and Columbia; Allscripts previously used for NYP inpatient and emergency care overseen by WCM physicians; Athenahealth previously used at regional affiliate NYP/Queens; Standard Molecular genomic information system used for clinical genomic testing; and multiple specialty- and ancillary-focused systems for clinical and research purposes, including REDCap. After aggregating data from disparate sources, SUPER transforms data to multiple target data models, including common data models (CDMs) and custom research data marts, and executes a series of quality assurance scripts, including both locally developed testing queries and standardized data quality assessment tools. Prior to data transformation at the level of the EHR system, a customized terminology management interface ensures that incoming data are mapped to reference terminology. Along with unstructured data such as physician notes, structured data available in SUPER include but are not limited to diagnoses (ICD-9/10), procedures (CPT), laboratory results (LOINC), medications (RxNorm), and tumor registry codes (ICD-O-3) plus allergies, demographics, encounters, free-text notes, family history, social history, vital signs, and other domains. SUPER contains data for over 3 million patients who received care from WCM providers. Research Informatics staff consists of data engineers and business analysts. Data engineers create and maintain ETL pipelines, write SQL code for custom EHR data extraction, and develop custom applications to support the research enterprise.
Business analysts engage investigators to understand scientific objectives, collect requirements, match scientists with appropriate tools, ensure regulatory compliance, and document policies and procedures. Additionally, Research Informatics has service agreements with other ITS divisions—including but not limited to server infrastructure, information security, and project management—to obtain expertise and support from specialized personnel. All WCM ITS staff, including the Research Informatics team, routinely use ServiceNow (Santa Clara, California), an information technology service management (ITSM) platform widely adopted within the field, to track customer engagement, provide service and support, and automate common IT workflows. Requesters seeking to use any tools or services from Research Informatics first begin by submitting a request in ServiceNow, allowing staff to document regulatory approval verification for specific research data requests but also gauge overall patterns in the utilization of services provided. Specifically, researchers submit what in the parlance of ITSM is termed a “request,” an instance of a form describing Institutional Review Board (IRB) protocol number, data of interest, sponsor, and other details. ARCH team members then review the request in ServiceNow and use existing system features, such as the option to leave “work notes,” to document the lifecycle of the request from intake to approval to execution.
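The request lifecycle described above (intake, then approval, then execution, with work notes documenting each step) can be sketched in a few lines. This is a minimal illustration only: the class, field, and stage names are hypothetical, and the sketch does not use or represent the ServiceNow API or the institution's actual form.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Lifecycle stages named in the text; the enum itself is hypothetical."""
    INTAKE = "intake"
    APPROVAL = "approval"
    EXECUTION = "execution"


# Allowed forward transitions in this illustrative lifecycle.
_NEXT = {Stage.INTAKE: Stage.APPROVAL, Stage.APPROVAL: Stage.EXECUTION}


@dataclass
class DataRequest:
    """Hypothetical stand-in for a research data request form."""
    irb_protocol: str
    data_of_interest: str
    sponsor: str
    stage: Stage = Stage.INTAKE
    work_notes: list = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Record a work note for the current stage, then move to the next one."""
        if self.stage not in _NEXT:
            raise ValueError("request is already in its final stage")
        self.work_notes.append((self.stage.value, note))
        self.stage = _NEXT[self.stage]


# Example with made-up values.
req = DataRequest("IRB-0000", "diagnoses and laboratory results", "internal")
req.advance("IRB approval verified")
req.advance("Analyst assigned")
print(req.stage.value, len(req.work_notes))  # execution 2
```

Keeping the notes as (stage, text) pairs means the same records that document regulatory verification for one request can later be aggregated to gauge overall utilization.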

System description

As illustrated in Figure 2, ARCH supports the spectrum of scientific activities from populations to individuals by enabling scientific workflows that manifest in specific systems. Drawing from the ARCH suite of tools and services, Research Informatics analysts work with investigators to understand how to support scientific projects with informatics tools with respect to study design, source systems, and cost.

Figure 2.

The Architecture for Research Computing in Health suite of tools and services supports multiple scientific workflows.

EHR reporting enables researchers to request customized, detailed reports of EHR data from outpatient, inpatient, and emergency settings through an iterative process with a database analyst. Data are available from Epic and the legacy Allscripts EHR system as well as other applications. Multiple clinical IT units from WCM and NYP provide EHR reporting services. To facilitate patient cohort discovery preparatory to research, i2b2 provides investigators with a self-service tool to query EHR data for patients seen by WCM physicians. After determining a cohort of interest using i2b2 deidentified data, investigators with IRB approval can request identified medical record numbers. Notably, ARCH has demonstrated that investigators tend to use basic (eg, ICD-10 codes) rather than complex queries (eg, genomics), which suggests informatics teams may wish to focus on delivering basic rather than complex features in i2b2. To support big data analytics, the Observational Health Data Sciences and Informatics consortium’s Observational Medical Outcomes Partnership (OMOP) CDM enables access to almost all data from WCM and NYP EHR systems mapped to reference terminologies, such as ICD, CPT, LOINC, and RxNorm. OMOP enables data scientists to investigate local research questions and scale to multi-center studies. Additionally, OMOP provides standardized representations of patient data rather than proprietary vendor-defined representations. ARCH also enables natural language processing (NLP) using the UIMA-based Leo framework created by the Salt Lake City Veterans Administration as well as various Python packages. In addition to supporting local studies, ARCH contributes EHR data to multi-institutional data sharing initiatives, including the NCATS Accrual to Clinical Trials (ACT) and National COVID Cohort Collaborative (N3C).
Building on success with i2b2, ACT supports investigator-initiated clinical trials by helping scientists obtain patient counts preparatory to research from more than 45 CTSA hubs. To further pandemic response efforts, N3C aggregates EHR data to form a centralized national database in support of observational studies with extensive privacy and security controls, and ARCH contributes data on behalf of WCM. As the lead site of the INSIGHT Clinical Research Network, WCM aggregates EHR data for more than 8 million patients from all New York City academic medical centers, all of which are CTSA hubs, and enables participation in PCORnet, a network-of-networks for studies using EHR and other data sources. Together with Columbia University Irving Medical Center and Harlem Hospital Center, ARCH enables Weill Cornell participation in the NIH All of Us Research Program through novel informatics support for study coordinators that has also supported the PCORI-funded ADAPTABLE study. To support sponsor-initiated clinical trials, TriNetX enables biopharmaceutical sponsors to obtain deidentified counts of patients from CTSC EHR data and propose clinical trial opportunities. Along with supporting research involving big data from the EHR, ARCH supports creation of small data sets using electronic data capture systems, especially REDCap. Building on the success of REDCap, ARCH has adopted the commercial REDCap Cloud to support studies requiring FDA oversight under 21 CFR Part 11. Additionally, to integrate clinical and research workflows, ARCH implemented SUPER REDCap, a generalizable middleware for connecting REDCap with an institution’s enterprise data warehouse using REDCap’s dynamic data pull feature. By prepopulating case report forms with data from the EHR, SUPER REDCap reduces data entry and saves time for research coordinators. 
ARCH also helped WCM become one of the first institutions globally to adopt SUPER REDCap on Fast Healthcare Interoperability Resources (FHIR), which makes REDCap accessible within the Epic EHR system. To support specific information needs of different disease areas, ARCH provides custom research data repositories (RDRs). Containing identified data only for patients of interest to an investigator group, each RDR has 3 user interfaces to support scientific workflows: i2b2 for cohort discovery, SUPER REDCap for data collection, and Microsoft SQL Server Management Studio for data querying and analysis. RDRs contain rows-and-columns-level data sets customized to the needs of investigators and seek to support multiple studies. In contrast to the bulk of ARCH services that are available free of charge to investigators, RDRs require a $50 000 startup fee and $7500 annual fee. Although the charges do not fully recover costs, the fees ensure investigators “have skin in the game” and commit to partnering with Research Informatics for developing data marts. To support electronic consent (eConsent) for research studies, ARCH successfully launched REDCap-based eConsent in multiple clinics. Additionally, for eConsent for studies requiring 21 CFR Part 11 compliance, ARCH has piloted DocuSign. More recently, ARCH has implemented a “consent to be contacted for research” within the Epic MyChart patient portal that allows patients to opt in to being contacted by researchers other than their treating physicians about studies for which they may be eligible. To date, more than 100 000 patients have opted in since May 2019. Additionally, ARCH launched biobank informatics at WCM with implementation of OpenSpecimen, which CTSA hubs and other academic medical centers use broadly. OpenSpecimen is integrated with the Epic EHR system and local data warehouse.
ARCH also receives data from the Standard Molecular genomic information system, which contains variants of known and unknown significance identified as part of NYP/WCM clinical genomics testing, and makes data available through i2b2 and other tools. In addition to supporting the acquisition of data, ARCH enables secure analysis via the Data Core. Consisting of a remote Windows desktop environment with productivity software (eg, Microsoft Office, Stata, R) and access restricted to specific study personnel, the Data Core allows investigators to analyze sensitive data (such as data from EHR systems, insurance payers, and other institutions) in accordance with IRB protocols, data use agreements, and other contracts. Notably, during the coronavirus disease 2019 (COVID-19) pandemic stay-at-home orders, the Data Core enabled secure remote access to sensitive WCM COVID patient data for investigators at home without WCM-managed workstations. Governance of ARCH consists of multiple mechanisms, including a steering committee composed of senior WCM research and IT leaders who provide scientific and project prioritization guidance. On behalf of the WCM Privacy Office and WCM Institutional Review Board, ARCH serves as the honest broker of patient identity for research for the institution, with a particular focus on de-identification according to the HIPAA Safe Harbor method. For governance of clinical data for research originating from the EHR system shared across WCM, NYP, and Columbia, a data sharing agreement executed by the 3 institutions created the Alignment Committee on Oversight of Requests for Data (ACORD), which sets policies that the Tripartite Request Assessment Committee (TRAC) implements as processes for investigators to obtain data. ARCH functions as an agent of TRAC and ACORD for fulfilling data requests per institutional policy.
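To make the honest broker role more concrete, the following sketch applies a few of the HIPAA Safe Harbor rules to a single record: dropping direct identifiers, reducing dates to the year, aggregating ages of 90 or older, and truncating ZIP codes to 3 digits. It is a partial illustration, not ARCH's implementation and not a compliance tool; Safe Harbor covers 18 identifier categories, and the field names, the `deidentify` function, and the `low_population_zip3` parameter are all assumptions made for the example.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real data mart would differ.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone", "email", "ssn", "street_address"}


def deidentify(record, low_population_zip3=frozenset()):
    """Return a copy of `record` with a subset of Safe Harbor rules applied."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Ages of 90 or older are collapsed into one category, and the birth date
    # is dropped because even its year would reveal such an age.
    age = out.get("age")
    if isinstance(age, int) and age >= 90:
        out["age"] = "90+"
        out.pop("birth_date", None)

    # Remaining dates keep only the year.
    for key, value in list(out.items()):
        if isinstance(value, date):
            out[key] = value.year

    # ZIP codes keep the first 3 digits, or "000" for sparsely populated areas
    # (the caller supplies that set; it is not bundled here).
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in low_population_zip3 else zip3
    return out


# Example with made-up values.
example = {
    "name": "Jane Doe", "mrn": "12345", "age": 93,
    "birth_date": date(1928, 5, 1), "admit_date": date(2021, 3, 14),
    "zip": "10065", "diagnosis": "I50.9",
}
print(deidentify(example))
```

Free-text notes are deliberately out of scope here, since identifiers embedded in narrative text cannot be removed by field-level rules like these.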

Evaluation

To assess and evaluate overall utilization of the ARCH suite of tools and services, we extracted data from ServiceNow and other institutional sources as necessary. First, we determined the yearly volume of total investigator consults and the total number of investigators supported through the ARCH suite of tools and services, identifying a consult as a single point of engagement (eg, an incident or request in ServiceNow) and utilizing built-in ServiceNow dashboard and reporting features to tabulate data. Then we evaluated the volume of support provided with respect to users, projects, and other associated metrics.
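The tabulation described above amounts to counting consults per year and distinct investigators per year and overall. The evaluation itself relied on built-in ServiceNow dashboard and reporting features; the sketch below only shows the equivalent arithmetic on a hypothetical export in which each row is one consult, and the row layout, identifiers, and `summarize` function are assumptions.

```python
from collections import defaultdict

# Hypothetical export: one (year, investigator_id) row per consult.
consults = [
    (2016, "inv-a"), (2016, "inv-a"), (2016, "inv-b"),
    (2017, "inv-a"), (2017, "inv-c"),
]


def summarize(rows):
    """Return per-year (consults, unique investigators) plus overall totals."""
    consults_by_year = defaultdict(int)
    investigators_by_year = defaultdict(set)
    for year, investigator in rows:
        consults_by_year[year] += 1
        investigators_by_year[year].add(investigator)
    yearly = {
        year: (consults_by_year[year], len(investigators_by_year[year]))
        for year in sorted(consults_by_year)
    }
    # An investigator seen in several years is counted once overall.
    total_unique = len({investigator for _, investigator in rows})
    return yearly, len(rows), total_unique


yearly, total_consults, total_unique = summarize(consults)
print(yearly)                        # {2016: (3, 2), 2017: (2, 2)}
print(total_consults, total_unique)  # 5 3
```

Note that the per-year investigator counts do not sum to the overall unique count, because an investigator may receive support in more than one year.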

RESULTS

Since 2016, ARCH has supported 1294 unique investigators through 4177 consults. Year-to-year support of investigators has generally increased with major growth in custom RDRs occurring in 2019. A partial list of publications enabled by ARCH is available at https://its.weill.cornell.edu/guides/publications-using-arch-data. As described in Table 1, investigators have used scientific functions enabled by ARCH tools to support numerous measures of research activity. Driven by clinical use cases, ARCH NLP efforts have supported acquisition of left ventricular ejection fraction, depression severity, suicidal ideation, and race and ethnicity among other elements from progress notes and pathology reports. ARCH infrastructure has also grown support of multi-institutional data sharing initiatives over time to deliver regular data set updates (eg, quarterly, monthly, weekly) to PCORnet, ACT, N3C, All of Us Research Program, and TriNetX.

Table 1.

Volume of support provided through the Architecture for Research Computing in Health (ARCH) program

ARCH function	Measure of support
EHR reporting	329 studies
Cohort discovery: i2b2	688 users; 11 624 queries; 207 IRB-approved patient reidentification requests
Big data analytics: natural language processing	9 information extraction pipelines
Multi-institutional data sharing: TriNetX	275 clinical trial opportunities
Custom research data repositories	17 investigator groups; 233 data extractions
Electronic data capture: REDCap	5518 projects
Biobank and ancillary omics: OpenSpecimen	16 studies
Biobank and ancillary omics: Standard Molecular	3005 next-generation sequencing assays with structured results

Of the 17 custom RDRs live as of July 2021, academic output includes but is not limited to that from Cardiac Imaging, Digestive Care, Mental Health, Myeloproliferative Neoplasms, Pulmonary and Critical Care, and Stroke. Largely driven by investigators with grant funding, RDR projects have generated data marts to address specific clinical research questions (eg, predictors of outcomes in hospitalized cirrhotic patients) while also yielding generalizable resources for the institution, such as an i2b2 eye exam ontology from Ophthalmology and surgical pathology report NLP from Urology. Notably, to support COVID-19 response efforts, ARCH provisioned the COVID Institutional Data Repository (IDR) using the RDR model to enable data-driven decision-making for not only research but also clinical care. To date, the COVID IDR has supported more than 13 publications. A data mart created as part of the Pulmonary and Critical Care RDR for sepsis research supported WCM action early in the COVID-19 pandemic.

DISCUSSION

As sources of biomedical big data have proliferated, so too have informatics systems that support the spectrum of studies from populations to individuals, which we collectively refer to as ARCH. At our institution, the ARCH suite of tools and services has enabled investigators to navigate systems to obtain electronic patient data for research. By combining technical, regulatory, financial, and engagement activities, ARCH provides a framework that may inform efforts at other institutions to support scientists with electronic patient data. The ARCH program initially took shape with a limited scope. Seeking to prioritize immediate investigator needs, we provisioned i2b2 to support cohort discovery, REDCap to support collection of research data, and EHR reporting, alongside custom RDRs, to support the analysis of rows-and-columns data sets. As the program evolved, we have expanded its offerings to include additional services, such as biospecimen informatics, big data analytics, and multi-institutional data sharing. Conceptualizing the structure of this portfolio as we have organized it here, in terms of specific tools designed to support specific scientific workflows, as well as the underlying infrastructure, has been helpful in framing ARCH’s role both with local investigators and with administrators seeking to allocate funding to support custom efforts. Other institutions may find the ARCH framework (Figure 3) useful for demonstrating to investigators the “alphabet soup” of tools and services available—and the benefit of consulting informatics staff for guidance—as well as site-specific substitutions of tools to support scientific workflows, such as Leaf instead of i2b2 for cohort discovery and OpenClinica instead of REDCap for data capture. 
Additionally, the modular ARCH framework can help institutions inform investigator communities of new product offerings, such as a novel multi-institutional data sharing consortium (eg, NIH postacute sequelae of COVID) and radiology or pathology image-specific services.

Figure 3.

Annual support of scientific activity through Architecture for Research Computing in Health. Consults are specific to each year, and a unique investigator may receive support in 1 or more years.

In expanding the ARCH program since its inception, we have learned multiple lessons both from internal operational analyses and from formal, structured evaluations of the use of ARCH tools and services. Some of these include the following: Support for basic research workflows, such as cohort discovery and data collection, can often support the majority of investigator use cases. Tailoring efforts toward complex and theoretical use cases risks overprioritizing hypothetical and glamorous projects at the expense of the day-to-day work that constitutes the backbone of IT support for the research enterprise (eg, the provision of electronic case report forms, cohort discovery to facilitate manual chart review, and participation in multi-institutional consortia). Custom-tailored data extraction trades specificity for scalability. Through developing customized RDRs that extract EHR data in an ad hoc fashion to support specific scientific use cases rather than a one-size-fits-all data warehouse, we have been able to address particular use cases and support studies that might not have otherwise been feasible. However, this approach requires individual engagement with stakeholders, and thus a linear scaling of staff is necessary to support an expanding portfolio of custom extraction efforts. Standardized data models can support some but not all use cases. Reliance on tools such as the OMOP CDM affords flexibility and saves time: if an investigator seeks to extract a table with a row for each diagnosis a patient has been assigned, it is easier to pull this from an instance of OMOP’s CONDITION_OCCURRENCE table than from an EHR’s proprietary source data model, where diagnosis data may be stored in as many as 6 distinct tables.
However, in many cases, specific studies require the extraction and analysis of data points that are not necessarily mappable to a standard data model, such as “I&O” flowsheets which document at the shift level patient fluid intake and excretion in intensive care units and cannot be easily modeled without exhaustive effort and a series of arbitrary data modeling decisions. It takes a village of multiple specialists to quickly, accurately, and effectively extract and transform patient data for statistical analysis. Clinicians and trained informatics staff working together can easily generate large data sets, but early and frequent engagement with trained biostatisticians is also required to make sure that the data are appropriately structured and transformed to suit the analyses at hand. Gaps in knowledge exist on both sides when clinicians and informaticians come together to extract patient data for research and must be accounted for. Informaticians may be ignorant of basic elements of clinical workflows, such as the fact that some departments may not order procedures in the EHR, but instead document them solely in free-text progress notes, leaving billers to review encounters and file charges after the fact. Conversely, clinicians may be unaware that some data elements that exist in the EHR are not structured and cannot be easily extracted or modeled, such as response/relapse/remission in cancer. Generalized data quality assessment platforms cannot always accurately assess the fitness-for-purpose of an individual data set for an individual use case. Some data sets that pass a series of automated checks may be missing a critical element for a particular project. Conversely, other data sets that may trigger alerts from automated tools may be sufficient for some analytic use cases. Investigator engagement and request triaging are critical elements of providing informatics support for the research enterprise. 
Especially at academic medical centers, a broad array of investigators with varying degrees of expertise and widely disparate areas of interest are constantly seeking to explore an ever-evolving array of hypotheses. Many of these investigators may reach out with a specific tool in mind, only to reveal upon examination that their use case necessitates a completely different approach (eg, REDCap instead of i2b2). Regardless of the outcome of an individual consult with a particular investigator, there is value in having a designated and centrally coordinated team responding to inquiries about the use of electronic patient data. Grant funding for informatics infrastructure is useful but does not typically cover full costs. Although extramural awards provide a bolus of funds to start projects, support tapers over time, and institutional subsidy is critical for both launch and maintenance of operations. Agencies have an opportunity to better fund research informatics infrastructure at academic medical centers. As the ARCH program has evolved, it has also encountered growing pains. In demonstrating the ability to deliver data that are of value to investigators, we have stimulated interest to the point that investigators now seek to obtain data on such a scale and with such frequency as to necessitate restructuring our underlying infrastructure, especially given existing funded commitments to regularly supply data to multi-institutional research networks, such as PCORnet and N3C. Future directions for expansion of the ARCH platform include migration to a cloud-based infrastructure, which will not obviate but may alleviate some of these issues. Additionally, providing support for direct EHR interventions through the FHIR framework may potentially allow ARCH to fully enable the virtuous cycle of the learning healthcare system. The analysis presented in this article has limitations. Tracking publications ensuing from the use of ARCH tools and services remains a challenge. 
Although boilerplate text acknowledging federal support through the CTSA funding mechanism helps with prospective identification of new studies or papers using data gathered through ARCH, there is no guarantee that investigators will include this copy or that journals will have a place for it, rendering it difficult to accurately assess the full scope of work supported through this program. Additionally, some of the metrics we have chosen to represent utility and uptake of informatics tools at our institutions are imperfect at best. Query volume, in a tool like i2b2, may be less related to investigator interest in and engagement with the tool and more related to mechanistic difficulties in constructing a query that identifies the patient population of interest. We recognize that the approach outlined here may not be applicable to every institution, and that in some cases, exigencies of funding or organizational structure may necessitate the adoption of a different approach. Regardless, it is our hope that the lessons we have learned in developing and implementing this program may be of use to other institutions seeking to support the research enterprise with electronic patient data.
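The lesson above about standardized data models can be illustrated with the diagnosis example: in OMOP, one row per diagnosis comes from a single table. The sketch below builds a tiny in-memory SQLite table using a subset of the OMOP CDM CONDITION_OCCURRENCE columns and queries it. The rows, the concept identifiers (placeholders, not real OMOP concept IDs), and the `diagnoses_for` helper are invented for illustration and do not reflect the institution's database or tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Subset of OMOP CDM columns; a full table has more.
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id               INTEGER NOT NULL,
        condition_concept_id    INTEGER NOT NULL,  -- placeholder values below
        condition_start_date    TEXT    NOT NULL,
        condition_source_value  TEXT
    );
    INSERT INTO condition_occurrence VALUES
        (1, 101, 900001, '2020-03-01', 'K74.60'),
        (2, 101, 900002, '2020-04-15', 'I50.9'),
        (3, 102, 900003, '2021-01-10', 'E11.9'),
        (4, 103, 900002, '2021-02-02', 'I50.9');
    """
)


def diagnoses_for(connection, person_ids):
    """Return one (person, date, source code) row per recorded diagnosis."""
    if not person_ids:
        return []
    placeholders = ", ".join("?" for _ in person_ids)
    sql = (
        "SELECT person_id, condition_start_date, condition_source_value "
        "FROM condition_occurrence "
        f"WHERE person_id IN ({placeholders}) "
        "ORDER BY person_id, condition_start_date"
    )
    return connection.execute(sql, list(person_ids)).fetchall()


for row in diagnoses_for(conn, [101, 102]):
    print(row)
```

The same extraction against a proprietary source model would typically need joins or unions across several tables, which is the time saving the lesson refers to.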

CONCLUSION

Supporting clinical and translational scientists with electronic patient data is challenging. Although multiple systems exist to enable data collection and analysis, navigating options can be difficult for faculty, staff, and students. A suite of tools and services, ARCH helps match investigators with informatics approaches with respect to study design, data sources, and cost. ARCH has successfully enabled research at Weill Cornell Medicine and may help informatics and research administrators support scientists elsewhere.

FUNDING

This study received support from the National Institutes of Health National Center for Advancing Translational Sciences through grant number UL1TR002384 (Weill Cornell) as well as support from the Joint Clinical Trials Office of Weill Cornell Medicine and NewYork-Presbyterian.

AUTHOR CONTRIBUTIONS

TRC conceptualized ARCH and drafted the initial article. ETS contributed new content and major edits. SBJ, JP, JPL, and CLC participated in refining the ARCH concept and editing the article. CLC championed the ARCH effort.
