Okan Azmak, Hannah Bayer, Andrew Caplin, Miyoung Chun, Paul Glimcher, Steven Koonin, Aristides Patrinos.
Abstract
Until now, most large-scale studies of humans have either focused on very specific domains of inquiry or have relied on between-subjects approaches. While these previous studies have been invaluable for revealing important biological factors in cardiac health or social factors in retirement choices, no single repository contains anything like a complete record of the health, educational, genetic, environmental, and lifestyle profiles of a large group of individuals at the within-subject level. This seems critical today because emerging evidence about the dynamic interplay between biology, behavior, and the environment points to a pressing need for just the kind of large-scale, long-term synoptic dataset that does not yet exist at the within-subject level. At the same time that the need for such a dataset is becoming clear, there is also growing evidence that just such a synoptic dataset may now be obtainable, at least at moderate scale, using contemporary big data approaches. To this end, we introduce the Kavli HUMAN Project (KHP), an effort to aggregate data from 2,500 New York City households in all five boroughs (roughly 10,000 individuals) whose biology and behavior will be measured using an unprecedented array of modalities over 20 years. It will also richly measure the environmental conditions and events that KHP members experience, using a geographic information system database of unparalleled scale that is currently under construction in New York. In this manner, KHP will offer both synoptic and granular views of how human health and behavior coevolve over the life cycle and why they evolve differently for different people. In turn, we argue that this will allow for new discovery-based scientific approaches, rooted in big data analytics, to improving the health and quality of human life, particularly in urban contexts.
Keywords: big data analytics; semistructured data; unstructured data
Year: 2015 PMID: 26487987 PMCID: PMC4605457 DOI: 10.1089/big.2015.0012
Source DB: PubMed Journal: Big Data ISSN: 2167-6461 Impact factor: 2.128

The Biobehavioral Complex.
Domains of data collection from KHP study participants
| Domain | Example data source | Information collected |
| --- | --- | --- |
| Demographics | Participant questionnaire | Demographic information about participant household and individual members of the household, such as age, gender, and ethnicity |
| Home environment | Participant questionnaire | Information about housing space, presence of toxins, air quality, ambient noise level, and water and energy use |
| Neighborhood baseline | NYC public data sets on census, education, law enforcement, public service, and GIS | Information about the neighborhood in which the participant lives, such as demographic composition, median income, school ratings, emergency service requests, and crime statistics |
| Biomedical | Physical exam (weight, height, BMI, resting heart rate, blood pressure) | Information about each participant's medical and dental history, physiology, biochemistry, complete whole genome genetics, complete microbiomes, and complete pharmacological use profiles |
| Diet and health | Participant food diaries (for limited duration, repeated regularly) | Information about each participant's diet, use of alcohol, tobacco, and other substances |
| Psychological | Structured interviews of participants by trained professionals | Information about participants' mental health, personality attributes, levels of cognitive function, executive function and memory, and risk preferences |
| Educational | Participants' educational records and extracurricular activity records | Information about participants' formal and informal educational history (e.g., number of books in the home) and progress of current education |
| Occupational | Participants' curriculum vitae (oral or written) | Information about participants' occupational history and progress of their occupation/career during the study time frame |
| Activity | Smartphone app (for location, activity, and socializing data) | Information about the times and duration of different activities, such as sleep, commute/travel, work/school, exercise, entertainment, socializing, and screen time, as measured by wearable technologies, smartphone apps, and presence detection systems |
| Family interactions | Participant questionnaire | Information about the frequency and duration of interaction between parents and children in the home |
| Financial | Participant questionnaire | Information about participants' sources of income, major assets and liabilities, categories of expenses, savings, and retirement planning activities. Detailed purchase data to the level of all individual purchases, grocery purchases at the level of individual items, prescription drug co-pay data, alcohol purchases, tobacco purchases, etc. |
| Interactions with law enforcement | Participants' call history | Information about participants' interaction with law enforcement agencies as either victims or potential culprits |

Data sources to inputs for the KHP study.

Data ingestion for KHP study.