Laurence Loewe1,2, Katherine S Scheuer1, Seth A Keel1,2, Vaibhav Vyas1, Ben Liblit3, Bret Hanlon4, Michael C Ferris1,3, John Yin1,5, Inês Dutra6, Anthony Pietsch1, Christine G Javid1, Cecilia L Moog1, Jocelyn Meyer1, Jerdon Dresel1, Brian McLoone1, Sonya Loberger1, Arezoo Movaghar1, Morgaine Gilchrist-Scott1, Yazeed Sabri1, Dave Sescleifer1, Ivan Pereda-Zorrilla1, Andrew Zietlow1, Rodrigo Smith1, Samantha Pietenpol1, Jacob Goldfinger1, Sarah L Atzen1, Erika Freiberg1, Noah P Waters1, Claire Nusbaum1, Erik Nolan1, Alyssa Hotz1, Richard M Kliman7, Ayalew Mentewab8, Nathan Fregien9, Martha Loewe1.
Abstract
Names in programming are vital for understanding the meaning of code and big data. We define code2brain (C2B) interfaces as maps in compilers and brains between meaning and naming syntax, which help readers understand executable code. While working toward an Evolvix syntax for general-purpose programming that makes accurate modeling easy for biologists, we observed how names affect C2B quality. To protect learning and coding investments, C2B interfaces require long-term backward compatibility and semantic reproducibility (accurate reproduction of computational meaning from coder-brains to reader-brains by code alone). Semantic reproducibility is often assumed until confusing synonyms reduce modeling in biology to deciphering exercises. We highlight empirical naming priorities from diverse individuals and the roles of names in different modes of computing to show how naming easily becomes impossibly difficult. We present the Evolvix BEST (Brief, Explicit, Summarizing, Technical) Names concept for reducing naming priority conflicts, test it on a real challenge by naming subfolders for the Project Organization Stabilizing Tool system, and provide naming questionnaires designed to facilitate C2B debugging by improving names used as keywords in a stabilizing programming language. Our experiences inspired us to develop Evolvix using a flipped programming language design approach with some unexpected features and BEST Names at its core.
Keywords: debugging code2brain interfaces; evolutionary systems biology simulations; flipped programming language design; fundamental modes of computing; names of identifiers in code; ontology computing; programming language paradigms and naming
Year: 2016 PMID: 27918836 PMCID: PMC5299481 DOI: 10.1111/nyas.13192
Source DB: PubMed Journal: Ann N Y Acad Sci ISSN: 0077-8923 Impact factor: 5.691
Figure 1. (A) C2B interfaces for writers and readers of computer programs are critical for computational science and the correct use of big data. The power of computational modeling for understanding the natural world has long been known, and modeling is essential for analyzing big data. Still, many scientists have been slow to engage with computational models. We suggest that the C2B interfaces assumed by many programming languages may bear a significant part of the responsibility, as they can cause too much confusion for scientists without computational training. Good C2B interfaces are hard to design and nearly impossible to design in isolation because of the curse of knowledge, which is difficult to escape for designers of programming languages (who need advanced programming skills to implement any language). This curse makes people forget how difficult their first steps were; as a result, they struggle to simplify problems appropriately for beginners. The only way to avoid the resulting breakdown of communication is to debug the C2B interfaces of diverse potential user groups by comparing how a language designer expresses (syntax) an intended meaning (semantics) with the meaning that diverse readers infer from the code. This process is costly, as it involves much discussion of communication errors in hypothetical programming scenarios that are of no immediate relevance to those who do the most important work for improving clarity. The goal is to expose the blind spots in the designer's brain that tempt designers to accept prematurely a type system that poorly reflects the reality of those who might use the language. Poor C2B interfaces are caused by (brain) cache inconsistency, which makes naming difficult because every communicator stores local name definitions that easily become outdated. (B) Similarly, cache inconsistency can easily occur when collaborating in the Cloud. In one example, Alice and Bob collaborate in the Cloud, and file changes get lost because there is no shared naming convention (loosely based on an actual scenario we observed, to our surprise). (C) In another example, Alice and Bob exchange files efficiently without data loss after agreeing on a shared naming convention, illustrating how naming and cache invalidation are two sides of one coin. In biology, these problems involve longer timescales and more collaborators, as all naming starts at independent locations with new observations. Once enough observations accumulate, the resulting naming confusion forces a choice of costs: pay explicitly to standardize names (ensuring cache consistency) or pay implicitly by losing research results to semantic rot. Hence, experts at the NIH recommended the development of tools that support naming.[65]
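The lost-update scenario in panels B and C can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the actual setup Alice and Bob used: the Cloud is modeled as a naive last-write-wins dictionary from file names to contents, and the versioned naming convention (`model_v1_alice.txt`, etc.) is hypothetical.

```python
# Minimal sketch: a naive Cloud folder modeled as a dict {file name: content}.
# Uploading a file overwrites whatever is stored under the same name
# (last write wins). All file names and the convention are hypothetical.


def sync_without_convention():
    cloud = {"model.txt": "v1"}
    # Alice and Bob each download the file into a local cache.
    alice_cache = dict(cloud)
    bob_cache = dict(cloud)
    # Both edit their cached copy and keep the shared name.
    alice_cache["model.txt"] = "v1 + Alice's change"
    bob_cache["model.txt"] = "v1 + Bob's change"
    # Uploads overwrite by name, so Bob's later upload erases Alice's change.
    cloud.update(alice_cache)
    cloud.update(bob_cache)
    return cloud


def sync_with_convention():
    cloud = {"model_v1.txt": "v1"}
    # Hypothetical convention: never overwrite; save each edit under a new
    # name that records the version it started from and who edited it.
    base = cloud["model_v1.txt"]
    cloud["model_v1_alice.txt"] = base + " + Alice's change"
    cloud["model_v1_bob.txt"] = base + " + Bob's change"
    return cloud


if __name__ == "__main__":
    print(sync_without_convention())  # only Bob's change survives
    print(sync_with_convention())     # both changes survive, to be merged by hand
```

The convention does not merge the two edits; it only guarantees that no edit is silently lost, so that merging becomes an explicit, visible step.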
Figure 2. The challenge of combining independently developed models in systems biology, which sometimes use different nomenclatures for the same entity (e.g., starch and amylum) or the same name for rather different entities (amylase can be under either human or bacterial control, resulting in very different time courses, as bacteria grow much faster than human cells [23–29]). Here, we illustrate a hypothetical scenario in which a model of starch breakdown in human saliva (A) was developed independently of a model of amylum breakdown by bacteria (B). How starch is broken down during human digestion can be better understood if researchers integrate these models (C), either by building a “supermodel” that manually wires each relevant change from one to the other or by combining them into a common namespace and letting the compiler do the wiring by interpreting the semantics encoded in the names of the simulated parts.
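The second integration route in panel C, a common namespace in which names drive the wiring, can be sketched as follows. This is a minimal Python illustration, not Evolvix code: each model is assumed to be a list of `(substrate, product, enzyme)` reactions, the `SYNONYMS` table is hypothetical, and enzymes are qualified by the namespace of their model so that the two differently controlled amylases stay separate while starch and amylum merge into one simulated pool.

```python
# Hypothetical synonym table: each local name maps to one canonical name.
SYNONYMS = {"amylum": "starch"}


def canonical(name):
    """Return the canonical name for a shared chemical species."""
    return SYNONYMS.get(name, name)


def merge(models):
    """Combine models into one namespace.

    `models` maps a model namespace (e.g. "human") to its list of
    (substrate, product, enzyme) reactions. Shared chemicals are wired
    together via their canonical names; enzymes are qualified by namespace
    so that identically named but differently controlled enzymes stay apart.
    """
    merged = []
    for namespace, reactions in models.items():
        for substrate, product, enzyme in reactions:
            merged.append((canonical(substrate), canonical(product),
                           f"{namespace}.{enzyme}"))
    return merged


def species_pools(reactions):
    """Distinct chemical species that the merged simulation must track."""
    pools = set()
    for substrate, product, _ in reactions:
        pools.update((substrate, product))
    return pools


if __name__ == "__main__":
    models = {
        "human": [("starch", "maltose", "amylase")],     # model A: saliva
        "bacteria": [("amylum", "maltose", "amylase")],  # model B: bacteria
    }
    merged = merge(models)
    print(merged)
    print(species_pools(merged))  # one starch pool, not two
```

Which names count as synonyms and which as homonyms needing qualification is exactly the human judgment the caption describes; the code only applies that judgment once it is written down.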
Questions on aspects of naming complexity
| Aspect | Property of | | | Questions | Comments and examples |
|---|---|---|---|---|---|
| Explicitness | Part, whole | User relevance and ease; authority defines standards | Users | How many interesting details from lower levels of content are packed into the Name? | List some or all elements of the set that is being named |
| Clutter | Whole | As above | Users | How many uninteresting details from lower levels of content are included? | Listing the elements of subsets of subsets … makes for tedious reading, even though it increases precision (see “Form Filling” mode of computing) |
| Audience (intended) | External | Experience of the naming authority, feedback | Perceptive naming authorities | How will the names be used? As predicted by the authority? | Differ for different authorities |
| Audience (actual) | External | Quality and usefulness of names | Perceptive naming authorities | Which names are accessible for actual users? Which support exists (dictionaries, name trackers, immutability, etc.)? | Differ for different authorities and may also depend on the audience (e.g., if allowing for different BEST Names dialects) |
| Mode of computing (Table …) | Whole, external | Machine type | Name users, authority | Which modes of computing are used for naming? Is mixing modes allowed when naming, despite the potential confusion? | Name contains any address from a locally linear space, or all content, or content fragments, or any queries for element subset conditions, or a mix |
| Assigning | Whole, external | Item or pattern for which type is unknown | Authority, users | Is the Name assigned to an Item or Pattern that is observable in the real world? What is the most appropriate name for this content? | Searching for good names can take time. Resulting names can be temporary or local until they are centrally standardized (to avoid distributed naming) |
| Inventing | Whole, external | Invented type for which no real‐world equivalent is known | Authority, users | Which name best describes the properties of a Type or Pattern that is not observable (i.e., Fiction)? | Types exist without items. They are either made up or not yet identified. Their names reflect properties |
| Naming authority | External | Local, distributed, or global users | Naming process; users | Who names newly found content? Who can list all local names in a context, and how many at a time? Who names the authority and how is it found? | Authorities can be context defined (address)‐, self‐appointed‐, other‐appointed‐, self‐naming‐, other‐naming‐, local‐, distributed‐, global‐, absolute‐, etc. |
| Label purity | Whole | Naming authority | Stability | Pure names are pure labels without any other meaning. How independent is the name from anything that is being named? | Names can depend on storage location, content, context, type, or specifics of these. Pure names are free from any interpretation other than “label of content” (like a collision‐free hash key pointing to content) |
Note: Many aspects of naming contribute to its complexity: a multitude of competing perspectives and potential criteria, whose importance is often clear only in hindsight. To help navigate naming complexity and facilitate more conscious naming decisions, we present some questions of potential interest. An exhaustive list is beyond the scope of this study and would also have to investigate address‐names, relative names, naming‐ambiguity, brevity, name‐usage‐complexity, name‐search‐complexity, maturity, namespaces, absurdity, versioning, intuition, standardization, and many other aspects that can affect the Whole name, only a Part of the name, or something External to the character‐sequence of the name itself (as indicated in the column “Property of”). In this table, “users” refers to persons or programs that use a name given by a naming “authority.”
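The “Label purity” row compares pure names to a collision-free hash key pointing to content. A minimal Python sketch of such content-derived names follows, assuming SHA-256 from the standard `hashlib` module as the hash (SHA-256 is collision-resistant in practice rather than provably collision-free) and a hypothetical in-memory `store`.

```python
import hashlib

# Hypothetical in-memory content store: pure name -> content.
store = {}


def pure_name(content: bytes) -> str:
    """Derive a label from the content alone (not from location, type, or context)."""
    return hashlib.sha256(content).hexdigest()


def put(content: bytes) -> str:
    """Store content under its pure name and return that name."""
    name = pure_name(content)
    store[name] = content
    return name


if __name__ == "__main__":
    a = put(b"starch -> maltose")
    b = put(b"starch -> maltose")   # same content, wherever it came from
    c = put(b"amylum -> maltose")   # different content, different name
    print(a == b, a == c, len(store))  # prints: True False 2
```

The trade-off is that such a name stays stable when the content moves, but tells a human reader nothing about what the content means.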
Figure 3. BEST Names can simplify naming in complex biological models if all names and synonyms in a context are consistently mapped to exactly one StableMeaning (StM), implemented by pointing to one StableContent. The inset shows what goes wrong if two BEST Names trees that should be separate are not, due to a naming conflict: the synonym “B” then maps ambiguously to two StMs. In biology, there is no point in having computers resolve such a conflict automatically, because by the time a computer detects it, the computer has already done the part that is hardest for humans: finding the needle in the haystack where the miswiring occurred.
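The consistency condition in Figure 3 (every name or synonym maps to exactly one StM) can be checked mechanically. A minimal Python sketch follows, assuming each BEST Names tree is flattened to a hypothetical mapping from a StM identifier to the set of names pointing at it. In line with the caption, the function only reports ambiguous names for human review instead of resolving them.

```python
from collections import defaultdict


def find_ambiguous_names(trees):
    """Return {name: set of StM ids} for every name that points to more than one StM.

    `trees` maps a (hypothetical) StableMeaning id to the set of names and
    synonyms that should all resolve to it.
    """
    meanings_by_name = defaultdict(set)
    for stm_id, names in trees.items():
        for name in names:
            meanings_by_name[name].add(stm_id)
    # Keep only names whose mapping is not unique.
    return {name: stm_ids for name, stm_ids in meanings_by_name.items()
            if len(stm_ids) > 1}


if __name__ == "__main__":
    # Two trees that should have been kept separate both claim the synonym "B".
    trees = {"StM_1": {"A", "B"}, "StM_2": {"B", "C"}}
    print(find_ambiguous_names(trees))  # reports "B" with both StM ids
```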
Ontology computing and other modes of computing affect the role of names in programs and bugs
Note: Columns highlight parallels between the modes of computing given in rows, such as (imperative) Structured Commanding, Functional Form Filling, Ontological Dictionary Defining, Logical Solution Isolating, Concurrent Network Traffic, and the Physics World (matter–energy in space–time). This foundational view offers a new perspective on the role of names in each mode of computing and on the opportunities and dangers that names present for all programmers. Although proofs show that, given unlimited time and resources, each mode of computing can accomplish any task another mode can do, in practice each mode greatly simplifies some types of tasks but not others. The names of the modes were chosen to provide a fresh perspective on these tasks, some of which are particularly fundamental to computing (Structured Commanding, Form Filling, Solution Isolating, and Network Traffic). Most general‐purpose languages combine aspects of these fundamental modes of computing to facilitate the efficient implementation of diverse complex scenarios, but support for some aspects is often retrofitted or limited. A more streamlined and thorough integration of the fundamental modes of computing could simplify programming and the construction of many derived modes of computing, such as Ontology Computing (derivable from Form Filling) or simulating the Physics World (derivable from Network Traffic).
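To make the contrast between modes concrete, here is one small task, finding the integer square root of a perfect square, written in three of the fundamental modes. This is a loose Python illustration based on our own reading of the mode names, not code from the study; Network Traffic is omitted for brevity. Note which names each mode asks the programmer to invent: mutable state variables in Structured Commanding, parameters of a self-contained form in Form Filling, and a constraint in Solution Isolating.

```python
def isqrt_commanding(n):
    """Structured Commanding: a sequence of commands that update named state."""
    result = None
    x = 0
    while x * x <= n:
        if x * x == n:
            result = x
        x += 1
    return result


def isqrt_form_filling(n, x=0):
    """Form Filling: the answer is defined by filling in the same form
    recursively, without mutating any variable."""
    if x * x > n:
        return None
    if x * x == n:
        return x
    return isqrt_form_filling(n, x + 1)


def solve(domain, constraint):
    """A generic solver: isolate all values in `domain` that satisfy `constraint`."""
    return [x for x in domain if constraint(x)]


def isqrt_solution_isolating(n):
    """Solution Isolating: state what a solution must satisfy and let `solve` search."""
    solutions = solve(range(n + 1), lambda x: x * x == n)
    return solutions[0] if solutions else None


if __name__ == "__main__":
    for n in (0, 1, 144, 150):
        print(n, isqrt_commanding(n), isqrt_form_filling(n),
              isqrt_solution_isolating(n))
```

All three return the same answers, but they differ in practice: the recursive version exceeds Python's default recursion limit once n reaches roughly 10^6, and the solver version scans the whole domain, a small reminder that each mode simplifies some tasks and not others.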
Figure 4. The Flipped Programming Language Design approach. (A) Most programming languages are designed by advanced programmers aiming to solve important types of problems in a better way; implementation is usually urgent, and the designers' capabilities allow coding to start early. Few take the time to collect enough user and expert feedback during language design to break the curses of knowledge and ignorance. These curses make designers oblivious to idiosyncrasies and to important missing features that frustrate both beginners and experts using the language. Many changes can be added after implementation starts, but fundamental redesigns are often prohibitively costly (e.g., Fig. 1A, changing names of logic operators). (B) Flipped Programming Language Design turns the tables in important ways by putting language designers in the hot seat (red), tasked with minimizing previously unnoticed problems in language proposals as highlighted by users. Delayed implementation allows fundamental redesigns where needed. As an example from a different field, consider the 1940 Tacoma Narrows Bridge. Its design could have been changed before construction if more had been known about “exceptional winds.” After construction, nothing effective could be done before it collapsed (see film at https://archive.org/details/SF121). Judging from several dozen redesigns of Evolvix, it is often more difficult for a language designer to anticipate how a programming language for biology will be misread or fail than to solve such problems; thus, repeated rounds of rigorous review by users and experts are critical. Not all words and concepts in a language need maximal scrutiny, but basic concepts and operators certainly do.
BEST Names: Brief for … | Explicit for … | Summarizing for … | Technical for …