Annotation jamborees

Biomedical scientists are used to collectively annotating massive datasets. Consider the sequencing of large genomes: the data is there, but what does it mean?

Basic questions include: Where does each gene stop and start within the genome? What’s the functional significance of a given gene? How does a given gene’s protein product participate within a signaling network in a cell?

genome annotation: from sequence to biology
https://www.nature.com/articles/35080529
levels of genome annotation

To go beyond the genetic example above and welcome those from other disciplines: the overriding question (the data is there, but what does it mean?) surely confronts every kind of source material. Historians are keenly aware of this.

Torah to Talmud

(also from Lincoln Stein – Genome annotation: from sequence to biology)

Some documents, as in the above example of the Talmud, have accrued annotations that far exceed the original text in volume. Other documents are still awaiting annotation by current and future scholars. If each annotation could be pinned to a discrete date, longitudinal analysis would show how thought about a text evolved over time.
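As a rough illustration of what such a longitudinal analysis might look like, here is a minimal Python sketch that counts dated annotations per year. The records and the `count_by_year` helper are hypothetical, invented only for this example; real annotation data would come from whatever platform holds it.

```python
from collections import Counter
from datetime import date

# Hypothetical annotation records: (date written, annotation text).
annotations = [
    (date(2001, 2, 15), "Predicted gene boundary"),
    (date(2001, 6, 1), "Corrected exon start"),
    (date(2009, 3, 9), "Assigned putative function"),
    (date(2015, 11, 20), "Linked to signaling pathway"),
]

def count_by_year(records):
    """Count how many annotations were written in each calendar year."""
    return Counter(written.year for written, _ in records)

counts = count_by_year(annotations)
for year in sorted(counts):
    print(year, counts[year])
```

Counting is only the simplest possible view; the same grouping could feed a comparison of what annotators wrote in each period.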

Stein offers an intriguing discussion of the sociology of annotation, where he describes different organizational models: these include the factory, the museum, and the party.

The factory

Key feature: automated labelling of the source material. This suits early annotation efforts (in genetics, simply finding the genes) and leads to a broad but shallow baseline annotation of the source. No automated identification or classification algorithm is perfect, so the baseline will contain mistakes. For social science disciplines, automated text classification that sorts the source text into discrete categories would fit the “factory model” of annotation.
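To make the “factory” idea concrete for text, here is a deliberately naive Python sketch of automated classification by keyword matching. The categories, keywords, and the `classify` function are assumptions made up for illustration; they are not how Perusall, Crowdlaaers, or any gene-finding tool actually works.

```python
# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "question": {"why", "how", "what"},
    "agreement": {"agree", "yes", "exactly"},
}

def classify(text):
    """Return the first category with a keyword in the text, else 'unlabelled'."""
    # Lowercase each word and strip surrounding punctuation before matching.
    words = {w.strip(".,!?").lower() for w in text.split()}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "unlabelled"

samples = [
    "Why does the gene stop here?",
    "I agree with this reading.",
    "Interesting passage.",
]
labels = [classify(t) for t in samples]
print(labels)
```

A rule this crude labels everything quickly but mislabels plenty, which is the broad-but-shallow trade-off of the factory stage.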

factory with robots

The museum

Key feature: interpreting the functional role of what was identified in the factory stage. In genetics, this means interpreting the function of each predicted gene and correcting mistakes made by the prediction algorithm in the factory stage. For social science disciplines, it means taking a closer look at the output of the text classification algorithm and curating it into other categories that may emerge after human consideration. This work need not be synchronous; it can occur over an extended time period.

curator tour

The party

Key feature: a synchronous effort by experts (same time, same room) plowing through the source material. There is precedent in the fruit fly genome, mouse genome, and cell type annotation “jamborees”.

annotation jamboree

What about the classroom? How do these models fit?

For students annotating source materials:

If they are working asynchronously, this aligns with the museum model.

If they are working synchronously or in person, this better aligns with the party model.

Teachers assess the student annotation output, and that analysis holds value for grading or for a more granular understanding of the annotations. For example, what kind of knowledge construction activity was taking place when the students were annotating? (See the earlier discussion on this website, and the source reference.) This knowledge construction categorization is akin to the museum or party model, as humans are doing the work of categorizing.

For factory models of annotation output, one can look at the automated analysis of annotations via Perusall or Crowdlaaers. Surely classroom teachers are also double-checking the automated analysis, taking on the museum or party roles described above.

Are the annotations valued and rewarded?

In the classroom example: yes, they can form a graded assessment and count for some percentage of the student’s final grade.

Factory: large class of students; the teacher cannot manually grade all of the annotation output.

Museum: smaller class of students; the teacher can manually read and then grade the annotation output.

For the professional annotating other scholarly works: Hypothes.is has recommendations for citing annotations.

Finally, returning to the Stein article: he suggests that curating a gene family or other set of database records should be considered akin to writing an invited review article. The curator should receive a citation, to credit the valuable analysis taking place in the annotations.