Information extraction and the missing Mark2Cure module
In our previous post, we asked readers, 'What is your preferred moniker?'. Here are the results:
Mark2Curator: 36%
Citizen Scientist: 36%
Contributor: 18%
"Anything BUT volunpeer": 10%
Although it may seem a little strange that researchers are still struggling with the "What's in a name?" question in citizen science, this struggle is deeply representative of some of the important work biocurators do.
Researchers need a common vocabulary to exchange information coherently, but settling on that vocabulary, and on how it is structured, is difficult. Without a common vocabulary, it is easy for scientists to miss research that is valuable to their field of study. Although it remains to be seen how the citizen science research community will settle this issue, in biomedical research, biocurators help with that sort of determination. Biocurators help standardize terms and define the rules governing how terms are classified and organized. In doing so, they facilitate information quality control and exchange. Biocurators do all this and more.
Given that biocurators do very important, very tedious, and often very difficult work, one question we get quite a bit is:
"How is it possible to train citizen scientists to replace such important, skilled researchers?"
But this question is built on a fundamentally incorrect assumption about the goals of Mark2Cure. We KNOW biocurators do very important work, and that one of the most tedious and time-consuming things they do is information extraction.
Information extraction can generally be broken down into three tasks:
1. Named Entity Recognition (identifying and classifying words/phrases in text)
2. Normalization (linking that text to an ontology)
3. Relationship Extraction (identifying the relationship between different entities).
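To make the three tasks concrete, here is a minimal Python sketch of how they fit together. It is only an illustration, not how Mark2Cure or any biocuration pipeline actually works: the lexicon, the "TOY:" identifiers, the trigger phrases, and the function names are all made up for this example, and real systems rely on trained models and real ontologies rather than simple lookups.

```python
import re

# Hypothetical toy lexicon: surface form -> (entity type, made-up identifier).
# The "TOY:" identifiers are placeholders, not entries in any real ontology.
LEXICON = {
    "cystic fibrosis": ("disease", "TOY:0001"),
    "cf": ("disease", "TOY:0001"),  # a synonym that maps to the same concept
    "cftr": ("gene", "TOY:0002"),
}

# Hypothetical trigger phrases -> relationship label.
TRIGGERS = {"causes": "causes", "is caused by": "caused_by"}


def recognize(text):
    """Task 1 (NER): find lexicon terms in the text and classify them."""
    mentions = []
    for term, (entity_type, _) in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            mentions.append({
                "text": match.group(0),
                "type": entity_type,
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(mentions, key=lambda m: m["start"])


def normalize(mention):
    """Task 2 (Normalization): link a mention to its identifier."""
    return LEXICON[mention["text"].lower()][1]


def extract_relations(text, mentions):
    """Task 3 (Relationship Extraction): label neighboring mentions
    when a trigger phrase appears in the text between them."""
    relations = []
    for left, right in zip(mentions, mentions[1:]):
        between = text[left["end"]:right["start"]].lower()
        for trigger, label in TRIGGERS.items():
            if trigger in between:
                relations.append((normalize(left), label, normalize(right)))
                break
    return relations


sentence = "Cystic fibrosis is caused by mutations in CFTR."
found = recognize(sentence)
for mention in found:
    print(mention["text"], mention["type"], normalize(mention))
print(extract_relations(sentence, found))
```

Notice that "CF" and "cystic fibrosis" end up with the same identifier. That is what Normalization buys you: different ways of writing the same thing are treated as one concept before any relationships are extracted.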
We want to train citizen scientists to help with this task, so that biocurators can apply their unique training towards solving problems in biomedical research analogous to the ones we're seeing in the citizen science field.
Since Mark2Cure is a citizen science project, the "What's in a name?" issue applies to us as well. Although our informal poll was only for fun, I was personally very happy with the results for two reasons:
1. I am a fan of wordplay, and I love that many users liked the term Mark2Curator--a term which blends Mark2Cure and biocurator.
2. Even if I'm reading too much into it, I like to think that our users picked 'citizen scientist' or 'contributor' because they feel that the help they provide to Mark2Cure is important. And it is.
If you've gotten this far, you are probably one of our many astute readers and may have noticed that information extraction was divided into THREE tasks, when Mark2Cure only has TWO. Where is the third task? And why is the missing task the step between the first and the last?
The missing task, 'Normalization', is the task in between NER and Relationship Extraction. We started with NER because it has been well investigated, so there was a solid foundation for us to build upon. We followed with the Relationship Extraction task because it would allow us to unlock some of the most valuable and hardest-to-access information in the text.
As for the Normalization task...it's currently being built by volunteers. Mark2Curators have been helping us investigate NER mappings to different ontologies, and a very talented programmer and machine learning expert has been busy building the Normalization module. But we could use more help. We need feedback on potential interfaces for how parts of the module might work. If you'd like to help with that, answer the poll in our newsletter.
Of note for our U.S.-based Mark2Curators over 65 years of age:
Did you know? The U.S. National Park Service has a lifetime pass for seniors that will allow you to enter or park at US national parks for free or at a discounted rate. These passes only cost $10 now through August 27th. Starting August 28th, the price will go up to $80.
If you enjoy hiking, nature, or plan to visit any of our beautiful national parks, you may want to get your pass while it's still $10. In San Diego, the closest national park where you can purchase one in person is Cabrillo. To find the national park closest to you, visit the NPS's site. If you don't live near a park, but plan on visiting some in the future, you can purchase a pass by mail or online.