Opinion: Text Mining Medicine

Researchers should scour historic medical archives to discover knowledge that could inform today’s biomedical research and clinical practice.

By Min Song | June 25, 2012

Image: Wikimedia Commons, Lin Kristensen


Medicine is a complex field with a plethora of specialized terms that are often inconsistent and may overlap. Because researchers introduce these terms sporadically, in different geographical and temporal contexts, the meaning of a term can drift over time, and terminology can become ambiguous or fall out of use entirely. Such ambiguity in clinical practice guidelines can lead to inconsistent interpretation and, in turn, to inappropriate treatment decisions and medical errors.

One solution is the creation of a medical ontology, a standardized set of medical concepts and the relationships among them. But standardizing terminology is easier said than done. Today’s medical language is living and complex, with new terms and medical fields constantly being created. As these terms and fields evolve, earlier indexing may become incomplete or inappropriate and may later cause misinformation or miscommunication. Even an everyday word can be ambiguous: “cold” can refer to an upper respiratory infection, but depending on context it can also mean an absence of heat, the sensation produced by low temperatures, a lack of enthusiasm, or a state of unconsciousness.
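For a computer, telling these senses apart usually means looking at the surrounding words. The minimal Python sketch below illustrates the idea with a simplified Lesk-style overlap heuristic, a standard word-sense disambiguation technique swapped in here for illustration rather than anything proposed in this article. The sense labels and their signature words (`COLD_SENSES`) are hypothetical and hand-written, and the two temperature-related senses are merged into one; a real system would draw its senses from a curated resource.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical, hand-written "signature" words for four senses of "cold".
# A real system would derive these from a curated lexical resource.
COLD_SENSES: dict[str, set[str]] = {
    "respiratory_infection": {"cough", "sneeze", "nose", "throat", "virus", "fever", "congestion"},
    "low_temperature": {"weather", "temperature", "ice", "winter", "freezing", "degrees"},
    "no_enthusiasm": {"enthusiasm", "indifferent", "unfriendly", "reception", "greeting"},
    "unconscious": {"knocked", "out", "unconscious", "blow", "fainted"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(context: str, senses: dict[str, set[str]]) -> Optional[str]:
    """Return the sense whose signature shares the most words with the context.

    Simplified Lesk-style overlap heuristic: ties go to the sense listed first,
    and None is returned when no signature word appears in the context.
    """
    words = tokenize(context)
    best_sense, best_overlap = None, 0
    for sense, signature in senses.items():
        overlap = len(words & signature)
        if overlap > best_overlap:  # strict ">" keeps the earlier sense on ties
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("Patient reports a cold with cough and sore throat", COLD_SENSES))
# -> respiratory_infection
print(disambiguate("He was knocked out cold by the blow", COLD_SENSES))
# -> unconscious
```

The heuristic is deliberately crude: it returns nothing when the context shares no words with any signature, which is one reason consistent, context-aware vocabularies matter.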

MeSH (Medical Subject Headings), the US National Library of Medicine’s controlled vocabulary for indexing articles in MEDLINE and PubMed, represents one of the biggest efforts to standardize medical language. Actively maintained by the NLM, MeSH is one of the oldest computerized controlled vocabularies used by libraries. Even MeSH, however, contains terms that are cross-referenced incorrectly because of changes in terminology. Furthermore, MeSH and other efforts to standardize vocabularies rely heavily on hand curation, which introduces a degree of subjectivity. Biases rooted in personal experience, culture, and domain of expertise can influence medical indexing schemes such as MeSH. Some experts may introduce terms specific to a geographic region or organizational culture, for example, which may not be used consistently in comparable professional collections. Studies have shown that miscommunication occurs frequently because of vague terminology or terms whose meaning depends on context and personal preference, which can result in inappropriate variation in medical practice and, in the worst case, medical errors. Last but not least, MeSH has not catalogued any documents published before 1950.

To create a more robust ontology, researchers should rely more heavily on text mining, the interdisciplinary research field that discovers knowledge in large-scale collections of unstructured text. Applied to historic medical archives, text mining techniques can explore possible connections between disparate terminologies, helping to detect changes in terminology over time and to uncover inconsistencies and ambiguities in MeSH and other medical controlled vocabularies, thereby reducing miscommunication. Such efforts could also reveal trends in medicine’s past that may yield insights relevant to today’s medical practice. A better understanding of the potential risks of infection in the workplace, for example, could encourage practices that reduce those risks.
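The detection of terminology changes over time described above can be illustrated with a minimal Python sketch. It assumes the archive has already been digitized into (year, text) records; the helper names `term_timeline` and `active_span` and the sample records are hypothetical, invented for illustration rather than taken from any real casebook or existing tool. It counts, per decade, how many records mention each term, so that an older name such as “consumption” (a historic term for pulmonary tuberculosis) giving way to “tuberculosis” becomes visible.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def decade(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1867 -> 1860."""
    return year - year % 10


def term_timeline(records: list[tuple[int, str]], terms: list[str]) -> dict[str, Counter]:
    """Count, per decade, how many records mention each term.

    Matching is whole-word and case-insensitive. Terms that never occur
    are absent from the result.
    """
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    timeline = defaultdict(Counter)
    for year, text in records:
        for term, pattern in patterns.items():
            if pattern.search(text):
                timeline[term][decade(year)] += 1
    return dict(timeline)


def active_span(counts: Counter) -> tuple[int, int]:
    """Return the first and last decade in which a term is attested."""
    return min(counts), max(counts)


# Hypothetical casebook-style records, invented for illustration only.
records = [
    (1862, "Admitted with consumption and night sweats."),
    (1875, "Consumption, advanced; hemoptysis noted."),
    (1905, "Pulmonary tuberculosis confirmed by sputum examination."),
    (1925, "Tuberculosis, left upper lobe."),
]

timeline = term_timeline(records, ["consumption", "tuberculosis"])
for term, counts in timeline.items():
    print(term, active_span(counts))
# consumption (1860, 1870)
# tuberculosis (1900, 1920)
```

Raw counts like these are misleading if some decades are better represented in an archive than others, so a real analysis would at least normalize by the number of records per decade.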

Biomedical text mining has become a core component of bioinformatics, used to discover useful information hidden in collections of genomic data, small-molecule interactions, and other large datasets. For example, the Gene Ontology (GO), the product of a collaborative effort to describe gene products consistently across multiple heterogeneous databases, aids the discovery of new gene functions from sequence data. Mining historic medical archives for intelligent terminology management could have a similar impact on medicine, leading to the discovery of new treatments and improving our understanding of how medical practice has evolved.

But before we can mine historic archives, they must be digitized. Since the American Civil War, advances in surgery and other treatments have transformed the practice of medicine from guesswork into a scientific discipline. The mid-19th century was a time of dramatic and innovative development in medical treatment. Records such as Bellevue Hospital’s casebooks, spanning 1860–1940, contain patient information, including medical histories and descriptions of complaints, diagnoses, treatments, and medications. But few of these historic collections are currently available in digital form.

Once these archives are digitized, however, text mining techniques could augment the MeSH controlled vocabulary with additional terminology and definitions that reflect today’s medical language. This could deepen our understanding of medical concepts by tracing their history, and could help expose misunderstandings in our current knowledge. Cross-referencing other medical artifacts (e.g., paintings, sketches, and medical instruments) would, in turn, enrich the material available for education and research. By providing digital versions of these artifacts, medical libraries will be able to offer more effective databases for exploring the history of medical procedures and the ailments they treated.
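One way such augmentation might begin is by surfacing recurring phrases in digitized records that a controlled vocabulary does not yet contain, for a human indexer to review. The Python sketch below does this with simple n-gram counting. The vocabulary subset, stop-word list, sample texts, and the `candidate_terms` helper are all hypothetical; this is not actual MeSH data or an NLM tool, and plain frequency counting is a crude stand-in for more principled term-extraction methods.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stop-word list; "patient" is treated as a domain stop word here.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "was", "with", "patient"}


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-word sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_terms(texts: list[str], vocabulary: set[str],
                    max_n: int = 2, min_count: int = 2) -> list[tuple[str, int]]:
    """Rank recurring 1..max_n-word phrases that the vocabulary does not contain.

    Phrases that begin or end with a stop word are skipped; only phrases seen
    at least min_count times are returned, most frequent first.
    """
    known = {term.lower() for term in vocabulary}
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                words = gram.split()
                if words[0] in STOPWORDS or words[-1] in STOPWORDS:
                    continue
                if gram not in known:
                    counts[gram] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]


# Hypothetical vocabulary subset and archive snippets, invented for illustration.
vocabulary = {"Tuberculosis", "Fever"}
texts = [
    "Patient presented with consumption and fever.",
    "Consumption with night sweats.",
    "Night sweats and fever; consumption suspected.",
]
print(candidate_terms(texts, vocabulary))
# [('consumption', 3), ('night', 2), ('sweats', 2), ('night sweats', 2)]
```

Here “consumption” surfaces as a candidate even though the toy vocabulary already contains “Tuberculosis”, which is exactly the kind of historic synonym an indexer would then need to cross-reference by hand.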

Min Song is an associate professor in the Department of Library and Information Science at Yonsei University.

Paul

June 25, 2012

It is remarkable how many crude ideas were tried in past decades, long before the technology was perfected, failed simply because of their crudeness, and were then buried forever with the phrase, "It didn't work." There are lots of millionaires out there who brought things back from the grave. It is unfortunate that many young scientists and engineers rely solely on the results of only twenty or so years of information in computer searches.

Paul Stein

Lydia Witman

June 25, 2012

Thank you, Min, for highlighting the topic of using automated text mining (and its relative, "natural language processing") for the discovery of new knowledge. I agree, more historical material needs to be digitized -- that isn't my field, but my impression is that progress is being made there.

I'm also no informaticist, but I imagine it would be the UMLS (not MeSH) that would be used by the computerized system to uncover connections across the different silos of information. The UMLS and smaller thesauri/ontologies, such as SNOMED, are already being used for this purpose in other computerized information retrieval systems. To read more about UMLS, see http://www.nlm.nih.gov/researc... (or, for what it's worth, the Wikipedia entry at http://en.wikipedia.org/wiki/U... ). 

Thanks again! You might try sharing your ideas with those who study the history of medicine, and the librarians and archivists who work with them -- there are some good e-mail list-servs -- see http://www.nlm.nih.gov/hmd/res... and http://www.gegensatzpress.com/....
-Lydia (hospital librarian)

Karel Petrak

June 25, 2012

I could not agree more! For example, it is quite shocking to see so many "scientific publications" making false promises about the use of "nanoparticles" for drug delivery and therapy. It is very often quite evident that the researchers in question are either unaware of a vast published literature on particles in drug delivery in general; or if they are aware, they did not understand it. Instead, they hide behind the trendy popularity of "nano" and pretend that giving an "old horse" a new name will win the race. Not drawing on past knowledge is wasteful, ineffective, and lazy, bordering on incompetence and dishonesty.

Lydia Witman

June 25, 2012

Thank you, Min, for highlighting this fascinating field of study (text mining for knowledge discovery). 
I expect it most likely wouldn't be MeSH but would be the UMLS, or SNOMED, that would be used by the computerized system to uncover connections. UMLS is already being used for this purpose in other databases. See  http://www.nlm.nih.gov/researc... or http://www.nlm.nih.gov/researc... for more information (or, for what it's worth, the Wikipedia entry on UMLS here http://en.wikipedia.org/wiki/U... ). The U.S. National Library of Medicine maintains UMLS and is also still involved in SNOMED.

-Lydia Witman (Medical Librarian)

JimBobToo

June 25, 2012

I agree. I originally came up with the idea for patenting a group A strep for the production of medical-grade hyaluronan from reading an old Lancefield article from the early 1900s that was buried in the stacks of Diehl Hall library at the U of MN Mpls campus.

Seth Grimes

June 26, 2012

Interesting error, "To create a more robust ontogeny..."

Petrifiedhippy

June 26, 2012

Well put. As I had this conversation in a prism-ed way... the *younger* generation has not a need nor a want... for anything beyond "twenty or so years of information in computer searches."

I have a personal snootiness on different levels of all education; knowledge; funding.... BLAH

Unless weallofus care a whole awful lot... "nothing will get better..."

Lydia Witman

June 27, 2012

I'd guess UMLS would be a better language to use than MeSH; UMLS is already being used as a metathesaurus in similar database connections.

-Lydia (hospital librarian)

alexandru

June 27, 2012

Excellent article!

"As these new terms and fields evolve, earlier indexing may be incomplete or
inappropriate, and may later cause misinformation or miscommunication."


I give you an example of "miscommunication that occurs frequently due to vague terminology or terms that have multiple meanings due to context and personal preference".

Do you know how God "formed a woman from one man's rib"?

He formed Eve mtDNA using Adam mtDNA, existed only in xiphoid process (one man's rib). I found the best description in the world for this explanation in an old "organizational culture (Paul, Hebrew 4.12), which may not be consistent in other similar professional collections".

For "nanoparticles" and nano-biology you can also find some information in my research. The actual presence of paternal mitochondrial DNA in the sperm cells of boys "naturally born, not in vitro made", combined with information from the bible, Paul - Hebrews 4.12, Paul - 2 Corinthians 2.1, and Luke 8.11, genetically certifies Genesis and led me to the development of the theory about the inheritance of paternal mitochondrial DNA (Mitochondrial Adam DNA data transmissions theory - - ISBN 978-606-92107-1-0), which completes the Eve mtDNA theory.
