Automatic Content Analysis - A method coming of age

by Andrew E Smith

To follow up on my previous blog post on Sense Making, I have been very impressed by two recent publications on automatic content analysis. These articles appear to me to be clear, systematic, and forthright in their pursuit of a better, more evidence-based, less biased, and more scalable technique for making sense of text data. This is an appeal to redirect the energies of the Big Data movement towards inductive knowledge building. As someone who saw the coming and going of the data mining fad, I am still left wondering why we abandoned the desire to uncover the emerging models which might be acting, without our awareness or consent, in the human systems around us. As a former natural scientist, I find this almost incomprehensible, although perhaps less so when one considers the extreme reluctance in some quarters to examine some rather inconvenient truths about our natural environment.

Perhaps this is more about the battle between the powerful imposing their will on the rest of us and our scientific desire to uncover and understand the real machines that are being deployed in the modern world. Anyway, for those interested in the severe abuses of machine learning by the powerful, I strongly recommend this book: O'Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.

In these articles, I no longer see the half-apologetic compromises being made with regard to traditional quantitative methods, which were never really suitable for such a richly multivariate and context-driven data source as natural language. Nor do I see the mechanistic machine learning approach, which not only neglects the severe context dependence of natural language usage, but also does not seek to enhance human understanding of our challenging world.

One of the final recommendations from Cheng and Edwards caught my eye:

With increasing volumes of data, data visualization will become an important component of ACA. However, this study demonstrates that interpretation of the visual representation takes considerable time and effort as using ACA requires both macro and micro insights of the phenomena to make insightful inferences (McAbee, Landis, & Burke, 2017). Thus, this research demonstrates the need for TH researchers to be comfortable in interpreting visual representations when using ACA.

This agrees with my own observations of the best way to perform sense-making with a tool such as Leximancer. One must repeatedly shift perspective, from observing the macro patterns in the visualisation to interrogating the micro meanings in the supporting data. This is the only way to make sense of the emerging patterns. In essence, though, we are saying nothing new. This is also the method inherent in the technique of Cognitive Mapping, the most insightful of the content analysis techniques. I guess that we have combined the most useful aspects of machine learning and fed the results into the very human-oriented mind mapping techniques of cognitive mapping.
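
For readers who think in code, the macro/micro loop can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: it is not how Leximancer works internally, the function names (macro_view, micro_view) are hypothetical, and the sample sentences and vocabulary are made up for the example. The macro view counts which terms co-occur across text segments; the micro view pulls back the segments behind one pattern so that a person can read them.

```python
from collections import Counter
from itertools import combinations
import re


def terms_in(segment, vocabulary):
    """Return the vocabulary terms that appear in one text segment."""
    words = set(re.findall(r"[a-z]+", segment.lower()))
    return words & vocabulary


def macro_view(segments, vocabulary):
    """Macro step: count how often each pair of terms shares a segment."""
    pairs = Counter()
    for segment in segments:
        present = sorted(terms_in(segment, vocabulary))
        pairs.update(combinations(present, 2))
    return pairs


def micro_view(segments, term_a, term_b):
    """Micro step: return the segments mentioning both terms, for close reading."""
    wanted = {term_a, term_b}
    return [s for s in segments if terms_in(s, wanted) == wanted]


# Made-up toy data, purely for illustration.
segments = [
    "Guests trust hosts who reply quickly.",
    "Trust between guests and hosts drives repeat bookings.",
    "Regulation of the platform remains unclear.",
    "Hosts worry about regulation and tax.",
]
vocabulary = {"guests", "hosts", "trust", "regulation"}

pairs = macro_view(segments, vocabulary)           # look at the overall pattern
top_pair, top_count = pairs.most_common(1)[0]      # pick one strong link
evidence = micro_view(segments, *top_pair)         # read the text behind it
```

The point of the sketch is the alternation: the counts suggest where to look, but only reading the returned segments tells you what the link actually means.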

The two papers are as follows:

  • Cheng, M., & Edwards, D. (2017). A comparative automated content analysis approach on the review of the sharing economy discourse in tourism and hospitality. Current Issues in Tourism. Advance online publication. doi:10.1080/13683500.2017.1361908
  • Nunez‐Mir, G. C., Iannone, B. V., Pijanowski, B. C., Kong, N., & Fei, S. (2016). Automated content analysis: addressing the big literature challenge in ecology and evolution. Methods in Ecology and Evolution, 7(11), 1262-1272. doi:10.1111/2041-210X.12602


Leximancer Pty Ltd, Brisbane, Australia, ACN: 116 218 109