Automatic Content Analysis - A method coming of age

To follow up on my previous blog on Sense Making: I have been very impressed with two recent publications on automatic content analysis. These articles strike me as clear, systematic, and forthright in their pursuit of a better, more evidence-based, less biased, and more scalable technique for making sense of text data. This is an appeal to redirect the energies of the Big Data movement towards inductive knowledge building. As someone who saw the coming and going of the data-mining fad, I am still left wondering why we abandoned the desire to uncover the emerging models which might be acting, without our awareness or consent, in the human systems around us. As a former natural scientist, I find this almost incomprehensible, although perhaps not when one considers the extreme reluctance in some quarters to examine some rather inconvenient truths in our natural environment.

Perhaps this is more about the battle between imposing the will of the powerful on the rest of us and our scientific desire to uncover and understand the real machines being deployed in the modern world. Anyway, for those interested in the severe abuses of machine learning by the powerful, I strongly recommend this book: O'Neil, C. (2017). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.

In these articles, I no longer see the half-apologetic compromises being made with regard to traditional quantitative methods, which were never really suitable for such a richly multivariate and context-driven data source as natural language. Nor do I see the mechanistic machine-learning approach, which not only neglects the severe context dependence of natural language usage, but also does not seek to enhance human understanding of our challenging world.

One of the final recommendations from Cheng and Edwards caught my eye:

With increasing volumes of data, data visualization will become an important component of ACA. However, this study demonstrates that interpretation of the visual representation takes considerable time and effort as using ACA requires both macro and micro insights of the phenomena to make insightful inferences (McAbee, Landis, & Burke, 2017). Thus, this research demonstrates the need for TH researchers to be comfortable in interpreting visual representations when using ACA.

This agrees with my own observations of the best way to perform sense-making with a tool such as Leximancer. One must repeatedly change perspective, moving from observing the macro patterns in the visualisation to interrogating the micro meanings in the supporting data. This is the only way to make sense of the emerging patterns. In essence, though, we are saying nothing new: this is also the method inherent in the technique of Cognitive Mapping, the most insightful of the content analysis techniques. I suppose we have combined the most useful aspects of machine learning and fed the results into the very human-oriented mind-mapping techniques of cognitive mapping.

The two papers are as follows:

  • Cheng, M., & Edwards, D. (2017). A comparative automated content analysis approach on the review of the sharing economy discourse in tourism and hospitality. Current Issues in Tourism. Advance online publication. doi:10.1080/13683500.2017.1361908
  • Nunez‐Mir, G. C., Iannone, B. V., Pijanowski, B. C., Kong, N., & Fei, S. (2016). Automated content analysis: addressing the big literature challenge in ecology and evolution. Methods in Ecology and Evolution, 7(11), 1262-1272. doi:10.1111/2041-210X.12602


Sense Making - the goal of any Leximancer analysis

by Andrew E Smith

People ask me how to get take-away insights from a Leximancer analysis, so I thought I would write a post about this most important aspect of any analysis task -

Sense Making

I think that many people have lost sight of the goal of any analysis task. There is a perception that so long as you employ the correct methods, the results will speak for themselves; that is, the meaning will be self-evident. But this is never true, and the perception that nothing lies like statistics is the net result of this error. The meaning contained within any data set depends very much on the way you interpret it. Not only do you need to decide carefully what to measure, but you also need to understand what the analysis method is trying to achieve, and finally, you need to understand what the result actually tells you about your world.

This is true for quantitative data, much to the surprise of many people. Quantitative variables are rather difficult to understand, because their meaning depends heavily on how the measurement is made. More than that, the meaning and limitations of many quantitative statistical methods are often poorly understood, and it takes a very precise understanding of the research question, the data collection constraints, and the modelling process to actually come up with useful predictions and descriptions. So it should come as no surprise that the same is true for textual data.

As a text data analyst, you really need to decide what you want to achieve from this entire process. This should probably be written on the box of Leximancer, because quite a few people assume that, because they have some text, something really cool will happen if they just put this data into a tool like Leximancer. In my experience, the only value to be obtained from textual data is a more accurate understanding of the way the authors of the data viewed some aspect of their world.

So, if this is what you want from your text data, then you might like to think about what this means. To me, it seems that you would need to make sense of the various ideas presented in the data. The emphasis here is on you constructing meaning. The software doesn't make sense of your world - it isn't you, and it isn't human. You might ask what sense-making means. Roughly, it appears to involve interpreting messages written by other people who are not you, and do not share your perspective or your experiences. You are intent on seeing what they saw, and feeling what they felt, and then you want to record their perspective, so you can report this sense and insight to others. You may also then make recommendations about how this insight should modify behaviour for various reasons, including increasing customer satisfaction.

Meaning is Grounded in being human

Many old-school qualitative analysts would have no problem with this explanation, but they might go further and ask why you need software at all. I think their understanding of the task is clearer than that of the machine-learning fraternity, because they understand the end game: that what people need from this task is to gain a clearer understanding of the perspective of others. Many machine-learning enthusiasts are obsessively trying to come up with models of language which are more intelligent, or more predictive, or more something or other. But why? If the task we are engaged in is for a human to understand the thoughts of other humans, I would contend that current methods of automated text modelling, including but not limited to Leximancer, are more than adequate. The problem that remains requires the human cognitive system to appreciate and integrate a very large and complex amount of information.

This is a major problem with the whole process of text analysis, in my opinion. We appear to want machines to do all of the sense-making for us. But why? Are we assuming that the machine is advanced enough to understand what it means to be human - to have a mind grounded in our human physical reality? That is currently an absurd belief. Or are we giving up all responsibility for trying to understand the bigger picture? This appears to be the case, and this abandonment of human engagement with controlling, or even understanding, our own systems is infecting so many industries and professions that it is hard to know what role any of us will play in this mechanised future.

The Role of Software in Text Analysis

Anyway, if we assume that the task is for the analyst to understand the human meanings contained within their text data, the role of software such as Leximancer is easy to explain. Leximancer's entire purpose is to let the data generate a transparent model which can be interpreted by the analyst, so that this person may efficiently conduct a sense-making examination of conceivably vast amounts of text. To be sure, the analyst can and should take some responsibility for which flavour of model is constructed from the data. This is a bit like pointing a telescope at different parts of the sky to focus on different parts of the data.
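To make the idea of a transparent, data-generated model concrete, here is a toy sketch in Python - emphatically not Leximancer's actual algorithm, and using invented segments and concept names - of one of the simplest models of this kind: counting how often seed concepts co-occur within the same text segment. A table of counts like this is fully inspectable, so the analyst can always trace a pattern back to the text that produced it.

```python
from collections import Counter
from itertools import combinations

# Invented example data: short text segments and a seed concept list.
SEGMENTS = [
    "guests praised the staff and the breakfast",
    "the staff were slow but the room was clean",
    "breakfast was cold and the room was noisy",
]
CONCEPTS = {"staff", "breakfast", "room"}

def cooccurrence(segments, concepts):
    """Count how often each pair of concepts appears in the same segment."""
    counts = Counter()
    for seg in segments:
        # Which seed concepts occur in this segment?
        present = sorted(c for c in concepts if c in seg.split())
        # Every pair of co-present concepts strengthens their association.
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

for pair, n in cooccurrence(SEGMENTS, CONCEPTS).items():
    print(pair, n)
```

In this toy data each concept pair co-occurs exactly once; on a real corpus the relative counts are what suggest which concepts cluster together, which is the kind of macro pattern a concept map visualises.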

My sense-making and reporting Process

To conclude with some concrete advice, my suggestion for the best way to perform this sense making and reporting within a Leximancer model of a data set is as follows:

  1. Obtain a Leximancer map which:
    1. Only analyses the sections of your data that could conceivably be relevant to your research question.
    2. Uses a thesaurus (code book) that contains concepts relevant to your research question, and has enough detailed concepts to sufficiently code the relevant text segments.
  2. Increase the size of the theme bubbles until you get maybe 3 to 6 theme bubbles.
  3. I normally take a screen capture of the theme bubbles without their constituent concept names at this point to include in the report. This nicely frames the sense-making process to follow, without either overwhelming the reader or risking them jumping to detailed but erroneous conclusions.
  4. Now refer to the Analyst Synopsis to the right of the concept map. Your task is now to make sense of what each theme bubble is trying to say. The Analyst Synopsis presents the concept list and the top five text segments which help explain each theme bubble. Read these.
  5. Try to put the meaning of each theme into your own words. If you need a clearer sense of what is being said, try clicking on the list of concepts under each theme name to reveal all the text segments which explain the theme. Read a sample of these until you stop being surprised. Now try explaining the consensus of these themed texts in your own words - in a paragraph. Grab some of the text segments which clearly describe this theme as examples - you can add them to the Leximancer log book.
  6. Write a summary of the meanings of the themes which includes your own explanation of each theme plus the supporting quotes from the text.
  7. You can optionally include a graphic of the full concept map at the end, including the constituent concept names, once the sense making process has been presented. The full map means so much more now to the reader, after they have read the sense making, and erroneous conclusions are less likely to be reached by a premature inspection of just the map.

This is the procedure I have used for reporting the meaning of a Leximancer text analysis. You may have discovered your own favourite approach, but whatever the case, the goal should be to understand.



Leximancer Version 4.5: Faster, Clearer, and More Accurate

Leximancer v4.5 includes 10 new features that make it even easier, faster and more insightful for research, marketing and customer experience management, business forecasting, informing public policy, and many other industry-specific applications.

We’ve supercharged the Leximancer engine and introduced a project control panel to simplify how users configure the software for their needs.

The new Leximancer control panel makes your model building clear and intuitive.

The new Synopsis function is a useful tool for analysts. It condenses the gist of large document collections to explain and complement Leximancer’s unique concept map capabilities, plus all reporting of comments can be automatically bucketed for more valuable customer feedback insight.

If you need a one page summary of the complete meaning of a large document collection, the new Leximancer Analyst Synopsis is the perfect solution.

Improved text extractors for pdf, doc, docx, and html documents will also please users. Because our customers use Leximancer for all kinds of decision-making, they need to analyse text from a variety of sources, like customer surveys, interview transcripts, long reports, research papers, web pages, feedback forms, tweets, and so on.

Other features new to Leximancer v4.5 will improve how Leximancer integrates with existing systems, such as relaxing project and document file path name restrictions, and removing the need for Java to be installed separately.

Leximancer v4.5 also supports Windows 7, 8, and 10 and Mac OS X (10.9+) platforms, with Internet Explorer 11, Edge, Firefox 45+, Chrome 50+, and Safari 9+ browsers.

Leximancer Pty Ltd, Brisbane, Australia, ACN: 116 218 109