Text data is hierarchical, with scale-free statistics

Leximancer concept networks are uniquely fit for purpose when it comes to modelling text.

An inspection of any Leximancer concept network shows that we model text data as a hierarchical network of concepts. This can surprise some people, who wonder why text is not modelled using the usual bar charts or pie charts that appear in data reports.

I felt it was worth restating that text naturally forms a hierarchical network of concepts, of a kind referred to as a Small World Network. There is plenty of research supporting this observation (see the references at the end).

Small world networks are in fact very common. The Internet forms one. A social network of people is another example, as popularised by the game Six Degrees of Kevin Bacon.

These networks have many weakly connected nodes but also a core of a few highly connected hub nodes. These hubs are the popular nodes: the people we know who know everyone and have big parties. If we want to get a message to someone we don't know, we go first to the more popular nodes and ask where the message should be directed. This is how people do it, and it is also how networked computers do it.
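To make the routing idea concrete, here is a toy sketch (my own illustrative example with a made-up friend graph, not Leximancer's implementation): a breadth-first search finds the shortest chain of introductions, and in a hub-dominated network that chain almost always passes through the hub.

```python
from collections import deque

# A toy social network: most people know only one or two others,
# but "hub" is a highly connected node who knows almost everyone.
friends = {
    "alice": ["hub", "bob"],
    "bob":   ["alice"],
    "carol": ["hub"],
    "dave":  ["hub", "erin"],
    "erin":  ["dave"],
    "hub":   ["alice", "carol", "dave", "frank"],
    "frank": ["hub"],
}

def route(graph, start, goal):
    """Breadth-first search: the shortest chain of introductions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for friend in graph[path[-1]]:
            if friend not in seen:
                seen.add(friend)
                queue.append(path + [friend])
    return None

print(route(friends, "bob", "frank"))  # ['bob', 'alice', 'hub', 'frank']
```

Even though bob and frank have no friends in common, the message reaches frank in three hops because the route passes through the hub, just as described above.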

Small worlds have an unusual distribution of word frequencies: it is shaped like the function y = 1/x, the power law curve, and has no characteristic scale such as a standard deviation. This property is called scale-free (for word frequencies in particular, it is known as Zipf's law).
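The scale-free property of y = 1/x can be checked numerically: doubling x always halves y by exactly the same factor, whether x is 1 or a million, so there is no special scale at which the curve's behaviour changes. A minimal sketch:

```python
# For a power law y = 1/x, the ratio f(x) / f(2x) is the same
# at every magnitude: the curve has no preferred scale.
def f(x):
    return 1.0 / x

for x in (1, 10, 1000, 1_000_000):
    print(x, f(x) / f(2 * x))  # always 2.0: no preferred scale
```

A bell-curve distribution behaves differently: its standard deviation marks a characteristic scale beyond which values become rare, which is exactly what a power law lacks.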

An interesting question is: how does the small world nature of language serve our brains when we write or read text? It seems unavoidable that it helps our minds navigate the neural networks of concepts in our brains, efficiently connecting ideas into narratives, perhaps not too different from messages being routed from one distant host to another on the Internet.

In this way, a Leximancer concept map can show how a set of disparate ideas can be connected to each other via hub concepts to efficiently form the multiple narratives contained in a text data set.


References

Cancho, R. F. I., & Solé, R. V. (2001). The small world of human language. Proceedings of the Royal Society of London. Series B: Biological Sciences, 268(1482), 2261-2265.

Cong, J., & Liu, H. (2014). Approaching human language with complex networks. Physics of Life Reviews, 11(4), 598-618.

Masucci, A. P., & Rodgers, G. J. (2006). Network properties of written human language. Physical Review E, 74(2), 026102.

Senekal, B. A., & Geldenhuys, C. (2016). Afrikaans as a complex network: The word co-occurrence network in André P. Brink's Donkermaan in Afrikaans, Dutch and English. Suid-Afrikaanse Tydskrif vir Natuurwetenskap en Tegnologie/South African Journal of Science and Technology, 35(1), 9 pages.
