Text Analytics, the Bible and the Qur'an

A company offering data mining and text analytics software recently used text analytics on three Abrahamic religious texts (the Old and New Testaments of the Bible and the Qur'an) in order to determine whether the Qur'an is substantially more "violent" than its Judeo-Christian counterparts, and also presumably to demonstrate their product through some eye-catching headlines.

This kind of analysis comes with a lot of caveats and cautions, and to the credit of the company involved (Anderson Analytics) there is an attempt to explain some of the limitations of the study. This is worth examining not only for the results but also as an example of the benefits and limitations of text analytics.

Image Credit: Ranoush


Text analytics is a method of data mining that extracts high-quality information from a text document. This is more complex than simply counting the number of "violent" words in each document: text analytics involves statistical pattern learning to "teach" programs to understand the text in a more meaningful manner than the individual words allow. This is done using a range of techniques, including statistical inference, structural and sentiment analysis, morphological segmentation and others.

Text analytics is not necessarily free of human judgement. Research can be conducted both bottom-up and top-down, with bottom-up research being data-driven and top-down research using pre-identified concepts and themes. The Abrahamic text study was largely conducted using top-down analysis.
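To make the top-down idea concrete, here is a minimal sketch in Python. It assumes a hand-built lexicon of concepts and simply measures what share of a text's words fall under each one. The concept names, word lists and sample sentence are invented for illustration and are not the categories or method used in the study.

```python
import re
from collections import Counter

# Hypothetical concept lexicon: these word lists are illustrative only,
# not the categories or terms used in the study.
CONCEPTS = {
    "violence": {"kill", "slay", "smite", "sword", "destroy"},
    "love": {"love", "loved", "beloved"},
    "forgiveness": {"forgive", "forgiven", "mercy", "merciful"},
}

def concept_shares(text, concepts=CONCEPTS):
    """Return, per concept, the fraction of tokens that match its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in concepts.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {name: counts[name] / total for name in concepts}

# Invented sample sentence, not a quotation from any of the texts.
sample = "Love your enemies and forgive them; put away the sword."
shares = concept_shares(sample)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The human judgement sits in the lexicon: whoever picks the concepts and the words that count towards them shapes the result before any text is processed.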

The potential benefit of this type of natural language processing is that language, particularly the often poetic and allusive language of the Biblical and Qur'anic texts, can be turned into data suitable for conventional analysis (including traditional quantitative methods).

We can take two phrases that both occur in the Qur'an: “they have nothing to fear” and “strike fear into their hearts”. While both invoke the concept of fear, one aims to reassure and reduce fear while the other is an exhortation to create fear. Without sufficient processing, the common use of the word "fear" itself signifies very little.

Without this natural language processing, machine-driven text analytics would be of limited value.
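The gap between counting words and reading them in context can be sketched in a few lines of Python. The cue-word lists below are hypothetical and chosen only to cover the two phrases quoted above; real text analytics tools rely on far richer statistical models than this toy rule.

```python
import re

# The two phrases quoted above.
PHRASES = ["they have nothing to fear", "strike fear into their hearts"]

# Hypothetical cue words, picked only to handle these two phrases.
REASSURING_CUES = {"nothing", "no", "not", "never"}
THREATENING_CUES = {"strike", "cast", "instil"}

def count_word(phrase, word="fear"):
    """Naive approach: count how often a single word appears."""
    return re.findall(r"[a-z]+", phrase.lower()).count(word)

def classify_fear(phrase, window=3):
    """Label a mention of 'fear' using cue words in the few tokens before it."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    if "fear" not in tokens:
        return "no mention"
    i = tokens.index("fear")
    context = set(tokens[max(0, i - window):i])
    if context & REASSURING_CUES:
        return "reassuring"
    if context & THREATENING_CUES:
        return "threatening"
    return "unclear"

counts = [count_word(p) for p in PHRASES]
labels = [classify_fear(p) for p in PHRASES]
print(counts)  # both phrases look identical to a word counter
print(labels)  # the surrounding words tell them apart
```

A plain word count gives both phrases the same score, while even this crude look at neighbouring words separates reassurance from threat.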


The results of the study are fairly interesting and run counter to some of the popular discourse regarding religion.

The outcome of sentiment analysis (evaluating intended emotional communication in the text) yielded broad similarity amongst all three texts, being approximately 30% positive sentiment and 20% negative sentiment.

A look at emotional analysis (evaluating distinct emotional states) gives us more differentiation between the texts, as shown by the side-by-side comparison below.

There is some similarity between the Old and New Testaments, albeit with the New Testament scoring far higher for trust and marginally higher for joy while the Old Testament rates higher for anger and sadness.

The Qur'an scores higher than the Bible for trust (by a wide margin over the Old Testament, a narrower one over the New Testament) and also for Fear/Anxiety. For Anger, the Qur'an sits between the two Testaments, with the Old Testament rating higher and the New Testament lower.

More traditional forms of analysis also produce some interesting results. The Old Testament contains the greatest number of references to violence and killing (5.3%) but the New Testament slightly edges the Qur'an for the same (2.8% over 2.1%).

The concept of love is most prevalent in the New Testament at 3.0% with the Old Testament (1.9%) and the Qur'an (1.2%) behind.

Forgiveness/Grace feature more prominently in the Qur'an, at 6.3% of the text, with the New Testament (2.9%) and the Old Testament (0.7%) less focused on the concept.

The interpretation of the researchers was that, while the Old Testament appears to be "significantly angrier" and contains more references to violence, there are considerable commonalities between the texts. The Qur'an and New Testament especially are similar in both "positive" emotions (such as mercy) and "negative" ones (such as dealing with enemies).

The headline question of "is the Qur'an more violent than the Bible" is answered in the negative.


As the authors admit, this is best seen as a "30,000-ft, cursory view of three texts" as there are a number of significant limitations.

The study itself is exclusively concerned with the Bible and Qur'an (which are central texts of their respective literatures rather than the sole texts), so drawing wider conclusions about religious practice or motivation would be deeply problematic. Examining a text tells us very little about the context in which it is read or the relationship between reader and text, or about the issues of interpretation and theology that surround religious literature.

Even with the understanding that this is simply a comparison between particular texts, there is more human involvement than it may appear. Significant choices must be made, both about the concepts chosen in "top-down" analytics and in decisions regarding particular issues.

The high association of the Qur'an with the concept of mercy appears to be partially related to the use of the terms "the most Merciful, the most Beneficent" that sometimes appear after the name of God. The decision to include this in the analysis rested on the argument that the non-traditional choice of honorific (the Bible and other religious traditions are more likely to use "Lord" or "Almighty") is itself significant and worthy of inclusion. 

While there is reasoning behind the decision, it is still significant human input into the process. Seeing this study as a purely machine-driven process would be misleading.

There is also a lack of context given to the text as historical and religious scripture. Although both the Bible and the Qur'an are often treated as fixed entities in the popular imagination, both are historical texts that were compiled over time (decades for the Qur'an and centuries for the Bible). An attempt to parse and compare the texts does require significant context to understand them as both historical and religious scripture.


Violence itself is not a fixed concept: some forms of violence are widely accepted as legitimate and others as illegitimate. Historical context is usually needed to interpret discussions of violence in scripture, just as context is used to interpret violence in the world around us.

Context can change our understanding of passages such as Matthew 10:34 from a threat of physical violence to a widely understood (at the time) motif of judgement or exhortation to moral struggle. Qur'anic verses such as 2:191 can go from a cry for violence to enforcing restraint on conduct during war-time and a stern warning against overzealousness.

Beyond contextualising references to violence, we can also acknowledge that violence itself is neither always binary (e.g. "good/bad" or "defensive/offensive") nor a fixed concept. Violence in religious texts, particularly Abrahamic scripture, is far more ambiguous and complex.

The central act of violence in the New Testament is undoubtedly the crucifixion itself. This act of torture and execution is not reviled as an act of injustice or a wrong; rather, a far more complex set of ideas is drawn from the event, including forgiveness, redemption and atonement. As Georges Bataille argued, perhaps the crucifixion is "the greatest sin and the greatest good, the most violent and the most communicative image in Christianity".

While text analytics can be extremely useful in many contexts, there are limitations to the depth of understanding that can be gained in cases such as this. While providing some interesting data, discussions of violence and scripture require greater context, nuance and detail.

