This project kept me working for a long time, longer than I expected.
The text I chose to work with was the Bible.
I chose it because it is one of the oldest texts available today, it has a lot of history, it has been the focus of
many studies, it is the guide book of the religion with the largest number of followers in the world, and it was also the first major book printed with movable type in Europe.
It is also interesting to me that, for a long time, it was a book read and interpreted by a few for the many. It has been used to control and guide Western culture in various ways.
I did some research on the internet about the analysis of this book, something I had never done before. I was very surprised to find so many projects and so much effort, money, man-hours and IT invested in studying and processing it.
One of the sites I visited was visual complexity, where I found this, which led me in several other directions.
One that caught my attention, and kept me working for a long time before I realized I was getting into something very complicated, was the semantic bible. There I found some XML files that I tried to parse in order to do some visualizations of POS and semantic meaning, but it turned out to be very difficult.
I also found this link, where I got a glimpse of the level of difficulty and seriousness involved in semantics and interpretation. I found the Open Scripture Information Standard, read about OWL, Mr. Steve DeRose, URIs and the semantic web, which I found really, really interesting… but that got me off the track of the POS visualization exercise.
I briefly looked for Spanish POS corpora, without luck, but found this, which talks about Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora… (a little out of my league for now, but Heather might find it interesting).
I went back to try a less complicated POS analysis of the Bible text, and found out about the many versions of the Bible in different languages on the bible gateway. I was surprised to see so many versions.
So this finally made me focus on a POS visualization that compares the same chapter of the Bible in different versions, to be able to “visually” compare and “see” the differences between them, and to be able to “navigate” through the POS analysis.
I used Brown corpus tags, and started to work from the visualization file provided by Heather.
I designed a visualization based on circles; that way, I think, there is less hierarchy implied in the design or positioning of the elements. Plus, the concentric circles and the interaction give the visualization a feeling of submerging into the text.
The elements used to make the visualization:
All the POS tags from the Brown corpus.
A religion model, made from the Brown corpus.
A counter for each POS tag found in the text file.
A counter for how many different POS tags are found in each text file.
A counter for the total number of different words used in each text file.
Words grouped by their POS identification.
An order in which each POS group is rendered: from the biggest set to the smallest, so that all the information is properly visualized.
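As a rough illustration of how the counters, the grouping and the rendering order fit together, here is a minimal sketch in plain Java. It is not the code from the project: the class name PosGroups, its method names and the hand-tagged sample verse are all made up for the example, and it assumes the tagger has already produced (word, tag) pairs. It also assumes that repeated words are counted once per POS group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the original project code.
public class PosGroups {

    /** Groups the distinct (lower-cased) words under their POS tag; each row is {word, tag}. */
    public static Map<String, Set<String>> groupByTag(List<String[]> tagged) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String[] pair : tagged) {
            groups.computeIfAbsent(pair[1], tag -> new LinkedHashSet<>())
                  .add(pair[0].toLowerCase());
        }
        return groups;
    }

    /** Counts the distinct words used across all POS groups. */
    public static int distinctWords(Map<String, Set<String>> groups) {
        Set<String> all = new LinkedHashSet<>();
        for (Set<String> words : groups.values()) {
            all.addAll(words);
        }
        return all.size();
    }

    /** Returns the tags ordered from the biggest word group to the smallest. */
    public static List<String> tagsBySize(Map<String, Set<String>> groups) {
        List<String> tags = new ArrayList<>(groups.keySet());
        // List.sort is stable, so groups of equal size keep their first-seen order.
        tags.sort(Comparator.comparingInt((String t) -> groups.get(t).size()).reversed());
        return tags;
    }

    public static void main(String[] args) {
        // Hand-tagged sample with Brown-style tags; a real run would take the tags from the tagger.
        String[][] sample = {
            {"In", "IN"}, {"the", "AT"}, {"beginning", "NN"}, {"God", "NP"}, {"created", "VBD"},
            {"the", "AT"}, {"heaven", "NN"}, {"and", "CC"}, {"the", "AT"}, {"earth", "NN"}
        };
        Map<String, Set<String>> groups = groupByTag(Arrays.asList(sample));
        System.out.println("Different POS tags: " + groups.size());
        System.out.println("Different words: " + distinctWords(groups));
        for (String tag : tagsBySize(groups)) {
            System.out.println(tag + " (" + groups.get(tag).size() + "): " + groups.get(tag));
        }
    }
}
```

Sorting the tags once, up front, means the drawing loop can simply walk the list from the outer ring to the inner one.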
The design of the visualization:
The circles are concentric.
Each circle's radius represents the number of words in that POS.
The number of concentric circles is the number of different POS tags found in the text.
Around each circle are the words found for that POS in the text.
The rendering: the outer ring is the POS with the most words, the inner ring is the POS with the fewest words. (This gave me a hard time, but I finally got through it with the help of the breakpoints and debug features in Eclipse, which let me debug the algorithm properly. It uses Java's ArrayList.)
The color palette selected was a gray scale. No other colors were used. I decided this to give it a more serious look, and to avoid an arbitrary selection of colors that could give some other emotional weight to the POS analysis.
The color of the whole visualization depends on the unique way the text was written (its own POS), which is why I think it is interesting to see visually (through shape and color) the differences between versions of the same chapter of the Bible.
There is no use of random in any part of the design of the visualization of the data. This keeps with the ideal that this book is “supposed to be not random”.
Each file produces a unique “color pattern”; just seeing them from afar gives a really interesting result about how different this very important book can be.
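The geometry described above can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not the project's code: the class RingLayout and its methods are hypothetical, the radius is assumed to scale linearly with the word count, the words are assumed to be spaced evenly around each ring, and the gray mapping (bigger group, darker gray) is just one possible deterministic choice. Nothing here uses random numbers, so the same text always gives the same picture.

```java
// Hypothetical helper, not the original project code.
public class RingLayout {

    /** Radius of a ring, scaled linearly so the biggest POS group gets maxRadius. */
    public static double radius(int wordCount, int largestCount, double maxRadius) {
        return maxRadius * wordCount / largestCount;
    }

    /** Position of the i-th of n words, spaced evenly on a circle centred on (cx, cy). */
    public static double[] wordPosition(int i, int n, double radius, double cx, double cy) {
        double angle = 2 * Math.PI * i / n;
        return new double[] { cx + radius * Math.cos(angle), cy + radius * Math.sin(angle) };
    }

    /** Gray level from 0 (black) to 255 (white); assumed mapping: bigger groups are darker. */
    public static int grayLevel(int wordCount, int largestCount) {
        return 255 - (int) Math.round(200.0 * wordCount / largestCount);
    }

    public static void main(String[] args) {
        int[] counts = {3, 1, 1};   // word counts per POS group, biggest first
        for (int count : counts) {
            double r = radius(count, counts[0], 300.0);
            System.out.println("radius " + r + ", gray " + grayLevel(count, counts[0]));
        }
    }
}
```

With a linear radius, groups of equal size would land on the same ring, so a real layout would need a small offset or a minimum gap between rings.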
The interaction of the visualization:
You are presented with a “tree cut”-like visualization.
You can zoom in and out to navigate through the POS rings and explore their words by using a “zoom slider” at the top of the window, without clicking the mouse.
You can click and drag the visualization to explore the words better.
You can restore the starting view at any time.
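The zoom, drag and restore behaviour can be thought of as a small view state that is applied to every point before it is drawn. The sketch below is a guess at how that might look, not the project's code: the class ViewState, its method names and the slider range are assumptions made for the example.

```java
// Hypothetical helper, not the original project code.
public class ViewState {
    private double zoom = 1.0;
    private double offsetX = 0.0;
    private double offsetY = 0.0;

    /** Maps a slider position t in [0, 1] to a zoom factor between minZoom and maxZoom. */
    public void setZoomFromSlider(double t, double minZoom, double maxZoom) {
        double clamped = Math.max(0.0, Math.min(1.0, t));
        zoom = minZoom + clamped * (maxZoom - minZoom);
    }

    /** Moves the view by the distance the mouse was dragged. */
    public void drag(double dx, double dy) {
        offsetX += dx;
        offsetY += dy;
    }

    /** Restores the starting view. */
    public void reset() {
        zoom = 1.0;
        offsetX = 0.0;
        offsetY = 0.0;
    }

    /** Converts a point of the visualization to window coordinates. */
    public double[] toScreen(double x, double y) {
        return new double[] { x * zoom + offsetX, y * zoom + offsetY };
    }

    public double getZoom() {
        return zoom;
    }

    public static void main(String[] args) {
        ViewState view = new ViewState();
        view.setZoomFromSlider(0.5, 1.0, 9.0);   // slider halfway
        view.drag(10.0, -5.0);
        double[] p = view.toScreen(2.0, 3.0);
        System.out.println(p[0] + ", " + p[1]);
        view.reset();
    }
}
```

Keeping the zoom and the offset separate from the ring geometry means the rings only have to be computed once per text file.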
The chosen chapters of the bible:
Genesis 1: the first chapter of the first book of the Bible.
Revelation 22: the last chapter of the last book of the Bible.
English versions of the Bible analyzed and visualized for POS:
Good News Translation.
King James Version.
New American Standard Bible.
New International Version.
Young’s Literal Translation.
I encourage you to try the interaction:
The executable JAR for the program, the .java file, the POS .model file used, as well as the .txt files used, can be downloaded here.
The JAR and the txt file have to be placed in the same directory; the JAR uses the genesis1ProjectGutember.txt file and the brownCKmodelReligion.model file.
The .java file can be downloaded and edited, but the LingPipe libraries have to be installed for it to run.
Images from the Genesis 1 visualizations: