Voyant: A Reflection and Guide

Voyant, “a web-based tool” for “reading and analysis” of text, allows the user to discover the frequency of words in a single document or in a corpus, which is a collection of documents (Voyant Tools). It offers visual components that help the user examine and analyze words for text mining and topic modeling. In addition to the word cloud (Cirrus), Voyant includes several other tools such as Summary, Reader, Trends, and Contexts. As an interactive tool, Voyant allows the user to select a word in Cirrus and view how many times it occurs. The Summary provides information about the texts, such as how many times a word appears in each document and which words are distinctive in certain documents. The Reader displays the documents in which the selected word or words appear, with the word highlighted throughout the corpus, so it provides a visual reference for the frequency of the word within the corpus. Trends displays a line graph that gives another visual representation of the frequency of the word or words in a document or corpus. Contexts is an interesting tool that depicts where the selected word or words appear in various parts of the document or corpus. This tool “shows the surrounding text of the selected word/words” (Voyant Tools). All five tools in Voyant provide a visual reference for the frequency or occurrence of words in a document and corpus, and together they help the user further analyze the frequency and distinctiveness of words. Voyant encourages the user to think beyond the surface-level frequency of words; it raises questions about the impact of words depending on location, time, and other factors.
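To make the ideas of word frequency and surrounding context more concrete, here is a minimal Python sketch of the kind of counting that sits behind a word cloud and a contexts view. It is only an illustration and not how Voyant itself is implemented; the function names and the sample sentence are made up for the example and do not come from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how many times each word occurs in the text."""
    return Counter(tokenize(text))


def contexts(text, keyword, window=3):
    """Return (left, keyword, right) rows showing the words around each hit."""
    tokens = tokenize(text)
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


# Hypothetical sample sentence, used only to exercise the functions.
sample = "The river was wide and the river was cold."

for word, count in word_frequencies(sample).most_common(3):
    print(word, count)

for left, word, right in contexts(sample, "river", window=2):
    print(f"{left:>10} | {word} | {right}")
```

The first loop prints the most frequent words, which is the information a word cloud encodes as text size. The second loop prints each occurrence of the chosen word with a few words on either side, which is roughly what a contexts view shows.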

For my Voyant activity, I used the text files provided by Dr. Roberston. The dataset comes from the WPA Slave Narratives collection, which includes over a thousand interviews with former slaves from seventeen different states, recorded from 1936 to 1938. I copied the .txt files of the transcriptions, which come from Project Gutenberg, and pasted them into Voyant. After I selected the “Reveal” button, the next page presented a plethora of visual displays for examination and analysis. What caught my attention the most was Voyant’s ability to provide different visual results, which allowed me to view the words in the selected text files in a different way. The words were no longer just words; instead, they became visually significant. The Cirrus tool, a visual word cloud, displayed the words in different colors, and the size of each word indicated its frequency. I also exported the word cloud on its own so that I could study it in more depth.

For the first activity, I selected two words, “come” and “one,” that had a high frequency in Cirrus. Then, I viewed the Trends graph to see the visual display of each word within a certain state and document. I was able to view the word “one.” Voyant would not allow me to scale the second word, “come.” The Trends graph became blank, and the Reader highlighted another word. By examining the Trends graph for the first word, “one,” I noticed that “one” appeared less frequently within the documents that I selected for Texas and Tennessee. However, its frequencies were higher in documents from other states such as Oklahoma and Missouri. It might be that the word was used less by either the interviewer or the interviewee. Also, the transcriber in one state may have used the word more frequently than the transcribers in other states. The frequencies of the word can be subjective because the context in which the word was recorded can reflect someone else’s point of view and not necessarily the interviewee’s.

When I added more text from the list provided, I noticed that there was a change in the word cloud. For this activity, I referenced the Cirrus word cloud and selected “dey” and “dat” as the two most common words that appear in the corpus. For each state selected (Texas and Mississippi), the Trends graph displays the frequencies of the word in that state’s documents. However, when the Trends graph is exported for the second document for each word, the graph looks similar to the first graph. I do not know how to fix the error in Voyant. When I look back at the Trends graphs (before export) for each word and each state, I can see the changes in frequency of the selected word. Depending on the document segments, the frequencies of “dey” and “dat” differ for each state, which suggests that the interviewees in one state used a word more frequently than the interviewees from the other state. For example, “dat” is used more frequently in Texas than in Mississippi because the Trends graph shows higher frequencies of the word in the Texas documents. There is a lower frequency of “dat” in document 4 for Texas. For Mississippi, the lowest frequency of “dat” is in document 12. Another example would be the frequencies of “dey” in the documents from Texas and Mississippi. The Trends graphs show high frequencies of the word for both states. However, the word appears less frequently in document 4 for both states. For Mississippi, there is another low frequency of “dey” in document 12. Both states show a lower frequency of “dey” in document 17. The Trends graphs for the frequencies of “dat” and “dey” in Texas and Mississippi show that word usage differs between states in the South. Also, context and dialect play a role in how frequently the words were recorded by the interviewers.
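When comparing a word across documents of different lengths, it can help to look at relative frequency (occurrences per a fixed number of words) rather than raw counts. The sketch below is a hedged illustration of that idea in Python; it is not Voyant's own code, and the function name, the per-1,000-words scale, and the two short sample documents are assumptions made up for the example.

```python
import re


def relative_frequency(text, word, per=1000):
    """Return occurrences of `word` per `per` tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero for an empty document
    return tokens.count(word) * per / len(tokens)


# Hypothetical documents of different lengths, not taken from the corpus.
documents = {
    "short_doc": "one by one they came",
    "long_doc": "one day they came and they stayed and they worked",
}

for name, text in documents.items():
    print(name, relative_frequency(text, "one"))
```

In this made-up example the word appears twice in the short document and once in the long one, but the relative frequencies differ by a larger factor because the short document has half as many words. A line graph of such values per document is one way to read a trends-style display.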

For the distinctive word activity, I viewed the Summary, Trends, and Contexts tools more than Cirrus. The distinctive words suggest to me that the recording and transcription of them are subjective. A distinctive word can be used in different ways, and its meaning depends on the context. The word “ta” is distinctively used in Missouri, where it is used as a preposition in place of “to.” Compared to Missouri, “ta” is rarely used in the other states; therefore, “ta” is distinctive of Missouri. The second word, “hoo,” is used more frequently in Kentucky, and the graph shows that it is also rarely used in other states. Therefore, “ta” and “hoo” are words distinctive of Missouri and Kentucky, respectively.
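One common way to score how distinctive a word is for a document is TF-IDF, which rewards words that are frequent in one document but appear in few of the others. The Python sketch below illustrates that general technique under my own assumptions; I am not claiming it reproduces Voyant's exact calculation, and the function names and tiny sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_words(docs, top_n=3):
    """Rank words in each document by a basic TF-IDF score.

    docs maps a document name to its text. The score is
    (count / document length) * log(number of docs / docs containing the word).
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Number of documents in which each word appears at least once.
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())

    ranked = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        scores = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in counter.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[name] = ordered[:top_n]
    return ranked


# Hypothetical stand-in documents, not text from the corpus.
sample_docs = {
    "doc_a": "red apple red pear",
    "doc_b": "green pear green plum",
    "doc_c": "blue plum blue apple",
}

for name, words in distinctive_words(sample_docs, top_n=1).items():
    print(name, words[0][0])
```

In this toy corpus each document's top word is the color that appears only in that document, while words shared across documents score lower. A word like “ta” that is common in one state's documents and rare elsewhere would rank high under the same logic.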

Voyant is an engaging digital tool for examining and analyzing text from a document or a collection of documents (corpus). It provides a visual reference for looking at text from a different perspective, and it has several tools that let the user see the frequency of words visually. Voyant is not perfect, but it is a great tool for visualizing text in different ways. Despite the minor setbacks (mostly due to technical issues and glitches), Voyant goes beyond the visualization of words. It allows the user to think about the words in a document or corpus and begin to question how they affect certain social, political, and cultural aspects of humanity.

A Guide to Digitization

There are certain things to consider when digitizing items. First, the user needs to consider the type of item, whether it is a text or a physical object. When a text in a book or print source is captured, the focus is on the text. A photograph of an object suggests a three-dimensional view but displays only a two-dimensional one, so the user might consider capturing the object with a different digital tool. Second, the user has to consider the type of background and lighting for the item. For example, certain objects are captured best in dim lighting and others in bright lighting. Paul Conway contends, “Ideally rendering decisions take place under controlled lighting and through a carefully calibrated computer monitor, tools that may not be readily available to the most skilled user” (“Building Meaning” 5). Dark or light backgrounds can enhance or diminish the visual display of the text or object. This part of the digitization process is subjective and depends on the user’s skills and experience. Third, the user needs to consider that a three-dimensional view of the item is impossible to capture in a single photograph. However, a 3D view of the item can be better captured with a video because it can show all sides of the item from different angles, depending on the user’s experience with the video camera. The user can also record the sound of the item on video to enhance the visual experience and add another sensory detail.

Photographs capture images of the items and provide a visual reference for them. By making a video of the text, the user can guide the reader by zooming in and out of the image. Manoff cites N. Katherine Hayles by explaining that “a reader, viewer, or listener’s experience of a text is shaped by its material characteristics” (“Materiality of Digital” 313). The still image or photograph of the text allows the reader to view it while drawing on his or her own experience with textual resources. On the other hand, photographs do not capture other sensory details such as smell and some texture.

Video is the best form of digitization for different kinds of items. It provides a better view of the item from different angles. It displays close-ups of the item, enhances the view of the texture, and records sound. Like the photograph, the video does not capture smell. Depending on the videography, the item’s size can be determined. On the other hand, the photograph is a more suitable form of digitization for a text item since the text is a flat visual source, and the user is focusing only on the text. OCR (optical character recognition) is a great tool for complementing the photograph of the text because it converts the image of the text into text that a computer can read and search.

Working with digitized representations of items teaches scholars and digital humanists that an item in its original form changes once it is captured as a photographic image or video. Something does get lost in the digitized translation of each item. According to Melissa Terras’ article, “Digitisation and Digital Resources in the Humanities,” “Digitisation programs aim to create consistent images of documents and artefacts which are fundamentally individual and inconsistent, presenting a variety of physical attributes and capture requirements to the digitiser.” Perhaps the guidelines for digitization will become uniform for the sake of consistency, or they may change depending on the purpose, the research techniques, the availability of digital tools at the time, and the rise of new types of artifacts for digitization.
