Niek Veldhuis

Abstract: Clustering Sumerian Literature

By Niek Veldhuis, University of California, Berkeley

American Oriental Society Meeting 2017
Saturday March 18th afternoon - ANE IV, Bunker Hill Room
Omni Hotel, California Plaza, 251 South Olive Street, Los Angeles

The definition of the Sumerian literary corpus is open to debate. Traditionally, "literature" was understood primarily as an aesthetic category, describing texts that use parallelism, metaphor, and other stylistic devices (see, for instance, Edzard 2004). More recently, "literature" has been discussed in functional terms, as a group of texts used in scribal education (for instance Tinney 1999; Veldhuis 2004) or in the production of royal self-representation (for instance Michalowski and Rutz 2016). Whatever the outcome of this debate, it is fairly obvious that the group of compositions once considered "Old Babylonian Sumerian Literature" is not a single entity. Some compositions were copied widely and frequently, while others are attested in only a single exemplar. The present contribution will use computational methods to group compositions into clusters on the basis of their vocabulary. These clusters may serve as a starting point for further discussion of how each cluster coheres, how it differs from other clusters, and, ultimately, what that means for our understanding of Sumerian literature.
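Clustering by vocabulary presupposes that each composition is represented by its word counts and that a distance between two such representations can be computed. The following minimal Python sketch illustrates one common way to do this, namely cosine distance between word-count vectors. The three word lists are invented for illustration, and the choice of cosine distance is an assumption; the abstract does not state which measure is used.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each "composition" is an invented list of lemmas,
# not a sample from the actual corpus.
compositions = {
    "text_a": ["lugal", "e", "kur", "lugal", "an"],
    "text_b": ["lugal", "kur", "kur", "an"],
    "text_c": ["dub", "sar", "e", "dub"],
}


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 minus the cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


# One word-count vector per composition.
counts = {name: Counter(words) for name, words in compositions.items()}

# Pairwise distances between all compositions (each unordered pair once).
names = sorted(counts)
distances = {
    (x, y): cosine_distance(counts[x], counts[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Compositions that share much of their vocabulary end up with a small distance, and compositions with no words in common end up at the maximum distance of 1. A table of such pairwise distances is the kind of input on which a clustering method operates.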

The analysis will use two types of clustering: hierarchical clustering and K-means clustering. Hierarchical clustering is most effective for a relatively small set of data points. It will be applied to the group of heroic tales around Gilgamesh, Enmerkar, and Lugalbanda. It will be shown that the Lugalbanda and Enmerkar stories form two coherent and closely connected groups, whereas the Gilgamesh stories do not group together but are scattered across the clusters. This may indicate that the Gilgamesh narratives developed over a longer period of time, reflecting different needs and contexts. K-means clustering is useful for larger data sets and will be applied to the entire corpus of some 400 compositions. The resulting clusters partly overlap with the group of 'core-curricular' literary texts. Literary compositions in Sumerian may be differentiated according to genre or according to use. Clustering by vocabulary may add one more dimension to this debate.
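The two methods named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six labelled points are invented stand-ins for vocabulary vectors, Euclidean distance and average linkage are arbitrary choices, and nothing here reproduces the settings or data of the actual analysis. The sketch uses only the standard library; in practice a library such as SciPy or scikit-learn would more likely be used.

```python
import random
from math import dist

# Hypothetical 2-D points standing in for vocabulary vectors (invented data).
points = {
    "A1": (0.0, 0.1), "A2": (0.2, 0.0), "A3": (0.1, 0.3),
    "B1": (5.0, 5.1), "B2": (5.2, 4.9), "B3": (4.9, 5.3),
}


def average_linkage(c1, c2):
    """Mean pairwise Euclidean distance between two clusters of labels."""
    total = sum(dist(points[a], points[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(labels, n_clusters):
    """Hierarchical (bottom-up) clustering: repeatedly merge the two closest clusters."""
    clusters = [[label] for label in labels]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]


def mean_point(group):
    """Coordinate-wise mean of the points belonging to the labels in group."""
    return tuple(sum(points[l][d] for l in group) / len(group) for d in range(2))


def k_means(labels, k, iterations=20, seed=0):
    """K-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [points[label] for label in rng.sample(labels, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for label in labels:
            nearest = min(range(k), key=lambda c: dist(points[label], centroids[c]))
            groups[nearest].append(label)
        # Keep the old centroid if a group happens to be empty.
        centroids = [mean_point(g) if g else centroids[c] for c, g in enumerate(groups)]
    return [sorted(g) for g in groups]


labels = sorted(points)
hier = agglomerate(labels, 2)
flat = k_means(labels, 2)
print("hierarchical:", hier)
print("k-means:     ", flat)
```

The hierarchical routine compares every pair of clusters at every step, which is why that approach suits a small set of texts, while K-means only compares each point with k centroids per iteration and therefore scales more comfortably to a few hundred compositions. Note that K-means requires the number of clusters to be chosen in advance, and its result can depend on the random starting centroids.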