Social Tagging Medieval History

Social tagging has found an innovative use in “Persons and Things in Medieval Europe,” a history course taught by Professor Dan Smail at Harvard. Using the Collaborative Research Tool (developed with the assistance of the Instructional Computing Group), student teams explore a sampling of all the genres of sources available from a given region, ranging from archaeological site reports and art to chronicles, private acts, and saints’ lives. The students are assigned groups of sources, in translation, from a given region, and when they encounter passages of interest, they create and tag virtual note cards that are added to a database. Final research papers and even the lectures themselves are based on the primary sources compiled by students on the note cards.

Below is a sample virtual note card from the course:

virtualnotecard.gif

Social tagging offers a unique approach to collecting and analyzing the broadly distributed primary source materials for medieval European culture and society. By tagging key passages from a variety of works and entering them into a database, students can compare how different medieval texts describe, for example, dress or warfare. The note cards become the basis for a complex intertextual discourse on a broad range of medieval topics. Also, by having students tag the virtual cards using their own language, they begin to assimilate the primary sources in a more personal way. The tagging system also allows students to share their virtual cards with peers and view (in the form of a tag cloud) the overall tagging habits of the class as a whole.
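To picture how tagged note cards make such comparisons possible, here is a minimal sketch of a tag index in Python. It is illustrative only: the `NoteCard` fields, the sample passages, and the tag names are all hypothetical and are not taken from the Collaborative Research Tool itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class NoteCard:
    """One virtual note card: a passage, its source, and student-chosen tags."""
    source: str
    passage: str
    tags: set = field(default_factory=set)


def build_tag_index(cards):
    """Map each (lowercased) tag to the list of cards that carry it."""
    index = defaultdict(list)
    for card in cards:
        for tag in card.tags:
            index[tag.lower()].append(card)
    return index


# Hypothetical sample cards, invented purely for illustration.
cards = [
    NoteCard("Chronicle A", "The count rode out in a red cloak.", {"dress", "nobility"}),
    NoteCard("Saint's life B", "She wore only a coarse wool tunic.", {"dress", "piety"}),
    NoteCard("Chronicle A", "The town walls were besieged for a month.", {"warfare"}),
]

index = build_tag_index(cards)

# Which sources have something to say about dress?
dress_sources = sorted({card.source for card in index["dress"]})
print(dress_sources)
```

Looking up a single tag returns passages from every genre of source at once, which is the kind of side-by-side comparison the course relies on.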

Here is an example of a tag cloud with tags occurring 5 or more times:

histagcloud.gif
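A cloud like this one can be derived by counting how often each tag was applied and dropping the rare ones. The sketch below assumes the tags are available as a flat list (one entry per tag applied to a card); the sample data and the function name `tags_for_cloud` are hypothetical.

```python
from collections import Counter


def tags_for_cloud(all_tags, min_count=5):
    """Count tag occurrences and keep those used at least `min_count` times."""
    counts = Counter(tag.lower() for tag in all_tags)
    # Sort alphabetically so the result is ready for display.
    return {tag: n for tag, n in sorted(counts.items()) if n >= min_count}


# Hypothetical data: each entry is one tag applied to one note card.
class_tags = ["dress"] * 7 + ["warfare"] * 5 + ["relics"] * 2

cloud = tags_for_cloud(class_tags)
print(cloud)
```

With the threshold of 5 used above, "relics" (applied only twice in this made-up sample) is left out of the cloud.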

This particular social tagging application is very well defined: the user community is limited to students in the course, the sources that students read and tag are largely predetermined, and examples of appropriate tags are provided (although students ultimately choose their own). These parameters help to ensure that a common understanding exists in the classroom in terms of what is being tagged, by whom, and for what purpose.

It would be interesting to see the results of sharing tags and note cards between courses in different academic departments or institutions. The research of art history students studying medieval dress in paintings, for example, would be greatly enriched by the primary sources tagged by Professor Smail’s students, and vice versa.

Supporting Open Raw

Most high-end digital cameras offer users the option of downloading their images in RAW format. Using RAW allows photographers to preserve the maximum amount of original image data and maximum control over image processing. RAW conversion software (like Bibble, Silkypix, Adobe Camera Raw, and Capture One) allows photographers to adjust exposure, white balance, noise reduction, sharpness, and other elements after the exposure is made. It also eliminates the potentially negative artifacts of in-camera processing.

RAW format would be the ideal archival format for digital images if a single, open standard could be established by camera manufacturers. Unfortunately, they have developed their own proprietary RAW formats (currently over 200!), each one storing images and organizing raw image data in a unique way. This has made things difficult for RAW conversion software companies, who must constantly adapt their products for particular models, but even more so for serious photographers and institutions who cannot rely on RAW as a means of archiving the original image data that they capture. Already older digital camera RAW formats are not being supported by the latest software.

Because TIFF is “lossless” and widely supported by image-manipulation applications, it has become the preservation format of choice. However, if there were a standard version of RAW, it would be a far more appealing alternative for archiving digital collections. The reasons include:

- RAW reduces artifacts because no compression is used.
- RAW makes the full dynamic range of a camera’s sensor available for post-processing.
- RAW file sizes are about 1/3 to 1/2 smaller than TIFFs.
- RAW files could be saved “as is.” They would not have to be converted to another lossless format (like TIFF) for preservation purposes.
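To put the size figure above in concrete terms, the short calculation below turns "1/3 to 1/2 smaller" into an estimated range. The 36 MB TIFF size is a hypothetical example, not a measurement, and actual savings will vary by camera and format.

```python
from fractions import Fraction


def estimated_raw_size_mb(tiff_mb):
    """Return the (smallest, largest) RAW size implied by a 1/3-to-1/2 saving."""
    smallest = tiff_mb * (1 - Fraction(1, 2))  # RAW is half the TIFF size
    largest = tiff_mb * (1 - Fraction(1, 3))   # RAW is two thirds of the TIFF size
    return float(smallest), float(largest)


# Hypothetical example: one image stored as a 36 MB TIFF.
low, high = estimated_raw_size_mb(36)
print(f"Estimated RAW size: {low} to {high} MB")
```

Across a large collection the difference adds up: under the same assumption, 10,000 such images would need roughly 180 to 240 GB as RAW rather than 360 GB as TIFF (counting 1 GB as 1,000 MB).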

The need for an open RAW standard is supported by a growing number of individuals, organizations, software companies, and developers. Libraries, archives, museums, and academic institutions should join the call for manufacturers to develop a single digital image preservation format based on open documentation. This would ensure that the capture data for image collections would be in the most original and flexible format possible for the future.

In a comment to the original posting, Barry Pearson notes that, although no open RAW standard currently exists…

“Adobe’s DNG is supported by a growing number of individuals, organizations, software companies, and developers. (And some camera manufacturers). Camera manufacturers themselves won’t cooperate to develop an alternative to DNG.

The place where such a standard belongs is ISO’s TC42 WG18, and they are currently reviewing ISO 12234-2 (TIFF/EP), and examining DNG (which is based on the current version of TIFF/EP) to see what lessons it has for the revision.”

Mapping China: GIS in the Classroom

Geographically referenced information is being widely applied to fields such as medicine, public health, and environmental studies using the latest GIS tools. One such application, involving Chinese political history, is under way at Harvard.

Professor Peter Bol of Harvard’s Department of East Asian Languages and Civilizations uses geospatial imaging with his students to examine how modern and historic administrative boundaries in China relate to physical barriers like mountains, valleys, and rivers over time. The research requires downloading various datasets related to China from the Harvard Geospatial Library and other sources and then mapping the data using ArcGIS software.

Here are some illustrative examples provided by Professor Bol:

Map of the Qing empire in 1820, showing provincial boundaries and prefectural capitals, overlaid on a digital elevation model (based on remote sensing data from 1990). Source for 1820 data: CHGIS (China Historical GIS), downloadable through HGL.

bol1.gif

The next map, from the same source, shows the north China plain in 1820, with both provincial and prefectural boundaries. Comparing the red lines from 1820 with the green DEM, we can also see here how much the coastline has changed since 1820.

bol2.gif

The next map looks at the southern part of the same area in detail, with the addition of layers for county capitals and major populated places. Here the river system has been enhanced to show the relationship between settlement patterns and rivers, and to show the West-East path of the Yellow River at the point where it intersects with the Grand Canal (SE-NW) and flows into the sea in 1820.

bol3.gif

This is the same map, now without the DEM.

bol4.gif

The following map represents the prosperous southeast coastal area. The size of each yellow dot shows the prefecture’s total population. Green circles mark prefectural seats, and black triangles mark county seats.

bol5.gif

This could be represented with a color ramp instead, as below.

bol6.gif
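The two presentations above, dots sized by population and a color ramp, differ only in how a number is turned into a visual property. The sketch below shows both mappings in plain Python. It does not use ArcGIS or CHGIS data: the prefecture names, population totals, class breaks, and colors are all hypothetical, and scaling the dot area (rather than the radius) to the value is one common cartographic convention assumed here.

```python
import math


def dot_radius(population, max_population, max_radius=20.0):
    """Scale the radius so that dot AREA is proportional to population."""
    return max_radius * math.sqrt(population / max_population)


def ramp_class(population, breaks):
    """Return the index of the color-ramp class that a value falls into."""
    for i, upper in enumerate(breaks):
        if population <= upper:
            return i
    return len(breaks)  # above the last break: top class


# Hypothetical prefecture totals, invented for illustration.
prefectures = {"A": 250_000, "B": 1_000_000, "C": 4_000_000}
breaks = [500_000, 2_000_000]            # upper bounds of the first two classes
ramp = ["light yellow", "orange", "dark red"]

largest = max(prefectures.values())
for name, pop in prefectures.items():
    radius = round(dot_radius(pop, largest), 1)
    color = ramp[ramp_class(pop, breaks)]
    print(name, radius, color)
```

Sized dots keep the exact proportions visible, while a color ramp groups prefectures into a few classes that are easier to read at a glance.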

For additional information on GIS initiatives at Harvard and beyond, visit the Center for Geographic Analysis.

Tag Clouds: A Gloomy Forecast?

Tag clouds, commonly found on social bookmarking sites like Technorati, del.icio.us, and Flickr, are a visual depiction of either the number of times that a tag has been applied to a single item or, more commonly, the number of items to which a tag has been applied. Most tag clouds are arranged alphabetically, and more significant tags (those that are most often used by members of the tagging community) are depicted in a larger font, different color, or using some other form of emphasis.

tagcloud.gif
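The mapping from tag counts to font sizes can be sketched in a few lines. This is a generic illustration, not the method used by any of the sites named above: the logarithmic scale, the pixel range, and the sample counts are assumptions chosen for the example.

```python
import math


def font_sizes(tag_counts, min_px=12, max_px=36):
    """Map each tag's count (>= 1) to a font size in pixels.

    A log scale keeps one very popular tag from dwarfing all the others.
    """
    lo = math.log(min(tag_counts.values()))
    hi = math.log(max(tag_counts.values()))
    span = (hi - lo) or 1.0  # avoid dividing by zero when all counts are equal
    sizes = {}
    for tag in sorted(tag_counts, key=str.lower):  # alphabetical order
        weight = (math.log(tag_counts[tag]) - lo) / span
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes


# Hypothetical counts: number of items to which each tag has been applied.
sizes = font_sizes({"history": 100, "art": 1, "maps": 10})
print(sizes)
```

Note that a tag used 100 times is drawn only three times larger than a tag used once, which is one reason font size gives readers only a rough sense of the underlying numbers.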

Tag clouds are truly the product of a Web 2.0 approach to the organization of knowledge. Some of the benefits often cited for using tag clouds include:

- allowing users a non-linear form of content browsing, facilitating the discovery of new resources.

- allowing users to easily monitor and visualize the research and tagging trends of a community.

- instantly adapting to new usages of language and categorization through the use of folksonomies.

Some deficiencies of tag clouds as a classification system for serious researchers include:

- little or no visual connection between related tags, common concepts, or subcategories of major topics.

- indicating popularity (or some other metric) by font size, color, weight, etc. gives the user only a very vague idea about the relative quality or quantity of materials associated with the tag.

- non-hierarchical systems of classification usually do not convey the important relationships between records that hierarchical systems do.

- folksonomies can suffer from the fact that general users tag inconsistently (sometimes using singular or plural), inaccurately (misspellings), and often for their own purposes (e.g., using tags like “things to buy” instead of title, author, or subject terms).

- tag clouds make inefficient use of space on a Web page, so they can display only a limited number of terms. What tags are you missing?

- many users who first look at a tag cloud are confused by a list of seemingly random terms in different fonts. Most sites do a poor job of describing exactly how their clouds work and the best techniques for navigation and discovery.
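The inconsistency problem in the list above (case, stray spaces, singular versus plural) can be reduced with some simple normalization before tags are counted. The sketch below is deliberately naive and the sample tags are hypothetical; in particular, the plural rule is a crude assumption that will mishandle words such as "series".

```python
from collections import Counter


def normalize(tag):
    """Lowercase, collapse whitespace, and crudely fold a trailing plural 's'."""
    tag = " ".join(tag.lower().split())
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


# Hypothetical raw tags as different users might have typed them.
raw_tags = ["Castle", "castles", " castle ", "Glass", "relics", "relic"]

merged = Counter(normalize(t) for t in raw_tags)
print(merged)
```

Six differently typed tags collapse into three here, but misspellings and purely personal tags would still need human review or a controlled vocabulary.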

Recently, social bookmarking sites have begun adding a number of “options” to their tag clouds in order to help users gain more control over how their information is organized and presented. Del.icio.us users, for example, can view their tags in a “list” as well as a “cloud” format, sort their tags alphabetically or by frequency, limit their searches by number of tags, and even create bundles (a way to arrange previously used tags into subject groups). Some sites, like Flickr, have begun to define the scope of clouds more accurately: “Related tags” (tags specifically related to a particular search), “Hot tags in the past 24 hours,” “Hot tags over the past week,” “All time most popular tags,” etc. Finally, some sites, like LibraryThing, put the number of tags in parentheses beside the tag label: “J.K. Rowling (79,322).”
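These options amount to a few different views over the same tag counts. The sketch below imitates them in Python; it is not the code or API of any of the sites mentioned, and the function names, tag counts, and bundle names are hypothetical.

```python
def list_view(tag_counts, by="alpha"):
    """Return 'label (count)' strings sorted alphabetically or by frequency."""
    if by == "alpha":
        items = sorted(tag_counts.items(), key=lambda kv: kv[0].lower())
    elif by == "frequency":
        # Most-used first; ties are broken alphabetically.
        items = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0].lower()))
    else:
        raise ValueError("by must be 'alpha' or 'frequency'")
    return [f"{tag} ({count:,})" for tag, count in items]


def bundle_view(tag_counts, bundles):
    """Group previously used tags under user-defined bundle names."""
    return {name: [t for t in tags if t in tag_counts]
            for name, tags in bundles.items()}


# Hypothetical tag counts and bundles.
counts = {"gis": 12, "archives": 4, "raw": 7, "tiff": 1200}
bundles = {"imaging": ["raw", "tiff"], "maps": ["gis", "cartography"]}

print(list_view(counts))
print(list_view(counts, by="frequency"))
print(bundle_view(counts, bundles))
```

Showing the count beside each label, as in the list view here, gives the reader the exact figure that font size alone can only hint at.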

Tag clouds are moving in a positive direction. There is reason to be optimistic that, with careful implementation, they can be fashioned into tools for serious scholarly research. If so, we might actually welcome a cloudy forecast ahead!

Yahoo! Pipes: No Pipe Dream

Yahoo! has just released a beta version of an interactive feed aggregator and manipulator called Pipes (http://pipes.yahoo.com/). It allows researchers and librarians to “remix popular feed types and create data mashups using a visual editor.” Users can retrieve multiple feeds, reorder them, pass them through a content analysis tool, and deliver the final results as RSS to themselves or other users. Here is an example of how Pipes works:

  • A Pipe takes the New York Times homepage, passes it thru Content Analysis and uses the keywords to find Photos at Flickr. The results are then displayed and can be subscribed to as RSS. Here is what this particular pipe looks like:

    pipes.gif

Pipes can easily be configured to aggregate news sources or blogs on particular topics and then deliver the results into a new, single RSS feed. Pipes technology would be of special interest to those researchers who want to keep tabs on the latest developments in their field. It will undoubtedly attract political scientists, reporters, IT experts, media watchdogs, and investors looking for the latest information on a company, to mention but a few. Also, as more and more electronic databases and news sources provide integrated alert and RSS tools, Pipes technology could be applied to aggregate results from multiple e-resources and deliver them to users.
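For readers curious about what such an aggregation involves, here is a rough sketch of the same idea using only the Python standard library. It does not use Pipes or any Yahoo! API: the two sample feeds, their titles, and the example.org links are invented, and the helper names `read_items` and `merge_feeds` are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny, made-up RSS 2.0 feeds standing in for real sources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Open RAW petition grows</title><link>http://example.org/a1</link>
<pubDate>Mon, 05 Feb 2007 09:00:00 GMT</pubDate></item>
<item><title>Library hours change</title><link>http://example.org/a2</link>
<pubDate>Tue, 06 Feb 2007 09:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>RAW converters compared</title><link>http://example.org/b1</link>
<pubDate>Wed, 07 Feb 2007 12:00:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(rss_text):
    """Extract (date, title, link) tuples from an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (
            parsedate_to_datetime(item.findtext("pubDate")),
            item.findtext("title"),
            item.findtext("link"),
        )
        for item in root.iter("item")
    ]


def merge_feeds(feeds, keyword=None):
    """Combine several feeds, optionally filter titles by keyword, newest first."""
    merged = [entry for text in feeds for entry in read_items(text)]
    if keyword:
        merged = [e for e in merged if keyword.lower() in e[1].lower()]
    return sorted(merged, key=lambda e: e[0], reverse=True)


results = merge_feeds([FEED_A, FEED_B], keyword="raw")
for date, title, link in results:
    print(date.date(), title, link)
```

A real aggregator would fetch the feeds over the network and write the merged items back out as a new RSS document; those steps are left out here to keep the example self-contained.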