Notes on ORBIS

Having exhausted the supply of super-technical geoparsing/geocoding articles that I’ve been doing write-ups of lately (and also admittedly being exhausted by them), I’ve decided to switch my reading focus to mapping work that has been occurring in the digital humanities world. There are certainly a lot of folks interested in bringing mapping to disciplines where it hasn’t traditionally been utilized in scholarship, so I’m going to read about the spatial humanities for my last required post.

I stumbled upon the Digital Humanities Specialist blog from Stanford University today. Poking around there then led me to a project I had never heard of called ORBIS: The Stanford Geospatial Network Model of the Roman World. I’m not terribly plugged into the DH world, but I did grow familiar with many of the larger spatial humanities projects through DHSI, so yes, I was surprised to see a project of this scale pop out of the woodwork.

ORBIS is a tool that allows users to experience the communication constraints of those living in the Roman world by approximating the realities of travel: users can plug in whatever options they desire, from the time of year to the method of travel. The system then processes the user’s choices and produces a large map detailing the most efficient route a Roman traveler would take. Admittedly cool, even for a non-scholar. Unsurprisingly, some people have compared it to Oregon Trail!

Looking into the creators’ documentation makes it clear that they want to provide transparency about their historical information, which is excellent–so often I’ve seen DH sites that put up a decently pretty visualization and little else to ground users in the context of what is being conveyed. I was disappointed to see that the geospatial technology behind the maps wasn’t explained, though blog posts shed light on this to some extent.

With ORBIS, Mapping the Republic of Letters, Barbara Hui’s litmap, and a whole spate of other spatial humanities projects weaving together a supportive online community, it is clear that mapping skills are gaining traction as a real asset within academia and libraries. Because libraries are often the midway point between the IT side and academia, I think librarians’ role in shaping mapping projects will continue to grow. Granted, these may be alt-ac “specialist” positions that are part humanities scholar, part librarian, but that’s fine. Skill sets are all churning together and I think it signals good things for institutions. However, I hope that the education of both librarians and those in humanities fields develops to fit these needs; right now, that gap may be the main problem.

ORBIS: http://orbis.stanford.edu/

Final Reflections

This week, my Map-Based Discovery Services internship comes to a close. My final evaluation was this afternoon and I’m officially done. It’s strange to think about it being over but I will admit that I need a break. Moving forward, the DLP is going to look at ways to assist collection managers in creating mappable data if they are interested in maps as a discovery mechanism for their images, but unless they choose to have a service clean up their data I’m not sure if mapping all of ICO is feasible.

At times I’ve thought to myself, Could I have done more? Should I have done more? Did I learn enough?

I’ve felt that way as I headed into this last week. Then came the Special Libraries Association Annual Conference. In this muddle of conflicted thoughts I headed to Chicago to present my poster. The poster session was from 7-9pm this past Tuesday… I left Bloomington that afternoon, drove to the Hilton in downtown Chicago without causing grievous bodily harm to myself or others, snacked on the fancy (and free) hors d’oeuvres, and set up my 8ft-long poster.

It was actually a great experience. People were interested, really interested, and I was able to speak knowledgeably. It was at that point that I realized hey hey, I’ve come a long way from knowing absolutely nothing about geospatial information. I felt proud. Although this internship was heavily research-based, I’m confident that we have done all the exploring possible within our timeframe. This is what it’s like to be a real librarian working with digital projects; sometimes the research adds up to a refocusing of goals, or a consensus that the bright shiny perfection that was hoped for isn’t currently attainable. That’s okay. Things were accomplished and I gained a closer relationship with two smart and savvy gals: my co-intern Tassie and my supervisor Michelle.

I feel good about where things ended. I also have the feeling digital mapping will resurface in my life sooner rather than later…

Notes on “Digital Preservation of Geospatial Data”

Digital preservation has long been an interest of mine. As we create more digital objects, how will we preserve them? What is worth preserving and who gets to make those decisions? There are many important questions that spring to mind, and this article attempts to begin formulating the answers–at least in relation to geospatial information.

According to the article: “Digital geospatial data is now routinely found in libraries that carry cartographic data, geologic information, social science datasets, and other materials in support of disciplines using Geographic Information Systems (GIS) in their research and work. Over the course of years, the data have been received on floppy disks, CD-ROMS, DVDs, and hard drives or are available for free or for a fee over the Internet” (305). The authors go on to point out that the preservation of physical objects and documents depends upon ideal conditions in which they are handled as little as possible. Digital objects, by contrast, need to be “manipulated on a regular basis if they are to survive.” And with constantly changing formats, the needs of digital preservation differ quite significantly from the needs of physical preservation.

The article mentions a project funded by the National Digital Information Infrastructure and Preservation Program (NDIIPP) in which a model for archiving digitized resources like maps was created. Another such grant-funded project was the creation of the National Geospatial Digital Archive (NGDA), a project designed to enable “repository structures at each university and to collect materials across a broad spectrum of geographic formats” (306). The NGDA groups eventually decided to build their own registry, given that no one else had satisfactorily taken on the task they were attempting. The initial formats selected for preservation were digital orthophoto quarter quadrangles (DOQQs), digital raster graphics (DRGs), Environmental Systems Research Institute (ESRI) shapefiles, and Landsat imagery, though they will continue to assess formats for future inclusion.

A portion of the article is devoted to copyright considerations. Though a lot of geospatial information is freely available through the government, ESRI shapefiles created through the ArcGIS software pose copyright issues. The base maps provided by the company could constitute copyright infringement, depending on which repository is storing them. The article mentions that the NDIIPP plans to begin talks with ESRI and other commercial data suppliers, so an agreement may already have been reached.

One of the questions I mused about at the beginning of this post, What is worth preserving?, is addressed through the idea of “at-risk” materials. The case is made that geospatial data is often at risk, and the authors give the following four reasons:

1. “The sheer magnitude of geospatial data being created and in existence makes it nearly impossible to collect it all for the future without significant efforts toward collaborative collecting models” (310).

2. “In addition to the volume of data being produced, geospatial data are often updated and changed, creating the need to save different versions of the same information. How often the versions are collected will have to be decided on a case-by-case basis.”

3. “Government geospatial data may well be considered at risk given the sensitive nature of some of the information, the decentralization of the computing environment, the lack of distribution of digital content that used to come to libraries as part of the Federal Depository Library Program, and the ease with which content can be removed from a government Web site.”

4. “Geospatial data is also potentially at risk for long-term preservation when it is produced by a small group or a single person. The ease with which content is now created and displayed has caused an explosion of small producers of high-quality geospatial content. Digital preservation requires a good deal of planning and expertise. It may also be prohibitively expensive to undertake” (310).

I’ll admit, the authors won me over. I can certainly comprehend the difficulty of handling so many odd geospatial formats–after learning about all of the file types within ArcGIS at DHSI, I get it. I am glad that institutions are taking steps toward tackling these important and challenging issues.

Sweetkind-Singer, J., Larsgaard, M., & Erwin, T. (2006). Digital Preservation of Geospatial Data. Library Trends, 55(2), 304-314.

Final Considerations for Digital Mapping in the DLP

During our meeting earlier this week, I posited some final considerations for our internship. This is a brief summary of those considerations, in no particular order–I think mulling them over will be very important for this mapping project to progress.

Final Considerations

  • First of all, which collections does the DLP want to map? It seems unrealistic to think that all can be mapped without a huge team effort put forth. Focusing on a few—perhaps the Liberian collections—seems like a better plan.
  • Even if the DLP wants to focus on a few, that still comes with a LOT of metadata cleanup. In my opinion, a staggering amount. Either way there is not a simple and painless way to map the collections. It will require a dedicated effort.
  • After the data is in an address or lat/long format, the process of getting it mapped is not a problem—many tools exist to facilitate this process. At this point, the DLP just needs to decide what sort of time and energy, if any, can be dedicated to the project.
  • Are Google Fusion Tables and Modest Maps the final contenders? Google has a whole cadre of related tools and services, while Modest Maps is appealing as an open source option that gives users direct access to the JavaScript; either is a viable mapping option. Modest Maps would probably need to be used in conjunction with Wax or another of the available extensions, however.
  • The person who would be responsible for mapping should do a trial run of each process to determine which is the most efficient, based upon Tassie’s and my findings. They will be able to determine what sort of workflow is possible… especially when it comes to the programming. (As someone with only a basic understanding of JavaScript, I found it a challenge to evaluate Modest Maps. A true assessment may only be possible if a programmer, or someone in the role that would likely be responsible for the mapping, were to play around with it for a few hours.)
  • If the mapping of archival collections becomes a priority, baseline requirements for newly created location-related metadata should be implemented to facilitate the geocoding process and reduce the need for retroactive cleanup. These requirements/suggestions can then be shared with the relevant catalogers.

At the meeting a few good suggestions were made. Someone suggested that the mapping become an archive-driven project… the DLP could have guidelines, best practices, tutorials and maybe even some tech support but the individual collections would have to make the decision on their own about whether the work associated with mapping was worthwhile. The DLP could then help facilitate the process without taking on a huge workload. Another suggestion was that an outside company could be hired to clean up the data because of the amount of work it would take.

I’ll be catching up on readings, writing abstracts, completing my MODS class, and making additional Google Fusion Table prototypes in the coming days and weeks. Oh yeah, and my SLA poster session is tomorrow. Lots to do…

Notes on “Metaloging of Digital Geospatial Data”

I was drawn to this article because my internship has been much more metadata-focused than I expected, and also because I took workshops on Encoded Archival Description (EAD) and Metadata Object Description Schema (MODS) this summer, which brought a lot of formerly unfamiliar concepts from text encoding and cataloging practice together for me. Plus “metaloging” just sounds intriguing, doesn’t it?!

This article describes metaloging as cataloging records using standards other than International Standard Bibliographic Description (ISBD) or Anglo-American Cataloguing Rules (AACR). Having never heard this term before, I’m glad the author clarified; it’s not as odd as I thought it might be. She then goes on to discuss ways that commonly used standards affect the cataloging of geospatial data. The standards she singles out are Dublin Core, MARC21, and XML (specifically METS and MODS). While much of the background information supplied about these standards was review for me, it was nevertheless helpful to have her point out the fields and subfields where locations can be supplied. She includes lengthy examples of each standard, tags and delimiters included. She doesn’t make recommendations of one standard over another, but rather encourages libraries and catalogers to pick one existing standard and stick with it instead of inventing their own format.
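Since I’ll likely be thinking in MODS for the DLP’s images, here is a minimal sketch (my own, not taken from the article) of one place location information can live in a MODS record: a subject element wrapping a geographic place name and, optionally, cartographics/coordinates. The place name and coordinates below are purely illustrative.

# A rough sketch (not from the article) of location data in MODS:
# <subject><geographic> for a place name and
# <subject><cartographics><coordinates> for coordinate data.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def mods_location_subject(place_name: str, coordinates: str) -> ET.Element:
    """Build a minimal <mods:subject> carrying a geographic name and coordinates."""
    subject = ET.Element(f"{{{MODS_NS}}}subject")
    geographic = ET.SubElement(subject, f"{{{MODS_NS}}}geographic")
    geographic.text = place_name
    cartographics = ET.SubElement(subject, f"{{{MODS_NS}}}cartographics")
    coords = ET.SubElement(cartographics, f"{{{MODS_NS}}}coordinates")
    coords.text = coordinates
    return subject

# Example: a Houston image with rough lat/long (values are illustrative).
print(ET.tostring(mods_location_subject("Houston (Tex.)", "29.76045, -95.369784"),
                  encoding="unicode"))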

This would make an excellent tutorial for someone interested in becoming a metadata librarian or cataloger, as it’s effectively a condensed survey of the basics of metaloging geospatial information. I was hoping the article would provide insight into best practices for those responsible for cataloging locations rather than just explaining the fields and tags, since that is likely what the DLP will seek to do, but unfortunately the article was not that in-depth. It is meant for complete beginners rather than seasoned catalogers/encoders.

Larsgaard, M. (2005). Metaloging of Digital Geospatial Data. Cartographic Journal, 42(3), 231-237. doi:10.1179/000870405X77183

Geocoding Options

In my report for the meeting that occurred today, I presented some options for geocoding tools: if we were to utilize something like Modest Maps, we would need a reliable geocoder. (With Google Fusion Tables, the geocoder is built in.) Since the future of our mapping project is still up in the air, it doesn’t hurt to do the geocoding research now.

Geocoding Options

There are many free geocoders available for use. While Google Fusion Tables has geocoding functionality built in, if the DLP wanted to use Modest Maps, a workflow for geocoding would have to be introduced before anything could be mapped (as opposed to happening simultaneously with the mapping). Modest Maps requires lat/long coordinates.

The Modest Maps developers suggest using Get Lat Lon (www.getlatlon.com) to retrieve lat/long coordinates from a place name. Amusingly, it’s built from a Google map with a basic search interface, so it’s straightforward and easy to grab coordinates if you have so few that you can do them manually. However, that’s not exactly practical for our large data sets. Geocoder.us is another option, but unlike Get Lat Lon it will not retrieve coordinates for anything less than a full address. Many images will never be assigned a full address, for many reasons, so that is a limitation. It’s also important to remember that geocoder.us will only give coordinates for places within the United States; ICO contains the Liberian collections, so this is also problematic.

GPS Visualizer provides the core functionality of an API (the ability to geocode multiple addresses quickly and efficiently) without introducing coding into the mix. It uses JSON requests as a way to work around the daily query limits of both the Yahoo and Google APIs, and it offers the user the option of grabbing either the Yahoo coordinates or the Google coordinates. I received an error when I attempted to get the Google coordinates, but the Yahoo ones worked. I copied the addresses I had cobbled together in my AAAMC spreadsheet (the one I mapped in Google Fusion Tables) and pasted them into the input box:

Auburn Avenue Northeast, Atlanta, Georgia
Almeda Road, Houston, TX
Pavilion, Houston, TX
Columbus, OH
8000 Euclid Ave, Cleveland, Ohio 44103
4445 Lake Forest Dr # 420, Cincinnati, OH
Miami, FL
Columbus, OH
Birmingham, AL
Birmingham, AL
13242 Northwest 7th Avenue, North Miami, FL
Birmingham, AL
Birmingham, AL
Birmingham, AL
Birmingham, AL
300 Bryant Road Conroe, TX 77303
Pavilion, Houston, TX
800 Bagby Street, Houston, TX
Houston, TX
Houston, TX
Houston, TX
3100 Cleburne Street, Houston, TX
Southern Pacific Railroad, Houston, TX

I got this:

latitude,longitude,name,desc,color
33.75576,-84.387924,"Auburn Avenue Northeast, Atlanta, Georgia",-,
29.582548,-95.431251,"Almeda Road, Houston, TX",-,
29.697339,-95.638495,"Pavilion, Houston, TX",-,
39.96196,-83.002984,"Columbus, OH",-,
41.503772,-81.632826,"8000 Euclid Ave, Cleveland, Ohio 44103",-,
39.253357,-84.384468,"4445 Lake Forest Dr # 420, Cincinnati, OH",-,
25.728985,-80.237419,"Miami, FL",-,
39.96196,-83.002984,"Columbus, OH",-,
33.520295,-86.811504,"Birmingham, AL",-,
33.520295,-86.811504,"Birmingham, AL",-,
25.895935,-80.211175,"13242 Northwest 7th Avenue, North Miami, FL",-,
33.520295,-86.811504,"Birmingham, AL",-,
33.520295,-86.811504,"Birmingham, AL",-,
33.520295,-86.811504,"Birmingham, AL",-,
33.520295,-86.811504,"Birmingham, AL",-,
30.343096,-95.458413,"300 Bryant Road Conroe, TX 77303",-,
29.697339,-95.638495,"Pavilion, Houston, TX",-,
29.761163,-95.369215,"800 Bagby Street, Houston, TX",-,
29.76045,-95.369784,"Houston, TX",-,
29.76045,-95.369784,"Houston, TX",-,
29.76045,-95.369784,"Houston, TX",-,
29.725138,-95.36341,"3100 Cleburne Street, Houston, TX",-,
29.76045,-95.369784,"Southern Pacific Railroad, Houston, TX",-,

To all appearances, GPS Visualizer is able to handle different levels of address completion, although I haven’t tested these coordinates in any way for accuracy.
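For what it’s worth, the CSV that comes back is easy to work with downstream. Here is a minimal sketch (my own, not part of GPS Visualizer) of pulling that output into a structure that could later feed a mapping tool; the two rows are copied from the results above.

# Parse GPS Visualizer's output (latitude,longitude,name,desc,color) into dicts.
import csv
import io

gps_visualizer_output = """latitude,longitude,name,desc,color
33.75576,-84.387924,"Auburn Avenue Northeast, Atlanta, Georgia",-,
29.76045,-95.369784,"Houston, TX",-,
"""

rows = []
for record in csv.DictReader(io.StringIO(gps_visualizer_output)):
    rows.append({
        "name": record["name"],
        "lat": float(record["latitude"]),
        "lon": float(record["longitude"]),
    })

print(rows[0])  # first geocoded place with its coordinates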

As far as geocoding goes, the Google Geocoding API is the elephant in the room. It’s an obvious frontrunner with a lot of documentation to back it up. However, the terms of service prohibit using the API without displaying the resulting coordinates on a Google map, so unfortunately it’s not really an option to use with Modest Maps… I don’t think. I’m still a little hazy on the exact Google terms of service, though.
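For reference, a single request to the Google Geocoding API looks roughly like the sketch below; given the terms-of-service question, this is purely illustrative and not something we would plug into Modest Maps. The API key parameter is an assumption about account setup, not something covered in this post.

# A hedged sketch of one forward-geocoding request to the Google Geocoding API.
import json
import urllib.parse
import urllib.request

def geocode(address: str, api_key: str):
    """Return (lat, lng) for an address, or None if the lookup fails."""
    params = urllib.parse.urlencode({"address": address, "key": api_key})
    url = f"https://maps.googleapis.com/maps/api/geocode/json?{params}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if payload.get("status") != "OK":
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# print(geocode("800 Bagby Street, Houston, TX", "YOUR_API_KEY"))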

If you’re reading this and you know of a decent geocoder, please let me know!

Google Fusion Tables

This week I’ve been playing around with Google Fusion Tables as I work remotely. Tassie and I are compiling documentation about our data issues and possible tools to present at a larger meeting next week, when we all reconvene. I thought for some of my upcoming weekly updates I could share some of the information I’ve compiled for my report. I’ll start with Google Fusion Tables, which I investigated the most thoroughly.

Google Fusion Tables (http://www.google.com/fusiontables/Home/) represents Google’s attempt to make databases more accessible and user friendly. Spreadsheets can be ingested into the user’s Google account, where they are stored in Google Docs under Table (beta). Maps created from them can then be embedded into a website’s HTML, where they retain their interactivity.

Projects using Google Fusion Tables can be attempted using the Classic version or the Experimental version (https://support.google.com/fusiontables/bin/answer.py?hl=en&answer=2475373). For the DLP’s purposes, I have used the Classic version, which is presumably more stable. Easy-to-follow tutorials (https://sites.google.com/site/fusiontablestalks/talks/fusion-tables-where-2-0-workshop) and examples (https://sites.google.com/site/fusiontablestalks/stories) are provided on the Google Fusion Tables website.

A major strength of Google Fusion Tables is that users don’t need coding expertise to create maps. While it takes some playing around with the various options, it is fairly intuitive for a first-time user. The main challenge in my experience was controlling how the data displayed (under the “configure styles” option). This means a student assistant could potentially be put in charge of the mapping, freeing up a librarian’s or programmer’s time.

A Google Fusion Tables API does exist, though I did not use this API as I created my prototype. HTTP requests can replace much of the manual work of creating, modifying, querying, styling and deleting tables. Google provides sample code, a basic introductory guide, and a developer’s guide (https://developers.google.com/fusiontables/).
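Based on my read of that developer’s guide, one of those HTTP requests might look something like the sketch below: pulling rows out of a public table with the SQL-like syntax the API uses. The exact endpoint, the table ID, and the API key are assumptions on my part; I have not actually run this.

# A rough, untested sketch of querying a (public) Fusion Table over HTTP.
import json
import urllib.parse
import urllib.request

def query_fusion_table(table_id: str, api_key: str):
    """Fetch all rows from a table via what I believe is the v1 query endpoint."""
    params = urllib.parse.urlencode({
        "sql": f"SELECT * FROM {table_id}",  # SQL-like query syntax
        "key": api_key,
    })
    url = f"https://www.googleapis.com/fusiontables/v1/query?{params}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)  # columns and rows come back as JSON

# result = query_fusion_table("EXAMPLE_TABLE_ID", "YOUR_API_KEY")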

The Process

For the purpose of visualization I chose to experiment with the AAAMC Black Radio Collection first. This collection’s metadata provided city names for most of the images with additional specificity found within unstructured fields. The AAAMC collection is a good example of low-hanging fruit for our mapping purposes.

  1. I opened the spreadsheet containing sample AAAMC metadata.
  2. Next, I opened Google Fusion Tables and went to Create –> Table (beta).
  3. After initially uploading the original spreadsheet here, I realized that it is most efficient to clean up the data in Excel prior to uploading it to Google Fusion Tables. I cleaned up the data so that only mappable, locationally relevant fields remained: Title, Description/Notes, Date Taken, Corporate Name, Country, State/Province, and City. This got rid of any irrelevant or empty fields. (FYI: It is also possible to create views in Google Fusion Tables, meaning this paring down could occur later in the process to clean up the user’s available information in the pop-up display window. A rough sketch of this cleanup step appears after this list.)
  4. I uploaded the new version to Google Fusion Tables. It displays in much the same way as a Google Spreadsheet. All cells are sortable and editable.
  5. Next, I selected Visualize –> Map. Google Fusion Tables automatically selected the fields that contain mappable data and began geocoding the field that occurred first, which happened to be Corporate Name. It then displayed the resulting map. Oddly enough, it did not map the multiple images associated with each location, only the one that occurred first. I switched the field being mapped (in the upper left corner), but this did not help.
  6. After additional trial and error, I returned to the original spreadsheet and examined the metadata more closely. I researched the addresses of all radio stations that still existed and plugged them into a new field, which I called Full Address. The hack I implemented when I had trouble getting addresses to map was to search for them using plain old Google Maps, find how Google Maps displayed them, and input that address string into my spreadsheet. It’s a bit of a circular process and quite reliant upon Google, which wasn’t optimal. However, this was how I managed to patch together locations that I otherwise could not have mapped. (Mapping Hofheinz Pavilion in Houston comes to mind; Google Maps recommended Pavilion, Houston, TX as the string and it worked. I could not have come up with that on my own, as Hofheinz Pavilion, Houston, TX would not map.)
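To make steps 3 and 6 concrete, here is a rough sketch of the cleanup logic in plain Python. The column names mirror the AAAMC spreadsheet, but the override table and the sample row are made up for illustration.

# Pare each row down to the mappable fields and build a Full Address, falling
# back to a hand-researched override (the Google Maps "hack") when needed.
MAPPABLE_FIELDS = ["Title", "Description/Notes", "Date Taken", "Corporate Name",
                   "Country", "State/Province", "City"]

# Hand-researched strings that the geocoder accepts (hypothetical example).
ADDRESS_OVERRIDES = {
    "Hofheinz Pavilion": "Pavilion, Houston, TX",
}

def clean_row(row: dict) -> dict:
    """Keep only the mappable fields and add a Full Address column."""
    cleaned = {field: row.get(field, "") for field in MAPPABLE_FIELDS}
    override = ADDRESS_OVERRIDES.get(row.get("Corporate Name", ""))
    if override:
        cleaned["Full Address"] = override
    else:
        parts = [row.get("City", ""), row.get("State/Province", ""), row.get("Country", "")]
        cleaned["Full Address"] = ", ".join(p for p in parts if p)
    return cleaned

sample = {"Title": "Radio station exterior", "Corporate Name": "Hofheinz Pavilion",
          "City": "Houston", "State/Province": "TX", "Country": "United States",
          "Photographer": "unknown"}  # "Photographer" is dropped as unmappable
print(clean_row(sample))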

Conclusions

  • Getting data into Google Fusion Tables is user friendly and fast, but the ICO metadata would take extensive cleanup to be mappable. This problem is exacerbated if multiple collections need to be merged for mapping at the same time.
  • The same daily geocoding limit (2,500 per day) exists for Google Fusion Tables and the Google Geocoding API, so this would have to be taken into consideration when the workflow is developed (a rough batching sketch follows this list).
  • The geocoder is automatic and flexible—it mapped locations I inputted in varying formats with different levels of address completion.
  • Thus far I have not been able to visualize multiple images in the same location using Google Fusion Tables; this is obviously a big problem, and I’m sure there is a way to do it, but I haven’t found it. It may be an issue with configuring the styles, which admittedly seems buggy and doesn’t have a lot of documentation. Currently only the first image at a given location will show up.
  • The Google Fusion Tables API provides programmers with an easy way to get the functionality of Google Fusion Tables without the manual work. However, student assistants without coding expertise could contribute to smaller collections using the non-API version if needed. There is a wide range of accessibility for workers with differing technology skill sets.
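As a trivial illustration of the daily limit point above, the geocoding work could simply be chunked into batches of at most 2,500 rows, one batch per day; everything here beyond the 2,500 figure is hypothetical.

# Split rows to be geocoded into daily batches that respect the 2,500/day limit.
DAILY_GEOCODE_LIMIT = 2500

def daily_batches(rows, limit=DAILY_GEOCODE_LIMIT):
    """Yield successive chunks of rows, one chunk per day of geocoding."""
    for start in range(0, len(rows), limit):
        yield rows[start:start + limit]

rows_to_geocode = [f"record {i}" for i in range(6000)]  # made-up workload
for day, batch in enumerate(daily_batches(rows_to_geocode), start=1):
    print(f"Day {day}: geocode {len(batch)} rows")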

Notes on “Issues in Georeferenced Digital Libraries”

Although it is an older article from D-Lib Magazine, I opted to read “Issues in Georeferenced Digital Libraries” because it seemed like it would continue to widen my knowledge of how digital libraries are using geospatial data. After reading it fully, I’d contend that it is one of the most helpful articles I’ve read for tying some of the big issues together, though it is ultimately out of date.

This article breaks down the issues referred to by the title into seven discrete sections: discovery, gazetteer integration, ranking, strong data typing, scalability, spatial context and resource access. The writers had knowledge of these issues because of their involvement with the Alexandria Digital Library Project, which began the creation of a georeferenced digital library all the way back in 1994. Despite the different systems they tried, they “noticed that, regardless of the architecture or technological approach, the same issues had arisen.”

Discovery was the most prominent issue the authors touched upon. I’d never given this much thought before this internship–questioning how people find things and how that process could be improved upon, that is. Of course, the discovery mechanism that everyone knows is text-based: type words into a box and get results. This is how much of the technology-using world does things. However, this isn’t necessarily the best option for all resources. Locations in particular are notorious for having multiple names, spellings, and variations; as the authors note, “there is no easy or manageable way to associate every possible placename with every library resource.” Their suggestion is to create a non-textual search by allowing users to select locations from a controlled vocabulary or to select from grouped coordinate ranges (which seems less useful; who knows lat/long coordinates?). While they don’t go so far as to suggest creating a map interface, these steps hint at the current trend of doing just that. Gazetteers are of course the tool that links specific formal place names to the names we actually know and use, and these are mentioned in conjunction with discovery. Tassie and I only realized later in our internship that the DLP has access to the TGN; otherwise that might have affected our work more.
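To make the gazetteer point concrete, here is a toy sketch (my own illustration, not from the article) of how many informal names or spellings can resolve to a single formal entry with coordinates; real gazetteers like the Alexandria Digital Library Gazetteer or the Getty TGN are of course far richer than this.

# A toy gazetteer: variant place strings resolve to one formal entry.
GAZETTEER = {
    "monrovia": {"formal_name": "Monrovia, Montserrado, Liberia",
                 "lat": 6.3106, "lon": -10.8048},
}

VARIANTS = {
    # variant or misspelling -> gazetteer key
    "monrovia, liberia": "monrovia",
    "monrovia (liberia)": "monrovia",
}

def lookup(place_text: str):
    """Resolve free-text place names to a single gazetteer entry, if possible."""
    key = place_text.strip().lower()
    key = VARIANTS.get(key, key)
    return GAZETTEER.get(key)

print(lookup("Monrovia, Liberia"))  # both variants hit the same entry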

The other considerations interested me less. I’ll be honest…the section on strong data typing went over my head for the most part. Scalability is less of an issue now than in 2004, though daily limits on geocoding still apply. All in all, while I enjoyed the style of the article, I’m not sure how helpful it is as applied to digital libraries and geospatial projects in 2012. However, as far as getting a basic overview of some of the challenges facing digital libraries handling geospatial information it was a helpful read.

Janee, G., Frew, J., & Hill, L. L. (2004). Issues in Georeferenced Digital Libraries. D-Lib Magazine, 10(5). Retrieved from http://www.dlib.org/dlib/may04/janee/05janee.html

On the Agenda: Modest Maps & Google Fusion Tables

Last week our little mapping team held a meeting to determine next steps for our project. With only 5 more weeks of Tassie’s and my internship left, it’s time to settle on what we want the final product to be… especially because July will be fairly busy for all of us. Tassie just began an Ajax class, I’m heading back to Wisconsin for a week over the 4th, and Michelle is preparing her dossier for a tenure review. We were all in a good place to move forward and make decisions about the direction of the internship.

We decided that Tassie and I will each create map prototypes using our data. The tools we will use to create the prototypes are Modest Maps and Google Fusion Tables. Based upon the ease of using these tools, we will then present our findings to the larger DLP team and provide more formal documentation regarding pertinent information about the tools.

I’ll really be able to sink into these final weeks: in order to reach 180 hours, the number I need to earn my 3 SLIS credits, I’ll be working on internship tasks for 18 hours each week (whether I’m on-site or off-site). I’m particularly excited by the prospect of playing around with Google Fusion Tables, which Tassie discovered last week. One concern voiced about it was its reliance on HTML5 and the possibility of alienating users with older browsers; however, it looks promising enough to warrant exploring it to the fullest before throwing it out.

I’m also looking into Google Refine, “a power tool for working with messy data.” Maybe this can help me with my metadata spreadsheets?

Notes on “From Text to Geographic Coordinates: The Current State of Geocoding”

In the Zotero bibliography we’ve slowly been building over the course of this internship, articles that directly relate to our project are dwindling. The two articles closest to case studies of what we are attempting to do were written about by Tassie here and here, and since we are trying not to overlap in our reading reflections I won’t be covering them.

While the article I’m writing about today isn’t directly applicable to our project (yet), it is germane in that it further widens my knowledge on this broad terrain of geospatial mapping considerations. It helped me develop a deeper understanding of the technical aspects of geocoding. Additionally, because the writers are coming from a geography background and don’t seem to have any connection to the library/archive world, I was reminded of the many other types of organizations that utilize geocoding capabilities, for tasks ranging from health to crime analysis.

So, first things first: what is geocoding, anyway? Distilled to its simplest, the authors describe it in terms of “input, output, processing algorithm, and reference dataset” (35). Four components. Address data is the most common input, which geocoding software then turns into latitude and longitude coordinates. Throwing coordinates in and getting addresses back is also possible; that’s referred to as reverse geocoding.
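As a quick illustration of both directions, here is a minimal sketch using the geopy library and the free Nominatim service; neither is mentioned in the article, they are just a convenient way to see forward and reverse geocoding side by side.

# Forward geocoding (text in, coordinates out) and reverse geocoding (the opposite).
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="mapping-internship-notes")

# Forward geocoding: address text in, coordinates out.
forward = geolocator.geocode("800 Bagby Street, Houston, TX")
if forward:
    print(forward.latitude, forward.longitude)

# Reverse geocoding: coordinates in, address text out.
reverse = geolocator.reverse((29.76045, -95.369784))
if reverse:
    print(reverse.address)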

Despite or perhaps because of its commonality, the authors make the point in this article that “address data are not the only type of locational data that can or should be geocoded” (35). They then go on to note the importance of the rise of gazetteers in geocoding, which pull in informal geographical places that may not have a specific address: “The development of multiresolution gazetteers defining geographic footprints for named geographic places such as the Alexandria Digital Library Gazetteer (Frew et al. 1998, Hill and Zheng 1999, Hill et al. 1999, Hill 2000) are pushing the limits of what type of geographic features can have geographic codes assigned to them (Davis et al. 2003, United Nations Economic Commission 2005), as well as the role of the geocoder in the larger geospatial information processing context. The proliferation of a variety of diverse types of locational addressing systems throughout the world precludes a “one size fits all” geocoding strategy that will work in all cases (Fonda-Bonardi 1994, Lind 2001, Davis et al. 2003, Walls 2003, United Nations Economic Commission 2005)”(34). Because I will likely have to work with gazetteers due to the unstructured metadata in Image Collections online, this exploration of gazetteers was of interest to me.

All in all, this was a heavy article to sift through (when they said it would be a technical article, boy did they mean it), but I am glad to have read it. The diagrams and charts were especially helpful in conveying challenging aspects of geocoding. I was wary of them at first (mostly due to captions like “schematic of deterministic address matching with attribute relaxation” and “sample address block with true parcel arrangement showing true geocoded point as ring”) but found that they were much more accessible than the text. I’m not going to lie: when this article went too deep into the calculations of centroids, parcel homogeneity assumptions, and interpolation algorithm deductions… it probably didn’t sink in quite so well. The lingo was well above my head. However, I was able to garner some new knowledge from the section of the article on persistent geocoding difficulties. From developing countries with little or no infrastructure to rural addresses and P.O. boxes here in the US, it turns out there are plenty of challenges for geocoders to solve.

Goldberg, D., Wilson, J., & Knoblock, C. (2007). From text to geographic coordinates: The current state of geocoding. URISA Journal, 19(1), 33-46.