Big Ideas for Geograph

Geograph is a web-based project collecting a large number of geographically located images, with the aim of achieving very broad coverage. The PHP source code is open source to facilitate similar projects in other countries.




Image Search/Browse technology

These projects aim at giving external users a better, faster and more visual browsing experience. The code developed will not be confined to Geograph - other projects collating mixed datasets of visual and textual information such as image or map galleries, conservation projects building databases of recorded species or museum collections will have similar needs and can benefit from these developments.


Sample selection algorithm

Given a reasonably sized collection of images, say in the range of 30 to 5000, create a resource-efficient method to generate a representative sample of, say, 20 images. The sample should not contain too many similar images, but rather show a wide range of images without excluding 'minority' sub-collections. The sample should also evolve slowly as more images are added. A bonus would be a few variations of the algorithm that produce different samples.
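One possible approach, offered only as a minimal sketch, is greedy farthest-point sampling: repeatedly pick the image that is farthest from everything picked so far, which tends to pull in 'minority' sub-collections. The feature tuples (for example scaled easting, northing and date) and the function name are assumptions for illustration, not part of the Geograph codebase.

```python
import math
import random


def farthest_point_sample(points, k, seed=0):
    """Pick up to k indices from points so that the picks are spread out.

    points: list of equal-length numeric tuples (hypothetical features).
    seed:   a different seed changes the starting image, giving a variant.
    """
    if not points:
        return []
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < min(k, len(points)):
        # Take the point farthest from everything chosen so far.
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] == 0:
            break  # only duplicates of already-chosen points remain
        chosen.append(idx)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return chosen


# 50 near-identical images plus two 'minority' outliers.
points = [(0.0, 0.0)] * 50 + [(10.0, 10.0), (0.0, 10.0)]
sample = farthest_point_sample(points, k=3)
```

The cost is proportional to the collection size times the sample size, which should be acceptable for a few thousand images.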


Faceted Browsing

Because our data is highly categorized, it lends itself well to browsing by so-called facets.
We've previously explored using specific technologies (noted below), but a similar home-grown system would also be possible.
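As a rough illustration of what a home-grown system would need at its core, the sketch below counts images per facet value and narrows a result set by the selected values. The record fields (`category`, `year`) and function names are hypothetical.

```python
from collections import Counter


def facet_counts(images, facets):
    """Count how many images fall under each value of each facet."""
    counts = {facet: Counter() for facet in facets}
    for image in images:
        for facet in facets:
            value = image.get(facet)
            if value is not None:
                counts[facet][value] += 1
    return counts


def apply_filters(images, selected):
    """Keep only images that match every selected facet value."""
    return [img for img in images
            if all(img.get(f) == v for f, v in selected.items())]


# Hypothetical records; the field names are assumptions.
images = [
    {"category": "Bridge", "year": 2009},
    {"category": "Bridge", "year": 2010},
    {"category": "Church", "year": 2010},
]
counts = facet_counts(images, ["category", "year"])
narrowed = apply_filters(images, {"year": 2010})
```

In a real deployment the counting would more likely be delegated to the search engine or database rather than done in application code.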


Example implementations and possible technology frameworks:

Interactive Graph Visualization

Create a visualization front-end to allow browsing of information and photos by exploring links between nodes in a 2D graph network. In particular, it needs an interface to link a graphical front end with the Geograph database. Screenshots from a TouchGraph prototype and another demo using the Arbor JS framework are available.


Photo Clustering

Given a large collection of images, find an automated way to group images into clusters, for example by geographic location, subject, date or a combination of these. The aim is to make browsing large collections easier. Rather than simply getting a long list of images, the user would get a good overview and be able to drill down into interesting areas. This could be implemented either as a bulk offline process to be displayed later, or as an interactive front-end.
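For the geographic case, a very simple starting point is to bucket images into square grid cells; calling the same function on one cluster with a smaller cell size gives the drill-down. This is only a sketch under the assumption that each image carries an easting and northing in metres, and the names are illustrative.

```python
from collections import defaultdict


def cluster_by_grid(images, cell_size):
    """Group image ids by the grid cell that contains them.

    images:    iterable of (image_id, easting, northing), in metres.
    cell_size: cell edge length in metres (e.g. 1000 for 1 km cells).
    """
    clusters = defaultdict(list)
    for image_id, easting, northing in images:
        cell = (int(easting // cell_size), int(northing // cell_size))
        clusters[cell].append(image_id)
    return dict(clusters)


# Made-up coordinates for illustration.
images = [(1, 100, 100), (2, 900, 950), (3, 1500, 200)]
overview = cluster_by_grid(images, 1000)
```

Clustering by subject or date could reuse the same shape by swapping the cell computation for a different key.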



Timeline View

Take an arbitrary collection of images (say, results of a search) and organize them by date taken. This will be particularly useful for aligning images of the same location to identify changes. If there are lots of images, there could be sliders to zoom into specific periods.
This could be powered by something like SIMILE or timemap.
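Independently of the front-end library, the data preparation could look roughly like this sketch: sort by date taken, optionally restrict to the period selected by the sliders, and bucket by year. The tuple layout and function name are assumptions.

```python
from collections import defaultdict
from datetime import date


def timeline_buckets(images, start=None, end=None):
    """Group image ids by year taken, in chronological order.

    images: iterable of (image_id, date_taken) pairs.
    start, end: optional inclusive date bounds (the 'zoom' range).
    """
    buckets = defaultdict(list)
    for image_id, taken in sorted(images, key=lambda item: item[1]):
        if start is not None and taken < start:
            continue
        if end is not None and taken > end:
            continue
        buckets[taken.year].append(image_id)
    return dict(buckets)


# Made-up dates for illustration.
images = [(3, date(2010, 5, 1)), (1, date(2005, 1, 1)), (2, date(2005, 6, 1))]
by_year = timeline_buckets(images)
zoomed = timeline_buckets(images, start=date(2006, 1, 1))
```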


Visually Similar Images

An interesting way to browse/search images would be by visual similarity. Firstly, we will need a method to compare images and to find similar/dissimilar images. A number of frameworks exist for comparing images, so the main task would be implementing one of these as a searchable database. A bonus would be a search interface that takes advantage of this data; for example we could either exclude groups of similar images (to get a broad selection), or group/cluster them by similarity (i.e. "find more like this").
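To make the idea of a comparable, indexable fingerprint concrete, here is a minimal sketch of one simple technique, an average hash compared by Hamming distance. It is not one of the frameworks referred to above, and it assumes each image has already been reduced to a small grayscale thumbnail supplied as a 2D list of numbers.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale thumbnail as an integer bit string.

    pixels: 2D list of grayscale values (e.g. an 8x8 thumbnail).
    Each bit records whether a pixel is brighter than the mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits; small values suggest similar images."""
    return bin(a ^ b).count("1")


# Tiny made-up 2x2 'thumbnails': dark top / bright bottom, and the reverse.
a = [[0, 0], [255, 255]]
b = [[10, 5], [250, 240]]
c = [[255, 255], [0, 0]]
```

Because the fingerprint is a plain integer, it can be stored in an ordinary database column and compared in bulk.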


'Term' Identification in freeform text

For example, given an image description like "A peaceful reach of the South Esk in winter, taken from the bridge near Clova Hotel.", automatically identify the terms useful for further searching; in this example, "South Esk" and "Clova Hotel" could be considered such terms. A site visitor could use these terms to search for related images in the area. The linked resource might make a good starting point.
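A deliberately naive baseline, shown only as a sketch, is to treat runs of two or more capitalised words as candidate terms. It will miss single-word names and lower-case terms, and the function name is illustrative.

```python
import re

# Two or more consecutive capitalised words, e.g. "South Esk".
_TERM_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_terms(description):
    """Return capitalised multi-word phrases found in a description."""
    return _TERM_PATTERN.findall(description)


text = ("A peaceful reach of the South Esk in winter, "
        "taken from the bridge near Clova Hotel.")
terms = candidate_terms(text)  # ['South Esk', 'Clova Hotel']
```

A fuller solution would probably check candidates against a gazetteer of placenames rather than rely on capitalisation alone.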




Related Images

Given the current image, find a list of related images by various means: geo-location, subject, timeline etc. Project deliverables are the algorithm to locate images and an interface to display them without being intrusive. This is basically a 'more like this' page, given an arbitrary image as input.



Develop GeoBrowser further

GeoBrowser is an interactive application for exploring a large collection of images by various means. It's designed to be a standalone project interacting via the Geograph API, running in JavaScript in the user's browser. This could be re-implemented using the linked resource.




Educational tools and games

The projects in this section develop applications which help students (in a classroom context or otherwise) to learn about maps and geography in a hopefully interesting and enjoyable way. Teachers can use some of these tools to build course materials to provide their students with a personalised and localised take on textbook topics. Code developed in these projects may be reused in projects in other countries, and the course material builder will be applicable to other subjects drawing on visual information.


Map/Photo Games

Having access to large numbers of geolocated images and interactive maps creates lots of opportunities for interactive educational games.


Possible examples:

Find features along a route - guide creator

A tool for virtual tourists. The user specifies a route (by drawing it on a map or by uploading a GPX file); the software translates this into centisquares of the Geograph grid along that route and looks up nearby points of interest in a database. Ideally, a user-configurable filter could determine which features are given prominence in areas where there are a lot of photos in the database. A nice feature would be if the route could be generated from the Google Maps Get Directions facility, so people could plan their travel according to interest en route. This would basically allow the creation of personalised travel guides based on information on Geograph.
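The route-to-centisquare step might be sketched as follows. It assumes the route has already been converted to grid eastings and northings in metres and that a centisquare is a 100 m by 100 m cell; the function name and sampling step are illustrative choices.

```python
import math

CENTISQUARE_M = 100  # assumption: a centisquare is a 100 m x 100 m cell


def centisquares_along_route(route, step=25.0):
    """List the centisquares a route passes through, in travel order.

    route: list of (easting, northing) points in metres.
    step:  sampling interval along each leg, in metres.
    """
    cells, seen = [], set()

    def visit(easting, northing):
        cell = (int(easting // CENTISQUARE_M), int(northing // CENTISQUARE_M))
        if cell not in seen:
            seen.add(cell)
            cells.append(cell)

    if len(route) == 1:
        visit(*route[0])
    for (e1, n1), (e2, n2) in zip(route, route[1:]):
        length = math.hypot(e2 - e1, n2 - n1)
        samples = max(1, math.ceil(length / step))
        # Walk along the leg in equal steps and record each cell entered.
        for i in range(samples + 1):
            t = i / samples
            visit(e1 + t * (e2 - e1), n1 + t * (n2 - n1))
    return cells


cells = centisquares_along_route([(50, 50), (350, 50)])
```

Sampling can skip a cell that the route only clips at a corner; a smaller step reduces that risk at the cost of more work.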



Themed collections of images

Build a tool to create a categorized and themed collection of images, along the lines of the linked examples.
The categorization could be crowd-sourced, but a hierarchy would first need to be defined. A search/browse interface would also be needed.
A linked article, saved searches and tags are very basic prototypes. The goal is to pre-select large quantities of images to help students and teachers browse geographical images.



Illustrated quiz system

Build a system to allow creation of multiple-choice quizzes. Each question or answer can be illustrated with images from Geograph. Users would be able to create and share quizzes. Visitors can fill out quizzes and compete on leaderboards.
A prototype exists.


Annotating images and footnotes

This project will help teachers to prepare course materials by being able to annotate and perhaps draw onto images and add footnotes. Of course there is wider application potential, e.g. outdoor enthusiasts could indicate routes up cliffs and mountains or down river rapids (an early prototype exists).



Curated Collections Creator

Create an interface for users to pick and choose from a large collection of images, to create a highly specific 'Gallery'. The interface should scale to potentially thousands of images, for example by seeding images from keyword search results. The idea is to keep control of the collection without having to copy/paste every single result.
A separate project would be to facilitate browsing of the collection by end users.
This has already been started: Geograph Portals.





Website Development

Port the site to a new Country

The site has already been ported to Germany, but there are plenty more countries out there!


Make the generic version of the site truly generic

We have started making a generic version of the code - using the current projects as a starting point. However, it still contains a number of wordings/features specific to one country. Cleaning up this code and putting all strings and messages in common files will make porting the site much easier.



Smartphone interface with mapping

A location-aware smartphone app that will allow plotting Geograph coverage data, images (selection narrowed down by user input) and image-specific data on a zoomable map. This needs to work with a number of different mapping and grid systems to allow international roaming coverage, e.g. OpenStreetMap, Ordnance Survey OpenSpace and Google Maps. The app should also allow direct upload while on the move. Two screenshots are available from a prototype developed for Geograph Deutschland.
This could either be a dedicated iPhone, Android etc. app (built to interface with APIs) or a specific HTML version of the website, probably using localStorage for offline use.


Content/Collection search

Besides images, Geograph has a wide range of 'content', such as Articles, Galleries, Local Discussions, Placenames, Shared Descriptions, Routes, and User Profiles. Some of this is geographical (either referring to precise point locations or to ill-defined general areas) but not all of it. The goal is to provide a single unified search/browse interface, in particular to 'find interesting stuff near here' or 'about this subject'.




User rating system

Develop and provide a system for site visitors to unobtrusively rate images. Once we have a large number of ratings, we will need a way to visualise the results.



Site-wide Filter

Build a site-wide filter, so that a user could e.g. filter the whole site to only show photos taken during a specific period or showing a specific geography. This would affect the search, maps, check-sheets, leaderboards and general site browsing. The two major tasks here are to identify all the places that could be filtered, and to implement 'namespacing' within each technology (search/map tiles/database query cache/smarty cache). A sort of lightweight version of this is implemented via Portals (http://www.geographs.org/portals/), which create mini-websites which have already been filtered.
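For the cache layers, 'namespacing' could be as simple as folding a stable token for the active filter into every cache key, so that filtered and unfiltered views never share entries. The sketch below is an assumption about how that might look; the key layout, the filter fields and the function names are all hypothetical.

```python
import hashlib
import json


def filter_namespace(site_filter):
    """Return a short, stable token for a filter; no filter maps to 'all'."""
    if not site_filter:
        return "all"
    # Sorting keys makes the token independent of dictionary ordering.
    canonical = json.dumps(site_filter, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]


def cache_key(component, item_key, site_filter=None):
    """Build a cache key that is scoped to the active site-wide filter."""
    return f"{component}:{filter_namespace(site_filter)}:{item_key}"


# Hypothetical filter fields for illustration.
key_a = cache_key("search", "q=bridge", {"from": "2005", "to": "2009"})
key_b = cache_key("search", "q=bridge", {"to": "2009", "from": "2005"})
key_c = cache_key("search", "q=bridge")
```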



Website/Template Redesign

Come up with a fresh new template for the site. In particular, help integrate it into the current framework, working out which features need tweaking to work with the new layout.




Other

Bulk data download server

We have an API for making small extracts (up to 1000 results) and site dumps of the whole database offering 2.8M images. There is a niche for mid-size dumps, e.g. getting all of a user's contributions (which may number 50,000 results), or all photos in a hectad (which can be 25,000+ images). This mechanism should perhaps be tailored to delivering between 1000 and 250,000 results in a single download.
This may take the form of an on-demand dump service, i.e. the user submits a 'request' and then the system prepares the dump and lets the user know when it's ready (as the dump could take minutes to prepare). We could perhaps offer a choice of CSV or mysqldump formats.
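The request/prepare/notify cycle could be modelled roughly as below. This is an in-memory sketch only: the class, its method names and the row source callable are hypothetical, and a real service would persist jobs and write files rather than strings.

```python
import csv
import io


class DumpQueue:
    """Toy on-demand dump queue: submit a request, process it, fetch CSV."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, row_source):
        """Register a request; row_source is a callable yielding rows."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"status": "pending", "source": row_source, "csv": None}
        return job_id

    def process_next(self):
        """Prepare the oldest pending dump; return its id, or None if idle."""
        for job_id, job in self._jobs.items():
            if job["status"] == "pending":
                buffer = io.StringIO()
                csv.writer(buffer).writerows(job["source"]())
                job["csv"] = buffer.getvalue()
                job["status"] = "ready"  # the point at which to notify the user
                return job_id
        return None

    def status(self, job_id):
        return self._jobs[job_id]["status"]

    def result(self, job_id):
        return self._jobs[job_id]["csv"]
```

Separating submission from processing means a slow dump never blocks the web request that asked for it.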



Streaming Server/Clients

We maintain the authoritative copy of the "Geograph Archive" in a MySQL database. There are lots of 'interested parties' that would like to maintain their own copy in close to real time. This could be used to power internal services (e.g. a sphinxsearch RT index), offsite backups (to log files/dumps), or third-party websites (e.g. portals). We also need a server-side component to either 'broadcast' the changes out or just publish them somewhere. Then client adapters can either receive data from the server or contact it periodically, and put the data into their host application (database/index/files etc.).

One option is PubSubHubbub, which would need a 'publisher' script that publishes a feed of updates and notifies subscribers.
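The 'contact it periodically' variant can be sketched with a sequence-numbered change feed: the server appends every change, and each client remembers the last sequence it applied. The classes below are purely illustrative and hold everything in memory.

```python
class ChangeFeed:
    """Server side: an append-only list of numbered changes."""

    def __init__(self):
        self._changes = []  # list of (sequence, record), sequence starts at 1

    def publish(self, record):
        sequence = len(self._changes) + 1
        self._changes.append((sequence, record))
        return sequence

    def since(self, last_seen, limit=100):
        """Return up to `limit` changes with sequence greater than last_seen."""
        return self._changes[last_seen:last_seen + limit]


class FeedClient:
    """Client adapter: applies changes to its own store, keyed by record id."""

    def __init__(self):
        self.last_seen = 0
        self.store = {}

    def poll(self, feed):
        """Fetch and apply new changes; return how many were applied."""
        changes = feed.since(self.last_seen)
        for sequence, record in changes:
            self.store[record["id"]] = record
            self.last_seen = sequence
        return len(changes)
```

A push-based design would replace `poll` with a callback invoked by the publisher, but the sequence number would still let a client catch up after downtime.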




Project auto-installer

While installing a copy of the site is relatively easy for an experienced web developer, there are lots of dependencies (php/mysql/apache/sphinx/memcache/redis etc.) which need configuring. To simplify this, build an installer that will check for the dependencies, install and configure them if required, and download the latest copy of the site code. The goal would be for someone to get a running copy of the site in less than half an hour!
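The 'check for' step could begin with something as small as the sketch below, which reports which required commands are not on the PATH. The executable names in the example are assumptions and differ between platforms; installing and configuring the missing pieces is left out entirely.

```python
import shutil


def missing_dependencies(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the
    real tools being installed.
    """
    return [name for name in commands if which(name) is None]


# Assumed executable names; adjust per platform and distribution.
REQUIRED = ["php", "mysql", "apachectl", "searchd", "memcached", "redis-server"]

if __name__ == "__main__":
    for name in missing_dependencies(REQUIRED):
        print(f"missing: {name}")
```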



Access Log Processing

We have years of Apache access logs but have never really analysed them. The analysis should be tailored to the structure of the site, e.g. to aggregate by photo, location or contributor. It could also identify patterns in how people arrive at the site, and the subjects people use to find the site in search engines.
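Aggregating by photo might start from a sketch like the one below, which counts successful requests per photo id. It assumes the logs use Apache's common or combined format and that photo pages live under a `/photo/<id>` path; both the path pattern and the sample lines are assumptions made for illustration.

```python
import re
from collections import Counter

# Request line and status code as they appear in common/combined log format.
_REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')
# Assumed URL layout for photo pages.
_PHOTO = re.compile(r"^/photo/(\d+)")


def photo_hits(lines):
    """Count successful (status 200) requests per photo id."""
    hits = Counter()
    for line in lines:
        match = _REQUEST.search(line)
        if not match or match.group("status") != "200":
            continue
        photo = _PHOTO.match(match.group("path"))
        if photo:
            hits[int(photo.group(1))] += 1
    return hits


# Made-up sample lines for illustration.
sample = [
    '127.0.0.1 - - [10/Oct/2010:13:55:36 +0000] "GET /photo/12345 HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '127.0.0.1 - - [10/Oct/2010:13:55:40 +0000] "GET /photo/12345 HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '127.0.0.1 - - [10/Oct/2010:13:56:01 +0000] "GET /photo/999 HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '127.0.0.1 - - [10/Oct/2010:13:56:09 +0000] "GET /search.php?q=bridge HTTP/1.1" 200 8120 "-" "Mozilla/5.0"',
]
hits = photo_hits(sample)
```

Processing line by line keeps memory use flat, which matters when working through years of logs.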


Resources

In fact many of the above projects could be done in isolation as a standalone project, rather than directly integrated into the main codebase. Of course the project SVN Repository etc can be used to hold code, but data could be processed remotely. On the other hand some features would be directly integrated into the website, in which case having a local development version of the site would be essential.







Creative Commons Licence [Some Rights Reserved]   Text © Copyright March 2010, Barry Hunter; licensed for reuse under a Creative Commons Licence.
With contributions by Rudi Winter and Penny Mayes.