Big Ideas for Geograph
Published: 9 March 2010
Contents
- Image Search/Browse technology
- Sample selection algorithm
- Faceted Browsing
- Interactive Graph Visualization
- Photo Clustering
- Timeline View
- Visually Similar Images
- 'Term' Identification in freeform text
- Related Images
- Develop GeoBrowser further
- Educational tools and games
- Map/Photo Games
- Find features along a route - guide creator
- Themed collections of images
- Illustrated quiz system
- Annotating images and footnotes
- Curated Collections Creator
- Website Development
- Port the site to a new Country
- Make the generic version of the site truly generic
- Smartphone interface with mapping
- Content/Collection search
- User rating system
- Site-wide Filter
- Website/Template Redesign
- Other
- Bulk data download server
- Streaming Server/Clients
- Project auto-installer
- Access Log Processing
- Resources
We have a new mini-app: Ideas for Geograph that lists potential ideas, with voting.
- This is a summary of some ideas for projects within the Geograph Project or simply to use the Geograph Archive of images. Presented here are some of the more standalone ideas, that would make a good project in their own right.
- We have lots of raw data to play with - including pretty pictures, all semantically referenced (primarily location, but also time, subject categorization etc.) - as well as large quantities of textual data. This leads to many possibilities for visualizations, interactive exploration tools, map mashups, and searching/browsing tools. Such tools could easily be reused for other non-Geograph sources of data.
- Alternatively you could choose to work on a project more directly involved with the website itself: enhancing features, new features, or making the open-source code even easier to reuse.
Image Search/Browse technology
These projects aim at giving external users a better, faster and more visual browsing experience. The code developed will not be confined to Geograph - other projects collating mixed datasets of visual and textual information such as image or map galleries, conservation projects building databases of recorded species or museum collections will have similar needs and can benefit from these developments.
Sample selection algorithm
Given a reasonably sized collection of images (say 30 to 5000), create a resource-efficient method to generate a representative sample of, say, 20 images. The sample should not contain too many similar images, but rather show a wide range of images without excluding 'minority' sub-collections. Also allow for a slowly evolving sample as more images are added. A bonus would be a few variations of the algorithm to produce a few different samples.
- Difficulty: Easy-Medium
- Requirements: PHP/MySQL
- Skills developed: Data processing/algorithms
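One possible starting point, sketched here in Python for brevity rather than the PHP/MySQL the project would use, is a round-robin pick across groups with a hash-based ordering. The grouping by a single `category` value and the function names are assumptions made for illustration, not an existing Geograph algorithm.

```python
import hashlib
from collections import defaultdict
from itertools import zip_longest


def stable_rank(value, seed=0):
    """Deterministic pseudo-random rank: the same value and seed always rank the same."""
    digest = hashlib.md5(f"{seed}:{value}".encode()).hexdigest()
    return int(digest, 16)


def representative_sample(images, size=20, seed=0):
    """Pick up to `size` image ids from (image_id, category) pairs.

    Images are taken one per category in turn, so small 'minority'
    categories are represented before large ones can dominate.
    """
    groups = defaultdict(list)
    for image_id, category in images:
        groups[category].append(image_id)

    # Hash-based ordering inside each group and between groups.
    for ids in groups.values():
        ids.sort(key=lambda image_id: stable_rank(image_id, seed))
    ordered = sorted(groups, key=lambda category: stable_rank(category, seed))

    sample = []
    # Round-robin: first pick from every group, then second pick, and so on.
    for picks in zip_longest(*(groups[c] for c in ordered)):
        for image_id in picks:
            if image_id is not None:
                sample.append(image_id)
                if len(sample) == size:
                    return sample
    return sample
```

Because the ordering depends only on the id and the seed, adding a few images should change only a few slots of the sample, and changing `seed` gives a different variation of the same sample.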
Faceted Browsing
Because our data is highly categorized, it lends itself well to browsing by so-called facets. We've previously explored using specific technologies (noted below), but a similar home-grown system could be built too.
Edit 2018: We have since built a new system for this: Link. It utilizes SphinxSearch and has pretty powerful faceting functions.
- Difficulty: Medium
- Requirements: Databases, data manipulation, coding.
- Skills developed: Website scaling for performance, Python, large dataset processing, database indexes.
Example implementations and possible technology frameworks:
- Flamenco is a search interface for browsing large information spaces. A prototype installation and dataset was tested here. We have nearly three million photos to display; Flamenco doesn't seem to scale to that number, so the task is to optimize it for larger datasets.
- An alternative might be possible with senseidb - a more modern system along similar lines. Again, a prototype installation has been tried for a small collection only; it didn't work when tried with over 10,000 records.
- Pivot from Microsoft Live Labs was an impressive faceted browser. It's no longer being developed, but is available here. It may still be feasible to get Geograph data working in the framework.
- imdb.cloudmining.net - is an excellent demonstration of the sort of thing that could be done. It is based on the fSphinx framework and SphinxSearch (which is already used heavily by Geograph).
- A self built prototype - built using javascript/jquery - interacting with the Geograph API.
Interactive Graph Visualization
Create a visualization front-end to allow browsing of information and photos by exploring links between nodes in a 2D graph network. In particular, it needs an interface to link a graphical front end with the Geograph Database. Screenshots from a TouchGraph prototype. Another demo using the Arbor JS framework.
- Difficulty: Easy
- Requirements: XML familiarity, PHP/MySQL for XML generation.
- Skills developed: UI Design, data processing
Photo Clustering
Given a large collection of images, find an automated way to group images into clusters, for example by geographic location, subject, date or a combination of these. The aim is to make browsing large collections easier. Rather than simply getting a long list of images, the user would get a good overview and be able to drill down into interesting areas. This could be implemented either as a bulk offline process to be displayed later, or as an interactive front-end.
- Difficulty: Hard
- Requirements: Algorithms and large-scale data processing.
- Skills developed: Data processing, Data Analytics, statistics and pattern processing
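As a very simple baseline for the geographic case, images can be bucketed into square grid cells, with a smaller cell size used when the user drills down. This is only a sketch: the `(photo_id, easting, northing)` tuples in metres and the function name are assumptions, and a real project would likely need something more adaptive than a fixed grid.

```python
from collections import defaultdict


def cluster_by_grid(photos, cell_size_m=10_000):
    """Group (photo_id, easting, northing) tuples into square grid cells.

    The key of each cluster is the (column, row) index of its cell.
    """
    clusters = defaultdict(list)
    for photo_id, easting, northing in photos:
        key = (easting // cell_size_m, northing // cell_size_m)
        clusters[key].append(photo_id)
    return dict(clusters)


# Hypothetical coordinates, for illustration only.
photos = [(1, 325000, 673000), (2, 326500, 674200), (3, 530000, 180000)]
overview = cluster_by_grid(photos)                    # 10 km cells
detail = cluster_by_grid(photos, cell_size_m=1_000)   # 1 km cells
```

At the 10 km level photos 1 and 2 share a cluster; at the 1 km level each photo falls in its own cell, which is the 'drill down' behaviour described above.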
Timeline View
Take an arbitrary collection of images (say, results of a search) and organize them by date taken. This will be particularly useful to align images of the same location to identify changes. If there are lots of images, there could be sliders to zoom into specific periods. This could perhaps be powered by something like SIMILE or timemap.
- Difficulty: Easy-Medium
- Requirements: PHP/MySQL/HTML
- Skills developed: UI development, search/browse technologies.
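The data side of such a view can be small. The sketch below (Python for illustration; the names and the assumption that dates arrive as ISO `YYYY-MM-DD` strings are ours) groups images into chronological buckets, with the granularity standing in for the zoom level of a slider.

```python
def timeline_buckets(images, granularity="year"):
    """Group (image_id, date_taken) pairs into chronological buckets.

    date_taken is assumed to be an ISO 'YYYY-MM-DD' string, so a bucket
    label is simply a prefix of the date.
    """
    width = {"year": 4, "month": 7, "day": 10}[granularity]
    buckets = {}
    # Sorting by date first means buckets are created in chronological order.
    for image_id, taken in sorted(images, key=lambda item: item[1]):
        buckets.setdefault(taken[:width], []).append(image_id)
    return buckets


# Hypothetical search results.
results = [(10, "2009-06-01"), (11, "1998-03-15"), (12, "2009-07-20")]
by_year = timeline_buckets(results)
by_month = timeline_buckets(results, "month")
```

A front-end could request `by_year` for the overview and switch to `by_month` when the user zooms into a period.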
Visually Similar Images
An interesting way to browse/search images would be by visual similarity. Firstly, we will need a method to compare images and to find similar/dissimilar images. A number of frameworks exist for comparing images, so the main task would be implementing one of these as a searchable database. A bonus would be a search interface that takes advantage of this data; for example we could either exclude groups of similar images (to get a broad selection), or group/cluster them by similarity (i.e. "find more like this").
- Difficulty: Medium
- Requirements: Ideally PHP/MySQL and the Linux environment. Experience with basic HTML is a bonus.
- Skills developed: Database/indexing. Data processing. Possible front-end development
'Term' Identification in freeform text
For example, given an image description like "A peaceful reach of the South Esk in winter, taken from the bridge near Clova Hotel.", automatically identify the terms useful for further searching, e.g. in the above "South Esk" and "Clova Hotel" could be considered such terms. A site visitor could use these terms to search related images in the area. Something like Link might make a good starting point.
- Difficulty: Medium
- Requirements: PHP/MySQL/HTML. Linguistics
- Skills developed: Text processing
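A naive first attempt, shown only to make the task concrete, is to treat any run of two or more capitalised words as a candidate term. The regular expression and function name below are illustrative assumptions; the heuristic will miss single-word names and will wrongly include a capitalised sentence opener such as "The South Esk".

```python
import re

# Two or more consecutive capitalised words, e.g. "South Esk".
CANDIDATE_TERM = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_terms(description):
    """Return runs of capitalised words that may be searchable place names."""
    return CANDIDATE_TERM.findall(description)


text = ("A peaceful reach of the South Esk in winter, "
        "taken from the bridge near Clova Hotel.")
print(candidate_terms(text))  # ['South Esk', 'Clova Hotel']
```

A fuller solution would probably check candidates against a gazetteer or the existing placename data rather than rely on capitalisation alone.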
Related Images
Given the current image, find a list of related images by various means: geo-location, subject, timeline etc. Project deliverables are the algorithm to locate images as well as the interface to display them without being intrusive. This is basically a 'more like this' page, given an arbitrary image as input.
- Difficulty: Easy
- Requirements: PHP/MySQL/HTML
- Skills developed: Code development, UI development. Data Processing
Develop GeoBrowser further
GeoBrowser is an interactive application for exploring a large collection of images by various means. It's designed to be a standalone project interacting via the Geograph API, running in javascript in the user's Browser. This could be re-implemented with Link.
- Difficulty: Easy
- Requirements: Experience with javascript/html a bonus
- Skills developed: UI Development, Design, javascript/html
Educational tools and games
The projects in this section develop applications which help students (in a classroom context or otherwise) to learn about maps and geography in a hopefully interesting and enjoyable way. Teachers can use some of these tools to build course materials to provide their students with a personalised and localised take on textbook topics. Code developed in these projects may be reused in projects in other countries, and the course material builder will be applicable to other subjects drawing on visual information.
Map/Photo Games
Having access to large numbers of geolocated images and interactive maps creates lots of opportunities for interactive educational games.
- Difficulty: Easy-Medium
- Requirements: Javascript/Flash experience ideally
- Skills developed: UI development, games, educational software
Possible examples:
- Draw a map based on a picture (or a few pictures) of a grid square, then compare it with the map.
- Map interpretation game. The player would be shown a map excerpt showing a camera position and view cone. They would then draw on a canvas a schematic diagram of what they see from that point. This would probably make a good smartphone app as it's easier to draw on a touchscreen than with a mouse. It would also need a few icons (houses, woods etc.) to drop on that canvas. At the end, the drawing can be compared with a photo of the scene.
Find features along a route - guide creator
A tool for virtual tourists. The user specifies a route (by drawing it on a map or by uploading a GPX file); the software then translates this into centisquares of the Geograph grid along that route and looks up nearby points of interest in a database. Ideally, a user-configurable filter could determine which features are given prominence in areas where there are a lot of photos in the database. A nice feature would be if the route could be generated from Google Maps' Get Directions facility, so people could plan their travel according to interest en route. This would basically allow the creation of personalised travel guides based on information on Geograph.
- Difficulty: Medium
- Requirements: HTML/Javascript development. Working with remote APIs. Database access.
- Skills developed: End-to-End application development. Spatial/full-text database querying.
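The route-to-centisquare step could look roughly like the sketch below. It assumes the route has already been converted to grid eastings/northings in metres and that a centisquare is a 100 m by 100 m cell; the function name and the sampling step are illustrative choices, not part of any existing Geograph code.

```python
import math

CENTISQUARE_M = 100  # assumed cell size: 100 m x 100 m


def centisquares_along_route(route, step_m=25):
    """Return the centisquares visited by a route, in travel order.

    route is a list of (easting, northing) points in metres. Each segment
    is sampled every `step_m` metres or less, and each sample is mapped
    to its (column, row) cell index.
    """
    cells, seen = [], set()

    def visit(easting, northing):
        cell = (int(easting // CENTISQUARE_M), int(northing // CENTISQUARE_M))
        if cell not in seen:
            seen.add(cell)
            cells.append(cell)

    if len(route) == 1:
        visit(*route[0])
    for (e1, n1), (e2, n2) in zip(route, route[1:]):
        length = math.hypot(e2 - e1, n2 - n1)
        steps = max(1, math.ceil(length / step_m))
        for i in range(steps + 1):
            t = i / steps
            visit(e1 + t * (e2 - e1), n1 + t * (n2 - n1))
    return cells


# Hypothetical 300 m stretch heading due east.
route = [(325010, 673050), (325310, 673050)]
cells = centisquares_along_route(route)
```

Sampling can skip a cell that the route only clips at a corner; a smaller `step_m`, or an exact grid-traversal algorithm, would reduce that. The resulting cell list is what would then be used to query for nearby points of interest.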
Themed collections of images
Something like Link - build a tool to create a categorized and themed collection of images, see Link. The categorization could be crowd-sourced, but a hierarchy would first need to be defined. A search/browse interface would also be needed.
This article, saved searches and tags are very basic prototypes. The goal is to pre-select large quantities of images, to help students and teachers browse geographical images.
- Difficulty: Medium
- Requirements: HTML/PHP/MySQL
- Skills developed: User Interface design, database systems, categorization
Illustrated quiz system
Build a system to allow creation of multiple choice quizzes. Each question or answer can be illustrated with images from Geograph. Users would be able to create and share the quizzes they create. Visitors can fill out quizzes and compete on leaderboards. A prototype exists.
- Difficulty: Medium
- Requirements: HTML/PHP/Database
- Skills developed: User Interface design, database and server-side coding, application development.
Annotating images and footnotes
This project will help teachers to prepare course materials by being able to annotate and perhaps draw onto images and adding footnotes. Of course there is wider application potential, e.g. outdoor enthusiasts could indicate routes up cliffs and mountains or down river rapids (early prototype).
- Difficulty: Medium
- Requirements: HTML/PHP/JavaScript/Flash
- Skills developed: User Interface design, database and server-side coding, application development.
Curated Collections Creator
Create an interface for users to pick and choose from a large collection of images, to create a highly specific 'Gallery'. The interface should work/scale to potentially thousands of images, for example seeding images from keyword search results. The idea is to keep control of the collection, but not have to copy/paste every single result. A separate project would be to facilitate browsing of the collection by end users.
This has already been started: Geograph Portals
- Difficulty: Easy-Medium
- Requirements: PHP/MySQL/HTML
- Skills developed: UI development, search/browse technologies.
Website Development
Port the site to a new Country
The site has already been ported to Germany, but there are plenty more countries out there!
- Difficulty: Medium
- Requirements: PHP/MySQL/HTML/Maps
- Skills developed: Code development, and understanding. International mapping systems.
Make the generic version of the site truly generic
We have started making a generic version of the code - using the current projects as a starting point. However, it still contains a number of wordings/features specific to one country. Cleaning up this code and putting all strings and messages in common files will make porting the site much easier.
- Difficulty: Medium
- Requirements: PHP/MySQL/HTML
- Skills developed: Code development, version control (SVN). International mapping systems.
Smartphone interface with mapping
A location-aware smartphone app that will allow plotting Geograph coverage data, images (selection narrowed down by user input) and image-specific data on a zoomable map. This needs to work with a number of different mapping and grid systems to allow international roaming coverage, e.g. OpenStreetMap, Ordnance Survey OpenSpace and Google Maps. The app should also allow direct upload while on the move. Here are two screenshots from a prototype developed for Geograph Deutschland.
This could either be a dedicated iPhone, Android etc. app (built to interface with APIs) or a specific HTML version of the website, probably using local storage for offline use.
- Difficulty: Medium
- Requirements: PHP/MySQL, Android/iPhone or other OS
- Skills developed: App development, international mapping systems, databases
Content/Collection search
Besides images, Geograph has a wide range of 'content', such as Articles, Galleries, Local Discussions, Placenames, Shared Descriptions, Routes, and User Profiles. Some of this is geographical (either referring to precise point locations or to ill-defined general areas) but not all of it. The goal is to provide a single unified search/browse interface, in particular to 'find interesting stuff near here' or 'about this subject'.
- Difficulty: Easy-Medium
- Requirements: PHP/MySQL/HTML
- Skills developed: Code development, and understanding. Geographical and Text Search.
User rating system
Develop and provide a system for site visitors to unobtrusively rate images. Once we have a large number of ratings, we will need a way to visualise the results.
- Difficulty: Easy
- Requirements: PHP/MySQL/HTML
- Skills developed: Code development, UI development. Statistics Processing
Site-wide Filter
Build a site-wide filter, so that a user could e.g. filter the whole site to only show photos taken during a specific period or showing a specific geography. This would affect the search, maps, check-sheets, leaderboards and general site browsing. The two major tasks here are to identify all the places that could be filtered, and to implement 'namespacing' within each technology (search/map tiles/database query cache/smarty cache). A sort of lightweight version of this is implemented via Portals (http://www.geographs.org/portals/), which create mini-websites which have already been filtered.
- Difficulty: Medium
- Requirements: PHP/MySQL/Sphinx
- Skills developed: coding, development, database, scaling
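To illustrate what 'namespacing' might mean for a cache, the sketch below derives a short namespace from the active filter and prefixes every cache key with it, so filtered and unfiltered results never collide. It is written in Python for brevity; the function names, the filter fields and the key layout are all hypothetical and do not describe how Geograph's caches are actually organised.

```python
import hashlib
import json


def filter_namespace(active_filter):
    """Return a short, stable namespace string for a filter dictionary."""
    if not active_filter:
        return "all"  # the unfiltered site
    # Canonical JSON, so the same filter always yields the same namespace
    # regardless of the order its fields were supplied in.
    canonical = json.dumps(active_filter, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode()).hexdigest()[:12]


def namespaced_key(base_key, active_filter=None):
    """Prefix a cache key with the namespace of the active filter."""
    return f"{filter_namespace(active_filter)}:{base_key}"


# Hypothetical filter: photos taken in the 1990s within one area.
nineties = {"taken_from": "1990-01-01", "taken_to": "1999-12-31", "area": "NT"}
plain_key = namespaced_key("leaderboard")
filtered_key = namespaced_key("leaderboard", nineties)
```

The same idea would have to be repeated in each technology listed above, for example as a separate index or attribute filter in the search engine and a separate directory for map tiles.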
Website/Template Redesign
Come up with a new fresh template for the site. In particular, help integrate it into the current framework - working out any features needing tweaking to work with the new layout.
- Difficulty: Easy
- Requirements: UI/Design flair
- Skills developed: UI development, html/css, user testing
Other
Bulk data download server
We have an API for making small extracts (up to 1000 results) and site dumps of the whole database offering 2.8M images. There is a niche for mid-size dumps, e.g. getting all of a user's contribution (which may number 50,000 results), or all photos in a hectad (which can be 25,000+ images). This mechanism should perhaps be tailored to delivering between 1000 and 250,000 results in a single download.
This may take the form of an on-demand dump service, i.e. the user submits a 'request' and then the system prepares the dump and lets the user know when it's ready (as the dump could take minutes to prepare). We could perhaps offer a choice of CSV or mysqldump formats.
- Difficulty: Easy-Medium
- Requirements: PHP/MySQL
- Skills developed: APIs / Data processing
Streaming Server/Clients
We maintain the authoritative copy of the "Geograph Archive" in a MySQL database. There are lots of 'interested parties' that would like to maintain their own copy - in close to real time. This could be used to power internal services (e.g. sphinxsearch RT index), offsite backups (to log files/dumps), or third-party websites (e.g. portals). We also need a server-side component to either 'broadcast' the changes out or just publish them somewhere. Then client adapters can either receive data from the server or contact it periodically, and put the data into their host application (database/index/files etc.).
Maybe use pubsubhubbub - it then needs a 'publisher' script that publishes a feed of updates and notifies subscribers.
- Difficulty: Medium
- Requirements: MySQL, RPC client/server frameworks
- Skills developed: databases, distributed computing, sysadmin tasks etc
Project auto-installer
While installing a copy of the site is relatively easy for an experienced web developer, there are lots of dependencies (php/mysql/apache/sphinx/memcache/redis etc.) which need configuring. To simplify this, build an installer that will check for and install/configure if required the dependencies as well as download the latest copy of the site code. The goal would be for someone to get a running copy of the site in less than half an hour!
- Difficulty: Medium
- Requirements: Linux fundamentals, knowledge of package installers
- Skills developed: Linux system administration
Access Log Processing
We have years of Apache access logs but have never really analysed them. The analysis should be tailored to the structure of the site, e.g. to aggregate by photo, location or contributor. It could also identify patterns in how people arrive at the site, and the subjects people use to find the site in search engines.
- Difficulty: Easy-Medium
- Requirements: General scripting and data patterns a bonus
- Skills developed: Data processing. Statistics and analytics
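As a small illustration of aggregating by photo, the sketch below counts successful requests per photo from log lines in Apache's combined format. It assumes photo pages live at paths of the form `/photo/<id>`; that URL pattern, the regular expressions and the function name are assumptions for the example and would need checking against the real logs.

```python
import re
from collections import Counter

# Leading fields of an Apache combined-format log line.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)
PHOTO_PATH = re.compile(r"^/photo/(\d+)")  # assumed photo page URL pattern


def hits_per_photo(lines):
    """Count successful (HTTP 200) requests per photo id."""
    counts = Counter()
    for line in lines:
        entry = LOG_LINE.match(line)
        if not entry or entry.group("status") != "200":
            continue
        photo = PHOTO_PATH.match(entry.group("path"))
        if photo:
            counts[int(photo.group(1))] += 1
    return counts


# Made-up log lines for illustration.
sample_log = [
    '192.0.2.1 - - [09/Mar/2010:10:00:00 +0000] "GET /photo/12345 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [09/Mar/2010:10:00:05 +0000] "GET /photo/12345 HTTP/1.1" 200 5120 "http://www.google.com/search?q=clova" "Mozilla/5.0"',
    '192.0.2.3 - - [09/Mar/2010:10:00:09 +0000] "GET /photo/999 HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [09/Mar/2010:10:00:12 +0000] "GET /search.php HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
photo_hits = hits_per_photo(sample_log)
```

Joining these counts with the image table would then give totals by location or contributor, and the referrer field of each line could be parsed in the same way to study search-engine arrivals.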
Resources
In fact many of the above projects could be done in isolation as a standalone project, rather than directly integrated into the main codebase. Of course the project SVN Repository etc can be used to hold code, but data could be processed remotely. On the other hand some features would be directly integrated into the website, in which case having a local development version of the site would be essential.
- Virtual Machine: We have produced a VMware machine that runs the Geograph Site. This is possibly the easiest way to get going on developing website code. Runs with the free VMplayer software. Once running we can provide database dumps to get a more realistic test environment.