Checking External Links
Published: 14 February 2019
Background
Contributors can link to external websites in image descriptions. Alas, the web evolves and pages go offline (or are moved without the owner leaving a forwarding address). This is so-called 'Link Rot', and it means future viewers of the image can't find the page referred to by the contributor.
Geograph attempts to check all links to external websites and performs a few actions to assist users of the photos long term.
New links are checked as soon as possible, and are then rechecked periodically over time.
Note, at this time, only links in Photo descriptions are checked, not Collections.
Checks
1. Checking if the page is archived on a publicly available Web-Archival system
First we check whether general Web-Archival systems (for example, archive.org and webarchive.org.uk) have a copy of the page saved. If a saved copy is found, the location of the version closest to when the link was originally created is saved in the database for later use.
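As an illustration only, the lookup of the closest saved copy could be sketched in Python against the Wayback Machine availability endpoint on archive.org. This is an assumption about one possible approach, not a description of the code Geograph actually runs; the function names are hypothetical, and webarchive.org.uk is not covered here.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Availability endpoint of the Wayback Machine (archive.org).
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(link, created):
    """Build a query for the snapshot closest to when the link was created."""
    params = {"url": link, "timestamp": created.strftime("%Y%m%d%H%M%S")}
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the URL of the closest archived copy, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(link, created, timeout=10):
    """Network call: ask the archive whether it holds a copy of the link."""
    with urlopen(availability_query(link, created), timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))
```

The value returned by find_archived_copy (hypothetical name) is what would be stored against the link for later use; None means no saved copy was found.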
2. Checking if the link is now broken
... a bot tries accessing the link to see if it still appears to be online.
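A minimal sketch of such a check, in Python, might look like the following. The three states ('online', 'broken', 'suspect') and the list of path hints used to spot a redirect to an error page are assumptions made for illustration; the real bot's rules are not described here.

```python
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical path fragments suggesting a redirect landed on an error page.
ERROR_HINTS = ("404", "error", "notfound", "not-found")


def classify(original_url, status, final_url):
    """Classify a fetch result as 'online', 'broken' or 'suspect'.

    status is the HTTP status code, or None if no response was received.
    """
    if status is None or status >= 400:
        return "broken"
    if final_url != original_url:
        path = urlsplit(final_url).path.lower()
        if any(hint in path for hint in ERROR_HINTS):
            return "suspect"  # redirected to what looks like an error page
    return "online"


def check_link(url, timeout=10):
    """Network call: fetch the URL and classify the outcome."""
    request = Request(url, headers={"User-Agent": "link-checker-sketch"})
    try:
        with urlopen(request, timeout=timeout) as reply:
            return classify(url, reply.status, reply.geturl())
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return classify(url, err.code, url)
    except OSError:
        # DNS failures, refused connections and timeouts all end up here.
        return classify(url, None, url)
```

Keeping the classification separate from the network call means the rules can be adjusted and tested without fetching anything.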
Actions
A. Requesting non-archived pages into a Web-Archival system
If a page is found not to be archived, but still appears to be online, an attempt is made to save it on archive.org.
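The decision and the request could be sketched as below, in Python. The use of the Wayback Machine "Save Page Now" address (web.archive.org/save/ followed by the page URL) is an assumption about how such a request might be made, and the function names and the 'online' state label are hypothetical.

```python
from urllib.request import Request, urlopen

# "Save Page Now" address of the Wayback Machine; assumed here for illustration.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def should_request_archive(archived_url, link_state):
    """Only ask for a save when no copy is known and the page is still online."""
    return archived_url is None and link_state == "online"


def save_request_url(link):
    """Build the address that asks the archive to capture the page."""
    return SAVE_ENDPOINT + link


def request_archive(link, timeout=30):
    """Network call: ask the archive to capture the page; True on HTTP 200."""
    request = Request(save_request_url(link),
                      headers={"User-Agent": "link-checker-sketch"})
    with urlopen(request, timeout=timeout) as reply:
        return reply.status == 200
```

A save request may be refused or rate limited, so a real job would need to treat it as an attempt and recheck the archive later.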
B. Linking to archived version
If the page is no longer online and we know of an archived version, a direct link to it is added to the description on the main photo page.
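The rule can be written as a small Python function. This is a sketch under stated assumptions: the 'broken' state label, the function names and the plain-text rendering are all hypothetical, and the actual photo page markup is not shown here.

```python
def archive_link_for_display(link_state, archived_url):
    """Return the archived URL to show beside a link, or None to show nothing."""
    if link_state == "broken" and archived_url:
        return archived_url
    return None


def render_link(link, link_state, archived_url):
    """Render a link as plain text, adding the archived copy when it applies."""
    shown = archive_link_for_display(link_state, archived_url)
    if shown is None:
        return link
    return f"{link} (archived copy: {shown})"
```

For example, a link that is still online is rendered unchanged even when an archived copy is known, while a broken link with no known copy is also left as it is.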
Statistics
As at Feb 2019.
- 160,157 Total known links in image descriptions
- 7,958 known to be broken links
- 9,859 seem to redirect to an error page (probably broken, but unsure)
- 92,160 found in an Archive system
- 53,688 were not found originally, but have been archived on request!
- 7,059 found in an Archive, but now appear broken.
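To put these counts in proportion, the short Python sketch below expresses each one as a share of the total known links. The labels are paraphrased from the list above, and the categories may overlap, so the shares are not expected to add up to 100%.

```python
TOTAL_LINKS = 160_157  # total known links in image descriptions

COUNTS = {
    "known to be broken": 7_958,
    "seem to redirect to an error page": 9_859,
    "found in an Archive system": 92_160,
    "archived on request": 53_688,
    "found in an Archive, but now appear broken": 7_059,
}


def share(count, total=TOTAL_LINKS):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in COUNTS.items():
    print(f"{label}: {count:,} ({share(count):.1f}%)")
```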