Introduction

This Perl script walks the Crossfire DokuWiki index, fetches the backlink listing for every page, and reports any orphaned pages: those that no other DokuWiki page links to. (In DokuWiki parlance, they have no backlinks.) Unless such a page is under heavy construction, it should probably be linked to from somewhere so that it can be found, or deleted if it is no longer needed.
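
The idea of an orphan can be illustrated offline, without touching the wiki. The following is only a minimal sketch: the page names and the %links_from map are made up for illustration, and the find_orphans helper is hypothetical, not part of the script below. It counts incoming links for each page and lists the pages that have none.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical link map: each page name maps to the pages it links to.
my %links_from = (
    'start'     => [ 'maps', 'server' ],
    'maps'      => [ 'start' ],
    'server'    => [],
    'old_draft' => [ 'start' ],   # links out, but nothing links to it
);

# Return the sorted names of pages that no other page links to.
sub find_orphans {
    my ($graph) = @_;
    my %backlinks = map { $_ => 0 } keys %$graph;
    for my $source (keys %$graph) {
        for my $target (@{ $graph->{$source} }) {
            next if $target eq $source;   # a self-link is not a backlink
            $backlinks{$target}++;
        }
    }
    my @found  = grep { $backlinks{$_} == 0 } keys %backlinks;
    my @sorted = sort @found;
    return @sorted;
}

my @orphans = find_orphans(\%links_from);
print "$_\n" for @orphans;
```

With this sample map, only old_draft is listed, because every other page is the target of at least one link.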

This was a fairly quick hack, so it doesn't have all the error-checking that a completed script should, but I'm out of time for today, and it does appear to work. Thanks to Rednaxela for the idea.

Requirements

Perl with the LWP::Simple module (part of libwww-perl), and network access to the wiki.

Results

Here are the results of a run on 2006-12-27, with comments in italics.

Code

#!/usr/bin/perl
use strict;
use warnings;
 
# This script walks the index tree of the Crossfire dokuwiki
# and counts the number of links to each page.  Those with zero
# links are reported as orphans.
 
use LWP::Simple;
 
my $base_url  = 'http://wiki.metalforge.net';
my $base_path = '/doku.php';
my %page;  # stores page paths, whether they've been followed, and a count
my %did_index;  # stores index directories that have been expanded
my $DEBUG = 0;
 
check_link($base_url.$base_path."/start?do=index");
 
for my $link (sort keys %page){
  next if $link =~ /^$base_path/;
  print "$link\n" unless $page{$link}->{count};
}
 
sub check_link {
  my $path = shift;
  debug("Checking link $path");
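  my $index_text = get( $path );
  unless( defined $index_text ){
    # LWP::Simple's get() returns undef on failure
    warn "Could not fetch $path\n";
    return;
  }
  $index_text =~ s{^.+?wikipage start}{}s;
  # \Q...\E keeps the "." in $base_path from matching any character
  my(@index_links) = $index_text =~ m{"\Q$base_path\E/([^"]+?)"}g;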
  for my $index_link (@index_links){
    if( $index_link =~ m{\?idx=(.+)$} ){
      # this is an index directory, recurse through it once
      unless( $did_index{$index_link} ){
        $did_index{$index_link} = 1;
        check_link("$base_url$base_path/$index_link");
      }
    } else {
      # this is a page, parse it and mark it if it hasn't been done yet
      unless( $page{$index_link}->{done} ){
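        debug("Getting $base_url$base_path/$index_link?do=backlink");
        my $text = get("$base_url$base_path/$index_link?do=backlink");
        unless( defined $text ){
          # get() returns undef on failure; skip rather than misreport an orphan
          warn "Could not fetch backlinks for $index_link\n";
          delete $page{$index_link};
          next;
        }
        $text =~ s{^.+?wikipage start}{}s;
        # assigning through an empty list yields the number of matches;
        # a plain scalar-context m//g would only give true or false
        $page{$index_link}->{count} = () = $text =~ m{wikilink1}g;
        $page{$index_link}->{done} = 1;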
      }
    }
  }
}
 
 
sub debug {
  print STDERR @_, "\n" if $DEBUG;
}
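
The two pattern matches the script relies on can be tried out without any network access. This is only a sketch under assumptions: the HTML fragment is a hypothetical stand-in for what a rendered index page might contain, not output captured from the wiki. It shows how the link paths are captured, how index directories are told apart from pages, and how list assignment turns a global match into a count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base_path = '/doku.php';

# Hypothetical fragment resembling a rendered DokuWiki index page.
my $html = <<'HTML';
<a href="/doku.php/start?idx=server" class="idx_dir">server</a>
<a href="/doku.php/server/setup" class="wikilink1">setup</a>
<a href="/doku.php/maps" class="wikilink1">maps</a>
HTML

# \Q...\E quotes the "." in $base_path so it matches only a literal dot.
my @links = $html =~ m{"\Q$base_path\E/([^"]+)"}g;

# Split the captured paths into index directories and ordinary pages.
my @dirs  = grep { /\?idx=/ } @links;
my @pages = grep { !/\?idx=/ } @links;

# In scalar context m//g only reports whether a match was found; assigning
# through an empty list first yields the number of matches instead.
my $count = () = $html =~ /wikilink1/g;

print "directories: @dirs\n";
print "pages: @pages\n";
print "wikilink1 occurrences: $count\n";
```

For the sample fragment this reports one directory (start?idx=server), two pages (server/setup and maps), and two occurrences of wikilink1.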

Notes & Comments