This time I am going to show you how to use PHP to replicate a little part of something I had been working on for my other site, Celeb ‘O Rama.

It sounds like quite a simple concept but it’s actually quite hard to do. What, you ask? Well, I wanted a way to show a little information about a particular celebrity in a small CSS or JavaScript based pop-up. The easiest way to do this is to extract a small amount of info from the excellent Wikipedia.

The pop-up was easy thanks to Cody Lindley and his excellent jTip javascript tool tip, but extracting information from Wikipedia, that’s the hard part. So let’s get started.

I used this code as a WordPress plugin and therefore used WP 2.5’s new shortcode system to make my own shortcode for this. I will include that part, but the code can also be used standalone by using only the getInfo.php page. You’ll see what I mean. ;)

So let’s start with the shortcode code.

wp-wiki.php

This is the page that will let WordPress understand what to do when it receives a [wiki] shortcode tag.

So here we go. First we need to tell WP that this is a plugin file. To do that, add this comment to the top of your page:

/*
Plugin Name: wp-wiki
Plugin URI: http://return-true.com/
Description: Uses WP 2.5 Shortcode to show wikipedia information for that word in a jTip tooltip
Author: Veneficus Unus (Paul Robinson)
Version: 1.0
Author URI: http://return-true.com/
*/

You can of course change that but it would be nice if you would leave my URI and Name in. :)

Now that WordPress knows this file is a plugin we can get on with the code:

add_shortcode('wiki', 'wiki_shortcode');

function wiki_shortcode($attr, $word = NULL) {

	extract(shortcode_atts(array(
		'title' => "{$word}",
		'no' => 2,
		'width' => '',
		'height' => '',
	), $attr));

First we add a new shortcode to WordPress and tell it to run wiki_shortcode() whenever it finds that shortcode in a post.

Next we define the wiki_shortcode() function. The $attr argument holds the attributes given in the shortcode, if any. For example, [wiki title="Charlotte Hatherley" width="400" height="400"] in the finished plugin would tell it to give the tooltip a title of ‘Charlotte Hatherley’ and a width & height of 400px. The $word argument is the text between the tags, because we will be using the shortcode in the enclosing [wiki]test[/wiki] format instead of the bare [wiki] format.

Then we use extract() to extract the items from the array given. extract() takes the keys and makes them proper variables with the array values as their values. shortcode_atts() starts from the default values given and replaces any of them with the value specified in the shortcode.
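If the defaults-plus-extract() step is hard to picture, here is a minimal sketch you can run outside WordPress. Note that demo_shortcode_atts() is my own made-up stand-in for shortcode_atts(), written on the assumption that the real function keeps only the keys listed in the defaults; the ‘colour’ attribute is invented purely for the demonstration.

```php
<?php
// Hypothetical stand-in for WordPress's shortcode_atts(), so this runs without WordPress.
function demo_shortcode_atts($defaults, $given) {
	$out = array();
	foreach ($defaults as $key => $default) {
		// Use the value from the shortcode when present, otherwise the default.
		$out[$key] = array_key_exists($key, $given) ? $given[$key] : $default;
	}
	return $out;
}

$word = 'Charlotte Hatherley';
$attr = array('width' => '400', 'colour' => 'red'); // 'colour' is not a known attribute

$merged = demo_shortcode_atts(array(
	'title'  => $word,
	'no'     => 2,
	'width'  => '',
	'height' => '',
), $attr);

// extract() turns each key into a local variable: $title, $no, $width, $height.
extract($merged);

echo $title . "\n"; // Charlotte Hatherley
echo $no . "\n";    // 2
echo $width . "\n"; // 400
```

The unknown ‘colour’ attribute is dropped, so extract() never creates a $colour variable.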

Ok, next up:

	// PHP has seeded the random number generator automatically since 4.2.0, so no srand() call is needed.
	$rid = rand(0,2000);

	if($word == NULL)
		return false;
	else
		$newWord = convertForWiki($word);

	$confirm = get_web_page('http://en.wikipedia.org/wiki/'.$newWord);

Ok, we make a random number between 0 and 2000 and assign it to $rid. This is for later, where it goes into the link’s id so that two tooltips for the same word stay unique. Then we check to see if there was a $word; remember, $word is the text given between the shortcode tags. If there isn’t one (it is NULL), we can’t continue since we have no word to look for at Wikipedia, so we return false to cancel the whole process, which removes the [wiki] tag from the post. Otherwise we use a function called convertForWiki() to convert the word into a format recognisable by Wikipedia. I’ll get to that function shortly.

Now we confirm that a page exists at Wikipedia for that word by using cURL. But I will get to that function a little later too.

	if($confirm['errno'] != 0 || $confirm['http_code'] != 200) :
		$output = $word;
	else :
		if(preg_match("/Wikipedia does not have an article with this exact name\./i", $confirm['content'])) :
			return $word;
		endif;

		$output = '<a href="'.get_bloginfo('wpurl').'/wp-content/plugins/wp-cwiki/getInfo.php?word='.$newWord.'&amp;no='.$no;

		// Every parameter after the first is joined with &amp;, never a second "?"
		if($width)
			$output .= '&amp;width='.$width;
		if($height)
			$output .= '&amp;height='.$height;

		$output .='" name="'.$title.'" title="'.$word.'" class="jTip" id="jTip-'.$newWord.'-'.$rid.'">'.$word.'</a>';
	endif;

	return $output;
}

This is the last part of the wiki_shortcode() function. The CURL function returns an array with the error code, HTTP code & content of the page. First to confirm the page existed we need to check the error & HTTP codes. If they are 0 & 200 respectively then we can go ahead. If not we assume no page exists and we just output the original word without any link.

If we get past that check, a page exists, but since Wikipedia has a custom 404 page we can’t assume it is the right one. To check it is the right page we make sure we don’t have Wikipedia’s 404 page by looking for a prominent sentence such as this one:

Wikipedia does not have an article with this exact name.

I use preg_match() since I find it is more accurate than stristr(), but that’s just personal preference. If we are on Wikipedia’s 404 page we can just return prematurely with the original word since we can’t give a link.
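To make the two checks easier to see in isolation, here is a rough sketch that wraps them in one function. wiki_page_exists() is a name I have made up, the three sample arrays are hand-written rather than real Wikipedia responses, and I have swapped preg_match() for stripos() here simply because a plain substring search has no special characters to worry about.

```php
<?php
// $result is assumed to have the shape returned by get_web_page():
// 'errno', 'http_code' and 'content' keys.
function wiki_page_exists($result) {
	if ($result['errno'] != 0 || $result['http_code'] != 200) {
		return false; // transport error or non-200 response
	}
	$marker = 'Wikipedia does not have an article with this exact name';
	// stripos() returns false when the marker is absent (case-insensitive search).
	return stripos($result['content'], $marker) === false;
}

// Hand-made sample results, not real Wikipedia responses.
$ok      = array('errno' => 0, 'http_code' => 200, 'content' => '<p>KT Tunstall is a singer.</p>');
$missing = array('errno' => 0, 'http_code' => 200, 'content' => '<b>Wikipedia does not have an article with this exact name.</b>');
$failed  = array('errno' => 6, 'http_code' => 0, 'content' => '');

var_dump(wiki_page_exists($ok));      // bool(true)
var_dump(wiki_page_exists($missing)); // bool(false)
var_dump(wiki_page_exists($failed));  // bool(false)
```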

Otherwise we can continue on and make the link for Cody Lindley’s jTip. This requires a little bit of complex concatenation. It can get confusing, but I’m sure you can figure it out from looking at the code, since an explanation would take too long.
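If the concatenation makes your eyes cross, one alternative is to let PHP assemble the query string for you. This is only a sketch: build_tip_link() and the example.com address are made up for illustration, and it assumes jTip reads the width and height from the link’s query string.

```php
<?php
// Hypothetical helper: builds the jTip link. $base is the URL of getInfo.php.
function build_tip_link($base, $word, $newWord, $title, $no, $rid, $width = '', $height = '') {
	$params = array('word' => $newWord, 'no' => $no);
	if ($width !== '')  { $params['width']  = $width; }
	if ($height !== '') { $params['height'] = $height; }

	// http_build_query() URL-encodes each value; the third argument forces "&" as the separator.
	$href = $base . '?' . http_build_query($params, '', '&');

	// htmlspecialchars() makes each value safe inside an HTML attribute ("&" becomes "&amp;").
	return '<a href="' . htmlspecialchars($href) . '"'
		. ' name="' . htmlspecialchars($title) . '"'
		. ' title="' . htmlspecialchars($word) . '"'
		. ' class="jTip"'
		. ' id="jTip-' . htmlspecialchars($newWord) . '-' . $rid . '">'
		. htmlspecialchars($word) . '</a>';
}

echo build_tip_link('http://example.com/getInfo.php', 'kate tunstall', 'Kate_Tunstall', 'Kate Tunstall', 2, 42, '400');
```

Because the parameters live in an array, a missing width or height simply never appears in the link and there are no stray separators to worry about.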

Finally we return the $output whatever it may be. ;)

function convertForWiki($c) {
	$c = trim($c);
	$c = ucwords($c);
	// str_replace() leaves the string alone when there are no spaces, so no check is needed first
	$c = str_replace(" ", "_", $c);
	return $c;
}

Next we have the convertForWiki() function I mentioned earlier. It basically turns this ‘Kate Tunstall’ into this ‘Kate_Tunstall’ as that is the format that Wikipedia’s URL takes. If you want a more detailed explanation of the function just ask. ;)
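Here is a quick sketch of what the function does to a few inputs. The function is repeated, slightly condensed, so the example runs on its own, and the sample names are only there for illustration.

```php
<?php
function convertForWiki($c) {
	$c = trim($c);                    // drop surrounding whitespace
	$c = ucwords($c);                 // capitalise the first letter of every word
	return str_replace(" ", "_", $c); // Wikipedia URLs use underscores for spaces
}

echo convertForWiki('  kate tunstall ') . "\n";    // Kate_Tunstall
echo convertForWiki('Charlotte Hatherley') . "\n"; // Charlotte_Hatherley
echo convertForWiki('madonna') . "\n";             // Madonna
echo convertForWiki('anne of cleves') . "\n";      // Anne_Of_Cleves
```

The last one shows a limitation: ucwords() capitalises every word, so a title with lower-case words in the middle may not match the article’s exact name.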

Finally:

function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

This is a CURL function I got from a website a little while ago now and boy, am I glad I kept it. I can’t remember what site but thank you whoever you are.

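That’s it for this file. Just put it in a folder and hold onto it until we finish the next page. The link built in wiki_shortcode() points at wp-content/plugins/wp-cwiki/, so either call the folder ‘wp-cwiki’ or change that path to match whatever name you choose.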

getInfo.php

Ok, last page.

$word = isset($_GET['word']) ? urldecode($_GET['word']) : '';
$no = isset($_GET['no']) ? (int) $_GET['no'] : 2; // number of paragraphs, 2 by default
$url = "http://en.wikipedia.org/wiki/".$word;

$content = get_web_page($url);
$content = extractContent($content['content'], $no);

echo $content;

This bit is quite simple. We take the word that was handed over in the URL earlier and assign it to $word. PHP has already URL-decoded everything in $_GET, and underscores are never encoded in a URL anyway, so the urldecode() call is redundant. We also get no which, as I forgot to mention before, is the number of paragraphs you want to retrieve from the Wikipedia article; 2 is the default. Finally we build the URL, which is the standard Wikipedia URL with the word put on the end.
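Since getInfo.php takes its input straight from the URL, it doesn’t hurt to tidy the values up first. Here is a small sketch of one way to do it; read_tip_params() is a made-up helper, and falling back to 2 paragraphs for bad values is my own assumption based on the shortcode default.

```php
<?php
// Hypothetical helper: reads 'word' and 'no' from a $_GET-style array.
function read_tip_params($get) {
	$word = isset($get['word']) ? trim($get['word']) : '';
	// Cast to int; missing or non-positive values fall back to 2 paragraphs.
	$no = isset($get['no']) ? (int) $get['no'] : 2;
	if ($no < 1) {
		$no = 2;
	}
	return array($word, $no);
}

list($word, $no) = read_tip_params(array('word' => 'Kate_Tunstall', 'no' => '3'));
echo $word . ' ' . $no . "\n"; // Kate_Tunstall 3

list($word, $no) = read_tip_params(array('word' => 'Kate_Tunstall'));
echo $word . ' ' . $no . "\n"; // Kate_Tunstall 2
```

On the real page you would pass $_GET in place of the hand-written arrays.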

Then we use the cURL function from before again (yes, that means you’ll need to include the get_web_page() function from above on this page too) to get the content, and then we run the content through another function called extractContent(). Once that’s done we echo out the result.

Here is that extractContent() function:

function extractContent($c, $no) {
	global $url, $word; // not used below, kept in case you want them later
	$tree = new DOMDocument();
	@$tree->loadHTML($c); // @ silences warnings about imperfect markup
	$count = 1;
	$output = ''; // start empty so .= does not raise a notice
	foreach($tree->getElementsByTagName('div') as $div) :
		if($div->getAttribute('id') == "bodyContent") :
			foreach($div->getElementsByTagName('p') as $p) :
				if($count <= $no) :
					// nodeValue is plain text, so escape it again before wrapping it in p tags
					$output .= "<p>".htmlspecialchars($p->nodeValue)."</p>";
				endif;
				$count++;
			endforeach;
		endif;
	endforeach;
	//Clean up wikipedia stuff: remove reference markers such as [1]
	$output = preg_replace('/\[\d+\]/', "", $output);
	//Add excerpt taken from...
	$output = $output.'<strong style="float:right; display:block; font-size:9px;">Excerpt From Wikipedia</strong>';
	return $output;
}

Ok, so we pass along the content and the number of paragraphs we want. Then we make the url and the word global for later. We make a new DOMDocument so we can traverse the DOM, which makes it quite easy to get the info we are after, and we set up a count.

Then we loop through all of the divs on the page. If we get one with the ID of ‘bodyContent’, which is Wikipedia’s content div, then we want to be inside there. We then run a loop on that div for all of the paragraphs inside it.

Lucky for us, the first paragraphs in Wikipedia’s content div are the content we want, so we just add the nodeValue, which is the contents of the p, to a variable called $output. We have to add the p tags back too, as we retrieved the contents of the p’s, not the p’s themselves. We do that for as long as the count does not exceed the max number of paragraphs set. We then add one to count so the counter works, and then exit each loop.
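If you would rather not nest two loops, the same walk can be written with an XPath query. This is just a sketch of an alternative, not the code the plugin uses: first_paragraphs() is a made-up name and the sample markup is hand-written to stand in for a downloaded article.

```php
<?php
// Hypothetical variant of extractContent() that uses XPath instead of nested loops.
function first_paragraphs($html, $no) {
	$tree = new DOMDocument();
	// @ hides the warnings libxml raises on imperfect real-world HTML.
	@$tree->loadHTML($html);
	$xpath = new DOMXPath($tree);

	$output = '';
	$count  = 0;
	// Select every p that sits anywhere inside the div with id "bodyContent".
	foreach ($xpath->query('//div[@id="bodyContent"]//p') as $p) {
		if ($count >= $no) {
			break; // we already have enough paragraphs
		}
		$output .= '<p>' . htmlspecialchars($p->nodeValue) . '</p>';
		$count++;
	}
	return $output;
}

// Hand-written sample markup, standing in for a downloaded article.
$html = '<html><body><p>Navigation</p><div id="bodyContent">'
	. '<p>First paragraph.</p><p>Second paragraph.</p><p>Third paragraph.</p>'
	. '</div></body></html>';

echo first_paragraphs($html, 2); // <p>First paragraph.</p><p>Second paragraph.</p>
```

The ‘Navigation’ paragraph is skipped because it sits outside the content div, and the break stops the loop as soon as we have enough paragraphs.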

Wikipedia has little reference icons which reference links at the bottom of the page that look like this [1]. We don’t want to include these so we use a quick preg_replace() to get rid of them. Then I want to credit Wiki so I add a little text to say it is from Wikipedia. Finally we return it.
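Here is a tiny sketch of that clean-up on its own so you can see the pattern at work. strip_reference_markers() is a made-up name and the sentence is invented. Note the backslashes: without them the square brackets would start a character class instead of matching literal brackets.

```php
<?php
function strip_reference_markers($text) {
	// \[ and \] match literal brackets; \d+ matches the digits between them.
	return preg_replace('/\[\d+\]/', '', $text);
}

echo strip_reference_markers('She was born in 1979.[1][12] She plays guitar.[3]');
// She was born in 1979. She plays guitar.
```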

That’s it. If you have the jTip code on your page, all you need to do is put this file in the folder from before, upload the folder into your WordPress plugins directory and enable the plugin. It has been tested in WP 2.5 and runs on PHP 5; I am unsure about PHP 4. You must also have the cURL PHP extension enabled. If you don’t know whether you do, either ask your host or make a blank PHP page with phpinfo() written on it. View that page and look for cURL. If it’s not there then you won’t be able to run this code without installing it.
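If wading through the phpinfo() output sounds like a chore, a few lines of PHP can do the looking for you. This is only a quick sketch; it also checks for the dom extension, since extractContent() relies on DOMDocument.

```php
<?php
// extension_loaded() reports whether a named PHP extension is available.
foreach (array('curl', 'dom') as $ext) {
	if (extension_loaded($ext)) {
		echo $ext . " is available\n";
	} else {
		echo $ext . " is missing, ask your host to enable it\n";
	}
}
```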

If you have any problems at all just give me a shout. ;)
