Friday, 15 November 2019

Creating a keyword list from text in PHP


PHP provides many useful functions for handling array data. In particular, the function in_array() checks whether a specific value exists in an array. The PHP manual defines it as: in_array(mixed $needle, array $haystack [, bool $strict = FALSE ]): bool. It searches for $needle in the array $haystack and returns TRUE if the value is found. If $strict is set to TRUE, the type of $needle must also match the type of the element in $haystack. Using this function, we can create a list of unique keywords from a string.
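To make the effect of $strict concrete, here is a minimal sketch. The array and variable names are only illustrative and are not part of the keyword function discussed in this post.

```php
<?php
$haystack = [1, 2, 3];

// Loose comparison (the default): the string "1" is treated as equal to the integer 1.
$loose = in_array("1", $haystack);

// Strict comparison: the types differ (string vs. integer), so nothing matches.
$strict = in_array("1", $haystack, true);

var_dump($loose);  // bool(true)
var_dump($strict); // bool(false)
```

Since getKeywords() only compares strings with strings, strict mode mainly guards against surprising matches between numeric-looking words.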

As shown in the code snippet below, the function getKeywords() extracts the keywords from the input string $str into an array.

function getKeywords($str) {

    //convert into lowercase
    $str = strtolower($str);

    //remove punctuation characters
    $str = preg_replace('/[.,\/#!$%\^&\*;:{}=\-_`~()\[\]]/', '', $str);

    //split the string into an array of words
    $data = explode(' ', $str);

    //define an empty array
    $result = array();
    for ($i = 0; $i < count($data); $i++) {
        if (!in_array($data[$i], $result, true)) {
            array_push($result, $data[$i]);
        }
    }

    return $result;
}

In line 4, the function strtolower() converts all characters to lowercase. In line 7, punctuation characters are removed with preg_replace(), the regular-expression search-and-replace function. After that, explode() splits the string $str on spaces into the array $data. The array $result is then created to store the keywords. We use a loop to iterate over the array $data. In line 15, in_array() checks, with strict comparison, whether each element of $data is already in the keyword array $result. If it is not, it is pushed onto the end of $result, so each word appears only once, in order of first appearance.
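The same steps can also be sketched more compactly with PHP's built-in array functions. This is an alternative sketch, not the function from this post: the name getKeywordsCompact() is hypothetical, and it swaps explode() for preg_split() and the in_array() loop for array_unique().

```php
<?php
// Hypothetical variant of getKeywords(); the name is an assumption.
function getKeywordsCompact(string $str): array {
    // convert into lowercase
    $str = strtolower($str);

    // remove the same punctuation characters as getKeywords()
    $str = preg_replace('/[.,\/#!$%\^&\*;:{}=\-_`~()\[\]]/', '', $str);

    // split on runs of whitespace and drop empty pieces
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);

    // keep the first occurrence of each word, then reindex from 0
    return array_values(array_unique($words));
}

// Note the double space after "is" in the input.
print_r(getKeywordsCompact('There is  a cat, a dog, and a zebra.'));
```

One difference in behaviour: with explode(' ', $str), two consecutive spaces produce an empty string in the result, while splitting on whitespace runs with PREG_SPLIT_NO_EMPTY skips it.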

As shown in the code snippet below, the function getKeywords() is called to extract the keywords from the string $str.

$str = 'There is a cat, a dog, and a zebra.';
$keywords = getKeywords($str);
print_r($keywords);

The output will be:

Array
(
    [0] => there
    [1] => is
    [2] => a
    [3] => cat
    [4] => dog
    [5] => and
    [6] => zebra
)