Token list for PHP

Post by **troels_kn** » Sat Jun 02, 2007 3:28 pm

The following PHP script parses a PHP file and prints a list of its classes, interfaces and functions, each with its line number and a short excerpt from any docblock-style comment.

Create a new tool, with the following settings:
Command: php (You may have to put the full path to php.exe)
Parameters: "C:\Program Files\TextPad 5\System\tokens.php" "$File"
[X] in 'Capture output'
[X] in 'Sound alert when completed'
Leave other checkboxes empty.
Regular expression to match output: ^\([0-9]+\)
Registers:
File: <empty> Line: 1 Column: <empty>

<?php
if (!defined('T_UNSPECIFIED_STRING')) {
  define('T_UNSPECIFIED_STRING', -1);
}
// Wraps token_get_all() and annotates each token with line, column, brace depth and class scope.
function token_get_all_improved($data) {
  $tokens = Array();
  $line = 1;
  $col = 0;
  $level = 0;
  $scope_level = NULL;
  $in_scope = FALSE;
  foreach (token_get_all($data) as $token) {
    if (is_array($token)) {
      list ($token, $text) = $token;
    } else if (is_string($token)) {
      $text = $token;
      $token = T_UNSPECIFIED_STRING;
    }
    if ($token === T_CURLY_OPEN || $token === T_DOLLAR_OPEN_CURLY_BRACES || $text == '{') {
      ++$level;
      if (is_null($scope_level)) {
        $scope_level = $level;
      }
    } else if ($text == '}') {
      --$level;
      if ($in_scope && $level < $scope_level) {
        $in_scope = FALSE;
      }
    }
    $tmp = $text;
    $numNewLines = substr_count($tmp, "\n");
    if (1 <= $numNewLines) {
      $line += $numNewLines;
      $col = 1;
      // Only the text after the last newline counts towards the column.
      $tmp = substr($tmp, strrpos($tmp, "\n") + 1);
      if ($tmp === false) {
        $tmp = '';
      }
    }
    $col += strlen($tmp);

    if ($token === T_INTERFACE || $token === T_CLASS) {
      $in_scope = TRUE;
      $scope_level = NULL;
    }

    $xtoken = new StdClass();
    $xtoken->type = $token;
    $xtoken->text = $text;
    $xtoken->line = $line;
    $xtoken->col = $col;
    $xtoken->blockLevel = $level;
    $xtoken->isClassScope = $in_scope && !is_null($scope_level);
    $tokens[] = $xtoken;
  }
  return $tokens;
}

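// Returns the first line of text in a docblock, or an empty string if there is none.
function docblock_excerpt($str) {
  if (preg_match('~\*{2}[\s\n*]+(.*)~', trim((string) $str, "/"), $matches)) {
    return trim($matches[1]);
  }
  return '';
}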

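// Collects every class, interface and function declaration, paired with the docblock that precedes it.
function parse_file($file) {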
  $buffer = NULL;
  $docblock = NULL;
  $results = Array();
  foreach (token_get_all_improved(file_get_contents($file)) as $token) {
    switch ($token->type) {
      case T_DOC_COMMENT:
        $docblock = $token->text;
        break;
      case T_INTERFACE:
      case T_CLASS:
      case T_FUNCTION:
        $buffer = $token;
        break;
      case T_STRING:
        if (!is_null($buffer)) {
          $buffer->isMember = ($buffer->type != T_FUNCTION) || $buffer->isClassScope;
          $buffer->docblock = $docblock;
          $buffer->name = $token->text;
          $results[] = $buffer;
          $buffer = NULL;
          $docblock = NULL;
        }
        break;
    }
  }
  return $results;
}

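// Turns the parse results into rows of (line, type, name, docblock excerpt), with blank rows between groups.
function results_to_table($results) {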
  $view = Array();
  $last = NULL;
  foreach ($results as $token) {
    if ($last && ((!$token->isMember && $last->isMember) || (in_array($token->type, Array(T_INTERFACE, T_CLASS))))) {
      $view[] = Array("", "", "", "");
    }
    $last = $token;

    $view[] = Array(
      $token->line,
      strtolower(str_replace("T_", "", token_name($token->type))),
      $token->name,
      docblock_excerpt($token->docblock)
    );
  }
  return $view;
}

// Pads every cell so that the columns line up in plain-text output.
function format_table($map) {
  // Nothing to format when the file contains no declarations.
  if (count($map) === 0) {
    return "";
  }
  $out = Array();
  $column_widths = array_fill(0, count($map[0]), 0);
  foreach ($map as $row) {
    foreach ($row as $num => $col) {
      $column_widths[$num] = max($column_widths[$num], strlen((string) $col));
    }
  }
  foreach ($map as $row) {
    $line = "";
    foreach ($row as $num => $col) {
      $line .= str_pad((string) $col, $column_widths[$num] + 2);
    }
    $out[] = $line;
  }
  return implode("\n", $out);
}

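// The file to scan is passed as the first command-line argument.
if (!isset($argv[1])) {
  fwrite(STDERR, "Usage: php tokens.php <file>\n");
  exit(1);
}

print(
  format_table(
    results_to_table(
      parse_file($argv[1]))) . "\n");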
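
For anyone unfamiliar with the tokenizer, here is a minimal sketch of why token_get_all_improved() has to handle two shapes of token: token_get_all() returns an array for most tokens and a bare string for single-character tokens such as "{". The helper name describe_tokens() and the sample source are made up for illustration and are not part of the script above.

```php
<?php
// Illustration only: describe_tokens() is a hypothetical helper, not part of tokens.php.
function describe_tokens($source) {
  $out = array();
  foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
      // Array tokens hold the token id at index 0 and the matched text at index 1.
      $out[] = token_name($token[0]) . ' ' . json_encode($token[1]);
    } else {
      // Single-character tokens such as "{" or ";" come back as plain strings.
      $out[] = 'STRING ' . json_encode($token);
    }
  }
  return $out;
}

// Made-up sample input.
$sample = "<?php\nfunction add() {}\n";
print(implode("\n", describe_tokens($sample)) . "\n");
```

On PHP versions where array tokens also carry a line number at index 2, that value could probably replace the manual newline counting for those tokens, although bare string tokens still carry no line number of their own.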

Post by **kAlvaro** » Sat Feb 02, 2008 12:56 pm

I believe there's a typo. Where it says:

Regular expression to match output: ^\([0-9]+\)

it should say:

Regular expression to match output: ^([0-9]+)

Post by **ben_josephs** » Sat Feb 02, 2008 2:19 pm

They're both correct. Which one you should use depends on the style of regular expression syntax selected at the time you enter the expression.

The \( ... \) form is correct for the default style.
The ( ... ) form is correct for Posix style.

You can select the style with

Configure | Preferences | Editor

[X] Use POSIX regular expression syntax

Recommendation: use Posix style. It reduces regular expression backslashitis and unreadability.
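
To see what the POSIX-style expression picks out of the script's output, the sketch below applies the same pattern with PHP's preg_match(), which also uses unescaped parentheses for grouping. The helper name capture_line_number() and the sample output row are invented for illustration. TextPad does this matching itself, and the first group is what the "Line: 1" register setting refers to.

```php
<?php
// Illustration only: capture_line_number() is a hypothetical helper.
// It returns the leading line number of an output row, or NULL if there is none.
function capture_line_number($outputLine) {
  if (preg_match('/^([0-9]+)/', $outputLine, $matches)) {
    return (int) $matches[1];
  }
  return NULL;
}

// Made-up row in the shape the script prints: line, type, name, docblock excerpt.
var_dump(capture_line_number("12  function  add  Adds two numbers."));
// Blank separator rows contain only spaces, so nothing is captured.
var_dump(capture_line_number("        "));
```

Rows without a leading number, such as the blank separators between classes, simply yield no match here, which is consistent with them not being treated as jump targets.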