Token list for PHP

Posted: Sat Jun 02, 2007 3:28 pm
by troels_kn
The following PHP script parses a PHP file and lists its classes, interfaces, and functions, each with its line number and a short excerpt of its docblock comment, if it has one.

Create a new tool, with the following settings:
Command: php (you may need to give the full path to php.exe)
Parameters: "C:\Program Files\TextPad 5\System\tokens.php" "$File"
[X] in 'Capture output'
[X] in 'Sound alert when completed'
Leave other checkboxes empty.
Regular expression to match output: ^\([0-9]+\)
Registers:
File: <empty> Line: 1 Column: <empty>

<?php
if (!defined('T_UNSPECIFIED_STRING')) {
  define('T_UNSPECIFIED_STRING', -1);
}
function token_get_all_improved($data) {
  $tokens = Array();
  $line = 1;
  $col = 0;
  $level = 0;
  $scope_level = NULL;
  $in_scope = FALSE;
  foreach (token_get_all($data) as $token) {
    if (is_array($token)) {
      list ($token, $text) = $token;
    } else if (is_string($token)) {
      $text = $token;
      $token = T_UNSPECIFIED_STRING;
    }
    if ($token === T_CURLY_OPEN || $token === T_DOLLAR_OPEN_CURLY_BRACES || $text == '{') {
      ++$level;
      if (is_null($scope_level)) {
        $scope_level = $level;
      }
    } else if ($text == '}') {
      --$level;
      if ($in_scope && $level < $scope_level) {
        $in_scope = FALSE;
      }
    }
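    // Advance the line/column counters past this token's text.
    $tmp = $text;
    $numNewLines = substr_count($tmp, "\n");
    if (1 <= $numNewLines) {
      $line += $numNewLines;
      // Restart at 0 so columns are counted as on the first line.
      $col = 0;
      $tmp = substr($tmp, strrpos($tmp, "\n") + 1);
      if ($tmp === false) {
        $tmp = '';
      }
    }
    $col += strlen($tmp);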

    if ($token === T_INTERFACE || $token === T_CLASS) {
      $in_scope = TRUE;
      $scope_level = NULL;
    }

    $xtoken = new StdClass();
    $xtoken->type = $token;
    $xtoken->text = $text;
    $xtoken->line = $line;
    $xtoken->col = $col;
    $xtoken->blockLevel = $level;
    $xtoken->isClassScope = $in_scope && !is_null($scope_level);
    $tokens[] = $xtoken;
  }
  return $tokens;
}

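function docblock_excerpt($str) {
  // Return the first line of text in a docblock, or "" if there is none.
  // The cast keeps trim() happy when no docblock was seen (NULL).
  if (preg_match('~\*{2}[\s\n*]+(.*)~', trim((string) $str, "/"), $matches)) {
    // Drop a trailing CR (Windows line endings) or a closing "*".
    return rtrim($matches[1], " \t\r*");
  }
  return "";
}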

function parse_file($file) {
  $buffer = NULL;
  $docblock = NULL;
  $results = Array();
  foreach (token_get_all_improved(file_get_contents($file)) as $token) {
    switch ($token->type) {
      case T_DOC_COMMENT:
        $docblock = $token->text;
        break;
      case T_INTERFACE:
      case T_CLASS:
      case T_FUNCTION:
        $buffer = $token;
        break;
      case T_STRING:
        if (!is_null($buffer)) {
          $buffer->isMember = ($buffer->type != T_FUNCTION) || $buffer->isClassScope;
          $buffer->docblock = $docblock;
          $buffer->name = $token->text;
          $results[] = $buffer;
          $buffer = NULL;
          $docblock = NULL;
        }
        break;
    }
  }
  return $results;
}

function results_to_table($results) {
  $view = Array();
  $last = NULL;
  foreach ($results as $token) {
    if ($last && ((!$token->isMember && $last->isMember) || (in_array($token->type, Array(T_INTERFACE, T_CLASS))))) {
      $view[] = Array("", "", "", "");
    }
    $last = $token;

    $view[] = Array(
      $token->line,
      strtolower(str_replace("T_", "", token_name($token->type))),
      $token->name,
      docblock_excerpt($token->docblock)
    );
  }
  return $view;
}

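function format_table($map) {
  // Nothing to format when no classes or functions were found.
  if (count($map) === 0) {
    return "";
  }
  $out = Array();
  $column_widths = array_fill(0, count($map[0]), 0);
  // First pass: find the widest cell in each column.
  foreach ($map as $row) {
    foreach ($row as $num => $col) {
      $column_widths[$num] = max($column_widths[$num], strlen((string) $col));
    }
  }
  // Second pass: pad every cell to its column width plus a two-space gutter.
  foreach ($map as $row) {
    $line = "";
    foreach ($row as $num => $col) {
      $line .= str_pad((string) $col, $column_widths[$num] + 2);
    }
    $out[] = $line;
  }
  return implode("\n", $out);
}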

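// Bail out with a usage message if no readable file was passed.
if (!isset($argv[1]) || !is_readable($argv[1])) {
  fwrite(STDERR, "Usage: php tokens.php <file>\n");
  exit(1);
}

print(
  format_table(
    results_to_table(
      parse_file($argv[1]))) . "\n");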
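
For anyone wondering why the script wraps token_get_all() before doing anything else: the built-in returns a mix of arrays (token id, text, and on current PHP versions a line number) and bare one-character strings such as "{" or "(". The sketch below only illustrates that mixed shape. The sample source string is made up and is not part of the script above.

```php
<?php
// Made-up sample input, used only for this illustration.
$source = "<?php\n/** Adds two numbers. */\nfunction add() {}\n";

$names = array();
foreach (token_get_all($source) as $token) {
  if (is_array($token)) {
    // Array form: index 0 is the token id, index 1 is the matched text.
    $names[] = token_name($token[0]);
  } else {
    // String form: a single character of punctuation, e.g. "(" or "{".
    $names[] = $token;
  }
}

// One entry per token: names for array tokens, the raw character otherwise.
print(implode(" ", $names) . "\n");
```

The script's token_get_all_improved() evens this out by turning every token into an object with the same fields, which is what lets parse_file() switch on $token->type without checking the shape each time.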

Posted: Sat Feb 02, 2008 12:56 pm
by kAlvaro
I believe there's a typo. Where it says:

Regular expression to match output: ^\([0-9]+\)

it should say:

Regular expression to match output: ^([0-9]+)

Posted: Sat Feb 02, 2008 2:19 pm
by ben_josephs
They're both correct. Which one you should use depends on the style of regular expression syntax selected at the time you enter the expression.

The \( ... \) form is correct for the default style.
The ( ... ) form is correct for POSIX style.

You can select the style with
Configure | Preferences | Editor

[X] Use POSIX regular expression syntax
Recommendation: use POSIX style. It reduces regular-expression backslashitis and unreadability.
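
To illustrate what the expression is meant to capture, here is a minimal PHP sketch. It is not TextPad itself: it uses PHP's preg_match(), whose PCRE syntax groups with plain parentheses the way the POSIX style does, and the sample rows are made up to resemble the script's output (line number first).

```php
<?php
// Made-up rows in the shape the script prints: line, type, name, excerpt.
$rows = array(
  "12  class     Foo  A sample class",
  "",
  "20  function  bar  Does something",
);

$lines = array();
foreach ($rows as $row) {
  // Plain ( ... ) is a capturing group here, as in the POSIX style.
  if (preg_match('/^([0-9]+)/', $row, $m)) {
    $lines[] = (int) $m[1];
  }
}
print(implode(",", $lines) . "\n"); // prints "12,20"

// In this syntax \( ... \) means literal parentheses, so the
// default-style expression does not match the same row.
$escaped = preg_match('/^\([0-9]+\)/', $rows[0]);
print($escaped . "\n"); // prints "0"
```

The captured number is what the "Line: 1" register setting refers to: the first group of the expression holds the line to jump to.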