(Do not be afraid of) PHP Compiler Internals, Sebastian Bergmann, May 27th 2009


Page 1: PHP Knowledgebase

(Do not be afraid of)

PHP Compiler Internals

Sebastian Bergmann

May 27th 2009

Page 2: PHP Knowledgebase

Who I Am

Sebastian Bergmann

Involved in the PHP project since 2000

Creator of PHPUnit

Co-Founder and Principal Consultant with thePHP.cc

Page 3: PHP Knowledgebase

Under PHP's Hood

Server API (SAPI)

(mod_php, FastCGI, CLI, ...)

PHP Core

Request Management

File and Network Operations

Extensions

(date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)

Zend Engine

Compilation and Execution

Memory and Resource Allocation

This slide contains material by Sara Golemon

Page 4: PHP Knowledgebase

How PHP executes code

Lexical Analysis

Converts the source from a sequence of characters into a sequence of tokens

Page 5: PHP Knowledgebase

How PHP executes code

Lexical Analysis

Syntax Analysis

Analyzes a sequence of tokens to determine their grammatical structure

Page 6: PHP Knowledgebase

How PHP executes code

Lexical Analysis

Syntax Analysis

Bytecode Generation

Generates bytecode based on the information gathered by analyzing the source code

Page 7: PHP Knowledgebase

How PHP executes code

Lexical Analysis

Syntax Analysis

Bytecode Generation

Bytecode Execution

Page 8: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

Scan a sequence of characters

Page 9: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

T_OPEN_TAG

Scan a sequence of characters

Page 10: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

T_OPEN_TAG T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE

Scan a sequence of characters

Page 11: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

T_OPEN_TAG T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ;

Scan a sequence of characters

Page 12: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

T_OPEN_TAG T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE }

Scan a sequence of characters

Page 13: PHP Knowledgebase

Lexical Analysis

<?php
if (TRUE) {
    print '*';
}
?>

T_OPEN_TAG T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE } T_WHITESPACE T_CLOSE_TAG

Scan a sequence of characters

Page 14: PHP Knowledgebase

Lexical Analysis

T_OPEN_TAG T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE } T_WHITESPACE T_CLOSE_TAG

Scan a sequence of characters

<?php

if

TRUE

print

'*'

?>

Page 15: PHP Knowledgebase

Lexical Analysis

Scan a sequence of characters

Page 16: PHP Knowledgebase

Lexical Analysis

You do not want to write a scanner by hand

At least when the code for the scanner should be efficient and maintainable

Tools such as flex or re2c generate the code for a scanner from a set of rules

Scanner Generators

"if" {
    return T_IF;
}

<ST_IN_SCRIPTING>"if" {
    return T_IF;
}
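To make the idea behind such rules concrete, here is a minimal hand-written sketch in C of what a scanner for a tiny subset of PHP might do. It is only an illustration, not the code that flex or re2c would generate: the names `toy_token`, `toy_next_token`, `toy_scan`, `TOK_EOF` and `TOK_CHAR` are hypothetical, keywords are matched case-sensitively for brevity, and the handling of the whitespace after the opening tag is an assumption based on the token stream shown on the earlier slides.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token names follow the slides; the numeric values are arbitrary. */
typedef enum {
    TOK_EOF, TOK_CHAR, T_OPEN_TAG, T_CLOSE_TAG, T_WHITESPACE,
    T_IF, T_PRINT, T_STRING, T_CONSTANT_ENCAPSED_STRING
} toy_token;

/* Scan one token starting at *cursor and advance *cursor past it.
 * For TOK_CHAR (single characters such as '(' or ';') the character
 * itself is stored in *ch. */
toy_token toy_next_token(const char **cursor, char *ch)
{
    const char *p = *cursor;
    toy_token tok;

    if (*p == '\0') {
        return TOK_EOF;
    }
    if (strncmp(p, "<?php", 5) == 0) {
        p += 5;
        /* Assumption: one whitespace character belongs to the open tag,
         * which matches the token stream on the slides. */
        if (*p == ' ' || *p == '\t' || *p == '\n') p++;
        tok = T_OPEN_TAG;
    } else if (strncmp(p, "?>", 2) == 0) {
        p += 2;
        tok = T_CLOSE_TAG;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)*p)) p++;
        tok = T_WHITESPACE;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        size_t len = (size_t)(p - start);
        /* Keywords get their own token, everything else is T_STRING. */
        if (len == 2 && strncmp(start, "if", 2) == 0) tok = T_IF;
        else if (len == 5 && strncmp(start, "print", 5) == 0) tok = T_PRINT;
        else tok = T_STRING;
    } else if (*p == '\'') {
        p++;                                   /* opening quote */
        while (*p != '\0' && *p != '\'') p++;  /* string body */
        if (*p == '\'') p++;                   /* closing quote */
        tok = T_CONSTANT_ENCAPSED_STRING;
    } else {
        *ch = *p++;
        tok = TOK_CHAR;
    }
    *cursor = p;
    return tok;
}

/* Scan a whole source string; returns the number of tokens stored. */
size_t toy_scan(const char *src, toy_token *out, size_t max)
{
    size_t n = 0;
    char ch;
    toy_token tok;

    while (n < max && (tok = toy_next_token(&src, &ch)) != TOK_EOF) {
        out[n++] = tok;
    }
    return n;
}
```

Calling `toy_scan()` on the five-line example script yields the same sequence of token kinds as the slides show. Even this small subset needs a chain of special cases, which is the maintenance burden a scanner generator takes away.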

Page 17: PHP Knowledgebase

Lexical Analysis

PHP Tokens

T_ABSTRACT

T_AND_EQUAL

T_ARRAY

T_ARRAY_CAST

T_AS

T_BAD_CHARACTER

T_BOOLEAN_AND

T_BOOLEAN_OR

T_BOOL_CAST

T_BREAK

T_CASE

T_CATCH

T_CHARACTER

T_CLASS

T_CLASS_C

T_CLONE

T_CLOSE_TAG

T_COMMENT

T_CONCAT_EQUAL

T_CONST

T_CONSTANT_ENCAPSED_STRING

T_CONTINUE

T_CURLY_OPEN

T_DEC

T_DECLARE

T_DEFAULT

T_DIR

T_DIV_EQUAL

T_DNUMBER

T_DOC_COMMENT

T_DO

T_DOLLAR_OPEN_CURLY_BRACES

T_DOUBLE_ARROW

T_DOUBLE_CAST

T_DOUBLE_COLON

T_ECHO

T_ELSE

T_ELSEIF

T_EMPTY

T_ENCAPSED_AND_WHITESPACE

T_ENDDECLARE

T_ENDFOR

T_ENDFOREACH

T_ENDIF

T_ENDSWITCH

T_ENDWHILE

T_END_HEREDOC

T_EVAL

T_EXIT

T_EXTENDS

T_FILE

T_FINAL

T_FOR

T_FOREACH

T_FUNCTION

T_FUNC_C

T_GLOBAL

T_GOTO

T_HALT_COMPILER

T_IF

T_IMPLEMENTS

T_INC

T_INCLUDE

T_INCLUDE_ONCE

T_INLINE_HTML

T_INSTANCEOF

T_INT_CAST

T_INTERFACE

T_ISSET

T_IS_EQUAL

T_IS_GREATER_OR_EQUAL

T_IS_IDENTICAL

Page 18: PHP Knowledgebase

Lexical Analysis

PHP Tokens

T_IS_NOT_EQUAL

T_IS_NOT_IDENTICAL

T_IS_SMALLER_OR_EQUAL

T_LINE

T_LIST

T_LNUMBER

T_LOGICAL_AND

T_LOGICAL_OR

T_LOGICAL_XOR

T_METHOD_C

T_MINUS_EQUAL

T_ML_COMMENT

T_MOD_EQUAL

T_MUL_EQUAL

T_NAMESPACE

T_NS_C

T_NEW

T_NUM_STRING

T_OBJECT_CAST

T_OBJECT_OPERATOR

T_OLD_FUNCTION

T_OPEN_TAG

T_OPEN_TAG_WITH_ECHO

T_OR_EQUAL

T_PAAMAYIM_NEKUDOTAYIM

T_PLUS_EQUAL

T_PRINT

T_PRIVATE

T_PUBLIC

T_PROTECTED

T_REQUIRE

T_REQUIRE_ONCE

T_RETURN

T_SL

T_SL_EQUAL

T_SR

T_SR_EQUAL

T_START_HEREDOC

T_STATIC

T_STRING

T_STRING_CAST

T_STRING_VARNAME

T_SWITCH

T_THROW

T_TRY

T_UNSET

T_UNSET_CAST

T_USE

T_VAR

T_VARIABLE

T_WHILE

T_WHITESPACE

T_XOR_EQUAL

Page 19: PHP Knowledgebase

Syntax Analysis

Analyze a sequence of tokens

Page 20: PHP Knowledgebase

Syntax Analysis

You do not want to write a parser by hand

At least when the code for the parser should be efficient and maintainable

Tools such as bison or lemon generate the code for a parser from a set of rules

Parser Generators

T_IF '(' expr ')' { ... }
statement { ... }
elseif_list else_single { ... }
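As a rough illustration of what "determining grammatical structure" means for a rule like this, the following C sketch checks a token sequence against a heavily reduced grammar. It uses hand-written recursive descent, which is a different technique from the table-driven parsers that bison or lemon generate, and it covers only `if`, blocks and `print`. All names (`p_token`, `p_state`, `p_parse`, ...) are hypothetical, and the sketch assumes that whitespace and tag tokens have already been filtered out before parsing.

```c
#include <stddef.h>

typedef enum {
    P_END, P_IF, P_PRINT, P_STRING, P_CONST_STRING,
    P_LPAREN, P_RPAREN, P_LBRACE, P_RBRACE, P_SEMI
} p_token;

typedef struct {
    const p_token *toks;  /* token array terminated by P_END */
    size_t pos;           /* index of the next unread token */
} p_state;

/* Consume the next token if it is of kind t. */
static int p_accept(p_state *s, p_token t)
{
    if (s->toks[s->pos] == t) {
        s->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING (reduced for the sketch) */
static int p_expr(p_state *s)
{
    return p_accept(s, P_STRING) || p_accept(s, P_CONST_STRING);
}

/* statement: T_IF '(' expr ')' statement
 *          | '{' statement* '}'
 *          | T_PRINT expr ';'            */
static int p_statement(p_state *s)
{
    if (p_accept(s, P_IF)) {
        return p_accept(s, P_LPAREN) && p_expr(s) &&
               p_accept(s, P_RPAREN) && p_statement(s);
    }
    if (p_accept(s, P_LBRACE)) {
        while (s->toks[s->pos] != P_RBRACE && s->toks[s->pos] != P_END) {
            if (!p_statement(s)) return 0;
        }
        return p_accept(s, P_RBRACE);
    }
    if (p_accept(s, P_PRINT)) {
        return p_expr(s) && p_accept(s, P_SEMI);
    }
    return 0;
}

/* Returns 1 if the token sequence is exactly one valid statement. */
int p_parse(const p_token *toks)
{
    p_state s = { toks, 0 };
    return p_statement(&s) && s.toks[s.pos] == P_END;
}
```

The `{ ... }` actions in the grammar rule correspond to the points where a real parser would call into the compiler; the sketch only accepts or rejects the input.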

Page 21: PHP Knowledgebase

Bytecode Generation

<?php
if (TRUE) {
    print '*';
}
?>

filename:       /home/sb/if.php
function name:  (null)
number of ops:  8
compiled vars:  none
line     #  op          fetch  ext  return  operands
-------------------------------------------------------------------------------
   2     0  EXT_STMT
         1  JMPZ                            true, ->6
   3     2  EXT_STMT
         3  PRINT                   ~0      '%2A'
         4  FREE                            ~0
   4     5  JMP                             ->6
   6     6  EXT_STMT
         7  RETURN                          1
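To show how a listing like this is executed, here is a minimal sketch of an interpreter loop in C that runs the eight oplines from the dump. It is a toy model and not the Zend Engine's executor: the types `vm_op` and `vm_opcode`, the function `vm_run` and the array `if_program` are hypothetical, operands are simplified to a constant condition, a jump target and a string, and EXT_STMT and FREE are treated as having no visible effect.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    OP_EXT_STMT, OP_JMPZ, OP_JMPNZ, OP_PRINT, OP_FREE, OP_JMP, OP_RETURN
} vm_opcode;

typedef struct {
    vm_opcode opcode;
    int cond;          /* JMPZ/JMPNZ: condition value (a constant here) */
    size_t target;     /* JMPZ/JMPNZ/JMP: opline number to jump to */
    const char *text;  /* PRINT: string to output */
} vm_op;

/* The eight oplines from the dump of if.php. */
const vm_op if_program[] = {
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_JMPZ,     1, 6, NULL },  /* JMPZ true, ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_PRINT,    0, 0, "*"  },  /* '%2A' is the URL-encoded form of '*' */
    { OP_FREE,     0, 0, NULL },
    { OP_JMP,      0, 6, NULL },  /* JMP ->6 */
    { OP_EXT_STMT, 0, 0, NULL },
    { OP_RETURN,   0, 0, NULL },
};

/* Execute ops until RETURN is reached. Output of PRINT is appended
 * to out, which must have room for cap bytes (cap >= 1). */
void vm_run(const vm_op *ops, char *out, size_t cap)
{
    size_t pc = 0;  /* number of the opline to execute next */

    out[0] = '\0';
    for (;;) {
        const vm_op *op = &ops[pc];
        switch (op->opcode) {
        case OP_JMPZ:   /* jump if the condition is zero (false) */
            pc = op->cond ? pc + 1 : op->target;
            break;
        case OP_JMPNZ:  /* jump if the condition is not zero (true) */
            pc = op->cond ? op->target : pc + 1;
            break;
        case OP_JMP:    /* unconditional jump */
            pc = op->target;
            break;
        case OP_PRINT:
            strncat(out, op->text, cap - strlen(out) - 1);
            pc++;
            break;
        case OP_RETURN:
            return;
        default:        /* EXT_STMT and FREE: nothing to do in this sketch */
            pc++;
            break;
        }
    }
}
```

Running `vm_run(if_program, buf, sizeof buf)` leaves the string `*` in `buf`, because the JMPZ at opline 1 sees a true condition and falls through to the PRINT. Changing the condition of opline 1 to 0 makes execution jump straight to opline 6 and print nothing.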

Page 22: PHP Knowledgebase

Bytecode Generation

PHP Opcodes

NOP

ADD

SUB

MUL

DIV

MOD

SL

SR

CONCAT

BW_OR

BW_AND

BW_XOR

BW_NOT

BOOL_NOT

BOOL_XOR

IS_IDENTICAL

IS_NOT_IDENTICAL

IS_EQUAL

IS_NOT_EQUAL

IS_SMALLER

IS_SMALLER_OR_EQUAL

CAST

QM_ASSIGN

ASSIGN_ADD

ASSIGN_SUB

ASSIGN_MUL

ASSIGN_DIV

ASSIGN_MOD

ASSIGN_SL

ASSIGN_SR

ASSIGN_CONCAT

ASSIGN_BW_OR

ASSIGN_BW_AND

ASSIGN_BW_XOR

PRE_INC

PRE_DEC

POST_INC

POST_DEC

ASSIGN

ASSIGN_REF

ECHO

PRINT

JMPZ

JMPNZ

JMPZNZ

JMPZ_EX

JMPNZ_EX

CASE

SWITCH_FREE

BRK

BOOL

INIT_STRING

ADD_CHAR

ADD_STRING

ADD_VAR

BEGIN_SILENCE

END_SILENCE

INIT_FCALL_BY_NAME

DO_FCALL

DO_FCALL_BY_NAME

RETURN

RECV

RECV_INIT

SEND_VAL

SEND_VAR

SEND_REF

NEW

FREE

INIT_ARRAY

ADD_ARRAY_ELEMENT

INCLUDE_OR_EVAL

UNSET_VAR

UNSET_DIM

UNSET_OBJ

FE_RESET

FE_FETCH

EXIT

FETCH_R

FETCH_DIM_R

FETCH_OBJ_R

FETCH_W

FETCH_DIM_W

FETCH_OBJ_W

FETCH_RW

FETCH_DIM_RW

FETCH_OBJ_RW

FETCH_IS

FETCH_DIM_IS

FETCH_OBJ_IS

FETCH_FUNC_ARG

Page 23: PHP Knowledgebase

Bytecode Generation

PHP Opcodes

FETCH_DIM_FUNC_ARG

FETCH_OBJ_FUNC_ARG

FETCH_UNSET

FETCH_DIM_UNSET

FETCH_OBJ_UNSET

FETCH_DIM_TMP_VAR

FETCH_CONSTANT

EXT_STMT

EXT_FCALL_BEGIN

EXT_FCALL_END

EXT_NOP

TICKS

SEND_VAR_NO_REF

CATCH

THROW

FETCH_CLASS

CLONE

INIT_METHOD_CALL

INIT_STATIC_METHOD_CALL

ISSET_ISEMPTY_VAR

ISSET_ISEMPTY_DIM_OBJ

PRE_INC_OBJ

PRE_DEC_OBJ

POST_INC_OBJ

POST_DEC_OBJ

ASSIGN_OBJ

INSTANCEOF

DECLARE_CLASS

DECLARE_INHERITED_CLASS

DECLARE_FUNCTION

RAISE_ABSTRACT_ERROR

ADD_INTERFACE

VERIFY_ABSTRACT_CLASS

ASSIGN_DIM

ISSET_ISEMPTY_PROP_OBJ

HANDLE_EXCEPTION

Page 24: PHP Knowledgebase

Extending the Compiler

Page 25: PHP Knowledgebase

Test First!

--TEST--
unless statement
--FILE--
<?php
unless (FALSE) {
    print 'unless FALSE is TRUE, this is printed';
}

unless (TRUE) {
    print 'unless TRUE is TRUE, this is printed';
}
?>
--EXPECT--
unless FALSE is TRUE, this is printed

Zend/tests/unless.phpt

Page 26: PHP Knowledgebase

Extending the Compiler

Add token for unless to the scanner

Add rule for unless to the parser

Generate bytecode for unless in the compiler

Add token for unless to ext/tokenizer

Page 27: PHP Knowledgebase

Add unless scanner token

<ST_IN_SCRIPTING>"if" {
    return T_IF;
}

<ST_IN_SCRIPTING>"unless" {
    return T_UNLESS;
}

<ST_IN_SCRIPTING>"elseif" {
    return T_ELSEIF;
}

<ST_IN_SCRIPTING>"endif" {
    return T_ENDIF;
}

<ST_IN_SCRIPTING>"else" {
    return T_ELSE;
}

Zend/zend_language_scanner.l

Page 28: PHP Knowledgebase

Add unless parser rule

%token T_NAMESPACE
%token T_NS_C
%token T_DIR
%token T_NS_SEPARATOR
%token T_UNLESS
.
.
unticked_statement:
        '{' inner_statement_list '}'
    |   T_IF '(' expr ')' {
    .
    .
    |   T_UNLESS '(' expr ')' { zend_do_unless_cond(&$3, &$4 TSRMLS_CC); }
        statement { zend_do_if_after_statement(&$4, 1 TSRMLS_CC); }
        { zend_do_if_end(TSRMLS_C); }
    .
    .

Zend/zend_language_parser.y

Page 29: PHP Knowledgebase

How if is compiled

void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
}

zend_do_if_cond() is called when an if statement is compiled

Zend/zend_compile.c

Page 30: PHP Knowledgebase

How if is compiled

void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
}

Allocate a new opline in the current oparray

Zend/zend_compile.c

Page 31: PHP Knowledgebase

How if is compiled

void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
}

Set the opcode of the new opline to JMPZ (jump if zero)

Zend/zend_compile.c

Page 32: PHP Knowledgebase

How if is compiled

void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
}

Set the first operand of the new opline to the if condition

Zend/zend_compile.c

Page 33: PHP Knowledgebase

How if is compiled

void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

Perform bookkeeping tasks such as marking the second operand of the new opline as unused or incrementing the backpatching counter for the current oparray

Zend/zend_compile.c

Page 34: PHP Knowledgebase

Add unless to compiler

void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int unless_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPNZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = unless_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

All we have to do to generate code for the unless statement, compared to generating code for the if statement, is to use the JMPNZ (jump if not zero) opcode instead of the JMPZ (jump if zero) opcode
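The pattern behind both functions can be sketched in a self-contained form: emit a conditional jump whose target is not yet known, emit the statement body, then go back and fill in the target. The C code below is a simplified model under stated assumptions, not the Zend Engine's implementation. The names `e_op`, `e_op_array`, `e_emit` and `e_compile` are hypothetical, the body is reduced to a single PRINT opline, and the target is patched directly rather than through the hooks the real compiler uses.

```c
#include <stddef.h>

typedef enum { E_JMPZ, E_JMPNZ, E_PRINT, E_JMP, E_RETURN } e_opcode;

typedef struct {
    e_opcode opcode;
    size_t target;   /* jump target for JMPZ/JMPNZ/JMP, otherwise unused */
} e_op;

typedef struct {
    e_op ops[16];    /* fixed capacity keeps the sketch short */
    size_t count;
} e_op_array;

/* Append a new opline and return its number. */
static size_t e_emit(e_op_array *a, e_opcode opcode)
{
    a->ops[a->count].opcode = opcode;
    a->ops[a->count].target = 0;
    return a->count++;
}

/* Compile "if (cond) print ..." or "unless (cond) print ...".
 * The two statements differ only in the conditional jump opcode. */
void e_compile(e_op_array *a, int is_unless)
{
    a->count = 0;

    /* Emit the conditional jump; its target is not known yet. */
    size_t cond_op = e_emit(a, is_unless ? E_JMPNZ : E_JMPZ);

    e_emit(a, E_PRINT);                /* the statement body */
    size_t jmp_op = e_emit(a, E_JMP);  /* leaves the statement */

    /* Backpatch: both jumps go to the first opline after the statement. */
    a->ops[cond_op].target = a->count;
    a->ops[jmp_op].target = a->count;

    e_emit(a, E_RETURN);
}
```

With `is_unless` set to 0 the first opline is a JMPZ, with 1 it is a JMPNZ, and everything else in the generated array is identical. This is why the unless grammar rule can reuse `zend_do_if_after_statement()` and `zend_do_if_end()` unchanged.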

Zend/zend_compile.c

Page 35: PHP Knowledgebase

Add unless to compiler

<?php
unless (FALSE) {
    print '*';
}
?>

filename:       /home/sb/unless.php
function name:  (null)
number of ops:  8
compiled vars:  none
line     #  op          fetch  ext  return  operands
-------------------------------------------------------------------------------
   2     0  EXT_STMT
         1  JMPNZ                           false, ->6
   3     2  EXT_STMT
         3  PRINT                   ~0      '%2A'
         4  FREE                            ~0
   4     5  JMP                             ->6
   6     6  EXT_STMT
         7  RETURN                          1

The generated bytecode

Page 36: PHP Knowledgebase

Run the test

sb@ubuntu php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

Build complete.
Don't forget to run 'make test'.

=====================================================================
PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.3.0alpha4-dev
ZEND_VERSION: 2.3.0
PHP_OS      : Linux - Linux ubuntu 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64
INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
More .INIs  :
CWD         : /usr/local/src/php/php-5.3-unless
Extra dirs  :
VALGRIND    : Not used
=====================================================================
Running selected tests.
PASS unless statement [Zend/tests/unless.phpt]
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    0 seconds
=====================================================================

Page 37: PHP Knowledgebase

Add unless to ext/tokenizer

ext/tokenizer/tokenizer_data.c

sb@ubuntu tokenizer % ./tokenizer_data_gen.sh
Wrote tokenizer_data.c

Page 38: PHP Knowledgebase

The End

Thank you for your interest!

These slides will be linked soon from http://sebastian-bergmann.de/

Page 39: PHP Knowledgebase

Acknowledgements

Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation

Derick Rethans, without whose VLD we could not see PHP bytecode

Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides

Page 40: PHP Knowledgebase

References

http://www.php.net/manual/en/tokens.php

http://www.zapt.info/opcodes.html

"Extending and Embedding PHP" by Sara Golemon

Page 41: PHP Knowledgebase

  This presentation material is published under the Attribution-Share Alike 3.0 Unported license.

  You are free:

✔ to Share – to copy, distribute and transmit the work.

✔ to Remix – to adapt the work.

  Under the following conditions:

● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

  For any reuse or distribution, you must make clear to others the license terms of this work.

  Any of the above conditions can be waived if you get permission from the copyright holder.

  Nothing in this license impairs or restricts the author's moral rights.

License