----
presentation_topic: A Peek into Pugs Internals
presentation_title:
presentation_place: http://perlcabal.org/~gaal/peek/start.html
presentation_date: 2006-02-28
author_name: Gaal Yahas
author_email: gaal@forum2.org
author_webpage: http://gaal.livejournal.com/
copyright_string: Copyright © 2006 Gaal Yahas
----
{image: ceri-peek2.jpg 495}

A Peek into Pugs Internals

Gaal Yahas

/pugscode.org/
----
== Warmup question

    package MyMod;
    use base 'Exporter';
    @EXPORT = 'foo';
    sub foo {
        print 42, "\n";
    }
----
== Warmup question

    package MyMod;          package A;
    use base 'Exporter';    use MyMod;
    @EXPORT = 'foo';        foo();
    sub foo {
        print 42, "\n";
    }
----
== Warmup question

    package MyMod;          package A;    package B;
    use base 'Exporter';    use MyMod;    use MyMod;
    @EXPORT = 'foo';        foo();        foo();
    sub foo {
        print 42, "\n";
    }
----
== Warmup question

    package MyMod;          package A;    package B;
    use base 'Exporter';    use MyMod;    use MyMod;
    @EXPORT = 'foo';        foo();        foo();
    sub foo {
        print 42, "\n";
    }

    # main.p5
    use A; use B;
----
== Warmup question

    module MyMod;           module A;     module B;
    sub foo is export {     use MyMod;    use MyMod;
        say 42;
    }                       foo();        foo();

    # main.p6
    use A; use B;
----
== Up to r8288:

    % ./pugs test.pl
+ 42
+ *** No such sub: "&foo" at B.pm line 5, column 1-6
+* What went wrong?
----
== `use MyMod` in Perl 5

.vim
filetype: perl
BEGIN { require MyMod; MyMod->import }
.vim

* The parser sees "use MyMod"
+** Loads MyMod.pm
+** Compiles it
+** Runs its `import` routine
+* `use` is a /digression/ in the caller's compilation
+* BEGIN is a powerful but /weird/ tool
** The compiler and the evaluator intermix
----
== Perl 6 version

* `is export` is a trait of `&foo` marking it as exportable
* This is much nicer for the programmer than @EXPORT stuff
+* But is more work for the language implementor
----
== Hackaround

* Load module MyMod when it is first used
* At parse time, push `is export` routines into the caller's namespace immediately
* /Obviously/ broken, because MyMod.pm only gets parsed once
----
== Bleargh!
* Don't get the impression that Pugs is a pile of mud because of this "feature"
+* To do it right, we'd need to support lexical import, which is a new feature in Perl 6
+* A deliberate makeshift that let us act as users of Perl would before we had that much of Perl available
+* Good enough in many cases
** for example, writing 8,000 tests for /other/ things!
+* But the fact that it don't work don't mean it don't need no fixing

.c
The way import works in Perl 6 is actually much more clever than in Perl 5;
you can say `{ use MyMod }` and not see the effect of the use outside the
braces. This is known as lexical import (or export -- depending on whose
perspective you take).
.c
----
== Okay, but what's the fix?

.vim
filetype: perl
BEGIN { require MyMod; MyMod->import }
.vim

+* At parse time, make a note about what MyMod is willing to export
+* Perform the export when somebody uses it
----
== `[patch]`

.vim
filetype: haskell
- unsafeEvalExp $ mkSym nameExported
.vim
+.vim
filetype: haskell
+ -- %*INC<&this_sub>
+ --    = expression-binding-&this_sub
+ unsafeEvalExp $
+     Syn "=" [Syn "{}" [Syn "{}" [Syn "{}"
+         [Var "%*INC", Val $ VStr pkg], Val (VStr "exports")]
+         , Val $ VStr name], Val sub]
.vim

+* Don't freak out!
+* We're just running Perl 6 in the parser!
----
== Classic compilation

{html:}

* This is a typical description of how compilation works in a language like C
----
== Classic compilation

{html:}

* This is a typical description of how compilation works in a language like C
* Some of that doesn't interest us right now.
* Every language needs a parser
+** (except maybe LISP :-)
----
== Perl 5 model

{html:}

* The code we were looking at is in the parser itself
+* We could have done `eval $some_perl_code`, but that's slow
+* We could have used the internal API, but that's hard
+* Instead, express what you want to do in the same way the parser does it
** Use ASTs
** Whatever those are

.c
This is probably the source of the confusion about calling Perl an
interpreter: there's no storage for the result of a parse that is kept
outside of the evaluator (unless you're doing very fancy stuff). Unlike the
strict definition of an interpreter, though, Perl does compile as much as it
can straight off.

...cgi -> fastcgi -> mod_perl
...parser always available to support eval
...btw did you know that the javac compiler is also available via a
programmatic interface? And that you can load new classes at runtime?
.c
----
== Pugs AST

.vim
filetype: perl6
if 42 {
    say "hello"
} else {
    say "oh no!"
}
.vim
+
{image: gaal_no_ann.png 640}

* An /abstract syntax tree/ is a structure representing the parsed program
+* Each implementation picks the types of nodes it carries
** In practice, the language drives the implementation choices
** together with the implementor's emphasis (speed, education...)

.c
Why does `42` parse as `Val (VInt 42)`, and not just `42`? Because an `if`
can take any expression as the condition; this one just happens to be a
value. And Perl distinguishes between different types of values, so this is
an integer and not, say, a string. If I had a more complex condition here,
we'd just look at a more complex AST for this.
.c
----
== Pugs AST (with annotations)

.vim
filetype: perl6
if 42 {
    say "hello"
} else {
    say "oh no!"
}
.vim

{html:}
----
== Pugs AST

.vim
filetype: haskell
data Exp
    = Noop
    | FunctionApplication Exp (Maybe Exp) [Exp]
    | Syntax String [Exp]
    | Statements Exp Exp
    | Value Val
    | Variable Var
.vim

.c
This isn't really how we write this, because we like to golf.

"Annotation" is one variant of `Exp`, which contains an Ann and an Exp.
`Ann` is another data type defined elsewhere, and Haskell does not confuse a
variant (called "constructor") with names of data types, so we abbreviate and
write "Ann" in both places.
.c
----
== Pugs AST

.vim
filetype: haskell
data Exp
    = Noop                        -- ^ No-op
    | App Exp (Maybe Exp) [Exp]   -- ^ Function application
                                  --   e.g. myfun($invocant: $arg)
    | Syn String [Exp]            -- ^ Syntactic construct that cannot
                                  --   be represented by 'App'.
    | Stmts Exp Exp               -- ^ Multiple statements
    | Val Val                     -- ^ Value
    | Var Var                     -- ^ Variable
.vim

.c
This is how it appears in the Pugs tree (this week)
.c

.c
XXXXX elided
----
== Perl 6 model

{image: http://pugs.blogs.com/photos/visiolization/simplecompilation.png 500}
----
== Parsec

* A parser combinator library
+* /Combine/ simple parsing functions into smarter parsers
+* Then combine them into even smarter ones

.c
----
== Simple example

.vim
filetype: haskell
undefLiteral = do
    symbol "undef"
    return $ Val VUndef
.vim

+* `symbol` is a Parsec builtin
+* it asserts the next symbol is a literal ("undef" in this case)
+* eats it up
+* and returns a small chunk of AST representing undef
----
== Another simple example

.vim
filetype: haskell
ruleVerbatimBlock = verbatimRule "block" $ do
    body <- between (symbol "{") (char '}') ruleBlockBody
    return $ Syn "block" [body]
.vim

* "to parse a verbatim block, look for a block body /between/ `{` and `}`"
+** `between A B C` is a Parsec function that looks for C between A and B
+** A, B, and C are themselves parsers/functions
+** `between` returns whatever C returns.
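
To poke at `between` without building Pugs, here is a minimal sketch that swaps Parsec for `Text.ParserCombinators.ReadP` from base, which offers a `between` with the same argument order. The cut-down `Exp`, and the helpers `symbol`, `blockBody` and `parseAll`, are stand-ins made up for this example; they are not the Pugs definitions.

```haskell
import Text.ParserCombinators.ReadP

-- Cut-down stand-in for the Pugs Exp type (assumption: only what we need here).
data Exp = Syn String [Exp] | Val String
    deriving (Show, Eq)

-- Hypothetical helper: match a literal, then skip trailing whitespace.
symbol :: String -> ReadP String
symbol s = string s <* skipSpaces

-- Stand-in for ruleBlockBody: everything up to the next brace, kept as raw text.
blockBody :: ReadP Exp
blockBody = Val <$> munch1 (`notElem` "{}")

-- Same shape as ruleVerbatimBlock: the body is whatever sits between { and }.
verbatimBlock :: ReadP Exp
verbatimBlock = do
    body <- between (symbol "{") (char '}') blockBody
    return $ Syn "block" [body]

-- Hypothetical helper: succeed only on a single parse that consumes all input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [x | (x, "") <- readP_to_S p s] of
    [x] -> Just x
    _   -> Nothing

main :: IO ()
main = do
    print (parseAll verbatimBlock "{ say 42 }")  -- Just (Syn "block" [Val "say 42 "])
    print (parseAll verbatimBlock "{ say 42")    -- Nothing
```

The second call fails because the closing parser never matches, so `between` as a whole fails and no `Syn` node is built.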
----
== More rules

.vim
filetype: haskell
ruleStatement = do
    exp <- ruleExpression
    f   <- option return $ choice
        [ rulePostConditional
        , rulePostLoop
        , rulePostIterate
        ]
    f exp
.vim
+.vim
filetype: perl6
say "OSDC was great" if $cakes.yummy
.vim

+* `choice` means try out a few parsers
+** the first that matches wins, otherwise backtrack
+** if none succeeded, the `choice` fails
+* in this case, it's protected by an `option`, letting you provide a fallback
----
== Parsing `for`

.vim
filetype: haskell
ruleForConstruct = rule "for construct" $ do
    symbol "for"
    list  <- maybeParens ruleExpression
    optional ruleComma
    block <- ruleBlockLiteral
    retSyn "for" [list, block]
.vim

* `optional` means try something out
* if it wasn't found, don't fail the current rule but don't consume anything
* skip what did parse though
+* In Perl 5 regexps: `/(?:,)?/`
----
== Brace yourself

.vim
filetype: perl
my $car = %models{$wanted};
.vim

* Let's look at the code to extract `$wanted` out of this expression into an AST
+
.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

+* Three lines is more than it'd take with a regexp...
** but it does a little more.
* We know some of this already
----
== Parsing with Parsec - what goes down

.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim
----
== Parsing with Parsec - what goes down

.vim
filetype: haskell
ruleHashSubscriptBraces = do
    between start end subscript
    where
    start = symbol "{"
    end   = char '}'
.vim
+
.vim
filetype: haskell
-- try out subscriptExp, but it's okay if it fails
subscript = option id subscriptExp
.vim
+
.vim
filetype: haskell
subscriptExp = do
    exp <- ruleExpression
    return $ \x -> Syn "{}" [x, exp]
.vim

* `id` *?*
* `\x -> Syn "{}" [x, exp]` /*????*/
----
== Many happy returns

.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

* The return value of ruleHashSubscriptBraces is `(Exp -> Exp)`
+* That is, a function taking an `Exp` and returning an `Exp`
+* Most parser functions return an Exp
+* This one returns a closure
----
== Back to Braces

.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim
+
.vim
filetype: haskell
firstPossibleReturnValue = id
    where id x = x  -- actually, this is a standard function

otherPossibleReturnValue = \x -> Syn "{}" [x, exp]
.vim
+
.vim
filetype: perl
return sub { my $x = shift; return $x };
return sub { my $x = shift; return Syn("{}", [$x, $exp]) };
.vim
+
.vim
filetype: perl6
return -> $x { $x };
return -> $x { Syn("{}", [$x, $exp]) };
.vim
----
== Back to Braces

.vim
filetype: haskell
ruleHashSubscriptBraces :: RuleParser (Exp -> Exp)
ruleHashSubscriptBraces = do
    between (symbol "{") (char '}') $ option id $ do
        exp <- ruleExpression; return $ \x -> Syn "{}" [x, exp]
.vim

* Why do we need the `option id`,
anyway?

.vim
filetype: perl6
%siblings{};    # exactly the same as %siblings, except
.vim
+.vim
filetype: perl6
say "%siblings";    # "%siblings"
say "%siblings{}";  # charlie => donald etc.
.vim
----
== Breather (whew)

* You now know quite a bit about parsing Perl 6 with Haskell
* Any questions before we go on?
----
== "Why do I need this?"

* Maybe you prefer to keep writing Perl
** You already know Perl 5 regular expressions
** It's Perl 6 you want to use now
+* Whenever you see a Buddha on the road
----
== "Why do I need this?"

* Maybe you prefer to keep writing Perl
** You already know Perl 5 regular expressions
** It's Perl 6 you want to use now
* Whenever you see a Good Idea on the road
+** Steal it!
----
== Perl 6 Rules

* Of course it does!
+* A lot like regexps
+* Lots of reusability improvements
+* Powerful enough to do everything we've seen Parsec do
----
== Use case

.vim
filetype: perl
my $obj;
if (/Car=(?:(Ferrari)|(ModelT)(\d\d\d\d))/) {
    if ($1) {
        $obj = Car->new({color => "red", ... });
        ...
    } elsif ($2) {
        $obj = Car->new({color => "black", year => $3});
        ...
    }
}
.vim

* Well, fine, but can't this be made shorter /and/ more readable?
----
== Rules version

.vim
filetype: perl6
my $obj = m{
    Car =
    [ Ferrari
        : { return Car.new(:color) }
    | ModelT $:=(\d\d\d\d)
        : { return Car.new(:color :$) }
    ]
};
.vim
----
== Parsing undef

.vim
filetype: perl
my $undefined = qr/undef/;
.vim
+.vim
filetype: perl6
rule undefined { undef }
.vim
+.vim
filetype: perl6
rule undefined { undef : { return Val(VUndef) } }
.vim
----
== Parsing literals

.vim
filetype: perl6
rule term {
    undef : { return Val(VUndef) }
  |
}
.vim
----
== Parsing literals

.vim
filetype: perl6
rule term {
    undef : { return Val(VUndef) }
  | \d+
}
.vim
----
== Parsing literals

.vim
filetype: perl6
rule term {
    undef : { return Val(VUndef) }
  | (\d+) : { return Val(VInt($0)) }
}
.vim
----
== Parsing literals

.vim
filetype: perl6
rule term {
    undef : { return Val(VUndef) }
  | $:=(\d+) : { return Val(VInt($)) }
}
.vim
----
== Parsing literals

.vim
filetype: perl6
rule term {
    undef : { return Val(VUndef) }
  | (\d+) : { return Val(VInt($0)) }
}
.vim
----
== Parsing terms

.vim
filetype: perl6
rule term {
    undef : { return VUndef }
  | (\d+) : { return VInt($0) }
}

rule expr {
    ...
.vim
----
== Parsing terms

.vim
filetype: perl6
rule term {
    undef : { return VUndef }
  | (\d+) : { return VInt($0) }
}

rule expr {
    : { return Val($term) }
  |
.vim
----
== Parsing terms

.vim
filetype: perl6
rule term {
    undef : { return VUndef }
  | (\d+) : { return VInt($0) }
}

rule expr {
    : { return Val($term) }
  | () (<[+-]>) () : { return Op(VStr($1), $0, $2) }
}
.vim
----
== Parsing terms

.vim
filetype: perl6
rule term {
    undef : { return VUndef }
  | (\d+) : { return VInt($0) }
}

rule expr {
    : { return Val($term) }
  | $ := $ := : { return Op(Val($), $, $) }
}
.vim
+.vim
filetype: perl6
rule operator {
    $ := <[+-]> : { return VStr($) }
}
.vim
----
== Conclusion

* Parsec is flexible
** not as scary as it looks
+* Perl 6 Rules can be as powerful as Parsec
----
== ObMoose

{image: http://forum2.org/gaal/m00se.png}

=== /Thank you!/
----
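== Appendix: the braces rule, standalone

The `option id` trick from the "Back to Braces" slides can be tried without a Pugs checkout. This is only a sketch under assumptions: it uses `Text.ParserCombinators.ReadP` from base instead of Parsec, a two-constructor `Exp`, and a toy `expression` parser that only knows sigiled variables. The names `expression`, `subscripted` and `parseAll` are hypothetical. Note that ReadP's `option` tries both alternatives where Parsec's commits to the first; for these inputs the result is the same.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Cut-down stand-in for the Pugs Exp type (assumption: variables are plain strings).
data Exp = Var String | Syn String [Exp]
    deriving (Show, Eq)

-- Hypothetical helper: match a literal, then skip trailing whitespace.
symbol :: String -> ReadP String
symbol s = string s <* skipSpaces

-- Toy stand-in for ruleExpression: a sigil followed by an alphanumeric name.
expression :: ReadP Exp
expression = do
    sigil <- satisfy (`elem` "$%")
    name  <- munch1 isAlphaNum
    return $ Var (sigil : name)

-- Same shape as ruleHashSubscriptBraces: the parser returns a function.
-- With a key it builds a "{}" node around its argument; without one it is `id`.
hashSubscriptBraces :: ReadP (Exp -> Exp)
hashSubscriptBraces =
    between (symbol "{") (char '}') $ option id $ do
        e <- expression
        return $ \x -> Syn "{}" [x, e]

-- Parse the thing being subscripted, then apply the closure to it.
subscripted :: ReadP Exp
subscripted = do
    base <- expression
    wrap <- hashSubscriptBraces
    return (wrap base)

-- Hypothetical helper: succeed only on a single parse that consumes all input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case [x | (x, "") <- readP_to_S p s] of
    [x] -> Just x
    _   -> Nothing

main :: IO ()
main = do
    print (parseAll subscripted "%models{$wanted}")
    -- Just (Syn "{}" [Var "%models",Var "$wanted"])
    print (parseAll subscripted "%siblings{}")
    -- Just (Var "%siblings")
```

The empty-braces case is where `id` earns its keep: there is no key expression to wrap, so the closure hands back `%siblings` untouched.
----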