About Zeev’s proposal of a PHP superset

William Pinaud
11 min read · Aug 13, 2019


Random alter of the PHP logo I did to make you focus on this. Gotcha.

N.B.: this is around an 8-minute read.

First of all, please remember this was posted on 13/08/19, so this will sink into history as a collection of garbage and gibberish nonsense. I will delete this one day or another. Probably. :)

Also, please note this is a transcript of a Twitter thread I made here. Hence the chopped-up sentence lengths and format.

So, let’s take a closer look at what Zeev Suraski (@zeevs) posted and proposed a few hours ago on the PHP internals mailing list.

There was a little confusion about what was proposed, despite his words being quite clear. This is going to be VERY long, for I have no idea how to sum up things. ¯\_(ツ)_/¯

Before we start, history!

First, let’s rewind the timeline a little, and have a look at history.
I hate the past and history, but as a set of pure facts, it is sometimes needed.

PHP emerged in its raw, primitive form, circa 1994. It’s now 25 years old. The first idea was to get rid of CGI backends, crashing segfaults and maintenance nightmares that were not suited for a web world. All that in the form of scripts, parsed through a slower but safer interpreter.

Rasmus Lerdorf (@rasmus on Twitter), now 50 years old, from Denmark (Greenland), initiated the project, first for himself. But many people from newsgroups (the genetic sibling of mailing lists, the ancestor of forums, which are the ancestors of social networks #ExplainItToMillennials) came to help.

Andi Gutmans (@andigutmans on Twitter) and Zeev Suraski (@zeevs on Twitter), both now in their mid-40s, from Israel, were among those people.
Boy, time flies. #TempusFugit

They laid the foundations of what we know today.

Andi and Zeev, together, founded Zend Technologies, in Cupertino, USA, and recently left it. Their careers are now somewhere else. Both are also emeritus members of the Apache Software Foundation (@TheASF on Twitter). Andi Gutmans is now working for Amazon (head of a fistful of AWS services).
Zeev Suraski left Zend a few weeks ago. He said “he’s still in tech”. No more for now. :)
Now Zend is part of Rogue Wave Software (@RogueWaveInc on Twitter).

Rasmus Lerdorf is still involved, and all three of them are still board members of the PHP Group, which “theoretically owns” property and strategy for the language (https://www.php.net/credits.php). But, as he said himself: he creates solutions and then hands them over to “better people”, seeing himself as a “bad programmer” (which we all know he is definitely not). If you have one hour and like PHP, you should definitely have a look at Rasmus’ talks. Like this one:

Rasmus’ talk for 2019. With very personal and insightful stuff inside.

A quick history can be found in my slides, mostly inspired by Rasmus’ ones, here: https://www.slideshare.net/WilliamPinaud/php-in-2018-q4-afup-limoges

(and my corresponding talk, given at an AFUP Limoges meetup, in French, is here:)

Oh, yeah, this is in French, but the slides are in English! :D

Please give a warm high five to those three people. There wouldn’t be any PHP discussion without them. They did an amazing job over those 25 years.

Explaining the divergence

Now, this partially explains the origins of everything we have today. If you look at the PHP RFCs (https://wiki.php.net/rfc), you’ll notice a trend. As for every project, there are four types of proposed contributions, which somehow reflect the Cynefin framework / Stacey Matrix.

Image source: lifebytwobytwo.com

In short, there are:

🔵 Bugs (https://bugs.php.net/)
🔵 Legacy management and unused stuff dropping
🔵 Evolutive facets that follow the new world
🔵 Abstract proposals made to test new, fuzzy stuff

If you want examples, think of, respectively:

🔵 A bug in parse_url()
🔵 Deprecating the __autoload() magic function in PHP 7.2.0
🔵 Implementing the foreign function interface (FFI)
🔵 Moving from a purely interpreted engine to one with a JIT compiler.
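
To make the second example a bit more concrete, here is a minimal sketch of what replaces the deprecated __autoload(): registering a callback with spl_autoload_register(). The class name App\Hypothetical\Mailer and the $requested log are made up for illustration; a real autoloader would map the class name to a file and require it.

```php
<?php
// Log of the class names the autoloader was asked for (illustration only).
$requested = [];

// Replacement for the deprecated __autoload() magic function:
// register one or more callbacks that PHP calls for unknown classes.
spl_autoload_register(function (string $class) use (&$requested): void {
    $requested[] = $class;
    // A real autoloader would now map the name to a file, for example:
    // require __DIR__ . '/src/' . str_replace('\\', '/', $class) . '.php';
});

// Hypothetical class name: it does not exist, so PHP calls the autoloader,
// finds nothing, and class_exists() returns false.
$found = class_exists('App\\Hypothetical\\Mailer');
```

Unlike __autoload(), several such callbacks can be registered side by side, which is one reason the old magic function could be dropped.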

Now, the problem is that PHP still holds the primitive, unharmonized implementations of its first contributors, back when no one had any idea they needed THAT much organization, voting and stuff. Remember, this is ten years before Git was born, and thirteen before GitHub was.

For instance, if you look at the perpetual comments regarding language consistency, you always face the same trolls, and people coming from other languages often get lost in that.

For example, looking at naming consistency, as Rasmus says himself (here: https://youtu.be/wCZ5TJCBWMg?t=1116): there are no real naming inconsistencies, they’re just “not the way you expect them to be”. :)

Compensating struggle puzzle of the legacy API

The deal with PHP one has to understand is that it’s quite old, and was NEVER driven by the will of major companies. Contrary to Javascript (which is nowadays essentially Google, despite the Ecma standardization background), Java (Oracle), or C# and TypeScript (both Microsoft).

So, like Python, Ruby, or Rust, PHP struggles to find core maintainers (https://github.com/php/php-src). That requires above-average skills in C, security and algorithms, among other things. Right now, PHP is lacking some of them.

Also, this lack of programmers to maintain the core was responsible for one of the major crashes in its development. In the second half of the 2000s, there was one of the biggest initiatives ever attempted to make PHP a more unique language than ever. It ultimately got abandoned.

People wanted to go further, and offer universal character recognition in source code (native Unicode support), new engines, and much more new stuff. They would call it PHP 6. Due to the complexity of the project and the lack of motivation of many people, PHP stopped evolving for a few years.

Much of the work was ported to PHP 5.3.0 through 5.6.0. But no real, structural, fundamental innovation was made to the engine. During that time, many major companies invested a no-way-back amount of money (billions) in PHP projects.

One of those companies was one of the GAFAM, which all own one or more languages and technologies. This is strategic defense: in case open licenses become proprietary, they need independence.

Facebook did not have this independence, despite becoming too large to afford not having it. They did not own PHP. So they invested in faster web tech, like Google does, for example.

They built the foundations of what is now partially distributed and known as React.js for the frontend, and the HipHop Virtual Machine (HHVM) for the backend (these are NOT what Facebook actually uses, but they get close to it).

They also added meta-languages for both sides. JSX (https://reactjs.org/docs/introducing-jsx.html) and Hack (https://hacklang.org/) languages were born.

By doing so, they completed their independence roadmap, and compensated, for a time, for the difference between the entrepreneurial pace and the open-source pace. This avoided a potential crisis for them. Cool. But not.
(They also rewrote databases, like Google did with Bigtable, and created Hive to complete the ecosystem.)

The problem here is that this outpaced the original, open-source based project that PHP was. And a strategic move had to be made. I mean, leaving PHP to Facebook could be a thing, but the GAFAM show no mercy when it comes to making arbitrary moves. So the losses would be great for the whole community. You can buy beginner developers with hype cookies, but you won’t poison the most experienced ones: they have already seen such fallacies.

Also, one must understand that there’s a difference between providing a programming ecosystem that people use to fuel your business (like Java for Android applications) and doing so simply because you use it… Until you don’t need it anymore.

Mostly, the reason why Facebook needed updates is that while it was becoming the second most used domain behind google.com, the need for performance was becoming more and more immediate.

We’re talking about tens of thousands of servers, for over 1.5 billion active users. Think of their cost and ecological footprint (the latter is most probably ignored by the GAFAM, except by their communications departments, but they do go along with it, fortunately for the planet).

By this time, PHP had Zend Engine (the core PHP interpreter since the dawn of man) version 1.0 from 2000 and version 2.0 from 2004, but the core engine desperately needed a rework. The PHP community then woke up and eventually came up with a new engine, labeled php-ng (short for “next generation”).

As of PHP 7.2.0, php-ng strictly outraces HHVM. And we’re back to having an open source project, despite the many discussions regarding the PHP License, which is partially protected and therefore seen by many as “not totally open source”. Whatever (if you are interested in reading about the incompatibility with the GNU GPL, have a look here: https://www.gnu.org/licenses/license-list.en.html#PHP-3.01).

And that is not the sole strategic move for PHP. Many (like, many) of the current PRs / RFCs deal with “adding cool stuff”.

Because PHP is one of the fastest-evolving languages as of today (https://www.php.net/ChangeLog-7.php), along with Javascript (http://www.ecma-international.org/publications/standards/Ecma-262.htm), or Python, for instance (https://docs.python.org/3/whatsnew/changelog.html). This is partly why those languages are among the most popular ones.

Keeping the pace and touching the future

There are two corporate strategies behind those moves. And both of these mix essential development of a programming language project on one side, and battle planning among the global computing ecosystems on the other side:

1. Fixing legacy inconsistencies makes for fewer trolls and attracts dumb CTOs / IT C-levels (you know, the ones who read stuff like this: https://www.google.com/search?q=whyphpsucks).

2. Adding functionalities that embrace new paradigms, which is mandatory. Change or die, they said.

If you don’t understand point 1, take a dive into the debates on strong typing, array syntax (braces and the array() function, mostly), case-insensitive constants and function argument order: you’ll get to the point quite fast (https://wiki.php.net/rfc).
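
A small sketch of what some of those debates look like in practice. It only uses built-in functions, the variable names are mine, and it is an illustration rather than an exhaustive list.

```php
<?php
// Two spellings of the same array: the array() construct and the short syntax.
$long  = array('apple', 'banana');
$short = ['apple', 'banana'];
$sameArrays = ($long === $short);

// Argument order: haystack first here...
$position = strpos('banana', 'nan');
// ...but needle first here.
$hasBanana = in_array('banana', $short, true);

// Callback first here...
$upper = array_map('strtoupper', $short);
// ...but array first here (array_filter() also preserves the original keys).
$withoutApple = array_filter($short, function (string $fruit): bool {
    return $fruit !== 'apple';
});
```

None of this is broken, and all of it is documented. It is simply the kind of thing newcomers trip over, and the kind of thing that cannot be changed without breaking existing code.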

If you look closely, PHP has fascinating, unexpected side projects. Look at Swoole, php-ml (https://php-ml.readthedocs.io/en/latest/), PHPOpenCV (https://phpopencv.org/index.html), Symfony Messenger Component (https://symfony.com/components/Messenger), php-rdkafka (https://github.com/arnaud-lb/php-rdkafka) or APIPlatform (https://api-platform.com/), for instance.

With the arrival of FFI, pre-loading, JIT compilation with lazy-loading and hot code detection, and OPcache core integration, PHP is answering the polarization of modern web apps, which have rich frontends segregated from backend calculation and functional/structural constraints.

Just take a look at this quick, drafted demo from Zeev from mid-2018:

Zeev Suraski here shows a sample difference between a raw, JIT-modded build of PHP and the stock engine. Remember: this is just a demo.

This is also why there could be a new PHP engine coming pretty soon as well. The ability of PHP to evolve, regardless of extreme C-level contracts, makes it well suited to go in every direction. This, my friends, is where the internals have set course.

Brave both worlds

So now, we’re back to the main topic. You now have the necessary background to fully understand the ideas Zeev raised there, and why they are critical and incredibly clever. They boil down to this: how to arbitrate the polarizing discussions and RFCs between:

- How to move away from legacy inconsistencies without breaking the planet (PHP fuels nearly 80% of the websites whose server-side language is known, according to W3Techs: https://w3techs.com/technologies/overview/programming_language/all)?
- How to implement new, radical changes at the same time so as to make strategic, enterprise-friendly moves (i.e., to enroll CTOs worldwide)?

First, if you haven’t done so, you need to read what Zeev wrote:

- In the RFC wiki: https://wiki.php.net/pplusplus/faq
- In the internals list: https://externals.io/message/106453
- Alternately, on Mailing lists ARChives: https://marc.info/?l=php-internals&m=156529545007909&w=2

In short, Zeev proposes to create a new “alternative version” of PHP (call it a “dialect” if you want), based on the same engine, source code and interpreter basics, bundled with PHP, but offering an alternative way of reading scripts, freed from backward-compatibility (BC) constraints. Both would cohabit.

The main thing, here, is the first quote from Zeev, that is one of the introduction sentences on the internals “FAQ” post, made to sum up things (the link is just above):

“There are two big, substantial schools of thought in the PHP world. The first likes PHP roughly the way it is — dynamic, with strong BC bias and emphasis on simplicity; The other, prefers a stricter language, with reduced baggage and more advanced/complex features.”

As Zeev states, people who care about one of those directions care less about the other (without ignoring it, though). To illustrate this, one of the most divisive topics right now is strong typing. Enforcing it costs some performance, since the interpreter has to check how values are actually stored in memory before using them. This is by essence mandatory in low-level languages like C, and is actually done in C by the Zend Engine. If you want to read more regarding structural memory storage of PHP variables, have a look at Julien Pauli’s amazing article on his blog: http://blog.jpauli.tech/2016-04-08-hashtables-html/

Also, please remember: in PHP, as anywhere else, strong typing is NOT a synonym of “progress”. This is a fallacy for people who started with low-level languages. The ultimate aim of programming is not, and will never be, to stick to the machine, but to get closer to the human thought process. Most people tend to forget this. Going towards binary electronic impulses is NOT “cleaner” or “more logical”, nor is it “safer”. PHP is a higher-level abstraction language. There’s a reason why you don’t see a segfault every day while using Symfony or Laravel.
Dynamic typing is NOT a bug: it’s a deliberately implemented feature, there from day one.
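
To see the two schools of thought in a few lines, here is a minimal sketch of PHP’s default (coercive) behaviour. The add() function is a made-up helper; everything else is plain PHP 7.

```php
<?php
// Hypothetical helper with scalar type declarations (available since PHP 7.0).
function add(int $a, int $b): int
{
    return $a + $b;
}

// Operators juggle types: the numeric string '5' is converted to an integer.
$juggled = '5' + 3;

// In the default coercive mode, a numeric string is also accepted for an int parameter.
$coerced = add('5', 3);

// A non-numeric string cannot be coerced, so PHP throws a TypeError.
$rejected = false;
try {
    add('five', 3);
} catch (TypeError $e) {
    $rejected = true;
}
```

Adding declare(strict_types=1); as the first statement of the file makes add('5', 3) throw a TypeError as well, while '5' + 3 keeps working. So the stricter school already gets its checks on a per-file basis, and the dynamic one keeps its juggling.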

Offering duality

As Zeev states, this wouldn’t be a fork. For it would divide, without conquering. As stated above: PHP does not really have an excessive number of core developers at the moment (❤ to them all, by the way!). Both “standards” or “core API sets” would be embedded in any install.

In short: same engine, same code base, different modules, different constraints. One would have, let’s imagine, short tags, the other would not. One would have rewritten, strict code, syntax and signatures. The other one would keep legacy stuff that is still at risk if deleted/changed.

Looking at real-world examples, right now, there aren’t many strictly similar moves, but you can think of:

- Kotlin/Clojure/Groovy/Scala for Java/JVM,
- C++/Cilk/Objective-C for C/GCC,
- Dart/ESX/TypeScript/CoffeeScript/ES6 for Javascript/V8.

The worst part of this is that a new “P++” language (whatever it would be called) should expose perfectly ALL new APIs, deprecations, and BC breaks in ONE shot. This would be one hell of a challenge. We all know why we’re talking about BC breaks here: legacy code worldwide. It simply can’t happen twice. Or people will move to something else. Python is an excellent alternative candidate for PHP developers, and thanks to machine learning, it is growing very fast.

The idea behind that systemic distribution of both sets would be to avoid the failure of the Hack language, which was not promoted by PHP itself and was developed solely by a single company, something open source developers don’t really like.

Also, remember that the same team would have a grasp on both directions, implementations and choice. There wouldn’t be two teams, two companies, two divergent strategic needs. That changes everything.

Last but not least, the legacy implementation of the “low-level” set would subsequently never be abandoned, for it would benefit from all common, core upgrades, like the latest engine upgrades. And older systems would still see progress when upgrading low-level tech layers.

Like Zeev wrote, the hardest challenge now will be to find a decent, commercial name for this. 😜

And please, remember: this is JUST a FAQ, just a draft of thoughts. This is an open discussion that top-level people are having, not a C-level “do it or you’re fired” instruction.

That’s it. I hope I didn’t rewrite history, nor deface the thoughts and wills of the people mentioned above. If so, please do correct me. This is a catch-up on history and internal discussions, so not everything may be accurate.

Please forgive me if it is so. 😇
You have a great day. Peace.


William

@innersonics on Twitter.
Here on LinkedIn.
Here on Facebook.
Here if you like music.
Here and here if you like photography.

Written by William Pinaud

Developer / Lead developer / Web Architect / Technical Innovation since 2007. Also, I do photography, music, playing MTG, video games, and writing, a LOT. IAD.
