Apr 21, 2010

Attn: PHP Internals

Here's a flamy email that I just sent to the PHP Internals team.
______________

Regarding some of the features that are going to ship in PHP 6, I'm going to take the liberty of making some personal remarks in the form of constructive criticism.

1. SYSTEM NAMESPACES. There are many PHP built-in functions that act on certain groups of entities. The best examples are the array_* and str* functions. There are many of them, and it gets really cumbersome to repeat the same prefix for each and every one. This is clearly a remnant of the original procedural-style PHP. But now we have namespaces, which were introduced for exactly this kind of situation. Why not take advantage of them? PHP could have a built-in \std\array or \php\array or \__array namespace that would group all functions related to arrays, and thus have the array_ prefix removed. I see this as an elegant solution for grouping functionality, without resorting to classes and objects as some languages do to solve this issue. Also, as namespaces support const values, constants could easily be grouped the same way and have some of their prefixes removed too.

Moreover, I see this extended to certain extensions as well, such as the database extensions. Because, let's face it, it's not that logical to have a mysqli class and objects of type mysqli. It would make more sense to have mysql, mysqli, mssql, oracle (not oci8), sqlite etc. namespaces from a logical (and realistic) point of view. What I want to emphasize is: let's not use classes and repetitive prefixes for grouping purposes, especially when we have a dedicated language feature for that.

2. TYPE HINTING. Currently PHP supports argument type hinting for arrays and objects. As far as I know, it has also been decided to support this for function return values. I find it hard to understand why type hinting is not offered for scalar values as well. Hinting string, int, float and bool values can save a lot of debugging time and would provide a great mechanism for early detection of bugs. It would also spare developers from writing hundreds of lines of code (with is_type tests) in a medium-sized application just to ensure their parameters are of the expected type. It's better and faster to have these checks done by the language, as early as possible.

I cannot agree with the phrase "We do not allow type-hinted properties as it's not the PHP way". But what is the PHP way? Classes and namespaces were not the PHP way either. Yet here we have (and use) them. PHP needs type-hinted object properties just as it needs type-hinted function arguments. They would prevent a lot of errors and bugs in code that logically requires some properties to be of a certain type. They would make objects more consistent. They would make interfaces more intuitive and more semantic. And the time spent by the compiler on the checks is a better tradeoff than having lots of lines of application code testing for types. It's logical for a Person to have a string name, an int age, a bool gender and so on. I believe this is the PHP way.

3. CONST VALUES. PHP supports class constants and, as of 5.3, namespace constants with the same syntax. However, there is a major limitation on constants: "The value must be a constant expression, not (for example) a variable, a property, a result of a mathematical operation, or a function call". I think PHP uses constants inconsistently throughout its implementation. We have the original constants (which can be define'd even with non-constant expressions), class and namespace constants (which can have only constant-expression values) and the so-called magic constants that... are not really constants at all.

Generally, constants are language elements that are defined with an initial value and cannot be reassigned or redefined later. Strictly speaking, "initial value" is a misnomer, as it is their only value. There should be no restriction on where that value comes from, as long as you cannot change it later on. In PHP this limitation comes from the fact that constants are resolved at compile time, rather than at run time. However, I think it should be possible to have constant arrays as their values, just as we have them in object properties. Also, since with the introduction of namespaces non-namespaced code is simply code in the global namespace ( \ ), I think it's safe to remove the define function and possibly allow the const keyword to define runtime constants as well (when the value is not a constant expression).

4. PARAMETER ORDER. As noted in an older PHP meeting:

We went over the string functions and found that there are only two functions that have "needle, haystack" instead of "haystack, needle", namely in_array() and array_search(). For in_array() it makes sense in a logical way to work in the same way as SQL, where you first specify the value, and then you check if it fits "in the array". As array_search() was modelled on this in_array() function the parameter order is the same.
As there are not many inconsistencies, and changing them would cause quite some problems for current applications we decided not to change the order.


The conclusion here is a bit disappointing. We have the chance to fix a problem, but we choose not to. The very fact that there are only two functions with an inconsistent parameter order is a good reason to make the change. If one third of the functions were inconsistent, would we make the change more easily? I doubt it, and at the same time I am convinced that it is much better to have things fixed early, before it's too late.

These are the major issues that have come to mind so far. I am sure many of these ideas can be rejected on the grounds of backward compatibility and fear of breaking tons of lines of ancient code. But existing code can and has to be rewritten, modified or maintained in order to keep pace. I'm totally against the idea that PHP should keep pace with old code, and against the idea of an unbreakable constant-expression PHP style. This is a dynamic language and should act like one. Developers expect it to change for the better, and not just pile up features.

I really appreciate all the hard work that has been put into the development of PHP 5.3. I like seeing intuitive and efficient (not just productive) features added, the same way I want bad features taken out.

Mar 24, 2010

The approximate HTML

At the time of writing, the homepages of Google, Yahoo, Facebook and Twitter had 40, 154, 41 and 88 validation errors, respectively. No matter whether they declare their document type as HTML 4.01, HTML5 or XHTML 1.0, they all fail to validate and break basic rules. It's worth mentioning that Google is represented in the HTML working group at W3C and that they also handle the HTML5 draft editing. That's one example.

But why would they violate web standards so flagrantly? They, in fact, produce incorrect code. Client code. Code that is inevitably public and available for others to see how they've done it. These are some of the most visited and busiest sites out there, and yet they treat the rules of the web as something optional. Web browsers (some more than others) are often criticized for not rendering pages correctly and for making it difficult for developers to build consistent applications. Very true. But do developers respect the standards they expect browsers to respect? It seems not. And this is just a (representative) sample, because there are only a few sites with valid markup. And even though an end user may not notice a simple HTML error, or may not care about it, the web standards have a clear goal of making things better overall for end users, while being transparent to them. Markup is the support framework of any content. And in the long run, healthy markup will sustain better content. Better = correct, accessible and meaningful.

Of course, it's not all that easy to produce valid code, especially for dynamic web pages. It's even harder when outputting a lot of markup from the server side. But this alone is a bad practice as well. Saving bytes per page and heavy use of JavaScript may be offered as reasons for invalid markup, but they do not hold up as solid arguments. I think the best way to have your HTML clean and correct is to be willing to do it. After all, it's not enough to have a doctype at the top of your page; you'll also need to write the rest of the page with it in mind.

Mar 15, 2010

Hard to learn from the Web

Some time back Adam Bosworth wrote an interesting article about what the Web can and does teach us, and about why it's important to extend its capabilities in other areas. However, the Web is itself a poor learner. It is diverse and heterogeneous content. Content serving as information. Information coming from data, by means of... HTML.

The problem is not with the content, but with the structure. HTML has been an excellent markup language for that. What some may not know is that its latest (current) stable version, HTML 4.01, is 11 (eleven, as in a soccer team) years old. That beats even the age of the C++ standard, or does it? In an environment that changes so often, that has seen tremendous growth over the past decade, one that can be taken as an example of evolution, what we get to work with is an aged and almost deprecated language acting as the main tool. After so “many” revisions and after such “continuous” development, HTML has clearly failed to keep pace with today's Web. It is difficult to express modern and original ideas with a technology that was unable to stay in sync with the very medium it acts upon.

That, in fact, is the main reason why there are so many flavors of HTML, so many flagrant quirks modes, and so many browser-specific markup extensions. HTML5 is coming out a “bit” late. But better late than never. One of the best parts of HTML 4.01 was its simplicity; I believe XHTML has failed (did it?) precisely because it broke this rule. HTML5 seems to have observed this, and moreover seems to have learned some key lessons from the semantic Web.

Mar 4, 2010

What you C++ is what you get

This year, more than ever, C++ is coming closer and closer to a major upgrade as a “standard programming language”. It's the next big take on this language: the upcoming, so-called C++0x, a new ISO standard. The first C++ standard was released in 1998, as far as I know, followed by a tiny revision in 2003, that is, after 5 years. But that one was so small it can barely be called a “revision”. C++0x, on the other hand, will make for a real revision and will add many polished features to the language (threads, regexes, hash tables, tuples, the auto type, a new for-loop, lambdas, delegation, variadic templates, to name just a few). Great things and great work on this side of the page.
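
To give a feel for a few of the features listed above, here is a minimal sketch that combines the hash table (std::unordered_map), the auto type, the new for-loop, a lambda and a tuple. The function names count_words and length_range are made up for this illustration, and the code assumes a compiler that already implements the finalized form of these features.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical helper: count how often each word occurs, using the new
// hash table container instead of the tree-based std::map.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (const auto& word : words) {  // new for-loop plus the auto type
        ++counts[word];               // operator[] creates a zero entry on first use
    }
    return counts;
}

// Hypothetical helper: return (shortest length, longest length) as a tuple.
// An empty input yields (0, 0).
std::tuple<std::size_t, std::size_t> length_range(const std::vector<std::string>& words) {
    if (words.empty()) {
        return std::make_tuple(std::size_t(0), std::size_t(0));
    }
    // A lambda that orders strings by length, defined right where it is used.
    auto by_length = [](const std::string& a, const std::string& b) {
        return a.size() < b.size();
    };
    auto shortest = std::min_element(words.begin(), words.end(), by_length);
    auto longest = std::max_element(words.begin(), words.end(), by_length);
    return std::make_tuple(shortest->size(), longest->size());
}
```

Calling count_words on the words php, cpp, python, php gives a count of 2 for php, and length_range on the same input gives the pair (3, 6).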

But – because there is a big but – C++ deserves more. More is less, someone once told me. And in this case, it may need less. C++ is a big and complex language; it is a “cathedral” prisoner struggling to escape into the world and experience everything the “bazaar” has to offer. A first-class prisoner at that. Way too many features are being discussed and debated and standardized, and yet the removal of obsolete or simply bad features is more or less left aside. IMO these are the bugs that C++ as a solid language has, and that won't be fixed any time soon, if ever:

  • The whole idea of C as a strict subset of C++, and the famous idea of maintaining backward compatibility with millions and millions of lines of C code, are both counter-evolutionary and will keep software static. Existing code must change if it needs or wants to evolve. Languages should encourage software (existing and future) to change for the better. The post-increment operator in its name should mean more than “superset” and “object-oriented”.
  • C++ must have a general cleanup of its features. It could generalize many useful ideas from the STL in other parts of the language. It can have the STL simplified a lot.
  • Do we still need char * ? Moreover, do we still need C pointers at all? Do we need all the iterators in order to pass through a mere collection? Do we need all the maps and multimaps and sets and multisets and hashes and so on, when we could have one good hash? Do we need templates, when some auto type could do it? Do we need all the stacks, queues, deques, when a proper vector would suffice?
  • Maybe the best way to advance is not making simple things complicated; and just maybe C++ can learn some new and good things from interpreted languages such as PHP, Python and Ruby.
  • Maybe the ISO standardization and voting process is not well suited to such a language. C++ can learn a lot from open source and from the evolution of languages like those mentioned above. Release early, release often, community support, simplicity and continuous development are practices that can be borrowed successfully from software development and applied to actual language construction.
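
As a small illustration of the questions about iterators and containers raised in the list above, here is a sketch that walks a collection first with explicit iterators and then with the new for-loop, and uses a plain vector as a stack. The function names are invented for this example, and it is not meant to settle whether the dedicated containers are needed; std::stack, for instance, is itself only a thin adapter over another container.

```cpp
#include <vector>

// Explicit iterator style, as commonly written in C++98/03.
int sum_with_iterators(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// The same traversal with the C++0x range-based for loop.
int sum_with_range_for(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}

// A plain vector used as a stack: back() is the top, pop_back() removes it.
// Popping everything yields the elements in reverse order.
std::vector<int> reversed_via_stack(const std::vector<int>& values) {
    std::vector<int> stack(values);  // work on a copy so the input is untouched
    std::vector<int> out;
    while (!stack.empty()) {
        out.push_back(stack.back());
        stack.pop_back();
    }
    return out;
}
```

Both sum functions return the same total for the same input; the difference is only in how much machinery the caller has to spell out.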

Who hasn't heard of C or C++? They have inspired and influenced many other languages, and have helped in building a lot of them. At this moment, C++ too can learn and pull good things and best practices from others. In breaking the excessive backward compatibility with C and with itself, it can look at the way PHP handled this when moving from version 4 to version 5. Also, Python stepping up to a revamped version 3 is a living example.

The Web has shown us more than anything else that software is ever-changing. C++ should keep pace, and it is able to do so in the most intelligent manner.