1
0
mirror of https://github.com/vichan-devel/vichan.git synced 2024-11-28 17:31:00 +01:00

add htmlpurifier library

This commit is contained in:
8chan 2014-09-23 23:25:04 +00:00
parent 7c877c9056
commit da9ee32c2b
807 changed files with 64577 additions and 0 deletions

View File

@ -0,0 +1 @@
configdoc/usage.xml -crlf

24
inc/lib/htmlpurifier-4.5.0/.gitignore vendored Normal file
View File

@ -0,0 +1,24 @@
tags
conf/
test-settings.php
config-schema.php
library/HTMLPurifier/DefinitionCache/Serializer/*/
library/standalone/
library/HTMLPurifier.standalone.php
library/HTMLPurifier*.tgz
library/package*.xml
smoketests/test-schema.html
configdoc/*.html
configdoc/configdoc.xml
docs/doxygen*
*.phpt.diff
*.phpt.exp
*.phpt.log
*.phpt.out
*.phpt.php
*.phpt.skip.php
*.htmlt.ini
*.patch
/*.php
vendor
composer.lock

View File

@ -0,0 +1,9 @@
CREDITS
Almost everything written by Edward Z. Yang (Ambush Commander). Lots of thanks
to the DevNetwork Community for their help (see docs/ref-devnetwork.html for
more details), Feyd especially (namely IPv6 and optimization). Thanks to RSnake
for letting me package his fantastic XSS cheatsheet for a smoketest.
vim: et sw=4 sts=4

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,13 @@
4 - Minor feature enhancements
[ Appendix A: Release focus IDs ]
0 - N/A
1 - Initial freshmeat announcement
2 - Documentation
3 - Code cleanup
4 - Minor feature enhancements
5 - Major feature enhancements
6 - Minor bugfixes
7 - Major bugfixes
8 - Minor security fixes
9 - Major security fixes

View File

@ -0,0 +1,374 @@
Install
How to install HTML Purifier
HTML Purifier is designed to run out of the box, so actually using the
library is extremely easy. (Although... if you were looking for a
step-by-step installation GUI, you've downloaded the wrong software!)
While the impatient can get going immediately with some of the sample
code at the bottom of this library, it's well worth reading this entire
document--most of the other documentation assumes that you are familiar
with these contents.
---------------------------------------------------------------------------
1. Compatibility
HTML Purifier is PHP 5 only, and is actively tested from PHP 5.0.5 and
up. It has no core dependencies with other libraries. PHP
4 support was deprecated on December 31, 2007 with HTML Purifier 3.0.0.
HTML Purifier is not compatible with zend.ze1_compatibility_mode.
These optional extensions can enhance the capabilities of HTML Purifier:
* iconv : Converts text to and from non-UTF-8 encodings
* bcmath : Used for unit conversion and imagecrash protection
* tidy : Used for pretty-printing HTML
These optional libraries can enhance the capabilities of HTML Purifier:
* CSSTidy : Clean CSS stylesheets using %Core.ExtractStyleBlocks
* Net_IDNA2 (PEAR) : IRI support using %Core.EnableIDNA
---------------------------------------------------------------------------
2. Reconnaissance
A big plus of HTML Purifier is its inerrant support of standards, so
your web-pages should be standards-compliant. (They should also use
semantic markup, but that's another issue altogether, one HTML Purifier
cannot fix without reading your mind.)
HTML Purifier can process these doctypes:
* XHTML 1.0 Transitional (default)
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1
...and these character encodings:
* UTF-8 (default)
* Any encoding iconv supports (with crippled internationalization support)
These defaults reflect what my choices would be if I were authoring an
HTML document, however, what you choose depends on the nature of your
codebase. If you don't know what doctype you are using, you can determine
the doctype from this identifier at the top of your source code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
...and the character encoding from this code:
<meta http-equiv="Content-type" content="text/html;charset=ENCODING">
If the character encoding declaration is missing, STOP NOW, and
read 'docs/enduser-utf8.html' (web accessible at
http://htmlpurifier.org/docs/enduser-utf8.html). In fact, even if it is
present, read this document anyway, as many websites specify their
document's character encoding incorrectly.
---------------------------------------------------------------------------
3. Including the library
The procedure is quite simple:
require_once '/path/to/library/HTMLPurifier.auto.php';
This will setup an autoloader, so the library's files are only included
when you use them.
Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.
If you installed HTML Purifier via PEAR, all you need to do is:
require_once 'HTMLPurifier.auto.php';
Please note that the usual PEAR practice of including just the classes you
want will not work with HTML Purifier's autoloading scheme.
Advanced users, read on; other users can skip to section 4.
Autoload compatibility
----------------------
HTML Purifier attempts to be as smart as possible when registering an
autoloader, but there are some cases where you will need to change
your own code to accomodate HTML Purifier. These are those cases:
PHP VERSION IS LESS THAN 5.1.2, AND YOU'VE DEFINED __autoload
Because spl_autoload_register() doesn't exist in early versions
of PHP 5, HTML Purifier has no way of adding itself to the autoload
stack. Modify your __autoload function to test
HTMLPurifier_Bootstrap::autoload($class)
For example, suppose your autoload function looks like this:
function __autoload($class) {
require str_replace('_', '/', $class) . '.php';
return true;
}
A modified version with HTML Purifier would look like this:
function __autoload($class) {
if (HTMLPurifier_Bootstrap::autoload($class)) return true;
require str_replace('_', '/', $class) . '.php';
return true;
}
Note that there *is* some custom behavior in our autoloader; the
original autoloader in our example would work for 99% of the time,
but would fail when including language files.
AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
spl_autoload_register() has the curious behavior of disabling
the existing __autoload() handler. Users need to explicitly
spl_autoload_register('__autoload'). Because we use SPL when it
is available, __autoload() will ALWAYS be disabled. If __autoload()
is declared before HTML Purifier is loaded, this is not a problem:
HTML Purifier will register the function for you. But if it is
declared afterwards, it will mysteriously not work. This
snippet of code (after your autoloader is defined) will fix it:
spl_autoload_register('__autoload')
Users should also be on guard if they use a version of PHP previous
to 5.1.2 without an autoloader--HTML Purifier will define __autoload()
for you, which can collide with an autoloader that was added by *you*
later.
For better performance
----------------------
Opcode caches, which greatly speed up PHP initialization for scripts
with large amounts of code (HTML Purifier included), don't like
autoloaders. We offer an include file that includes all of HTML Purifier's
files in one go in an opcode cache friendly manner:
// If /path/to/library isn't already in your include path, uncomment
// the below line:
// require '/path/to/library/HTMLPurifier.path.php';
require 'HTMLPurifier.includes.php';
Optional components still need to be included--you'll know if you try to
use a feature and you get a class doesn't exists error! The autoloader
can be used in conjunction with this approach to catch classes that are
missing. Simply add this afterwards:
require 'HTMLPurifier.autoload.php';
Standalone version
------------------
HTML Purifier has a standalone distribution; you can also generate
a standalone file from the full version by running the script
maintenance/generate-standalone.php . The standalone version has the
benefit of having most of its code in one file, so parsing is much
faster and the library is easier to manage.
If HTMLPurifier.standalone.php exists in the library directory, you
can use it like this:
require '/path/to/HTMLPurifier.standalone.php';
This is equivalent to including HTMLPurifier.includes.php, except that
the contents of standalone/ will be added to your path. To override this
behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
be found (usually, this will be one directory up, the "true" library
directory in full distributions). Don't forget to set your path too!
The autoloader can be added to the end to ensure the classes are
loaded when necessary; otherwise you can manually include them.
To use the autoloader, use this:
require 'HTMLPurifier.autoload.php';
For advanced users
------------------
HTMLPurifier.auto.php performs a number of operations that can be done
individually. These are:
HTMLPurifier.path.php
Puts /path/to/library in the include path. For high performance,
this should be done in php.ini.
HTMLPurifier.autoload.php
Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
You can do these operations by yourself--in fact, you must modify your own
autoload handler if you are using a version of PHP earlier than PHP 5.1.2
(See "Autoload compatibility" above).
---------------------------------------------------------------------------
4. Configuration
HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do. If you answer no to any of these
questions, read on; otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).
* Am I using UTF-8?
* Am I using XHTML 1.0 Transitional?
If you answered no to any of these questions, instantiate a configuration
object and read on:
$config = HTMLPurifier_Config::createDefault();
4.1. Setting a different character encoding
You really shouldn't use any other encoding except UTF-8, especially if you
plan to support multilingual websites (read section three for more details).
However, switching to UTF-8 is not always immediately feasible, so we can
adapt.
HTML Purifier uses iconv to support other character encodings, as such,
any encoding that iconv supports <http://www.gnu.org/software/libiconv/>
HTML Purifier supports with this code:
$config->set('Core.Encoding', /* put your encoding here */);
An example usage for Latin-1 websites (the most common encoding for English
websites):
$config->set('Core.Encoding', 'ISO-8859-1');
Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped. If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).
4.2. Setting a different doctype
For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
Other supported doctypes include:
* HTML 4.01 Strict
* HTML 4.01 Transitional
* XHTML 1.0 Strict
* XHTML 1.0 Transitional
* XHTML 1.1
4.3. Other settings
There are more configuration directives which can be read about
here: <http://htmlpurifier.org/live/configdoc/plain.html> They're a bit boring,
but they can help out for those of you who like to exert maximum control over
your code. Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.
For example, you can fine tune allowed elements and attributes, convert
relative URLs to absolute ones, and even autoparagraph input text! These
are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
%AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
translates to:
$config->set('Namespace.Directive', $value);
E.g.
$config->set('HTML.Allowed', 'p,b,a[href],i');
$config->set('URI.Base', 'http://www.example.com');
$config->set('URI.MakeAbsolute', true);
$config->set('AutoFormat.AutoParagraph', true);
---------------------------------------------------------------------------
5. Caching
HTML Purifier generates some cache files (generally one or two) to speed up
its execution. For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writeable by the webserver.
If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:
chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
If the above command doesn't work, you may need to assign write permissions
to all. This may be necessary if your webserver runs as nobody, but is
not recommended since it means any other user can write files in the
directory. Use:
chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer
You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".
Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to, if in doubt,
follow its advice.
If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):
$config->set('Core.DefinitionCache', null);
Or move the cache directory somewhere else (no trailing slash):
$config->set('Cache.SerializerPath', '/home/user/absolute/path');
---------------------------------------------------------------------------
6. Using the code
The interface is mind-numbingly simple:
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify( $dirty_html );
That's it! For more examples, check out docs/examples/ (they aren't very
different though). Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.
---------------------------------------------------------------------------
7. Quick install
First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 5: Caching above for details).
If your website is in UTF-8 and XHTML Transitional, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
If your website is in a different encoding or doctype, use this code:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'ISO-8859-1'); // replace with your encoding
$config->set('HTML.Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
?>
vim: et sw=4 sts=4

View File

@ -0,0 +1,69 @@

Installation
Comment installer HTML Purifier
Attention: Ce document a encode en UTF-8. Si les lettres avec les accents
est essoreuse, prenez un mieux editeur de texte.
À L'Aide: Je ne suis pas un diseur natif de français. Si vous trouvez une
erreur dans ce document, racontez-moi! Merci.
L'installation de HTML Purifier est trés simple, parce qu'il ne doit pas
la configuration. Dans le pied de de document, les utilisateurs
impatient peuvent trouver le code, mais je recommande que vous lisez
ce document pour quelques choses.
1. Compatibilité
HTML Purifier fonctionne dans PHP 5. PHP 5.0.5 est le dernier
version que je le testais. Il ne dépend de les autre librairies.
Les extensions optionnel est iconv (en général déjà installer) et
tidy (répandu aussi). Si vous utilisez UTF-8 et ne voulez pas
l'indentation, vous pouvez utiliser HTML Purifier sans ces extensions.
2. Inclure la librarie
Utilisez:
require_once '/path/to/library/HTMLPurifier.auto.php';
...quand vous devez utiliser HTML Purifier (ne inclure pas quand vous
ne devez pas, parce que HTML Purifier est trés grand.)
HTML Purifier utilise 'autoload'. Si vous avez définu la fonction
__autoload, vous doivez ajoute cet programme:
spl_autoload_register('__autoload')
Plus d'information est dans le document 'INSTALL'.
3. Installation vite
Si votre site web est en UTF-8 et XHTML Transitional, utilisez:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$purificateur = new HTMLPurifier();
$html_propre = $purificateur->purify($html_salle);
?>
Sinon, utilisez:
<?php
require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core', 'Encoding', 'ISO-8859-1'); //remplacez avec votre encoding
$config->set('Core', 'XHTML', true); //remplacez avec false si HTML 4.01
$purificateur = new HTMLPurifier($config);
$html_propre = $purificateur->purify($html_salle);
?>
vim: et sw=4 sts=4

View File

@ -0,0 +1,504 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999
Copyright (C) 1991, 1999 Free Software Foundation, Inc.
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts
as the successor of the GNU Library Public License, version 2, hence
the version number 2.1.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Lesser General Public License, applies to some
specially designated software packages--typically libraries--of the
Free Software Foundation and other authors who decide to use it. You
can use it too, but we suggest you first think carefully about whether
this license or the ordinary General Public License is the better
strategy to use in any particular case, based on the explanations below.
When we speak of free software, we are referring to freedom of use,
not price. Our General Public Licenses are designed to make sure that
you have the freedom to distribute copies of free software (and charge
for this service if you wish); that you receive source code or can get
it if you want it; that you can change the software and use pieces of
it in new free programs; and that you are informed that you can do
these things.
To protect your rights, we need to make restrictions that forbid
distributors to deny you these rights or to ask you to surrender these
rights. These restrictions translate to certain responsibilities for
you if you distribute copies of the library or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link other code with the library, you must provide
complete object files to the recipients, so that they can relink them
with the library after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
We protect your rights with a two-step method: (1) we copyright the
library, and (2) we offer you this license, which gives you legal
permission to copy, distribute and/or modify the library.
To protect each distributor, we want to make it very clear that
there is no warranty for the free library. Also, if the library is
modified by someone else and passed on, the recipients should know
that what they have is not the original version, so that the original
author's reputation will not be affected by problems that might be
introduced by others.
Finally, software patents pose a constant threat to the existence of
any free program. We wish to make sure that a company cannot
effectively restrict the users of a free program by obtaining a
restrictive license from a patent holder. Therefore, we insist that
any patent license obtained for a version of the library must be
consistent with the full freedom of use specified in this license.
Most GNU software, including some libraries, is covered by the
ordinary GNU General Public License. This license, the GNU Lesser
General Public License, applies to certain designated libraries, and
is quite different from the ordinary General Public License. We use
this license for certain libraries in order to permit linking those
libraries into non-free programs.
When a program is linked with a library, whether statically or using
a shared library, the combination of the two is legally speaking a
combined work, a derivative of the original library. The ordinary
General Public License therefore permits such linking only if the
entire combination fits its criteria of freedom. The Lesser General
Public License permits more lax criteria for linking other code with
the library.
We call this license the "Lesser" General Public License because it
does Less to protect the user's freedom than the ordinary General
Public License. It also provides other free software developers Less
of an advantage over competing non-free programs. These disadvantages
are the reason we use the ordinary General Public License for many
libraries. However, the Lesser license provides advantages in certain
special circumstances.
For example, on rare occasions, there may be a special need to
encourage the widest possible use of a certain library, so that it becomes
a de-facto standard. To achieve this, non-free programs must be
allowed to use the library. A more frequent case is that a free
library does the same job as widely used non-free libraries. In this
case, there is little to gain by limiting the free library to free
software only, so we use the Lesser General Public License.
In other cases, permission to use a particular library in non-free
programs enables a greater number of people to use a large body of
free software. For example, permission to use the GNU C Library in
non-free programs enables many more people to use the whole GNU
operating system, as well as its variant, the GNU/Linux operating
system.
Although the Lesser General Public License is Less protective of the
users' freedom, it does ensure that the user of a program that is
linked with the Library has the freedom and the wherewithal to run
that program using a modified version of the Library.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.
GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library or other
program which contains a notice placed by the copyright holder or
other authorized party saying it may be distributed under the terms of
this Lesser General Public License (also called "this License").
Each licensee is addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
6. As an exception to the Sections above, you may also combine or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (1) uses at run time a
copy of the library already present on the user's computer system,
rather than copying library functions into the executable, and (2)
will operate properly with a modified version of the library, if
the user installs one, as long as the modified version is
interface-compatible with the version that the work was made with.
c) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
d) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
e) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the materials to be distributed need not include anything that is
normally distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties with
this License.
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under any
particular circumstance, the balance of the section is intended to apply,
and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License may add
an explicit geographical distribution limitation excluding those countries,
so that distribution is permitted only in or among countries not thus
excluded. In such case, this License incorporates the limitation as if
written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Lesser General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms of the
ordinary General Public License).
To apply these terms, attach the following notices to the library. It is
safest to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least the
"copyright" line and a pointer to where the full notice is found.
<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Also add information on how to contact you by electronic and paper mail.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!
vim: et sw=4 sts=4

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,24 @@
README
All about HTML Purifier
HTML Purifier is an HTML filtering solution that uses a unique combination
of robust whitelists and agressive parsing to ensure that not only are
XSS attacks thwarted, but the resulting HTML is standards compliant.
HTML Purifier is oriented towards richly formatted documents from
untrusted sources that require CSS and a full tag-set. This library can
be configured to accept a more restrictive set of tags, but it won't be
as efficient as more bare-bones parsers. It will, however, do the job
right, which may be more important.
Places to go:
* See INSTALL for a quick installation guide
* See docs/ for developer-oriented documentation, code examples and
an in-depth installation guide.
* See WYSIWYG for information on editors like TinyMCE and FCKeditor
HTML Purifier can be found on the web at: http://htmlpurifier.org/
vim: et sw=4 sts=4

View File

@ -0,0 +1,150 @@
TODO List
= KEY ====================
# Flagship
- Regular
? Maybe I'll Do It
==========================
If no interest is expressed for a feature that may require a considerable
amount of effort to implement, it may get endlessly delayed. Do not be
afraid to cast your vote for the next feature to be implemented!
Things to do as soon as possible:
- http://htmlpurifier.org/phorum/read.php?3,5560,6307#msg-6307
- Think about allowing explicit order of operations hooks for transforms
- Fix "<.<" bug (trailing < is removed if not EOD)
- Build in better internal state dumps and debugging tools for remote
debugging
- Allowed/Allowed* have strange interactions when both set
? Transform lone embeds into object tags
- Deprecated config options that emit warnings when you set them (with'
a way of muting the warning if you really want to)
- Make HTML.Trusted work with Output.FlashCompat
- HTML.Trusted and HTML.SafeObject have funny interaction; general
problem is what to do when a module "supersedes" another
(see also tables and basic tables.) This is a little dicier
because HTML.SafeObject has some extra functionality that
trusted might find useful. See http://htmlpurifier.org/phorum/read.php?3,5762,6100
FUTURE VERSIONS
---------------
4.6 release [OMG CONFIG PONIES]
! Fix Printer. It's from the old days when we didn't have decent XML classes
! Factor demo.php into a set of Printer classes, and then create a stub
file for users here (inside the actual HTML Purifier library)
- Fix error handling with form construction
- Do encoding validation in Printers, or at least, where user data comes in
- Config: Add examples to everything (make built-in which also automatically
gives output)
- Add "register" field to config schemas to eliminate dependence on
naming conventions (try to remember why we ultimately decided on tihs)
5.0 release [HTML 5]
# Swap out code to use html5lib tokenizer and tree-builder
! Allow turning off of FixNesting and required attribute insertion
5.1 release [It's All About Trust] (floating)
# Implement untrusted, dangerous elements/attributes
# Implement IDREF support (harder than it seems, since you cannot have
IDREFs to non-existent IDs)
- Implement <area> (client and server side image maps are blocking
on IDREF support)
# Frameset XHTML 1.0 and HTML 4.01 doctypes
- Figure out how to simultaneously set %CSS.Trusted and %HTML.Trusted (?)
5.2 release [Error'ed]
# Error logging for filtering/cleanup procedures
# Additional support for poorly written HTML
- Microsoft Word HTML cleaning (i.e. MsoNormal, but research essential!)
- Friendly strict handling of <address> (block -> <br>)
- XSS-attempt detection--certain errors are flagged XSS-like
- Append something to duplicate IDs so they're still usable (impl. note: the
dupe detector would also need to detect the suffix as well)
6.0 release [Beyond HTML]
# Legit token based CSS parsing (will require revamping almost every
AttrDef class). Probably will use CSSTidy
# More control over allowed CSS properties using a modularization
# IRI support (this includes IDN)
- Standardize token armor for all areas of processing
7.0 release [To XML and Beyond]
- Extended HTML capabilities based on namespacing and tag transforms (COMPLEX)
- Hooks for adding custom processors to custom namespaced tags and
attributes, offer default implementation
- Lots of documentation and samples
Ongoing
- More refactoring to take advantage of PHP5's facilities
- Refactor unit tests into lots of test methods
- Plugins for major CMSes (COMPLEX)
- phpBB
- Also, a FAQ for extension writers with HTML Purifier
AutoFormat
- Smileys
- Syntax highlighting (with GeSHi) with <pre> and possibly <?php
- Look at http://drupal.org/project/Modules/category/63 for ideas
Neat feature related
! Support exporting configuration, so users can easily tweak settings
in the demo, and then copy-paste into their own setup
- Advanced URI filtering schemes (see docs/proposal-new-directives.txt)
- Allow scoped="scoped" attribute in <style> tags; may be troublesome
because regular CSS has no way of uniquely identifying nodes, so we'd
have to generate IDs
- Explain how to use HTML Purifier in non-PHP languages / create
a simple command line stub (or complicated?)
- Fixes for Firefox's inability to handle COL alignment props (Bug 915)
- Automatically add non-breaking spaces to empty table cells when
empty-cells:show is applied to have compatibility with Internet Explorer
- Table of Contents generation (XHTML Compiler might be reusable). May also
be out-of-band information.
- Full set of color keywords. Also, a way to add onto them without
finalizing the configuration object.
- Write a var_export and memcached DefinitionCache - Denis
- Built-in support for target="_blank" on all external links
- Convert RTL/LTR override characters to <bdo> tags, or vice versa on demand.
Also, enable disabling of directionality
? Externalize inline CSS to promote clean HTML, proposed by Sander Tekelenburg
? Remove redundant tags, ex. <u><u>Underlined</u></u>. Implementation notes:
1. Analyzing which tags to remove duplicants
2. Ensure attributes are merged into the parent tag
3. Extend the tag exclusion system to specify whether or not the
contents should be dropped or not (currently, there's code that could do
something like this if it didn't drop the inner text too.)
? Make AutoParagraph also support paragraph-izing double <br> tags, and not
just double newlines. This is kind of tough to do in the current framework,
though, and might be reasonably approximated by search replacing double <br>s
with newlines before running it through HTML Purifier.
Maintenance related (slightly boring)
# CHMOD install script for PEAR installs
! Factor out command line parser into its own class, and unit test it
- Reduce size of internal data-structures (esp. HTMLDefinition)
- Allow merging configurations. Thus,
a -> b -> default
c -> d -> default
becomes
a -> b -> c -> d -> default
Maybe allow more fine-grained tuning of this behavior. Alternatively,
encourage people to use short plist depths before building them up.
- Time PHPT tests
ChildDef related (very boring)
- Abstract ChildDef_BlockQuote to work with all elements that only
allow blocks in them, required or optional
- Implement lenient <ruby> child validation
Wontfix
- Non-lossy smart alternate character encoding transformations (unless
patch provided)
- Pretty-printing HTML: users can use Tidy on the output on entire page
- Native content compression, whitespace stripping: use gzip if this is
really important
vim: et sw=4 sts=4

View File

@ -0,0 +1 @@
4.5.0

View File

@ -0,0 +1,6 @@
HTML Purifier 4.5.0 is a minor bugfix and feature release, containing an
accumulation of changes over a year. CSS support has been extended to
support display:inline-block, white-space, underscores in font families,
page-break-* CSS3 properties (when proprietary is enabled.) We now use
SHA-1 to identify cached definitions, and the semantics of stacked
attribute transforms has changed slightly.

View File

@ -0,0 +1,20 @@
WYSIWYG - What You See Is What You Get
HTML Purifier: A Pretty Good Fit for TinyMCE and FCKeditor
Javascript-based WYSIWYG editors, simply stated, are quite amazing. But I've
always been wary about using them due to security issues: they handle the
client-side magic, but once you've been served a piping hot load of unfiltered
HTML, what should be done then? In some situations, you can serve it uncleaned,
since you only offer these facilities to trusted(?) authors.
Unfortunantely, for blog comments and anonymous input, BBCode, Textile and
other markup languages still reign supreme. Put simply: filtering HTML is
hard work, and these WYSIWYG authors don't offer anything to alleviate that
trouble. Therein lies the solution:
HTML Purifier is perfect for filtering pure-HTML input from WYSIWYG editors.
Enough said.
vim: et sw=4 sts=4

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 382 B

View File

@ -0,0 +1,101 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://web.resource.org/cc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="16.000000px"
height="16.000000px"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.42.2"
sodipodi:docbase="C:\DOCUME~1\Edward\MYDOCU~1\MYWEBS~1\HTMLPU~1\art"
sodipodi:docname="ICON-1~1.SVG"
inkscape:export-filename="C:\Documents and Settings\Edward\My Documents\My Webs\htmlpurifier\art\icon.png"
inkscape:export-xdpi="90.000000"
inkscape:export-ydpi="90.000000">
<defs
id="defs4">
<linearGradient
id="linearGradient5003">
<stop
style="stop-color:#537ddd;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop5005" />
<stop
style="stop-color:#000000;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop5007" />
</linearGradient>
<linearGradient
id="linearGradient4993">
<stop
style="stop-color:#183577;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop4995" />
<stop
id="stop5001"
offset="0.58142859"
style="stop-color:#8b9fbb;stop-opacity:1.0000000;" />
<stop
style="stop-color:#ffffff;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop4997" />
</linearGradient>
</defs>
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="23.255956"
inkscape:cx="3.2274095"
inkscape:cy="4.1487021"
inkscape:document-units="px"
inkscape:current-layer="layer1"
showgrid="false"
showguides="false"
gridspacingy="0.50000000cm"
gridspacingx="0.50000000cm"
gridoriginy="0.00000000cm"
gridoriginx="0.00000000cm"
gridtolerance="0.10000000cm"
gridempspacing="2"
inkscape:grid-points="true"
inkscape:grid-bbox="false"
inkscape:window-width="975"
inkscape:window-height="759"
inkscape:window-x="130"
inkscape:window-y="129" />
<metadata
id="metadata7">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1">
<path
style="fill:none;fill-opacity:0.75000000;fill-rule:evenodd;stroke:#000000;stroke-width:1.0000004px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1.0000000"
d="M 6.1949060,157.57756 L 6.1949060,157.57756 z "
id="path1319" />
<path
style="fill:#224db0;fill-opacity:1.0000000;fill-rule:evenodd;stroke:none;stroke-width:6.0000000;stroke-linecap:butt;stroke-linejoin:round;marker-start:none;stroke-miterlimit:4.0000000;stroke-dasharray:none;stroke-opacity:1.0000000"
d="M 0.58257145,1.2121863 C 0.63548185,3.4711463 0.61139575,5.1777365 1.0905516,7.3958765 C 1.2484712,8.3223265 2.8202544,8.4670465 3.5807376,8.5653065 C 4.0690207,8.6283965 4.0902492,8.5661465 4.0906878,9.1995265 C 4.1064337,11.195036 4.0317991,11.792303 4.0906878,13.786273 C 4.0906878,15.786203 4.0909380,15.786453 6.0908601,15.786453 C 9.0907431,15.786453 10.414747,15.786053 12.414669,15.786053 C 14.414592,15.786053 14.414842,15.785813 14.414842,13.785893 C 14.418166,10.552283 14.404522,7.8374163 14.421897,4.6040763 C 14.625008,2.9294663 14.978320,2.5231063 15.830433,1.1499263 C 15.040843,1.1499263 14.943078,1.1493063 14.129102,1.1493063 C 8.8912931,1.1493063 5.8203807,1.2121863 0.58257145,1.2121863 z M 2.0905126,2.4028663 C 2.8330809,2.4028663 3.3476197,2.4028663 4.0901879,2.4028663 C 4.0901879,4.8586963 4.0906878,4.8408265 4.0906878,7.2966665 C 3.2248799,7.2967665 2.2374444,7.5239665 2.1488437,6.3255665 C 1.8765344,4.2270765 1.8837837,4.4962263 2.0905126,2.4028663 z M 5.4346466,6.5656265 L 6.8695071,6.5656265 L 6.8696331,13.558223 L 9.8657791,11.557663 L 9.8656551,6.5656265 L 13.070633,6.5656265 L 13.070633,13.786523 C 13.070633,14.786503 13.070633,14.786503 12.070671,14.786503 L 6.4158330,14.786503 C 5.5388453,14.786503 5.4347719,14.786723 5.4347719,13.786523 L 5.4346466,6.5656265 z "
id="path1325"
sodipodi:nodetypes="ccsccsscccsccccccccccccccccc" />
</g>
</svg>

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 615 B

View File

@ -0,0 +1,101 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://web.resource.org/cc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="16.000000px"
height="16.000000px"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.42.2"
sodipodi:docbase="C:\DOCUME~1\Edward\MYDOCU~1\MYWEBS~1\HTMLPU~1\art"
sodipodi:docname="icon-32x32.svg"
inkscape:export-filename="C:\Documents and Settings\Edward\My Documents\My Webs\htmlpurifier\art\icon-16x16.png"
inkscape:export-xdpi="90.000000"
inkscape:export-ydpi="90.000000">
<defs
id="defs4">
<linearGradient
id="linearGradient5003">
<stop
style="stop-color:#537ddd;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop5005" />
<stop
style="stop-color:#000000;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop5007" />
</linearGradient>
<linearGradient
id="linearGradient4993">
<stop
style="stop-color:#183577;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop4995" />
<stop
id="stop5001"
offset="0.58142859"
style="stop-color:#8b9fbb;stop-opacity:1.0000000;" />
<stop
style="stop-color:#ffffff;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop4997" />
</linearGradient>
</defs>
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="23.255956"
inkscape:cx="3.2274095"
inkscape:cy="4.1487021"
inkscape:document-units="px"
inkscape:current-layer="layer1"
showgrid="false"
showguides="false"
gridspacingy="0.50000000cm"
gridspacingx="0.50000000cm"
gridoriginy="0.00000000cm"
gridoriginx="0.00000000cm"
gridtolerance="0.10000000cm"
gridempspacing="2"
inkscape:grid-points="true"
inkscape:grid-bbox="false"
inkscape:window-width="975"
inkscape:window-height="759"
inkscape:window-x="130"
inkscape:window-y="129" />
<metadata
id="metadata7">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1">
<path
style="fill:none;fill-opacity:0.75000000;fill-rule:evenodd;stroke:#000000;stroke-width:1.0000004px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1.0000000"
d="M 6.1949060,157.57756 L 6.1949060,157.57756 z "
id="path1319" />
<path
style="fill:#224db0;fill-opacity:1.0000000;fill-rule:evenodd;stroke:none;stroke-width:6.0000000;stroke-linecap:butt;stroke-linejoin:round;marker-start:none;stroke-miterlimit:4.0000000;stroke-dasharray:none;stroke-opacity:1.0000000"
d="M 0.58257145,1.2121863 C 0.63548185,3.4711463 0.61139575,5.1777365 1.0905516,7.3958765 C 1.2484712,8.3223265 2.8202544,8.4670465 3.5807376,8.5653065 C 4.0690207,8.6283965 4.0902492,8.5661465 4.0906878,9.1995265 C 4.1064337,11.195036 4.0317991,11.792303 4.0906878,13.786273 C 4.0906878,15.786203 4.0909380,15.786453 6.0908601,15.786453 C 9.0907431,15.786453 10.414747,15.786053 12.414669,15.786053 C 14.414592,15.786053 14.414842,15.785813 14.414842,13.785893 C 14.418166,10.552283 14.404522,7.8374163 14.421897,4.6040763 C 14.625008,2.9294663 14.978320,2.5231063 15.830433,1.1499263 C 15.040843,1.1499263 14.943078,1.1493063 14.129102,1.1493063 C 8.8912931,1.1493063 5.8203807,1.2121863 0.58257145,1.2121863 z M 2.0905126,2.4028663 C 2.8330809,2.4028663 3.3476197,2.4028663 4.0901879,2.4028663 C 4.0901879,4.8586963 4.0906878,4.8408265 4.0906878,7.2966665 C 3.2248799,7.2967665 2.2374444,7.5239665 2.1488437,6.3255665 C 1.8765344,4.2270765 1.8837837,4.4962263 2.0905126,2.4028663 z M 5.0906487,6.5656265 L 6.8695071,6.5656265 L 6.8696331,13.558223 L 9.8657791,11.557663 L 9.8656551,6.5656265 L 13.414631,6.5656265 L 13.414631,13.786523 C 13.414631,14.786503 13.414631,14.786503 12.414669,14.786503 L 6.0718351,14.786503 C 5.1948474,14.786503 5.0907740,14.786723 5.0907740,13.786523 L 5.0906487,6.5656265 z "
id="path1325"
sodipodi:nodetypes="ccsccsscccsccccccccccccccccc" />
</g>
</svg>

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 9.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.6 KiB

View File

@ -0,0 +1,119 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://web.resource.org/cc/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:sodipodi="http://inkscape.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="82.000000mm"
height="82.000000mm"
id="svg2"
sodipodi:version="0.32"
inkscape:version="0.42.2"
sodipodi:docbase="C:\Documents and Settings\Edward\My Documents\My Webs\htmlpurifier\art"
sodipodi:docname="logo-bold-large.svg">
<defs
id="defs4">
<linearGradient
id="linearGradient5003">
<stop
style="stop-color:#537ddd;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop5005" />
<stop
style="stop-color:#000000;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop5007" />
</linearGradient>
<linearGradient
id="linearGradient4993">
<stop
style="stop-color:#183577;stop-opacity:1.0000000;"
offset="0.00000000"
id="stop4995" />
<stop
id="stop5001"
offset="0.58142859"
style="stop-color:#8b9fbb;stop-opacity:1.0000000;" />
<stop
style="stop-color:#ffffff;stop-opacity:1.0000000;"
offset="1.0000000"
id="stop4997" />
</linearGradient>
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient4993"
id="linearGradient4999"
x1="283.46457"
y1="141.72675"
x2="11.135608"
y2="141.72675"
gradientUnits="userSpaceOnUse"
gradientTransform="translate(2.000000,4.000000)" />
<linearGradient
inkscape:collect="always"
xlink:href="#linearGradient5003"
id="linearGradient5009"
x1="2.9621078"
y1="141.72675"
x2="283.46457"
y2="141.72675"
gradientUnits="userSpaceOnUse"
gradientTransform="translate(2.000000,4.000000)" />
</defs>
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="0.0"
inkscape:pageshadow="2"
inkscape:zoom="1.0000000"
inkscape:cx="100.23897"
inkscape:cy="178.03397"
inkscape:document-units="px"
inkscape:current-layer="layer1"
showgrid="false"
showguides="false"
gridspacingy="0.50000000cm"
gridspacingx="0.50000000cm"
gridoriginy="0.00000000cm"
gridoriginx="0.00000000cm"
gridtolerance="0.10000000cm"
gridempspacing="2"
inkscape:grid-points="true"
inkscape:grid-bbox="false"
inkscape:window-width="975"
inkscape:window-height="759"
inkscape:window-x="139"
inkscape:window-y="88" />
<metadata
id="metadata7">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1">
<path
style="fill:none;fill-opacity:0.75000000;fill-rule:evenodd;stroke:#000000;stroke-width:1.0000000px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1.0000000"
d="M 19.726130,9.6194746 L 19.726130,9.6194746 z "
id="path1319" />
<path
style="fill:url(#linearGradient4999);fill-opacity:1.0000000;fill-rule:evenodd;stroke:url(#linearGradient5009);stroke-width:10.000000;stroke-linecap:butt;stroke-linejoin:round;marker-start:none;stroke-miterlimit:4.0000000;stroke-dasharray:none;stroke-opacity:1.0000000"
d="M 4.9621078,5.1031050 C 5.8995325,45.125639 5.4727947,85.709799 13.962107,125.00936 C 16.760003,141.42321 35.988453,143.98728 49.462107,145.72811 C 58.113128,146.84585 67.112800,145.74318 67.120572,156.96457 C 67.399545,192.31975 66.077225,216.69948 67.120572,252.02707 C 67.120572,287.46015 67.125004,287.46457 102.55807,287.46457 C 155.70768,287.46457 179.16536,287.45768 214.59843,287.45768 C 250.03151,287.45768 250.03593,287.45325 250.03593,252.02018 C 250.09483,194.72943 249.85310,122.48395 250.16093,65.197704 C 253.75952,35.528496 270.36749,28.328596 285.46457,4.0000020 C 271.47521,4.0000020 259.39483,3.9889299 244.97343,3.9889297 C 152.17399,3.9889293 97.761552,5.1031060 4.9621078,5.1031050 z M 31.678643,26.198667 C 44.834892,26.198667 53.955464,26.198667 67.111714,26.198667 C 67.111713,69.709084 67.120572,79.741005 67.120572,123.25143 C 51.780853,123.25317 34.281868,127.27859 32.712107,106.04622 C 27.887543,68.866763 28.015982,63.286945 31.678643,26.198667 z M 84.837103,110.29921 L 102.55585,110.29921 L 102.55807,247.98656 L 155.64148,212.54217 L 155.63926,110.29921 L 232.31496,110.29921 L 232.31496,252.03150 C 232.31496,269.74803 232.31496,269.74803 214.59843,269.74803 L 102.22100,269.74803 C 86.683215,269.74803 84.839322,269.75212 84.839322,252.03150 L 84.837103,110.29921 z "
id="path1325"
sodipodi:nodetypes="ccsccsscccsccccccccccccccccc" />
</g>
</svg>

After

Width:  |  Height:  |  Size: 5.3 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 297 B

View File

@ -0,0 +1,16 @@
<?php
chdir(dirname(__FILE__));
//require_once '../library/HTMLPurifier.path.php';
shell_exec('php ../maintenance/generate-schema-cache.php');
require_once '../library/HTMLPurifier.path.php';
require_once 'HTMLPurifier.includes.php';
$begin = xdebug_memory_usage();
$schema = HTMLPurifier_ConfigSchema::makeFromSerial();
echo xdebug_memory_usage() - $begin;
// vim: et sw=4 sts=4

View File

@ -0,0 +1,158 @@
<?php
require_once '../library/HTMLPurifier.auto.php';
@include_once '../test-settings.php';
// PEAR
require_once 'Benchmark/Timer.php'; // to do the timing
require_once 'Text/Password.php'; // for generating random input
$LEXERS = array();
$RUNS = isset($GLOBALS['HTMLPurifierTest']['Runs'])
? $GLOBALS['HTMLPurifierTest']['Runs'] : 2;
require_once 'HTMLPurifier/Lexer/DirectLex.php';
$LEXERS['DirectLex'] = new HTMLPurifier_Lexer_DirectLex();
if (version_compare(PHP_VERSION, '5', '>=')) {
require_once 'HTMLPurifier/Lexer/DOMLex.php';
$LEXERS['DOMLex'] = new HTMLPurifier_Lexer_DOMLex();
}
// custom class to aid unit testing
class RowTimer extends Benchmark_Timer
{
var $name;
function RowTimer($name, $auto = false) {
$this->name = htmlentities($name);
$this->Benchmark_Timer($auto);
}
function getOutput() {
$total = $this->TimeElapsed();
$result = $this->getProfiling();
$dashes = '';
$out = '<tr>';
$out .= "<td>{$this->name}</td>";
$standard = false;
foreach ($result as $k => $v) {
if ($v['name'] == 'Start' || $v['name'] == 'Stop') continue;
//$perc = (($v['diff'] * 100) / $total);
//$tperc = (($v['total'] * 100) / $total);
//$out .= '<td align="right">' . $v['diff'] . '</td>';
if ($standard == false) $standard = $v['diff'];
$perc = $v['diff'] * 100 / $standard;
$bad_run = ($v['diff'] < 0);
$out .= '<td align="right"'.
($bad_run ? ' style="color:#AAA;"' : '').
'>' . number_format($perc, 2, '.', '') .
'%</td><td>'.number_format($v['diff'],4,'.','').'</td>';
}
$out .= '</tr>';
return $out;
}
}
function print_lexers() {
global $LEXERS;
$first = true;
foreach ($LEXERS as $key => $value) {
if (!$first) echo ' / ';
echo htmlspecialchars($key);
$first = false;
}
}
function do_benchmark($name, $document) {
global $LEXERS, $RUNS;
$config = HTMLPurifier_Config::createDefault();
$context = new HTMLPurifier_Context();
$timer = new RowTimer($name);
$timer->start();
foreach($LEXERS as $key => $lexer) {
for ($i=0; $i<$RUNS; $i++) $tokens = $lexer->tokenizeHTML($document, $config, $context);
$timer->setMarker($key);
}
$timer->stop();
$timer->display();
}
?>
<html>
<head>
<title>Benchmark: <?php print_lexers(); ?></title>
</head>
<body>
<h1>Benchmark: <?php print_lexers(); ?></h1>
<table border="1">
<tr><th>Case</th><?php
foreach ($LEXERS as $key => $value) {
echo '<th colspan="2">' . htmlspecialchars($key) . '</th>';
}
?></tr>
<?php
// ************************************************************************** //
// sample of html pages
$dir = 'samples/Lexer';
$dh = opendir($dir);
while (false !== ($filename = readdir($dh))) {
if (strpos($filename, '.html') !== strlen($filename) - 5) continue;
$document = file_get_contents($dir . '/' . $filename);
do_benchmark("File: $filename", $document);
}
// crashers, caused infinite loops before
$snippets = array();
$snippets[] = '<a href="foo>';
$snippets[] = '<a "=>';
foreach ($snippets as $snippet) {
do_benchmark($snippet, $snippet);
}
// random input
$random = Text_Password::create(80, 'unpronounceable', 'qwerty <>="\'');
do_benchmark('Random input', $random);
?></table>
<?php
echo '<div>Random input was: ' .
'<span colspan="4" style="font-family:monospace;">' .
htmlspecialchars($random) . '</span></div>';
?>
</body></html>
<?php
// vim: et sw=4 sts=4

View File

@ -0,0 +1,21 @@
<?php
ini_set('xdebug.trace_format', 1);
ini_set('xdebug.show_mem_delta', true);
if (file_exists('Trace.xt')) {
echo "Previous trace Trace.xt must be removed before this script can be run.";
exit;
}
xdebug_start_trace(dirname(__FILE__) . '/Trace');
require_once '../library/HTMLPurifier.auto.php';
$purifier = new HTMLPurifier();
$data = $purifier->purify(file_get_contents('samples/Lexer/4.html'));
xdebug_stop_trace();
echo "Trace finished.";
// vim: et sw=4 sts=4

View File

@ -0,0 +1,56 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Main Page - Huaxia Taiji Club</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" media="screen, projection" href="/screen.css" />
<link rel="stylesheet" type="text/css" media="print" href="/print.css" />
</head>
<body>
<div id="translation"><a href="/ch/Main_Page">&#20013;&#25991;</a></div>
<div id="heading"><a href="/en/Main_Page" title="English Main Page">Huaxia Taiji Club</a>
<a class="heading_ch" href="/ch/Main_Page" title="&#20013;&#25991;&#20027;&#39029;">&#21326;&#22799;&#22826;&#26497;&#20465;&#20048;&#37096;</a></div>
<ul id="menu">
<li><a href="/en/Main_Page" class="active">Main Page</a></li><li><a href="/en/About">About</a></li><li><a href="/en/News">News</a></li><li><a href="/en/Events">Events</a></li><li><a href="/en/Digest">Digest</a></li><li><a href="/en/Taiji_and_I">Taiji and I</a></li><li><a href="/en/Downloads">Downloads</a></li><li><a href="/en/Registration">Registration</a></li><li><a href="/en/Contact">Contact</a></li> <li><a href="http://www.taijiclub.org/gallery2/main.php">Gallery</a></li>
<li><a href="http://www.taijiclub.org/forums/index.php">Forums</a></li>
</ul>
<div id="content">
<h1 id="title">Main Page</h1><h2>Taiji (Tai Chi) </h2>
<div id="sidebar">
<h3>Recent News</h3>
<ul>
<li>Zou Xiaojun was elected as the new club vice president </li>
<li>HX Edison Taiji Club <a href="http://www.taijiclub.org/downloads/Taiji_club_regulation_.pdf">by-law</a> effective 3/28/2006</li>
<li>A new email account for our club: HXEdisontaijiclub@yahoo.com</li>
<li>Workshop conducted by <a href="http://www.taijiclub.org/ch/Digest/LiDeyin">?????</a> Li Deyin is set on June 4, 2006 at Clarion Hotel in Edison from 9:30am-12pm; <a href="http://www.taijiclub.org/en/Registration">Registration</a></li>
</ul>
</div>
<p><i>Taiji</i> is an ancient Chinese tradition of movement systems that is associated with philosophy, physiology, psychology, geometry and dynamics. It is the slowest form of martial arts and is meant to improve the internal spirit. It is soothing to the soul and extremely invigorating. </p>
<p>The founder of Taiji was Zhang Sanfeng (Chang San-feng), who was a monk of the Wu Dang (Wu Tang) Monastery and lived in the period from 1391 to 1459. His exercises stressed suppleness and elasticity as opposed to the hardness and force of other martial art styles. Several centuries old, Taiji was originally developed as a form of self-defense, emphasizing strength, balance, flexibility and speed. Tai Chi also differs from other martial arts in that it is based on the Taoist religion and aims to avoid aggressive forces. </p>
<p>Modern Taiji includes many forms &mdash; Quan, Sword and Fan. Impacting the mind and body of the practitioners, Taiji is practiced as a meditative exercise made up of a series of forms, or choreographed motions, requiring slow, gentle movement of the arms, legs and torso. Taiji practitioners learn to center their attention on their breathing and body movements so that the exercise strengthens their overall mental and physical awareness. In a sense, Taiji is similar to yoga in that it is also a form of moving meditation, with the goal of achieving stillness through the motion and awareness of breath. To perform Taiji, practitioners have to empty their mind of thoughts and worries in order to achieve harmony. It is a great aid for reducing stress and improving the quality of life. </p>
<p>In China and in communities all over the world, Taiji is practiced by young and old in the early morning hours. It's a great way to bring a new and fresh day!</p>
<p>Check out our <a href="/gallery2/main.php">gallery</a>.</p>
<div style="text-align:center;"><a href="http://www.taijiclub.org/gallery2/v/2006/group1b.jpg.html?g2_imageViewsIndex=1"><img src="/gallery2/d/1836-2/group1b.jpg" /></a></div>
<div style="text-align:center;">Click on photo to see HR version</div></div>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,20 @@
<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
function rwt(el,ct,cd,sg){var e = window.encodeURIComponent ? encodeURIComponent : escape;el.href="/url?sa=t&ct="+e(ct)+"&cd="+e(cd)+"&url="+e(el.href).replace(/\+/g,"%2B")+"&ei=fHNBRJDEG4HSaLONmIoP"+sg;el.onmousedown="";return true;}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onLoad=sf() topmargin=3 marginheight=3><center><table border=0 cellspacing=0 cellpadding=0 width=100%><tr><td align=right nowrap><font size=-1><b>edwardzyang@gmail.com</b>&nbsp;|&nbsp;<a href="/url?sa=p&pref=ig&pval=2&q=http://www.google.com/ig%3Fhl%3Den" onmousedown="return rwt(this,'pro','hppphou:def','&sig2=hDbTpsWIp9YG37a23n6krQ')">Personalized Home</a>&nbsp;|&nbsp;<a href="/searchhistory/?hl=en">Search History</a>&nbsp;|&nbsp;<a href="https://www.google.com/accounts/ManageAccount">My Account</a>&nbsp;|&nbsp;<a href="http://www.google.com/accounts/Logout?continue=http://www.google.com/">Sign out</a></font></td></tr><tr height=4><td><img alt="" width=1 height=1></td></tr></table><img src="/intl/en/images/logo.gif" width=276 height=110 alt="Google"><br><br>
<form action=/search name=f><script><!--
function qs(el) {if (window.RegExp && window.encodeURIComponent) {var ue=el.href;var qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}
// -->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>&nbsp;&nbsp;&nbsp;&nbsp;<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=2a class=q href="http://groups.google.com/grphp?hl=en&tab=wg" onClick="return qs(this);">Groups</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=4a class=q href="http://news.google.com/nwshp?hl=en&tab=wn" onClick="return qs(this);">News</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=5a class=q href="http://froogle.google.com/frghp?hl=en&tab=wf" onClick="return qs(this);">Froogle</a>&nbsp;&nbsp;&nbsp;&nbsp;<a id=8a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local</a>&nbsp;&nbsp;&nbsp;&nbsp;<b><a href="/intl/en/options/" class=q>more&nbsp;&raquo;</a></b></font></td></tr></table><table cellspacing=0 cellpadding=0><tr><td width=25%>&nbsp;</td><td align=center><input type=hidden name=hl value=en><input maxlength=2048 size=55 name=q value="" title="Google Search"><br><input type=submit value="Google Search" name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td valign=top nowrap width=25%><font size=-2>&nbsp;&nbsp;<a href=/advanced_search?hl=en>Advanced Search</a><br>&nbsp;&nbsp;<a href=/preferences?hl=en>Preferences</a><br>&nbsp;&nbsp;<a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/ads/">Advertising&nbsp;Programs</a> - <a href=/services/>Business Solutions</a> - <a href=/about.html>About Google</a></font><p><font size=-2>&copy;2006 Google</font></p></center></body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,131 @@
<html>
<head>
<title>Anime Digi-Lib Index</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<div id="tb">
<form name="lycos_search" method="get" target="_new" style="margin: 0px"
action="http://r.hotbot.com/r/memberpgs_lycos_searchbox_af/http://www.angelfire.lycos.com/cgi-bin/search/pursuit">
<table id="tbtable" cellpadding="0" cellspacing="0" border="0" width="100%" style="border: 1px solid black;">
<tr style="background-color: #dcf7ff">
<td colspan="3">
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td>&nbsp;Search:</td>
<td><input type="radio" name="cat" value="lycos" checked></td>
<td nowrap="nowrap">The Web</td>
<td><input type="radio" name="cat" value="angelfire"></td>
<td nowrap="nowrap">Angelfire</td>
<td nowrap="nowrap">&nbsp;&nbsp;&nbsp;<img src="http://af.lygo.com/d/toolbar/planeticon.gif"></td><td nowrap="nowrap">&nbsp;<a href="http://r.lycos.com/r/tlbr_planet/http://planet.lycos.com" target="_new">Planet</a></td>
</tr>
</table>
<td nowrap="nowrap"><a href="http://lt.angelfire.com/af_toolbar/edit/_h_/www.angelfire.lycos.com/build/index.tmpl" target="_top">
<span id="build">Edit your Site</span></a>&nbsp;</td>
<td><img src="http://af.lygo.com/d/toolbar/dir.gif" alt="show site directory" border="0" height="10" hspace="3" width="8"></td>
<td nowrap="nowrap"><a href="http://lt.angelfire.com/af_toolbar/browse/_h_/www.angelfire.lycos.com/directory/index.tmpl" target="_top">Browse Sites</a>&nbsp;</td>
<td><a href="http://lt.angelfire.com/af_toolbar/angelfire/_h_/www.angelfire.lycos.com" target="_top"><img src="http://af.lygo.com/d/toolbar/aflogo_top.gif" alt="hosted by angelfire" border="0" height="26" width="143"></a></td>
</tr>
<tr style="background-color: #dcf7ff">
<td nowrap="nowrap" valign="middle">&nbsp;<input size="30" style="font-size: 10px; background-color: #fff;" type="text" name="query" id="searchbox"></td>
<td style="background: #fff url(http://af.lygo.com/d/toolbar/bg.gif) repeat-x; text-align: center;" colspan="3" align="center">
<a href="http://clk.atdmt.com/VON/go/lycsnvon0710000019von/direct/01/"><img src="/sys/free_logo_xxxx_157x20.gif" height="20" width="157" border="0" alt="Vonage"></a><img src="http://view.atdmt.com/VON/view/lycsnvon0710000019von/direct/01/"></td>
<span style="font-size: 11px;">
<span style="color:#00f; font-weight:bold;">&#171;</span>
<span id="top100">
<a href="javascript:void top100('prev')" target="_top">Previous</a> |
<a href="http://lt.angelfire.com/af_toolbar/top100/_h_/www.angelfire.lycos.com/cgi-bin/top100/pagelist?start=1" target="_top">Top 100</a> |
<a href="javascript:void top100('next')" target="_top">Next</a>
</span>
<span style="color: #00f; font-weight: bold;">&#187;</span>
</span>
</td>
<td valign="top" style="background: #fff url(http://af.lygo.com/d/toolbar/bg.gif) repeat-x;"><a href="http://lt.angelfire.com/af_toolbar/angelfire/_h_/www.angelfire.lycos.com" target="_top"><img src="http://af.lygo.com/d/toolbar/aflogo_bot.gif" alt="hosted by angelfire" border="0" height="22" width="143"></a></td>
</tr>
</table>
</form>
</div>
<table border="0" cellpadding="0" cellspacing="0" width="728"><tr><td>
<script type="text/javascript">
if (objAdMgr.isSlotAvailable("leaderboard")) {
objAdMgr.renderSlot("leaderboard")
}
</script>
<noscript>
<a href="http://network.realmedia.com/RealMedia/ads/click_nx.ads/lycosangelfire/ros/728x90/wp/ss/a/491169@Top1?x"><img border="0" src="http://network.realmedia.com/RealMedia/ads/adstream_nx.ads/lycosangelfire/ros/728x90/wp/ss/a/491169@Top1" alt="leaderboard ad" /></a>
</noscript>
</td></tr>
</table>
<table width="86%" border="0" cellspacing="0" cellpadding="2">
<tr>
<td height="388" width="19%" bgcolor="#FFCCFF" valign="top">
<p>May 1, 2000</p>
<p><b>Pop Culture</b> </p>
<p>by. H. Finkelstein</p>
</td>
<td height="388" width="52%" valign="top">
<p>Welcome to the <b>Anime Digi-Lib</b>, a virtual index to anime on the
internet. This site strives to house a comprehensive index to both personal
and commercial websites and provides reviews to these sites. We hope to
be a gateway for people who've never imagined they'd ever be interested
in Japanese Animation. </p>
<table width="99%" border="1" cellspacing="0" cellpadding="2" height="320" name="Searchnservices">
<tr>
<td height="263" valign="top" width="58%">
<p>&nbsp; </p>
<p>&nbsp;</p>
<FORM ACTION="/cgi-bin/script_library/site_search/search" METHOD="GET">
<table border="0" cellpadding="2" cellspacing="0">
<tr>
<td colspan="2">Search term: <INPUT NAME="search_term"><br></td>
</tr>
<tr>
<td colspan="2" align="center">Case-sensitive -
<INPUT TYPE="checkbox" NAME="case_sensitive">yes<br></td>
</tr>
<tr>
<td align="right"><INPUT TYPE="radio" NAME="search_type" VALUE="exact" CHECKED>exact</td>
<td><INPUT TYPE="radio" NAME="search_type" VALUE="fuzzy">fuzzy<br></td>
</tr>
<tr>
<td colspan="2" align="center"><INPUT TYPE="hidden" NAME="display" VALUE="#FF0000"><INPUT TYPE="submit"></td>
</tr>
</table>
</form>
<td>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr><td><font face="verdana,geneva" color="#000011" size="1">What is better, subtitled or dubbed anime?</font></td></tr>
<tr><td><input type="radio" name="rd" value="1"><font face="verdana" size="2" color="#000011">Subtitled</font></td></tr>
<tr><td align="middle"><font face="verdana" size="1"><a href="http://pub.alxnet.com/poll?id=2079873&q=view">Current results</a></font></td></tr>
</table></td></tr>
<tr>
<td><font face="verdana" size="1"><a href="http://www.alxnet.com/services/poll/">Free
Web Polls</a></font></td>
</tr>
</table></form>
<!-- Alxnet.com -- web poll code ends -->
</td>
</tr>
</table>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,543 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="keywords" content="Tai Chi Chuan,Yang Pan-hou,Yang Chien-hou,Yang style Tai Chi Chuan,Yang Lu-ch'an,Wu/Hao style T'ai Chi Ch'uan,Wu Ch'uan-yü,Hao Wei-chen,Yang Shou-chung,Wu style T'ai Chi Ch'uan,Wu Chien-ch'üan" />
<link rel="shortcut icon" href="/favicon.ico" />
<link rel="copyright" href="http://www.gnu.org/copyleft/fdl.html" />
<title>Tai Chi Chuan - Wikipedia, the free encyclopedia</title>
<style type="text/css" media="screen,projection">/*<![CDATA[*/ @import "/skins-1.5/monobook/main.css?9"; /*]]>*/</style>
<link rel="stylesheet" type="text/css" media="print" href="/skins-1.5/common/commonPrint.css" />
<!--[if lt IE 5.5000]><style type="text/css">@import "/skins-1.5/monobook/IE50Fixes.css";</style><![endif]-->
<!--[if IE 5.5000]><style type="text/css">@import "/skins-1.5/monobook/IE55Fixes.css";</style><![endif]-->
<!--[if IE 6]><style type="text/css">@import "/skins-1.5/monobook/IE60Fixes.css";</style><![endif]-->
<!--[if IE 7]><style type="text/css">@import "/skins-1.5/monobook/IE70Fixes.css?1";</style><![endif]-->
<!--[if lt IE 7]><script type="text/javascript" src="/skins-1.5/common/IEFixes.js"></script>
<meta http-equiv="imagetoolbar" content="no" /><![endif]-->
<script type="text/javascript">var skin = 'monobook';var stylepath = '/skins-1.5';</script>
<script type="text/javascript" src="/skins-1.5/common/wikibits.js?1"><!-- wikibits js --></script>
<script type="text/javascript" src="/w/index.php?title=-&amp;action=raw&amp;smaxage=0&amp;gen=js"><!-- site js --></script>
<style type="text/css">/*<![CDATA[*/
@import "/w/index.php?title=MediaWiki:Common.css&action=raw&ctype=text/css&smaxage=2678400";
@import "/w/index.php?title=MediaWiki:Monobook.css&action=raw&ctype=text/css&smaxage=2678400";
@import "/w/index.php?title=-&action=raw&gen=css&maxage=2678400&ts=20060721225848";
@import "/w/index.php?title=User:Edward_Z._Yang/monobook.css&action=raw&ctype=text/css";
/*]]>*/</style>
<script type="text/javascript" src="/w/index.php?title=User:Edward_Z._Yang/monobook.js&amp;action=raw&amp;ctype=text/javascript&amp;dontcountme=s"></script>
<!-- Head Scripts -->
</head>
<body class="ns-0 ltr">
<div id="globalWrapper">
<div id="column-content">
<div id="content">
<a name="top" id="top"></a>
<div id="siteNotice"><div id="wikimania2006" style="text-align:right; font-size:80%"><a href="http://wm06reg.wikimedia.org/" class="external text" title="http://wm06reg.wikimedia.org/">Registration</a> for <a href="http://wikimania2006.wikimedia.org" class="external text" title="http://wikimania2006.wikimedia.org">Wikimania 2006</a> is open.&nbsp;&nbsp;&nbsp;</div>
</div> <h1 class="firstHeading">Tai Chi Chuan</h1>
<div id="bodyContent">
<h3 id="siteSub">From Wikipedia, the free encyclopedia</h3>
<div id="contentSub"></div>
<div id="jump-to-nav">Jump to: <a href="#column-one">navigation</a>, <a href="#searchInput">search</a></div> <!-- start content -->
<table border="1" cellpadding="2" cellspacing="0" align="right">
<tr>
<th colspan="2" bgcolor="#FFCCCC"><big>???</big></th>
</tr>
<tr>
<td colspan="2">
<div class="center">
<div class="thumb tnone">
<div style="width:182px;"><a href="/wiki/Image:Yang_Ch%27eng-fu_circa_1918.jpg" class="internal" title="Yang Chengfu in a posture from the Tai Chi solo form known as Single Whip, circa 1918"><img src="http://upload.wikimedia.org/wikipedia/en/thumb/d/d1/Yang_Ch%27eng-fu_circa_1918.jpg/180px-Yang_Ch%27eng-fu_circa_1918.jpg" alt="Yang Chengfu in a posture from the Tai Chi solo form known as Single Whip, circa 1918" width="180" height="255" longdesc="/wiki/Image:Yang_Ch%27eng-fu_circa_1918.jpg" /></a>
<div class="thumbcaption">
<div class="magnify" style="float:right"><a href="/wiki/Image:Yang_Ch%27eng-fu_circa_1918.jpg" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div>
<b><a href="/wiki/Yang_Chengfu" title="Yang Chengfu">Yang Chengfu</a> in a posture from the Tai Chi solo form known as <i>Single Whip</i>, circa <a href="/wiki/1918" title="1918">1918</a></b></div>
</div>
</div>
</div>
</td>
</tr>
<tr>
<th colspan="2"><a href="/wiki/Chinese_language" title="Chinese language">Chinese</a> Name</th>
</tr>
<tr>
<td><a href="/wiki/Hanyu_Pinyin" title="Hanyu Pinyin">Hanyu Pinyin</a></td>
<td>Tàijíquán</td>
</tr>
<tr>
<td><a href="/wiki/Wade-Giles" title="Wade-Giles">Wade-Giles</a></td>
<td>T'ai<sup>4</sup> Chi<sup>2</sup> Ch'üan<sup>2</sup></td>
</tr>
<tr>
<td><a href="/wiki/Simplified_Chinese" title="Simplified Chinese">Simplified Chinese</a></td>
<td>???</td>
</tr>
<tr>
<td><a href="/wiki/Traditional_Chinese" title="Traditional Chinese">Traditional Chinese</a></td>
<td><a href="http://en.wiktionary.org/wiki/%E5%A4%AA" class="extiw" title="wiktionary:?">?</a><a href="http://en.wiktionary.org/wiki/%E6%A5%B5" class="extiw" title="wiktionary:?">?</a><a href="http://en.wiktionary.org/wiki/%E6%8B%B3" class="extiw" title="wiktionary:?">?</a></td>
</tr>
<tr>
<td><a href="/wiki/Cantonese_%28linguistics%29" title="Cantonese (linguistics)">Cantonese</a></td>
<td>taai3 gik6 kyun4</td>
</tr>
<tr>
<td><a href="/wiki/Hiragana" title="Hiragana">Japanese Hiragana</a></td>
<td>???????</td>
</tr>
<tr>
<td><a href="/wiki/Korean_%28language%29" title="Korean (language)">Korean</a></td>
<td>???</td>
</tr>
<tr>
<td><a href="/wiki/Vietnamese_%28language%29" title="Vietnamese (language)">Vietnamese</a></td>
<td>Thái C?c Quy?n</td>
</tr>
</table>
<p><b>Tai Chi Chuan</b>, <b>T'ai Chi Ch'üan</b> or <b>Taijiquan</b> (<a href="/wiki/Traditional_Chinese_character" title="Traditional Chinese character">Traditional Chinese</a>: ???; <a href="/wiki/Simplified_Chinese_character" title="Simplified Chinese character">Simplified Chinese</a>: ???; <a href="/wiki/Pinyin" title="Pinyin">pinyin</a>: Tàijíquán; literally "supreme ultimate fist"), commonly known as <b>Tai Chi</b>, <b>T'ai Chi</b>, or <b><a href="/wiki/Taiji" title="Taiji">Taiji</a></b>, is an <a href="/wiki/Neijia" title="Neijia">internal</a> <a href="/wiki/Chinese_martial_arts" title="Chinese martial arts">Chinese martial art</a>. There are different styles of T'ai Chi Ch'üan, although most agree they are all based on the system originally taught by the <a href="/wiki/Chen" title="Chen">Chen</a> family to the <a href="/wiki/Yang" title="Yang">Yang</a> family starting in <a href="/wiki/1820" title="1820">1820</a>. It is often promoted and practiced as a <a href="/wiki/Martial_arts_therapy" title="Martial arts therapy">martial arts therapy</a> for the purposes of <a href="/wiki/Health" title="Health">health</a> and <a href="/wiki/Longevity" title="Longevity">longevity</a>, (some <a href="/wiki/Tai_Chi_Chuan#Citations_to_medical_research" title="Tai Chi Chuan">recent medical studies</a> support its effectiveness). T'ai Chi Ch'üan is considered a <i>soft</i> style martial art, an art applied with as complete a relaxation or "softness" in the musculature as possible, to distinguish its theory and application from that of the <i>hard</i> martial art styles which use a degree of tension in the muscles.</p>
<p>Variations of T'ai Chi Ch'üan's basic training forms are well known as the slow motion routines that groups of people practice every morning in parks across China and other parts of the world. Traditional T'ai Chi training is intended to teach awareness of one's own balance and what affects it, awareness of the same in others, an appreciation of the practical value in one's ability to moderate extremes of behavior and attitude at both mental and physical levels, and how this applies to effective self-defense principles.</p>
<table id="toc" class="toc" summary="Contents">
<tr>
<td>
<div id="toctitle">
<h2>Contents</h2>
</div>
<ul>
<li class="toclevel-1"><a href="#Overview"><span class="tocnumber">1</span> <span class="toctext">Overview</span></a></li>
<li class="toclevel-1"><a href="#Training_and_techniques"><span class="tocnumber">2</span> <span class="toctext">Training and techniques</span></a></li>
<li class="toclevel-1"><a href="#Styles_and_history"><span class="tocnumber">3</span> <span class="toctext">Styles and history</span></a>
<ul>
<li class="toclevel-2"><a href="#Family_tree"><span class="tocnumber">3.1</span> <span class="toctext">Family tree</span></a></li>
<li class="toclevel-2"><a href="#Notes_to_Family_tree_table"><span class="tocnumber">3.2</span> <span class="toctext">Notes to Family tree table</span></a></li>
</ul>
</li>
<li class="toclevel-1"><a href="#Modern_T.27ai_Chi"><span class="tocnumber">4</span> <span class="toctext">Modern T'ai Chi</span></a>
<ul>
<li class="toclevel-2"><a href="#Modern_forms"><span class="tocnumber">4.1</span> <span class="toctext">Modern forms</span></a></li>
</ul>
</li>
<li class="toclevel-1"><a href="#Health_benefits"><span class="tocnumber">5</span> <span class="toctext">Health benefits</span></a>
<ul>
<li class="toclevel-2"><a href="#Citations_to_medical_research"><span class="tocnumber">5.1</span> <span class="toctext">Citations to medical research</span></a></li>
</ul>
</li>
<li class="toclevel-1"><a href="#See_also"><span class="tocnumber">6</span> <span class="toctext">See also</span></a></li>
<li class="toclevel-1"><a href="#External_links"><span class="tocnumber">7</span> <span class="toctext">External links</span></a></li>
</ul>
</td>
</tr>
</table>
<p><script type="text/javascript">
//<![CDATA[
if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); }
//]]>
</script></p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=1" title="Edit section: Overview">edit</a>]</div>
<p><a name="Overview" id="Overview"></a></p>
<h2>Overview</h2>
<p>Historically, T'ai Chi Ch'üan has been regarded as a martial art, and its traditional practitioners still teach it as one. Even so, it has developed a worldwide following among many thousands of people with little or no interest in martial training for its aforementioned benefits to health and <a href="/wiki/Preventive_medicine" title="Preventive medicine">health maintenance</a>. Some call it a form of moving <a href="/wiki/Meditation" title="Meditation">meditation</a>, and T'ai Chi theory and practice evolved in agreement with many of the principles of <a href="/wiki/Traditional_Chinese_medicine" title="Traditional Chinese medicine">traditional Chinese medicine</a>. Besides general health benefits and <a href="/wiki/Stress_management" title="Stress management">stress management</a> attributed to beginning and intermediate level T'ai Chi training, many therapeutic interventions along the lines of traditional Chinese medicine are taught to advanced T'ai Chi students.</p>
<p>T'ai Chi Ch'üan as physical training is characterized by its requirement for the use of leverage through the joints based on coordination in relaxation rather than muscular tension in order to neutralize or initiate physical attacks. The slow, repetitive work involved in that process is said to gently increase and open the internal circulation (<a href="/wiki/Breath" title="Breath">breath</a>, body heat, <a href="/wiki/Blood" title="Blood">blood</a>, <a href="/wiki/Lymph" title="Lymph">lymph</a>, <a href="/wiki/Peristalsis" title="Peristalsis">peristalsis</a>, etc.). Over time, proponents say, this enhancement becomes a lasting effect, a direct reversal of the constricting physical effects of stress on the human body. This reversal allows much more of the students' native energy to be available to them, which they may then apply more effectively to the rest of their lives; families, careers, spiritual or creative pursuits, hobbies, etc.</p>
<p>The study of T'ai Chi Ch'üan involves three primary subjects:</p>
<ul>
<li><b>Health</b> - an unhealthy or otherwise uncomfortable person will find it difficult to meditate to a state of calmness or to use T'ai Chi as a martial art. T'ai Chi's health training therefore concentrates on relieving the physical effects of stress on the body and mind.</li>
<li><b>Meditation</b> - the focus meditation and subsequent calmness cultivated by the meditative aspect of T'ai Chi is seen as necessary to maintain optimum health (in the sense of effectively maintaining stress relief or <a href="/wiki/Homeostasis" title="Homeostasis">homeostasis</a>) and in order to use it as a soft style martial art.</li>
<li><b>Martial art</b> - the ability to competently use T'ai Chi as a martial art is said to be proof that the health and meditation aspects are working according to the dictates of the theory of T'ai Chi Ch'üan.</li>
</ul>
<p>In its traditional form (many modern variations exist which ignore at least one of the above requirements) every aspect of its training has to conform with all three of the aforementioned categories.</p>
<p>The <a href="/wiki/Mandarin_%28linguistics%29" title="Mandarin (linguistics)">Mandarin</a> term "T'ai Chi Ch'üan" translates as "Supreme Ultimate Boxing" or "Boundless Fist". T'ai Chi training involves learning solo routines, known as <i>forms</i>, and two person routines, known as <i><a href="/wiki/Pushing_hands" title="Pushing hands">pushing hands</a></i>, as well as <i><a href="/wiki/Acupressure" title="Acupressure">acupressure</a></i>-related manipulations taught by traditional schools. T'ai Chi Ch'üan is seen by many of its schools as a variety of <a href="/wiki/Taoism" title="Taoism">Taoism</a>, and it does seemingly incorporate many Taoist principles into its practice (see below). It is an art form said to date back many centuries (although not reliably documented under that name before 1850), with precursor disciplines dating back thousands of years. The explanation given by the traditional T'ai Chi family schools for why so many of their previous generations have dedicated their lives to the study and preservation of the art is that the discipline it seems to give its students to dramatically improve the effects of stress in their lives, with a few years of hard work, should hold a useful purpose for people living in a stressful world. They say that once the T'ai Chi principles have been understood and internalized into the bodily framework the practitioner will have an immediately accessible "toolkit" thereby to improve and then maintain their health, to provide a meditative focus, and that can work as an effective and subtle martial art for self-defense.</p>
<p>Teachers say the study of T'ai Chi Ch'üan is, more than anything else, about challenging one's ability to change oneself appropriately in response to outside forces. These principles are taught using the examples of <a href="/wiki/Physics" title="Physics">physics</a> as experienced by two (or more) bodies in <a href="/wiki/Combat" title="Combat">combat</a>. In order to be able to protect oneself or someone else by using change, it is necessary to understand what the consequences are of changing appropriately, changing inappropriately and not changing at all in response to an attack. Students, by this theory, will appreciate the full benefits of the entire art in the fastest way through physical training of the martial art aspect.</p>
<p><a href="/wiki/Wu_Chien-ch%27uan" title="Wu Chien-ch'uan">Wu Chien-ch'üan</a>, co-founder of the Wu family style, described the name <i>T'ai Chi Ch'üan</i> this way at the beginning of the <a href="/wiki/20th_century" title="20th century">20th century</a>:</p>
<dl>
<dd>"Various people have offered different explanations for the name <i>T'ai Chi Ch'uan</i>. Some have said: 'In terms of self-cultivation, one must train from a state of movement towards a state of stillness. <i>T'ai Chi</i> comes about through the balance of <i><a href="/wiki/Yin" title="Yin">yin</a></i> and <i><a href="/wiki/Yang" title="Yang">yang</a></i>. In terms of the art of attack and defense then, in the context of the <a href="/wiki/I_Ching" title="I Ching">changes</a> of full and empty, one is constantly internally latent, not outwardly expressive, as if the <i>yin</i> and <i>yang</i> of <i>T'ai Chi</i> have not yet divided apart.'</dd>
<dd>Others say: 'Every movement of <i>T'ai Chi Ch'uan</i> is based on circles, just like the shape of a <a href="/wiki/Taijitu" title="Taijitu"><i>T'ai Chi</i> symbol</a>. Therefore, it is called <i>T'ai Chi Ch'uan</i>.' Both explanations are quite reasonable, especially the second, which is more complete."</dd>
</dl>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=2" title="Edit section: Training and techniques">edit</a>]</div>
<p><a name="Training_and_techniques" id="Training_and_techniques"></a></p>
<h2>Training and techniques</h2>
<div class="thumb tright">
<div style="width:182px;"><a href="/wiki/Image:Yin_yang.svg" class="internal" title="The T'ai Chi Symbol or T'ai Chi T'u (Taijitu)"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Yin_yang.svg/180px-Yin_yang.svg.png" alt="The T'ai Chi Symbol or T'ai Chi T'u (Taijitu)" width="180" height="180" longdesc="/wiki/Image:Yin_yang.svg" /></a>
<div class="thumbcaption">
<div class="magnify" style="float:right"><a href="/wiki/Image:Yin_yang.svg" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div>
<b>The T'ai Chi Symbol or T'ai Chi T'u (Taijitu)</b></div>
</div>
</div>
<p>As the name <i>T'ai Chi Ch'üan</i> is held to be derived from the T'ai Chi symbol, the <i>taijitu</i> or <i>t'ai chi t'u</i> (???, <a href="/wiki/Pinyin" title="Pinyin">pinyin</a> tàijítú), commonly known in the West as the "<a href="/wiki/Yin-yang" title="Yin-yang">yin-yang</a>" diagram, T'ai Chi Ch'üan techniques are said therefore to physically and energetically balance <i>yin</i> (receptive) and <i>yang</i> (active) principles: "From ultimate softness comes ultimate hardness."</p>
<p>The core training involves two primary features: the first being the solo form or <i>ch'üan</i>, a slow sequence of movements which emphasize a straight spine, relaxed breathing and a natural range of motion; the second being different styles of <i>pushing hands</i> or <i>t'ui shou</i> (??) for training "stickiness" and sensitivity in the reflexes through various motions from the forms in concert with a training partner in order to learn leverage, timing, coordination and positioning when interacting with another. Pushing hands is seen as necessary not only for training the self-defense skills of a soft style such as T'ai Chi by demonstrating the forms' movement principles experientially, but also it is said to improve upon the level of conditioning provided by practice of the solo forms by increasing the workload on students while they practice those movement principles.</p>
<p>The solo form should take the students through a complete, natural, range of motion over their centre of gravity. Accurate, repeated practice of the solo routine is said to retrain posture, encourage circulation throughout the students' bodies, maintain flexibility through their joints and further familiarize students with the martial application sequences implied by the forms. The major traditional styles of T'ai Chi have forms which differ somewhat cosmetically, but there are also many obvious similarities which point to their common origin. The solo forms, empty-hand and <a href="/wiki/Weapon" title="Weapon">weapon</a>, are catalogues of movements that are practised individually in pushing hands and martial application scenarios to prepare students for self-defence training. In most traditional schools different variations of the solo forms can be practiced; fast/slow, small circle/large circle, square/round (which are different expressions of leverage through the joints), low sitting/high sitting (the degree to which weight-bearing knees are kept bent throughout the form), for example.</p>
<p>In a fight, if one uses hardness to resist violent force then both sides are certain to be injured, at least to some degree. Such injury, according to T'ai Chi theory, is a natural consequence of meeting brute force with brute force. The collision of two like forces, yang with yang, is known as "double-weighted" in T'ai Chi terminology. Instead, students are taught not to fight or resist an incoming force, but to meet it in softness and "stick" to it, following its motion while remaining in physical contact until the incoming force of attack exhausts itself or can be safely redirected, the result of meeting yang with yin. Done correctly, achieving this yin/yang or yang/yin balance in combat (and, by extension, other areas of one's life) is known as being "single-weighted" and is a primary goal of T'ai Chi Ch'üan training. <a href="/wiki/Lao_Tzu" title="Lao Tzu">Lao Tzu</a> provided the <a href="/wiki/Archetype" title="Archetype">archetype</a> for this in the <a href="/wiki/Tao_Te_Ching" title="Tao Te Ching">Tao Te Ching</a> when he wrote, "The soft and the pliable will defeat the hard and strong." This soft "neutralization" of an attack can be accomplished very quickly in an actual fight by an adept practitioner. A T'ai Chi student has to be well conditioned by many years of disciplined training; stable, sensitive and elastic mentally and physically in order to realize this ability, however.</p>
<p>Other training exercises include:</p>
<ul>
<li>Weapons training and <a href="/wiki/Fencing" title="Fencing">fencing</a> applications employing the straight <i><a href="/wiki/Sword" title="Sword">sword</a></i> known as the <i>jian</i> or <i>chien</i> or <i>gim</i> (<a href="/wiki/Jian" title="Jian">jiàn</a> ?), a heavier curved <i>sabre</i>, sometimes called a <i>broadsword</i> or <i>tao</i> (<a href="/wiki/Dao_%28saber%29" title="Dao (saber)">dao</a> ?, which is actually considered a big <i><a href="/wiki/Knife" title="Knife">knife</a></i>), folding <i><a href="/wiki/Tessen" title="Tessen">fan</a></i>, <i>staff</i> (?), 7 foot (2 m) <i><a href="/wiki/Qiang_%28spear%29" title="Qiang (spear)">spear</a></i> and 13 foot (4 m) <i><a href="/wiki/Lance" title="Lance">lance</a></i> (both called qiang ?). More exotic weapons still used by some traditional styles are the large <i>Da Dao</i> or <i>Ta Tao</i> (??) sabre, <i><a href="/wiki/Ji_%28halberd%29" title="Ji (halberd)">halberd</a></i> (ji ?), <i>cane</i>, <i>rope-dart</i>, <i><a href="/wiki/Three_sectional_staff" title="Three sectional staff">three sectional staff</a></i>, <i><a href="/wiki/Lasso" title="Lasso">lasso</a></i>, <i><a href="/wiki/Whip" title="Whip">whip</a></i>, <i><a href="/wiki/Chain_whip" title="Chain whip">chain whip</a></i> and <i>steel whip</i>.</li>
<li>Two-person tournament sparring (as part of push hands competitions and/or <i><a href="/wiki/San_shou" title="San shou">san shou</a></i> ??);</li>
<li>Breathing exercises; <i><a href="/wiki/Nei_kung" title="Nei kung">nei kung</a></i> (?? nèigong) or, more commonly, <i><a href="/wiki/Ch%27i_kung" title="Ch'i kung">ch'i kung</a></i> (?? qìgong) to develop <b><a href="/wiki/Ch%27i" title="Ch'i">ch'i</a></b> (? qì) or "breath energy" in coordination with physical movement and <a href="/wiki/Zhan_zhuang" title="Zhan zhuang">post standing</a> or combinations of the two. These were formerly taught only to disciples as a separate, complementary training system. In the last 50 years they have become more well known to the general public.</li>
</ul>
<p>T'ai Chi's martial aspect relies on sensitivity to the opponent's movements and centre of gravity dictating appropriate responses. Effectively affecting or "capturing" the opponent's centre of gravity immediately upon contact is trained as the primary goal of the martial T'ai Chi student, and from there all other technique can follow with seeming effortlessness. The alert calmness required to achieve the necessary sensitivity is acquired over thousands of hours of first <i>yin</i> (slow, repetitive, meditative, low impact) and then later adding <i>yang</i> ("realistic," active, fast, high impact) martial training; forms, pushing hands and sparring. T'ai Chi Ch'üan trains in three basic ranges, close, medium and long, and then everything in between. Pushes and open hand strikes are more common than punches, and kicks are usually to the legs and lower torso, never higher than the hip in most styles. The fingers, fists, palms, sides of the hands, wrists, forearms, elbows, shoulders, back, hips, knees and feet are commonly used to strike, with strikes to the eyes, throat, heart, groin and other acupressure points trained by advanced students. There is an extensive repertoire of joint traps, locks and breaks (<a href="/wiki/Chin_na" title="Chin na">chin na</a>), particularly applied to lock up or break an opponent's elbows, wrists, fingers, ankles, back or neck. Most T'ai Chi teachers expect their students to thoroughly learn defensive or neutralizing skills first, and a student will have to demonstrate proficiency with them before offensive skills will be extensively trained. There is also an emphasis in the traditional schools on kind-heartedness. One is expected to show mercy to one's opponents, as instanced by a poem preserved in some of the T'ai Chi families said to be derived from the <a href="/wiki/Shaolin" title="Shaolin">Shaolin</a> temple:</p>
<dl>
<dd>"I would rather maim than kill</dd>
<dd>Hurt than maim</dd>
<dd>Intimidate than hurt</dd>
<dd>Avoid than intimidate."</dd>
</dl>
<div class="thumb tright">
<div style="width:352px;"><a href="/wiki/Image:Martial_arts_-_Fragrant_Hills.JPG" class="internal" title="An outdoor Chen style class in Beijing"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Martial_arts_-_Fragrant_Hills.JPG/350px-Martial_arts_-_Fragrant_Hills.JPG" alt="An outdoor Chen style class in Beijing" width="350" height="233" longdesc="/wiki/Image:Martial_arts_-_Fragrant_Hills.JPG" /></a>
<div class="thumbcaption">
<div class="magnify" style="float:right"><a href="/wiki/Image:Martial_arts_-_Fragrant_Hills.JPG" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div>
An outdoor Chen style class in Beijing</div>
</div>
</div>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=3" title="Edit section: Styles and history">edit</a>]</div>
<p><a name="Styles_and_history" id="Styles_and_history"></a></p>
<h2>Styles and history</h2>
<p>There are five major styles of T'ai Chi Ch'üan, each named after the Chinese family that teaches (or taught) it:</p>
<ul>
<li><b><a href="/wiki/Chen_style_Tai_Chi_Chuan" title="Chen style Tai Chi Chuan">Chen style</a></b> (??) and its close cousin <b><a href="/w/index.php?title=Zhaobao_style_T%27ai_Chi_Ch%27uan&amp;action=edit" class="new" title="Zhaobao style T'ai Chi Ch'uan">Zhao Bao Style</a></b> (?????)</li>
<li><b><a href="/wiki/Yang_style_T%27ai_Chi_Ch%27uan" title="Yang style T'ai Chi Ch'uan">Yang style</a></b> (??)</li>
<li><b><a href="/wiki/Wu/Hao_style_T%27ai_Chi_Ch%27uan" title="Wu/Hao style T'ai Chi Ch'uan">Wu or Wu/Hao style of Wu Yu-hsiang</a></b> (??)</li>
<li><b><a href="/wiki/Wu_style_T%27ai_Chi_Ch%27uan" title="Wu style T'ai Chi Ch'uan">Wu style of Wu Ch'uan-yü and Wu Chien-ch'uan</a></b> (??)</li>
<li><b><a href="/wiki/Sun_style_T%27ai_Chi_Ch%27uan" title="Sun style T'ai Chi Ch'uan">Sun style</a></b> (??)</li>
</ul>
<p>The order of seniority is as listed above. The order of popularity is Yang, Wu, Chen, Sun, and Wu/Hao. The first five major family styles share much underlying theory, but differ in their approaches to training.</p>
<p>In the modern world there are now dozens of new styles, hybrid styles and offshoots of the main styles, but the five family schools are the groups recognised by the international community as being orthodox. For example, there are <i>several</i> groups teaching what they call <a href="/wiki/Wudang_Tai_Chi_Chuan" title="Wudang Tai Chi Chuan">Wu Tang style T'ai Chi Ch'üan (??????)</a>. The best known modern style going by the name <i>Wu Tang</i> has gained some publicity internationally, especially in the <a href="/wiki/United_Kingdom" title="United Kingdom">UK</a> and <a href="/wiki/Europe" title="Europe">Europe</a>, but was originally taught by a senior student of the Wu (?) style.</p>
<p>The designation <i><a href="/wiki/Wudangquan" title="Wudangquan">Wu Tang Ch'üan</a></i> is also used to broadly distinguish <i>internal</i> or <i>nei chia</i> martial arts (said to be a specialty of the <a href="/wiki/Monasteries" title="Monasteries">monasteries</a> at <a href="/wiki/Wudangshan" title="Wudangshan">Wu Tang Shan</a>) from what are known as the <i>external</i> or <i>wei chia</i> styles based on <i><a href="/w/index.php?title=Shaolinquan_kung_fu&amp;action=edit" class="new" title="Shaolinquan kung fu">Shaolinquan kung fu</a></i>, although that distinction is sometimes disputed by individual schools. In this broad sense, among many T'ai Chi schools <i>all</i> styles of T'ai Chi (as well as related arts such as <a href="/wiki/Bagua_zhang" title="Bagua zhang">Pa Kua Chang</a> and <a href="/wiki/Hsing_Yi" title="Hsing Yi">Hsing-i Ch'üan</a>) are therefore considered to be "Wu Tang style" martial arts. The schools that designate themselves "Wu Tang style" relative to the family styles mentioned above mostly claim to teach an "original style" they say was formulated by a Taoist monk called <a href="/wiki/Zhang_Sanfeng" title="Zhang Sanfeng">Zhang Sanfeng</a> and taught by him in the Taoist monasteries at Wu Tang Shan. Some consider that what is practised under that name today may be a modern back-formation based on stories and popular veneration of Zhang Sanfeng (see below) as well as the martial fame of the Wu Tang monastery (there are many other martial art styles historically associated with Wu Tang besides T'ai Chi).</p>
<p>When tracing T'ai Chi Ch'üan's formative influences to <a href="/wiki/Taoist" title="Taoist">Taoist</a> and <a href="/wiki/Buddhist" title="Buddhist">Buddhist</a> monasteries, one has little more to go on than legendary tales from a modern historical perspective, but T'ai Chi Ch'üan's practical connection to and dependence upon the theories of <a href="/wiki/Song_dynasty" title="Song dynasty">Sung dynasty</a> <a href="/wiki/Neo-Confucianism" title="Neo-Confucianism">Neo-Confucianism</a> (a conscious synthesis of Taoist, Buddhist and <a href="/wiki/Confucian" title="Confucian">Confucian</a> traditions, esp. the teachings of <a href="/wiki/Mencius" title="Mencius">Mencius</a>) is readily apparent to its practitioners. The philosophical and political landscape of that time in Chinese history is fairly well documented, even if the origin of the art later to become known as T'ai Chi Ch'üan in it is not. T'ai Chi Ch'üan's theories and practice are therefore believed by some schools to have been formulated by the Taoist monk <a href="/wiki/Zhang_Sanfeng" title="Zhang Sanfeng">Zhang Sanfeng</a> in the 12th century, a time frame fitting well with when the principles of the Neo-Confucian school were making themselves felt in Chinese intellectual life. Therefore the didactic story is told that Zhang Sanfeng as a young man studied <a href="/wiki/Tao_Yin" title="Tao Yin">Tao Yin</a> (??, <a href="/wiki/Pinyin" title="Pinyin">Pinyin</a> daoyin) breathing exercises from his Taoist teachers and martial arts at the Buddhist Shaolin monastery, eventually combining the martial forms and breathing exercises to formulate the soft or internal principles we associate with T'ai Chi Ch'üan and related martial arts. Its subsequent fame attributed to his teaching, Wu Tang monastery was known thereafter as an important martial center for many centuries, its many styles of internal <a href="/wiki/Kung_fu" title="Kung fu">kung fu</a> preserved and refined at various Taoist temples.</p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=4" title="Edit section: Family tree">edit</a>]</div>
<p><a name="Family_tree" id="Family_tree"></a></p>
<h3>Family tree</h3>
<p>This family tree is not comprehensive.</p>
<pre>
<b>LEGENDARY FIGURES</b>
|
<a href="/wiki/Zhang_Sanfeng" title="Zhang Sanfeng">Zhang Sanfeng</a>*
circa 12th century
<a href="/wiki/Nei_chia" title="Nei chia">NEI CHIA</a>
|
<a href="/w/index.php?title=Wang_Zongyue&amp;action=edit" class="new" title="Wang Zongyue">Wang Zongyue</a>*
T'AI CHI CH'ÜAN
|
<b>THE 5 MAJOR CLASSICAL FAMILY STYLES</b>
|
Chen Wangting
1600-1680 9th generation Chen
<a href="/wiki/Chen_style_Tai_Chi_Chuan" title="Chen style Tai Chi Chuan">CHEN STYLE</a>
|
+-------------------------------------------------------------------+
| |
<a href="/w/index.php?title=Chen_Changxing&amp;action=edit" class="new" title="Chen Changxing">Chen Changxing</a> <a href="/w/index.php?title=Chen_Youben&amp;action=edit" class="new" title="Chen Youben">Chen Youben</a>
1771-1853 14th generation Chen circa 1800s 14th generation Chen
Chen Old Frame Chen New Frame
| |
<a href="/wiki/Yang_Lu-ch%27an" title="Yang Lu-ch'an">Yang Lu-ch'an</a> <a href="/w/index.php?title=Chen_Qingping&amp;action=edit" class="new" title="Chen Qingping">Chen Qingping</a>
1799-1872 1795-1868
<a href="/wiki/Yang_style_Tai_Chi_Chuan" title="Yang style Tai Chi Chuan">YANG STYLE</a> Chen Small Frame, Zhao Bao Frame
| |
+---------------------------------+-----------------------------+ |
| | | |
<a href="/wiki/Yang_Pan-hou" title="Yang Pan-hou">Yang Pan-hou</a> <a href="/wiki/Yang_Chien-hou" title="Yang Chien-hou">Yang Chien-hou</a> <a href="/w/index.php?title=Wu_Yu-hsiang&amp;action=edit" class="new" title="Wu Yu-hsiang">Wu Yu-hsiang</a>
1837-1892 1839-1917 1812-1880
Yang Small Frame | <a href="/wiki/Wu/Hao_style_T%27ai_Chi_Ch%27uan" title="Wu/Hao style T'ai Chi Ch'uan">WU/HAO STYLE</a>
| +-----------------+ |
| | | |
<a href="/wiki/Wu_Ch%27uan-y%C3%BC" title="Wu Ch'uan-yü">Wu Ch'uan-yü</a> <a href="/wiki/Yang_Shao-hou" title="Yang Shao-hou">Yang Shao-hou</a> <a href="/wiki/Yang_Ch%27eng-fu" title="Yang Ch'eng-fu">Yang Ch'eng-fu</a> <a href="/w/index.php?title=Li_I-y%C3%BC&amp;action=edit" class="new" title="Li I-yü">Li I-yü</a>
1834-1902 1862-1930 1883-1936 1832-1892
| Yang Small Frame <a href="/wiki/103_form_Yang_family_T%27ai_Chi_Ch%27uan" title="103 form Yang family T'ai Chi Ch'uan">Yang Big Frame</a> |
<a href="/wiki/Wu_Chien-ch%27%C3%BCan" title="Wu Chien-ch'üan">Wu Chien-ch'üan</a> | <a href="/wiki/Hao_Wei-chen" title="Hao Wei-chen">Hao Wei-chen</a>
1870-1942 <a href="/wiki/Yang_Shou-chung" title="Yang Shou-chung">Yang Shou-chung</a> 1849-1920
<a href="/wiki/Wu_style_T%27ai_Chi_Ch%27uan" title="Wu style T'ai Chi Ch'uan">WU STYLE</a> 1910-1985 |
<a href="/wiki/108_form_Wu_family_T%27ai_Chi_Ch%27uan" title="108 form Wu family T'ai Chi Ch'uan">108 Form</a> |
| <a href="/wiki/Sun_Lu-t%27ang" title="Sun Lu-t'ang">Sun Lu-t'ang</a>
<a href="/wiki/Wu_Kung-i" title="Wu Kung-i">Wu Kung-i</a> 1861-1932
1900-1970 <a href="/wiki/Sun_style_T%27ai_Chi_Ch%27uan" title="Sun style T'ai Chi Ch'uan">SUN STYLE</a>
| |
<a href="/w/index.php?title=Wu_Ta-kuei&amp;action=edit" class="new" title="Wu Ta-kuei">Wu Ta-kuei</a> <a href="/w/index.php?title=Sun_Hsing-i&amp;action=edit" class="new" title="Sun Hsing-i">Sun Hsing-i</a>
1923-1970 1891-1929
<b>MODERN FORMS</b>
from Yang Ch`eng-fu
|
|
|
+--------------+
| |
<a href="/wiki/Cheng_Man-ch%27ing" title="Cheng Man-ch'ing">Cheng Man-ch'ing</a> |
1901-1975 |
Short (37) Form |
|
Chinese Sports Commission
1956
Beijing <a href="/wiki/24_Form" title="24 Form">24 Form</a>
.
.
1989
<a href="/wiki/42_Form_%28Competition_Form%29_T%27ai_Chi_Ch%27uan" title="42 Form (Competition Form) T'ai Chi Ch'uan">42 Competition Form</a>
(<a href="/wiki/Wushu" title="Wushu">Wushu</a> competition form combined from Sun, Wu, Chen, and Yang styles)
</pre>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=5" title="Edit section: Notes to Family tree table">edit</a>]</div>
<p><a name="Notes_to_Family_tree_table" id="Notes_to_Family_tree_table"></a></p>
<h3>Notes to Family tree table</h3>
<p>Names denoted by an asterisk are legendary or semilegendary figures in the lineage, which means their involvement in the lineage, while accepted by most of the major schools, isn't independently verifiable from known historical records.</p>
<p>The Cheng Man-ch'ing and <a href="/w/index.php?title=Chinese_Sports_Commission&amp;action=edit" class="new" title="Chinese Sports Commission">Chinese Sports Commission</a> short forms are said to be derived from Yang family forms, but neither are recognized as Yang family T'ai Chi Ch'üan by current Yang family teachers. The Chen, Yang and Wu families are now promoting their own shortened demonstration forms for competitive purposes.</p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=6" title="Edit section: Modern T'ai Chi">edit</a>]</div>
<p><a name="Modern_T.27ai_Chi" id="Modern_T.27ai_Chi"></a></p>
<h2>Modern T'ai Chi</h2>
<div class="thumb tright">
<div style="width:352px;"><a href="/wiki/Image:Taichi_shanghai_bund_2005.jpg" class="internal" title="Yang style in Shanghai"><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Taichi_shanghai_bund_2005.jpg/350px-Taichi_shanghai_bund_2005.jpg" alt="Yang style in Shanghai" width="350" height="263" longdesc="/wiki/Image:Taichi_shanghai_bund_2005.jpg" /></a>
<div class="thumbcaption">
<div class="magnify" style="float:right"><a href="/wiki/Image:Taichi_shanghai_bund_2005.jpg" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div>
Yang style in Shanghai</div>
</div>
</div>
<p>T'ai Chi has become very popular in the last twenty years or so, as the <a href="/wiki/Baby_boomers" title="Baby boomers">baby boomers</a> age and T'ai Chi's reputation for ameliorating the effects of aging becomes more well-known. Hospitals, clinics, community and senior centers are all hosting T'ai Chi classes in communities around the world. As a result of this popularity, there has been some divergence between those who say they practice T'ai Chi primarily for fighting, those who practice it for its <a href="/wiki/Aesthetic" title="Aesthetic">aesthetic</a> appeal (as in the shortened, modern, theatrical "Taijiquan" forms of <a href="/wiki/Wushu" title="Wushu">wushu</a>, see below), and those who are more interested in its benefits to physical and mental health. The wushu aspect is primarily for show; the forms taught for those purposes are designed to earn points in competition and are mostly unconcerned with either health maintenance or martial ability. More traditional stylists still see the two aspects of health and martial arts as equally necessary pieces of the puzzle, the <i>yin</i> and <i>yang</i> of T'ai Chi Ch'üan. The T'ai Chi "family" schools therefore still present their teachings in a martial art context even though the majority of their students nowadays profess that they are primarily interested in training for the claimed health benefits.</p>
<p>Along with <a href="/wiki/Yoga" title="Yoga">Yoga</a>, it is one of the fastest growing fitness and health maintenance activities, in terms of numbers of students enrolling in classes. Since there is no universal certification process and most Westerners haven't seen very much T'ai Chi and don't know what to look for, practically anyone can learn or even make up a few moves and call themselves a teacher. This is especially prevalent in the <a href="/wiki/New_Age" title="New Age">New Age</a> community. Relatively few of these teachers even know that there are martial applications to the T'ai Chi forms. Those who do know that it is a martial art usually don't teach martially themselves. If they do teach self-defense, it is often a mixture of motions which the teachers think look like T'ai Chi Ch'üan with some other system. This is especially evident in schools located outside of China. While this phenomenon may have made some external aspects of T'ai Chi available for a wider audience, the traditional T'ai Chi family schools see the martial focus as a fundamental part of their training, both for health <i>and</i> self-defense purposes. They claim that while the students may not need to practice martial applications themselves to derive a benefit from T'ai Chi training, they assert that T'ai Chi teachers at least should know the martial applications to ensure that the movements they teach are done correctly and safely by their students. Also, working on the ability to protect oneself from physical attack (one of the most stressful things that can happen to a person) certainly falls under the category of complete "health maintenance." For these reasons they claim that a school not teaching those aspects somewhere in their syllabus cannot be said to be actually teaching the art itself, and will be much less likely to be able to reproduce the full health benefits that made T'ai Chi's reputation in the first place.</p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=7" title="Edit section: Modern forms">edit</a>]</div>
<p><a name="Modern_forms" id="Modern_forms"></a></p>
<h3>Modern forms</h3>
<div class="thumb tright">
<div style="width:352px;"><a href="/wiki/Image:Tai_Chi_fans.jpg" class="internal" title="Women practicing non-martial T'ai Chi in Chinatown (New York City, New York, USA)."><img src="http://upload.wikimedia.org/wikipedia/en/thumb/a/ad/Tai_Chi_fans.jpg/350px-Tai_Chi_fans.jpg" alt="Women practicing non-martial T'ai Chi in Chinatown (New York City, New York, USA)." width="350" height="201" longdesc="/wiki/Image:Tai_Chi_fans.jpg" /></a>
<div class="thumbcaption">
<div class="magnify" style="float:right"><a href="/wiki/Image:Tai_Chi_fans.jpg" class="internal" title="Enlarge"><img src="/skins-1.5/common/images/magnify-clip.png" width="15" height="11" alt="Enlarge" /></a></div>
Women practicing non-martial T'ai Chi in <a href="/wiki/Chinatown_%28Manhattan%29" title="Chinatown (Manhattan)">Chinatown</a> (<a href="/wiki/New_York_City" title="New York City">New York City</a>, <a href="/wiki/New_York" title="New York">New York</a>, <a href="/wiki/USA" title="USA">USA</a>).</div>
</div>
</div>
<p>In order to standardize T'ai Chi Ch'üan for wushu tournament judging, and because many of the family T'ai Chi Ch'üan teachers had either moved out of China or had been forced to stop teaching after the <a href="/wiki/Chinese_Civil_War" title="Chinese Civil War">Communist regime was established</a> in <a href="/wiki/1949" title="1949">1949</a>, the government sponsored Chinese Sports Committee brought together four of their wushu teachers to truncate the Yang family hand form to <a href="/wiki/24_Form_%28Simplified_Form%29_T%27ai_Chi_Ch%27uan" title="24 Form (Simplified Form) T'ai Chi Ch'uan">24 postures</a> in <a href="/wiki/1956" title="1956">1956</a>. They wanted to somehow retain the look of T'ai Chi Ch'üan but make an easy to remember routine that was less difficult to teach and much less difficult to learn than longer (generally 88 to 108 posture) classical solo hand forms. In <a href="/wiki/1976" title="1976">1976</a>, they developed a slightly longer form also for the purposes of demonstration that still didn't involve the complete memory, balance and coordination requirements of the traditional forms. This was a combination form, the <i>Combined 48 Forms</i> that were created by three wushu coaches headed by Professor Men Hui Feng. The combined forms were created based on simplifying and combining some features of the classical forms from four of the original styles; Ch'en, Yang, Wu, and Sun. Even though shorter modern forms don't have the conditioning benefits of the classical forms, the idea was to take what they felt were distinctive cosmetic features of these styles and to express them in a shorter time for purposes of competition.</p>
<p>As T'ai Chi again became popular on the Mainland, competitive forms were developed to be completed within a 6 minute time limit. In the late <a href="/wiki/1980s" title="1980s">1980s</a>, the Chinese Sports Committee standardized many different competition forms. It had developed sets said to represent the four major styles as well as combined forms. These five sets of forms were created by different teams, and later approved by a committee of wushu coaches in China. All sets of forms thus created were named after their style, e.g., the Ch'en Style National Competition Form is the <i>56 Forms</i>, and so on. The combined forms are <i>The 42 Form</i> or simply the <i>Competition Form</i>, as it is known in China. In the 11th <a href="/wiki/Asian_Games" title="Asian Games">Asian Games</a> of <a href="/wiki/1990" title="1990">1990</a>, wushu was included as an item for competition for the first time with the 42 Form being chosen to represent T'ai Chi. The International Wushu Federation (<a href="/w/index.php?title=IWUF&amp;action=edit" class="new" title="IWUF">IWUF</a>) has applied for wushu to be part of the <a href="/wiki/Olympic_games" title="Olympic games">Olympic games</a>. If accepted, it is likely that T'ai Chi and wushu will be represented as demonstration events in <a href="/wiki/2008" title="2008">2008</a>.</p>
<p>Representatives of the original T'ai Chi families do not teach the forms developed by the Chinese Sports Committee. T'ai Chi Ch'üan has historically been seen by them as a martial art, not a sport, with competitions mostly entered as a hobby or to promote one's school publicly, but with little bearing on measuring actual accomplishment in the art. Their criticisms of modern forms include that the modern, "government" routines have no standardized, internally consistent training requirements. Also, that people studying competition forms rarely train pushing hands or other power generation trainings vital to learning the martial applications of T'ai Chi Ch'üan and thereby lack the <a href="/wiki/Quality_control" title="Quality control">quality control</a> traditional teachers maintain is essential for achieving the full benefits from both the health and the martial aspect of traditional T'ai Chi training.</p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=8" title="Edit section: Health benefits">edit</a>]</div>
<p><a name="Health_benefits" id="Health_benefits"></a></p>
<h2>Health benefits</h2>
<p>Researchers have found that long-term T'ai Chi practice had favorable effects on the promotion of balance control, flexibility and cardiovascular fitness and reduced the risk of falls in elders. The studies also reported reduced pain, stress and anxiety in healthy subjects. Other studies have indicated improved cardiovascular and respiratory function in healthy subjects as well as those who had undergone coronary artery bypass surgery. Patients also benefited from T'ai Chi who suffered from heart failure, high blood pressure, heart attacks, arthritis and multiple sclerosis.</p>
<p>T'ai Chi has also been shown to reduce the symptoms of young Attention Deficit and Hyperactivity Disorder (<a href="/wiki/ADHD" title="ADHD">ADHD</a>) sufferers. T'ai Chi's gentle, low impact, movements surprisingly burn more calories than surfing and nearly as many as downhill skiing. T'ai Chi also boosts aspects of the immune system's function very significantly, and has been shown to reduce the incidence of anxiety, depression, and overall mood disturbance. (See research citations listed below.)</p>
<p>A pilot study has found evidence that T'ai Chi and related qigong helps reduce the severity of <a href="/wiki/Diabetes" title="Diabetes">diabetes</a>.<a href="http://www.abc.net.au/pm/content/2005/s1535304.htm" class="external autonumber" title="http://www.abc.net.au/pm/content/2005/s1535304.htm">[1]</a></p>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=9" title="Edit section: Citations to medical research">edit</a>]</div>
<p><a name="Citations_to_medical_research" id="Citations_to_medical_research"></a></p>
<h3>Citations to medical research</h3>
<ul>
<li>Wolf SL, Sattin RW, Kutner M. Intense T'ai Chi exercise training and fall occurrences in older, transitionally frail adults: a randomized, controlled trial. J Am Geriatr Soc. 2003 Dec; 51(12): 1693-701. <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=14687346" class="external" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=14687346">PMID 14687346</a></li>
<li>Wang C, Collet JP, Lau J. The effect of Tai Chi on health outcomes in patients with chronic conditions: a <a href="/wiki/Systematic_review" title="Systematic review">systematic review</a>. Arch Intern Med. 2004 Mar 8;164(5):493-501. <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=15006825" class="external" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=15006825">PMID 15006825</a></li>
<li>Search a listing of articles relating to the FICSIT trials and T'ai Chi <a href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Select+from+History&amp;db=pubmed&amp;query_key=3" class="external autonumber" title="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Select+from+History&amp;db=pubmed&amp;query_key=3">[2]</a></li>
<li>Hernandez-Reif, M., Field, T.M., &amp; Thimas, E. (2001). Attention deficit hyperactivity disorder: benefits from Tai Chi. Journal of Bodywork &amp; Movement Therapies, 5(2):120-3, 2001 Apr, 5(23 ref), 120-123</li>
<li>Calorie Burning Chart <a href="http://www.nutristrategy.com/activitylist3.htm" class="external autonumber" title="http://www.nutristrategy.com/activitylist3.htm">[3]</a></li>
<li>Tai Chi boosts T-Cell counts in immune system <a href="http://www.acupuncturetoday.com/archives2003/nov/11taichi.html" class="external autonumber" title="http://www.acupuncturetoday.com/archives2003/nov/11taichi.html">[4]</a></li>
<li>Tai Chi, depression, anxiety, and mood disturbance (American Psychological Association) Journal of Psychosomatic Research, 1989 Vol 33 (2) 197-206</li>
<li>A comprehensive listing of Tai Chi medical research links <a href="http://www.worldtaichiday.org/WTCQDHlthBenft.html" class="external autonumber" title="http://www.worldtaichiday.org/WTCQDHlthBenft.html">[5]</a></li>
<li>References to medical publications <a href="http://www.worldtaichiday.org/HeadlineNews.html" class="external autonumber" title="http://www.worldtaichiday.org/HeadlineNews.html">[6]</a></li>
<li><a href="http://www.abc.net.au/pm/content/2005/s1535304.htm" class="external text" title="http://www.abc.net.au/pm/content/2005/s1535304.htm">Tai Chi a promising remedy for diabetes</a>, <i>Australian Broadcasting Corporation</i>, 20 December, 2005 - Pilot study of Qigong and tai chi in diabetes sufferers.</li>
<li>Health Research Articles on "Tai Chi as Health Therapy" for many issues, i.e. ADHD, Cardiac Health &amp; Rehabilitation, Diabetes, High Blood Pressure, Menopause, Bone Loss, Weight Loss, etc.<a href="http://www.worldtaichiday.org/LIBRARYArticles/LIBRARYTaiChiArticlesMenu.html" class="external autonumber" title="http://www.worldtaichiday.org/LIBRARYArticles/LIBRARYTaiChiArticlesMenu.html">[7]</a></li>
</ul>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=10" title="Edit section: See also">edit</a>]</div>
<p><a name="See_also" id="See_also"></a></p>
<h2>See also</h2>
<ul>
<li><a href="/wiki/Jing_%28TCM%29" title="Jing (TCM)">Jing</a></li>
<li><a href="/wiki/List_of_Tai_Chi_Chuan_forms" title="List of Tai Chi Chuan forms">List of Tai Chi Chuan forms</a></li>
<li><a href="/wiki/Nei_Jin" title="Nei Jin">nei chin</a></li>
<li><a href="/wiki/Silk_reeling" title="Silk reeling">Silk reeling</a></li>
<li><a href="/wiki/World_Tai_Chi_and_Qigong_Day" title="World Tai Chi and Qigong Day">World Tai Chi and Qigong Day</a></li>
</ul>
<div class="editsection" style="float:right;margin-left:5px;">[<a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit&amp;section=11" title="Edit section: External links">edit</a>]</div>
<p><a name="External_links" id="External_links"></a></p>
<h2>External links</h2>
<ul>
<li><a href="http://www.chenxiaowang.com/" class="external text" title="http://www.chenxiaowang.com/">A Chen Family Website</a></li>
<li><a href="http://www.yangfamilytaichi.com/" class="external text" title="http://www.yangfamilytaichi.com/">Yang Family Website</a></li>
<li><a href="http://www.wustyle.com/" class="external text" title="http://www.wustyle.com/">Wu Chien-ch'üan Family Website</a></li>
<li><a href="http://www.fushengyuan-taichi.com.au/" class="external text" title="http://www.fushengyuan-taichi.com.au/">Fu Family Website</a></li>
<li><a href="http://www.itcca.org/" class="external text" title="http://www.itcca.org/">Yang family disciple's website (ITCCA)</a></li>
<li><a href="http://www.dongtaichi.com/" class="external text" title="http://www.dongtaichi.com/">Dong T'ai Chi</a></li>
<li><a href="http://www.leefamilystyle.com/" class="external text" title="http://www.leefamilystyle.com/">UK website for Li style, popular in Europe</a></li>
<li><a href="http://www.chen-taiji.com/mambo/" class="external text" title="http://www.chen-taiji.com/mambo/">The World of Taijiquan</a></li>
<li><a href="http://www.scheele.org/lee/tcclinks.html" class="external text" title="http://www.scheele.org/lee/tcclinks.html">Lee Scheele's Links to T'ai Chi Ch'uan Web Sites</a></li>
<li><a href="http://news.bbc.co.uk/1/hi/health/3543907.stm" class="external text" title="http://news.bbc.co.uk/1/hi/health/3543907.stm">BBC article</a></li>
<li><a href="http://www.acupuncturetoday.com/archives2004/jul/07taichi.html" class="external text" title="http://www.acupuncturetoday.com/archives2004/jul/07taichi.html">Tai Chi: Good for the Mind, Good for the Body</a></li>
<li><a href="http://www.taichiunion.com/" class="external text" title="http://www.taichiunion.com/">Tai Chi Chuan Union for Great Britian: The largest collective of independent Tai Chi Chuan Instructors in the British Isles</a></li>
</ul>
<!-- Saved in parser cache with key enwiki:pcache:idhash:30690-0!0!0!1!!en!2 and timestamp 20060722121412 -->
<div class="printfooter">
Retrieved from "<a href="http://en.wikipedia.org/wiki/Tai_Chi_Chuan">http://en.wikipedia.org/wiki/Tai_Chi_Chuan</a>"</div>
<div id="catlinks"><p class='catlinks'><a href="/w/index.php?title=Special:Categories&amp;article=Tai_Chi_Chuan" title="Special:Categories">Categories</a>: <span dir='ltr'><a href="/wiki/Category:Chinese_martial_arts" title="Category:Chinese martial arts">Chinese martial arts</a></span> | <span dir='ltr'><a href="/wiki/Category:T%27ai_Chi_Ch%27uan" title="Category:T'ai Chi Ch'uan">T'ai Chi Ch'uan</a></span> | <span dir='ltr'><a href="/wiki/Category:Taoism" title="Category:Taoism">Taoism</a></span> | <span dir='ltr'><a href="/wiki/Category:Meditation" title="Category:Meditation">Meditation</a></span> | <span dir='ltr'><a href="/wiki/Category:Mind-body_interventions" title="Category:Mind-body interventions">Mind-body interventions</a></span> | <span dir='ltr'><a href="/wiki/Category:Traditional_Chinese_medicine" title="Category:Traditional Chinese medicine">Traditional Chinese medicine</a></span></p></div> <!-- end content -->
<div class="visualClear"></div>
</div>
</div>
</div>
<div id="column-one">
<div id="p-cactions" class="portlet">
<h5>Views</h5>
<ul>
<li id="ca-nstab-main" class="selected"><a href="/wiki/Tai_Chi_Chuan">Article</a></li>
<li id="ca-talk"><a href="/wiki/Talk:Tai_Chi_Chuan">Discussion</a></li>
<li id="ca-edit"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=edit">Edit this page</a></li>
<li id="ca-history"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=history">History</a></li>
<li id="ca-protect"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=protect">Protect</a></li>
<li id="ca-delete"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=delete">Delete</a></li>
<li id="ca-move"><a href="/w/index.php?title=Special:Movepage&amp;target=Tai_Chi_Chuan">Move</a></li>
<li id="ca-unwatch"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;action=unwatch">Unwatch</a></li>
</ul>
</div>
<div class="portlet" id="p-personal">
<h5>Personal tools</h5>
<div class="pBody">
<ul>
<li id="pt-userpage"><a href="/wiki/User:Edward_Z._Yang">Edward Z. Yang</a></li>
<li id="pt-mytalk"><a href="/wiki/User_talk:Edward_Z._Yang">My talk</a></li>
<li id="pt-preferences"><a href="/wiki/Special:Preferences">My preferences</a></li>
<li id="pt-watchlist"><a href="/wiki/Special:Watchlist">My watchlist</a></li>
<li id="pt-mycontris"><a href="/wiki/Special:Contributions/Edward_Z._Yang">My contributions</a></li>
<li id="pt-logout"><a href="/w/index.php?title=Special:Userlogout&amp;returnto=Tai_Chi_Chuan">Log out</a></li>
</ul>
</div>
</div>
<div class="portlet" id="p-logo">
<a style="background-image: url(/images/wiki-en.png);" href="/wiki/Main_Page" title="Main Page"></a>
</div>
<script type="text/javascript"> if (window.isMSIE55) fixalpha(); </script>
<div class='portlet' id='p-navigation'>
<h5>Navigation</h5>
<div class='pBody'>
<ul>
<li id="n-mainpage"><a href="/wiki/Main_Page">Main Page</a></li>
<li id="n-portal"><a href="/wiki/Wikipedia:Community_Portal">Community Portal</a></li>
<li id="n-Featured-articles"><a href="/wiki/Wikipedia:Featured_articles">Featured articles</a></li>
<li id="n-currentevents"><a href="/wiki/Portal:Current_events">Current events</a></li>
<li id="n-recentchanges"><a href="/wiki/Special:Recentchanges">Recent changes</a></li>
<li id="n-randompage"><a href="/wiki/Special:Random">Random article</a></li>
<li id="n-help"><a href="/wiki/Help:Contents">Help</a></li>
<li id="n-contact"><a href="/wiki/Wikipedia:Contact_us">Contact Wikipedia</a></li>
<li id="n-sitesupport"><a href="http://wikimediafoundation.org/wiki/Fundraising#Donation_methods">Donations</a></li>
</ul>
</div>
</div>
<div id="p-search" class="portlet">
<h5><label for="searchInput">Search</label></h5>
<div id="searchBody" class="pBody">
<form action="/wiki/Special:Search" id="searchform"><div>
<input id="searchInput" name="search" type="text" accesskey="f" value="" />
<input type='submit' name="go" class="searchButton" id="searchGoButton" value="Go" />&nbsp;
<input type='submit' name="fulltext" class="searchButton" value="Search" />
</div></form>
</div>
</div>
<div class="portlet" id="p-tb">
<h5>Toolbox</h5>
<div class="pBody">
<ul>
<li id="t-whatlinkshere"><a href="/w/index.php?title=Special:Whatlinkshere&amp;target=Tai_Chi_Chuan">What links here</a></li>
<li id="t-recentchangeslinked"><a href="/w/index.php?title=Special:Recentchangeslinked&amp;target=Tai_Chi_Chuan">Related changes</a></li>
<li id="t-upload"><a href="/wiki/Special:Upload">Upload file</a></li>
<li id="t-specialpages"><a href="/wiki/Special:Specialpages">Special pages</a></li>
<li id="t-print"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;printable=yes">Printable version</a></li> <li id="t-permalink"><a href="/w/index.php?title=Tai_Chi_Chuan&amp;oldid=65197836">Permanent link</a></li><li id="t-cite"><a href="/w/index.php?title=Special:Cite&amp;page=Tai_Chi_Chuan&amp;id=65197836">Cite this article</a></li> </ul>
</div>
</div>
<div id="p-lang" class="portlet">
<h5>In other languages</h5>
<div class="pBody">
<ul>
<li class="interwiki-br"><a href="http://br.wikipedia.org/wiki/Taichichuan">Brezhoneg</a></li>
<li class="interwiki-ca"><a href="http://ca.wikipedia.org/wiki/Tai_txi_txuan">Català</a></li>
<li class="interwiki-cs"><a href="http://cs.wikipedia.org/wiki/Tchaj-%C5%A5i">Cesky</a></li>
<li class="interwiki-da"><a href="http://da.wikipedia.org/wiki/Tai_Chi">Dansk</a></li>
<li class="interwiki-de"><a href="http://de.wikipedia.org/wiki/Taijiquan">Deutsch</a></li>
<li class="interwiki-et"><a href="http://et.wikipedia.org/wiki/Taijiquan">Eesti</a></li>
<li class="interwiki-el"><a href="http://el.wikipedia.org/wiki/%CE%A4%CE%AC%CE%B9_%CE%A4%CE%B6%CE%AF_%CE%A3%CE%BF%CF%85%CE%AC%CE%BD">????????</a></li>
<li class="interwiki-es"><a href="http://es.wikipedia.org/wiki/Tai_Chi_Chuan">Español</a></li>
<li class="interwiki-eo"><a href="http://eo.wikipedia.org/wiki/Taj%C4%9Di%C4%89uano">Esperanto</a></li>
<li class="interwiki-fr"><a href="http://fr.wikipedia.org/wiki/Tai-chi-chuan">Français</a></li>
<li class="interwiki-it"><a href="http://it.wikipedia.org/wiki/Taijiquan">Italiano</a></li>
<li class="interwiki-he"><a href="http://he.wikipedia.org/wiki/%D7%98%D7%90%D7%99_%D7%A6%27%D7%99">?????</a></li>
<li class="interwiki-hu"><a href="http://hu.wikipedia.org/wiki/Taijiquan">Magyar</a></li>
<li class="interwiki-nl"><a href="http://nl.wikipedia.org/wiki/Tai_Chi">Nederlands</a></li>
<li class="interwiki-ja"><a href="http://ja.wikipedia.org/wiki/%E5%A4%AA%E6%A5%B5%E6%8B%B3">???</a></li>
<li class="interwiki-pl"><a href="http://pl.wikipedia.org/wiki/Taijiquan">Polski</a></li>
<li class="interwiki-pt"><a href="http://pt.wikipedia.org/wiki/Tai_Chi_Chuan">Português</a></li>
<li class="interwiki-ro"><a href="http://ro.wikipedia.org/wiki/Taijiquan">Româna</a></li>
<li class="interwiki-ru"><a href="http://ru.wikipedia.org/wiki/%D0%A2%D0%B0%D0%B9%D1%86%D0%B7%D0%B8%D1%86%D1%8E%D0%B0%D0%BD%D1%8C">???????</a></li>
<li class="interwiki-fi"><a href="http://fi.wikipedia.org/wiki/Taijiquan">Suomi</a></li>
<li class="interwiki-sv"><a href="http://sv.wikipedia.org/wiki/Taijiquan">Svenska</a></li>
<li class="interwiki-th"><a href="http://th.wikipedia.org/wiki/%E0%B9%84%E0%B8%97%E0%B9%88%E0%B9%80%E0%B8%81%E0%B9%8A%E0%B8%81">???</a></li>
<li class="interwiki-tr"><a href="http://tr.wikipedia.org/wiki/Tai-Chi_Chuan">Türkçe</a></li>
<li class="interwiki-zh"><a href="http://zh.wikipedia.org/wiki/%E5%A4%AA%E6%9E%81%E6%8B%B3">??</a></li>
</ul>
</div>
</div>
</div><!-- end of the left (by default at least) column -->
<div class="visualClear"></div>
<div id="footer">
<div id="f-poweredbyico"><a href="http://www.mediawiki.org/"><img src="/skins-1.5/common/images/poweredby_mediawiki_88x31.png" alt="MediaWiki" /></a></div>
<div id="f-copyrightico"><a href="http://wikimediafoundation.org/"><img src="/images/wikimedia-button.png" border="0" alt="Wikimedia Foundation"/></a></div>
<ul id="f-list">
<li id="lastmod"> This page was last modified 03:15, July 22, 2006.</li>
<li id="copyright">All text is available under the terms of the <a class='internal' href="/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License" title="Wikipedia:Text of the GNU Free Documentation License">GNU Free Documentation License</a>. (See <b><a class='internal' href="/wiki/Wikipedia:Copyrights" title="Wikipedia:Copyrights">Copyrights</a></b> for details.) <br /> Wikipedia&reg; is a registered trademark of the Wikimedia Foundation, Inc.<br /></li>
<li id="privacy"><a href="http://wikimediafoundation.org/wiki/Privacy_policy" title="wikimedia:Privacy policy">Privacy policy</a></li>
<li id="about"><a href="/wiki/Wikipedia:About" title="Wikipedia:About">About Wikipedia</a></li>
<li id="disclaimer"><a href="/wiki/Wikipedia:General_disclaimer" title="Wikipedia:General disclaimer">Disclaimers</a></li>
</ul>
</div>
<script type="text/javascript"> if (window.runOnloadHook) runOnloadHook();</script>
</div>
<!-- Served by srv25 in 0.089 secs. -->
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,7 @@
Disclaimer:
The HTML used in these samples are taken from random websites. I claim
no copyright over these and assert that I may use them like this under
fair use.
vim: et sw=4 sts=4

View File

@ -0,0 +1,22 @@
{
"name": "ezyang/htmlpurifier",
"description": "Standards compliant HTML filter written in PHP",
"type": "library",
"keywords": ["html"],
"homepage": "http://htmlpurifier.org/",
"license": "LGPL",
"authors": [
{
"name": "Edward Z. Yang",
"email": "admin@htmlpurifier.org",
"homepage": "http://ezyang.com"
}
],
"require": {
"php": ">=5.2"
},
"autoload": {
"psr-0": { "HTMLPurifier": "library/" },
"files": ["library/HTMLPurifier.composer.php"]
}
}

View File

@ -0,0 +1,64 @@
<?php
/**
* Generates XML and HTML documents describing configuration.
* @note PHP 5.2+ only!
*/
/*
TODO:
- make XML format richer
- extend XSLT transformation (see the corresponding XSLT file)
- allow generation of packaged docs that can be easily moved
- multipage documentation
- determine how to multilingualize
- add blurbs to ToC
*/
if (version_compare(PHP_VERSION, '5.2', '<')) exit('PHP 5.2+ required.');
error_reporting(E_ALL | E_STRICT);
// load dual-libraries
require_once dirname(__FILE__) . '/../extras/HTMLPurifierExtras.auto.php';
require_once dirname(__FILE__) . '/../library/HTMLPurifier.auto.php';
// setup HTML Purifier singleton
HTMLPurifier::getInstance(array(
'AutoFormat.PurifierLinkify' => true
));
$builder = new HTMLPurifier_ConfigSchema_InterchangeBuilder();
$interchange = new HTMLPurifier_ConfigSchema_Interchange();
$builder->buildDir($interchange);
$loader = dirname(__FILE__) . '/../config-schema.php';
if (file_exists($loader)) include $loader;
$interchange->validate();
$style = 'plain'; // use $_GET in the future, careful to validate!
$configdoc_xml = dirname(__FILE__) . '/configdoc.xml';
$xml_builder = new HTMLPurifier_ConfigSchema_Builder_Xml();
$xml_builder->openURI($configdoc_xml);
$xml_builder->build($interchange);
unset($xml_builder); // free handle
$xslt = new ConfigDoc_HTMLXSLTProcessor();
$xslt->importStylesheet(dirname(__FILE__) . "/styles/$style.xsl");
$output = $xslt->transformToHTML($configdoc_xml);
if (!$output) {
echo "Error in generating files\n";
exit(1);
}
// write out
file_put_contents(dirname(__FILE__) . "/$style.html", $output);
if (php_sapi_name() != 'cli') {
// output (instant feedback if it's a browser)
echo $output;
} else {
echo "Files generated successfully.\n";
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,44 @@
body {margin:0;padding:0;}
#content {
margin:1em auto;
max-width: 47em;
width: expression(document.body.clientWidth >
85 * parseInt(document.body.currentStyle.fontSize) ?
"54em": "auto");
}
table {border-collapse:collapse;}
table td, table th {padding:0.2em;}
table.constraints {margin:0 0 1em;}
table.constraints th {
text-align:right;padding-left:0.4em;padding-right:0.4em;background:#EEE;
width:8em;vertical-align:top;}
table.constraints td {padding-right:0.4em; padding-left: 1em;}
table.constraints td ul {padding:0; margin:0; list-style:none;}
table.constraints td pre {margin:0;}
#tocContainer {position:relative;}
#toc {list-style-type:none; font-weight:bold; font-size:1em; margin-bottom:1em;}
#toc li {position:relative; line-height: 1.2em;}
#toc .col-2 {margin-left:50%;}
#toc .col-l {float:left;}
#toc ul {list-style-type:disc; font-weight:normal; padding-bottom:1.2em;}
.description p {margin-top:0;margin-bottom:1em;}
#library, h1 {text-align:center; font-family:Garamond, serif;
font-variant:small-caps;}
#library {font-size:1em;}
h1 {margin-top:0;}
h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
font-size:1.3em; clear:both;}
h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
.deprecated {color: #CCC;}
.deprecated table.constraints th {background:#FFF;}
.deprecated-notice {color: #000; text-align:center; margin-bottom: 1em;}
/* vim: et sw=4 sts=4 */

View File

@ -0,0 +1,253 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version = "1.0"
xmlns = "http://www.w3.org/1999/xhtml"
xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"
>
<xsl:output
method = "xml"
encoding = "UTF-8"
doctype-public = "-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
indent = "no"
media-type = "text/html"
/>
<xsl:param name="css" select="'styles/plain.css'"/>
<xsl:param name="title" select="'Configuration Documentation'"/>
<xsl:variable name="typeLookup" select="document('../types.xml')/types" />
<xsl:variable name="usageLookup" select="document('../usage.xml')/usage" />
<!-- Twiddle this variable to get the columns as even as possible -->
<xsl:variable name="maxNumberAdjust" select="2" />
<xsl:template match="/">
<html lang="en" xml:lang="en">
<head>
<title><xsl:value-of select="$title" /> - <xsl:value-of select="/configdoc/title" /></title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
<link rel="stylesheet" type="text/css" href="{$css}" />
</head>
<body>
<div id="content">
<div id="library"><xsl:value-of select="/configdoc/title" /></div>
<h1><xsl:value-of select="$title" /></h1>
<div id="tocContainer">
<h2>Table of Contents</h2>
<ul id="toc">
<xsl:apply-templates mode="toc">
<xsl:with-param name="overflowNumber" select="round(count(/configdoc/namespace) div 2) + $maxNumberAdjust" />
</xsl:apply-templates>
</ul>
</div>
<div id="typesContainer">
<h2>Types</h2>
<xsl:apply-templates select="$typeLookup" mode="types" />
</div>
<xsl:apply-templates />
</div>
</body>
</html>
</xsl:template>
<xsl:template match="type" mode="types">
<div class="type-block">
<xsl:attribute name="id">type-<xsl:value-of select="@id" /></xsl:attribute>
<h3><code><xsl:value-of select="@id" /></code>: <xsl:value-of select="@name" /></h3>
<div class="type-description">
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</div>
</xsl:template>
<xsl:template match="title" mode="toc" />
<xsl:template match="namespace" mode="toc">
<xsl:param name="overflowNumber" />
<xsl:variable name="number"><xsl:number level="single" /></xsl:variable>
<xsl:variable name="directiveNumber"><xsl:number level="any" count="directive" /></xsl:variable>
<xsl:if test="count(directive)&gt;0">
<li>
<!-- BEGIN multicolumn code -->
<xsl:if test="$number &gt;= $overflowNumber">
<xsl:attribute name="class">col-2</xsl:attribute>
</xsl:if>
<xsl:if test="$number = $overflowNumber">
<xsl:attribute name="style">margin-top:-<xsl:value-of select="($number * 2 + $directiveNumber - 3) * 1.2" />em</xsl:attribute>
</xsl:if>
<!-- END multicolumn code -->
<a href="#{@id}"><xsl:value-of select="name" /></a>
<ul>
<xsl:apply-templates select="directive" mode="toc">
<xsl:with-param name="overflowNumber" select="$overflowNumber" />
</xsl:apply-templates>
</ul>
<xsl:if test="$number + 1 = $overflowNumber">
<div class="col-l" />
</xsl:if>
</li>
</xsl:if>
</xsl:template>
<xsl:template match="directive" mode="toc">
<xsl:variable name="number">
<xsl:number level="any" count="directive|namespace" />
</xsl:variable>
<xsl:if test="not(deprecated)">
<li>
<a href="#{@id}"><xsl:value-of select="name" /></a>
</li>
</xsl:if>
</xsl:template>
<xsl:template match="title" />
<xsl:template match="namespace">
<div class="namespace">
<xsl:apply-templates />
<xsl:if test="count(directive)=0">
<p>No configuration directives defined for this namespace.</p>
</xsl:if>
</div>
</xsl:template>
<xsl:template match="namespace/name">
<h2 id="{../@id}"><xsl:value-of select="." /></h2>
</xsl:template>
<xsl:template match="namespace/description">
<div class="description">
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</xsl:template>
<xsl:template match="directive">
<div>
<xsl:attribute name="class"><!--
-->directive<!--
--><xsl:if test="deprecated"> deprecated</xsl:if><!--
--></xsl:attribute>
<xsl:apply-templates>
<xsl:with-param name="id" select="@id" />
</xsl:apply-templates>
</div>
</xsl:template>
<xsl:template match="directive/name">
<xsl:param name="id" />
<xsl:apply-templates select="../aliases/alias" mode="anchor" />
<h3 id="{$id}"><xsl:value-of select="$id" /></h3>
</xsl:template>
<xsl:template match="alias" mode="anchor">
<a id="{.}"></a>
</xsl:template>
<!-- Do not pass through -->
<xsl:template match="alias"></xsl:template>
<xsl:template match="directive/constraints">
<xsl:param name="id" />
<table class="constraints">
<xsl:apply-templates />
<xsl:if test="../aliases/alias">
<xsl:apply-templates select="../aliases" mode="constraints" />
</xsl:if>
<xsl:apply-templates select="$usageLookup/directive[@id=$id]" />
</table>
</xsl:template>
<xsl:template match="directive/aliases" mode="constraints">
<tr>
<th>Aliases</th>
<td>
<xsl:for-each select="alias">
<xsl:if test="position()&gt;1">, </xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
</td>
</tr>
</xsl:template>
<xsl:template match="directive/description">
<div class="description">
<xsl:copy-of xmlns:xhtml="http://www.w3.org/1999/xhtml" select="xhtml:div/node()" />
</div>
</xsl:template>
<xsl:template match="directive/deprecated">
<div class="deprecated-notice">
<strong>Warning:</strong>
This directive was deprecated in version <xsl:value-of select="version" />.
<a href="#{use}">%<xsl:value-of select="use" /></a> should be used instead.
</div>
</xsl:template>
<xsl:template match="usage/directive">
<tr>
<th>Used in</th>
<td>
<ul>
<xsl:apply-templates />
</ul>
</td>
</tr>
</xsl:template>
<xsl:template match="usage/directive/file">
<li>
<em><xsl:value-of select="@name" /></em> on line<xsl:if test="count(line)&gt;1">s</xsl:if>
<xsl:text> </xsl:text>
<xsl:for-each select="line">
<xsl:if test="position()&gt;1">, </xsl:if>
<xsl:value-of select="." />
</xsl:for-each>
</li>
</xsl:template>
<xsl:template match="constraints/version">
<tr>
<th>Version added</th>
<td><xsl:value-of select="." /></td>
</tr>
</xsl:template>
<xsl:template match="constraints/type">
<tr>
<th>Type</th>
<td>
<xsl:variable name="type" select="text()" />
<xsl:attribute name="class">type type-<xsl:value-of select="$type" /></xsl:attribute>
<a>
<xsl:attribute name="href">#type-<xsl:value-of select="$type" /></xsl:attribute>
<xsl:value-of select="$typeLookup/type[@id=$type]/@name" />
<xsl:if test="@allow-null='yes'">
(or null)
</xsl:if>
</a>
</td>
</tr>
</xsl:template>
<xsl:template match="constraints/allowed">
<tr>
<th>Allowed values</th>
<td>
<xsl:for-each select="value"><!--
--><xsl:if test="position()&gt;1">, </xsl:if>
&quot;<xsl:value-of select="." />&quot;<!--
--></xsl:for-each>
</td>
</tr>
</xsl:template>
<xsl:template match="constraints/default">
<tr>
<th>Default</th>
<td><pre><xsl:value-of select="." xml:space="preserve" /></pre></td>
</tr>
</xsl:template>
<xsl:template match="constraints/external">
<tr>
<th>External deps</th>
<td>
<ul>
<xsl:apply-templates />
</ul>
</td>
</tr>
</xsl:template>
<xsl:template match="constraints/external/project">
<li><xsl:value-of select="." /></li>
</xsl:template>
</xsl:stylesheet>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,69 @@
<?xml version="1.0" encoding="UTF-8"?>
<types>
<type id="string" name="String"><div xmlns="http://www.w3.org/1999/xhtml">
A <a
href="http://docs.php.net/manual/en/language.types.string.php">sequence
of characters</a>.
</div></type>
<type id="istring" name="Case-insensitive string"><div xmlns="http://www.w3.org/1999/xhtml">
A series of case-insensitive characters. Internally, upper-case
ASCII characters will be converted to lower-case.
</div></type>
<type id="text" name="Text"><div xmlns="http://www.w3.org/1999/xhtml">
A series of characters that may contain newlines. Text tends to
indicate human-oriented text, as opposed to a machine format.
</div></type>
<type id="itext" name="Case-insensitive text"><div xmlns="http://www.w3.org/1999/xhtml">
A series of case-insensitive characters that may contain newlines.
</div></type>
<type id="int" name="Integer"><div xmlns="http://www.w3.org/1999/xhtml">
An <a
href="http://docs.php.net/manual/en/language.types.integer.php">
integer</a>. You are alternatively permitted to pass a string of
digits instead, which will be cast to an integer using
<code>(int)</code>.
</div></type>
<type id="float" name="Float"><div xmlns="http://www.w3.org/1999/xhtml">
A <a href="http://docs.php.net/manual/en/language.types.float.php">
floating point number</a>. You are alternatively permitted to
pass a numeric string (as defined by <code>is_numeric()</code>),
which will be cast to a float using <code>(float)</code>.
</div></type>
<type id="bool" name="Boolean"><div xmlns="http://www.w3.org/1999/xhtml">
A <a
href="http://docs.php.net/manual/en/language.types.boolean.php">boolean</a>.
You are alternatively permitted to pass an integer <code>0</code> or
<code>1</code> (other integers are not permitted) or a string
<code>"on"</code>, <code>"true"</code> or <code>"1"</code> for
<code>true</code>, and <code>"off"</code>, <code>"false"</code> or
<code>"0"</code> for <code>false</code>.
</div></type>
<type id="lookup" name="Lookup array"><div xmlns="http://www.w3.org/1999/xhtml">
An array whose values are <code>true</code>, e.g. <code>array('key'
=> true, 'key2' => true)</code>. You are alternatively permitted
to pass an array list of the keys <code>array('key', 'key2')</code>
or a comma-separated string of keys <code>"key, key2"</code>. If
you pass an array list of values, ensure that your values are
strictly numerically indexed: <code>array('key1', 2 =>
'key2')</code> will not do what you expect and emits a warning.
</div></type>
<type id="list" name="Array list"><div xmlns="http://www.w3.org/1999/xhtml">
An array which has consecutive integer indexes, e.g.
<code>array('val1', 'val2')</code>. You are alternatively permitted
to pass a comma-separated string of keys <code>"val1, val2"</code>.
If your array is not in this form, <code>array_values</code> is run
on the array and a warning is emitted.
</div></type>
<type id="hash" name="Associative array"><div xmlns="http://www.w3.org/1999/xhtml">
An array which is a mapping of keys to values, e.g.
<code>array('key1' => 'val1', 'key2' => 'val2')</code>. You are
alternatively permitted to pass a comma-separated string of
key-colon-value strings, e.g. <code>"key1: val1, key2: val2"</code>.
</div></type>
<type id="mixed" name="Mixed"><div xmlns="http://www.w3.org/1999/xhtml">
An arbitrary PHP value of any type.
</div></type>
</types>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,547 @@
<?xml version="1.0" encoding="UTF-8"?>
<usage>
<directive id="Core.CollectErrors">
<file name="HTMLPurifier.php">
<line>131</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>81</line>
<line>284</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>53</line>
<line>73</line>
<line>348</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>50</line>
</file>
</directive>
<directive id="CSS.MaxImgLength">
<file name="HTMLPurifier/CSSDefinition.php">
<line>157</line>
</file>
</directive>
<directive id="CSS.Proprietary">
<file name="HTMLPurifier/CSSDefinition.php">
<line>215</line>
</file>
</directive>
<directive id="CSS.AllowTricky">
<file name="HTMLPurifier/CSSDefinition.php">
<line>219</line>
</file>
</directive>
<directive id="CSS.Trusted">
<file name="HTMLPurifier/CSSDefinition.php">
<line>223</line>
</file>
</directive>
<directive id="CSS.AllowImportant">
<file name="HTMLPurifier/CSSDefinition.php">
<line>227</line>
</file>
</directive>
<directive id="CSS.AllowedProperties">
<file name="HTMLPurifier/CSSDefinition.php">
<line>302</line>
</file>
</directive>
<directive id="CSS.ForbiddenProperties">
<file name="HTMLPurifier/CSSDefinition.php">
<line>316</line>
</file>
</directive>
<directive id="Cache.DefinitionImpl">
<file name="HTMLPurifier/DefinitionCacheFactory.php">
<line>49</line>
</file>
</directive>
<directive id="HTML.Doctype">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>83</line>
</file>
</directive>
<directive id="HTML.CustomDoctype">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>85</line>
</file>
</directive>
<directive id="HTML.XHTML">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>88</line>
</file>
</directive>
<directive id="HTML.Strict">
<file name="HTMLPurifier/DoctypeRegistry.php">
<line>93</line>
</file>
</directive>
<directive id="Core.Encoding">
<file name="HTMLPurifier/Encoder.php">
<line>337</line>
<line>372</line>
</file>
</directive>
<directive id="Test.ForceNoIconv">
<file name="HTMLPurifier/Encoder.php">
<line>341</line>
<line>379</line>
</file>
</directive>
<directive id="Core.EscapeNonASCIICharacters">
<file name="HTMLPurifier/Encoder.php">
<line>373</line>
</file>
</directive>
<directive id="Output.CommentScriptContents">
<file name="HTMLPurifier/Generator.php">
<line>61</line>
</file>
</directive>
<directive id="Output.FixInnerHTML">
<file name="HTMLPurifier/Generator.php">
<line>62</line>
</file>
</directive>
<directive id="Output.SortAttr">
<file name="HTMLPurifier/Generator.php">
<line>63</line>
</file>
</directive>
<directive id="Output.FlashCompat">
<file name="HTMLPurifier/Generator.php">
<line>64</line>
</file>
</directive>
<directive id="Output.TidyFormat">
<file name="HTMLPurifier/Generator.php">
<line>93</line>
</file>
</directive>
<directive id="Core.NormalizeNewlines">
<file name="HTMLPurifier/Generator.php">
<line>107</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>266</line>
</file>
</directive>
<directive id="Output.Newline">
<file name="HTMLPurifier/Generator.php">
<line>108</line>
</file>
</directive>
<directive id="HTML.BlockWrapper">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>222</line>
</file>
</directive>
<directive id="HTML.Parent">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>230</line>
</file>
</directive>
<directive id="HTML.AllowedElements">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>247</line>
</file>
</directive>
<directive id="HTML.AllowedAttributes">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>248</line>
</file>
</directive>
<directive id="HTML.Allowed">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>251</line>
</file>
</directive>
<directive id="HTML.ForbiddenElements">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>342</line>
</file>
</directive>
<directive id="HTML.ForbiddenAttributes">
<file name="HTMLPurifier/HTMLDefinition.php">
<line>343</line>
</file>
</directive>
<directive id="HTML.Trusted">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>204</line>
</file>
<file name="HTMLPurifier/Lexer.php">
<line>271</line>
</file>
<file name="HTMLPurifier/HTMLModule/Image.php">
<line>27</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>36</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>23</line>
</file>
</directive>
<directive id="HTML.AllowedModules">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>211</line>
</file>
</directive>
<directive id="HTML.CoreModules">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>212</line>
</file>
</directive>
<directive id="HTML.Proprietary">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>222</line>
</file>
</directive>
<directive id="HTML.SafeObject">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>225</line>
</file>
</directive>
<directive id="HTML.SafeEmbed">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>228</line>
</file>
</directive>
<directive id="HTML.SafeScripting">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>231</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeScripting.php">
<line>17</line>
</file>
</directive>
<directive id="HTML.Nofollow">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>234</line>
</file>
</directive>
<directive id="HTML.TargetBlank">
<file name="HTMLPurifier/HTMLModuleManager.php">
<line>237</line>
</file>
</directive>
<directive id="Attr.IDBlacklist">
<file name="HTMLPurifier/IDAccumulator.php">
<line>26</line>
</file>
</directive>
<directive id="Core.Language">
<file name="HTMLPurifier/LanguageFactory.php">
<line>88</line>
</file>
</directive>
<directive id="Core.LexerImpl">
<file name="HTMLPurifier/Lexer.php">
<line>76</line>
</file>
</directive>
<directive id="Core.MaintainLineNumbers">
<file name="HTMLPurifier/Lexer.php">
<line>80</line>
</file>
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>48</line>
</file>
</directive>
<directive id="Core.ConvertDocumentToFragment">
<file name="HTMLPurifier/Lexer.php">
<line>282</line>
</file>
</directive>
<directive id="Core.RemoveProcessingInstructions">
<file name="HTMLPurifier/Lexer.php">
<line>303</line>
</file>
</directive>
<directive id="URI.">
<file name="HTMLPurifier/URIDefinition.php">
<line>60</line>
</file>
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>12</line>
</file>
</directive>
<directive id="URI.Host">
<file name="HTMLPurifier/URIDefinition.php">
<line>70</line>
</file>
<file name="HTMLPurifier/URIScheme.php">
<line>81</line>
</file>
</directive>
<directive id="URI.Base">
<file name="HTMLPurifier/URIDefinition.php">
<line>71</line>
</file>
</directive>
<directive id="URI.DefaultScheme">
<file name="HTMLPurifier/URIDefinition.php">
<line>78</line>
</file>
</directive>
<directive id="URI.AllowedSchemes">
<file name="HTMLPurifier/URISchemeRegistry.php">
<line>41</line>
</file>
</directive>
<directive id="URI.OverrideAllowedSchemes">
<file name="HTMLPurifier/URISchemeRegistry.php">
<line>42</line>
</file>
</directive>
<directive id="URI.Disable">
<file name="HTMLPurifier/AttrDef/URI.php">
<line>28</line>
</file>
</directive>
<directive id="Core.ColorKeywords">
<file name="HTMLPurifier/AttrDef/CSS/Color.php">
<line>12</line>
</file>
<file name="HTMLPurifier/AttrDef/HTML/Color.php">
<line>12</line>
</file>
</directive>
<directive id="CSS.AllowedFonts">
<file name="HTMLPurifier/AttrDef/CSS/FontFamily.php">
<line>50</line>
</file>
</directive>
<directive id="Attr.AllowedClasses">
<file name="HTMLPurifier/AttrDef/HTML/Class.php">
<line>18</line>
</file>
</directive>
<directive id="Attr.ForbiddenClasses">
<file name="HTMLPurifier/AttrDef/HTML/Class.php">
<line>19</line>
</file>
</directive>
<directive id="Attr.AllowedFrameTargets">
<file name="HTMLPurifier/AttrDef/HTML/FrameTarget.php">
<line>15</line>
</file>
</directive>
<directive id="Attr.EnableID">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>30</line>
</file>
</directive>
<directive id="Attr.IDPrefix">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>36</line>
</file>
</directive>
<directive id="Attr.IDPrefixLocal">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>38</line>
<line>41</line>
</file>
</directive>
<directive id="Attr.IDBlacklistRegexp">
<file name="HTMLPurifier/AttrDef/HTML/ID.php">
<line>64</line>
</file>
</directive>
<directive id="Attr.">
<file name="HTMLPurifier/AttrDef/HTML/LinkTypes.php">
<line>30</line>
</file>
</directive>
<directive id="Core.EnableIDNA">
<file name="HTMLPurifier/AttrDef/URI/Host.php">
<line>67</line>
</file>
</directive>
<directive id="Attr.DefaultTextDir">
<file name="HTMLPurifier/AttrTransform/BdoDir.php">
<line>13</line>
</file>
</directive>
<directive id="Core.RemoveInvalidImg">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>18</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>20</line>
</file>
</directive>
<directive id="Attr.DefaultInvalidImage">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>19</line>
</file>
</directive>
<directive id="Attr.DefaultImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>25</line>
</file>
</directive>
<directive id="Attr.DefaultInvalidImageAlt">
<file name="HTMLPurifier/AttrTransform/ImgRequired.php">
<line>33</line>
</file>
</directive>
<directive id="HTML.Attr.Name.UseCDATA">
<file name="HTMLPurifier/AttrTransform/Name.php">
<line>11</line>
</file>
<file name="HTMLPurifier/HTMLModule/Name.php">
<line>13</line>
</file>
</directive>
<directive id="HTML.FlashAllowFullScreen">
<file name="HTMLPurifier/AttrTransform/SafeParam.php">
<line>38</line>
</file>
</directive>
<directive id="Core.EscapeInvalidChildren">
<file name="HTMLPurifier/ChildDef/Required.php">
<line>62</line>
</file>
</directive>
<directive id="Cache.SerializerPath">
<file name="HTMLPurifier/DefinitionCache/Serializer.php">
<line>91</line>
</file>
</directive>
<directive id="Cache.SerializerPermissions">
<file name="HTMLPurifier/DefinitionCache/Serializer.php">
<line>107</line>
<line>124</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.TidyImpl">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>55</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.Scope">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>79</line>
</file>
</directive>
<directive id="Filter.ExtractStyleBlocks.Escaping">
<file name="HTMLPurifier/Filter/ExtractStyleBlocks.php">
<line>277</line>
</file>
</directive>
<directive id="HTML.SafeIframe">
<file name="HTMLPurifier/HTMLModule/Iframe.php">
<line>17</line>
</file>
<file name="HTMLPurifier/URIFilter/SafeIframe.php">
<line>23</line>
</file>
</directive>
<directive id="HTML.MaxImgLength">
<file name="HTMLPurifier/HTMLModule/Image.php">
<line>14</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeEmbed.php">
<line>13</line>
</file>
<file name="HTMLPurifier/HTMLModule/SafeObject.php">
<line>19</line>
</file>
</directive>
<directive id="HTML.TidyLevel">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>45</line>
</file>
</directive>
<directive id="HTML.TidyAdd">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>49</line>
</file>
</directive>
<directive id="HTML.TidyRemove">
<file name="HTMLPurifier/HTMLModule/Tidy.php">
<line>50</line>
</file>
</directive>
<directive id="AutoFormat.PurifierLinkify.DocURL">
<file name="HTMLPurifier/Injector/PurifierLinkify.php">
<line>15</line>
</file>
</directive>
<directive id="AutoFormat.RemoveEmpty.RemoveNbsp">
<file name="HTMLPurifier/Injector/RemoveEmpty.php">
<line>15</line>
</file>
</directive>
<directive id="AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions">
<file name="HTMLPurifier/Injector/RemoveEmpty.php">
<line>16</line>
</file>
</directive>
<directive id="Core.AggressivelyFixLt">
<file name="HTMLPurifier/Lexer/DOMLex.php">
<line>44</line>
</file>
</directive>
<directive id="Core.DirectLexLineNumberSyncInterval">
<file name="HTMLPurifier/Lexer/DirectLex.php">
<line>70</line>
</file>
</directive>
<directive id="Core.DisableExcludes">
<file name="HTMLPurifier/Strategy/FixNesting.php">
<line>57</line>
</file>
</directive>
<directive id="Core.EscapeInvalidTags">
<file name="HTMLPurifier/Strategy/MakeWellFormed.php">
<line>53</line>
</file>
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>19</line>
</file>
</directive>
<directive id="HTML.AllowedComments">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>24</line>
</file>
</directive>
<directive id="HTML.AllowedCommentsRegexp">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>25</line>
</file>
</directive>
<directive id="Core.RemoveScriptContents">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>28</line>
</file>
</directive>
<directive id="Core.HiddenElements">
<file name="HTMLPurifier/Strategy/RemoveForeignElements.php">
<line>29</line>
</file>
</directive>
<directive id="URI.HostBlacklist">
<file name="HTMLPurifier/URIFilter/HostBlacklist.php">
<line>12</line>
</file>
</directive>
<directive id="URI.MungeResources">
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>14</line>
</file>
</directive>
<directive id="URI.MungeSecretKey">
<file name="HTMLPurifier/URIFilter/Munge.php">
<line>15</line>
</file>
</directive>
<directive id="URI.SafeIframeRegexp">
<file name="HTMLPurifier/URIFilter/SafeIframe.php">
<line>18</line>
</file>
</directive>
</usage>

View File

@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Specification for HTML Purifier's advanced API for defining custom filtering behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Advanced API - HTML Purifier</title>
</head><body>
<h1>Advanced API</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
Please see <a href="enduser-customize.html">Customize!</a>
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,29 @@
Code Quality Issues
Okay, face it. Programmers can get lazy, cut corners, or make mistakes. They
also can do quick prototypes, and then forget to rewrite them later. Well,
while I can't list mistakes in here, I can list prototype-like segments
of code that should be aggressively refactored. This does not list
optimization issues, that needs to be done after intense profiling.
docs/examples/demo.php - ad hoc HTML/PHP soup to the extreme
AttrDef - a lot of duplication, more generic classes need to be created;
a lot of strtolower() calls, no legit casing
Class - doesn't support Unicode characters (fringe); uses regular expressions
Lang - code duplication; premature optimization
Length - easily mistaken for CSSLength
URI - multiple regular expressions; missing validation for parts (?)
CSS - parser doesn't accept advanced CSS (fringe)
Number - constructor interface inconsistent with Integer
Strategy
FixNesting - cannot bubble nodes out of structures, duplicated checks
for special-case parent node
RemoveForeignElements - should be run in parallel with MakeWellFormed
URIScheme - needs to have callable generic checks
mailto - doesn't validate emails, doesn't validate querystring
news - doesn't validate opaque path
nntp - doesn't constrain path
vim: et sw=4 sts=4

View File

@ -0,0 +1,79 @@
Configuration Backwards-Compatibility Breaks
In version 4.0.0, the configuration subsystem (composed of the outwards
facing Config class, as well as the ConfigSchema and ConfigSchema_Interchange
subsystems), was significantly revamped to make use of property lists.
While most of the changes are internal, some internal APIs were changed for the
sake of clarity. HTMLPurifier_Config was kept completely backwards compatible,
although some of the functions were retrofitted with an unambiguous alternate
syntax. Both of these changes are discussed in this document.
1. Outwards Facing Changes
--------------------------------------------------------------------------------
The HTMLPurifier_Config class now takes an alternate syntax. The general rule
is:
If you passed $namespace, $directive, pass "$namespace.$directive"
instead.
An example:
$config->set('HTML', 'Allowed', 'p');
becomes:
$config->set('HTML.Allowed', 'p');
New configuration options may have more than one namespace, they might
look something like %Filter.YouTube.Blacklist. While you could technically
set it with ('HTML', 'YouTube.Blacklist'), the logical extension
('HTML', 'YouTube', 'Blacklist') does not work.
The old API will still work, but will emit E_USER_NOTICEs.
2. Internal API Changes
--------------------------------------------------------------------------------
Some overarching notes: we've completely eliminated the notion of namespace;
it's now an informal construct for organizing related configuration directives.
Also, the validation routines for keys (formerly "$namespace.$directive")
have been completely relaxed. I don't think it really should be necessary.
2.1 HTMLPurifier_ConfigSchema
First off, if you're interfacing with this class, you really shouldn't.
HTMLPurifier_ConfigSchema_Builder_ConfigSchema is really the only class that
should ever be creating HTMLPurifier_ConfigSchema, and HTMLPurifier_Config the
only class that should be reading it.
All namespace related methods were removed; they are completely unnecessary
now. Any $namespace, $name arguments must be replaced with $key (where
$key == "$namespace.$name"), including for addAlias().
The $info and $defaults member variables are no longer indexed as
[$namespace][$name]; they are now indexed as ["$namespace.$name"].
All deprecated methods were finally removed, after having yelled at you as
an E_USER_NOTICE for a while now.
2.2 HTMLPurifier_ConfigSchema_Interchange
Member variable $namespaces was removed.
2.3 HTMLPurifier_ConfigSchema_Interchange_Id
Member variable $namespace and $directive removed; member variable $key added.
Any method that took $namespace, $directive now takes $key.
2.4 HTMLPurifier_ConfigSchema_Interchange_Namespace
Removed.
vim: et sw=4 sts=4

View File

@ -0,0 +1,164 @@
Configuration naming
HTML Purifier 4.0.0 features a new configuration naming system that
allows arbitrary nesting of namespaces. While there are certain cases
in which using two namespaces is obviously better (the canonical example
is where we were using AutoFormatParam to contain directives for AutoFormat
parameters), it is unclear whether or not a general migration to highly
namespaced directives is a good idea or not.
== Case studies ==
=== Attr.* ===
We have a dead duck HTML.Attr.Name.UseCDATA which migrated before we decided
to think this out thoroughly.
We currently have a large number of directives in the Attr.* namespace.
These directives tweak the behavior of some HTML attributes. They have
the properties:
* While they apply to only one attribute at a time, the attribute can
span over multiple elements (not necessarily all attributes, either).
The information of which elements it impacts is either omitted or
informally stated (EnableID applies to all elements, DefaultImageAlt
applies to <img> tags, AllowedRev doesn't say but only applies to a tags).
* There is a certain degree of clustering that could be applied, especially
to the ID directives. The clustering could be done with respect to
what element/attribute was used, i.e.
*.id -> EnableID, IDBlacklistRegexp, IDBlacklist, IDPrefixLocal, IDPrefix
img.src -> DefaultInvalidImage
img.alt -> DefaultImageAlt, DefaultInvalidImageAlt
bdo.dir -> DefaultTextDir
a.rel -> AllowedRel
a.rev -> AllowedRev
a.target -> AllowedFrameTargets
a.name -> Name.UseCDATA
* The directives often reference generic attribute types that were specified
in the DTD/specification. However, some of the behavior specifically relies
on the fact that other use cases of the attribute are not, at current,
supported by HTML Purifier.
AllowedRel, AllowedRev -> heavily <a> specific; if <link> ends up being
allowed, we will also have to give users specificity there (we also
want to preserve generality) DTD %Linktypes, HTML5 distinguishes
between <link> and <a>/<area>
AllowedFrameTargets -> heavily <a> specific, but also used by <area>
and <form>. Transitional DTD %FrameTarget, not present in strict,
HTML5 calls them "browsing contexts"
Default*Image* -> as a default parameter, is almost entirely exlcusive
to <img>
EnableID -> global attribute
Name.UseCDATA -> heavily <a> specific, but has heavy other usage by
many things
== AutoFormat.* ==
These have the fairly normal pluggable architecture that lends itself to
large amounts of namespaces (pluggability may be the key to figuring
out when gratuitous namespacing is good.) Properties:
* Boolean directives are fair game for being namespaced: for example,
RemoveEmpty.RemoveNbsp triggers RemoveEmpty.RemoveNbsp.Exceptions,
the latter of which only makes sense when RemoveEmpty.RemoveNbsp
is set to true. (The same applies to RemoveNbsp too)
The AutoFormat string is a bit long, but is the only bit of repeated
context.
== Core.* ==
Core is the potpourri of directives, mostly regarding some minor behavioral
tweaks for HTML handling abilities.
AggressivelyFixLt
ConvertDocumentToFragment
DirectLexLineNumberSyncInterval
LexerImpl
MaintainLineNumbers
Lexer
CollectErrors
Language
Error handling (Language is ostensibly a little more general, but
it's only used for error handling right now)
ColorKeywords
CSS and HTML
Encoding
EscapeNonASCIICharacters
Character encoding
EscapeInvalidChildren
EscapeInvalidTags
HiddenElements
RemoveInvalidImg
Lexing/Output
RemoveScriptContents
Deprecated
== HTML.* ==
AllowedAttributes
AllowedElements
AllowedModules
Allowed
ForbiddenAttributes
ForbiddenElements
Element set tuning
BlockWrapper
Child def advanced twiddle
CoreModules
CustomDoctype
Advanced HTMLModuleManager twiddles
DefinitionID
DefinitionRev
Caching
Doctype
Parent
Strict
XHTML
Global environment
MaxImgLength
Attribute twiddle? (applies to two attributes)
Proprietary
SafeEmbed
SafeObject
Trusted
Extra functionality/tagsets
TidyAdd
TidyLevel
TidyRemove
Tidy
== Output.* ==
These directly affect the output of Generator. These are all advanced
twiddles.
== URI.* ==
AllowedSchemes
OverrideAllowedSchemes
Scheme tuning
Base
DefaultScheme
Host
Global environment
DefinitionID
DefinitionRev
Caching
DisableExternalResources
DisableExternal
DisableResources
Disable
Contextual/authority tuning
HostBlacklist
Authority tuning
MakeAbsolute
MungeResources
MungeSecretKey
Munge
Transformation behavior (munge can be grouped)

View File

@ -0,0 +1,412 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Describes config schema framework in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Config Schema - HTML Purifier</title>
</head>
<body>
<h1>Config Schema</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
HTML Purifier has a fairly complex system for configuration. Users
interact with a <code>HTMLPurifier_Config</code> object to
set configuration directives. The values they set are validated according
to a configuration schema, <code>HTMLPurifier_ConfigSchema</code>.
</p>
<p>
The schema is mostly transparent to end-users, but if you're doing development
work for HTML Purifier and need to define a new configuration directive,
you'll need to interact with it. We'll also talk about how to define
userspace configuration directives at the very end.
</p>
<h2>Write a directive file</h2>
<p>
Directive files define configuration directives to be used by
HTML Purifier. They are placed in <code>library/HTMLPurifier/ConfigSchema/schema/</code>
in the form <code><em>Namespace</em>.<em>Directive</em>.txt</code> (I
couldn't think of a more descriptive file extension.)
Directive files are actually what we call <code>StringHash</code>es,
i.e. associative arrays represented in a string form reminiscent of
<a href="http://qa.php.net/write-test.php">PHPT</a> tests. Here's a
sample directive file, <code>Test.Sample.txt</code>:
</p>
<pre>Test.Sample
TYPE: string/null
DEFAULT: NULL
ALLOWED: 'foo', 'bar'
VALUE-ALIASES: 'baz' => 'bar'
VERSION: 3.1.0
--DESCRIPTION--
This is a sample configuration directive for the purposes of the
&lt;code&gt;dev-config-schema.html&lt;code&gt; documentation.
--ALIASES--
Test.Example</pre>
<p>
Each of these segments has a specific meaning:
</p>
<table class="table">
<thead>
<tr>
<th>Key</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ID</td>
<td>Test.Sample</td>
<td>The name of the directive, in the form Namespace.Directive
(implicitly the first line)</td>
</tr>
<tr>
<td>TYPE</td>
<td>string/null</td>
<td>The type of variable this directive accepts. See below for
details. You can also add <code>/null</code> to the end of
any basic type to allow null values too.</td>
</tr>
<tr>
<td>DEFAULT</td>
<td>NULL</td>
<td>A parseable PHP expression of the default value.</td>
</tr>
<tr>
<td>DESCRIPTION</td>
<td>This is a...</td>
<td>An HTML description of what this directive does.</td>
</tr>
<tr>
<td>VERSION</td>
<td>3.1.0</td>
<td><em>Recommended</em>. The version of HTML Purifier this directive was added.
Directives that have been around since 1.0.0 don't have this,
but any new ones should.</td>
</tr>
<tr>
<td>ALIASES</td>
<td>Test.Example</td>
<td><em>Optional</em>. A comma separated list of aliases for this directive.
This is most useful for backwards compatibility and should
not be used otherwise.</td>
</tr>
<tr>
<td>ALLOWED</td>
<td>'foo', 'bar'</td>
<td><em>Optional</em>. Set of allowed value for a directive,
a comma separated list of parseable PHP expressions. This
is only allowed string, istring, text and itext TYPEs.</td>
</tr>
<tr>
<td>VALUE-ALIASES</td>
<td>'baz' =&gt; 'bar'</td>
<td><em>Optional</em>. Mapping of one value to another, and
should be a comma separated list of keypair duples. This
is only allowed string, istring, text and itext TYPEs.</td>
</tr>
<tr>
<td>DEPRECATED-VERSION</td>
<td>3.1.0</td>
<td><em>Not shown</em>. Indicates that the directive was
deprecated this version.</td>
</tr>
<tr>
<td>DEPRECATED-USE</td>
<td>Test.NewDirective</td>
<td><em>Not shown</em>. Indicates what new directive should be
used instead. Note that the directives will functionally be
different, although they should offer the same functionality.
If they are identical, use an alias instead.</td>
</tr>
<tr>
<td>EXTERNAL</td>
<td>CSSTidy</td>
<td><em>Not shown</em>. Indicates if there is an external library
the user will need to download and install to use this configuration
directive. As of right now, this is merely a Google-able name; future
versions may also provide links and instructions.</td>
</tr>
</tbody>
</table>
<p>
Some notes on format and style:
</p>
<ul>
<li>
Each of these keys can be expressed in the short format
(<code>KEY: Value</code>) or the long format
(<code>--KEY--</code> with value beneath). You must use the
long format if multiple lines are needed, or if a long format
has been used already (that's why <code>ALIASES</code> in our
example is in the long format); otherwise, it's user preference.
</li>
<li>
The HTML descriptions should be wrapped at about 80 columns; do
not rely on editor word-wrapping.
</li>
</ul>
<p>
Also, as promised, here is the set of possible types:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>string</td>
<td>'Foo'</td>
<td><a href="http://docs.php.net/manual/en/language.types.string.php">String</a> without newlines</td>
</tr>
<tr>
<td>istring</td>
<td>'foo'</td>
<td>Case insensitive ASCII string without newlines</td>
</tr>
<tr>
<td>text</td>
<td>"A<em>\n</em>b"</td>
<td>String with newlines</td>
</tr>
<tr>
<td>itext</td>
<td>"a<em>\n</em>b"</td>
<td>Case insensitive ASCII string without newlines</td>
</tr>
<tr>
<td>int</td>
<td>23</td>
<td>Integer</td>
</tr>
<tr>
<td>float</td>
<td>3.0</td>
<td>Floating point number</td>
</tr>
<tr>
<td>bool</td>
<td>true</td>
<td>Boolean</td>
</tr>
<tr>
<td>lookup</td>
<td>array('key' =&gt; true)</td>
<td>Lookup array, used with <code>isset($var[$key])</code></td>
</tr>
<tr>
<td>list</td>
<td>array('f', 'b')</td>
<td>List array, with ordered numerical indexes</td>
</tr>
<tr>
<td>hash</td>
<td>array('key' =&gt; 'val')</td>
<td>Associative array of keys to values</td>
</tr>
<tr>
<td>mixed</td>
<td>new stdclass</td>
<td>Any PHP variable is fine</td>
</tr>
</tbody>
</table>
<p>
The examples represent what will be returned out of the configuration
object; users have a little bit of leeway when setting configuration
values (for example, a lookup value can be specified as a list;
HTML Purifier will flip it as necessary.) These types are defined
in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/VarParser.php">
library/HTMLPurifier/VarParser.php</a>.
</p>
<p>
For more information on what values are allowed, and how they are parsed,
consult <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>, as well
as <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Interchange/Directive.php">
library/HTMLPurifier/ConfigSchema/Interchange/Directive.php</a> for
the semantics of the parsed values.
</p>
<h2>Refreshing the cache</h2>
<p>
You may have noticed that your directive file isn't doing anything
yet. That's because it hasn't been added to the runtime
<code>HTMLPurifier_ConfigSchema</code> instance. Run
<code>maintenance/generate-schema-cache.php</code> to fix this.
If there were no errors, you're good to go! Don't forget to add
some unit tests for your functionality!
</p>
<p>
If you ever make changes to your configuration directives, you
will need to run this script again.
</p>
<h2>Adding in-house schema definitions</h2>
<p>
Placing stuff directly in HTML Purifier's source tree is generally not a
good idea, so HTML Purifier 4.0.0+ has some facilities in place to make your
life easier.
</p>
<p>
The first is to pass an extra parameter to <code>maintenance/generate-schema-cache.php</code>
with the location of your directory (relative or absolute path will do). For example,
if I'm storing my custom definitions in <em>/var/htmlpurifier/myschema</em>, run:
<code>php maintenance/generate-schema-cache.php /var/htmlpurifier/myschema</code>.
</p>
<p>
Alternatively, you can create a small loader PHP file in the HTML Purifier base
directory named <code>config-schema.php</code> (this is the same directory
you would place a <code>test-settings.php</code> file). In this file, add
the following line for each directory you want to load:
</p>
<pre>$builder-&gt;buildDir($interchange, '/var/htmlpurifier/myschema');</pre>
<p>You can even load a single file using:</p>
<pre>$builder-&gt;buildFile($interchange, '/var/htmlpurifier/myschema/MyApp.Directive.txt');</pre>
<p>Storing custom definitions that you don't plan on sending back upstream in
a separate directory is <em>definitely</em> a good idea! Additionally, picking
a good namespace can go a long way to saving you grief if you want to use
someone else's change, but they picked the same name, or if HTML Purifier
decides to add support for a configuration directive that has the same name.</p>
<!-- TODO: how to name directives that rely on naming conventions -->
<h2>Errors</h2>
<p>
All directive files go through a rigorous validation process
through <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/Validator.php">
library/HTMLPurifier/ConfigSchema/Validator.php</a>, as well
as some basic checks during building. While
listing every error out here is out-of-scope for this document, we
can give some general tips for interpreting error messages.
There are two types of errors: builder errors and validation errors.
</p>
<h3>Builder errors</h3>
<blockquote>
<p>
<strong>Exception:</strong> Expected type string, got
integer in DEFAULT in directive hash 'Ns.Dir'
</p>
</blockquote>
<p>
You can identify a builder error by the keyword "directive hash."
These are the easiest to deal with, because they directly correspond
with your directive file. Find the offending directive file (which
is the directive hash plus the .txt extension), find the
offending index ("in DEFAULT" means the DEFAULT key) and fix the error.
This particular error would occur if your default value is not the same
type as TYPE.
</p>
<h3>Validation errors</h3>
<blockquote>
<p>
<strong>Exception:</strong> Alias 3 in valueAliases in directive
'Ns.Dir' must be a string
</p>
</blockquote>
<p>
These are a little trickier, because we're not actually validating
your directive file, or even the direct string hash representation.
We're validating an Interchange object, and the error messages do
not mention any string hash keys.
</p>
<p>
Nevertheless, it's not difficult to figure out what went wrong.
Read the "context" statements in reverse:
</p>
<dl>
<dt>in directive 'Ns.Dir'</dt>
<dd>This means we need to look at the directive file <code>Ns.Dir.txt</code></dd>
<dt>in valueAliases</dt>
<dd>There's no key actually called this, but there's one that's close:
VALUE-ALIASES. Indeed, that's where to look.</dd>
<dt>Alias 3</dt>
<dd>The value alias that is equal to 3 is the culprit.</dd>
</dl>
<p>
In this particular case, you're not allowed to alias integers values to
strings values.
</p>
<p>
The most difficult part is translating the Interchange member variable (valueAliases)
into a directive file key (VALUE-ALIASES), but there's a one-to-one
correspondence currently. If the two formats diverge, any discrepancies
will be described in <a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php">
library/HTMLPurifier/ConfigSchema/InterchangeBuilder.php</a>.
</p>
<h2>Internals</h2>
<p>
Much of the configuration schema framework's codebase deals with
shuffling data from one format to another, and doing validation on this
data.
The keystone of all of this is the <code>HTMLPurifier_ConfigSchema_Interchange</code>
class, which represents the purest, parsed representation of the schema.
</p>
<p>
Hand-writing this data is unwieldy, however, so we write directive files.
These directive files are parsed by <code>HTMLPurifier_StringHashParser</code>
into <code>HTMLPurifier_StringHash</code>es, which then
are run through <code>HTMLPurifier_ConfigSchema_InterchangeBuilder</code>
to construct the interchange object.
</p>
<p>
From the interchange object, the data can be siphoned into other forms
using <code>HTMLPurifier_ConfigSchema_Builder</code> subclasses.
For example, <code>HTMLPurifier_ConfigSchema_Builder_ConfigSchema</code>
generates a runtime <code>HTMLPurifier_ConfigSchema</code> object,
which <code>HTMLPurifier_Config</code> uses to validate its incoming
data. There is also an XML serializer, which is used to build documentation.
</p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,68 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Discusses when to flush HTML Purifier's various caches." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Flushing the Purifier - HTML Purifier</title>
</head>
<body>
<h1>Flushing the Purifier</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
If you've been poking around the various folders in HTML Purifier,
you may have noticed the <code>maintenance</code> directory. Almost
all of these scripts are devoted to flushing out the various caches
HTML Purifier uses. Normal users don't have to worry about this:
regular library usage is transparent. However, when doing development
work on HTML Purifier, you may find you have to flush one of the
caches.
</p>
<p>
As a general rule of thumb, run <code>flush.php</code> whenever you make
any <em>major</em> changes, or when tests start mysteriously failing.
In more detail, run this script if:
</p>
<ul>
<li>
You added new source files to HTML Purifier's main library.
(see <code>generate-includes.php</code>)
</li>
<li>
You modified the configuration schema (see
<code>generate-schema-cache.php</code>). This usually means
adding or modifying files in <code>HTMLPurifier/ConfigSchema/schema/</code>,
although in rare cases modifying <code>HTMLPurifier/ConfigSchema.php</code>
will also require this.
</li>
<li>
You modified a Definition, or its subsystems. The most usual candidate
is <code>HTMLPurifier/HTMLDefinition.php</code>, which also encompasses
the files in <code>HTMLPurifier/HTMLModule/</code> as well as if you've
<a href="enduser-customize.html">customizing definitions</a> without
the cache disabled. (see <code>flush-generation-cache.php</code>)
</li>
<li>
You modified source files, and have been using the standalone
version from the full installation. (see <code>generate-standalone.php</code>)
</li>
</ul>
<p>
You can check out the corresponding scripts for more information on what they
do.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,281 @@
INCLUDES, AUTOLOAD, BYTECODE CACHES and OPTIMIZATION
The Problem
-----------
HTML Purifier contains a number of extra components that are not used all
of the time, only if the user explicitly specifies that we should use
them.
Some of these optional components are optionally included (Filter,
Language, Lexer, Printer), while others are included all the time
(Injector, URIFilter, HTMLModule, URIScheme). We will stipulate that these
are all developer specified: it is conceivable that certain Tokens are not
used, but this is user-dependent and should not be trusted.
We should come up with a consistent way to handle these things and ensure
that we get the maximum performance when there is bytecode caches and
when there are not. Unfortunately, these two goals seem contrary to each
other.
A peripheral issue is the performance of ConfigSchema, which has been
shown take a large, constant amount of initialization time, and is
intricately linked to the issue of includes due to its pervasive use
in our plugin architecture.
Pros and Cons
-------------
We will assume that user-based extensions will be included by them.
Conditional includes:
Pros:
- User management is simplified; only a single directive needs to be set
- Only necessary code is included
Cons:
- Doesn't play nicely with opcode caches
- Adds complexity to standalone version
- Optional configuration directives are not exposed without a little
extra coaxing (not implemented yet)
Include it all:
Pros:
- User management is still simple
- Plays nicely with opcode caches and standalone version
- All configuration directives are present
Cons:
- Lots of (how much?) extra code is included
- Classes that inherit from external libraries will cause compile
errors
Build an include stub (Let's do this!):
Pros:
- Only necessary code is included
- Plays nicely with opcode caches and standalone version
- require (without once) can be used, see above
- Could further extend as a compilation to one file
Cons:
- Not implemented yet
- Requires user intervention and use of a command line script
- Standalone script must be chained to this
- More complex and compiled-language-like
- Requires a whole new class of system-wide configuration directives,
as configuration objects can be reused
- Determining what needs to be included can be complex (see above)
- No way of autodetecting dynamically instantiated classes
- Might be slow
Include stubs
-------------
This solution may be "just right" for users who are heavily oriented
towards performance. However, there are a number of picky implementation
details to work out beforehand.
The number one concern is how to make the HTML Purifier files "work
out of the box", while still being able to easily get them into a form
that works with this setup. As the codebase stands right now, it would
be necessary to strip out all of the require_once calls. The only way
we could get rid of the require_once calls is to use __autoload or
use the stub for all cases (which might not be a bad idea).
Aside
-----
An important thing to remember, however, is that these require_once's
are valuable data about what classes a file needs. Unfortunately, there's
no distinction between whether or not the file is needed all the time,
or whether or not it is one of our "optional" files. Thus, it is
effectively useless.
Deprecated
----------
One of the things I'd like to do is have the code search for any classes
that are explicitly mentioned in the code. If a class isn't mentioned, I
get to assume that it is "optional," i.e. included via introspection.
The choice is either to use PHP's tokenizer or use regexps; regexps would
be faster but a tokenizer would be more correct. If this ends up being
unfeasible, adding dependency comments isn't a bad idea. (This could
even be done automatically by search/replacing require_once, although
we'd have to manually inspect the results for the optional requires.)
NOTE: This ends up not being necessary, as we're going to make the user
figure out all the extra classes they need, and only include the core
which is predetermined.
Using the autoload framework with include stubs works nicely with
introspective classes: instead of having to have require_once inside
the function, we can let autoload do the work; we simply need to
new $class or accept the object straight from the caller. Handling filters
becomes a simple matter of ticking off configuration directives, and
if ConfigSchema spits out errors, adding the necessary includes. We could
also use the autoload framework as a fallback, in case the user forgets
to make the include, but doesn't really care about performance.
Insight
-------
All of this talk is merely a natural extension of what our current
standalone functionality does. However, instead of having our code
perform the includes, or attempting to inline everything that possibly
could be used, we boot the issue to the user, making them include
everything or setup the fallback autoload handler.
Configuration Schema
--------------------
A common deficiency for all of the conditional include setups (including
the dynamically built include PHP stub) is that if one of this
conditionally included files includes a configuration directive, it
is not accessible to configdoc. A stopgap solution for this problem is
to have it piggy-back off of the data in the merge-library.php script
to figure out what extra files it needs to include, but if the file also
inherits classes that don't exist, we're in big trouble.
I think it's high time we centralized the configuration documentation.
However, the type checking has been a great boon for the library, and
I'd like to keep that. The compromise is to use some other source, and
then parse it into the ConfigSchema internal format (sans all of those
nasty documentation strings which we really don't need at runtime) and
serialize that for future use.
The next question is that of format. XML is very verbose, and the prospect
of setting defaults in it gives me willies. However, this may be necessary.
Splitting up the file into manageable chunks may alleviate this trouble,
and we may be even want to create our own format optimized for specifying
configuration. It might look like (based off the PHPT format, which is
nicely compact yet unambiguous and human-readable):
Core.HiddenElements
TYPE: lookup
DEFAULT: array('script', 'style') // auto-converted during processing
--ALIASES--
Core.InvisibleElements, Core.StupidElements
--DESCRIPTION--
<p>
Blah blah
</p>
The first line is the directive name, the lines after that prior to the
first --HEADER-- block are single-line values, and then after that
the multiline values are there. No value is restricted to a particular
format: DEFAULT could very well be multiline if that would be easier.
This would make it insanely easy, also, to add arbitrary extra parameters,
like:
VERSION: 3.0.0
ALLOWED: 'none', 'light', 'medium', 'heavy' // this is wrapped in array()
EXTERNAL: CSSTidy // this would be documented somewhere else with a URL
The final loss would be that you wouldn't know what file the directive
was used in; with some clever regexps it should be possible to
figure out where $config->get($ns, $d); occurs. Reflective calls to
the configuration object is mitigated by the fact that getBatch is
used, so we can simply talk about that in the namespace definition page.
This might be slow, but it would only happen when we are creating
the documentation for consumption, and is sugar.
We can put this in a schema/ directory, outside of HTML Purifier. The serialized
data gets treated like entities.ser.
The final thing that needs to be handled is user defined configurations.
They can be added at runtime using ConfigSchema::registerDirectory()
which globs the directory and grabs all of the directives to be incorporated
in. Then, the result is saved. We may want to take advantage of the
DefinitionCache framework, although it is not altogether certain what
configuration directives would be used to generate our key (meta-directives!)
Further thoughts
----------------
Our master configuration schema will only need to be updated once
every new version, so it's easily versionable. User specified
schema files are far more volatile, but it's far too expensive
to check the filemtimes of all the files, so a DefinitionRev style
mechanism works better. However, we can uniquely identify the
schema based on the directories they loaded, so there's no need
for a DefinitionId until we give them full programmatic control.
These variables should be directly incorporated into ConfigSchema,
and ConfigSchema should handle serialization. Some refactoring will be
necessary for the DefinitionCache classes, as they are built with
Config in mind. If the user changes something, the cache file gets
rebuilt. If the version changes, the cache file gets rebuilt. Since
our unit tests flush the caches before we start, and the operation is
pretty fast, this will not negatively impact unit testing.
One last thing: certain configuration directives require that files
get added. They may even be specified dynamically. It is not a good idea
for the HTMLPurifier_Config object to be used directly for such matters.
Instead, the userland code should explicitly perform the includes. We may
put in something like:
REQUIRES: HTMLPurifier_Filter_ExtractStyleBlocks
To indicate that if that class doesn't exist, and the user is attempting
to use the directive, we should fatally error out. The stub includes the core files,
and the user includes everything else. Any reflective things like new
$class would be required to tie in with the configuration.
It would work very well with rarely used configuration options, but it
wouldn't be so good for "core" parts that can be disabled. In such cases
the core include file would need to be modified, and the only way
to properly do this is use the configuration object. Once again, our
ability to create cache keys saves the day again: we can create arbitrary
stub files for arbitrary configurations and include those. They could
even be the single file affairs. The only thing we'd need to include,
then, would be HTMLPurifier_Config! Then, the configuration object would
load the library.
An aside...
-----------
One questions, however, the wisdom of letting PHP files write other PHP
files. It seems like a recipe for disaster, or at least lots of headaches
in highly secured setups, where PHP does not have the ability to write
to its root. In such cases, we could use sticky bits or tell the user
to manually generate the file.
The other troublesome bit is actually doing the calculations necessary.
For certain cases, it's simple (such as URIScheme), but for AttrDef
and HTMLModule the dependency trees are very complex in relation to
%HTML.Allowed and friends. I think that this idea should be shelved
and looked at a later, less insane date.
An interesting dilemma presents itself when a configuration form is offered
to the user. Normally, the configuration object is not accessible without
editing PHP code; this facility changes thing. The sensible thing to do
is stipulate that all classes required by the directives you allow must
be included.
Unit testing
------------
Setting up the parsing and translation into our existing format would not
be difficult to do. It might represent a good time for us to rethink our
tests for these facilities; as creative as they are, they are often hacky
and require public visibility for things that ought to be protected.
This is especially applicable for our DefinitionCache tests.
Migration
---------
Because we are not *adding* anything essentially new, it should be trivial
to write a script to take our existing data and dump it into the new format.
Well, not trivial, but fairly easy to accomplish. Primary implementation
difficulties would probably involve formatting the file nicely.
Backwards-compatibility
-----------------------
I expect that the ConfigSchema methods should stick around for a little bit,
but display E_USER_NOTICE warnings that they are deprecated. This will
require documentation!
New stuff
---------
VERSION: Version number directive was introduced
DEPRECATED-VERSION: If the directive was deprecated, when was it deprecated?
DEPRECATED-USE: If the directive was deprecated, what should the user use now?
REQUIRES: What classes does this configuration directive require, but are
not part of the HTML Purifier core?
vim: et sw=4 sts=4

View File

@ -0,0 +1,83 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Defines class naming conventions in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Naming Conventions - HTML Purifier</title>
</head><body>
<h1>Naming Conventions</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>The classes in this library follow a few naming conventions, which may
help you find the correct functionality more quickly. Here they are:</p>
<dl>
<dt>All classes occupy the HTMLPurifier pseudo-namespace.</dt>
<dd>This means that all classes are prefixed with HTMLPurifier_. As such, all
names under HTMLPurifier_ are reserved. I recommend that you use the name
HTMLPurifierX_YourName_ClassName, especially if you want to take advantage
of HTMLPurifier_ConfigDef.</dd>
<dt>All classes correspond to their path if library/ was in the include path</dt>
<dd>HTMLPurifier_AttrDef is located at HTMLPurifier/AttrDef.php; replace
underscores with slashes and append .php and you'll have the location of
the class.</dd>
<dt>Harness and Test are reserved class names for unit tests</dt>
<dd>The suffix <code>Test</code> indicates that the class is a subclass of UnitTestCase
(of the Simpletest library) and is testable. "Harness" indicates a subclass
of UnitTestCase that is not meant to be run but to be extended into
concrete test cases and contains custom test methods (i.e. assert*())</dd>
<dt>Class names do not necessarily represent inheritance hierarchies</dt>
<dd>While we try to reflect inheritance in naming to some extent, it is not
guaranteed (for instance, none of the classes inherit from HTMLPurifier,
the base class). However, all class files have the require_once
declarations to whichever classes they are tightly coupled to.</dd>
<dt>Strategy has a meaning different from the Gang of Four pattern</dt>
<dd>In Design Patterns, the Gang of Four describes a Strategy object as
encapsulating an algorithm so that they can be switched at run-time. While
our strategies are indeed algorithms, they are not meant to be substituted:
all must be present in order for proper functioning.</dd>
<dt>Abbreviations are avoided</dt>
<dd>We try to avoid abbreviations as much as possible, but in some cases,
abbreviated version is more readable than the full version. Here, we
list common abbreviations:
<ul>
<li>Attr to Attributes (note that it is plural, i.e. <code>$attr = array()</code>)</li>
<li>Def to Definition</li>
<li><code>$ret</code> is the value to be returned in a function</li>
</ul>
</dd>
<dt>Ambiguity concerning the definition of Def/Definition</dt>
<dd>While a definition normally defines the structure/acceptable values of
an entity, most of the definitions in this application also attempt
to validate and fix the value. I am unsure of a better name, as
"Validator" would exclude fixing the value, "Fixer" doesn't invoke
the proper image of "fixing" something, and "ValidatorFixer" is too long!
Some other suggestions were "Handler", "Reference", "Check", "Fix",
"Repair" and "Heal".</dd>
<dt>Transform not Transformer</dt>
<dd>Transform is both a noun and a verb, and thus we define a "Transform" as
something that "transforms," leaving "Transformer" (which sounds like an
electrical device/robot toy).</dd>
</dl>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,33 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Discusses possible methods of optimizing HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Optimization - HTML Purifier</title>
</head><body>
<h1>Optimization</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Here are some possible optimization techniques we can apply to code sections if
they turn out to be slow. Be sure not to prematurely optimize: if you get
that itch, put it here!</p>
<ul>
<li>Make Tokens Flyweights (may prove problematic, probably not worth it)</li>
<li>Rewrite regexps into PHP code</li>
<li>Batch regexp validation (do as many per function call as possible)</li>
<li>Parallelize strategies</li>
</ul>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,309 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tables detailing HTML element and CSS property implementation coverage in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Implementation Progress - HTML Purifier</title>
<style type="text/css">
td {padding-right:1em;border-bottom:1px solid #000;padding-left:0.5em;}
th {text-align:left;padding-top:1.4em;font-size:13pt;
border-bottom:2px solid #000;background:#FFF;}
thead th {text-align:left;padding:0.1em;background-color:#EEE;}
.impl-yes {background:#9D9;}
.impl-partial {background:#FFA;}
.impl-no {background:#CCC;}
.danger {color:#600;}
.css1 {color:#060;}
.required {font-weight:bold;}
.feature {color:#999;}
</style>
</head><body>
<h1>Implementation Progress</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
<strong>Warning:</strong> This table is kept for historical purposes and
is not being actively updated.
</p>
<h2>Key</h2>
<table cellspacing="0"><tbody>
<tr><td class="impl-yes">Implemented</td></tr>
<tr><td class="impl-partial">Partially implemented</td></tr>
<tr><td class="impl-no">Not priority to implement</td></tr>
<tr><td class="danger">Dangerous attribute/property</td></tr>
<tr><td class="css1">Present in CSS1</td></tr>
<tr><td class="feature">Feature, requires extra work</td></tr>
</tbody></table>
<h2>CSS</h2>
<table cellspacing="0">
<thead>
<tr><th>Name</th><th>Notes</th></tr>
</thead>
<!--
<tr><td>-</td><td>-</td></tr>
-->
<tbody>
<tr><th colspan="2">Standard</th></tr>
<tr class="css1 impl-yes"><td>background-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
<tr class="css1 impl-yes"><td>background</td><td>SHORTHAND, currently alias for background-color</td></tr>
<tr class="css1 impl-yes"><td>border</td><td>SHORTHAND, MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-color</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-style</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-width</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>border-*</td><td>SHORTHAND</td></tr>
<tr class="impl-yes"><td>border-*-color</td><td>COMPOSITE(&lt;color&gt;, transparent)</td></tr>
<tr class="impl-yes"><td>border-*-style</td><td>ENUM(none, hidden, dotted, dashed,
solid, double, groove, ridge, inset, outset)</td></tr>
<tr class="css1 impl-yes"><td>border-*-width</td><td>COMPOSITE(&lt;length&gt;, thin, medium, thick)</td></tr>
<tr class="css1 impl-yes"><td>clear</td><td>ENUM(none, left, right, both)</td></tr>
<tr class="css1 impl-yes"><td>color</td><td>&lt;color&gt;</td></tr>
<tr class="css1 impl-yes"><td>float</td><td>ENUM(left, right, none), May require layout
precautions with clear</td></tr>
<tr class="css1 impl-yes"><td>font</td><td>SHORTHAND</td></tr>
<tr class="css1 impl-yes"><td>font-family</td><td>CSS validator may complain if fallback font
family not specified</td></tr>
<tr class="css1 impl-yes"><td>font-size</td><td>COMPOSITE(&lt;absolute-size&gt;,
&lt;relative-size&gt;, &lt;length&gt;, &lt;percentage&gt;)</td></tr>
<tr class="css1 impl-yes"><td>font-style</td><td>ENUM(normal, italic, oblique)</td></tr>
<tr class="css1 impl-yes"><td>font-variant</td><td>ENUM(normal, small-caps)</td></tr>
<tr class="css1 impl-yes"><td>font-weight</td><td>ENUM(normal, bold, bolder, lighter,
100, 200, 300, 400, 500, 600, 700, 800, 900), maybe special code for
in-between integers</td></tr>
<tr class="css1 impl-yes"><td>letter-spacing</td><td>COMPOSITE(&lt;length&gt;, normal)</td></tr>
<tr class="css1 impl-yes"><td>line-height</td><td>COMPOSITE(&lt;number&gt;,
&lt;length&gt;, &lt;percentage&gt;, normal)</td></tr>
<tr class="css1 impl-yes"><td>list-style-position</td><td>ENUM(inside, outside),
Strange behavior in browsers</td></tr>
<tr class="css1 impl-yes"><td>list-style-type</td><td>ENUM(...),
Well-supported values are: disc, circle, square,
decimal, lower-roman, upper-roman, lower-alpha and upper-alpha. See also
CSS 3. Mostly IE lack of support.</td></tr>
<tr class="css1 impl-yes"><td>list-style</td><td>SHORTHAND</td></tr>
<tr class="css1 impl-yes"><td>margin</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>margin-*</td><td>COMPOSITE(&lt;length&gt;,
&lt;percentage&gt;, auto)</td></tr>
<tr class="css1 impl-yes"><td>padding</td><td>MULTIPLE</td></tr>
<tr class="css1 impl-yes"><td>padding-*</td><td>COMPOSITE(&lt;length&gt;(positive),
&lt;percentage&gt;(positive))</td></tr>
<tr class="css1 impl-yes"><td>text-align</td><td>ENUM(left, right,
center, justify)</td></tr>
<tr class="css1 impl-yes"><td>text-decoration</td><td>No blink (argh my eyes), not
enum, can be combined (composite sorta): underline, overline,
line-through</td></tr>
<tr class="css1 impl-yes"><td>text-indent</td><td>COMPOSITE(&lt;length&gt;,
&lt;percentage&gt;)</td></tr>
<tr class="css1 impl-yes"><td>text-transform</td><td>ENUM(capitalize, uppercase,
lowercase, none)</td></tr>
<tr class="css1 impl-yes"><td>width</td><td>COMPOSITE(&lt;length&gt;,
&lt;percentage&gt;, auto), Interesting</td></tr>
<tr class="css1 impl-yes"><td>word-spacing</td><td>COMPOSITE(&lt;length&gt;, auto),
IE 5 no support</td></tr>
</tbody>
<tbody>
<tr><th colspan="2">Table</th></tr>
<tr class="impl-yes"><td>border-collapse</td><td>ENUM(collapse, seperate)</td></tr>
<tr class="impl-yes"><td>border-space</td><td>MULTIPLE</td></tr>
<tr class="impl-yes"><td>caption-side</td><td>ENUM(top, bottom)</td></tr>
<tr class="feature"><td>empty-cells</td><td>ENUM(show, hide), No IE support makes this useless,
possible fix with &amp;nbsp;? Unknown release milestone.</td></tr>
<tr class="impl-yes"><td>table-layout</td><td>ENUM(auto, fixed)</td></tr>
<tr class="impl-yes css1"><td>vertical-align</td><td>COMPOSITE(ENUM(baseline, sub,
super, top, text-top, middle, bottom, text-bottom), &lt;percentage&gt;,
&lt;length&gt;) Also applies to others with explicit height</td></tr>
</tbody>
<tbody>
<tr><th colspan="2">Absolute positioning, unknown release milestone</th></tr>
<tr class="danger impl-no"><td>bottom</td><td rowspan="4">Dangerous, must be non-negative to even be considered,
but it's still possible to arbitrarily position by running over.</td></tr>
<tr class="danger impl-no"><td>left</td></tr>
<tr class="danger impl-no"><td>right</td></tr>
<tr class="danger impl-no"><td>top</td></tr>
<tr class="impl-no"><td>clip</td><td>-</td></tr>
<tr class="danger impl-no"><td>position</td><td>ENUM(static, relative, absolute, fixed)
relative not absolute?</td></tr>
<tr class="danger impl-no"><td>z-index</td><td>Dangerous</td></tr>
</tbody>
<tbody>
<tr><th colspan="2">Unknown</th></tr>
<tr class="danger css1 impl-yes"><td>background-image</td><td>Dangerous</td></tr>
<tr class="css1 impl-yes"><td>background-attachment</td><td>ENUM(scroll, fixed),
Depends on background-image</td></tr>
<tr class="css1 impl-yes"><td>background-position</td><td>Depends on background-image</td></tr>
<tr class="danger impl-no"><td>cursor</td><td>Dangerous but fluffy</td></tr>
<tr class="danger impl-yes"><td>display</td><td>ENUM(...), Dangerous but interesting;
will not implement list-item, run-in (Opera only) or table (no IE);
inline-block has incomplete IE6 support and requires -moz-inline-box
for Mozilla. Unknown target milestone.</td></tr>
<tr class="css1 impl-yes"><td>height</td><td>Interesting, why use it? Unknown target milestone.</td></tr>
<tr class="danger css1 impl-yes"><td>list-style-image</td><td>Dangerous?</td></tr>
<tr class="impl-no"><td>max-height</td><td rowspan="4">No IE 5/6</td></tr>
<tr class="impl-no"><td>min-height</td></tr>
<tr class="impl-no"><td>max-width</td></tr>
<tr class="impl-no"><td>min-width</td></tr>
<tr class="impl-no"><td>orphans</td><td>No IE support</td></tr>
<tr class="impl-no"><td>widows</td><td>No IE support</td></tr>
<tr><td>overflow</td><td>ENUM, IE 5/6 almost (remove visible if set). Unknown target milestone.</td></tr>
<tr><td>page-break-after</td><td>ENUM(auto, always, avoid, left, right),
IE 5.5/6 and Opera. Unknown target milestone.</td></tr>
<tr><td>page-break-before</td><td>ENUM(auto, always, avoid, left, right),
Mostly supported. Unknown target milestone.</td></tr>
<tr><td>page-break-inside</td><td>ENUM(avoid, auto), Opera only. Unknown target milestone.</td></tr>
<tr class="impl-no"><td>quotes</td><td>May be dropped from CSS2, fairly useless for inline context</td></tr>
<tr class="danger impl-yes"><td>visibility</td><td>ENUM(visible, hidden, collapse),
Dangerous</td></tr>
<tr class="css1 feature impl-partial"><td>white-space</td><td>ENUM(normal, pre, nowrap, pre-wrap,
pre-line), Spotty implementation:
pre (no IE 5/6), <em>nowrap</em> (no IE 5, supported),
pre-wrap (only Opera), pre-line (no support). Fixable? Unknown target milestone.</td></tr>
</tbody>
<tbody class="impl-no">
<tr><th colspan="2">Aural</th></tr>
<tr><td>azimuth</td><td>-</td></tr>
<tr><td>cue</td><td>-</td></tr>
<tr><td>cue-after</td><td>-</td></tr>
<tr><td>cue-before</td><td>-</td></tr>
<tr><td>elevation</td><td>-</td></tr>
<tr><td>pause-after</td><td>-</td></tr>
<tr><td>pause-before</td><td>-</td></tr>
<tr><td>pause</td><td>-</td></tr>
<tr><td>pitch-range</td><td>-</td></tr>
<tr><td>pitch</td><td>-</td></tr>
<tr><td>play-during</td><td>-</td></tr>
<tr><td>richness</td><td>-</td></tr>
<tr><td>speak-header</td><td>Table related</td></tr>
<tr><td>speak-numeral</td><td>-</td></tr>
<tr><td>speak-punctuation</td><td>-</td></tr>
<tr><td>speak</td><td>-</td></tr>
<tr><td>speech-rate</td><td>-</td></tr>
<tr><td>stress</td><td>-</td></tr>
<tr><td>voice-family</td><td>-</td></tr>
<tr><td>volume</td><td>-</td></tr>
</tbody>
<tbody class="impl-no">
<tr><th colspan="2">Will not implement</th></tr>
<tr><td>content</td><td>Not applicable for inline styles</td></tr>
<tr><td>counter-increment</td><td>Needs content, Opera only</td></tr>
<tr><td>counter-reset</td><td>Needs content, Opera only</td></tr>
<tr><td>direction</td><td>No support</td></tr>
<tr><td>outline-color</td><td rowspan="4">IE Mac and Opera on outside,
Mozilla on inside and needs -moz-outline, no IE support.</td></tr>
<tr><td>outline-style</td></tr>
<tr><td>outline-width</td></tr>
<tr><td>outline</td></tr>
<tr><td>unicode-bidi</td><td>No support</td></tr>
</tbody>
</table>
<h2>Interesting Attributes</h2>
<table cellspacing="0">
<thead>
<tr><th>Attribute</th><th>Tags</th><th>Notes</th></tr>
</thead>
<!--
<tr><th></th></tr>
<tbody>
<tr><td>-</td><td>-</td><td>-</td></tr>
</tbody>
-->
<tbody>
<tr><th colspan="3">CSS</th></tr>
<tr class="impl-yes"><td>style</td><td>All</td><td>Parser is reasonably functional. Status here doesn't count individual properties.</td></tr>
</tbody>
<tbody>
<tr><th colspan="3">Questionable</th></tr>
<tr class="impl-no"><td>accesskey</td><td>A</td><td>May interfere with main interface</td></tr>
<tr class="impl-no"><td>tabindex</td><td>A</td><td>May interfere with main interface</td></tr>
<tr class="impl-yes"><td>target</td><td>A</td><td>Config enabled, only useful for frame layouts, disallowed in strict</td></tr>
</tbody>
<tbody>
<tr><th colspan="3">Miscellaneous</th></tr>
<tr><td>datetime</td><td>DEL, INS</td><td>No visible effect, ISO format</td></tr>
<tr class="impl-yes"><td>rel</td><td>A</td><td>Largely user-defined: nofollow, tag (see microformats)</td></tr>
<tr class="impl-yes"><td>rev</td><td>A</td><td>Largely user-defined: vote-*</td></tr>
<tr class="feature"><td>axis</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
<tr class="feature"><td>char</td><td>COL, COLGROUP, TBODY, TD, TFOOT, TH, THEAD, TR</td><td>W3C only: No browser implementation</td></tr>
<tr class="feature"><td>headers</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
<tr class="impl-yes"><td>scope</td><td>TD, TH</td><td>W3C only: No browser implementation</td></tr>
</tbody>
<tbody class="impl-yes">
<tr><th colspan="3">URI</th></tr>
<tr><td rowspan="2">cite</td><td>BLOCKQUOTE, Q</td><td>For attribution</td></tr>
<tr><td>DEL, INS</td><td>Link to explanation why it changed</td></tr>
<tr><td>href</td><td>A</td><td>-</td></tr>
<tr><td>longdesc</td><td>IMG</td><td>-</td></tr>
<tr class="required"><td>src</td><td>IMG</td><td>Required</td></tr>
</tbody>
<tbody>
<tr><th colspan="3">Transform</th></tr>
<tr class="impl-yes"><td rowspan="5">align</td><td>CAPTION</td><td>'caption-side' for top/bottom, 'text-align' for left/right</td></tr>
<tr class="impl-yes"><td>IMG</td><td rowspan="3">See specimens/html-align-to-css.html</td></tr>
<tr class="impl-yes"><td>TABLE</td></tr>
<tr class="impl-yes"><td>HR</td></tr>
<tr class="impl-yes"><td>H1, H2, H3, H4, H5, H6, P</td><td>Equivalent style 'text-align'</td></tr>
<tr class="required impl-yes"><td>alt</td><td>IMG</td><td>Required, insert image filename if src is present or default invalid image text</td></tr>
<tr class="impl-yes"><td rowspan="3">bgcolor</td><td>TABLE</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>TR</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>TD, TH</td><td>Superset style 'background-color'</td></tr>
<tr class="impl-yes"><td>border</td><td>IMG</td><td>Equivalent style <code>border:[number]px solid</code></td></tr>
<tr class="impl-yes"><td>clear</td><td>BR</td><td>Near-equiv style 'clear', transform 'all' into 'both'</td></tr>
<tr class="impl-no"><td>compact</td><td>DL, OL, UL</td><td>Boolean, needs custom CSS class; rarely used anyway</td></tr>
<tr class="required impl-yes"><td>dir</td><td>BDO</td><td>Required, insert ltr (or configuration value) if none</td></tr>
<tr class="impl-yes"><td>height</td><td>TD, TH</td><td>Near-equiv style 'height', needs px suffix if original was in pixels</td></tr>
<tr class="impl-yes"><td>hspace</td><td>IMG</td><td>Near-equiv styles 'margin-top' and 'margin-bottom', needs px suffix</td></tr>
<tr class="impl-yes"><td>lang</td><td>*</td><td>Copy value to xml:lang</td></tr>
<tr class="impl-yes"><td rowspan="2">name</td><td>IMG</td><td>Turn into ID</td></tr>
<tr class="impl-yes"><td>A</td><td>Turn into ID</td></tr>
<tr class="impl-yes"><td>noshade</td><td>HR</td><td>Boolean, style 'border-style:solid;'</td></tr>
<tr class="impl-yes"><td>nowrap</td><td>TD, TH</td><td>Boolean, style 'white-space:nowrap;' (not compat with IE5)</td></tr>
<tr class="impl-yes"><td>size</td><td>HR</td><td>Near-equiv 'height', needs px suffix if original was pixels</td></tr>
<tr class="required impl-yes"><td>src</td><td>IMG</td><td>Required, insert blank or default img if not set</td></tr>
<tr class="impl-yes"><td>start</td><td>OL</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
<tr class="impl-yes"><td rowspan="3">type</td><td>LI</td><td rowspan="3">Equivalent style 'list-style-type', different allowed values though. (needs testing)</td></tr>
<tr class="impl-yes"><td>OL</td></tr>
<tr class="impl-yes"><td>UL</td></tr>
<tr class="impl-yes"><td>value</td><td>LI</td><td>Poorly supported 'counter-reset', allowed in loose, dropped in strict</td></tr>
<tr class="impl-yes"><td>vspace</td><td>IMG</td><td>Near-equiv styles 'margin-left' and 'margin-right', needs px suffix, see hspace</td></tr>
<tr class="impl-yes"><td rowspan="2">width</td><td>HR</td><td rowspan="2">Near-equiv style 'width', needs px suffix if original was pixels</td></tr>
<tr class="impl-yes"><td>TD, TH</td></tr>
</tbody>
</table>
</body></html>
<!-- vim: et sw=4 sts=4
-->

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,850 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for customizing HTML Purifier's tag and attribute sets." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Customize - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Customize!</h1>
<div class="subtitle">HTML Purifier is a Swiss-Army Knife</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
HTML Purifier has this quirk where if you try to allow certain elements or
attributes, HTML Purifier will tell you that it's not supported, and that
you should go to the forums to find out how to implement it. Well, this
document is how to implement elements and attributes which HTML Purifier
doesn't support out of the box.
</p>
<h2>Is it necessary?</h2>
<p>
Before we even write any code, it is paramount to consider whether or
not the code we're writing is necessary or not. HTML Purifier, by default,
contains a large set of elements and attributes: large enough so that
<em>any</em> element or attribute in XHTML 1.0 or 1.1 (and its HTML variants)
that can be safely used by the general public is implemented.
</p>
<p>
So what needs to be implemented? (Feel free to skip this section if
you know what you want).
</p>
<h3>XHTML 1.0</h3>
<p>
All of the modules listed below are based off of the
<a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/abstract_modules.html#sec_5.2.">modularization of
XHTML</a>, which, while technically for XHTML 1.1, is quite a useful
resource.
</p>
<ul>
<li>Structure</li>
<li>Frames</li>
<li>Applets (deprecated)</li>
<li>Forms</li>
<li>Image maps</li>
<li>Objects</li>
<li>Frames</li>
<li>Events</li>
<li>Meta-information</li>
<li>Style sheets</li>
<li>Link (not hypertext)</li>
<li>Base</li>
<li>Name</li>
</ul>
<p>
If you don't recognize it, you probably don't need it. But the curious
can look all of these modules up in the above-mentioned document. Note
that inline scripting comes packaged with HTML Purifier (more on this
later).
</p>
<h3>XHTML 1.1</h3>
<p>
As of HTMLPurifier 2.1.0, we have implemented the
<a href="http://www.w3.org/TR/2001/REC-ruby-20010531/">Ruby module</a>,
which defines a set of tags
for publishing short annotations for text, used mostly in Japanese
and Chinese school texts, but applicable for positioning any text (not
limited to translations) above or below other corresponding text.
</p>
<h3>HTML 5</h3>
<p>
<a href="http://www.whatwg.org/specs/web-apps/current-work/">HTML 5</a>
is a fork of HTML 4.01 by WHATWG, who believed that XHTML 2.0 was headed
in the wrong direction. It too is a working draft, and may change
drastically before publication, but it should be noted that the
<code>canvas</code> tag has been implemented by many browser vendors.
</p>
<h3>Proprietary</h3>
<p>
There are a number of proprietary tags still in the wild. Many of them
have been documented in <a href="ref-proprietary-tags.txt">ref-proprietary-tags.txt</a>,
but there is currently no implementation for any of them.
</p>
<h3>Extensions</h3>
<p>
There are also a number of other XML languages out there that can
be embedded in HTML documents: two of the most popular are MathML and
SVG, and I frequently get requests to implement these. But they are
expansive, comprehensive specifications, and it would take far too long
to implement them <em>correctly</em> (most systems I've seen go as far
as whitelisting tags and no further; come on, what about nesting!)
</p>
<p>
Word of warning: HTML Purifier is currently <em>not</em> namespace
aware.
</p>
<h2>Giving back</h2>
<p>
As you may imagine from the details above (don't be abashed if you didn't
read it all: a glance over would have done), there's quite a bit that
HTML Purifier doesn't implement. Recent architectural changes have
allowed HTML Purifier to implement elements and attributes that are not
safe! Don't worry, they won't be activated unless you set %HTML.Trusted
to true, but they certainly help out users who need to put, say, forms
on their page and don't want to go through the trouble of reading this
and implementing it themself.
</p>
<p>
So any of the above that you implement for your own application could
help out some other poor sap on the other side of the globe. Help us
out, and send back code so that it can be hammered into a module and
released with the core. Any code would be greatly appreciated!
</p>
<h2>And now...</h2>
<p>
Enough philosophical talk, time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
// our code will go here
}</pre>
<p>
Assuming that HTML Purifier has already been properly loaded (hint:
include <code>HTMLPurifier.auto.php</code>), this code will set up
the environment that you need to start customizing the HTML definition.
What's going on?
</p>
<ul>
<li>
The first three lines are regular configuration code:
<ul>
<li>
%HTML.DefinitionID is set to a unique identifier for your
custom HTML definition. This prevents it from clobbering
other custom definitions on the same installation.
</li>
<li>
%HTML.DefinitionRev is a revision integer of your HTML
definition. Because HTML definitions are cached, you'll need
to increment this whenever you make a change in order to flush
the cache.
</li>
</ul>
</li>
<li>
The fourth line retrieves a raw <code>HTMLPurifier_HTMLDefinition</code>
object that we will be tweaking. Interestingly enough, we have
placed it in an if block: this is because
<code>maybeGetRawHTMLDefinition</code>, as its name suggests, may
return a NULL, in which case we should skip doing any
initialization. This, in fact, will correspond to when our fully
customized object is already in the cache.
</li>
</ul>
<h2>Turn off caching</h2>
<p>
To make development easier, we're going to temporarily turn off
definition caching:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
<strong>$config-&gt;set('Cache.DefinitionImpl', null); // TODO: remove this later!</strong>
$def = $config-&gt;getHTMLDefinition(true);</pre>
<p>
A few things should be mentioned about the caching mechanism before
we move on. For performance reasons, HTML Purifier caches generated
<code>HTMLPurifier_Definition</code> objects in serialized files
stored (by default) in <code>library/HTMLPurifier/DefinitionCache/Serializer</code>.
A lot of processing is done in order to create these objects, so it
makes little sense to repeat the same processing over and over again
whenever HTML Purifier is called.
</p>
<p>
In order to identify a cache entry, HTML Purifier uses three variables:
the library's version number, the value of %HTML.DefinitionRev and
a serial of relevant configuration. Whenever any of these changes,
a new HTML definition is generated. Notice that there is no way
for the definition object to track changes to customizations: here, it
is up to you to supply appropriate information to DefinitionID and
DefinitionRev.
</p>
<h2 id="addAttribute">Add an attribute</h2>
<p>
For this example, we're going to implement the <code>target</code> attribute found
on <code>a</code> elements. To implement an attribute, we have to
ask a few questions:
</p>
<ol>
<li>What element is it found on?</li>
<li>What is its name?</li>
<li>Is it required or optional?</li>
<li>What are valid values for it?</li>
</ol>
<p>
The first three are easy: the element is <code>a</code>, the attribute
is <code>target</code>, and it is not a required attribute. (If it
was required, we'd need to append an asterisk to the attribute name,
you'll see an example of this in the addElement() example).
</p>
<p>
The last question is a little trickier.
Lets allow the special values: _blank, _self, _target and _top.
The form of this is called an <strong>enumeration</strong>, a list of
valid values, although only one can be used at a time. To translate
this into code form, we write:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong></pre>
<p>
The <code>Enum#_blank,_self,_target,_top</code> does all the magic.
The string is split into two parts, separated by a hash mark (#):
</p>
<ol>
<li>The first part is the name of what we call an <code>AttrDef</code></li>
<li>The second part is the parameter of the above-mentioned <code>AttrDef</code></li>
</ol>
<p>
If that sounds vague and generic, it's because it is! HTML Purifier defines
an assortment of different attribute types one can use, and each of these
has their own specialized parameter format. Here are some of the more useful
ones:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Format</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Enum</th>
<td><em>[s:]</em>value1,value2,...</td>
<td>
Attribute with a number of valid values, one of which may be used. When
s: is present, the enumeration is case sensitive.
</td>
</tr>
<tr>
<th>Bool</th>
<td>attribute_name</td>
<td>
Boolean attribute, with only one valid value: the name
of the attribute.
</td>
</tr>
<tr>
<th>CDATA</th>
<td></td>
<td>
Attribute of arbitrary text. Can also be referred to as <strong>Text</strong>
(the specification makes a semantic distinction between the two).
</td>
</tr>
<tr>
<th>ID</th>
<td></td>
<td>
Attribute that specifies a unique ID
</td>
</tr>
<tr>
<th>Pixels</th>
<td></td>
<td>
Attribute that specifies an integer pixel length
</td>
</tr>
<tr>
<th>Length</th>
<td></td>
<td>
Attribute that specifies a pixel or percentage length
</td>
</tr>
<tr>
<th>NMTOKENS</th>
<td></td>
<td>
Attribute that specifies a number of name tokens, example: the
<code>class</code> attribute
</td>
</tr>
<tr>
<th>URI</th>
<td></td>
<td>
Attribute that specifies a URI, example: the <code>href</code>
attribute
</td>
</tr>
<tr>
<th>Number</th>
<td></td>
<td>
Attribute that specifies an positive integer number
</td>
</tr>
</tbody>
</table>
<p>
For a complete list, consult
<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/AttrTypes.php"><code>library/HTMLPurifier/AttrTypes.php</code></a>;
more information on attributes that accept parameters can be found on their
respective includes in
<a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/AttrDef"><code>library/HTMLPurifier/AttrDef</code></a>.
</p>
<p>
Sometimes, the restrictive list in AttrTypes just doesn't cut it. Don't
sweat: you can also use a fully instantiated object as the value. The
equivalent, verbose form of the above example is:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_target','_top')
));</strong></pre>
<p>
Trust me, you'll learn to love the shorthand.
</p>
<h2>Add an element</h2>
<p>
Adding attributes is really small-fry stuff, though, and it was possible
to add them (albeit a bit more wordy) prior to 2.0. The real gem of
the Advanced API is adding elements. There are five questions to
ask when adding a new element:
</p>
<ol>
<li>What is the element's name?</li>
<li>What content set does this element belong to?</li>
<li>What are the allowed children of this element?</li>
<li>What attributes does the element allow that are general?</li>
<li>What attributes does the element allow that are specific to this element?</li>
</ol>
<p>
It's a mouthful, and you'll be slightly lost if your not familiar with
the HTML specification, so let's explain them step by step.
</p>
<h3>Content set</h3>
<p>
The HTML specification defines two major content sets: Inline
and Block. Each of these
content sets contain a list of elements: Inline contains things like
<code>span</code> and <code>b</code> while Block contains things like
<code>div</code> and <code>blockquote</code>.
</p>
<p>
These content sets amount to a macro mechanism for HTML definition. Most
elements in HTML are organized into one of these two sets, and most
elements in HTML allow elements from one of these sets. If we had
to write each element verbatim into each other element's allowed
children, we would have ridiculously large lists; instead we use
content sets to compactify the declaration.
</p>
<p>
Practically speaking, there are several useful values you can use here:
</p>
<table class="table">
<thead>
<tr>
<th>Content set</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Character level elements, text</td>
</tr>
<tr>
<th>Block</th>
<td>Block-like elements, like paragraphs and lists</td>
</tr>
<tr>
<th><em>false</em></th>
<td>
Any element that doesn't fit into the mold, for example <code>li</code>
or <code>tr</code>
</td>
</tr>
</tbody>
</table>
<p>
By specifying a valid value here, all other elements that use that
content set will also allow your element, without you having to do
anything. If you specify <em>false</em>, you'll have to register
your element manually.
</p>
<h3>Allowed children</h3>
<p>
Allowed children defines the elements that this element can contain.
The allowed values may range from none to a complex regexp depending on
your element.
</p>
<p>
If you've ever taken a look at the HTML DTD's before, you may have
noticed declarations like this:
</p>
<pre>&lt;!ELEMENT LI - O (%flow;)* -- list item --&gt;</pre>
<p>
The <code>(%flow;)*</code> indicates the allowed children of the
<code>li</code> tag: <code>li</code> allows any number of flow
elements as its children. (The <code>- O</code> allows the closing tag to be
omitted, though in XML this is not allowed.) In HTML Purifier,
we'd write it like <code>Flow</code> (here's where the content sets
we were discussing earlier come into play). There are three shorthand
content models you can specify:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Empty</th>
<td>No children allowed, like <code>br</code> or <code>hr</code></td>
</tr>
<tr>
<th>Inline</th>
<td>Any number of inline elements and text, like <code>span</code></td>
</tr>
<tr>
<th>Flow</th>
<td>Any number of inline elements, block elements and text, like <code>div</code></td>
</tr>
</tbody>
</table>
<p>
This covers 90% of all the cases out there, but what about elements that
break the mold like <code>ul</code>? This guy requires at least one
child, and the only valid children for it are <code>li</code>. The
content model is: <code>Required: li</code>. There are two parts: the
first type determines what <code>ChildDef</code> will be used to validate
content models. The most common values are:
</p>
<table class="table">
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>Required</th>
<td>Children must be one or more of the valid elements</td>
</tr>
<tr>
<th>Optional</th>
<td>Children can be any number of the valid elements</td>
</tr>
<tr>
<th>Custom</th>
<td>Children must follow the DTD-style regex</td>
</tr>
</tbody>
</table>
<p>
You can also implement your own <code>ChildDef</code>: this was done
for a few special cases in HTML Purifier such as <code>Chameleon</code>
(for <code>ins</code> and <code>del</code>), <code>StrictBlockquote</code>
and <code>Table</code>.
</p>
<p>
The second part specifies either valid elements or a regular expression.
Valid elements are separated with horizontal bars (|), i.e.
"<code>a | b | c</code>". Use #PCDATA to represent plain text.
Regular expressions are based off of DTD's style:
</p>
<ul>
<li>Parentheses () are used for grouping</li>
<li>Commas (,) separate elements that should come one after another</li>
<li>Horizontal bars (|) indicate one or the other elements should be used</li>
<li>Plus signs (+) are used for a one or more match</li>
<li>Asterisks (*) are used for a zero or more match</li>
<li>Question marks (?) are used for a zero or one match</li>
</ul>
<p>
For example, "<code>a, b?, (c | d), e+, f*</code>" means "In this order,
one <code>a</code> element, at most one <code>b</code> element,
one <code>c</code> or <code>d</code> element (but not both), one or more
<code>e</code> elements, and any number of <code>f</code> elements."
Regex veterans should be able to jump right in, and those not so savvy
can always copy-paste W3C's content model definitions into HTML Purifier
and hope for the best.
</p>
<p>
A word of warning: while the regex format is extremely flexible on
the developer's side, it is
quite unforgiving on the user's side. If the user input does not <em>exactly</em>
match the specification, the entire contents of the element will
be nuked. This is why there is are specific content model types like
Optional and Required: while they could be implemented as <code>Custom:
(valid | elements)*</code>, the custom classes contain special recovery
measures that make sure as much of the user's original content gets
through. HTML Purifier's core, as a rule, does not use Custom.
</p>
<p>
One final note: you can also use Content Sets inside your valid elements
lists or regular expressions. In fact, the three shorthand content models
mentioned above are just that: abbreviations:
</p>
<table class="table">
<thead>
<tr>
<th>Content model</th>
<th>Implementation</th>
</tr>
</thead>
<tbody>
<tr>
<th>Inline</th>
<td>Optional: Inline | #PCDATA</td>
</tr>
<tr>
<th>Flow</th>
<td>Optional: Flow | #PCDATA</td>
</tr>
</tbody>
</table>
<p>
When the definition is compiled, Inline will be replaced with a
horizontal-bar separated list of inline elements. Also, notice that
it does not contain text: you have to specify that yourself.
</p>
<h3>Common attributes</h3>
<p>
Congratulations: you have just gotten over the proverbial hump (Allowed
children). Common attributes is much simpler, and boils down to
one question: does your element have the <code>id</code>, <code>style</code>,
<code>class</code>, <code>title</code> and <code>lang</code> attributes?
If so, you'll want to specify the <code>Common</code> attribute collection,
which contains these five attributes that are found on almost every
HTML element in the specification.
</p>
<p>
There are a few more collections, but they're really edge cases:
</p>
<table class="table">
<thead>
<tr>
<th>Collection</th>
<th>Attributes</th>
</tr>
</thead>
<tbody>
<tr>
<th>I18N</th>
<td><code>lang</code>, possibly <code>xml:lang</code></td>
</tr>
<tr>
<th>Core</th>
<td><code>style</code>, <code>class</code>, <code>id</code> and <code>title</code></td>
</tr>
</tbody>
</table>
<p>
Common is a combination of the above-mentioned collections.
</p>
<p class="aside">
Readers familiar with the modularization may have noticed that the Core
attribute collection differs from that specified by the <a
href="http://www.w3.org/TR/xhtml-modularization/abstract_modules.html#s_commonatts">abstract
modules of the XHTML Modularization 1.1</a>. We believe this section
to be in error, as <code>br</code> permits the use of the <code>style</code>
attribute even though it uses the <code>Core</code> collection, and
the DTD and XML Schemas supplied by W3C support our interpretation.
</p>
<h3>Attributes</h3>
<p>
If you didn't read the <a href="#addAttribute">earlier section on
adding attributes</a>, read it now. The last parameter is simply
an array of attribute names to attribute implementations, in the exact
same format as <code>addAttribute()</code>.
</p>
<h3>Putting it all together</h3>
<p>
We're going to implement <code>form</code>. Before we embark, lets
grab a reference implementation from over at the
<a href="http://www.w3.org/TR/html4/sgml/loosedtd.html">transitional DTD</a>:
</p>
<pre>&lt;!ELEMENT FORM - - (%flow;)* -(FORM) -- interactive form --&gt;
&lt;!ATTLIST FORM
%attrs; -- %coreattrs, %i18n, %events --
action %URI; #REQUIRED -- server-side form handler --
method (GET|POST) GET -- HTTP method used to submit the form--
enctype %ContentType; &quot;application/x-www-form-urlencoded&quot;
accept %ContentTypes; #IMPLIED -- list of MIME types for file upload --
name CDATA #IMPLIED -- name of form for scripting --
onsubmit %Script; #IMPLIED -- the form was submitted --
onreset %Script; #IMPLIED -- the form was reset --
target %FrameTarget; #IMPLIED -- render in this frame --
accept-charset %Charsets; #IMPLIED -- list of supported charsets --
&gt;</pre>
<p>
Juicy! With just this, we can answer four of our five questions:
</p>
<ol>
<li>What is the element's name? <strong>form</strong></li>
<li>What content set does this element belong to? <strong>Block</strong>
(this needs a little sleuthing, I find the easiest way is to search
the DTD for <code>FORM</code> and determine which set it is in.)</li>
<li>What are the allowed children of this element? <strong>One
or more flow elements, but no nested <code>form</code>s</strong></li>
<li>What attributes does the element allow that are general? <strong>Common</strong></li>
<li>What attributes does the element allow that are specific to this element? <strong>A whole bunch, see ATTLIST;
we're going to do the vital ones: <code>action</code>, <code>method</code> and <code>name</code></strong></li>
</ol>
<p>
Time for some code:
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$config-&gt;set('Cache.DefinitionImpl', null); // remove this later!
$def = $config-&gt;getHTMLDefinition(true);
$def-&gt;addAttribute('a', 'target', new HTMLPurifier_AttrDef_Enum(
array('_blank','_self','_target','_top')
));
<strong>$form = $def-&gt;addElement(
'form', // name
'Block', // content set
'Flow', // allowed children
'Common', // attribute collection
array( // attributes
'action*' => 'URI',
'method' => 'Enum#get|post',
'name' => 'ID'
)
);
$form-&gt;excludes = array('form' => true);</strong></pre>
<p>
Each of the parameters corresponds to one of the questions we asked.
Notice that we added an asterisk to the end of the <code>action</code>
attribute to indicate that it is required. If someone specifies a
<code>form</code> without that attribute, the tag will be axed.
Also, the extra line at the end is a special extra declaration that
prevents forms from being nested within each other.
</p>
<p>
And that's all there is to it! Implementing the rest of the form
module is left as an exercise to the user; to see more examples
check the <a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/HTMLModule"><code>library/HTMLPurifier/HTMLModule/</code></a> directory
in your local HTML Purifier installation.
</p>
<h2>And beyond...</h2>
<p>
Perceptive users may have realized that, to a certain extent, we
have simply re-implemented the facilities of XML Schema or the
Document Type Definition. What you are seeing here, however, is
not just an XML Schema or Document Type Definition: it is a fully
expressive method of specifying the definition of HTML that is
a portable superset of the capabilities of the two above-mentioned schema
languages. What makes HTMLDefinition so powerful is the fact that
if we don't have an implementation for a content model or an attribute
definition, you can supply it yourself by writing a PHP class.
</p>
<p>
There are many facets of HTMLDefinition beyond the Advanced API I have
walked you through today. To find out more about these, you can
check out these source files:
</p>
<ul>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/HTMLModule.php"><code>library/HTMLPurifier/HTMLModule.php</code></a></li>
<li><a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/ElementDef.php"><code>library/HTMLPurifier/ElementDef.php</code></a></li>
</ul>
<h2 id="optimized">Notes for HTML Purifier 4.2.0 and earlier</h3>
<p>
Previously, this tutorial gave some incorrect template code for
editing raw definitions, and that template code will now produce the
error <q>Due to a documentation error in previous version of HTML
Purifier...</q> Here is how to mechanically transform old-style
code into new-style code.
</p>
<p>
First, identify all code that edits the raw definition object, and
put it together. Ensure none of this code must be run on every
request; if some sub-part needs to always be run, move it outside
this block. Here is an example below, with the raw definition
object code bolded.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
$def = $config-&gt;getHTMLDefinition(true);
<strong>$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');</strong>
$purifier = new HTMLPurifier($config);</pre>
<p>
Next, replace the raw definition retrieval with a
maybeGetRawHTMLDefinition method call inside an if conditional, and
place the editing code inside that if block.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$config-&gt;set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config-&gt;set('HTML.DefinitionRev', 1);
<strong>if ($def = $config-&gt;maybeGetRawHTMLDefinition()) {
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
}</strong>
$purifier = new HTMLPurifier($config);</pre>
<p>
And you're done! Alternatively, if you're OK with not ever caching
your code, the following will still work and not emit warnings.
</p>
<pre>$config = HTMLPurifier_Config::createDefault();
$def = $config-&gt;getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
$purifier = new HTMLPurifier($config);</pre>
<p>
A slightly less efficient version of this was what was going on with
old versions of HTML Purifier.
</p>
<p>
<em>Technical notes:</em> ajh pointed out on <a
href="http://htmlpurifier.org/phorum/read.php?5,5164,5169#msg-5169">in a forum topic</a> that
HTML Purifier appeared to be repeatedly writing to the cache even
when a cache entry already existed. Investigation lead to the
discovery of the following infelicity: caching of customized
definitions didn't actually work! The problem was that even though
a cache file would be written out at the end of the process, there
was no way for HTML Purifier to say, <q>Actually, I've already got a
copy of your work, no need to reconfigure your
customizations</q>. This required the API to change: placing
all of the customizations to the raw definition object in a
conditional which could be skipped.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,148 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains various methods for allowing IDs in documents safely in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>IDs - HTML Purifier</title>
</head><body>
<h1 class="subtitled">IDs</h1>
<div class="subtitle">What they are, why you should(n't) wear them, and how to deal with it</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Prior to HTML Purifier 1.2.0, this library blithely accepted user input that
looked like this:</p>
<pre>&lt;a id=&quot;fragment&quot;&gt;Anchor&lt;/a&gt;</pre>
<p>...presenting an attractive vector for those that would destroy standards
compliance: simply set the ID to one that is already used elsewhere in the
document and voila: validation breaks. There was a half-hearted attempt to
prevent this by allowing users to blacklist IDs, but I suspect that no one
really bothered, and thus, with the release of 1.2.0, IDs are now <em>removed</em>
by default.</p>
<p>IDs, however, are quite useful functionality to have, so if users start
complaining about broken anchors you'll probably want to turn them back on
with %Attr.EnableID. But before you go mucking around with the config
object, it's probably worth to take some precautions to keep your page
validating. Why?</p>
<ol>
<li>Standards-compliant pages are good</li>
<li>Duplicated IDs interfere with anchors. If there are two id="foobar"s in a
document, which spot does a browser presented with the fragment #foobar go
to? Most browsers opt for the first appearing ID, making it impossible
to references the second section. Similarly, duplicated IDs can hijack
client-side scripting that relies on the IDs of elements.</li>
</ol>
<p>You have (currently) four ways of dealing with the problem.</p>
<h2 class="subtitled">Blacklisting IDs</h2>
<div class="subsubtitle">Good for pages with single content source and stable templates</div>
<p>Keeping in terms with the
<acronym title="Keep It Simple, Stupid">KISS</acronym> principle, let us
deal with the most obvious solution: preventing users from using any IDs that
appear elsewhere on the document. The method is simple:</p>
<pre>$config-&gt;set('Attr.EnableID', true);
$config-&gt;set('Attr.IDBlacklist' array(
'list', 'of', 'attribute', 'values', 'that', 'are', 'forbidden'
));</pre>
<p>That being said, there are some notable drawbacks. First of all, you have to
know precisely which IDs are being used by the HTML surrounding the user code.
This is easier said than done: quite often the page designer and the system
coder work separately, so the designer has to constantly be talking with the
coder whenever he decides to add a new anchor. Miss one and you open yourself
to possible standards-compliance issues.</p>
<p>Furthermore, this position becomes untenable when a single web page must hold
multiple portions of user-submitted content. Since there's obviously no way
to find out before-hand what IDs users will use, the blacklist is helpless.
And since HTML Purifier validates each segment separately, perhaps doing
so at different times, it would be extremely difficult to dynamically update
the blacklist in between runs.</p>
<p>Finally, simply destroying the ID is extremely un-userfriendly behavior: after
all, they might have simply specified a duplicate ID by accident.</p>
<p>Thus, we get to our second method.</p>
<h2 class="subtitled">Namespacing IDs</h2>
<div class="subsubtitle">Lazy developer's way, but needs user education</div>
<p>This method, too, is quite simple: add a prefix to all user IDs. With this
code:</p>
<pre>$config-&gt;set('Attr.EnableID', true);
$config-&gt;set('Attr.IDPrefix', 'user_');</pre>
<p>...this:</p>
<pre>&lt;a id=&quot;foobar&quot;&gt;Anchor!&lt;/a&gt;</pre>
<p>...turns into:</p>
<pre>&lt;a id=&quot;user_foobar&quot;&gt;Anchor!&lt;/a&gt;</pre>
<p>As long as you don't have any IDs that start with user_, collisions are
guaranteed not to happen. The drawback is obvious: if a user submits
id=&quot;foobar&quot;, they probably expect to be able to reference their page with
#foobar. You'll have to tell them, &quot;No, that doesn't work, you have to add
user_ to the beginning.&quot;</p>
<p>And yes, things get hairier. Even with a nice prefix, we still have done
nothing about multiple HTML Purifier outputs on one page. Thus, we have
a second configuration value to piggy-back off of: %Attr.IDPrefixLocal:</p>
<pre>$config-&gt;set('Attr.IDPrefixLocal', 'comment' . $id . '_');</pre>
<p>This new attributes does nothing but append on to regular IDPrefix, but is
special in that it is volatile: it's value is determined at run-time and
cannot possibly be cordoned into, say, a .ini config file. As for what to
put into the directive, is up to you, but I would recommend the ID number
the text has been assigned in the database. Whatever you pick, however, it
has to be unique and stable for the text you are validating. Note, however,
that we require that %Attr.IDPrefix be set before you use this directive.</p>
<p>And also remember: the user has to know what this prefix is too!</p>
<h2>Abstinence</h2>
<p>You may not want to bother. That's okay too, just don't enable IDs.</p>
<p>Personally, I would take this road whenever user-submitted content would be
possibly be shown together on one page. Why a blog comment would need to use
anchors is beyond me.</p>
<h2>Denial</h2>
<p>To revert back to pre-1.2.0 behavior, simply:</p>
<pre>$config-&gt;set('Attr.EnableID', true);</pre>
<p>Don't come crying to me when your page mysteriously stops validating, though.</p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,59 @@
HTML Purifier
by Edward Z. Yang
There are a number of ad hoc HTML filtering solutions out there on the web
(some examples including HTML_Safe, kses and SafeHtmlChecker.class.php) that
claim to filter HTML properly, preventing malicious JavaScript and layout
breaking HTML from getting through the parser. None of them, however,
demonstrates a thorough knowledge of neither the DTD that defines the HTML
nor the caveats of HTML that cannot be expressed by a DTD. Configurable
filters (such as kses or PHP's built-in striptags() function) have trouble
validating the contents of attributes and can be subject to security attacks
due to poor configuration. Other filters take the naive approach of
blacklisting known threats and tags, failing to account for the introduction
of new technologies, new tags, new attributes or quirky browser behavior.
However, HTML Purifier takes a different approach, one that doesn't use
specification-ignorant regexes or narrow blacklists. HTML Purifier will
decompose the whole document into tokens, and rigorously process the tokens by:
removing non-whitelisted elements, transforming bad practice tags like <font>
into <span>, properly checking the nesting of tags and their children and
validating all attributes according to their RFCs.
To my knowledge, there is nothing like this on the web yet. Not even MediaWiki,
which allows an amazingly diverse mix of HTML and wikitext in its documents,
gets all the nesting quirks right. Existing solutions hope that no JavaScript
will slip through, but either do not attempt to ensure that the resulting
output is valid XHTML or send the HTML through a draconic XML parser (and yet
still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a>
tags from being nested within each other).
This document no longer is a detailed description of how HTMLPurifier works,
as those descriptions have been moved to the appropriate code. The first
draft was drawn up after two rough code sketches and the implementation of a
forgiving lexer. You may also be interested in the unit tests located in the
tests/ folder, which provide a living document on how exactly the filter deals
with malformed input.
In summary (see corresponding classes for more details):
1. Parse document into an array of tag and text tokens (Lexer)
2. Remove all elements not on whitelist and transform certain other elements
into acceptable forms (i.e. <font>)
3. Make document well formed while helpfully taking into account certain quirks,
such as the fact that <p> tags traditionally are closed by other block-level
elements.
4. Run through all nodes and check children for proper order (especially
important for tables).
5. Validate attributes according to more restrictive definitions based on the
RFCs.
6. Translate back into a string. (Generator)
HTML Purifier is best suited for documents that require a rich array of
HTML tags. Things like blog comments are, in all likelihood, most appropriately
written in an extremely restrictive set of markup that doesn't require
all this functionality (or not written in HTML at all), although this may
be changing in the future with the addition of levels of filtering.
vim: et sw=4 sts=4

View File

@ -0,0 +1,18 @@
Security
Like anything that claims to afford security, HTML_Purifier can be circumvented
through negligence of people. This class will do its job: no more, no less,
and it's up to you to provide it the proper information and proper context
to be effective. Things to remember:
1. Character Encoding: see enduser-utf8.html for more info.
2. IDs: see enduser-id.html for more info
3. URIs: see enduser-uri-filter.html
4. CSS: document pending
Explain which CSS styles we blocked and why.
vim: et sw=4 sts=4

View File

@ -0,0 +1,120 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains how to speed up HTML Purifier through caching or inbound filtering." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Speeding up HTML Purifier - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Speeding up HTML Purifier</h1>
<div class="subtitle">...also known as the HELP ME LIBRARY IS TOO SLOW MY PAGE TAKE TOO LONG page</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>HTML Purifier is a very powerful library. But with power comes great
responsibility, in the form of longer execution times. Remember, this
library isn't lightly grazing over submitted HTML: it's deconstructing
the whole thing, rigorously checking the parts, and then putting it back
together. </p>
<p>So, if it so turns out that HTML Purifier is kinda too slow for outbound
filtering, you've got a few options: </p>
<h2>Inbound filtering</h2>
<p>Perform filtering of HTML when it's submitted by the user. Since the
user is already submitting something, an extra half a second tacked on
to the load time probably isn't going to be that huge of a problem.
Then, displaying the content is a simple a manner of outputting it
directly from your database/filesystem. The trouble with this method is
that your user loses the original text, and when doing edits, will be
handling the filtered text. While this may be a good thing, especially
if you're using a WYSIWYG editor, it can also result in data-loss if a
user makes a typo. </p>
<p>Example (non-functional):</p>
<pre>&lt;?php
/**
* FORM SUBMISSION PAGE
* display_error($message) : displays nice error page with message
* display_success() : displays a nice success page
* display_form() : displays the HTML submission form
* database_insert($html) : inserts data into database as new row
*/
if (!empty($_POST)) {
require_once '/path/to/library/HTMLPurifier.auto.php';
require_once 'HTMLPurifier.func.php';
$dirty_html = isset($_POST['html']) ? $_POST['html'] : false;
if (!$dirty_html) {
display_error('You must write some HTML!');
}
$html = HTMLPurifier($dirty_html);
database_insert($html);
display_success();
// notice that $dirty_html is *not* saved
} else {
display_form();
}
?&gt;</pre>
<h2>Caching the filtered output</h2>
<p>Accept the submitted text and put it unaltered into the database, but
then also generate a filtered version and stash that in the database.
Serve the filtered version to readers, and the unaltered version to
editors. If need be, you can invalidate the cache and have the cached
filtered version be regenerated on the first page view. Pros? Full data
retention. Cons? It's more complicated, and opens other editors up to
XSS if they are using a WYSIWYG editor (to fix that, they'd have to be
able to get their hands on the *really* original text served in
plaintext mode). </p>
<p>Example (non-functional):</p>
<pre>&lt;?php
/**
* VIEW PAGE
* display_error($message) : displays nice error page with message
* cache_get($id) : retrieves HTML from fast cache (db or file)
* cache_insert($id, $html) : inserts good HTML into cache system
* database_get($id) : retrieves raw HTML from database
*/
$id = isset($_GET['id']) ? (int) $_GET['id'] : false;
if (!$id) {
display_error('Must specify ID.');
exit;
}
$html = cache_get($id); // filesystem or database
if ($html === false) {
// cache didn't have the HTML, generate it
$raw_html = database_get($id);
require_once '/path/to/library/HTMLPurifier.auto.php';
require_once 'HTMLPurifier.func.php';
$html = HTMLPurifier($raw_html);
cache_insert($id, $html);
}
echo $html;
?&gt;</pre>
<h2>Summary</h2>
<p>In short, inbound filtering is the simple option and caching is the
robust option (albeit with bigger storage requirements). </p>
<p>There is a third option, independent of the two we've discussed: profile
and optimize HTMLPurifier yourself. Be sure to report back your results
if you decide to do that! Especially if you port HTML Purifier to C++.
<tt>;-)</tt></p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,231 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for tweaking HTML Purifier's Tidy-like behavior." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Tidy - HTML Purifier</title>
</head><body>
<h1>Tidy</h1>
<div id="filing">Filed under Development</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>You've probably heard of HTML Tidy, Dave Raggett's little piece
of software that cleans up poorly written HTML. Let me say it straight
out:</p>
<p class="emphasis">This ain't HTML Tidy!</p>
<p>Rather, Tidy stands for a cool set of Tidy-inspired features in HTML Purifier
that allows users to submit deprecated elements and attributes and get
valid strict markup back. For example:</p>
<pre>&lt;center&gt;Centered&lt;/center&gt;</pre>
<p>...becomes:</p>
<pre>&lt;div style=&quot;text-align:center;&quot;&gt;Centered&lt;/div&gt;</pre>
<p>...when this particular fix is run on the HTML. This tutorial will give
you the lowdown of what exactly HTML Purifier will do when Tidy
is on, and how to fine-tune this behavior. Once again, <strong>you do
not need Tidy installed on your PHP to use these features!</strong></p>
<h2>What does it do?</h2>
<p>Tidy will do several things to your HTML:</p>
<ul>
<li>Convert deprecated elements and attributes to standards-compliant
alternatives</li>
<li>Enforce XHTML compatibility guidelines and other best practices</li>
<li>Preserve data that would normally be removed as per W3C</li>
</ul>
<h2>What are levels?</h2>
<p>Levels describe how aggressive the Tidy module should be when
cleaning up HTML. There are four levels to pick: none, light, medium
and heavy. Each of these levels has a well-defined set of behavior
associated with it, although it may change depending on your doctype.</p>
<dl>
<dt>light</dt>
<dd>This is the <strong>lenient</strong> level. If a tag or attribute
is about to be removed because it isn't supported by the
doctype, Tidy will step in and change into an alternative that
is supported.</dd>
<dt>medium</dt>
<dd>This is the <strong>correctional</strong> level. At this level,
all the functions of light are performed, as well as some extra,
non-essential best practices enforcement. Changes made on this
level are very benign and are unlikely to cause problems.</dd>
<dt>heavy</dt>
<dd>This is the <strong>aggressive</strong> level. If a tag or
attribute is deprecated, it will be converted into a non-deprecated
version, no ifs ands or buts.</dd>
</dl>
<p>By default, Tidy operates on the <strong>medium</strong> level. You can
change the level of cleaning by setting the %HTML.TidyLevel configuration
directive:</p>
<pre>$config-&gt;set('HTML.TidyLevel', 'heavy'); // burn baby burn!</pre>
<h2>Is the light level really light?</h2>
<p>It depends on what doctype you're using. If your documents are HTML
4.01 <em>Transitional</em>, HTML Purifier will be lazy
and won't clean up your <code>center</code>
or <code>font</code> tags. But if you're using HTML 4.01 <em>Strict</em>,
HTML Purifier has no choice: it has to convert them, or they will
be nuked out of existence. So while light on Transitional will result
in little to no changes, light on Strict will still result in quite
a lot of fixes.</p>
<p>This is different behavior from 1.6 or before, where deprecated
tags in transitional documents would
always be cleaned up regardless. This is also better behavior.</p>
<h2>My pages look different!</h2>
<p>HTML Purifier is tasked with converting deprecated tags and
attributes to standards-compliant alternatives, which usually
need copious amounts of CSS. It's also not foolproof: sometimes
things do get lost in the translation. This is why when HTML Purifier
can get away with not doing cleaning, it won't; this is why
the default value is <strong>medium</strong> and not heavy.</p>
<p>Fortunately, only a few attributes have problems with the switch
over. They are described below:</p>
<table class="table">
<thead><tr>
<th>Element@Attr</th>
<th>Changes</th>
</tr></thead>
<tbody>
<tr>
<td>caption@align</td>
<td>Firefox supports stuffing the caption on the
left and right side of the table, a feature that
Internet Explorer, understandably, does not have.
When align equals right or left, the text will simply
be aligned on the left or right side.</td>
</tr>
<tr>
<td>img@align</td>
<td>The implementation for align bottom is good, but not
perfect. There are a few pixel differences.</td>
</tr>
<tr>
<td>br@clear</td>
<td>Clear both gets a little wonky in Internet Explorer. Haven't
really been able to figure out why.</td>
</tr>
<tr>
<td>hr@noshade</td>
<td>All browsers implement this slightly differently: we've
chosen to make noshade horizontal rules gray.</td>
</tr>
</tbody>
</table>
<p>There are a few more minor, although irritating, bugs.
Some older browsers support deprecated attributes,
but not CSS. Transformed elements and attributes will look unstyled
to said browsers. Also, CSS precedence is slightly different for
inline styles versus presentational markup. In increasing precedence:</p>
<ol>
<li>Presentational attributes</li>
<li>External style sheets</li>
<li>Inline styling</li>
</ol>
<p>This means that styling that may have been masked by external CSS
declarations will start showing up (a good thing, perhaps). Finally,
if you've turned off the style attribute, almost all of
these transformations will not work. Sorry mates.</p>
<p>You can review the rendering before and after of these transformations
by consulting the <a
href="http://htmlpurifier.org/live/smoketests/attrTransform.php">attrTransform.php
smoketest</a>.</p>
<h2>I like the general idea, but the specifics bug me!</h2>
<p>So you want HTML Purifier to clean up your HTML, but you're not
so happy about the br@clear implementation. That's perfectly fine!
HTML Purifier will make accomodations:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'heavy'); // all changes, minus...
<strong>$config-&gt;set('HTML.TidyRemove', 'br@clear');</strong></pre>
<p>That third line does the magic, removing the br@clear fix
from the module, ensuring that <code>&lt;br clear="both" /&gt;</code>
will pass through unharmed. The reverse is possible too:</p>
<pre>$config-&gt;set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config-&gt;set('HTML.TidyLevel', 'none'); // no changes, plus...
<strong>$config-&gt;set('HTML.TidyAdd', 'p@align');</strong></pre>
<p>In this case, all transformations are shut off, except for the p@align
one, which you found handy.</p>
<p>To find out what the names of fixes you want to turn on or off are,
you'll have to consult the source code, specifically the files in
<code>HTMLPurifier/HTMLModule/Tidy/</code>. There is, however, a
general syntax:</p>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Example</th>
<th>Interpretation</th>
</tr>
</thead>
<tbody>
<tr>
<td>element</td>
<td>font</td>
<td>Tag transform for <em>element</em></td>
</tr>
<tr>
<td>element@attr</td>
<td>br@clear</td>
<td>Attribute transform for <em>attr</em> on <em>element</em></td>
</tr>
<tr>
<td>@attr</td>
<td>@lang</td>
<td>Global attribute transform for <em>attr</em></td>
</tr>
<tr>
<td>e#content_model_type</td>
<td>blockquote#content_model_type</td>
<td>Change of child processing implementation for <em>e</em></td>
</tr>
</tbody>
</table>
<h2>So... what's the lowdown?</h2>
<p>The lowdown is, quite frankly, HTML Purifier's default settings are
probably good enough. The next step is to bump the level up to heavy,
and if that still doesn't satisfy your appetite, do some fine-tuning.
Other than that, don't worry about it: this all works silently and
effectively in the background.</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,204 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Tutorial for creating custom URI filters." />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>URI Filters - HTML Purifier</title>
</head><body>
<h1>URI Filters</h1>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>
This is a quick and dirty document to get you on your way to writing
custom URI filters for your own URL filtering needs. Why would you
want to write a URI filter? If you need URIs your users put into
HTML to magically change into a different URI, this is
exactly what you need!
</p>
<h2>Creating the class</h2>
<p>
Any URI filter you make will be a subclass of <code>HTMLPurifier_URIFilter</code>.
The scaffolding is thus:
</p>
<pre>class HTMLPurifier_URIFilter_<strong>NameOfFilter</strong> extends HTMLPurifier_URIFilter
{
public $name = '<strong>NameOfFilter</strong>';
public function prepare($config) {}
public function filter(&$uri, $config, $context) {}
}</pre>
<p>
Fill in the variable <code>$name</code> with the name of your filter, and
take a look at the two methods. <code>prepare()</code> is an initialization
method that is called only once, before any filtering has been done of the
HTML. Use it to perform any costly setup work that only needs to be done
once. <code>filter()</code> is the guts and innards of our filter:
it takes the URI and does whatever needs to be done to it.
</p>
<p>
If you've worked with HTML Purifier, you'll recognize the <code>$config</code>
and <code>$context</code> parameters. On the other hand, <code>$uri</code>
is something unique to this section of the application: it's a
<code>HTMLPurifier_URI</code> object. The interface is thus:
</p>
<pre>class HTMLPurifier_URI
{
public $scheme, $userinfo, $host, $port, $path, $query, $fragment;
public function HTMLPurifier_URI($scheme, $userinfo, $host, $port, $path, $query, $fragment);
public function toString();
public function copy();
public function getSchemeObj($config, $context);
public function validate($config, $context);
}</pre>
<p>
The first three methods are fairly self-explanatory: you have a constructor,
a serializer, and a cloner. Generally, you won't be using them when
you are manipulating the URI objects themselves.
<code>getSchemeObj()</code> is a special purpose method that returns
a <code>HTMLPurifier_URIScheme</code> object corresponding to the specific
URI at hand. <code>validate()</code> performs general-purpose validation
on the internal components of a URI. Once again, you don't need to
worry about these: they've already been handled for you.
</p>
<h2>URI format</h2>
<p>
As a URIFilter, we're interested in the member variables of the URI object.
</p>
<table class="quick"><tbody>
<tr><th>Scheme</th> <td>The protocol for identifying (and possibly locating) a resource (http, ftp, https)</td></tr>
<tr><th>Userinfo</th> <td>User information such as a username (bob)</td></tr>
<tr><th>Host</th> <td>Domain name or IP address of the server (example.com, 127.0.0.1)</td></tr>
<tr><th>Port</th> <td>Network port number for the server (80, 12345)</td></tr>
<tr><th>Path</th> <td>Data that identifies the resource, possibly hierarchical (/path/to, ed@example.com)</td></tr>
<tr><th>Query</th> <td>String of information to be interpreted by the resource (?q=search-term)</td></tr>
<tr><th>Fragment</th> <td>Additional information for the resource after retrieval (#bookmark)</td></tr>
</tbody></table>
<p>
Because the URI is presented to us in this form, and not
<code>http://bob@example.com:8080/foo.php?q=string#hash</code>, it saves us
a lot of trouble in having to parse the URI every time we want to filter
it. For the record, the above URI has the following components:
</p>
<table class="quick"><tbody>
<tr><th>Scheme</th> <td>http</td></tr>
<tr><th>Userinfo</th> <td>bob</td></tr>
<tr><th>Host</th> <td>example.com</td></tr>
<tr><th>Port</th> <td>8080</td></tr>
<tr><th>Path</th> <td>/foo.php</td></tr>
<tr><th>Query</th> <td>q=string</td></tr>
<tr><th>Fragment</th> <td>hash</td></tr>
</tbody></table>
<p>
Note that there is no question mark or octothorpe in the query or
fragment: these get removed during parsing.
</p>
<p>
With this information, you can get straight to implementing your
<code>filter()</code> method. But one more thing...
</p>
<h2>Return value: Boolean, not URI</h2>
<p>
You may have noticed that the URI is being passed in by reference.
This means that whatever changes you make to it, those changes will
be reflected in the URI object the callee had. <strong>Do not
return the URI object: it is unnecessary and will cause bugs.</strong>
Instead, return a boolean value, true if the filtering was successful,
or false if the URI is beyond repair and needs to be axed.
</p>
<p>
Let's suppose I wanted to write a filter that converted links with a
custom <code>image</code> scheme to its corresponding real path on
our website:
</p>
<pre>class HTMLPurifier_URIFilter_TransformImageScheme extends HTMLPurifier_URIFilter
{
public $name = 'TransformImageScheme';
public function filter(&$uri, $config, $context) {
if ($uri->scheme !== 'image') return true;
$img_name = $uri->path;
// Overwrite the previous URI object
$uri = new HTMLPurifier_URI('http', null, null, null, '/img/' . $img_name . '.png', null, null);
return true;
}
}</pre>
<p>
Notice I did not <code>return $uri;</code>. This filter would turn
<code>image:Foo</code> into <code>/img/Foo.png</code>.
</p>
<h2>Activating your filter</h2>
<p>
Having a filter is all well and good, but you need to tell HTML Purifier
to use it. Fortunately, this part's simple:
</p>
<pre>$uri = $config->getDefinition('URI');
$uri->addFilter(new HTMLPurifier_URIFilter_<strong>NameOfFilter</strong>(), $config);</pre>
<p>
After adding a filter, you won't be able to set configuration directives.
Structure your code accordingly.
</p>
<!-- XXX: link to new documentation system -->
<h2>Post-filter</h2>
<p>
Remember our TransformImageScheme filter? That filter acted before we had
performed scheme validation; otherwise, the URI would have been filtered
out when it was discovered that there was no image scheme. Well, a post-filter
is run after scheme specific validation, so it's ideal for bulk
post-processing of URIs, including munging. To specify a URI as a post-filter,
set the <code>$post</code> member variable to TRUE.
</p>
<pre>class HTMLPurifier_URIFilter_MyPostFilter extends HTMLPurifier_URIFilter
{
public $name = 'MyPostFilter';
public $post = true;
// ... extra code here
}
</pre>
<h2>Examples</h2>
<p>
Check the
<a href="http://repo.or.cz/w/htmlpurifier.git?a=tree;hb=HEAD;f=library/HTMLPurifier/URIFilter">URIFilter</a>
directory for more implementation examples, and see <a href="proposal-new-directives.txt">the
new directives proposal document</a> for ideas on what could be implemented
as a filter.
</p>
</body></html>
<!-- vim: et sw=4 sts=4
-->

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,153 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Explains how to safely allow the embedding of flash from trusted sites in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Embedding YouTube Videos - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Embedding YouTube Videos</h1>
<div class="subtitle">...as well as other dangerous active content</div>
<div id="filing">Filed under End-User</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Clients like their YouTube videos. It gives them a warm fuzzy feeling when
they see a neat little embedded video player on their websites that can play
the latest clips from their documentary &quot;Fido and the Bones of Spring&quot;.
All joking aside, the ability to embed YouTube videos or other active
content in their pages is something that a lot of people like.</p>
<p>This is a <em>bad</em> idea. The moment you embed anything untrusted,
you will definitely be slammed by a manner of nasties that can be
embedded in things from your run of the mill Flash movie to
<a href="http://blog.spywareguide.com/2006/12/myspace_phish_attack_leads_use.html">Quicktime movies</a>.
Even <code>img</code> tags, which HTML Purifier allows by default, can be
dangerous. Be distrustful of anything that tells a browser to load content
from another website automatically.</p>
<p>Luckily for us, however, whitelisting saves the day. Sure, letting users
include any old random flash file could be dangerous, but if it's
from a specific website, it probably is okay. If no amount of pleading will
convince the people upstairs that they should just settle with just linking
to their movies, you may find this technique very useful.</p>
<h2>Looking in</h2>
<p>Below is custom code that allows users to embed
YouTube videos. This is not favoritism: this trick can easily be adapted for
other forms of embeddable content.</p>
<p>Usually, websites like YouTube give us boilerplate code that you can insert
into your documents. YouTube's code goes like this:</p>
<pre>
&lt;object width=&quot;425&quot; height=&quot;350&quot;&gt;
&lt;param name=&quot;movie&quot; value=&quot;http://www.youtube.com/v/AyPzM5WK8ys&quot; /&gt;
&lt;param name=&quot;wmode&quot; value=&quot;transparent&quot; /&gt;
&lt;embed src=&quot;http://www.youtube.com/v/AyPzM5WK8ys&quot;
type=&quot;application/x-shockwave-flash&quot;
wmode=&quot;transparent&quot; width=&quot;425&quot; height=&quot;350&quot; /&gt;
&lt;/object&gt;
</pre>
<p>There are two things to note about this code:</p>
<ol>
<li><code>&lt;embed&gt;</code> is not recognized by W3C, so if you want
standards-compliant code, you'll have to get rid of it.</li>
<li>The code is exactly the same for all instances, except for the
identifier <tt>AyPzM5WK8ys</tt> which tells us which movie file
to retrieve.</li>
</ol>
<p>What point 2 means is that if we have code like <code>&lt;span
class=&quot;youtube-embed&quot;&gt;AyPzM5WK8ys&lt;/span&gt;</code> your
application can reconstruct the full object from this small snippet that
passes through HTML Purifier <em>unharmed</em>.
<a href="http://repo.or.cz/w/htmlpurifier.git?a=blob;hb=HEAD;f=library/HTMLPurifier/Filter/YouTube.php">Show me the code!</a></p>
<p>And the corresponding usage:</p>
<pre>&lt;?php
$config-&gt;set('Filter.YouTube', true);
?&gt;</pre>
<p>There is a bit going in the two code snippets, so let's explain.</p>
<ol>
<li>This is a Filter object, which intercepts the HTML that is
coming into and out of the purifier. You can add as many
filter objects as you like. <code>preFilter()</code>
processes the code before it gets purified, and <code>postFilter()</code>
processes the code afterwards. So, we'll use <code>preFilter()</code> to
replace the object tag with a <code>span</code>, and <code>postFilter()</code>
to restore it.</li>
<li>The first preg_replace call replaces any YouTube code users may have
embedded into the benign span tag. Span is used because it is inline,
and objects are inline too. We are very careful to be extremely
restrictive on what goes inside the span tag, as if an errant code
gets in there it could get messy.</li>
<li>The HTML is then purified as usual.</li>
<li>Then, another preg_replace replaces the span tag with a fully fledged
object. Note that the embed is removed, and, in its place, a data
attribute was added to the object. This makes the tag standards
compliant! It also breaks Internet Explorer, so we add in a bit of
conditional comments with the old embed code to make it work again.
It's all quite convoluted but works.</li>
</ol>
<h2>Warning</h2>
<p>There are a number of possible problems with the code above, depending
on how you look at it.</p>
<h3>Cannot change width and height</h3>
<p>The width and height of the final YouTube movie cannot be adjusted. This
is because I am lazy. If you really insist on letting users change the size
of the movie, what you need to do is package up the attributes inside the
span tag (along with the movie ID). It gets complicated though: a malicious
user can specify an outrageously large height and width and attempt to crash
the user's operating system/browser. You need to either cap it by limiting
the amount of digits allowed in the regex or using a callback to check the
number.</p>
<h3>Trusts media's host's security</h3>
<p>By allowing this code onto our website, we are trusting that YouTube has
tech-savvy enough people not to allow their users to inject malicious
code into the Flash files. An exploit on YouTube means an exploit on your
site. Even though YouTube is run by the reputable Google, it
<a href="http://ha.ckers.org/blog/20061213/google-xss-vuln/">doesn't</a>
mean they are
<a href="http://ha.ckers.org/blog/20061208/xss-in-googles-orkut/">invulnerable.</a>
You're putting a certain measure of the job on an external provider (just as
you have by entrusting your user input to HTML Purifier), and
it is important that you are cognizant of the risk.</p>
<h3>Poorly written adaptations compromise security</h3>
<p>This should go without saying, but if you're going to adapt this code
for Google Video or the like, make sure you do it <em>right</em>. It's
extremely easy to allow a character too many in <code>postFilter()</code> and
suddenly you're introducing XSS into HTML Purifier's XSS free output. HTML
Purifier may be well written, but it cannot guard against vulnerabilities
introduced after it has finished.</p>
<h2>Help out!</h2>
<p>If you write a filter for your favorite video destination (or anything
like that, for that matter), send it over and it might get included
with the core!</p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,196 @@
<!-- Portions (C) International Organization for Standardization 1986
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
-->
<!ENTITY nbsp "&#160;"> <!-- no-break space = non-breaking space,
U+00A0 ISOnum -->
<!ENTITY iexcl "&#161;"> <!-- inverted exclamation mark, U+00A1 ISOnum -->
<!ENTITY cent "&#162;"> <!-- cent sign, U+00A2 ISOnum -->
<!ENTITY pound "&#163;"> <!-- pound sign, U+00A3 ISOnum -->
<!ENTITY curren "&#164;"> <!-- currency sign, U+00A4 ISOnum -->
<!ENTITY yen "&#165;"> <!-- yen sign = yuan sign, U+00A5 ISOnum -->
<!ENTITY brvbar "&#166;"> <!-- broken bar = broken vertical bar,
U+00A6 ISOnum -->
<!ENTITY sect "&#167;"> <!-- section sign, U+00A7 ISOnum -->
<!ENTITY uml "&#168;"> <!-- diaeresis = spacing diaeresis,
U+00A8 ISOdia -->
<!ENTITY copy "&#169;"> <!-- copyright sign, U+00A9 ISOnum -->
<!ENTITY ordf "&#170;"> <!-- feminine ordinal indicator, U+00AA ISOnum -->
<!ENTITY laquo "&#171;"> <!-- left-pointing double angle quotation mark
= left pointing guillemet, U+00AB ISOnum -->
<!ENTITY not "&#172;"> <!-- not sign = angled dash,
U+00AC ISOnum -->
<!ENTITY shy "&#173;"> <!-- soft hyphen = discretionary hyphen,
U+00AD ISOnum -->
<!ENTITY reg "&#174;"> <!-- registered sign = registered trade mark sign,
U+00AE ISOnum -->
<!ENTITY macr "&#175;"> <!-- macron = spacing macron = overline
= APL overbar, U+00AF ISOdia -->
<!ENTITY deg "&#176;"> <!-- degree sign, U+00B0 ISOnum -->
<!ENTITY plusmn "&#177;"> <!-- plus-minus sign = plus-or-minus sign,
U+00B1 ISOnum -->
<!ENTITY sup2 "&#178;"> <!-- superscript two = superscript digit two
= squared, U+00B2 ISOnum -->
<!ENTITY sup3 "&#179;"> <!-- superscript three = superscript digit three
= cubed, U+00B3 ISOnum -->
<!ENTITY acute "&#180;"> <!-- acute accent = spacing acute,
U+00B4 ISOdia -->
<!ENTITY micro "&#181;"> <!-- micro sign, U+00B5 ISOnum -->
<!ENTITY para "&#182;"> <!-- pilcrow sign = paragraph sign,
U+00B6 ISOnum -->
<!ENTITY middot "&#183;"> <!-- middle dot = Georgian comma
= Greek middle dot, U+00B7 ISOnum -->
<!ENTITY cedil "&#184;"> <!-- cedilla = spacing cedilla, U+00B8 ISOdia -->
<!ENTITY sup1 "&#185;"> <!-- superscript one = superscript digit one,
U+00B9 ISOnum -->
<!ENTITY ordm "&#186;"> <!-- masculine ordinal indicator,
U+00BA ISOnum -->
<!ENTITY raquo "&#187;"> <!-- right-pointing double angle quotation mark
= right pointing guillemet, U+00BB ISOnum -->
<!ENTITY frac14 "&#188;"> <!-- vulgar fraction one quarter
= fraction one quarter, U+00BC ISOnum -->
<!ENTITY frac12 "&#189;"> <!-- vulgar fraction one half
= fraction one half, U+00BD ISOnum -->
<!ENTITY frac34 "&#190;"> <!-- vulgar fraction three quarters
= fraction three quarters, U+00BE ISOnum -->
<!ENTITY iquest "&#191;"> <!-- inverted question mark
= turned question mark, U+00BF ISOnum -->
<!ENTITY Agrave "&#192;"> <!-- latin capital letter A with grave
= latin capital letter A grave,
U+00C0 ISOlat1 -->
<!ENTITY Aacute "&#193;"> <!-- latin capital letter A with acute,
U+00C1 ISOlat1 -->
<!ENTITY Acirc "&#194;"> <!-- latin capital letter A with circumflex,
U+00C2 ISOlat1 -->
<!ENTITY Atilde "&#195;"> <!-- latin capital letter A with tilde,
U+00C3 ISOlat1 -->
<!ENTITY Auml "&#196;"> <!-- latin capital letter A with diaeresis,
U+00C4 ISOlat1 -->
<!ENTITY Aring "&#197;"> <!-- latin capital letter A with ring above
= latin capital letter A ring,
U+00C5 ISOlat1 -->
<!ENTITY AElig "&#198;"> <!-- latin capital letter AE
= latin capital ligature AE,
U+00C6 ISOlat1 -->
<!ENTITY Ccedil "&#199;"> <!-- latin capital letter C with cedilla,
U+00C7 ISOlat1 -->
<!ENTITY Egrave "&#200;"> <!-- latin capital letter E with grave,
U+00C8 ISOlat1 -->
<!ENTITY Eacute "&#201;"> <!-- latin capital letter E with acute,
U+00C9 ISOlat1 -->
<!ENTITY Ecirc "&#202;"> <!-- latin capital letter E with circumflex,
U+00CA ISOlat1 -->
<!ENTITY Euml "&#203;"> <!-- latin capital letter E with diaeresis,
U+00CB ISOlat1 -->
<!ENTITY Igrave "&#204;"> <!-- latin capital letter I with grave,
U+00CC ISOlat1 -->
<!ENTITY Iacute "&#205;"> <!-- latin capital letter I with acute,
U+00CD ISOlat1 -->
<!ENTITY Icirc "&#206;"> <!-- latin capital letter I with circumflex,
U+00CE ISOlat1 -->
<!ENTITY Iuml "&#207;"> <!-- latin capital letter I with diaeresis,
U+00CF ISOlat1 -->
<!ENTITY ETH "&#208;"> <!-- latin capital letter ETH, U+00D0 ISOlat1 -->
<!ENTITY Ntilde "&#209;"> <!-- latin capital letter N with tilde,
U+00D1 ISOlat1 -->
<!ENTITY Ograve "&#210;"> <!-- latin capital letter O with grave,
U+00D2 ISOlat1 -->
<!ENTITY Oacute "&#211;"> <!-- latin capital letter O with acute,
U+00D3 ISOlat1 -->
<!ENTITY Ocirc "&#212;"> <!-- latin capital letter O with circumflex,
U+00D4 ISOlat1 -->
<!ENTITY Otilde "&#213;"> <!-- latin capital letter O with tilde,
U+00D5 ISOlat1 -->
<!ENTITY Ouml "&#214;"> <!-- latin capital letter O with diaeresis,
U+00D6 ISOlat1 -->
<!ENTITY times "&#215;"> <!-- multiplication sign, U+00D7 ISOnum -->
<!ENTITY Oslash "&#216;"> <!-- latin capital letter O with stroke
= latin capital letter O slash,
U+00D8 ISOlat1 -->
<!ENTITY Ugrave "&#217;"> <!-- latin capital letter U with grave,
U+00D9 ISOlat1 -->
<!ENTITY Uacute "&#218;"> <!-- latin capital letter U with acute,
U+00DA ISOlat1 -->
<!ENTITY Ucirc "&#219;"> <!-- latin capital letter U with circumflex,
U+00DB ISOlat1 -->
<!ENTITY Uuml "&#220;"> <!-- latin capital letter U with diaeresis,
U+00DC ISOlat1 -->
<!ENTITY Yacute "&#221;"> <!-- latin capital letter Y with acute,
U+00DD ISOlat1 -->
<!ENTITY THORN "&#222;"> <!-- latin capital letter THORN,
U+00DE ISOlat1 -->
<!ENTITY szlig "&#223;"> <!-- latin small letter sharp s = ess-zed,
U+00DF ISOlat1 -->
<!ENTITY agrave "&#224;"> <!-- latin small letter a with grave
= latin small letter a grave,
U+00E0 ISOlat1 -->
<!ENTITY aacute "&#225;"> <!-- latin small letter a with acute,
U+00E1 ISOlat1 -->
<!ENTITY acirc "&#226;"> <!-- latin small letter a with circumflex,
U+00E2 ISOlat1 -->
<!ENTITY atilde "&#227;"> <!-- latin small letter a with tilde,
U+00E3 ISOlat1 -->
<!ENTITY auml "&#228;"> <!-- latin small letter a with diaeresis,
U+00E4 ISOlat1 -->
<!ENTITY aring "&#229;"> <!-- latin small letter a with ring above
= latin small letter a ring,
U+00E5 ISOlat1 -->
<!ENTITY aelig "&#230;"> <!-- latin small letter ae
= latin small ligature ae, U+00E6 ISOlat1 -->
<!ENTITY ccedil "&#231;"> <!-- latin small letter c with cedilla,
U+00E7 ISOlat1 -->
<!ENTITY egrave "&#232;"> <!-- latin small letter e with grave,
U+00E8 ISOlat1 -->
<!ENTITY eacute "&#233;"> <!-- latin small letter e with acute,
U+00E9 ISOlat1 -->
<!ENTITY ecirc "&#234;"> <!-- latin small letter e with circumflex,
U+00EA ISOlat1 -->
<!ENTITY euml "&#235;"> <!-- latin small letter e with diaeresis,
U+00EB ISOlat1 -->
<!ENTITY igrave "&#236;"> <!-- latin small letter i with grave,
U+00EC ISOlat1 -->
<!ENTITY iacute "&#237;"> <!-- latin small letter i with acute,
U+00ED ISOlat1 -->
<!ENTITY icirc "&#238;"> <!-- latin small letter i with circumflex,
U+00EE ISOlat1 -->
<!ENTITY iuml "&#239;"> <!-- latin small letter i with diaeresis,
U+00EF ISOlat1 -->
<!ENTITY eth "&#240;"> <!-- latin small letter eth, U+00F0 ISOlat1 -->
<!ENTITY ntilde "&#241;"> <!-- latin small letter n with tilde,
U+00F1 ISOlat1 -->
<!ENTITY ograve "&#242;"> <!-- latin small letter o with grave,
U+00F2 ISOlat1 -->
<!ENTITY oacute "&#243;"> <!-- latin small letter o with acute,
U+00F3 ISOlat1 -->
<!ENTITY ocirc "&#244;"> <!-- latin small letter o with circumflex,
U+00F4 ISOlat1 -->
<!ENTITY otilde "&#245;"> <!-- latin small letter o with tilde,
U+00F5 ISOlat1 -->
<!ENTITY ouml "&#246;"> <!-- latin small letter o with diaeresis,
U+00F6 ISOlat1 -->
<!ENTITY divide "&#247;"> <!-- division sign, U+00F7 ISOnum -->
<!ENTITY oslash "&#248;"> <!-- latin small letter o with stroke,
= latin small letter o slash,
U+00F8 ISOlat1 -->
<!ENTITY ugrave "&#249;"> <!-- latin small letter u with grave,
U+00F9 ISOlat1 -->
<!ENTITY uacute "&#250;"> <!-- latin small letter u with acute,
U+00FA ISOlat1 -->
<!ENTITY ucirc "&#251;"> <!-- latin small letter u with circumflex,
U+00FB ISOlat1 -->
<!ENTITY uuml "&#252;"> <!-- latin small letter u with diaeresis,
U+00FC ISOlat1 -->
<!ENTITY yacute "&#253;"> <!-- latin small letter y with acute,
U+00FD ISOlat1 -->
<!ENTITY thorn "&#254;"> <!-- latin small letter thorn,
U+00FE ISOlat1 -->
<!ENTITY yuml "&#255;"> <!-- latin small letter y with diaeresis,
U+00FF ISOlat1 -->

View File

@ -0,0 +1,80 @@
<!-- Special characters for XHTML -->
<!-- Character entity set. Typical invocation:
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
-->
<!-- Portions (C) International Organization for Standardization 1986:
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->
<!-- Relevant ISO entity set is given unless names are newly introduced.
New names (i.e., not in ISO 8879 list) do not clash with any
existing ISO 8879 entity names. ISO 10646 character numbers
are given for each character, in hex. values are decimal
conversions of the ISO 10646 values and refer to the document
character set. Names are Unicode names.
-->
<!-- C0 Controls and Basic Latin -->
<!ENTITY quot "&#34;"> <!-- quotation mark, U+0022 ISOnum -->
<!ENTITY amp "&#38;#38;"> <!-- ampersand, U+0026 ISOnum -->
<!ENTITY lt "&#38;#60;"> <!-- less-than sign, U+003C ISOnum -->
<!ENTITY gt "&#62;"> <!-- greater-than sign, U+003E ISOnum -->
<!ENTITY apos "&#39;"> <!-- apostrophe = APL quote, U+0027 ISOnum -->
<!-- Latin Extended-A -->
<!ENTITY OElig "&#338;"> <!-- latin capital ligature OE,
U+0152 ISOlat2 -->
<!ENTITY oelig "&#339;"> <!-- latin small ligature oe, U+0153 ISOlat2 -->
<!-- ligature is a misnomer, this is a separate character in some languages -->
<!ENTITY Scaron "&#352;"> <!-- latin capital letter S with caron,
U+0160 ISOlat2 -->
<!ENTITY scaron "&#353;"> <!-- latin small letter s with caron,
U+0161 ISOlat2 -->
<!ENTITY Yuml "&#376;"> <!-- latin capital letter Y with diaeresis,
U+0178 ISOlat2 -->
<!-- Spacing Modifier Letters -->
<!ENTITY circ "&#710;"> <!-- modifier letter circumflex accent,
U+02C6 ISOpub -->
<!ENTITY tilde "&#732;"> <!-- small tilde, U+02DC ISOdia -->
<!-- General Punctuation -->
<!ENTITY ensp "&#8194;"> <!-- en space, U+2002 ISOpub -->
<!ENTITY emsp "&#8195;"> <!-- em space, U+2003 ISOpub -->
<!ENTITY thinsp "&#8201;"> <!-- thin space, U+2009 ISOpub -->
<!ENTITY zwnj "&#8204;"> <!-- zero width non-joiner,
U+200C NEW RFC 2070 -->
<!ENTITY zwj "&#8205;"> <!-- zero width joiner, U+200D NEW RFC 2070 -->
<!ENTITY lrm "&#8206;"> <!-- left-to-right mark, U+200E NEW RFC 2070 -->
<!ENTITY rlm "&#8207;"> <!-- right-to-left mark, U+200F NEW RFC 2070 -->
<!ENTITY ndash "&#8211;"> <!-- en dash, U+2013 ISOpub -->
<!ENTITY mdash "&#8212;"> <!-- em dash, U+2014 ISOpub -->
<!ENTITY lsquo "&#8216;"> <!-- left single quotation mark,
U+2018 ISOnum -->
<!ENTITY rsquo "&#8217;"> <!-- right single quotation mark,
U+2019 ISOnum -->
<!ENTITY sbquo "&#8218;"> <!-- single low-9 quotation mark, U+201A NEW -->
<!ENTITY ldquo "&#8220;"> <!-- left double quotation mark,
U+201C ISOnum -->
<!ENTITY rdquo "&#8221;"> <!-- right double quotation mark,
U+201D ISOnum -->
<!ENTITY bdquo "&#8222;"> <!-- double low-9 quotation mark, U+201E NEW -->
<!ENTITY dagger "&#8224;"> <!-- dagger, U+2020 ISOpub -->
<!ENTITY Dagger "&#8225;"> <!-- double dagger, U+2021 ISOpub -->
<!ENTITY permil "&#8240;"> <!-- per mille sign, U+2030 ISOtech -->
<!ENTITY lsaquo "&#8249;"> <!-- single left-pointing angle quotation mark,
U+2039 ISO proposed -->
<!-- lsaquo is proposed but not yet ISO standardized -->
<!ENTITY rsaquo "&#8250;"> <!-- single right-pointing angle quotation mark,
U+203A ISO proposed -->
<!-- rsaquo is proposed but not yet ISO standardized -->
<!-- Currency Symbols -->
<!ENTITY euro "&#8364;"> <!-- euro sign, U+20AC NEW -->

View File

@ -0,0 +1,237 @@
<!-- Mathematical, Greek and Symbolic characters for XHTML -->
<!-- Character entity set. Typical invocation:
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;
-->
<!-- Portions (C) International Organization for Standardization 1986:
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->
<!-- Relevant ISO entity set is given unless names are newly introduced.
New names (i.e., not in ISO 8879 list) do not clash with any
existing ISO 8879 entity names. ISO 10646 character numbers
are given for each character, in hex. values are decimal
conversions of the ISO 10646 values and refer to the document
character set. Names are Unicode names.
-->
<!-- Latin Extended-B -->
<!ENTITY fnof "&#402;"> <!-- latin small letter f with hook = function
= florin, U+0192 ISOtech -->
<!-- Greek -->
<!ENTITY Alpha "&#913;"> <!-- greek capital letter alpha, U+0391 -->
<!ENTITY Beta "&#914;"> <!-- greek capital letter beta, U+0392 -->
<!ENTITY Gamma "&#915;"> <!-- greek capital letter gamma,
U+0393 ISOgrk3 -->
<!ENTITY Delta "&#916;"> <!-- greek capital letter delta,
U+0394 ISOgrk3 -->
<!ENTITY Epsilon "&#917;"> <!-- greek capital letter epsilon, U+0395 -->
<!ENTITY Zeta "&#918;"> <!-- greek capital letter zeta, U+0396 -->
<!ENTITY Eta "&#919;"> <!-- greek capital letter eta, U+0397 -->
<!ENTITY Theta "&#920;"> <!-- greek capital letter theta,
U+0398 ISOgrk3 -->
<!ENTITY Iota "&#921;"> <!-- greek capital letter iota, U+0399 -->
<!ENTITY Kappa "&#922;"> <!-- greek capital letter kappa, U+039A -->
<!ENTITY Lambda "&#923;"> <!-- greek capital letter lamda,
U+039B ISOgrk3 -->
<!ENTITY Mu "&#924;"> <!-- greek capital letter mu, U+039C -->
<!ENTITY Nu "&#925;"> <!-- greek capital letter nu, U+039D -->
<!ENTITY Xi "&#926;"> <!-- greek capital letter xi, U+039E ISOgrk3 -->
<!ENTITY Omicron "&#927;"> <!-- greek capital letter omicron, U+039F -->
<!ENTITY Pi "&#928;"> <!-- greek capital letter pi, U+03A0 ISOgrk3 -->
<!ENTITY Rho "&#929;"> <!-- greek capital letter rho, U+03A1 -->
<!-- there is no Sigmaf, and no U+03A2 character either -->
<!ENTITY Sigma "&#931;"> <!-- greek capital letter sigma,
U+03A3 ISOgrk3 -->
<!ENTITY Tau "&#932;"> <!-- greek capital letter tau, U+03A4 -->
<!ENTITY Upsilon "&#933;"> <!-- greek capital letter upsilon,
U+03A5 ISOgrk3 -->
<!ENTITY Phi "&#934;"> <!-- greek capital letter phi,
U+03A6 ISOgrk3 -->
<!ENTITY Chi "&#935;"> <!-- greek capital letter chi, U+03A7 -->
<!ENTITY Psi "&#936;"> <!-- greek capital letter psi,
U+03A8 ISOgrk3 -->
<!ENTITY Omega "&#937;"> <!-- greek capital letter omega,
U+03A9 ISOgrk3 -->
<!ENTITY alpha "&#945;"> <!-- greek small letter alpha,
U+03B1 ISOgrk3 -->
<!ENTITY beta "&#946;"> <!-- greek small letter beta, U+03B2 ISOgrk3 -->
<!ENTITY gamma "&#947;"> <!-- greek small letter gamma,
U+03B3 ISOgrk3 -->
<!ENTITY delta "&#948;"> <!-- greek small letter delta,
U+03B4 ISOgrk3 -->
<!ENTITY epsilon "&#949;"> <!-- greek small letter epsilon,
U+03B5 ISOgrk3 -->
<!ENTITY zeta "&#950;"> <!-- greek small letter zeta, U+03B6 ISOgrk3 -->
<!ENTITY eta "&#951;"> <!-- greek small letter eta, U+03B7 ISOgrk3 -->
<!ENTITY theta "&#952;"> <!-- greek small letter theta,
U+03B8 ISOgrk3 -->
<!ENTITY iota "&#953;"> <!-- greek small letter iota, U+03B9 ISOgrk3 -->
<!ENTITY kappa "&#954;"> <!-- greek small letter kappa,
U+03BA ISOgrk3 -->
<!ENTITY lambda "&#955;"> <!-- greek small letter lamda,
U+03BB ISOgrk3 -->
<!ENTITY mu "&#956;"> <!-- greek small letter mu, U+03BC ISOgrk3 -->
<!ENTITY nu "&#957;"> <!-- greek small letter nu, U+03BD ISOgrk3 -->
<!ENTITY xi "&#958;"> <!-- greek small letter xi, U+03BE ISOgrk3 -->
<!ENTITY omicron "&#959;"> <!-- greek small letter omicron, U+03BF NEW -->
<!ENTITY pi "&#960;"> <!-- greek small letter pi, U+03C0 ISOgrk3 -->
<!ENTITY rho "&#961;"> <!-- greek small letter rho, U+03C1 ISOgrk3 -->
<!ENTITY sigmaf "&#962;"> <!-- greek small letter final sigma,
U+03C2 ISOgrk3 -->
<!ENTITY sigma "&#963;"> <!-- greek small letter sigma,
U+03C3 ISOgrk3 -->
<!ENTITY tau "&#964;"> <!-- greek small letter tau, U+03C4 ISOgrk3 -->
<!ENTITY upsilon "&#965;"> <!-- greek small letter upsilon,
U+03C5 ISOgrk3 -->
<!ENTITY phi "&#966;"> <!-- greek small letter phi, U+03C6 ISOgrk3 -->
<!ENTITY chi "&#967;"> <!-- greek small letter chi, U+03C7 ISOgrk3 -->
<!ENTITY psi "&#968;"> <!-- greek small letter psi, U+03C8 ISOgrk3 -->
<!ENTITY omega "&#969;"> <!-- greek small letter omega,
U+03C9 ISOgrk3 -->
<!ENTITY thetasym "&#977;"> <!-- greek theta symbol,
U+03D1 NEW -->
<!ENTITY upsih "&#978;"> <!-- greek upsilon with hook symbol,
U+03D2 NEW -->
<!ENTITY piv "&#982;"> <!-- greek pi symbol, U+03D6 ISOgrk3 -->
<!-- General Punctuation -->
<!ENTITY bull "&#8226;"> <!-- bullet = black small circle,
U+2022 ISOpub -->
<!-- bullet is NOT the same as bullet operator, U+2219 -->
<!ENTITY hellip "&#8230;"> <!-- horizontal ellipsis = three dot leader,
U+2026 ISOpub -->
<!ENTITY prime "&#8242;"> <!-- prime = minutes = feet, U+2032 ISOtech -->
<!ENTITY Prime "&#8243;"> <!-- double prime = seconds = inches,
U+2033 ISOtech -->
<!ENTITY oline "&#8254;"> <!-- overline = spacing overscore,
U+203E NEW -->
<!ENTITY frasl "&#8260;"> <!-- fraction slash, U+2044 NEW -->
<!-- Letterlike Symbols -->
<!ENTITY weierp "&#8472;"> <!-- script capital P = power set
= Weierstrass p, U+2118 ISOamso -->
<!ENTITY image "&#8465;"> <!-- black-letter capital I = imaginary part,
U+2111 ISOamso -->
<!ENTITY real "&#8476;"> <!-- black-letter capital R = real part symbol,
U+211C ISOamso -->
<!ENTITY trade "&#8482;"> <!-- trade mark sign, U+2122 ISOnum -->
<!ENTITY alefsym "&#8501;"> <!-- alef symbol = first transfinite cardinal,
U+2135 NEW -->
<!-- alef symbol is NOT the same as hebrew letter alef,
U+05D0 although the same glyph could be used to depict both characters -->
<!-- Arrows -->
<!ENTITY larr "&#8592;"> <!-- leftwards arrow, U+2190 ISOnum -->
<!ENTITY uarr "&#8593;"> <!-- upwards arrow, U+2191 ISOnum-->
<!ENTITY rarr "&#8594;"> <!-- rightwards arrow, U+2192 ISOnum -->
<!ENTITY darr "&#8595;"> <!-- downwards arrow, U+2193 ISOnum -->
<!ENTITY harr "&#8596;"> <!-- left right arrow, U+2194 ISOamsa -->
<!ENTITY crarr "&#8629;"> <!-- downwards arrow with corner leftwards
= carriage return, U+21B5 NEW -->
<!ENTITY lArr "&#8656;"> <!-- leftwards double arrow, U+21D0 ISOtech -->
<!-- Unicode does not say that lArr is the same as the 'is implied by' arrow
but also does not have any other character for that function. So lArr can
be used for 'is implied by' as ISOtech suggests -->
<!ENTITY uArr "&#8657;"> <!-- upwards double arrow, U+21D1 ISOamsa -->
<!ENTITY rArr "&#8658;"> <!-- rightwards double arrow,
U+21D2 ISOtech -->
<!-- Unicode does not say this is the 'implies' character but does not have
another character with this function so rArr can be used for 'implies'
as ISOtech suggests -->
<!ENTITY dArr "&#8659;"> <!-- downwards double arrow, U+21D3 ISOamsa -->
<!ENTITY hArr "&#8660;"> <!-- left right double arrow,
U+21D4 ISOamsa -->
<!-- Mathematical Operators -->
<!ENTITY forall "&#8704;"> <!-- for all, U+2200 ISOtech -->
<!ENTITY part "&#8706;"> <!-- partial differential, U+2202 ISOtech -->
<!ENTITY exist "&#8707;"> <!-- there exists, U+2203 ISOtech -->
<!ENTITY empty "&#8709;"> <!-- empty set = null set, U+2205 ISOamso -->
<!ENTITY nabla "&#8711;"> <!-- nabla = backward difference,
U+2207 ISOtech -->
<!ENTITY isin "&#8712;"> <!-- element of, U+2208 ISOtech -->
<!ENTITY notin "&#8713;"> <!-- not an element of, U+2209 ISOtech -->
<!ENTITY ni "&#8715;"> <!-- contains as member, U+220B ISOtech -->
<!ENTITY prod "&#8719;"> <!-- n-ary product = product sign,
U+220F ISOamsb -->
<!-- prod is NOT the same character as U+03A0 'greek capital letter pi' though
the same glyph might be used for both -->
<!ENTITY sum "&#8721;"> <!-- n-ary summation, U+2211 ISOamsb -->
<!-- sum is NOT the same character as U+03A3 'greek capital letter sigma'
though the same glyph might be used for both -->
<!ENTITY minus "&#8722;"> <!-- minus sign, U+2212 ISOtech -->
<!ENTITY lowast "&#8727;"> <!-- asterisk operator, U+2217 ISOtech -->
<!ENTITY radic "&#8730;"> <!-- square root = radical sign,
U+221A ISOtech -->
<!ENTITY prop "&#8733;"> <!-- proportional to, U+221D ISOtech -->
<!ENTITY infin "&#8734;"> <!-- infinity, U+221E ISOtech -->
<!ENTITY ang "&#8736;"> <!-- angle, U+2220 ISOamso -->
<!ENTITY and "&#8743;"> <!-- logical and = wedge, U+2227 ISOtech -->
<!ENTITY or "&#8744;"> <!-- logical or = vee, U+2228 ISOtech -->
<!ENTITY cap "&#8745;"> <!-- intersection = cap, U+2229 ISOtech -->
<!ENTITY cup "&#8746;"> <!-- union = cup, U+222A ISOtech -->
<!ENTITY int "&#8747;"> <!-- integral, U+222B ISOtech -->
<!ENTITY there4 "&#8756;"> <!-- therefore, U+2234 ISOtech -->
<!ENTITY sim "&#8764;"> <!-- tilde operator = varies with = similar to,
U+223C ISOtech -->
<!-- tilde operator is NOT the same character as the tilde, U+007E,
although the same glyph might be used to represent both -->
<!ENTITY cong "&#8773;"> <!-- approximately equal to, U+2245 ISOtech -->
<!ENTITY asymp "&#8776;"> <!-- almost equal to = asymptotic to,
U+2248 ISOamsr -->
<!ENTITY ne "&#8800;"> <!-- not equal to, U+2260 ISOtech -->
<!ENTITY equiv "&#8801;"> <!-- identical to, U+2261 ISOtech -->
<!ENTITY le "&#8804;"> <!-- less-than or equal to, U+2264 ISOtech -->
<!ENTITY ge "&#8805;"> <!-- greater-than or equal to,
U+2265 ISOtech -->
<!ENTITY sub "&#8834;"> <!-- subset of, U+2282 ISOtech -->
<!ENTITY sup "&#8835;"> <!-- superset of, U+2283 ISOtech -->
<!ENTITY nsub "&#8836;"> <!-- not a subset of, U+2284 ISOamsn -->
<!ENTITY sube "&#8838;"> <!-- subset of or equal to, U+2286 ISOtech -->
<!ENTITY supe "&#8839;"> <!-- superset of or equal to,
U+2287 ISOtech -->
<!ENTITY oplus "&#8853;"> <!-- circled plus = direct sum,
U+2295 ISOamsb -->
<!ENTITY otimes "&#8855;"> <!-- circled times = vector product,
U+2297 ISOamsb -->
<!ENTITY perp "&#8869;"> <!-- up tack = orthogonal to = perpendicular,
U+22A5 ISOtech -->
<!ENTITY sdot "&#8901;"> <!-- dot operator, U+22C5 ISOamsb -->
<!-- dot operator is NOT the same character as U+00B7 middle dot -->
<!-- Miscellaneous Technical -->
<!ENTITY lceil "&#8968;"> <!-- left ceiling = APL upstile,
U+2308 ISOamsc -->
<!ENTITY rceil "&#8969;"> <!-- right ceiling, U+2309 ISOamsc -->
<!ENTITY lfloor "&#8970;"> <!-- left floor = APL downstile,
U+230A ISOamsc -->
<!ENTITY rfloor "&#8971;"> <!-- right floor, U+230B ISOamsc -->
<!ENTITY lang "&#9001;"> <!-- left-pointing angle bracket = bra,
U+2329 ISOtech -->
<!-- lang is NOT the same character as U+003C 'less than sign'
or U+2039 'single left-pointing angle quotation mark' -->
<!ENTITY rang "&#9002;"> <!-- right-pointing angle bracket = ket,
U+232A ISOtech -->
<!-- rang is NOT the same character as U+003E 'greater than sign'
or U+203A 'single right-pointing angle quotation mark' -->
<!-- Geometric Shapes -->
<!ENTITY loz "&#9674;"> <!-- lozenge, U+25CA ISOpub -->
<!-- Miscellaneous Symbols -->
<!ENTITY spades "&#9824;"> <!-- black spade suit, U+2660 ISOpub -->
<!-- black here seems to mean filled as opposed to hollow -->
<!ENTITY clubs "&#9827;"> <!-- black club suit = shamrock,
U+2663 ISOpub -->
<!ENTITY hearts "&#9829;"> <!-- black heart suit = valentine,
U+2665 ISOpub -->
<!ENTITY diams "&#9830;"> <!-- black diamond suit, U+2666 ISOpub -->

View File

@ -0,0 +1,23 @@
<?php
// This file demonstrates basic usage of HTMLPurifier.
// replace this with the path to the HTML Purifier library
require_once '../../library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
// configuration goes here:
$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
// untrusted input HTML
$html = '<b>Simple and short';
$pure_html = $purifier->purify($html);
echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';
// vim: et sw=4 sts=4

View File

@ -0,0 +1,9 @@
<public:attach event="oncontentready" onevent="init();" />
<script>
function init() {
element.innerHTML = '&#8220;'+element.innerHTML+'&#8221;';
}
</script>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,188 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Index to all HTML Purifier documentation." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Documentation - HTML Purifier</title>
</head>
<body>
<h1>Documentation</h1>
<p><strong><a href="http://htmlpurifier.org/">HTML Purifier</a></strong> has documentation for all types of people.
Here is an index of all of them.</p>
<h2>End-user</h2>
<p>End-user documentation that contains articles, tutorials and useful
information for casual developers using HTML Purifier.</p>
<dl>
<dt><a href="enduser-id.html">IDs</a></dt>
<dd>Explains various methods for allowing IDs in documents safely.</dd>
<dt><a href="enduser-youtube.html">Embedding YouTube videos</a></dt>
<dd>Explains how to safely allow the embedding of flash from trusted sites.</dd>
<dt><a href="enduser-slow.html">Speeding up HTML Purifier</a></dt>
<dd>Explains how to speed up HTML Purifier through caching or inbound filtering.</dd>
<dt><a href="enduser-utf8.html">UTF-8: The Secret of Character Encoding</a></dt>
<dd>Describes the rationale for using UTF-8, the ramifications otherwise, and how to make the switch.</dd>
<dt><a href="enduser-tidy.html">Tidy</a></dt>
<dd>Tutorial for tweaking HTML Purifier's Tidy-like behavior.</dd>
<dt><a href="enduser-customize.html">Customize</a></dt>
<dd>Tutorial for customizing HTML Purifier's tag and attribute sets.</dd>
<dt><a href="enduser-uri-filter.html">URI Filters</a></dt>
<dd>Tutorial for creating custom URI filters.</dd>
</dl>
<h2>Development</h2>
<p>Developer documentation detailing code issues, roadmaps and project
conventions.</p>
<dl>
<dt><a href="dev-progress.html">Implementation Progress</a></dt>
<dd>Tables detailing HTML element and CSS property implementation coverage.</dd>
<dt><a href="dev-naming.html">Naming Conventions</a></dt>
<dd>Defines class naming conventions.</dd>
<dt><a href="dev-optimization.html">Optimization</a></dt>
<dd>Discusses possible methods of optimizing HTML Purifier.</dd>
<dt><a href="dev-flush.html">Flushing the Purifier</a></dt>
<dd>Discusses when to flush HTML Purifier's various caches.</dd>
<dt><a href="dev-advanced-api.html">Advanced API</a></dt>
<dd>Specification for HTML Purifier's advanced API for defining
custom filtering behavior.</dd>
<dt><a href="dev-config-schema.html">Config Schema</a></dt>
<dd>Describes config schema framework in HTML Purifier.</dd>
</dl>
<h2>Proposals</h2>
<p>Proposed features, as well as the associated rambling to get a clear
objective in place before attempted implementation.</p>
<dl>
<dt><a href="proposal-colors.html">Colors</a></dt>
<dd>Proposal to allow for color constraints.</dd>
</dl>
<h2>Reference</h2>
<p>Miscellaneous essays, research pieces and other reference type material
that may not directly discuss HTML Purifier.</p>
<dl>
<dt><a href="ref-devnetwork.html">DevNetwork Credits</a></dt>
<dd>Credits and links to DevNetwork forum topics.</dd>
</dl>
<h2>Internal memos</h2>
<p>Plaintext documents that are more for use by active developers of
the code. They may be upgraded to HTML files or stay as TXT scratchpads.</p>
<table class="table">
<thead><tr>
<th style="width:10%">Type</th>
<th style="width:20%">Name</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>End-user</td>
<td><a href="enduser-overview.txt">Overview</a></td>
<td>High level overview of the general control flow (mostly obsolete).</td>
</tr>
<tr>
<td>End-user</td>
<td><a href="enduser-security.txt">Security</a></td>
<td>Common security issues that may still arise (half-baked).</td>
</tr>
<tr>
<td>Development</td>
<td><a href="dev-config-bcbreaks.txt">Config BC Breaks</a></td>
<td>Backwards-incompatible changes in HTML Purifier 4.0.0</td>
</tr>
<tr>
<td>Development</td>
<td><a href="dev-code-quality.txt">Code Quality Issues</a></td>
<td>Enumerates code quality issues and places that need to be refactored.</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-filter-levels.txt">Filter levels</a></td>
<td>Outlines details of projected configurable level of filtering.</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-language.txt">Language</a></td>
<td>Specification of I18N for error messages derived from MediaWiki (half-baked).</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-new-directives.txt">New directives</a></td>
<td>Assorted configuration options that could be implemented.</td>
</tr>
<tr>
<td>Proposal</td>
<td><a href="proposal-css-extraction.txt">CSS extraction</a></td>
<td>Taking the inline CSS out of documents and into <code>style</code>.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-content-models.txt">Handling Content Model Changes</a></td>
<td>Discusses how to tidy up content model changes using custom ChildDef classes.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-proprietary-tags.txt">Proprietary tags</a></td>
<td>List of vendor-specific tags we may want to transform to W3C compliant markup.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-html-modularization.txt">Modularization of HTMLDefinition</a></td>
<td>Provides a high-level overview of the concepts behind HTMLModules.</td>
</tr>
<tr>
<td>Reference</td>
<td><a href="ref-whatwg.txt">WHATWG</a></td>
<td>How WHATWG plays into what we need to do.</td>
</tr>
</tbody>
</table>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,49 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Proposal to allow for color constraints in HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>Proposal: Colors - HTML Purifier</title>
</head><body>
<h1 class="subtitled">Colors</h1>
<div class="subtitle">Hammering some sense into those color-blind newbies</div>
<div id="filing">Filed under Proposals</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Your website probably has a color-scheme.
<span style="color:#090; background:#FFF;">Green on white</span>,
<span style="color:#A0F; background:#FF0;">purple on yellow</span>,
whatever. When you give users the ability to style their content, you may
want them to keep in line with your styling. If you're website is all
about light colors, you don't want a user to come in and vandalize your
page with a deep maroon.</p>
<p>This is an extremely silly feature proposal, but I'm writing it down anyway.</p>
<p>What if the user could constrain the colors specified in inline styles? You
are only allowed to use these shades of dark green for text and these shades
of light yellow for the background. At the very least, you could ensure
that we did not have pale yellow on white text.</p>
<h2>Implementation issues</h2>
<ol>
<li>Requires the color attribute definition to know, currently, what the text
and background colors are. This becomes difficult when classes are thrown
into the mix.</li>
<li>The user still has to define the permissible colors, how does one do
something like that?</li>
</ol>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,23 @@
Configuration
Configuration is documented on a per-use case: if a class uses a certain
value from the configuration object, it has to define its name and what the
value is used for. This means decentralized configuration declarations that
are nevertheless error checking and a centralized configuration object.
Directives are divided into namespaces, indicating the major portion of
functionality they cover (although there may be overlaps). Please consult
the documentation in ConfigDef for more information on these namespaces.
Since configuration is dependant on context, internal classes require a
configuration object to be passed as a parameter. (They also require a
Context object). A majority of classes do not need the config object,
but for those who do, it is a lifesaver.
Definition objects are complex datatypes influenced by their respective
directive namespaces (HTMLDefinition with HTML and CSSDefinition with CSS).
If any of these directives is updated, HTML Purifier forces the definition
to be regenerated.
vim: et sw=4 sts=4

View File

@ -0,0 +1,34 @@
Extracting inline CSS from HTML Purifier
voodoofied: Assigning semantics to elements
Sander Tekelenburg brought to my attention the poor programming style of
inline CSS in HTML documents. In an ideal world, we wouldn't be using inline
CSS at all: everything would be assigned using semantic class attributes
from an external stylesheet.
With ExtractStyleBlocks and CSSTidy, this is now possible (when allowed, users
can specify a style element which gets extracted from the user-submitted HTML, which
the application can place in the head of the HTML document). But there still
is the issue of inline CSS that refuses to go away.
The basic idea behind this feature is assign every element a unique identifier,
and then move all of the CSS data to a style-sheet. This HTML:
<div style="text-align:center">Big <span style="color:red;">things</span>!</div>
into
<div id="hp-12345">Big <span id="hp-12346">things</span>!</div>
and a stylesheet that is:
#hp-12345 {text-align:center;}
#hp-12346 {color:red;}
Beyond that, HTML Purifier can magically merge common CSS values together,
and a whole manner of other heuristic things. HTML Purifier should also
make it easy for an admin to re-style the HTML semantically. Speed is not
an issue. Also, better WYSIWYG editors are needed.
vim: et sw=4 sts=4

View File

@ -0,0 +1,211 @@
Considerations for ErrorCollection
Presently, HTML Purifier takes a code-execution centric approach to handling
errors. Errors are organized and grouped according to which segment of the
code triggers them, not necessarily the portion of the input document that
triggered the error. This means that errors are pseudo-sorted by category,
rather than location in the document.
One easy way to "fix" this problem would be to re-sort according to line number.
However, the "category" style information we derive from naively following
program execution is still useful. After all, each of the strategies which
can report errors still process the document mostly linearly. Furthermore,
not only do they process linearly, but the way they pass off operations to
sub-systems mirrors that of the document. For example, AttrValidator will
linearly proceed through elements, and on each element will use AttrDef to
validate those contents. From there, the attribute might have more
sub-components, which have execution passed off accordingly.
In fact, each strategy handles a very specific class of "error."
RemoveForeignElements - element tokens
MakeWellFormed - element token ordering
FixNesting - element token ordering
ValidateAttributes - attributes of elements
The crucial point is that while we care about the hierarchy governing these
different errors, we *don't* care about any other information about what actually
happens to the elements. This brings up another point: if HTML Purifier fixes
something, this is not really a notice/warning/error; it's really a suggestion
of a way to fix the aforementioned defects.
In short, the refactoring to take this into account kinda sucks.
Errors should not be recorded in order that they are reported. Instead, they
should be bound to the line (and preferably element) in which they were found.
This means we need some way to uniquely identify every element in the document,
which doesn't presently exist. An easy way of adding this would be to track
line columns. An important ramification of this is that we *must* use the
DirectLex implementation.
1. Implement column numbers for DirectLex [DONE!]
2. Disable error collection when not using DirectLex [DONE!]
Next, we need to re-orient all of the error declarations to place CurrentToken
at utmost important. Since this is passed via Context, it's not always clear
if that's available. ErrorCollector should complain HARD if it isn't available.
There are some locations when we don't have a token available. These include:
* Lexing - this can actually have a row and column, but NOT correspond to
a token
* End of document errors - bump this to the end
Actually, we *don't* have to complain if CurrentToken isn't available; we just
set it as a document-wide error. And actually, nothing needs to be done here.
Something interesting to consider is whether or not we care about the locations
of attributes and CSS properties, i.e. the sub-objects that compose these things.
In terms of consistency, at the very least attributes should have column/line
numbers attached to them. However, this may be overkill, as attributes are
uniquely identifiable. You could go even further, with CSS, but they are also
uniquely identifiable.
Bottom-line is, however, this information must be available, in form of the
CurrentAttribute and CurrentCssProperty (theoretical) context variables, and
it must be used to organize the errors that the sub-processes may throw.
There is also a hierarchy of sorts that may make merging this into one context
variable more sense, if it hadn't been for HTML's reasonably rigid structure.
A CSS property will never contain an HTML attribute. So we won't ever get
recursive relations, and having multiple depths won't ever make sense. Leave
this be.
We already have this information, and consequently, using start and end is
*unnecessary*, so long as the context variables are set appropriately. We don't
care if an error was thrown by an attribute transform or an attribute definition;
to the end user these are the same (for a developer, they are different, but
they're better off with a stack trace (which we should add support for) in such
cases).
3. Remove start()/end() code. Don't get rid of recursion, though [DONE]
4. Setup ErrorCollector to use context information to setup hierarchies.
This may require a different internal format. Use objects if it gets
complex. [DONE]
ASIDE
More on this topic: since we are now binding errors to lines
and columns, a particular error can have three relationships to that
specific location:
1. The token at that location directly
RemoveForeignElements
AttrValidator (transforms)
MakeWellFormed
2. A "component" of that token (i.e. attribute)
AttrValidator (removals)
3. A modification to that node (i.e. contents from start to end
token) as a whole
FixNesting
This needs to be marked accordingly. In the presentation, it might
make sense keep (3) separate, have (2) a sublist of (1). (1) can
be a closing tag, in which case (3) makes no sense at all, OR it
should be related with its opening tag (this may not necessarily
be possible before MakeWellFormed is run).
So, the line and column counts as our identifier, so:
$errors[$line][$col] = ...
Then, we need to identify case 1, 2 or 3. They are identified as
such:
1. Need some sort of semaphore in RemoveForeignElements, etc.
2. If CurrentAttr/CurrentCssProperty is non-null
3. Default (FixNesting, MakeWellFormed)
One consideration about (1) is that it usually is actually a
(3) modification, but we have no way of knowing about that because
of various optimizations. However, they can probably be treated
the same. The other difficulty is that (3) is never a line and
column; rather, it is a range (i.e. a duple) and telling the user
the very start of the range may confuse them. For example,
<b>Foo<div>bar</div></b>
^ ^
The node being operated on is <b>, so the error would be assigned
to the first caret, with a "node reorganized" error. Then, the
ChildDef would have submitted its own suggestions and errors with
regard to what's going in the internals. So I suppose this is
ok. :-)
Now, the structure of the earlier mentioned ... would be something
like this:
object {
type = (token|attr|property),
value, // appropriate for type
errors => array(),
sub-errors = [recursive],
}
This helps us keep things agnostic. It is also sufficiently complex
enough to warrant an object.
So, more wanking about the object format is in order. The way HTML Purifier is
currently setup, the only possible hierarchy is:
token -> attr -> css property
These relations do not exist all of the time; a comment or end token would not
ever have any attributes, and non-style attributes would never have CSS properties
associated with them.
I believe that it is worth supporting multiple paths. At some point, we might
have a hierarchy like:
* -> syntax
-> token -> attr -> css property
-> url
-> css stylesheet <style>
et cetera. Now, one of the practical implications of this is that every "node"
on our tree is well-defined, so in theory it should be possible to either 1.
create a separate class for each error struct, or 2. embed this information
directly into HTML Purifier's token stream. Embedding the information in the
token stream is not a terribly good idea, since tokens can be removed, etc.
So that leaves us with 1... and if we use a generic interface we can cut down
on a lot of code we might need. So let's leave it like this.
~~~~
Then we setup suggestions.
5. Setup a separate error class which tells the user any modifications
HTML Purifier made.
Some information about this:
Our current paradigm is to tell the user what HTML Purifier did to the HTML.
This is the most natural mode of operation, since that's what HTML Purifier
is all about; it was not meant to be a validator.
However, most other people have experience dealing with a validator. In cases
where HTML Purifier unambiguously does the right thing, simply giving the user
the correct version isn't a bad idea, but problems arise when:
- The user has such bad HTML we do something odd, when we should have just
flagged the HTML as an error. Such examples are when we do things like
remove text from directly inside a <table> tag. It was probably meant to
be in a <td> tag or be outside the table, but we're not smart enough to
realize this so we just remove it. In such a case, we should tell the user
that there was foreign data in the table, but then we shouldn't "demand"
the user remove the data; it's more of a "here's a possible way of
rectifying the problem"
- Giving line context for input is hard enough, but feasible; giving output
line context will be extremely difficult due to shifting lines; we'd probably
have to track what the tokens are and then find the appropriate out context
and it's not guaranteed to work etc etc etc.
````````````
Don't forget to spruce up output.
6. Output needs to automatically give line and column numbers, basically
"at line" on steroids. Look at W3C's output; it's ok. [PARTIALLY DONE]
- We need a standard CSS to apply (check demo.css for some starting
styling; some buttons would also be hip)
vim: et sw=4 sts=4

View File

@ -0,0 +1,137 @@
Filter Levels
When one size *does not* fit all
It makes little sense to constrain users to one set of HTML elements and
attributes and tell them that they are not allowed to mold this in
any fashion. Many users demand to be able to custom-select which elements
and attributes they want. This is fine: because HTML Purifier keeps close
track of what elements are safe to use, there is no way for them to
accidently allow an XSS-able tag.
However, combing through the HTML spec to make your own whitelist can
be a daunting task. HTML Purifier ought to offer pre-canned filter levels
that amateur users can select based on what they think is their use-case.
Here are some fuzzy levels you could set:
1. Comments - Wordpress recommends a, abbr, acronym, b, blockquote, cite,
code, em, i, strike, strong; however, you could get away with only a, em and
p; also having blockquote and pre tags would be helpful.
2. BBCode - Emulate the usual tagset for forums: b, i, img, a, blockquote,
pre, div, span and h[2-6] (the last three are for specially formatted
posts, div and span require associated classes or inline styling enabled
to be useful)
3. Pages - As permissive as possible without allowing XSS. No protection
against bad design sense, unfortunantely. Suitable for wiki and page
environments. (probably what we have now)
4. Lint - Accept everything in the spec, a Tidy wannabe. (This probably won't
get implemented as it would require routines for things like <object>
and friends to be implemented, which is a lot of work for not a lot of
benefit)
One final note: when you start axing tags that are more commonly used, you
run the risk of accidentally destroying user data, especially if the data
is incoming from a WYSIWYG editor that hasn't been synced accordingly. This may
make forbidden element to text transformations desirable (for example, images).
== Element Risk Analysis ==
Although none of the currently supported elements presents a security
threat per-say, some can cause problems for page layouts or be
extremely complicated.
Legend:
[danger level] - regular tags / uncommon tags ~ deprecated tags
[danger level]* - rare tags
1 - blockquote, code, em, i, p, tt / strong, sub, sup
1* - abbr, acronym, bdo, cite, dfn, kbd, q, samp
2 - b, br, del, div, pre, span / ins, s, strike ~ u
3 - h2, h3, h4, h5, h6 ~ center
4 - h1, big ~ font
5 - a
7 - area, map
These are special use tags, they should be enabled on a blanket basis.
Lists - dd, dl, dt, li, ol, ul ~ menu, dir
Tables - caption, table, td, th, tr / col, colgroup, tbody, tfoot, thead
Forms - fieldset, form, input, lable, legend, optgroup, option, select, textarea
XSS - noscript, object, script ~ applet
Meta - base, basefont, body, head, html, link, meta, style, title
Frames - frame, frameset, iframe
And tag specific notes:
a - general problems involving linkspam
b - too much bold is bad, typographically speaking bold is discouraged
br - often misused
center - CSS, usually no legit use
del - only useful in editing context
div - little meaning in certain contexts i.e. blog comment
h1 - usually no legit use, as header is already set by application
h* - not needed in blog comments
hr - usually not necessary in blog comments
img - could be extremely undesirable if linking to external pics (CSRF, goatse)
pre - could use formatting, only useful in code contexts
q - very little support
s - transform into span with styling or del?
small - technically presentational
span - depends on attribute allowances
sub, sup - specialized
u - little legit use, prefer class with text-decoration
Based on the riskiness of the items, we may want to offer %HTML.DisableImages
attribute and put URI filtering higher up on the priority list.
== Attribute Risk Analysis ==
We actually have a suprisingly small assortment of allowed attributes (the
rest are deprecated in strict, and thus we opted not to allow them, even
though our output is XHTML Transitional by default.)
Required URI - img.alt, img.src, a.href
Medium risk - *.class, *.dir
High risk - img.height, img.width, *.id, *.style
Table - colgroup/col.span, td/th.rowspan, td/th.colspan
Uncommon - *.title, *.lang, *.xml:lang
Rare - td/th.abbr, table.summary, {table}.charoff
Rare URI - del.cite, ins.cite, blockquote.cite, q.cite, img.longdesc
Presentational - {table}.align, {table}.valign, table.frame, table.rules,
table.border
Partially presentational - table.cellpadding, table.cellspacing,
table.width, col.width, colgroup.width
== CSS Risk Analysis ==
Currently, there is no support for fine-grained "allowed CSS" specification,
mainly because I'm lazy, partially because no one has asked for it. However,
this will be added eventually.
There are certain CSS elements that are extremely useful inline, but then
as you get to more presentation oriented styling it may not always be
appropriate to inline them.
Useful - clear, float, border-collapse, caption-side
These CSS properties can break layouts if used improperly. We have excluded
any CSS properties that are not currently implemented (such as position).
Dangerous, can go outside container - float
Easy to abuse - font-size, font-family (font), width
Colored - background-color (background), border-color (border), color
(see proposal-colors.html)
Dramatic - border, list-style-position (list-style), margin, padding,
text-align, text-indent, text-transform, vertical-align, line-height
Dramatic elements substantially change the look of text in ways that should
probably have been reserved to other areas.
vim: et sw=4 sts=4

View File

@ -0,0 +1,64 @@
We are going to model our I18N/L10N off of MediaWiki's system. Their's is
obviously quite complicated, so we're going to simplify it a bit for our needs.
== Caching ==
MediaWiki has lots of caching mechanisms built in, which make the code somewhat
more difficult to understand. Before doing any loading, MediaWiki will check
the following places to see if we can be lazy:
1. $mLocalisationCache[$code] - just a variable where it may have been stashed
2. serialized/$code.ser - compiled serialized language file
3. Memcached version of file (with expiration checking)
Expiration checking consists of by ensuring all dependencies have filemtime
that match the ones bundled with the cached copy. Similar checking could be
implemented for serialized versions, as it seems that they are not updated
until manually recompiled.
== Behavior ==
Things that are localizable:
- Weekdays (and abbrev)
- Months (and abbrev)
- Bookstores
- Skin names
- Date preferences / Custom date format
- Default date format
- Default user option overrides
-+ Language names
- Timezones
-+ Character encoding conversion via iconv
- UpperLowerCase first (needs casemaps for some)
- UpperLowerCase
- Uppercase words
- Uppercase word breaks
- Case folding
- Strip punctuation for MySQL search
- Get first character
-+ Alternate encoding
-+ Recoding for edit (and then recode input)
-+ RTL
-+ Direction mark character depending on RTL
-? Arrow depending on RTL
- Languages where italics cannot be used
-+ Number formatting (commafy, transform digits, transform separators)
- Truncate (multibyte)
- Grammar conversions for inflected languages
- Plural transformations
- Formatting expiry times
- Segmenting for diffs (Chinese)
- Convert to variants of language
- Language specific user preference options
- Link trails [[foo]]bar
-+ Language code (RFC 3066)
Neat functionality:
- I18N sprintfDate
- Roman numeral formatting
Items marked with a + likely need to be addressed by HTML Purifier
vim: et sw=4 sts=4

View File

@ -0,0 +1,44 @@
Configuration Ideas
Here are some theoretical configuration ideas that we could implement some
time. Note the naming convention: %Namespace.Directive. If you want one
implemented, give us a ring, and we'll move it up the priority chain.
%Attr.RewriteFragments - if there's %Attr.IDPrefix we may want to transparently
rewrite the URLs we parse too. However, we can only do it when it's a pure
anchor link, so it's not foolproof
%Attr.ClassBlacklist,
%Attr.ClassWhitelist,
%Attr.ClassPolicy - determines what classes are allowed. When
%Attr.ClassPolicy is set to Blacklist, only allow those not in
%Attr.ClassBlacklist. When it's Whitelist, only allow those in
%Attr.ClassWhitelist.
%Attr.MaxWidth,
%Attr.MaxHeight - caps for width and height related checks.
(the hack in Pixels for an image crashing attack could be replaced by this)
%URI.AddRelNofollow - will add rel="nofollow" to all links, preventing the
spread of ill-gotten pagerank
%URI.HostBlacklistRegex - regexes that if matching the host are disallowed
%URI.HostWhitelist - domain names that are excluded from the host blacklist
%URI.HostPolicy - determines whether or not its reject all and then whitelist
or allow all in then do specific blacklists with whitelist intervening.
'DenyAll' or 'AllowAll' (default)
%URI.DisableIPHosts - URIs that have IP addresses for hosts are disallowed.
Be sure to also grab unusual encodings (dword, hex and octal), which may
be currently be caught by regular DNS
%URI.DisableIDN - Disallow raw internationalized domain names. Punycode
will still be permitted.
%URI.ConvertUnusualIPHosts - transform dword/hex/octal IP addresses to the
regular form
%URI.ConvertAbsoluteDNS - Remove extra dots after host names that trigger
absolute DNS. While this is actually the preferred method according to
the RFC, most people opt to use a relative domain name relative to . (root).
vim: et sw=4 sts=4

View File

@ -0,0 +1,218 @@
THE UNIVERSAL DESIGN PATTERN: PROPERTIES
Steve Yegge
Implementation:
get(name)
put(name, value)
has(name)
remove(name)
iteration, with filtering [this will be our namespaces]
parent
Representations:
- Keys are strings
- It's nice to not need to quote keys (if we formulate our own language,
consider this)
- Property not present representation (key missing)
- Frequent removal/re-add may have null help. If null is valid, use
another value. (PHP semantics are weird here)
Data structures:
- LinkedHashMap is wonderful (O(1) access and maintains order)
- Using a special property that points to the parent is usual
- Multiple inheritance possible, need rules for which to lookup first
- Iterative inheritance is best
- Consider performance!
Deletion
- Tricky problem with inheritance
- Distinguish between "not found" and "look in my parent for the property"
[Maybe HTML Purifier won't allow deletion]
Read/write asymmetry (it's correct!)
Read-only plists
- Allow ability to freeze [this is what we have already]
- Don't overuse it
Performance:
- Intern strings (PHP does this already)
- Don't be case-insensitive
- If all properties in a plist are known a-priori, you can use a "perfect"
hash function. Often overkill.
- Copy-on-read caching "plundering" reduces lookup, but uses memory and can
grow stale. Use as last resort.
- Refactoring to fields. Watch for API compatibility, system complexity,
and lack of flexibility.
- Refrigerator: external data-structure to hold plists
Transient properties:
[Don't need to worry about this]
- Use a separate plist for transient properties
- Non-numeric override; numeric should ADD
- Deletion: removeTransientProperty() and transientlyRemoveProperty()
Persistence:
- XML/JSON are good
- Text-based is good for readability, maintainability and bootstrapping
- Compressed binary format for network transport [not necessary]
- RDBMS or XML database
Querying: [not relevant]
- XML database is nice for XPath/XQuery
- jQuery for JSON
- Just load it all into a program
Backfills/Data integrity:
- Use usual methods
- Lazy backfill is a nice hack
Type systems:
- Flags: ReadOnly, Permanent, DontEnum
- Typed properties isn't that useful [It's also Not-PHP]
- Seperate meta-list of directive properties IS useful
- Duck typing is useful for systems designed fully around properties pattern
Trade-off:
+ Flexibility
+ Extensibility
+ Unit-testing/prototype-speed
- Performance
- Data integrity
- Navagability/Query-ability
- Reversability (hard to go back)
HTML Purifier
We are not happy with our current system of defining configuration directives,
because it has become clear that things will get a lot nicer if we allow
multiple namespaces, and there are some features that naturally lend themselves
to inheritance, which we do not really support well.
One of the considered implementation changes would be to go from a structure
like:
array(
'Namespace' => array(
'Directive' => 'val1',
'Directive2' => 'val2',
)
)
to:
array(
'Namespace.Directive' => 'val1',
'Namespace.Directive2' => 'val2',
)
The below implementation takes more memory, however, and it makes it a bit
complicated to grab all values from a namespace.
The alternate implementation choice is to allow nested plists. This keeps
iteration easy, but is problematic for inheritance (it would be difficult
to distinguish a plist from an array) and retrieval (when specifying multiple
namespaces we would need some multiple de-referencing).
----
We can bite the performance hit, and just do iteration with filter
(the strncmp call should be relatively cheap). Then, users should be able
to optimize doing something like:
$config = HTMLPurifier_Config::createDefault();
if (!file_exists('config.php')) {
// set up $config
$config->save('config.php');
} else {
$config->load('config.php');
}
Or maybe memcache, or something. This means that "// set up $config" must
not have any dynamic parts, or the user has to invalidate the cache when
they do update it. We have to think about this a little more carefully; the
file call might be more expensive.
----
This might get expensive, however, when we actually care about iterating
over the configuration and want the actual values. So what about nesting the
lists?
"ns.sub.directive" => values['ns']['sub']['directive']
We can distinguish between plists and arrays by using ArrayObjects for the
plists, and regular arrays for the arrays? Alternatively, use ArrayObjects
for the arrays, and regular arrays for the plists.
----
Implementation demands, and what has caused them:
1. DefinitionCache, the HTML, CSS and URI namespaces have caches attached to them
Results:
- getBatchSerial()
- getBatch() : in general, the ability to traverse just a namespace
2. AutoFormat/Filter, this is a plugin architecture, directives not hard-coded
- getBatch()
3. Configuration form
- Namespaces used to organize directives
Other than that, we have a pure plist. PERHAPS we should maintain separate things
for these different demands.
Issue 2: Directives for configuring the plugins are regular plists, but
when enabling them, while it's "plist-ish", what you're really doing is adding
them to an array of "autoformatters"/"filters" to enable. We can setup
magic BC as well as in the new interface, but there should also be an
add('AutoFormat', 'AutoParagraph'); which does the right thing.
One thing to consider is whether or not inheritance rules will apply to these.
I'd say yes. That means that they're still plisty, in fact, the underlying
implementation will probably be a plist. However, they will get their OWN
plists, and will NOT support nesting.
Issue 1: Our current implementation is generally not efficient; md5(serialize($foo))
is pretty expensive. So, I don't think there will be any problems if it
gets "less" efficient, as long as we give users a properly fast alternative;
DefinitionRev gives us a way to do this, by simply telling the user they must
update it whenever they update Configuration directives as well. (There are
obvious BC concerns here).
In such a case, we simply iterate over our plist (performing full retrievals
for each value), grab the entries we care about, and then serialize and hash.
It's going to be slow either way, due to the ability of plists to inherit.
If we ksort(), we don't have to traverse the entire array, however, the
cost of a ksort() call may not be worth it.
At this point, last time, I started worrying about the performance implications
of allowing inheritance, and wondering whether or not I wanted to squash
the plist. At first blush, our code might be under the assumption that
accessing properties is cheap; but actually we prefer to copy out the value
into a member variable if it's going to be used many times. With this is mind
I don't think CPU consumption from a few nested function calls is going to
be a problem. We *are* going to enforce a function only interface.
The next issue at hand is how we're going to manage the "special" plists,
which should still be able to be inherited. Basically, it means that multiple
plists would be attached to the configuration object, which is not the
best for memory performance. The alternative is to keep them all in one
big plist, and then eat the one-time cost of traversing the entire plist
to grab the appropriate values.
I think at this point we can write the generic interface, and then set up separate
plists if that ends up being necessary for performance (it probably won't.) Now
lets code our generic plist implementation.
----
Iterating over the plist presents some problems. The way we've chosen to solve
this is to squash all of the parents.
----
But I don't need iteration.
vim: et sw=4 sts=4

View File

@ -0,0 +1,50 @@
Handling Content Model Changes
1. Context
The distinction between Transitional and Strict document types is somewhat
of an anomaly in the lineage of XHTML document types (following 1.0, no
doctypes do not have flavors: instead, modularization is used to let
document authors vary their elements). This transition is usually quite
straight-forward, as W3C usually deprecates attributes or elements, which
are quite easily handled using tag and attribute transforms.
However, for two elements, <blockquote>, <body> and <address>, W3C elected
to also change the content model. <blockquote> and <body> originally
accepted both inline and block elements, but in the strict doctype they
only allow block elements. With <address>, the situation is inverted:
<p> tags were now forbidden from appearing within this tag.
2. Current situation
Currently, HTML Purifier treats <blockquote> specially during Tidy mode
using a custom ChildDef class StrictBlockquote. StrictBlockquote
operates similarly to Required, except that when it encounters an inline
element, it will wrap it in a block tag (as specified by
%HTML.BlockWrapper, the default is <p>). The naming suggests it can
only be used for <blockquote>s, although it may be possible to
genericize it to work on other cases of this nature (this would be of
little practical application, as no other element in XHTML 1.1 or earlier
has a block-only content model).
Tidy currently contains no custom, lenient implementation for <address>.
If one were to be written, it would likely operate on the principle that,
when a <p> tag were to be encountered, it would be replaced with a
leading and trailing <br /> tag (the contents of <p>, being inline, are
not an issue). There is no prior work with this sort of operation.
3. Outside applicability
There are a number of other elements that contain restrictive content
models, such as <ul> or <span> (the latter is restrictive in that it
does not allow block elements). In the former case, an errant node
is eliminated completely, in the latter case, the text of the node
would is preserved (as the parent node does allow PCDATA). Custom
content model implementations probably are not the best way of handling
these cases, instead, node bubbling should be implemented instead.
vim: et sw=4 sts=4

View File

@ -0,0 +1,30 @@
CSS Length Reference
To bound, or not to bound, that is the question
It's quite a reasonable request, really, and it's already been implemented
for HTML. That is, length bounding. It makes little sense to let users
define text blocks that have a font-size of 63,360 inches (that's a mile,
by the way) or a width of forty-fold the parent container.
But it's a little more complicated then that. There are multiple units
one can use, and we have to a little unit conversion to get things working.
Here's what we have:
Absolute:
1 in ~= 2.54 cm
1 cm = 10 mm
1 pt = 1/72 in
1 pc = 12 pt
Relative:
1 em ~= 10.0667 px
1 ex ~= 0.5 em, though Mozilla Firefox says 1 ex = 6px
1 px ~= 1 pt
Watch out: font-sizes can also be nested to get successively larger
(although I do not relish having to keep track of context font-sizes,
this may be necessary, especially for some of the more advanced features
for preventing things like white on white).
vim: et sw=4 sts=4

View File

@ -0,0 +1,47 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="description" content="Credits and links to DevNetwork forum topics on HTML Purifier." />
<link rel="stylesheet" type="text/css" href="./style.css" />
<title>DevNetwork Credits - HTML Purifier</title>
</head>
<body>
<h1>DevNetwork Credits</h1>
<div id="filing">Filed under Reference</div>
<div id="index">Return to the <a href="index.html">index</a>.</div>
<div id="home"><a href="http://htmlpurifier.org/">HTML Purifier</a> End-User Documentation</div>
<p>Many thanks to the DevNetwork community for answering questions,
theorizing about design, and offering encouragement during
the development of this library in these forum threads:</p>
<ul>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=52905">HTMLPurifier PHP Library hompeage</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53056">How much of CSS to implement?</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53083">Parsing URL only according to URI : Security Risk?</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53096">Gimme a name : URI and friends</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53415">How to document configuration directives</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53479">IPv6</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53539">http and ftp versus news and mailto</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53579">HTMLPurifier - Take your best shot</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53664">Need help optimizing a block of code</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=53861">Non-SGML characters</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=54283">Wordpress makes me cry</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=54478">Parameter Object vs. Parameter Array vs. Parameter Functions</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=54521">Convert encoding where output cannot represent characters</a></li>
<li><a href="http://forums.devnetwork.net/viewtopic.php?t=56411">Reporting errors in a document without line numbers</a></li>
</ul>
<p>...as well as any I may have forgotten.</p>
</body>
</html>
<!-- vim: et sw=4 sts=4
-->

View File

@ -0,0 +1,166 @@
The Modularization of HTMLDefinition in HTML Purifier
WARNING: This document was drafted before the implementation of this
system, and some implementation details may have evolved over time.
HTML Purifier uses the modularization of XHTML
<http://www.w3.org/TR/xhtml-modularization/> to organize the internals
of HTMLDefinition into a more manageable and extensible fashion. Rather
than have one super-object, HTMLDefinition is split into HTMLModules,
each of which are responsible for defining elements, their attributes,
and other properties (for a more indepth coverage, see
/library/HTMLPurifier/HTMLModule.php's docblock comments). These modules
are managed by HTMLModuleManager.
Modules that we don't support but could support are:
* 5.6. Table Modules
o 5.6.1. Basic Tables Module [?]
* 5.8. Client-side Image Map Module [?]
* 5.9. Server-side Image Map Module [?]
* 5.12. Target Module [?]
* 5.21. Name Identification Module [deprecated]
These modules would be implemented as "unsafe":
* 5.2. Core Modules
o 5.2.1. Structure Module
* 5.3. Applet Module
* 5.5. Forms Modules
o 5.5.1. Basic Forms Module
o 5.5.2. Forms Module
* 5.10. Object Module
* 5.11. Frames Module
* 5.13. Iframe Module
* 5.14. Intrinsic Events Module
* 5.15. Metainformation Module
* 5.16. Scripting Module
* 5.17. Style Sheet Module
* 5.19. Link Module
* 5.20. Base Module
We will not be using W3C's XML Schemas or DTDs directly due to the lack
of robust tools for handling them (the main problem is that all the
current parsers are usually PHP 5 only and solely-validating, not
correcting).
This system may be generalized and ported over for CSS.
== General Use-Case ==
The outwards API of HTMLDefinition has been largely preserved, not
only for backwards-compatibility but also by design. Instead,
HTMLDefinition can be retrieved "raw", in which it loads a structure
that closely resembles the modules of XHTML 1.1. This structure is very
dynamic, making it easy to make cascading changes to global content
sets or remove elements in bulk.
However, once HTML Purifier needs the actual definition, it retrieves
a finalized version of HTMLDefinition. The finalized definition involves
processing the modules into a form that it is optimized for multiple
calls. This final version is immutable and, even if editable, would
be extremely hard to change.
So, some code taking advantage of the XHTML modularization may look
like this:
<?php
$config = HTMLPurifier_Config::createDefault();
$def =& $config->getHTMLDefinition(true); // reference to raw
$def->addElement('marquee', 'Block', 'Flow', 'Common');
$purifier = new HTMLPurifier($config);
$purifier->purify($html); // now the definition is finalized
?>
== Inclusions ==
One of the nice features of HTMLDefinition is that piggy-backing off
of global attribute and content sets is extremely easy to do.
=== Attributes ===
HTMLModule->elements[$element]->attr stores attribute information for the
specific attributes of $element. This is quite close to the final
API that HTML Purifier interfaces with, but there's an important
extra feature: attr may also contain a array with a member index zero.
<?php
HTMLModule->elements[$element]->attr[0] = array('AttrSet');
?>
Rather than map the attribute key 0 to an array (which should be
an AttrDef), it defines a number of attribute collections that should
be merged into this elements attribute array.
Furthermore, the value of an attribute key, attribute value pair need
not be a fully fledged AttrDef object. They can also be a string, which
signifies a AttrDef that is looked up from a centralized registry
AttrTypes. This allows more concise attribute definitions that look
more like W3C's declarations, as well as offering a centralized point
for modifying the behavior of one attribute type. And, of course, the
old method of manually instantiating an AttrDef still works.
=== Attribute Collections ===
Attribute collections are stored and processed in the AttrCollections
object, which is responsible for performing the inclusions signified
by the 0 index. These attribute collections, too, are mutable, by
using HTMLModule->attr_collections. You may add new attributes
to a collection or define an entirely new collection for your module's
use. Inclusions can also be cumulative.
Attribute collections allow us to get rid of so called "global attributes"
(which actually aren't so global).
=== Content Models and ChildDef ===
An implementation of the above-mentioned attributes and attribute
collections was applied to the ChildDef system. HTML Purifier uses
a proprietary system called ChildDef for performance and flexibility
reasons, but this does not line up very well with W3C's notion of
regexps for defining the allowed children of an element.
HTMLPurifier->elements[$element]->content_model and
HTMLPurifier->elements[$element]->content_model_type store information
about the final ChildDef that will be stored in
HTMLPurifier->elements[$element]->child (we use a different variable
because the two forms are sufficiently different).
$content_model is an abstract, string representation of the internal
state of ChildDef, while $content_model_type is a string identifier
of which ChildDef subclass to instantiate. $content_model is processed
by substituting all content set identifiers (capitalized element names)
with their contents. It is then parsed and passed into the appropriate
ChildDef class, as defined by the ContentSets->getChildDef() or the
custom fallback HTMLModule->getChildDef() for custom child definitions
not in the core.
You'll need to use these facilities if you plan on referencing a content
set like "Inline" or "Block", and using them is recommended even if you're
not due to their conciseness.
A few notes on $content_model: it's structure can be as complicated
as you want, but the pipe symbol (|) is reserved for defining possible
choices, due to the content sets implementation. For example, a content
model that looks like:
"Inline -> Block -> a"
...when the Inline content set is defined as "span | b" and the Block
content set is defined as "div | blockquote", will expand into:
"span | b -> div | blockquote -> a"
The custom HTMLModule->getChildDef() function will need to be able to
then feed this information to ChildDef in a usable manner.
=== Content Sets ===
Content sets can be altered using HTMLModule->content_sets, an associative
array of content set names to content set contents. If the content set
already exists, your values are appended on to it (great for, say,
registering the font tag as an inline element), otherwise it is
created. They are substituted into content_model.
vim: et sw=4 sts=4

View File

@ -0,0 +1,26 @@
Proprietary Tags
<nobr> and friends
Here are some proprietary tags that W3C does not define but occasionally show
up in the wild. We have only included tags that would make sense in an
HTML Purifier context.
<align>, block element that aligns (extremely rare)
<blackface>, inline that double-bolds text (extremely rare)
<comment>, hidden comment for IE and WebTV
<multicol cols=number gutter=pixels width=pixels>, multiple columns
<nobr>, no linebreaks
<spacer align=* type="vertical|horizontal|block">, whitespace in doc,
use width/height for block and size for vertical/horizontal (attributes)
(extremely rare)
<wbr>, potential word break point: allows linebreaks. Only works in <nobr>
<listing>, monospace pre-variant (extremely rare)
<plaintext>, escapes all tags to the end of document
<xmp>, monospace, replace with pre
These should be put into their own Tidy module, not loaded by default(?). These
all qualify as "lenient" transforms.
vim: et sw=4 sts=4

View File

@ -0,0 +1,26 @@
Web Hypertext Application Technology Working Group
WHATWG
== HTML 5 ==
URL: http://www.whatwg.org/specs/web-apps/current-work/
HTML 5 defines a kaboodle of new elements and attributes, as well as
some well-defined, "quirks mode" HTML parsing. Although WHATWG professes
to be targeted towards web applications, many of their semantic additions
would be quite useful in regular documents. Eventually, HTML
Purifier will need to audit their lists and figure out what changes need
to be made. This process is complicated by the fact that the WHATWG
doesn't buy into W3C's modularization of XHTML 1.1: we may need
to remodularize HTML 5 (probably done by section name). No sense in
committing ourselves till the spec stabilizes, though.
More immediately speaking though, however, is the well-defined parsing
behavior that HTML 5 adds. While I have little interest in writing
another DirectLex parser, other parsers like ph5p
<http://jero.net/lab/ph5p/> can be adapted to DOMLex to support much more
flexible HTML parsing (a cool feature I've seen is how they resolve
<b>bold<i>both</b>italic</i>).
vim: et sw=4 sts=4

View File

@ -0,0 +1,10 @@
Licensing of Specimens
Some files in this directory have different licenses:
windows-live-mail-desktop-beta.html - donated by laacz, public domain
img.png - LGPL, from <http://commons.wikimedia.org/wiki/Image:Pastille_chrome.png>
All other files are by me, and are licensed under LGPL.
vim: et sw=4 sts=4

View File

@ -0,0 +1,165 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>HTML align attribute to CSS - HTML Purifier Specimen</title>
<style type="text/css">
div.container {position:relative;height:110px;}
div.container.legend .test {text-align:center;line-height:100px;}
div.test {width:100px;height:100px;border:1px solid black;
position:absolute;top:10px;}
div.test.html {left:10px;}
div.test.css {left:140px;}
table {background:#F00;}
img {border:1px solid #000;}
hr {width:50px;}
div.segment {width:250px; float:left; margin-top:1em;}
</style>
</head>
<body>
<h1>HTML align attribute to CSS</h1>
<p>Inspect source for methodology.</p>
<div class="container legend">
<div class="test html">
HTML
</div>
<div class="test css">
CSS
</div>
</div>
<div class="segment">
<h2>table.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
a<table align="left"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="float:left;"><tr><td>O</td></tr></table>a
</div>
</div>
<h3>center</h3>
<div class="container">
<div class="test html">
a<table align="center"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="margin-left:auto; margin-right:auto;"><tr><td>O</td></tr></table>a
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
a<table align="right"><tr><td>O</td></tr></table>a
</div>
<div class="test css">
a<table style="float:right;"><tr><td>O</td></tr></table>a
</div>
</div>
</div>
<!-- ################################################################## -->
<div class="segment">
<h2>img.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="left">a
</div>
<div class="test css">
a<img src="img.png" style="float:left;">a
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="right">a
</div>
<div class="test css">
a<img src="img.png" style="float:right;">a
</div>
</div>
<h3>bottom</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="bottom">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:baseline;">a
</div>
</div>
<h3>middle</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="middle">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:middle;">a
</div>
</div>
<h3>top</h3>
<div class="container">
<div class="test html">
a<img src="img.png" align="top">a
</div>
<div class="test css">
a<img src="img.png" style="vertical-align:top;">a
</div>
</div>
</div>
<!-- ################################################################## -->
<div class="segment">
<h2>hr.align</h2>
<h3>left</h3>
<div class="container">
<div class="test html">
<hr align="left" />
</div>
<div class="test css">
<hr style="margin-right:auto; margin-left:0; text-align:left;" />
</div>
</div>
<h3>center</h3>
<div class="container">
<div class="test html">
<hr align="center" />
</div>
<div class="test css">
<hr style="margin-right:auto; margin-left:auto; text-align:center;" />
</div>
</div>
<h3>right</h3>
<div class="container">
<div class="test html">
<hr align="right" />
</div>
<div class="test css">
<hr style="margin-right:0; margin-left:auto; text-align:right;" />
</div>
</div>
</div>
</body>
</html>

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 KiB

View File

@ -0,0 +1,129 @@
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
..shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Verdana","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Verdana","sans-serif";
color:windowtext;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
..MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="2050" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=NL link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><img width=1277 height=994 id="Picture_x0020_1"
src="cid:image001.png@01C8CBDF.5D1BAEE0"><o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><b>Name<o:p></o:p></b></p>
<p class=MsoNormal>E-mail : <a href="mailto:mail@example.com"><span
style='color:windowtext'>mail@example.com</span></a><o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal><b>Company<o:p></o:p></b></p>
<p class=MsoNormal>Address 1<o:p></o:p></p>
<p class=MsoNormal>Address 2<o:p></o:p></p>
<p class=MsoNormal><o:p>&nbsp;</o:p></p>
<p class=MsoNormal>Telefoon&nbsp; : +xx xx xxx xxx xx <span style='color:black'><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Fax&nbsp; : +xx xx xxx xx xx<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Internet : </span><span
style='color:black'><a href="http://www.example.com/"><span lang=EN-US
style='color:black'>http://www.example.com</span></a></span><span
lang=EN-US style='color:black'><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'>Kamer van koophandel
xxxxxxxxx<o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='color:black'><o:p>&nbsp;</o:p></span></p>
<p class=MsoNormal><span lang=EN-US style='font-size:7.5pt;color:black'>Op deze
e-mail is een disclaimer van toepassing, ga naar </span><span lang=EN-US
style='font-size:7.5pt'><a
href="http://www.example.com/disclaimer"><span
style='color:black'>www.example.com/disclaimer</span></a><br>
<span style='color:black'>A disclaimer is applicable to this email, please
refer to </span><a href="http://www.example.com/disclaimer"><span
style='color:black'>www.example.com/disclaimer</span></a><o:p></o:p></span></p>
<p class=MsoNormal><span lang=EN-US><o:p>&nbsp;</o:p></span></p>
</div>
</body>
</html>

View File

@ -0,0 +1,74 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML ChildAreas="4" xmlns:canvas><HEAD>
<META http-equiv=Content-Type content=text/html;charset=windows-1257>
<STYLE></STYLE>
<META content="MSHTML 6.00.6000.16414" name=GENERATOR></HEAD>
<BODY id=MailContainerBody
style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; FONT-SIZE: 10pt; COLOR: #000000; PADDING-TOP: 15px; FONT-FAMILY: Arial"
bgColor=#ff6600 leftMargin=0 background="" topMargin=0
name="Compose message area" acc_role="text" CanvasTabStop="false">
<DIV
style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; WIDTH: 100%; MARGIN-RIGHT: 10px; PADDING-TOP: 5px; BORDER-BOTTOM: #dddddd 1px solid; FONT-FAMILY: Verdana; HEIGHT: 25px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
title="View a slideshow of the pictures in this e-mail message."
style="PADDING-RIGHT: 20px"><A style="COLOR: #0088e4"
href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216">Play
slideshow </A></SPAN><SPAN style="COLOR: #909090"><SPAN>|</SPAN><SPAN
style="PADDING-LEFT: 20px"> Download the highest quality version of a picture by
clicking the + above it </SPAN></SPAN></NOBR></DIV>
<DIV
style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px">
<OL>
<LI><IMG title="Angry smile emoticon"
style="FLOAT: none; MARGIN: 0px; POSITION: static" tabIndex=-1
alt="Angry smile emoticon" src="cid:49F0C856199E4D688D2D740680733D74@wc"
MSNNonUserImageOrEmoticon="true">Un ka <FONT style="BACKGROUND-COLOR: #800000"
color=#cc99ff><STRONG>Tev</STRONG></FONT> iet, un ko tu dari?
<LI>Aha!</LI></OL>
<UL>
<LI>Buletets
<LI>
<DIV align=justify><A title=http://laacz.lv/blog/
href="http://laacz.lv/blog/">http://laacz.lv/blog/</A> un <A
title=http://google.com/ href="http://google.com/">gugle</A></DIV>
<LI>Sarakstucitis</LI></UL></DIV><SPAN><SPAN xmlns:canvas="canvas-namespace-id"
layoutEmptyTextWellFont="Tahoma"><SPAN
style="MARGIN-BOTTOM: 15px; OVERFLOW: visible; HEIGHT: 16px"></SPAN><SPAN
style="MARGIN-BOTTOM: 25px; VERTICAL-ALIGN: top; OVERFLOW: visible; MARGIN-RIGHT: 25px; HEIGHT: 234px">
<TABLE style="DISPLAY: inline">
<TBODY>
<TR>
<TD>
<DIV
style="FONT-WEIGHT: bold; FONT-SIZE: 12pt; FONT-FAMILY: arial; TEXT-ALIGN: center"><A
id=HiresARef
title="Click here to view or download a high resolution version of this picture"
style="COLOR: #0088e4; TEXT-DECORATION: none"
href="http://byfiles.storage.msn.com/x1pMvt0I80jTgT6DuaCpEMbprX3nk3jNv_vjigxV_EYVSMyM_PKgEvDEUtuNhQC-F-23mTTcKyqx6eGaeK2e_wMJ0ikwpDdFntk4SY7pfJUv2g2Ck6R2S2vAA?download">+</A></DIV>
<DIV
title="Click here to view the full image using the online photo viewer."
style="DISPLAY: inline; OVERFLOW: hidden; WIDTH: 140px; HEIGHT: 140px"><A
href="http://g.msn.com/5meen_us/171?path=/photomail/{6fc0065f-ffdd-4ca6-9a4c-cc5a93dc122f}&amp;image=47D7B182CFEFB10!127&amp;imagehi=47D7B182CFEFB10!125&amp;CID=323550092004883216"
border="0"><IMG
style="MARGIN-TOP: 15px; DISPLAY: inline-block; MARGIN-LEFT: 0px"
height=109 src="cid:006A71303B80404E9FB6184E55D6A446@wc" width=140
border=0></A></DIV></TD></TR>
<TR>
<TD>
<DIV
style="FONT-SIZE: 10pt; WIDTH: 140px; FONT-FAMILY: verdana; TEXT-ALIGN: center"><EM><STRONG>This
<U>is </U></STRONG><U>tit</U>le</EM> fo<STRONG>r <FONT
face="Arial Black">t<FONT color=#800000 size=7>h<U>i</U></FONT>s
</FONT>picture</STRONG></DIV></TD></TR></TBODY></TABLE></SPAN></SPAN></SPAN>
<DIV
style="PADDING-RIGHT: 5px; PADDING-LEFT: 7px; PADDING-BOTTOM: 2px; WIDTH: 100%; PADDING-TOP: 2px; HEIGHT: 50px">
<DIV>&nbsp;</DIV></DIV>
<DIV
style="BORDER-TOP: #dddddd 1px solid; FONT-SIZE: 10pt; MARGIN-BOTTOM: 10px; WIDTH: 100%; COLOR: #909090; MARGIN-RIGHT: 10px; PADDING-TOP: 9px; FONT-FAMILY: Verdana; HEIGHT: 42px; BACKGROUND-COLOR: #ffffff"><NOBR><SPAN
title="Join Windows Live to share photos using Windows Live Photo E-mail.">Online
pictures are available for 30 days. <A style="COLOR: #0088e4"
href="http://g.msn.com/5meen_us/175">Get Windows Live Mail desktop to create
your own photo e-mails. </A></SPAN></NOBR></DIV></BODY></HTML>

View File

@ -0,0 +1,76 @@
html {font-size:1em; font-family:serif; }
body {margin-left:4em; margin-right:4em; }
dt {font-weight:bold; }
pre {margin-left:2em; }
pre, code, tt {font-family:monospace; font-size:1em; }
h1 {text-align:center; font-family:Garamond, serif;
font-variant:small-caps;}
h2 {border-bottom:1px solid #CCC; font-family:sans-serif; font-weight:normal;
font-size:1.3em;}
h3 {font-family:sans-serif; font-size:1.1em; font-weight:bold; }
h4 {font-family:sans-serif; font-size:0.9em; font-weight:bold; }
/* For witty quips */
.subtitled {margin-bottom:0em;}
.subtitle , .subsubtitle {font-size:.8em; margin-bottom:1em;
font-style:italic; margin-top:-.2em;text-align:center;}
.subsubtitle {text-align:left;margin-left:2em;}
/* Used for special "See also" links. */
.reference {font-style:italic;margin-left:2em;}
/* Marks off asides, discussions on why something is the way it is */
.aside {margin-left:2em; font-family:sans-serif; font-size:0.9em; }
blockquote .label {font-weight:bold; font-size:1em; margin:0 0 .1em;
border-bottom:1px solid #CCC;}
.emphasis {font-weight:bold; text-align:center; font-size:1.3em;}
/* A regular table */
.table {border-collapse:collapse; border-bottom:2px solid #888; margin-left:2em; }
.table thead th {margin:0; background:#888; color:#FFF; }
.table thead th:first-child {-moz-border-radius-topleft:1em;}
.table tbody td {border-bottom:1px solid #CCC; padding-right:0.6em;padding-left:0.6em;}
/* A quick table*/
table.quick tbody th {text-align:right; padding-right:1em;}
/* Category of the file */
#filing {font-weight:bold; font-size:smaller; }
/* Contains, without exception, Return to index. */
#index {font-size:smaller; }
#home {font-size:smaller;}
/* Contains, without exception, $Id$, for SVN version info. */
#version {text-align:right; font-style:italic; margin:2em 0;}
#toc ol ol {list-style-type:lower-roman;}
#toc ol {list-style-type:decimal;}
#toc {list-style-type:upper-alpha;}
q {
behavior: url(fixquotes.htc); /* IE fix */
quotes: '\201C' '\201D' '\2018' '\2019';
}
q:before {
content: open-quote;
}
q:after {
content: close-quote;
}
/* Marks off implementation details interesting only to the person writing
the class described in the spec. */
.technical {margin-left:2em; }
.technical:before {content:"Technical note: "; font-weight:bold; color:#061; }
/* Marks off sections that are lacking. */
.fixme {margin-left:2em; }
.fixme:before {content:"Fix me: "; font-weight:bold; color:#C00; }
#applicability {margin: 1em 5%; font-style:italic;}
/* vim: et sw=4 sts=4 */

View File

@ -0,0 +1,86 @@
<?php
/**
* Decorator/extender XSLT processor specifically for HTML documents.
*/
class ConfigDoc_HTMLXSLTProcessor
{
/**
* Instance of XSLTProcessor
*/
protected $xsltProcessor;
public function __construct($proc = false) {
if ($proc === false) $proc = new XSLTProcessor();
$this->xsltProcessor = $proc;
}
/**
* @note Allows a string $xsl filename to be passed
*/
public function importStylesheet($xsl) {
if (is_string($xsl)) {
$xsl_file = $xsl;
$xsl = new DOMDocument();
$xsl->load($xsl_file);
}
return $this->xsltProcessor->importStylesheet($xsl);
}
/**
* Transforms an XML file into compatible XHTML based on the stylesheet
* @param $xml XML DOM tree, or string filename
* @return string HTML output
* @todo Rename to transformToXHTML, as transformToHTML is misleading
*/
public function transformToHTML($xml) {
if (is_string($xml)) {
$dom = new DOMDocument();
$dom->load($xml);
} else {
$dom = $xml;
}
$out = $this->xsltProcessor->transformToXML($dom);
// fudges for HTML backwards compatibility
// assumes that document is XHTML
$out = str_replace('/>', ' />', $out); // <br /> not <br/>
$out = str_replace(' xmlns=""', '', $out); // rm unnecessary xmlns
if (class_exists('Tidy')) {
// cleanup output
$config = array(
'indent' => true,
'output-xhtml' => true,
'wrap' => 80
);
$tidy = new Tidy;
$tidy->parseString($out, $config, 'utf8');
$tidy->cleanRepair();
$out = (string) $tidy;
}
return $out;
}
/**
* Bulk sets parameters for the XSL stylesheet
* @param array $options Associative array of options to set
*/
public function setParameters($options) {
foreach ($options as $name => $value) {
$this->xsltProcessor->setParameter('', $name, $value);
}
}
/**
* Forward any other calls to the XSLT processor
*/
public function __call($name, $arguments) {
call_user_func_array(array($this->xsltProcessor, $name), $arguments);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,157 @@
<?php
/**
* Filesystem tools not provided by default; can recursively create, copy
* and delete folders. Some template methods are provided for extensibility.
*
* @note This class must be instantiated to be used, although it does
* not maintain state.
*/
class FSTools
{
private static $singleton;
/**
* Returns a global instance of FSTools
*/
static public function singleton() {
if (empty(FSTools::$singleton)) FSTools::$singleton = new FSTools();
return FSTools::$singleton;
}
/**
* Sets our global singleton to something else; useful for overloading
* functions.
*/
static public function setSingleton($singleton) {
FSTools::$singleton = $singleton;
}
/**
* Recursively creates a directory
* @param string $folder Name of folder to create
* @note Adapted from the PHP manual comment 76612
*/
public function mkdirr($folder) {
$folders = preg_split("#[\\\\/]#", $folder);
$base = '';
for($i = 0, $c = count($folders); $i < $c; $i++) {
if(empty($folders[$i])) {
if (!$i) {
// special case for root level
$base .= DIRECTORY_SEPARATOR;
}
continue;
}
$base .= $folders[$i];
if(!is_dir($base)){
$this->mkdir($base);
}
$base .= DIRECTORY_SEPARATOR;
}
}
/**
* Copy a file, or recursively copy a folder and its contents; modified
* so that copied files, if PHP, have includes removed
* @note Adapted from http://aidanlister.com/repos/v/function.copyr.php
*/
public function copyr($source, $dest) {
// Simple copy for a file
if (is_file($source)) {
return $this->copy($source, $dest);
}
// Make destination directory
if (!is_dir($dest)) {
$this->mkdir($dest);
}
// Loop through the folder
$dir = $this->dir($source);
while ( false !== ($entry = $dir->read()) ) {
// Skip pointers
if ($entry == '.' || $entry == '..') {
continue;
}
if (!$this->copyable($entry)) {
continue;
}
// Deep copy directories
if ($dest !== "$source/$entry") {
$this->copyr("$source/$entry", "$dest/$entry");
}
}
// Clean up
$dir->close();
return true;
}
/**
* Overloadable function that tests a filename for copyability. By
* default, everything should be copied; you can restrict things to
* ignore hidden files, unreadable files, etc. This function
* applies to copyr().
*/
public function copyable($file) {
return true;
}
/**
* Delete a file, or a folder and its contents
* @note Adapted from http://aidanlister.com/repos/v/function.rmdirr.php
*/
public function rmdirr($dirname)
{
// Sanity check
if (!$this->file_exists($dirname)) {
return false;
}
// Simple delete for a file
if ($this->is_file($dirname) || $this->is_link($dirname)) {
return $this->unlink($dirname);
}
// Loop through the folder
$dir = $this->dir($dirname);
while (false !== $entry = $dir->read()) {
// Skip pointers
if ($entry == '.' || $entry == '..') {
continue;
}
// Recurse
$this->rmdirr($dirname . DIRECTORY_SEPARATOR . $entry);
}
// Clean up
$dir->close();
return $this->rmdir($dirname);
}
/**
* Recursively globs a directory.
*/
public function globr($dir, $pattern, $flags = NULL) {
$files = $this->glob("$dir/$pattern", $flags);
if ($files === false) $files = array();
$sub_dirs = $this->glob("$dir/*", GLOB_ONLYDIR);
if ($sub_dirs === false) $sub_dirs = array();
foreach ($sub_dirs as $sub_dir) {
$sub_files = $this->globr($sub_dir, $pattern, $flags);
$files = array_merge($files, $sub_files);
}
return $files;
}
/**
* Allows for PHP functions to be called and be stubbed.
* @warning This function will not work for functions that need
* to pass references; manually define a stub function for those.
*/
public function __call($name, $args) {
return call_user_func_array($name, $args);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,126 @@
<?php
/**
* Represents a file in the filesystem
*
* @warning Be sure to distinguish between get() and write() versus
* read() and put(), the former operates on the entire file, while
* the latter operates on a handle.
*/
class FSTools_File
{
/** Filename of file this object represents */
protected $name;
/** Handle for the file */
protected $handle = false;
/** Instance of FSTools for interfacing with filesystem */
protected $fs;
/**
* Filename of file you wish to instantiate.
* @note This file need not exist
*/
public function __construct($name, $fs = false) {
$this->name = $name;
$this->fs = $fs ? $fs : FSTools::singleton();
}
/** Returns the filename of the file. */
public function getName() {return $this->name;}
/** Returns directory of the file without trailing slash */
public function getDirectory() {return $this->fs->dirname($this->name);}
/**
* Retrieves the contents of a file
* @todo Throw an exception if file doesn't exist
*/
public function get() {
return $this->fs->file_get_contents($this->name);
}
/** Writes contents to a file, creates new file if necessary */
public function write($contents) {
return $this->fs->file_put_contents($this->name, $contents);
}
/** Deletes the file */
public function delete() {
return $this->fs->unlink($this->name);
}
/** Returns true if file exists and is a file. */
public function exists() {
return $this->fs->is_file($this->name);
}
/** Returns last file modification time */
public function getMTime() {
return $this->fs->filemtime($this->name);
}
/**
* Chmod a file
* @note We ignore errors because of some weird owner trickery due
* to SVN duality
*/
public function chmod($octal_code) {
return @$this->fs->chmod($this->name, $octal_code);
}
/** Opens file's handle */
public function open($mode) {
if ($this->handle) $this->close();
$this->handle = $this->fs->fopen($this->name, $mode);
return true;
}
/** Closes file's handle */
public function close() {
if (!$this->handle) return false;
$status = $this->fs->fclose($this->handle);
$this->handle = false;
return $status;
}
/** Retrieves a line from an open file, with optional max length $length */
public function getLine($length = null) {
if (!$this->handle) $this->open('r');
if ($length === null) return $this->fs->fgets($this->handle);
else return $this->fs->fgets($this->handle, $length);
}
/** Retrieves a character from an open file */
public function getChar() {
if (!$this->handle) $this->open('r');
return $this->fs->fgetc($this->handle);
}
/** Retrieves an $length bytes of data from an open data */
public function read($length) {
if (!$this->handle) $this->open('r');
return $this->fs->fread($this->handle, $length);
}
/** Writes to an open file */
public function put($string) {
if (!$this->handle) $this->open('a');
return $this->fs->fwrite($this->handle, $string);
}
/** Returns TRUE if the end of the file has been reached */
public function eof() {
if (!$this->handle) return true;
return $this->fs->feof($this->handle);
}
public function __destruct() {
if ($this->handle) $this->close();
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* This is a stub include that automatically configures the include path.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
require_once 'HTMLPurifierExtras.php';
require_once 'HTMLPurifierExtras.autoload.php';
// vim: et sw=4 sts=4

View File

@ -0,0 +1,25 @@
<?php
/**
* @file
* Convenience file that registers autoload handler for HTML Purifier.
*
* @warning
* This autoloader does not contain the compatibility code seen in
* HTMLPurifier_Bootstrap; the user is expected to make any necessary
* changes to use this library.
*/
if (function_exists('spl_autoload_register')) {
spl_autoload_register(array('HTMLPurifierExtras', 'autoload'));
if (function_exists('__autoload')) {
// Be polite and ensure that userland autoload gets retained
spl_autoload_register('__autoload');
}
} elseif (!function_exists('__autoload')) {
function __autoload($class) {
return HTMLPurifierExtras::autoload($class);
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,29 @@
<?php
/**
* Meta-class for HTML Purifier's extra class hierarchies, similar to
* HTMLPurifier_Bootstrap.
*/
class HTMLPurifierExtras
{
public static function autoload($class) {
$path = HTMLPurifierExtras::getPath($class);
if (!$path) return false;
require $path;
return true;
}
public static function getPath($class) {
if (
strncmp('FSTools', $class, 7) !== 0 &&
strncmp('ConfigDoc', $class, 9) !== 0
) return false;
// Custom implementations can go here
// Standard implementation:
return str_replace('_', '/', $class) . '.php';
}
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,32 @@
HTML Purifier Extras
The Method Behind The Madness!
The extras/ folder in HTML Purifier contains--you guessed it--extra things
for HTML Purifier. Specifically, these are two extra libraries called
FSTools and ConfigSchema. They're extra for a reason: you don't need them
if you're using HTML Purifier for normal usage: filtering HTML. However,
if you're a developer, and would like to test HTML Purifier, or need to
use one of HTML Purifier's maintenance scripts, chances are they'll need
these libraries. Who knows: maybe you'll find them useful too!
Here are the libraries:
FSTools
-------
Short for File System Tools, this is a poor-man's object-oriented wrapper for
the filesystem. It currently consists of two classes:
- FSTools: This is a singleton that contains a manner of useful functions
such as recursive glob, directory removal, etc, as well as the ability
to call arbitrary native PHP functions through it like $FS->fopen(...).
This makes it a lot simpler to mock these filesystem calls for unit testing.
- FSTools_File: This object represents a single file, and has almost any
method imaginable one would need.
Check the files themselves for more information.
vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* This is a stub include that automatically configures the include path.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
require_once 'HTMLPurifier/Bootstrap.php';
require_once 'HTMLPurifier.autoload.php';
// vim: et sw=4 sts=4

View File

@ -0,0 +1,26 @@
<?php
/**
* @file
* Convenience file that registers autoload handler for HTML Purifier.
* It also does some sanity checks.
*/
if (function_exists('spl_autoload_register') && function_exists('spl_autoload_unregister')) {
// We need unregister for our pre-registering functionality
HTMLPurifier_Bootstrap::registerAutoload();
if (function_exists('__autoload')) {
// Be polite and ensure that userland autoload gets retained
spl_autoload_register('__autoload');
}
} elseif (!function_exists('__autoload')) {
function __autoload($class) {
return HTMLPurifier_Bootstrap::autoload($class);
}
}
if (ini_get('zend.ze1_compatibility_mode')) {
trigger_error("HTML Purifier is not compatible with zend.ze1_compatibility_mode; please turn it off", E_USER_ERROR);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,4 @@
<?php
if (!defined('HTMLPURIFIER_PREFIX')) {
define('HTMLPURIFIER_PREFIX', __DIR__);
}

View File

@ -0,0 +1,23 @@
<?php
/**
* @file
* Defines a function wrapper for HTML Purifier for quick use.
* @note ''HTMLPurifier()'' is NOT the same as ''new HTMLPurifier()''
*/
/**
* Purify HTML.
* @param $html String HTML to purify
* @param $config Configuration to use, can be any value accepted by
* HTMLPurifier_Config::create()
*/
function HTMLPurifier($html, $config = null) {
static $purifier = false;
if (!$purifier) {
$purifier = new HTMLPurifier();
}
return $purifier->purify($html, $config);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,222 @@
<?php
/**
* @file
* This file was auto-generated by generate-includes.php and includes all of
* the core files required by HTML Purifier. Use this if performance is a
* primary concern and you are using an opcode cache. PLEASE DO NOT EDIT THIS
* FILE, changes will be overwritten the next time the script is run.
*
* @version 4.5.0
*
* @warning
* You must *not* include any other HTML Purifier files before this file,
* because 'require' not 'require_once' is used.
*
* @warning
* This file requires that the include path contains the HTML Purifier
* library directory; this is not auto-set.
*/
require 'HTMLPurifier.php';
require 'HTMLPurifier/AttrCollections.php';
require 'HTMLPurifier/AttrDef.php';
require 'HTMLPurifier/AttrTransform.php';
require 'HTMLPurifier/AttrTypes.php';
require 'HTMLPurifier/AttrValidator.php';
require 'HTMLPurifier/Bootstrap.php';
require 'HTMLPurifier/Definition.php';
require 'HTMLPurifier/CSSDefinition.php';
require 'HTMLPurifier/ChildDef.php';
require 'HTMLPurifier/Config.php';
require 'HTMLPurifier/ConfigSchema.php';
require 'HTMLPurifier/ContentSets.php';
require 'HTMLPurifier/Context.php';
require 'HTMLPurifier/DefinitionCache.php';
require 'HTMLPurifier/DefinitionCacheFactory.php';
require 'HTMLPurifier/Doctype.php';
require 'HTMLPurifier/DoctypeRegistry.php';
require 'HTMLPurifier/ElementDef.php';
require 'HTMLPurifier/Encoder.php';
require 'HTMLPurifier/EntityLookup.php';
require 'HTMLPurifier/EntityParser.php';
require 'HTMLPurifier/ErrorCollector.php';
require 'HTMLPurifier/ErrorStruct.php';
require 'HTMLPurifier/Exception.php';
require 'HTMLPurifier/Filter.php';
require 'HTMLPurifier/Generator.php';
require 'HTMLPurifier/HTMLDefinition.php';
require 'HTMLPurifier/HTMLModule.php';
require 'HTMLPurifier/HTMLModuleManager.php';
require 'HTMLPurifier/IDAccumulator.php';
require 'HTMLPurifier/Injector.php';
require 'HTMLPurifier/Language.php';
require 'HTMLPurifier/LanguageFactory.php';
require 'HTMLPurifier/Length.php';
require 'HTMLPurifier/Lexer.php';
require 'HTMLPurifier/PercentEncoder.php';
require 'HTMLPurifier/PropertyList.php';
require 'HTMLPurifier/PropertyListIterator.php';
require 'HTMLPurifier/Strategy.php';
require 'HTMLPurifier/StringHash.php';
require 'HTMLPurifier/StringHashParser.php';
require 'HTMLPurifier/TagTransform.php';
require 'HTMLPurifier/Token.php';
require 'HTMLPurifier/TokenFactory.php';
require 'HTMLPurifier/URI.php';
require 'HTMLPurifier/URIDefinition.php';
require 'HTMLPurifier/URIFilter.php';
require 'HTMLPurifier/URIParser.php';
require 'HTMLPurifier/URIScheme.php';
require 'HTMLPurifier/URISchemeRegistry.php';
require 'HTMLPurifier/UnitConverter.php';
require 'HTMLPurifier/VarParser.php';
require 'HTMLPurifier/VarParserException.php';
require 'HTMLPurifier/AttrDef/CSS.php';
require 'HTMLPurifier/AttrDef/Clone.php';
require 'HTMLPurifier/AttrDef/Enum.php';
require 'HTMLPurifier/AttrDef/Integer.php';
require 'HTMLPurifier/AttrDef/Lang.php';
require 'HTMLPurifier/AttrDef/Switch.php';
require 'HTMLPurifier/AttrDef/Text.php';
require 'HTMLPurifier/AttrDef/URI.php';
require 'HTMLPurifier/AttrDef/CSS/Number.php';
require 'HTMLPurifier/AttrDef/CSS/AlphaValue.php';
require 'HTMLPurifier/AttrDef/CSS/Background.php';
require 'HTMLPurifier/AttrDef/CSS/BackgroundPosition.php';
require 'HTMLPurifier/AttrDef/CSS/Border.php';
require 'HTMLPurifier/AttrDef/CSS/Color.php';
require 'HTMLPurifier/AttrDef/CSS/Composite.php';
require 'HTMLPurifier/AttrDef/CSS/DenyElementDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Filter.php';
require 'HTMLPurifier/AttrDef/CSS/Font.php';
require 'HTMLPurifier/AttrDef/CSS/FontFamily.php';
require 'HTMLPurifier/AttrDef/CSS/Ident.php';
require 'HTMLPurifier/AttrDef/CSS/ImportantDecorator.php';
require 'HTMLPurifier/AttrDef/CSS/Length.php';
require 'HTMLPurifier/AttrDef/CSS/ListStyle.php';
require 'HTMLPurifier/AttrDef/CSS/Multiple.php';
require 'HTMLPurifier/AttrDef/CSS/Percentage.php';
require 'HTMLPurifier/AttrDef/CSS/TextDecoration.php';
require 'HTMLPurifier/AttrDef/CSS/URI.php';
require 'HTMLPurifier/AttrDef/HTML/Bool.php';
require 'HTMLPurifier/AttrDef/HTML/Nmtokens.php';
require 'HTMLPurifier/AttrDef/HTML/Class.php';
require 'HTMLPurifier/AttrDef/HTML/Color.php';
require 'HTMLPurifier/AttrDef/HTML/FrameTarget.php';
require 'HTMLPurifier/AttrDef/HTML/ID.php';
require 'HTMLPurifier/AttrDef/HTML/Pixels.php';
require 'HTMLPurifier/AttrDef/HTML/Length.php';
require 'HTMLPurifier/AttrDef/HTML/LinkTypes.php';
require 'HTMLPurifier/AttrDef/HTML/MultiLength.php';
require 'HTMLPurifier/AttrDef/URI/Email.php';
require 'HTMLPurifier/AttrDef/URI/Host.php';
require 'HTMLPurifier/AttrDef/URI/IPv4.php';
require 'HTMLPurifier/AttrDef/URI/IPv6.php';
require 'HTMLPurifier/AttrDef/URI/Email/SimpleCheck.php';
require 'HTMLPurifier/AttrTransform/Background.php';
require 'HTMLPurifier/AttrTransform/BdoDir.php';
require 'HTMLPurifier/AttrTransform/BgColor.php';
require 'HTMLPurifier/AttrTransform/BoolToCSS.php';
require 'HTMLPurifier/AttrTransform/Border.php';
require 'HTMLPurifier/AttrTransform/EnumToCSS.php';
require 'HTMLPurifier/AttrTransform/ImgRequired.php';
require 'HTMLPurifier/AttrTransform/ImgSpace.php';
require 'HTMLPurifier/AttrTransform/Input.php';
require 'HTMLPurifier/AttrTransform/Lang.php';
require 'HTMLPurifier/AttrTransform/Length.php';
require 'HTMLPurifier/AttrTransform/Name.php';
require 'HTMLPurifier/AttrTransform/NameSync.php';
require 'HTMLPurifier/AttrTransform/Nofollow.php';
require 'HTMLPurifier/AttrTransform/SafeEmbed.php';
require 'HTMLPurifier/AttrTransform/SafeObject.php';
require 'HTMLPurifier/AttrTransform/SafeParam.php';
require 'HTMLPurifier/AttrTransform/ScriptRequired.php';
require 'HTMLPurifier/AttrTransform/TargetBlank.php';
require 'HTMLPurifier/AttrTransform/Textarea.php';
require 'HTMLPurifier/ChildDef/Chameleon.php';
require 'HTMLPurifier/ChildDef/Custom.php';
require 'HTMLPurifier/ChildDef/Empty.php';
require 'HTMLPurifier/ChildDef/List.php';
require 'HTMLPurifier/ChildDef/Required.php';
require 'HTMLPurifier/ChildDef/Optional.php';
require 'HTMLPurifier/ChildDef/StrictBlockquote.php';
require 'HTMLPurifier/ChildDef/Table.php';
require 'HTMLPurifier/DefinitionCache/Decorator.php';
require 'HTMLPurifier/DefinitionCache/Null.php';
require 'HTMLPurifier/DefinitionCache/Serializer.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Cleanup.php';
require 'HTMLPurifier/DefinitionCache/Decorator/Memory.php';
require 'HTMLPurifier/HTMLModule/Bdo.php';
require 'HTMLPurifier/HTMLModule/CommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Edit.php';
require 'HTMLPurifier/HTMLModule/Forms.php';
require 'HTMLPurifier/HTMLModule/Hypertext.php';
require 'HTMLPurifier/HTMLModule/Iframe.php';
require 'HTMLPurifier/HTMLModule/Image.php';
require 'HTMLPurifier/HTMLModule/Legacy.php';
require 'HTMLPurifier/HTMLModule/List.php';
require 'HTMLPurifier/HTMLModule/Name.php';
require 'HTMLPurifier/HTMLModule/Nofollow.php';
require 'HTMLPurifier/HTMLModule/NonXMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Object.php';
require 'HTMLPurifier/HTMLModule/Presentation.php';
require 'HTMLPurifier/HTMLModule/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Ruby.php';
require 'HTMLPurifier/HTMLModule/SafeEmbed.php';
require 'HTMLPurifier/HTMLModule/SafeObject.php';
require 'HTMLPurifier/HTMLModule/SafeScripting.php';
require 'HTMLPurifier/HTMLModule/Scripting.php';
require 'HTMLPurifier/HTMLModule/StyleAttribute.php';
require 'HTMLPurifier/HTMLModule/Tables.php';
require 'HTMLPurifier/HTMLModule/Target.php';
require 'HTMLPurifier/HTMLModule/TargetBlank.php';
require 'HTMLPurifier/HTMLModule/Text.php';
require 'HTMLPurifier/HTMLModule/Tidy.php';
require 'HTMLPurifier/HTMLModule/XMLCommonAttributes.php';
require 'HTMLPurifier/HTMLModule/Tidy/Name.php';
require 'HTMLPurifier/HTMLModule/Tidy/Proprietary.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTMLAndHTML4.php';
require 'HTMLPurifier/HTMLModule/Tidy/Strict.php';
require 'HTMLPurifier/HTMLModule/Tidy/Transitional.php';
require 'HTMLPurifier/HTMLModule/Tidy/XHTML.php';
require 'HTMLPurifier/Injector/AutoParagraph.php';
require 'HTMLPurifier/Injector/DisplayLinkURI.php';
require 'HTMLPurifier/Injector/Linkify.php';
require 'HTMLPurifier/Injector/PurifierLinkify.php';
require 'HTMLPurifier/Injector/RemoveEmpty.php';
require 'HTMLPurifier/Injector/RemoveSpansWithoutAttributes.php';
require 'HTMLPurifier/Injector/SafeObject.php';
require 'HTMLPurifier/Lexer/DOMLex.php';
require 'HTMLPurifier/Lexer/DirectLex.php';
require 'HTMLPurifier/Strategy/Composite.php';
require 'HTMLPurifier/Strategy/Core.php';
require 'HTMLPurifier/Strategy/FixNesting.php';
require 'HTMLPurifier/Strategy/MakeWellFormed.php';
require 'HTMLPurifier/Strategy/RemoveForeignElements.php';
require 'HTMLPurifier/Strategy/ValidateAttributes.php';
require 'HTMLPurifier/TagTransform/Font.php';
require 'HTMLPurifier/TagTransform/Simple.php';
require 'HTMLPurifier/Token/Comment.php';
require 'HTMLPurifier/Token/Tag.php';
require 'HTMLPurifier/Token/Empty.php';
require 'HTMLPurifier/Token/End.php';
require 'HTMLPurifier/Token/Start.php';
require 'HTMLPurifier/Token/Text.php';
require 'HTMLPurifier/URIFilter/DisableExternal.php';
require 'HTMLPurifier/URIFilter/DisableExternalResources.php';
require 'HTMLPurifier/URIFilter/DisableResources.php';
require 'HTMLPurifier/URIFilter/HostBlacklist.php';
require 'HTMLPurifier/URIFilter/MakeAbsolute.php';
require 'HTMLPurifier/URIFilter/Munge.php';
require 'HTMLPurifier/URIFilter/SafeIframe.php';
require 'HTMLPurifier/URIScheme/data.php';
require 'HTMLPurifier/URIScheme/file.php';
require 'HTMLPurifier/URIScheme/ftp.php';
require 'HTMLPurifier/URIScheme/http.php';
require 'HTMLPurifier/URIScheme/https.php';
require 'HTMLPurifier/URIScheme/mailto.php';
require 'HTMLPurifier/URIScheme/news.php';
require 'HTMLPurifier/URIScheme/nntp.php';
require 'HTMLPurifier/VarParser/Flexible.php';
require 'HTMLPurifier/VarParser/Native.php';

View File

@ -0,0 +1,30 @@
<?php
/**
* @file
* Emulation layer for code that used kses(), substituting in HTML Purifier.
*/
require_once dirname(__FILE__) . '/HTMLPurifier.auto.php';
function kses($string, $allowed_html, $allowed_protocols = null) {
$config = HTMLPurifier_Config::createDefault();
$allowed_elements = array();
$allowed_attributes = array();
foreach ($allowed_html as $element => $attributes) {
$allowed_elements[$element] = true;
foreach ($attributes as $attribute => $x) {
$allowed_attributes["$element.$attribute"] = true;
}
}
$config->set('HTML.AllowedElements', $allowed_elements);
$config->set('HTML.AllowedAttributes', $allowed_attributes);
$allowed_schemes = array();
if ($allowed_protocols !== null) {
$config->set('URI.AllowedSchemes', $allowed_protocols);
}
$purifier = new HTMLPurifier($config);
return $purifier->purify($string);
}
// vim: et sw=4 sts=4

View File

@ -0,0 +1,11 @@
<?php
/**
* @file
* Convenience stub file that adds HTML Purifier's library file to the path
* without any other side-effects.
*/
set_include_path(dirname(__FILE__) . PATH_SEPARATOR . get_include_path() );
// vim: et sw=4 sts=4

View File

@ -0,0 +1,237 @@
<?php
/*! @mainpage
*
* HTML Purifier is an HTML filter that will take an arbitrary snippet of
* HTML and rigorously test, validate and filter it into a version that
* is safe for output onto webpages. It achieves this by:
*
* -# Lexing (parsing into tokens) the document,
* -# Executing various strategies on the tokens:
* -# Removing all elements not in the whitelist,
* -# Making the tokens well-formed,
* -# Fixing the nesting of the nodes, and
* -# Validating attributes of the nodes; and
* -# Generating HTML from the purified tokens.
*
* However, most users will only need to interface with the HTMLPurifier
* and HTMLPurifier_Config.
*/
/*
HTML Purifier 4.5.0 - Standards Compliant HTML Filtering
Copyright (C) 2006-2008 Edward Z. Yang
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
/**
* Facade that coordinates HTML Purifier's subsystems in order to purify HTML.
*
* @note There are several points in which configuration can be specified
* for HTML Purifier. The precedence of these (from lowest to
* highest) is as follows:
* -# Instance: new HTMLPurifier($config)
* -# Invocation: purify($html, $config)
* These configurations are entirely independent of each other and
* are *not* merged (this behavior may change in the future).
*
* @todo We need an easier way to inject strategies using the configuration
* object.
*/
class HTMLPurifier
{
/** Version of HTML Purifier */
public $version = '4.5.0';
/** Constant with version of HTML Purifier */
const VERSION = '4.5.0';
/** Global configuration object */
public $config;
/** Array of extra HTMLPurifier_Filter objects to run on HTML, for backwards compatibility */
private $filters = array();
/** Single instance of HTML Purifier */
private static $instance;
protected $strategy, $generator;
/**
* Resultant HTMLPurifier_Context of last run purification. Is an array
* of contexts if the last called method was purifyArray().
*/
public $context;
/**
* Initializes the purifier.
* @param $config Optional HTMLPurifier_Config object for all instances of
* the purifier, if omitted, a default configuration is
* supplied (which can be overridden on a per-use basis).
* The parameter can also be any type that
* HTMLPurifier_Config::create() supports.
*/
public function __construct($config = null) {
$this->config = HTMLPurifier_Config::create($config);
$this->strategy = new HTMLPurifier_Strategy_Core();
}
/**
* Adds a filter to process the output. First come first serve
* @param $filter HTMLPurifier_Filter object
*/
public function addFilter($filter) {
trigger_error('HTMLPurifier->addFilter() is deprecated, use configuration directives in the Filter namespace or Filter.Custom', E_USER_WARNING);
$this->filters[] = $filter;
}
/**
* Filters an HTML snippet/document to be XSS-free and standards-compliant.
*
* @param $html String of HTML to purify
* @param $config HTMLPurifier_Config object for this operation, if omitted,
* defaults to the config object specified during this
* object's construction. The parameter can also be any type
* that HTMLPurifier_Config::create() supports.
* @return Purified HTML
*/
public function purify($html, $config = null) {
// :TODO: make the config merge in, instead of replace
$config = $config ? HTMLPurifier_Config::create($config) : $this->config;
// implementation is partially environment dependant, partially
// configuration dependant
$lexer = HTMLPurifier_Lexer::create($config);
$context = new HTMLPurifier_Context();
// setup HTML generator
$this->generator = new HTMLPurifier_Generator($config, $context);
$context->register('Generator', $this->generator);
// set up global context variables
if ($config->get('Core.CollectErrors')) {
// may get moved out if other facilities use it
$language_factory = HTMLPurifier_LanguageFactory::instance();
$language = $language_factory->create($config, $context);
$context->register('Locale', $language);
$error_collector = new HTMLPurifier_ErrorCollector($context);
$context->register('ErrorCollector', $error_collector);
}
// setup id_accumulator context, necessary due to the fact that
// AttrValidator can be called from many places
$id_accumulator = HTMLPurifier_IDAccumulator::build($config, $context);
$context->register('IDAccumulator', $id_accumulator);
$html = HTMLPurifier_Encoder::convertToUTF8($html, $config, $context);
// setup filters
$filter_flags = $config->getBatch('Filter');
$custom_filters = $filter_flags['Custom'];
unset($filter_flags['Custom']);
$filters = array();
foreach ($filter_flags as $filter => $flag) {
if (!$flag) continue;
if (strpos($filter, '.') !== false) continue;
$class = "HTMLPurifier_Filter_$filter";
$filters[] = new $class;
}
foreach ($custom_filters as $filter) {
// maybe "HTMLPurifier_Filter_$filter", but be consistent with AutoFormat
$filters[] = $filter;
}
$filters = array_merge($filters, $this->filters);
// maybe prepare(), but later
for ($i = 0, $filter_size = count($filters); $i < $filter_size; $i++) {
$html = $filters[$i]->preFilter($html, $config, $context);
}
// purified HTML
$html =
$this->generator->generateFromTokens(
// list of tokens
$this->strategy->execute(
// list of un-purified tokens
$lexer->tokenizeHTML(
// un-purified HTML
$html, $config, $context
),
$config, $context
)
);
for ($i = $filter_size - 1; $i >= 0; $i--) {
$html = $filters[$i]->postFilter($html, $config, $context);
}
$html = HTMLPurifier_Encoder::convertFromUTF8($html, $config, $context);
$this->context =& $context;
return $html;
}
/**
* Filters an array of HTML snippets
* @param $config Optional HTMLPurifier_Config object for this operation.
* See HTMLPurifier::purify() for more details.
* @return Array of purified HTML
*/
public function purifyArray($array_of_html, $config = null) {
$context_array = array();
foreach ($array_of_html as $key => $html) {
$array_of_html[$key] = $this->purify($html, $config);
$context_array[$key] = $this->context;
}
$this->context = $context_array;
return $array_of_html;
}
/**
* Singleton for enforcing just one HTML Purifier in your system
* @param $prototype Optional prototype HTMLPurifier instance to
* overload singleton with, or HTMLPurifier_Config
* instance to configure the generated version with.
*/
public static function instance($prototype = null) {
if (!self::$instance || $prototype) {
if ($prototype instanceof HTMLPurifier) {
self::$instance = $prototype;
} elseif ($prototype) {
self::$instance = new HTMLPurifier($prototype);
} else {
self::$instance = new HTMLPurifier();
}
}
return self::$instance;
}
/**
* @note Backwards compatibility, see instance()
*/
public static function getInstance($prototype = null) {
return HTMLPurifier::instance($prototype);
}
}
// vim: et sw=4 sts=4

Some files were not shown because too many files have changed in this diff Show More