| Commit message (Collapse) | Author | Age |
|
|
|
|
|
| |
This patch adds a new configuration option to the URL_Title module so that
the bot configuration may declare a list of regular expressions to match on a
URL in order to determine if it is blacklisted.
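
As a rough illustration of the idea (in Python, not the bot's own Perl), a
blacklist check over a configured list of regular expressions might look like
the sketch below. The pattern list and the `is_blacklisted` name are
hypothetical and are not taken from the URL_Title module.

```python
import re

# Hypothetical config value: one regular expression per blacklisted URL pattern.
BLACKLIST_PATTERNS = [
    r"^https?://(www\.)?example\.com/",
    r"\.(png|jpe?g|gif)$",
]

# Compile once at config-load time rather than on every URL.
BLACKLIST = [re.compile(p, re.IGNORECASE) for p in BLACKLIST_PATTERNS]


def is_blacklisted(url, patterns=BLACKLIST):
    """Return True if any configured pattern matches somewhere in the URL."""
    return any(p.search(url) for p in patterns)
```

A URL that matches any pattern would simply be skipped before a title fetch is
attempted.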
|
| |
|
|
|
|
|
| |
This should hopefully address the blank, simple, or default-looking pages that
YouTube intermittently returns.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This patch prevents the content-type header and charset from leaking between
calls to get_title. Without clearing these, a URL title could be decoded with
the wrong charset if the previously titled URL used a different charset from
the current one. This patch clears these stale values so that each URL is
decoded with its own charset.
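
A minimal sketch of the fix pattern, written in Python for illustration only.
The `TitleFetcher` class and its attributes are hypothetical stand-ins for
whatever per-object state the real module keeps between calls.

```python
class TitleFetcher:
    """Hypothetical stand-in for an object that caches per-request state."""

    def __init__(self):
        self.content_type = None
        self.charset = None

    def get_title(self, raw_title, content_type=None, charset=None):
        # Reset per-request state first, so nothing from the previous URL
        # can influence how this one is decoded.
        self.content_type = None
        self.charset = None

        if content_type is not None:
            self.content_type = content_type
        if charset is not None:
            self.charset = charset

        # Assumed default: UTF-8 when the response declares no charset.
        return raw_title.decode(self.charset or "utf-8", errors="replace")
```

Without the reset at the top of `get_title`, a second call that supplies no
charset would silently reuse the first call's value.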
|
|
|
|
|
|
|
| |
This patch adds the ability for URL_Title to fall back on the Content-Type
meta http-equiv tag, or failing that, the Content-Type HTTP header itself.
This should improve correctness when dealing with HTML documents other than
HTML5.
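
The fallback order can be sketched as follows (Python, illustration only).
Three things here are assumptions rather than statements from this patch: that
the first source tried is the HTML5 `<meta charset>` tag, that UTF-8 is the
final default, and that simplified regular expressions are good enough. The
real module presumably uses an HTML parser. All names are hypothetical.

```python
import re

# Simplified patterns; a real implementation would use an HTML parser.
HTML5_META = re.compile(rb'<meta\s+charset\s*=\s*["\']?([\w.:-]+)', re.I)
HTTP_EQUIV_META = re.compile(
    rb'<meta[^>]*http-equiv\s*=\s*["\']?content-type["\']?'
    rb'[^>]*content\s*=\s*["\'][^"\'>]*charset\s*=\s*([\w.:-]+)',
    re.I,
)
HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.I)


def pick_charset(head_bytes, content_type_header=None, default="utf-8"):
    """Pick a charset: HTML5 meta, then http-equiv meta, then HTTP header."""
    m = HTML5_META.search(head_bytes)
    if m:
        return m.group(1).decode("ascii").lower()
    m = HTTP_EQUIV_META.search(head_bytes)
    if m:
        return m.group(1).decode("ascii").lower()
    if content_type_header:
        m = HEADER_CHARSET.search(content_type_header)
        if m:
            return m.group(1).lower()
    return default
```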
|
|
|
|
|
|
|
|
|
| |
This patch defers HTML entity decoding until after the raw bytes from the
HTML document have been translated through charsets. Previously, entities were
used as decoded by the HTML parser into UTF-8, which meant that non-UTF-8
strings from documents could become mixed with UTF-8 characters, making the
subsequent character encoding transformation impossible to perform correctly.
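
The ordering can be illustrated with a short Python sketch (not the module's
Perl code). The `decode_title` name is hypothetical. The point is only that
bytes are decoded from the document charset first and entities are expanded
second, so the two never mix encodings.

```python
import html


def decode_title(raw_title, charset):
    """Decode raw title bytes by charset first, then expand HTML entities."""
    # Step 1: bytes -> text using the document's declared charset.
    text = raw_title.decode(charset, errors="replace")
    # Step 2: only now turn entities such as &amp; or &egrave; into characters.
    return html.unescape(text)
```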
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This fixes duplicate URL titles from a `title of` command, and will likely
find use in future.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* makes plugin config more private:
  The config file now uses sections denoted with [Plugin::Foo] where
  plugin-private config can be stored. Plugins are now passed the usual
  arguments, as well as a hashref for their own config section. They are also
  passed the config section of the core, i.e. those config options not
  appearing in an explicit section. Generally, these are used for bot-global
  options, so they should be accessible to all plugins, but plugin-specific
  config shall be hidden.
* tries to improve parsing of hash-like strings and arrays:
  The previous mechanism of using regex to pull out possible tokens was only
  ever meant to be temporary, and caused problems with escaping or
  encapsulation inside strings. I have taken steps in hash parsing to allow
  tokens inside strings. Both array and hash parsing still need to provide an
  escape character to escape the item separator (,).
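
The section layout described in the first point can be sketched like this
(Python, illustration only). The `key = value` line syntax, the `#` comment
marker, and the function names are assumptions. Only the `[Plugin::Foo]`
section header comes from the description above.

```python
import re

SECTION = re.compile(r"^\[Plugin::(\w+)\]$")


def parse_config(text):
    """Split config text into core options and per-plugin sections."""
    core, plugins = {}, {}
    current = core  # options before any section header belong to the core
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SECTION.match(line)
        if m:
            current = plugins.setdefault(m.group(1), {})
            continue
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    return core, plugins


def config_for(plugin, core, plugins):
    """Return what one plugin may see: the core section and its own section."""
    return core, plugins.get(plugin, {})
```

A plugin named `Foo` would then receive the core options plus only the
`[Plugin::Foo]` section, never another plugin's settings.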
|
| |
|
|
|
|
| |
Also remove debug logging statements from Jinx.pm
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Weird bugs with HeadParser; cannot debug and patch for upstream as yet.
|
|
|
|
|
|
|
|
|
| |
From the HTML::HeadParser docs:
> Note that the HTML::HeadParser might get confused if raw undecoded UTF-8 is
> passed to the parse() method. Make sure the strings are properly decoded
> before passing them on.
This explains some hard-to-trace bugs with character mangling.
|
|
|