Using Supybot’s utils module¶
Supybot provides a wealth of utilities for plugin writers in the supybot.utils module. This tutorial describes these utilities and shows you how to use them.
str.py¶
The Format Function¶
The supybot.utils.str module provides a number of utility functions for handling string values. This section contains a quick rundown of all of the functions available, along with descriptions of the arguments they take.
First and foremost is the format function, which packs so much capability into a single string-formatting-style call that it gets its own section in this tutorial. All other functions are covered in the sections that follow.
format takes several arguments: first the format string (using the format characters described below), and then each individual item to be formatted. Pass the items as arguments to format; do not apply the % operator to the format string yourself, because that falls back on Python’s normal string formatting operator. The format function uses the following string formatting characters.
% - literal %
i - integer
s - string
f - float
r - repr
b - form of the verb “to be” (takes an int)
h - form of the verb “to have” (takes an int)
L - commaAndify (takes a list of strings or a tuple of ([strings], and))
p - pluralize (takes a string)
q - quoted (takes a string)
n - n items (takes a 2-tuple of (n, item) or a 3-tuple of (n, between, item))
S - a human-readable size (takes an int)
t - time, formatted (takes an int)
T - time delta, formatted (takes an int)
u - url, wrapped in angle brackets
v - void, takes one or many arguments, but doesn’t display them (useful for translation)
Here are a few examples to help elaborate on the above descriptions:
>>> from supybot.utils.str import format
>>> format("Error %q has been reported %n. For more information, see %u.",
...        "AttributeError", (5, "time"), "https://limnoria.net")
'Error "AttributeError" has been reported 5 times. For more information, see <https://limnoria.net>.'
>>> i = 4
>>> format("There %b %n at this time. You are only allowed %n at any given time",
...        i, (i, "active", "thread"), (5, "active", "thread"))
'There are 4 active threads at this time. You are only allowed 5 active threads at any given time'
>>> i = 1
>>> format("There %b %n at this time. You are only allowed %n at any given time",
...        i, (i, "active", "thread"), (5, "active", "thread"))
'There is 1 active thread at this time. You are only allowed 5 active threads at any given time'
>>> ops = ["foo", "bar", "baz"]
>>> format("The following %n %h the %s capability: %L", (len(ops), "user"),
...        len(ops), "op", ops)
'The following 3 users have the op capability: foo, bar, and baz'
As you can see, you can mix all sorts of formatting characters in a single format string. In fact, that was the major motivation behind format. We have specific functions that you can use individually for each of those formatting types, but it became much easier to use special formatting characters and the format function than to concatenate a bunch of strings that were the results of other utils.str functions.
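To make the dispatch idea concrete, here is a minimal sketch of a format-like function in plain Python. It is not Supybot’s implementation: format_sketch, n_items, be and quoted below are hypothetical stand-ins that cover only a few of the characters listed above, and the pluralization is deliberately naive (it just appends “s”).

```python
import re


def be(i):
    # Stand-in for the "to be" helper: singular only for exactly 1.
    return "is" if i == 1 else "are"


def quoted(s):
    # Stand-in for the quoting helper: wrap in double quotes.
    return '"%s"' % s


def n_items(n, item, between=None):
    # Naive pluralization (an assumption of this sketch): append "s" unless n == 1.
    noun = item if n == 1 else item + "s"
    if between is None:
        return "%s %s" % (n, noun)
    return "%s %s %s" % (n, between, noun)


def format_sketch(template, *args):
    """Expand a small subset of the format characters: %%, %s, %q, %b, %n, %u."""
    remaining = list(args)

    def replace(match):
        char = match.group(1)
        if char == "%":
            return "%"  # literal percent sign, consumes no argument
        value = remaining.pop(0)  # every other character consumes one argument
        if char == "s":
            return str(value)
        if char == "q":
            return quoted(value)
        if char == "b":
            return be(value)
        if char == "n":
            if len(value) == 2:
                return n_items(value[0], value[1])
            n, between, item = value  # 3-tuple form: (n, between, item)
            return n_items(n, item, between)
        if char == "u":
            return "<%s>" % value
        raise ValueError("unsupported format character: %r" % char)

    # re.sub calls replace() once per match, left to right.
    return re.sub(r"%(.)", replace, template)


print(format_sketch("There %b %n at this time.", 1, (1, "active", "thread")))
```

Keeping one small helper per format character and letting a single function walk the format string is what allows several helpers to be combined in one call instead of being concatenated by hand.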
The Other Functions¶
These are the functions that can’t be handled by format. They are sorted in what I perceive to be the general order of usefulness (and I’m leaving the ones covered by format for the next section).
ellipsisify(s, n) - Returns a shortened version of a string. Produces up to the first n chars at the nearest word boundary.
s: the string to be shortened
n: the number of characters to shorten it to
perlReToPythonRe(s) - Converts a Perl-style regexp (e.g., “/abcd/i” or “m/abcd/i”) to an actual Python regexp (an re object)
s: the regexp string
perlReToReplacer(s) - Converts a Perl-style replacement regexp (e.g., “s/foo/bar/g”) to a Python function that performs such a replacement
s: the regexp string
dqrepr(s) - Returns a repr() of s guaranteed to be in double quotes. (Double Quote Repr)
s: the string to be double-quote repr()’ed
toBool(s) - Determines whether or not a string means True or False and returns the appropriate boolean value. True is any of “true”, “on”, “enable”, “enabled”, or “1”. False is any of “false”, “off”, “disable”, “disabled”, or “0”.
s: the string to determine the boolean value for
rsplit(s, sep=None, maxsplit=-1) - functionally the same as str.split in the Python standard library except splitting from the right instead of the left. Python 2.4 has str.rsplit (which this function defers to for those versions >= 2.4), but Python 2.3 did not.
s: the string to be split
sep: the separator to split on, defaults to whitespace
maxsplit: the maximum number of splits to perform, -1 splits all possible splits.
normalizeWhitespace(s) - reduces all multi-spaces in a string to a single space
s: the string to normalize
depluralize(s) - the opposite of pluralize
s: the string to depluralize
unCommaThe(s) - Takes a string of the form “foo, the” and turns it into “the foo”
s: a string of the form “string, the”
distance(s, t) - computes the levenshtein distance (or “edit distance”) between two strings
s: the first string
t: the second string
soundex(s, length=4) - computes the soundex for a given string
s: the string to compute the soundex for
length: the length of the soundex to generate
matchCase(s1, s2) - Matches the case of the first string in the second string.
s1: the first string
s2: the string which will be made to match the case of the first
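Two of the functions above are easy to illustrate in plain Python. The following is a sketch only, not the code from supybot.utils.str: to_bool_sketch follows the word lists given for toBool (raising ValueError for anything else is an assumption of this sketch), and levenshtein_sketch is the textbook dynamic-programming edit distance that distance(s, t) is described as computing.

```python
def to_bool_sketch(s):
    """Map the documented "true"/"false" words to a boolean."""
    lowered = s.strip().lower()
    if lowered in ("true", "on", "enable", "enabled", "1"):
        return True
    if lowered in ("false", "off", "disable", "disabled", "0"):
        return False
    # Assumption: unrecognized input is an error rather than False.
    raise ValueError("not a recognized boolean string: %r" % s)


def levenshtein_sketch(s, t):
    """Return the edit distance between s and t (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of s and t[:j].
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to the empty string
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (free if the chars match)
            ))
        previous = current
    return previous[-1]


print(to_bool_sketch("Enabled"))                 # True
print(levenshtein_sketch("kitten", "sitting"))   # 3
```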
The Commands Format Already Covers¶
These commands aren’t strictly necessary, because you can achieve the same results more easily with the format command. They exist if you decide you want to call them directly, but doing so is greatly discouraged for general use.
commaAndify(seq, comma=",", And="and") - transforms a list of items into a comma separated list with an “and” preceding the last element. For example, ["foo", "bar", "baz"] becomes “foo, bar, and baz”. Is smart enough to convert two-element lists to just “item1 and item2” as well.
seq: the sequence of items (don’t have to be strings, but need to be ‘str()’-able)
comma: the character to use to separate the list
And: the word to use before the last element
pluralize(s) - Returns the plural of a string. Put any exceptions to the general English rules of pluralization in the plurals dictionary in supybot.utils.str.
s: the string to pluralize
nItems(n, item, between=None) - returns a string that describes a given number of an item (with any string between the actual number and the item itself), and handles pluralization with the pluralize function above. Note that the arguments here are in a different order than in the 3-tuple (n, between, item) accepted by the %n format character, since between is optional.
n: the number of items
item: the type of item
between: the optional string that goes between the number and the type of item
quoted(s) - Returns the string surrounded by double-quotes.
s: the string to quote
be(i) - Returns the proper form of the verb “to be” based on the number provided (be(1) is “is”, be(anything else) is “are”)
i: the number of things that “be”
has(i) - Returns the proper form of the verb “to have” based on the number provided (has(1) is “has”, has(anything else) is “have”)
i: the number of things that “has”
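For comparison with format, the following sketch shows what building a sentence from individual helpers looks like. The three functions are hypothetical plain-Python stand-ins written from the descriptions above, not the library code; in particular, returning an empty string for an empty sequence is an assumption of this sketch.

```python
def comma_andify_sketch(seq, comma=",", And="and"):
    """Join items as "a, b, and c"; two items become "a and b"."""
    items = [str(item) for item in seq]  # items only need to be str()-able
    if not items:
        return ""  # assumption: nothing to join yields an empty string
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return "%s %s %s" % (items[0], And, items[1])
    head = (comma + " ").join(items[:-1])
    return "%s%s %s %s" % (head, comma, And, items[-1])


def be_sketch(i):
    return "is" if i == 1 else "are"


def has_sketch(i):
    return "has" if i == 1 else "have"


ops = ["foo", "bar", "baz"]
# Manual concatenation: every helper has to be called and spliced in by hand.
sentence = "The following %d users %s the op capability: %s" % (
    len(ops), has_sketch(len(ops)), comma_andify_sketch(ops))
print(sentence)
```

The same sentence needs a single format call with %n, %h and %L, which is why the individual helpers are rarely worth calling directly.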
structures.py¶
Intro¶
This module provides a number of useful data structures that aren’t found in the standard Python library. For the most part they were created as needed for the bot and plugins themselves, but they were created in such a way as to be of general use for anyone who needs a data structure that performs a like duty. As usual in this document, I’ll try and order these in order of usefulness, starting with the most useful.
The queue classes¶
The structures module provides two general-purpose queue classes for you to use. The “queue” class is a robust full-featured queue that scales up to larger sized queues. The “smallqueue” class is for queues that will contain fewer (less than 1000 or so) items. Both offer the same common interface, which consists of:
a constructor which will optionally accept a sequence to start the queue off with
enqueue(item) - adds an item to the back of the queue
dequeue() - removes (and returns) the item from the front of the queue
peek() - returns the item from the front of the queue without removing it
reset() - empties the queue entirely
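The common interface listed above can be pictured with a small stand-in built on collections.deque. This is only a sketch of the interface; QueueSketch is a hypothetical name, and the real queue and smallqueue classes in supybot.utils.structures are implemented differently.

```python
from collections import deque


class QueueSketch:
    """Minimal FIFO queue exposing enqueue, dequeue, peek and reset."""

    def __init__(self, seq=()):
        # Optionally start the queue off with an existing sequence.
        self._items = deque(seq)

    def enqueue(self, item):
        self._items.append(item)  # add to the back

    def dequeue(self):
        return self._items.popleft()  # remove and return the front item

    def peek(self):
        return self._items[0]  # front item, left in place

    def reset(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)


q = QueueSketch(["first"])
q.enqueue("second")
print(q.peek(), q.dequeue(), len(q))
```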
In addition to these general-use queue classes, there are two other more specialized queue classes as well. The first is the “TimeoutQueue” which holds a queue of items until they reach a certain age and then they are removed from the queue. It features the following:
TimeoutQueue(timeout, queue=None) - you must specify the timeout (in seconds) in the constructor. Note that you can also optionally pass it a queue which uses any implementation you wish to use whether it be one of the above (queue or smallqueue) or if it’s some custom queue you create that implements the same interface. If you don’t pass it a queue instance to use, it will build its own using smallqueue.
reset(), enqueue(item), dequeue() - all same as above queue classes
setTimeout(secs) - allows you to change the timeout value
And for the final queue class, there’s the “MaxLengthQueue” class. As you may have guessed, it’s a queue that is capped at a certain specified length. It features the following:
MaxLengthQueue(length, seq=()) - the constructor naturally requires that you set the max length and it allows you to optionally pass in a sequence to be used as the starting queue. The underlying implementation is actually the queue from before.
enqueue(item) - adds an item onto the back of the queue and if it would push it over the max length, it dequeues the item on the front (it does not return this item to you)
all the standard methods from the queue class are inherited for this class
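The two specialized behaviours can be sketched as follows. Both classes are hypothetical stand-ins rather than the structures module’s code: the clock parameter does not exist in the real TimeoutQueue and is added here only so the example can be driven without waiting, and treating an item as expired once its age reaches the timeout exactly is a choice made for this sketch.

```python
import time
from collections import deque


class TimeoutQueueSketch:
    """Queue whose items disappear once they are older than `timeout` seconds."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self._clock = clock      # hypothetical hook, for demonstration only
        self._items = deque()    # (enqueue time, item) pairs, oldest first

    def _expire(self):
        # Drop items from the front while they are too old.
        cutoff = self._clock() - self.timeout
        while self._items and self._items[0][0] <= cutoff:
            self._items.popleft()

    def setTimeout(self, secs):
        self.timeout = secs

    def enqueue(self, item):
        self._expire()
        self._items.append((self._clock(), item))

    def dequeue(self):
        self._expire()
        return self._items.popleft()[1]

    def reset(self):
        self._items.clear()

    def __len__(self):
        self._expire()
        return len(self._items)


class MaxLengthQueueSketch:
    """Queue capped at `length` items; enqueueing past the cap drops the front."""

    def __init__(self, length, seq=()):
        # A deque with maxlen silently discards from the opposite end when full.
        self._items = deque(seq, maxlen=length)

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)
```

A fake clock (for example a function returning a value you advance by hand) makes the expiry behaviour easy to observe in a test.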
The Other Structures¶
The most useful of the other structures is actually very similar to the “MaxLengthQueue”. It’s the “RingBuffer”, which is essentially a MaxLengthQueue which fills up to its maximum size and then circularly replaces the old contents as new entries are added instead of dequeuing. It features the following:
RingBuffer(size, seq=()) - as with the MaxLengthQueue you specify the size of the RingBuffer and optionally give it a sequence.
append(item) - adds item to the end of the buffer, pushing out an item from the front if necessary
reset() - empties out the buffer entirely
resize(i) - shrinks/expands the RingBuffer to the size provided
extend(seq) - append the items from the provided sequence onto the end of the RingBuffer
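The “circularly replaces” behaviour is easiest to see in code. The class below is a simplified, hypothetical stand-in (no resize, and it assumes size is at least 1), not the RingBuffer from the structures module.

```python
class RingBufferSketch:
    """Fixed-size buffer that overwrites its oldest entry once full."""

    def __init__(self, size, seq=()):
        self.size = size   # assumed to be >= 1 in this sketch
        self._data = []
        self._start = 0    # index of the oldest item once the buffer is full
        self.extend(seq)

    def append(self, item):
        if len(self._data) < self.size:
            self._data.append(item)  # still filling up
        else:
            # Full: overwrite the oldest slot in place and advance the start.
            self._data[self._start] = item
            self._start = (self._start + 1) % self.size

    def extend(self, seq):
        for item in seq:
            self.append(item)

    def reset(self):
        self._data = []
        self._start = 0

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        # Yield items from oldest to newest, wrapping around the list.
        count = len(self._data)
        for offset in range(count):
            yield self._data[(self._start + offset) % count]


rb = RingBufferSketch(3, [1, 2])
rb.extend([3, 4, 5])
print(list(rb))  # [3, 4, 5]
```

Nothing is shifted or dequeued when the buffer is full; the write position simply wraps around, which is the difference from a length-capped queue.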
The next data structure is the TwoWayDictionary, which as the name implies is a dictionary in which key-value pairs have mappings going both directions. It features the following:
TwoWayDictionary(seq=(), **kwargs) - Takes an optional sequence of (key, value) pairs as well as any key=value pairs specified in the constructor as initial values for the two-way dict.
other than that, no extra features that a normal Python dict doesn’t already offer with the exception that any (key, val) pair added to the dict is also added as (val, key) as well, so the mapping goes both ways. Elements are still accessed the same way you always do with Python ‘dict’s.
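A minimal sketch of the two-way idea, assuming nothing beyond what is described above: TwoWayDictSketch is a hypothetical dict subclass that mirrors every assignment, not the actual TwoWayDictionary source.

```python
class TwoWayDictSketch(dict):
    """dict in which every (key, value) pair is also stored as (value, key)."""

    def __init__(self, seq=(), **kwargs):
        super().__init__()
        # Route initial pairs through __setitem__ so they are mirrored too.
        for key, value in dict(seq, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().__setitem__(value, key)

    def __delitem__(self, key):
        value = self[key]
        super().__delitem__(key)
        # Remove the mirrored entry as well (already gone if key == value).
        dict.pop(self, value, None)


prefixes = TwoWayDictSketch([("op", "@")], voice="+")
print(prefixes["op"], prefixes["@"], prefixes["+"])
```

Because values become keys, they have to be hashable, and a value that collides with an existing key overwrites it.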
There is also a MultiSet class available, but it’s very unlikely that it will serve your purpose, so I won’t go into it here. The curious coder can go check the source and see what it’s all about if they wish (it’s only used once in our code, in the Relay plugin).
web.py¶
The web portion of Supybot’s utils module is mainly used for retrieving data from websites but it also has some utility functions pertaining to HTML and email text as well. The functions in web are listed below, once again in order of usefulness.
getUrl(url, size=None, headers=None) - gets the data at the URL provided and returns it as one large string
url: the location of the data to be retrieved or a urllib2.Request object to be used in the retrieval
size: the maximum number of bytes to retrieve, defaults to None, meaning that it is to try to retrieve all data
headers: a dictionary mapping header types to header data
getUrlFd(url, headers=None) - returns a file-like object for a url
url: the location of the data to be retrieved or a urllib2.Request object to be used in the retrieval
headers: a dictionary mapping header types to header data
htmlToText(s, tagReplace=" ") - strips out all tags in a string of HTML, replacing them with the specified string
s: the HTML text to strip the tags out of
tagReplace: the string to replace tags with
strError(e) - pretty-printer for web exceptions, returns a descriptive string given a web-related exception
e: the exception to pretty-print
mungeEmail(s) - a naive e-mail obfuscation function, replaces “@” with “AT” and “.” with “DOT”
s: the e-mail address to obfuscate
getDomain(url) - returns the domain of a URL
url: the URL in question
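To give a feel for what the two simplest helpers do, here is a rough plain-Python approximation using only the standard library. These are hypothetical stand-ins, not the code in supybot.utils.web: the regular expression is a naive tag matcher, and urlparse’s netloc may include a port or credentials that the real getDomain might treat differently.

```python
import re
from urllib.parse import urlparse


def html_to_text_sketch(s, tagReplace=" "):
    # Replace anything that looks like <...> with the replacement string.
    return re.sub(r"<[^>]*>", tagReplace, s)


def get_domain_sketch(url):
    # The network-location part of the URL, e.g. "limnoria.net".
    return urlparse(url).netloc


print(html_to_text_sketch("<b>Hello</b> world", tagReplace=""))
print(get_domain_sketch("https://limnoria.net/docs/index.html"))
```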
The Best of the Rest¶
Intro¶
Rather than document each of the remaining portions of the supybot.utils module, I’ve elected to just pick out the choice bits from specific parts and document those instead. Here they are, broken out by module name.
supybot.utils.file - file utilities¶
touch(filename) - updates the access time of a file by opening it for writing and immediately closing it
mktemp(suffix="") - creates a decent random string, suitable for a temporary filename with the given suffix, if provided
the AtomicFile class - used for files that need to be atomically written, i.e., if there’s a failure the original file remains unmodified. For more info consult file.py in src/utils
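The general technique behind atomic writes can be sketched with the standard library: write to a temporary file in the same directory, then rename it over the target. This illustrates the idea only; atomic_write_sketch is a hypothetical function and AtomicFile’s actual interface and options are documented in file.py.

```python
import os
import tempfile


def atomic_write_sketch(path, text):
    """Replace the contents of `path` with `text`, or leave `path` untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the target so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        # os.replace swaps the new file in as a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: discard the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the write fails partway through, the original file has never been opened for writing, so its previous contents survive.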
supybot.utils.gen - general utilities¶
timeElapsed(elapsed, [lots of optional args]) - given the number of seconds elapsed, returns a string with the English description of the amount of time passed. Consult gen.py in src/utils for the exact argument list and documentation if you feel you could use this function.
exnToString(e) - improved exception-to-string function. Provides nicer output than a simple str(e).
InsensitivePreservingDict class - a dict class that is case-insensitive when accessing keys
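The idea behind a case-insensitive but case-preserving dict can be sketched like this. The class is a hypothetical, much smaller stand-in that assumes string keys; the real InsensitivePreservingDict offers the full dict interface.

```python
class InsensitivePreservingDictSketch:
    """Look keys up case-insensitively while remembering their original spelling."""

    def __init__(self, items=()):
        self._data = {}  # lowercased key -> (original key, value)
        for key, value in dict(items).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._data[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._data[key.lower()][1]

    def __contains__(self, key):
        return key.lower() in self._data

    def keys(self):
        # Report keys with the casing they were last stored under.
        return [original for original, _ in self._data.values()]


nicks = InsensitivePreservingDictSketch({"ProgVal": "owner"})
print(nicks["progval"], nicks.keys())
```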
supybot.utils.iter - iterable utilities¶
len(iterable) - returns the length of a given iterable
groupby(key, iterable) - equivalent to the itertools.groupby function available as of Python 2.4. Provided for backwards compatibility.
any(p, iterable) - Returns true if any element in the iterable satisfies the predicate p
all(p, iterable) - Returns true if all elements in the iterable satisfy the predicate p
choice(iterable) - Returns a random element from the iterable
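The point of these helpers is that they accept any iterable, including generators that have no length, and that any and all take a predicate as their first argument. The functions below are hypothetical plain-Python equivalents written from the descriptions above, not the module’s source; choice_sketch simply materializes the iterable, which may differ from how the real function picks an element.

```python
import random


def len_sketch(iterable):
    # Count by consuming the iterable; works where the built-in len() does not.
    return sum(1 for _ in iterable)


def any_sketch(p, iterable):
    for item in iterable:
        if p(item):
            return True
    return False


def all_sketch(p, iterable):
    for item in iterable:
        if not p(item):
            return False
    return True


def choice_sketch(iterable):
    # Simplification: build a list first, then pick uniformly at random.
    return random.choice(list(iterable))


squares = (n * n for n in range(5))  # a generator: len(squares) would fail
print(len_sketch(squares))           # 5
print(any_sketch(lambda n: n > 3, iter([1, 2, 5])))
```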
supybot.dynamicScope / dynamic - accessing variables in the stack¶
This feature is not in supybot.utils but still deserves to be documented as a utility.
Although you should avoid using this feature for as long as you can, it is sometimes necessary to access variables the Supybot API does not provide you.
For instance, the Aka plugin provides per-channel aliases by overriding getCommandMethod. However, the channel where the command is called is not passed to this function, so when writing Aka I could either add this parameter (and thus break all plugins already overriding this method) or use this hack. I chose this hack.
How does it work? This is quite simple: dynamic.channel is a shortcut for supybot.dynamicScope.DynamicScope.__getattr__('channel'), which browses the call stack backwards, looking for a variable named channel, and returns it as soon as it finds it (and returns None if there is no such variable).
Note that you don’t have to import dynamicScope: the dynamic object is automatically set as a global variable when Supybot starts.
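The stack-walking lookup described above can be sketched in a few lines with sys._getframe. This is an illustration under stated assumptions, not the contents of supybot/dynamicScope.py: DynamicScopeSketch, handle_message and get_command_method are hypothetical names, and sys._getframe is a CPython implementation detail.

```python
import sys


class DynamicScopeSketch:
    """Resolve attribute access by searching the callers' local variables."""

    def __getattr__(self, name):
        frame = sys._getframe(1)  # start with the frame that accessed the attribute
        while frame is not None:
            if name in frame.f_locals:
                return frame.f_locals[name]
            frame = frame.f_back  # move one call further up the stack
        return None  # no frame on the stack defines such a variable


dynamic = DynamicScopeSketch()


def get_command_method():
    # `channel` is not a parameter here; it is found in a caller's frame.
    return "alias lookup for %s" % dynamic.channel


def handle_message():
    channel = "#limnoria"  # local variable further up the call stack
    return get_command_method()


print(handle_message())
```

The convenience comes at a price: the result depends on whoever happens to be on the call stack, which is why this is best kept as a last resort.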