class String

Extensions to the String class

TODO make riphtml() just call ircify_html() with stronger purify options.

We start by extending the String class with some IRC-specific methods

Extension for String class.

A String#% method that accepts “named arguments”. With named arguments, translators can tell what each placeholder in a msgid means, which %s/%d style placeholders do not convey.

Public Instance Methods

%(arg)
%(hash)

Format - Uses str as a format specification, and returns the result of applying it to arg. If the format specification contains more than one substitution, then arg must be an Array containing the values to be substituted. See Kernel::sprintf for details of the format string. This is the default behavior of the String class.

  • arg: an Array or other class except Hash.

  • Returns: formatted String

(e.g.) "%s, %s" % ["Masao", "Mutoh"]

You can also pass a Hash to supply “named arguments”. This is the recommended way for Ruby-GetText, because translators can more easily understand the meaning of the msgids.

  • hash: {:key1 => value1, :key2 => value2, … }

  • Returns: formatted String

(e.g.) "%{firstname}, %{familyname}" % {:firstname => "Masao", :familyname => "Mutoh"}
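The Hash form can be tried without loading rbot. The sketch below re-creates the substitution loop in a standalone helper; `named_format` is a hypothetical name used only for illustration and is not part of rbot or Ruby-GetText.

```ruby
# Hypothetical standalone helper (not part of rbot) that mirrors the
# Hash branch of String#% described above.
def named_format(template, values)
  result = template.dup
  values.each do |key, value|
    # Replace every %{key} placeholder with the string form of its value.
    result.gsub!(/%\{#{Regexp.escape(key.to_s)}\}/) { value.to_s }
  end
  result
end

puts named_format("%{firstname}, %{familyname}",
                  { :firstname => "Masao", :familyname => "Mutoh" })
```

In this sketch, placeholders that have no matching key in the Hash are left untouched.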
# File lib/rbot/load-gettext.rb, line 149
def %(args)
  if args.kind_of?(Hash)
    ret = dup
    args.each {|key, value|
      ret.gsub!(/\%\{#{key}\}/, value.to_s)
    }
    ret
  else
    ret = gsub(/%\{/, '%%{')
    begin
      ret._old_format_m(args)
    rescue ArgumentError
      $stderr.puts "  The string:#{ret}"
      $stderr.puts "  args:#{args.inspect}"
    end
  end
end
Also aliased as: _old_format_m
_old_format_m(args)
Alias for: %
get_html_title()

This method tries to find an HTML title in the string, and returns it if found

# File lib/rbot/core/utils/extends.rb, line 338
def get_html_title
  if defined? ::Hpricot
    Hpricot(self).at("title").inner_html
  else
    return unless Irc::Utils::TITLE_REGEX.match(self)
    $1
  end
end
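The regex fallback of get_html_title can be sketched as follows. The pattern here is a hypothetical stand-in, since the real Irc::Utils::TITLE_REGEX is not shown in this section and may differ; `extract_title` is likewise an illustrative name.

```ruby
# Hypothetical pattern for illustration only; the actual
# Irc::Utils::TITLE_REGEX may be defined differently.
TITLE_PATTERN = /<\s*title\s*>(.+?)<\s*\/\s*title\s*>/im

# Returns the text between <title> and </title>, or nil if none is found.
def extract_title(html)
  match = TITLE_PATTERN.match(html)
  match && match[1]
end

puts extract_title("<html><head><TITLE>rbot docs</TITLE></head></html>")
```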
has_irc_glob?()

This method checks if the receiver contains IRC glob characters

IRC has a very primitive concept of globs: a * stands for “any number of arbitrary characters”, a ? stands for “one and exactly one arbitrary character”. These characters can be escaped by prefixing them with a backslash (\).

A known limitation of this glob syntax is that there is no way to escape the escape character itself, so it’s not possible to build a glob pattern where the escape character precedes a glob.

# File lib/rbot/irc.rb, line 332
def has_irc_glob?
  self =~ /^[*?]|[^\\][*?]/
end
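The behavior of this pattern, including the limitation described above, can be checked with a small standalone sketch. `irc_glob?` is a hypothetical wrapper that returns a boolean, whereas has_irc_glob? itself returns the match position or nil.

```ruby
# Same pattern as has_irc_glob?, wrapped in a hypothetical standalone helper.
IRC_GLOB_PATTERN = /^[*?]|[^\\][*?]/

def irc_glob?(text)
  !(text =~ IRC_GLOB_PATTERN).nil?
end

puts irc_glob?('*!*@example.com') # unescaped globs
puts irc_glob?('nick')            # no glob characters at all
puts irc_glob?('foo\*')           # the * is escaped by a backslash
puts irc_glob?('foo\\\\*')        # two backslashes, then *: still read as escaped
```

The last call shows the limitation: the string contains two backslashes followed by a *, but because the character right before the * is a backslash, the glob is treated as escaped.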
irc_downcase(casemap='rfc1459')

This method returns a string which is the downcased version of the receiver, according to the given casemap

# File lib/rbot/irc.rb, line 289
def irc_downcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.upper, cmap.lower)
end
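As a rough illustration of casemap-aware case conversion, the sketch below hard-codes the character ranges commonly associated with the 'rfc1459' casemap. Those ranges are an assumption here (the Irc::Casemap definitions are not shown in this section), and the helper names are hypothetical.

```ruby
# Assumed character ranges for the 'rfc1459' casemap (not shown in this
# section): A-Z plus [ \ ] ^ pair with a-z plus { | } ~.
RFC1459_UPPER = "\x41-\x5e"
RFC1459_LOWER = "\x61-\x7e"

# Hypothetical helpers that apply the same tr-based mapping as
# irc_downcase and irc_upcase, with the casemap fixed to rfc1459.
def rfc1459_downcase(text)
  text.tr(RFC1459_UPPER, RFC1459_LOWER)
end

def rfc1459_upcase(text)
  text.tr(RFC1459_LOWER, RFC1459_UPPER)
end

puts rfc1459_downcase("Nick[Away]")
puts rfc1459_upcase("nick{away}")
```

Under this mapping, brackets and braces are treated as the upper and lower case forms of the same character, so the nicks Nick[Away] and nick{away} compare as equal after downcasing.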
irc_downcase!(casemap='rfc1459')

Same as irc_downcase, except that the receiver is altered in place. Like String#tr!, it returns nil if no changes were made.

See also the discussion about irc_downcase

# File lib/rbot/irc.rb, line 298
def irc_downcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.upper, cmap.lower)
end
irc_send_penalty()

Calculates the penalty, in seconds, that the IRCd will assign to this message. The result is clamped to the range 2..99.

# File lib/rbot/ircsocket.rb, line 14
def irc_send_penalty
  # According to eggdrop, the initial penalty is
  penalty = 1 + self.size/100
  # on everything but UnderNET where it's
  # penalty = 2 + self.size/120

  cmd, pars = self.split($;,2)
  debug "cmd: #{cmd}, pars: #{pars.inspect}"
  case cmd.to_sym
  when :KICK
    chan, nick, msg = pars.split
    chan = chan.split(',')
    nick = nick.split(',')
    penalty += nick.size
    penalty *= chan.size
  when :MODE
    chan, modes, argument = pars.split
    extra = 0
    if modes
      extra = 1
      if argument
        extra += modes.split(/\+|-/).size
      else
        extra += 3 * modes.split(/\+|-/).size
      end
    end
    if argument
      extra += 2 * argument.split.size
    end
    penalty += extra * chan.split.size
  when :TOPIC
    penalty += 1
    penalty += 2 unless pars.split.size < 2
  when :PRIVMSG, :NOTICE
    dests = pars.split($;,2).first
    penalty += dests.split(',').size
  when :WHO
    args = pars.split
    if args.length > 0
      penalty += args.inject(0){ |sum,x| sum += ((x.length > 4) ? 3 : 5) }
    else
      penalty += 10
    end
  when :PART
    penalty += 4
  when :AWAY, :JOIN, :VERSION, :TIME, :TRACE, :WHOIS, :DNS
    penalty += 2
  when :INVITE, :NICK
    penalty += 3
  when :ISON
    penalty += 1
  else # Unknown messages
    penalty += 1
  end
  if penalty > 99
    debug "Wow, more than 99 secs of penalty!"
    penalty = 99
  end
  if penalty < 2
    debug "Wow, less than 2 secs of penalty!"
    penalty = 2
  end
  debug "penalty: #{penalty}"
  return penalty
end
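To make the arithmetic concrete, the sketch below reproduces only the base penalty, the PRIVMSG/NOTICE branch, the fallback branch, and the final clamping. `message_penalty` is a hypothetical standalone helper; it does not cover the other commands handled above.

```ruby
# Hypothetical, reduced version of the penalty calculation: it handles
# PRIVMSG/NOTICE and treats every other command as "unknown".
def message_penalty(line)
  # Base penalty: 1 second plus 1 per full 100 characters.
  penalty = 1 + line.size / 100

  cmd, pars = line.split(' ', 2)
  if %w[PRIVMSG NOTICE].include?(cmd)
    # One extra second per comma-separated destination.
    dests = pars.split(' ', 2).first
    penalty += dests.split(',').size
  else
    penalty += 1
  end

  # Keep the result within 2..99 seconds.
  penalty.clamp(2, 99)
end

puts message_penalty("PRIVMSG #rbot :hi")
puts message_penalty("PRIVMSG #a,#b,#c :hi")
```

A short message to one channel costs the minimum of 2 seconds, while the same message sent to three destinations costs 1 + 3 = 4 seconds.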
irc_upcase(casemap='rfc1459')

This method returns a string which is the upcased version of the receiver, according to the given casemap

See also the discussion about irc_downcase

# File lib/rbot/irc.rb, line 307
def irc_upcase(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr(cmap.lower, cmap.upper)
end
irc_upcase!(casemap='rfc1459')

Same as irc_upcase, except that the receiver is altered in place

See also the discussion about irc_downcase

# File lib/rbot/irc.rb, line 316
def irc_upcase!(casemap='rfc1459')
  cmap = casemap.to_irc_casemap
  self.tr!(cmap.lower, cmap.upper)
end
ircify_html(opts={})

This method will return a purified version of the receiver, with all HTML stripped off and some of it converted to IRC formatting

# File lib/rbot/core/utils/extends.rb, line 214
def ircify_html(opts={})
  txt = self.dup

  # remove scripts
  txt.gsub!(/<script(?:\s+[^>]*)?>.*?<\/script>/im, "")

  # remove styles
  txt.gsub!(/<style(?:\s+[^>]*)?>.*?<\/style>/im, "")

  # bold and strong -> bold
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, "#{Bold}")

  # italic, emphasis and underline -> underline
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, "#{Underline}")

  ## This would be a nice addition, but the results are horrible
  ## Maybe make it configurable?
  # txt.gsub!(/<\/?a( [^>]*)?>/, "#{Reverse}")
  case val = opts[:a_href]
  when Reverse, Bold, Underline
    txt.gsub!(/<(?:\/a\s*|a (?:[^>]*\s+)?href\s*=\s*(?:[^>]*\s*)?)>/, val)
  when :link_out
    # Not good for nested links, but the best we can do without something like hpricot
    txt.gsub!(/<a (?:[^>]*\s+)?href\s*=\s*(?:([^"'>][^\s>]*)\s+|"((?:[^"]|\\")*)"|'((?:[^']|\\')*)')(?:[^>]*\s+)?>(.*?)<\/a>/) { |match|
      debug match
      debug [$1, $2, $3, $4].inspect
      link = $1 || $2 || $3
      str = $4
      str + ": " + link
    }
  else
    warning "unknown :a_href option #{val} passed to ircify_html" if val
  end

  # If opts[:img] is defined, it should be a String. Each image
  # will be replaced by the string itself, replacing occurrences of
  # %{alt} %{dimensions} and %{src} with the alt text, image dimensions
  # and URL
  if val = opts[:img]
    if val.kind_of? String
      txt.gsub!(/<img\s+(.*?)\s*\/?>/) do |imgtag|
        attrs = Hash.new
        imgtag.scan(/([[:alpha:]]+)\s*=\s*(['"])?(.*?)\2/) do |key, quote, value|
          k = key.downcase.intern rescue 'junk'
          attrs[k] = value
        end
        attrs[:alt] ||= attrs[:title]
        attrs[:width] ||= '...'
        attrs[:height] ||= '...'
        attrs[:dimensions] ||= "#{attrs[:width]}x#{attrs[:height]}"
        val % attrs
      end
    else
      warning ":img option is not a string"
    end
  end

  # Paragraph and br tags are converted to whitespace
  txt.gsub!(/<\/?(p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')
  txt.gsub!("\n", ' ')
  txt.gsub!("\r", ' ')

  # Superscripts and subscripts are turned into ^{...} and _{...}
  # where the {} are omitted for single characters
  txt.gsub!(/<sup>(.*?)<\/sup>/, '^{\1}')
  txt.gsub!(/<sub>(.*?)<\/sub>/, '_{\1}')
  txt.gsub!(/(^|_)\{(.)\}/, '\1\2')

  # List items are converted to *). We don't have special support for
  # nested or ordered lists.
  txt.gsub!(/<li>/, ' *) ')

  # All other tags are just removed
  txt.gsub!(/<[^>]+>/, '')

  # Convert HTML entities. We do it now to be able to handle stuff
  # such as &nbsp;
  txt = Utils.decode_html_entities(txt)

  # Keep unbreakable spaces or convert them to plain spaces?
  case val = opts[:nbsp]
  when :space, ' '
    txt.gsub!([160].pack('U'), ' ')
  else
    warning "unknown :nbsp option #{val} passed to ircify_html" if val
  end

  # Remove double formatting options, since they only waste bytes
  txt.gsub!(/#{Bold}(\s*)#{Bold}/, '\1')
  txt.gsub!(/#{Underline}(\s*)#{Underline}/, '\1')

  # Simplify whitespace that appears on both sides of a formatting option
  txt.gsub!(/\s+(#{Bold}|#{Underline})\s+/, ' \1')
  txt.sub!(/\s+(#{Bold}|#{Underline})\z/, '\1')
  txt.sub!(/\A(#{Bold}|#{Underline})\s+/, '\1')

  # And finally whitespace is squeezed
  txt.gsub!(/\s+/, ' ')
  txt.strip!

  if opts[:limit] && txt.size > opts[:limit]
    txt = txt.slice(0, opts[:limit]) + "#{Reverse}...#{Reverse}"
  end

  # Decode entities and strip whitespace
  return txt
end
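The core of the conversion can be illustrated with a much smaller sketch that keeps only the bold, underline, paragraph and tag-stripping steps. The constant values are assumptions (the common IRC control codes for bold and underline), since the definitions of Bold and Underline are not shown in this section; `simple_ircify` is a hypothetical name.

```ruby
# Assumed IRC formatting control codes; the Bold and Underline constants
# used by ircify_html are defined elsewhere in rbot.
BOLD = "\002"
UNDERLINE = "\037"

# Hypothetical reduced version of ircify_html: no options, no entity
# decoding, no link or image handling.
def simple_ircify(html)
  txt = html.dup
  txt.gsub!(/<\/?(?:b|strong)(?:\s+[^>]*)?>/im, BOLD)      # bold, strong -> bold
  txt.gsub!(/<\/?(?:i|em|u)(?:\s+[^>]*)?>/im, UNDERLINE)   # i, em, u -> underline
  txt.gsub!(/<\/?(?:p|br)(?:\s+[^>]*)?\s*\/?\s*>/i, ' ')   # p, br -> whitespace
  txt.gsub!(/<[^>]+>/, '')                                 # drop all other tags
  txt.gsub!(/\s+/, ' ')                                    # squeeze whitespace
  txt.strip
end

puts simple_ircify("<p>Hello <b>world</b></p>").inspect
```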
ircify_html!(opts={})

As above, but modifies the receiver in place. Returns the receiver if it was changed, nil otherwise.

# File lib/rbot/core/utils/extends.rb, line 324
def ircify_html!(opts={})
  old_hash = self.hash
  replace self.ircify_html(opts)
  return self unless self.hash == old_hash
end
ircify_html_title()

This method returns the IRC-formatted version of an HTML title found in the string

# File lib/rbot/core/utils/extends.rb, line 349
def ircify_html_title
  self.get_html_title.ircify_html rescue nil
end
riphtml()

This method will strip all HTML crud from the receiver

# File lib/rbot/core/utils/extends.rb, line 332
def riphtml
  self.gsub(/<[^>]+>/, '').gsub(/&amp;/,'&').gsub(/&quot;/,'"').gsub(/&lt;/,'<').gsub(/&gt;/,'>').gsub(/&ellip;/,'...').gsub(/&apos;/, "'").gsub("\n",'')
end
to_irc_auth_command()

Returns an Irc::Bot::Auth::Command from the receiver

# File lib/rbot/botuser.rb, line 119
def to_irc_auth_command
  Irc::Bot::Auth::Command.new(self)
end
to_irc_casemap()

This method returns the Irc::Casemap whose name is the receiver

# File lib/rbot/irc.rb, line 275
def to_irc_casemap
  begin
    Irc::Casemap.get(self)
  rescue
    # raise TypeError, "Unknown Irc::Casemap #{self.inspect}"
    error "Unknown Irc::Casemap #{self.inspect} requested, defaulting to rfc1459"
    Irc::Casemap.get('rfc1459')
  end
end
to_irc_channel(opts={})

Converts the receiver into an Irc::Channel object, passing opts on to Irc::Channel.new

# File lib/rbot/irc.rb, line 1513
def to_irc_channel(opts={})
  Irc::Channel.new(self, opts)
end
to_irc_channel_topic()

Returns an Irc::Channel::Topic with self as text

# File lib/rbot/irc.rb, line 1318
def to_irc_channel_topic
  Irc::Channel::Topic.new(self)
end
to_irc_netmask(opts={})

Converts the receiver into an Irc::Netmask object, passing opts on to Irc::Netmask.new

# File lib/rbot/irc.rb, line 915
def to_irc_netmask(opts={})
  Irc::Netmask.new(self, opts)
end
to_irc_regexp()

This method is used to convert the receiver into a Regular Expression that matches according to the IRC glob syntax

# File lib/rbot/irc.rb, line 339
def to_irc_regexp
  regmask = Regexp.escape(self)
  regmask.gsub!(/(\\\\)?\\[*?]/) { |m|
    case m
    when /\\(\\[*?])/
      $1
    when /\\\*/
      '.*'
    when /\\\?/
      '.'
    else
      raise "Unexpected match #{m} when converting #{self}"
    end
  }
  Regexp.new("^#{regmask}$")
end
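The conversion can be exercised outside rbot with a standalone copy of the same logic. `irc_glob_to_regexp` is a hypothetical function name; the body follows the listing above, minus the error branch.

```ruby
# Hypothetical standalone version of to_irc_regexp.
def irc_glob_to_regexp(glob)
  # Escape everything first, so * and ? become \* and \?.
  regmask = Regexp.escape(glob)
  regmask.gsub!(/(\\\\)?\\[*?]/) do |m|
    case m
    when /\\(\\[*?])/ then $1   # glob was escaped in the input: keep it literal
    when /\\\*/ then '.*'       # * matches any number of characters
    when /\\\?/ then '.'        # ? matches exactly one character
    end
  end
  Regexp.new("^#{regmask}$")
end

mask = irc_glob_to_regexp('*!*@*.example.com')
puts mask.match?('nick!user@host.example.com')
puts mask.match?('nick!user@example.org')
```

An escaped glob such as `foo\*` is turned into a pattern that matches only the literal text `foo*`.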
to_irc_user(opts={})

Converts the receiver into an Irc::User object, passing opts on to Irc::User.new

# File lib/rbot/irc.rb, line 1108
def to_irc_user(opts={})
  Irc::User.new(self, opts)
end
wrap_nonempty(pre, post, opts={})

This method wraps a nonempty String by adding the given prefix and postfix. An empty receiver yields a new empty String, with nothing added.

# File lib/rbot/core/utils/extends.rb, line 355
def wrap_nonempty(pre, post, opts={})
  if self.empty?
    String.new
  else
    "#{pre}#{self}#{post}"
  end
end
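A typical use is decorating an optional piece of text only when it is present. The sketch below uses a hypothetical free-standing function, `wrap_if_nonempty`, so that it runs without reopening String.

```ruby
# Hypothetical standalone equivalent of String#wrap_nonempty.
def wrap_if_nonempty(text, pre, post)
  text.empty? ? String.new : "#{pre}#{text}#{post}"
end

puts "topic" + wrap_if_nonempty("set by nick", " (", ")")
puts "topic" + wrap_if_nonempty("", " (", ")")
```

The second call adds nothing, so no stray " ()" ends up in the output.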