HTTP ↔ XMPP - PubSub gateway

XEP-0060 defines a Publish-Subscribe extension to the XMPP protocol. Ralph Meijer has written a gateway to interface to the XMPP world via HTTP RPC: Idavoll implements a Jabber publish-subscribe service component, as defined in XEP-0060, aiming to be fully compliant and mostly complete.

I use idavoll for the DokuWiki pubsub plugin and started this wiki page to collect resources during development.

Background Information

The basic concept works as follows:

  1. An entity publishes information to a node at a publish-subscribe service.
  2. The pubsub service pushes a notification to all entities that are authorized to learn about the published information.

Here: the entities are web-servers running a CMS (wiki, repositories etc); the nodes are XML data-containers.
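
As a rough illustration of the two steps above, here is a toy in-memory model in Python. It is not XMPP and not idavoll code; the class name ToyPubSubService and the node name "wiki/start" are made up for this sketch.

```python
from collections import defaultdict


class ToyPubSubService:
    """In-memory stand-in for a publish-subscribe service (no XMPP involved)."""

    def __init__(self):
        # node id -> list of subscriber callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, node, callback):
        self._subscribers[node].append(callback)

    def publish(self, node, item):
        # Step 1: an entity publishes an item to a node.
        # Step 2: the service notifies every subscriber of that node.
        for callback in self._subscribers[node]:
            callback(node, item)
        return len(self._subscribers[node])


received = []
service = ToyPubSubService()
service.subscribe("wiki/start", lambda node, item: received.append((node, item)))
service.publish("wiki/start", "<entry>...</entry>")
```

In the real setup the callbacks are HTTP POSTs to the subscribing web-servers and the items are Atom entries.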

The idavoll gateway accepts Atom-Entries (and Atom-Feeds) as payload 1). The content inside the Atom entry can be any [XML-encoded] data.

Each node has a unique identifier - the NodeID or xmpp-uri - which is generated when publishing an item for the first time.

A document made up of multiple entries can be represented as a feed, a.k.a. collection. XEP-0060 defines a collection node as «A type of node that contains nodes and/or other collections but no published items. Collections make it possible to represent hierarchical node structures».

Luckily idavoll hides the gory details, and offers a neat way to “tunnel and distribute” HTTP requests. It runs along with an underlying jabber daemon next to the webserver(s).

This document is a how-to for the HTTP-RPC interface of idavoll; follow along as I walk through the interface.

Detailed documentation is available at

getting started

See idavoll's dependencies and installation instructions.

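# Python dependencies of idavoll and wokkel
sudo apt-get install python-pkg-resources python-twisted-web2 python-twisted-conch python-twisted-words python-simplejson
# the underlying jabber daemon
sudo apt-get install jabberd2
# alternative (disabled here): install by package name
#sudo easy_install wokkel
#sudo easy_install idavoll
# install from the subversion repositories
sudo easy_install https://svn.ik.nu/wokkel/trunk/wokkel
sudo easy_install https://svn.ik.nu/idavoll/trunk
# start the HTTP gateway; twistd.log and twistd.pid end up in the current directory
cd /tmp
sudo twistd idavoll-http --jid=localhost
# watch the log (Ctrl-C stops watching, not the gateway)
tail -f twistd.log
# stop the gateway again
sudo true && sudo cat twistd.pid | sudo xargs kill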

Quick test:

$ curl http://localhost:8086/list
[""]
$ curl http://localhost:8086/publish
<html><head><title>405 Method Not Allowed</title></head><body><h1>Method Not Allowed</h1>The requested method GET is not supported by /publish.</body></html>

the idavoll HTTP interface

idavoll listens for HTTP requests on port 8086 by default. Unless noted otherwise, requests must be HTTP POSTs. Parameters need to be url-encoded and added as a query string to the request URL. Data (if present) must be sent directly after the HTTP header without waiting for a 100 Continue.
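
These rules can be sketched as a small Python helper that builds a request without sending it. The helper name build_request, the host localhost and the example xmpp-uri are assumptions made for illustration; only port 8086, the POST method and the query-string convention come from the text above.

```python
import urllib.parse
import urllib.request

GATEWAY = "http://localhost:8086"  # default port from the text; the host is an assumption


def build_request(name, params=None, data=None, content_type=None):
    """Build (but do not send) a POST request for the gateway."""
    url = GATEWAY + "/" + name
    if params:
        # Parameters travel url-encoded in the query string, not in the body.
        url += "?" + urllib.parse.urlencode(params)
    headers = {}
    if content_type:
        headers["Content-Type"] = content_type
    # No "Expect: 100-continue" header is added, so the body follows the
    # headers immediately when the request is sent.
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Hypothetical node URI, for illustration only.
request = build_request(
    "publish",
    params={"uri": "xmpp:pubsub.example.org?;node=test"},
    data=b"<entry/>",
    content_type="application/atom+xml;type=entry;charset=utf-8",
)
```

Passing the result to urllib.request.urlopen() would perform the call against a running gateway.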

work in progress - pending review.

publish

Name         /publish
Parameters   uri (string, optional) xmpp-uri
Data         Atom entry to publish. The Content-type needs to be application/atom+xml;type=entry;charset=utf-8.
             If there is no content, a node-deletion is published.
Returns      JSON encoded xmpp-uri of the published item.
Description  publish new content; if no xmpp-uri is given, create a new one.
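
A minimal Python sketch of a publish call, assuming a gateway on localhost:8086. The function names make_entry and publish_request are invented for this example, and the entry is deliberately bare: real Atom entries normally also carry id, updated and author elements.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ENTRY_TYPE = "application/atom+xml;type=entry;charset=utf-8"


def make_entry(title, text):
    """Serialize a bare-bones Atom entry to UTF-8 bytes."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = title
    ET.SubElement(entry, "{%s}content" % ATOM_NS, {"type": "text"}).text = text
    return ET.tostring(entry, encoding="utf-8")


def publish_request(entry_bytes, uri=None, base="http://localhost:8086"):
    """Build the POST for /publish; without uri the gateway creates a new node."""
    url = base + "/publish"
    if uri is not None:
        url += "?" + urllib.parse.urlencode({"uri": uri})
    return urllib.request.Request(
        url, data=entry_bytes, headers={"Content-Type": ENTRY_TYPE}, method="POST"
    )


body = make_entry("Hello", "First post")
request = publish_request(body)
```

Sending the request with urllib.request.urlopen() and decoding the response with json.load() should yield the xmpp-uri described in the Returns row; keep it if you want to update the same node later.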

subscribe

Name         /subscribe
Parameters   none
Data         JSON encoded array, sent with content-type application/json:
             $data['callback']='<callback-url>';
             $data['uri']='<xmpp-uri>';
Returns      no content - HTTP status-code 204 on success.
Description  subscribe to updates of an xmpp-node.
             The gateway will send HTTP-POST requests to the specified callback-url. See notification handling below.

unsubscribe

Name         /unsubscribe
Parameters   none
Data         JSON encoded array, sent with content-type application/json:
             $data['callback']='<callback-url>';
             $data['uri']='<xmpp-uri>';
Returns      no content - HTTP status-code 204 on success.
Description  unsubscribe from receiving notifications.
             Use the same parameters as for the subscribe request.
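
Since subscribe and unsubscribe take the same body, one helper can build both requests. The sketch below is in Python; the function name subscription_request, the callback URL and the node URI are placeholders, and the PHP-style array from the tables is assumed to serialize to a JSON object with the keys callback and uri.

```python
import json
import urllib.request


def subscription_request(action, callback, uri, base="http://localhost:8086"):
    """Build the POST for /subscribe or /unsubscribe (same body for both)."""
    if action not in ("subscribe", "unsubscribe"):
        raise ValueError("action must be 'subscribe' or 'unsubscribe'")
    body = json.dumps({"callback": callback, "uri": uri}).encode("utf-8")
    return urllib.request.Request(
        base + "/" + action,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder callback URL and node URI, for illustration only.
request = subscription_request(
    "subscribe",
    "http://wiki.example.org/callback.php",
    "xmpp:pubsub.example.org?;node=test",
)
```

On success the gateway is documented to answer with status 204 and no body, so there is nothing to decode after sending.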

delete

Name         /delete
Parameters   uri (string) xmpp-uri
Data         none
             content-type must be set to application/atom+xml;type=entry;charset=utf-8
Returns      JSON encoded array of known publications.
Description  same as publishing empty content.

items

Name         /items
Parameters   uri (string) xmpp-uri of feed
Data         none
Returns      JSON encoded array of entries in the feed.
Description  query entries in a collection.

list

Name         /list - HTTP-GET
Parameters   none
Data         none
Returns      JSON encoded array of known publications.
Description  handy to test, debug or just see what's going on.
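
A small Python sketch for reading the /list response. Both function names are invented here, and dropping empty strings (such as the [""] seen in the quick test) is a choice made for this example, not documented gateway behaviour.

```python
import json
import urllib.request


def parse_publications(body):
    """Decode a /list response body and drop empty entries."""
    entries = json.loads(body.decode("utf-8"))
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    return [entry for entry in entries if entry]


def fetch_publications(base="http://localhost:8086"):
    """GET /list from a running gateway (performs a network request)."""
    with urllib.request.urlopen(base + "/list") as response:
        return parse_publications(response.read())
```

fetch_publications() needs a running gateway; parse_publications() can be tried offline with the output of the curl test above.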

notification handling

After successful subscription and for each published update to an xmpp node, idavoll notifies the subscriber via HTTP POST.

Here's an example in PHP to catch those and dump them. Idavoll ignores any data sent in reply, but automatically unsubscribes if it receives a non-2xx HTTP status code 2).

The registered callback URI from the subscription request is used as given. Idavoll puts the originating XMPP URI into the HTTP Referer header (so as not to interfere with GET parameters; PHP exposes it as HTTP_REFERER) and POSTs the Atom-Entry as received via XMPP.

<?php
$PSlogfile= '/tmp/PubSub.debug';
 
# log every incoming request, then handle it
error_log('CALLBACK '.date("c ").'request: '.print_r($_REQUEST, true)."\n", 3, $PSlogfile);
notify_handler();
 
function notify_handler() {
    global $PSlogfile;
 
    # idavoll passes the originating XMPP URI in the Referer header
    $originating_xmpp_url = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    error_log('CALLBACK '.date("c ").'xmppurl: '.print_r($originating_xmpp_url,true)."\n", 3, $PSlogfile);
 
    # read the raw POST body: the Atom entry as received via XMPP
    $xmpp_payload = file_get_contents('php://input');
    if ($xmpp_payload === false)
        $xmpp_payload = '';
    error_log('CALLBACK '.date("c ").'data-len:'.strlen($xmpp_payload)."\n", 3, $PSlogfile);
    if ($xmpp_payload === '') {
        # empty body means the node was deleted:
        # remove subscription and delete page
        return;
    }
    # do sth with the $xmpp_payload - parse it as ATOM
}
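
The last step, parsing the payload as Atom, is left open in the handler above. As a hedged sketch of what it could look like, here it is in Python using only the standard library; the function name summarize_notification and the choice of fields (title and content) are assumptions for this example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def summarize_notification(payload):
    """Return None for an empty payload, else a dict with title and content."""
    if not payload.strip():
        return None  # empty body: treated as a deletion, as in the PHP handler
    root = ET.fromstring(payload)
    # Accept either a single entry or a feed wrapping entries (first one wins).
    entry = root if root.tag == ATOM + "entry" else root.find(ATOM + "entry")
    if entry is None:
        raise ValueError("no Atom entry found")
    return {
        "title": entry.findtext(ATOM + "title"),
        "content": entry.findtext(ATOM + "content"),
    }


sample = (
    b'<entry xmlns="http://www.w3.org/2005/Atom">'
    b"<title>Hello</title><content>First post</content></entry>"
)
summary = summarize_notification(sample)
```

A CMS would map the result onto its own page model instead of returning a plain dict.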

further notes

Idavoll includes support to discover and subscribe to feeds. It automatically creates a collection node for a given RSS or Atom feed, starts polling the feed and publishes changes. Read more about this feature: http://idavoll.ik.nu/wiki/FeedAggregation

What's missing?
idavoll, wokkel, twisted and jabberd provide a method plus infrastructure for communication and put some constraints on the language. You're to supply the user-interface ;)

Publishing is easy. Managing subscriptions is not: it covers user and admin requests to subscribe to content as well as handling data updates, format translations etc. YMMV.

A first CMS using idavoll for sharing is anyMeta. Being a well-administered commercial CMS, anyMeta makes use of an authoritative source and read-only mirrors which are subscribed to it. In the spirit of data portability these authoritative sources may move to a different server, reverse the roles of publisher and subscriber, or even be removed. Subscribers may also choose whether to keep or delete content if the source becomes unavailable.

Updates to a node are atomic transactions in the XMPP world. A later publication to a node invalidates the previous one. The jabber server responsible for the node is the authoritative source for the content. This mechanism makes it possible to build a distributed lock or coherent cache model: e.g. one could publish draft documents to mark the beginning of an edit cycle that ends in a re-publication. If an entity receives a Draft it can perform a context-appropriate action like locking the page, merging content or reporting a conflict to the user, either local or remote (by sending back a newer Draft).
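
To make the Draft idea more concrete, here is one possible decision rule in Python. It is purely illustrative: the names Action and on_draft and the policy itself are assumptions of this sketch, not something idavoll or XEP-0060 prescribes.

```python
from enum import Enum


class Action(Enum):
    LOCK = "lock the page"
    MERGE = "merge content"
    CONFLICT = "report a conflict"


def on_draft(locally_open, locally_modified):
    """Pick a reaction to an incoming Draft for a page (illustrative policy only)."""
    if not locally_open:
        return Action.LOCK      # nobody edits here: lock until the re-publication
    if not locally_modified:
        return Action.MERGE     # open but unchanged: take over the remote text
    return Action.CONFLICT      # both sides changed: let the user decide


decision = on_draft(locally_open=True, locally_modified=True)
```

A real implementation would also have to release the lock when the re-publication arrives or the source goes away.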

Furthermore, subscribers can implement automatic actions such as retiring subscriptions if the authoritative source becomes unavailable, or aggregating additional information that is not part of the publication itself. Subscribers can also choose to act differently depending on content signature or authentication credentials (e.g. OAuth over XMPP).

1) Atom is a common denominator for the XMPP/XMLstream and HTTP/XHTML
2) need to verify this
 
wiki/idavoll.txt · Last modified: 01.01.2009 04:48 by rgareus