ITX

Introduction

ITX allows access to the rich information available over Teletext from any computer accessing the World Wide Web. The project has two main parts: a Java applet running in a web browser, and a number of distributed servers providing the back-end support. Javaless clients are supported through server-generated pages.

Server

Overview

Diagram of server architecture

From the outset, the design of the server-side of the Teletext system was intended to be as modular as possible. The system will be scalable to whatever extent is required across many different situations and server loads. Modules have been created under the one umbrella of the 'server', although there are in fact multiple servers running, potentially on multiple machines, performing different tasks.

ITX Server

The ITX Server is at the heart of the server-side of the project. It is responsible for co-ordinating requests coming in from the Javaless Generation scripts and the HTTP Proxy, returning UDP packets containing the relevant data, as described in Interface to the Database. It is the job of the ITX Server to formulate requests to the Database Server to handle all page requests and user preference changes.

Page Server

Design Decisions

From the outset there were two hardware devices that could be used for the decoding of the raw Teletext feed: the external teletext decoder or the internal PCI Hauppauge WinTV card. There were advantages and disadvantages to using each device, but the decision was finally taken to use the WinTV card for the following reasons:

It allows direct access to the data feed
The external decoder allows requests to be sent to it to fetch a specific page from the teletext feed. However, it does not allow access to the raw VBI (vertical blanking interval) feed, which the WinTV card does. This enables the page server to update a page in the database as soon as a new version of it is received, rather than having to request that page and then block while the device awaits its transmission.
Economy and value
The software will work with any bt848 chipset TV card, so it is more flexible. From a financial aspect, the WinTV card is considerably cheaper than the external decoder.

The decoding software runs under the Linux operating system using the Video4Linux device driver suite. This means that, although the software has been specifically written for the Hauppauge WinTV card, it can theoretically be used with any device supported by Video4Linux that allows access to the Vertical Blanking Interval feed, assuming that the drivers are all written in the intended way to allow transparency between devices.

The decision to write the decoding software in C was taken because it allows the blend of high- and low-level instructions that enables the software to access the raw VBI feed. For the same reason, Linux was chosen as the operating system to run the server on. If, for example, Windows had been chosen, the indications were that the only method available for data extraction was Dynamic Data Exchange (DDE), which severely limits the functionality of the software and relies on closed third-party software over which the group has no control.

Implementation

The server opens the stream device file /dev/vbi which is provided by the bt848 driver as mentioned previously. The body of the program consists of reading in blocks of the VBI data and passing it on to the decoder, whose function is to strip out relevant information and send an update request to the database server.

In order to decode the stream of data, the program must first synchronise to the start of a Teletext data packet. Details of run-in data can be found in the Teletext Specification, but in essence the start of a packet is denoted by a clock run-in of two bytes consisting of alternating 1s and 0s, followed by a one-byte framing code, 0xe4. Unfortunately the WinTV card does not align the bytes received from the television signal with the byte representation that is passed to the stream input. Thus, the bit-stream must be re-aligned on receiving the clock run-in and the framing code, so that each byte in the array representation begins where the synchronisation bits say a byte begins.
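The re-alignment step can be sketched as follows. This is an illustration only, written in Java for consistency with the other listings in this document (the decoder itself is written in C). It assumes the feed has already been reduced to bits packed most-significant-bit first, and that the run-in starts with a 1, so that the 24-bit pattern to search for is 0xAAAAE4. The class and method names are hypothetical.

```java
public class BitAligner {
    // Two bytes of clock run-in followed by the framing code 0xE4.
    // Assumption: the run-in starts with a 1 and bits are packed MSB first.
    static final int SYNC_PATTERN = 0xAAAAE4;

    // Returns bit number bitIndex of the stream, counting from the MSB of data[0].
    static int bitAt(byte[] data, int bitIndex) {
        return (data[bitIndex >> 3] >> (7 - (bitIndex & 7))) & 1;
    }

    // Returns the index of the first bit after the sync pattern, or -1 if it is absent.
    static int findPayloadStart(byte[] data) {
        int window = 0;
        int totalBits = data.length * 8;
        for (int i = 0; i < totalBits; i++) {
            // Slide a 24-bit window along the stream one bit at a time.
            window = ((window << 1) | bitAt(data, i)) & 0xFFFFFF;
            if (i >= 23 && window == SYNC_PATTERN) {
                return i + 1;
            }
        }
        return -1;
    }

    // Copies length bytes starting at an arbitrary bit offset into a byte-aligned array.
    static byte[] realign(byte[] data, int startBit, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = 0;
            for (int j = 0; j < 8; j++) {
                b = (b << 1) | bitAt(data, startBit + i * 8 + j);
            }
            out[i] = (byte) b;
        }
        return out;
    }
}
```

Searching bit by bit rather than byte by byte is what allows the packet to be found when the card delivers it at an arbitrary offset within a byte.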

Once the bit-stream is synchronised the page information is de-hammed (again, please refer to the Teletext Specification for information on the specifics of this). The body of the teletext page is then read in and constructed into a packet, which is sent to the database server with the instruction to update the page with the one transmitted in the message.

The server also switches channels according to a specified heuristic in order to update all channels it is required to. As per the distributed nature of the system, it may not be all the available channels but a subset of them or even just one, leaving the task of updating other channels to another decoder.

Database Structure

These are the schema for the databases to store user data and also the teletext data.

Table       Column    Comments
Userpages   login**   string
            page*     hex-int

This table stores the user's list of bookmarks, or favourites. When a user decides to add a bookmark, a new row is added to the table with the user's login and that page number.

Table      Column           Comments
Textdata   channel*         dec-int
           pagenumber*      hex-int
           subpagenumber*   dec-int
           FTlink1          hex-int
           FTlink2          hex-int
           FTlink3          hex-int
           FTlink4          hex-int
           FTindexlink      hex-int
           data             binary, 40 bytes × number of rows
           flags            binary
           updatetime       time: the time it was last updated
           updatedbefore    time: the time before last

The Textdata table is the main part of the database. It holds the cache of teletext pages. The pages are held as individual subpages in each row. The primary key is the combination of channel, page number and subpage number. Fastext links are stored as separate columns. The data is stored in binary format.

Table   Column               Comments
Users   login*               string
        name                 string
        password             encrypted string
        (Home-page)          hex-int
        (Floating Toolbar)   boolean

This table holds the users' login information. It will also hold a user's preferences, for various settings in the applet. For example, whether the user wants a floating toolbar, and what the user's "home page" or start page is would be stored here.

** = foreign key, * = primary key

Interface to the Database

The database interfaces with both the ITX Server and the Page Server. Every time it receives a valid page, the Page Server sends a UDP packet containing the following data encoded suitably:

The Database Server receives the UDP packet, and extracts the data from the structure. It uses the C MySQL API to update the database using the API's mysql_real_query() function. This takes as an argument a string containing the SQL query formed from the received data.

When the client requests a page, it does so by an HTTP GET request to a proxy. The proxy sends UDP packets to the ITX Server. These contain either a request for a page, or user related data, such as "add bookmark", "remove bookmark", or a validation request for a user's password. The ITX server, in turn, forwards the requests to the Database Server.

The page request will cause a SELECT query to be formed and the database returns the data to the ITX Server, packaged up in another UDP packet.

Proxy over HTTP

Java has a built-in security model which enforces certain restrictions on the communications that a Java applet can initiate. An applet can only open a connection to the web server that it was downloaded from. This could potentially limit the distributability of the system, forcing the Teletext server to reside on the same machine as the web server. In order to overcome this obstacle it was decided to use a proxy server on the same machine as the web server to forward requests from the client on to the Teletext Server.

One implementation of this would involve running a service on a port of the server that the Java client connects to. The client would then communicate only with the web server, as the Java security model dictates, while the web server passes requests on to the Teletext Server. However, running a service on the web server requires the trust of its administrators and possibly super-user access. This is not ideal, as it could be difficult to convince a security-conscious system administrator to run a somewhat unknown service on their web server.

A more pleasing solution would be to incorporate the proxy server within the services already running on the server. The obvious candidate for this is HTTP itself. This has the added advantage that, by using a well-known port, it is more accessible to the world at large, even to those behind corporate and other firewalls, as opposed to an obscure port that may well be blocked by a firewall policy.

Therefore, a proxy server will be written in an easily ported and well-supported scripting language such as Perl, so as not to restrict the web server to a particular operating system. The proxy script is placed in a pre-defined location on the web server, so the Java client can open an HTTP connection to, say, http://teign.doc.ic.ac.uk/cgi-bin/proxy.pl, and be given a teletext page in return. A system administrator will generally be much more willing to allow a user to run a program such as the proxy than a daemon running constantly in the background.
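From the applet's side, a request to the proxy might be sketched as below. The proxy location is the example given above; the query parameter names (channel, page, subpage) and the class and method names are hypothetical, since the request format is not specified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ProxyRequest {
    // Location of the proxy script, taken from the example in the text.
    static final String PROXY = "http://teign.doc.ic.ac.uk/cgi-bin/proxy.pl";

    // Builds the request URL. The parameter names are assumptions made for illustration.
    static String pageUrl(int channel, int page, int subpage) {
        return PROXY + "?channel=" + channel
                + "&page=" + Integer.toHexString(page) // page numbers are held as hex-ints
                + "&subpage=" + subpage;
    }

    // Performs an HTTP GET and returns the raw response body.
    static byte[] fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Because the connection goes back to the host the applet was loaded from, and over the ordinary HTTP port, a request of this kind stays within the applet security restrictions described above.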

User Preferences

A user can be identified from one session to the next by storing a cookie on their browser. Where people refuse cookies, or where an individual wants their preferences carried between different browsers or machines, they can log in using a username and password. Once the user is identified, a request can be made of the database server for that user's preferences. These may include various options within the applet, such as whether the toolbar is docked on the left, docked on the right or left floating. It will also give access to persistent bookmarks for that particular user.

Javaless Generation

Using a set of scripts running on the web server, reasonable functionality can be provided on clients where Java cannot run (due to the absence of a suitable virtual machine, or security policies in a particular organisation). The Javaless functionality is provided by creating a server-side image of the appropriate teletext page, together with a client-side image map which indicates hotspots on the image. So, clickable page numbers, URLs and email addresses are all available in the Javaless version. All options are passed in as parameters to the script, which generates the appropriate page after communicating with the ITX server via UDP. The Javaless version will suffer a performance penalty in that no pre-processing will be possible: while the user is reading one page, the client cannot be speculatively fetching pages they may want next. The system will benefit transparently, however, from any web caching, and the pages returned can be given an appropriate expiry header, so that well-behaved caches will avoid unnecessary requests.
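The client-side image map could be generated along the following lines. This is a sketch under stated assumptions: each character cell is taken to be 12 by 20 pixels in the generated image, a hotspot is described by its row, starting column and length in characters, and the link target is passed in ready-made. The class and method names are hypothetical, and the href is not HTML-escaped here.

```java
public class ImageMapBuilder {
    // Assumed size in pixels of one character cell in the generated image.
    static final int CELL_W = 12;
    static final int CELL_H = 20;

    // Builds one <area> element covering 'length' characters starting at (row, col).
    static String area(int row, int col, int length, String href) {
        int x1 = col * CELL_W;
        int y1 = row * CELL_H;
        int x2 = (col + length) * CELL_W;
        int y2 = (row + 1) * CELL_H;
        return "<area shape=\"rect\" coords=\"" + x1 + "," + y1 + "," + x2 + "," + y2
                + "\" href=\"" + href + "\">";
    }

    // Wraps the areas in a named <map> element, one area per line.
    static String map(String name, String[] areas) {
        StringBuilder sb = new StringBuilder("<map name=\"" + name + "\">\n");
        for (String a : areas) {
            sb.append("  ").append(a).append('\n');
        }
        return sb.append("</map>").toString();
    }
}
```

A page number at row 2, columns 5 to 7 would then become a rectangle from (60,40) to (96,60), linked back to the generating script with the new page number as a parameter.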

Client

Overview

Diagram of client architecture

The client is a Java applet running in the browser. It communicates with the ITX server over HTTP via a proxy running on the web server the applet was downloaded from. This conforms to the security model for a generic applet, and by communicating over HTTP, will generally work across corporate firewalls (see Proxy over HTTP for more details).

Java was chosen over alternative methods because it provides the possibility of rich functionality on a large proportion of available browsers. ITX also offers an HTML-based Javaless version which is less fully-featured but that will work everywhere. Given this fall-back option, it was decided that it was reasonable to use a recent Java version (1.2), providing the most functional possible User Interface in the applet version.

The advantage of using extra threads for the Grabber and Parser is that the client can be servicing requests in the background. If a connection has been set up and appropriate data has been fetched, it is not necessarily appropriate to drop everything because the user has decided to go to another page. They may well return to the page first requested, or one of its related pages. The architecture could even be extended so that, when a user logs in, the client automatically starts downloading their most visited pages or their bookmarks. It is the job of the request handler to kill these threads if they are significantly harming the performance of the most recent request.

These processes can clearly run concurrently. For instance, if a page has only been partially received so far, then the part that is available could be displayed immediately instead of having to wait for the entire batch of pages to transfer across the network.

User Interface

This is possibly the most important part of this project because no matter how clever the server is, if the user can't see the results and get access to the features they require, the rest is useless.

The user interface will be written in Java so that an applet can be used to display Teletext in a capable browser. A non-Java version will be provided separately so that Teletext will be available from as many web browsers as possible - see Javaless Generation.

The general philosophy behind the UI is that it should be a web enhanced TV experience. Users who are comfortable with Teletext on a TV should be comfortable with Teletext on the web, but there will be additional features such as the clickable URLs and page numbers that will behave as an Internet user might expect. A remote control will be provided which has at least access to numbers and common features such as next and previous page and subpage. More advanced features such as the saving of a page and user preferences will be provided on a popup menu. This is to prevent a novice user from feeling overwhelmed by lots of tiny icons as happens with many TV remote controls. It will be possible to dock the remote on any edge of the applet or to have it in a separate frame. This allows the advanced user greater flexibility.

An initial user interface has been constructed which demonstrates the functionality of the floating and docking toolbars.

ITX Applet
ITX Floating Toolbar

As part of ensuring that the applet behaves similarly to a television, the user will be able to type any three-digit number and it will be interpreted as a page number without a page number text field specifically being selected. In the same way a television behaves, if the user enters four digits the first three will be discarded and the fourth digit will be considered the first digit of a new page number.
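The digit-entry behaviour can be sketched as a small state holder. The class and method names are hypothetical, and the sketch assumes that a page is selected as soon as the third digit arrives, so that a fourth digit naturally begins a new number.

```java
public class PageNumberEntry {
    // Digits typed so far for the page number currently being entered.
    private final StringBuilder digits = new StringBuilder();

    // Feeds one key press. Returns the three-digit page number once it is
    // complete, or null while more digits are needed. Non-digit keys are ignored.
    public String type(char key) {
        if (key < '0' || key > '9') {
            return null;
        }
        digits.append(key);
        if (digits.length() < 3) {
            return null;
        }
        String page = digits.toString();
        digits.setLength(0); // the next digit starts a new page number
        return page;
    }

    // The digits entered so far for the next page number.
    public String pending() {
        return digits.toString();
    }
}
```

Typing 1, 0, 0, 2 therefore selects page 100 and leaves "2" pending as the first digit of the next number.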

To provide feedback when the user moves the pointer over a link, the applet will follow the standard behaviour in most web browsers and turn the pointer to a hand. The applet will also display where the link points to in the status bar, showing the page number, URL or e-mail address. An alternative would be to highlight the link by inverting background and foreground colours. It was felt that this would not follow the behaviour users expect in a browser, and it would also stop teletext looking like teletext.

All interactive parts of the user interface will have tooltips to enable novice users to explore the applet rather than having to read many help pages. Hotkeys will also be provided to enable quick access to features.

Rather than using a number to represent the current channel, it is proposed to use the logos of the television stations, which will be more intuitive than another number on the remote. An added advantage is that satellite and cable channels do not have a standard channel number that is common to all users, whereas the logos are a standard identification.

The user interface will send a request to the request handler for a specific page and sub page. The renderer will return the specified page in the form of an array of up to 4 graphics. These graphics can then display the normal, flashing, reveal and flashing & reveal versions of the page which will be cycled or switched as required.

Request Handler

Sequence of events:

The request handler is responsible for the cache, and also for ensuring that extra Grabber and Parser threads are killed if performance is poor.
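The caching side of this responsibility might look like the sketch below. It is not taken from the implementation: the key format, the class name and the decision to drop an entry lazily when it is looked up after its expiry time are all assumptions. The key mirrors the channel, page and subpage that identify a subpage in the database.

```java
import java.util.HashMap;
import java.util.Map;

public class PageCache<T> {
    // A cached subpage together with the local time at which it expires.
    private static final class Entry<T> {
        final T page;
        final long expiresAtMillis;

        Entry(T page, long expiresAtMillis) {
            this.page = page;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<T>> entries = new HashMap<>();

    // Builds a cache key; the page number is written in hex, as teletext pages are.
    static String key(int channel, int page, int subpage) {
        return channel + "/" + Integer.toHexString(page) + "/" + subpage;
    }

    public synchronized void put(String key, T page, long expiresAtMillis) {
        entries.put(key, new Entry<>(page, expiresAtMillis));
    }

    // Returns the cached page, or null if it is absent or has expired.
    public synchronized T get(String key, long nowMillis) {
        Entry<T> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (e.expiresAtMillis <= nowMillis) {
            entries.remove(key); // expired entries are dropped on lookup
            return null;
        }
        return e.page;
    }
}
```

The methods are synchronized because Grabber and Parser threads may be delivering pages while the user interface is asking for one.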

Grabber

A Grabber thread is created to make a particular HTTP request to the ITX server. A request asks for a particular page on a particular channel. In response it receives that page, the associated fastext pages, index page and other potentially useful pages (for example the pages n+1 and n-1). For each of those pages it will receive any subpages that they consist of.

The incoming information stream is split into subpages and each is structured as a GrabbedSubPage. This includes converting the expiry time provided by the server, which is the number of seconds still to live, into a local time.

public class GrabbedSubPage {
        public int magazine, page, subpage, subpageCount;
        // rowcount is the actual number of rows on this page
        public int rowcount;

        // Local time at which this subpage expires
        public Time expires;
        // Fastext links and the index link
        public int redlink, greenlink, yellowlink, bluelink, indexlink;

        // Row numbers lie between 0 and 25; row 0 is the header.
        // A Row is a row number plus 40 bytes, one for each character in the line.
        public Row rows[];
}
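The expiry conversion described above amounts to adding the server's seconds-to-live to the client's own clock at the moment the subpage is received. A minimal sketch, with a hypothetical class name and java.util.Date standing in for whichever time class the applet uses:

```java
import java.util.Date;

public class ExpiryClock {
    // Converts a time-to-live in seconds, as sent by the server, into an
    // absolute expiry time on the client's clock.
    static Date toLocalExpiry(long nowMillis, int secondsToLive) {
        return new Date(nowMillis + secondsToLive * 1000L);
    }
}
```

In use this would be called as toLocalExpiry(System.currentTimeMillis(), ttl). Sending a relative lifetime rather than an absolute time means the client and server clocks do not need to agree.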

Once grabbed the subpage is placed in the shared data structure for the parser.

Parser

The parser gets a page structure from the shared data structure.
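The hand-over between the Grabber and the Parser is a producer and consumer arrangement. The sketch below shows one way to arrange it, using a blocking queue as the shared data structure. It uses java.util.concurrent, which is newer than the Java version the applet targets, and plain strings in place of the real subpage classes, so it illustrates the idea rather than the applet's code. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class GrabParsePipeline {
    // Marker telling the parser that no more subpages will arrive.
    // It is compared by identity, so it can never clash with real data.
    private static final String END = new String("END");

    public static List<String> run(String[] subpages) {
        BlockingQueue<String> shared = new LinkedBlockingQueue<>();
        List<String> parsed = Collections.synchronizedList(new ArrayList<>());

        // Grabber: places each subpage in the shared structure as it arrives.
        Thread grabber = new Thread(() -> {
            for (String s : subpages) {
                shared.add(s);
            }
            shared.add(END);
        });

        // Parser: takes subpages one at a time and dies after the last one.
        Thread parser = new Thread(() -> {
            try {
                String s;
                while ((s = shared.take()) != END) {
                    parsed.add("parsed:" + s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        grabber.start();
        parser.start();
        try {
            grabber.join();
            parser.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parsed;
    }
}
```

Because the parser blocks only while the queue is empty, the first subpage can be parsed and displayed while later ones are still being downloaded.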

The page is parsed into eight lists of Teletext strings, one for each combination of flashing/static, doubleheight/normalheight, hidden/visible. This is so that the basic page can be drawn and then flashing or reveal parts may be rendered on top. Each Teletext string is text in one colour, font, size and visibility.

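import java.awt.Color;
import java.awt.Font;

// Declared private, so this class must be nested inside another class
private class TeletextString {
        int x, y;
        Color fgColour, bgColour;
        String text;
        Font textFont; // normally teletext, graphics or separated graphics

        // Constructor: no return type, and the font parameter is a Font to match the field
        TeletextString(int xcoord, int ycoord, Color stringcol, Color backcol, Font ptextFont) {
                x = xcoord;
                y = ycoord;
                fgColour = stringcol;
                bgColour = backcol;
                textFont = ptextFont;
        }

        // Called when a control code ends the current string
        void setText(String ptext) {
                text = ptext;
        }
}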

The parsing is done character by character. Each non-control character is added to the current string. All control codes cause the current string to finish and call the setText method of TeletextString. A new Teletext string is then created and the current string set to the empty string. If a character which might indicate a link is detected, either a number, a full stop or an @, then when the next space is reached the previous word is analysed. If it is a valid link it is added to the list of hot spots.
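The two steps described above, splitting a row at control codes and checking whether a word is a link, can be sketched as follows. The rules used are assumptions made for illustration: bytes below 0x20 (after masking off the top bit) are treated as control codes, empty strings between adjacent control codes are skipped, page links are taken to be three-digit numbers from 100 to 899, and the e-mail and URL patterns are deliberately simple. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RowParser {
    // Splits one row into text strings, ending the current string at each control code.
    static List<String> splitRuns(byte[] row) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : row) {
            int c = b & 0x7F; // assumes the top bit carries no character data
            if (c < 0x20) {
                // Control code: finish the current string and start a new one.
                if (current.length() > 0) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append((char) c);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    // Classifies a word as "page", "email", "url" or "none".
    static String classify(String word) {
        if (word.matches("[1-8][0-9]{2}")) {
            return "page";
        }
        if (word.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+")) {
            return "email";
        }
        if (word.matches("(www\\.|http://)\\S+\\.\\S+")) {
            return "url";
        }
        return "none";
    }
}
```

A word such as "3.5" contains a full stop and so would be examined, but it matches none of the patterns and is not added to the hot spots.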

The ParsedSubPage is then passed to the Request Handler and the parser starts on the next subpage. When the last page is parsed the thread dies.

public class ParsedSubPage {
        public int magazine, page, subpage, rowcount;
        // rowcount is the actual number of rows on this page
        Time expires;
        boolean containsFlashing, containsHidden;
        public int redlink, greenlink, yellowlink, bluelink, indexlink;

        // Eight lists, indexed by [reveal][flashing][doubleheight], each index being 0 or 1
        private TeletextStringList textStrings[][][] = new TeletextStringList[2][2][2];

        public Hotlist hotlist = new Hotlist();
}

Renderer

The renderer is concerned with providing a tool to draw an accurate representation of Teletext to the screen. There are two logical parts to the renderer, one draws the individual strings and the other pieces many strings together. Rendering individual strings will be discussed after this.

The ParsedSubPage format will be passed to the renderer, which will first render the basic page. The renderer will run down the list, asking the string renderer for string images and then drawing them at the required co-ordinates. The basic page will then be passed, if required, to additional methods which draw on the flashing, hidden, or hidden and flashing parts.

Rendering Individual Strings with Graphic Font

After initial investigations it was decided to develop an initial renderer which would use the Graphic Font class, originally written by Kevin Hughes. A font representing all the text and punctuation, together with the contiguous graphic characters was drawn up to facilitate early testing.

Initial trials with Graphic Font showed a number of problems

1 2
4 8
16 64

Rendering Individual Strings with New Text Renderer

To address many of the problems outlined above a new text renderer is planned. This will construct images which can be scaled as with Graphic Font, but will encode the font in a simpler structure and create the graphics characters from the codes directly. A font will consist of three integers (the width of a character, the height of a character and the number of characters) as well as a collection of two-dimensional arrays, one representing each character. Depending on how each letter representation is stored, it may also be necessary to store the character value. Storing the group of letters in an array requires the array to be sized to the highest character code used, and array slots will be wasted where there is no character. The advantage of an array is the easy access. The alternative is to use a balanced tree and encode the character value as part of each letter. At the moment the relative size and speed of these implementations is unknown and will be investigated during implementation.
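Creating a graphics character directly from its code might look like the sketch below. It assumes that the 2 by 3 grid of values 1, 2, 4, 8, 16 and 64 listed in the previous section gives the bit value of each cell of a graphics character, read row by row, and it produces the same kind of two-dimensional array that would represent a letter. The class and method names are hypothetical.

```java
public class MosaicGlyph {
    // Bit value of each cell in the 2x3 grid, row by row (assumed layout).
    static final int[][] CELL_BITS = { {1, 2}, {4, 8}, {16, 64} };

    // Builds a height x width pixel array for the graphics character 'code'.
    // A pixel is set when the cell that contains it has its bit set in the code.
    static boolean[][] build(int code, int width, int height) {
        boolean[][] pixels = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            int cellRow = y * 3 / height; // which of the three cell rows
            for (int x = 0; x < width; x++) {
                int cellCol = x * 2 / width; // left or right cell
                pixels[y][x] = (code & CELL_BITS[cellRow][cellCol]) != 0;
            }
        }
        return pixels;
    }
}
```

Generating these characters on demand means the font only needs to store the text and punctuation glyphs.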

To render a string, the array representation of each character can be copied into one large array representing the string and then converted to an image using Java's image handling classes and then scaled to the required size in pixels.
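The copy-then-scale approach can be sketched as below. BufferedImage is used here as one of Java's image handling classes, which is an assumption, since the design does not say which classes will be used. Scaling is done by simple pixel replication with a whole-number factor. The class and method names are hypothetical.

```java
import java.awt.image.BufferedImage;

public class StringImage {
    // Copies the per-character arrays side by side into one array for the whole string.
    static boolean[][] join(boolean[][][] glyphs, int charWidth, int charHeight) {
        boolean[][] out = new boolean[charHeight][glyphs.length * charWidth];
        for (int g = 0; g < glyphs.length; g++) {
            for (int y = 0; y < charHeight; y++) {
                System.arraycopy(glyphs[g][y], 0, out[y], g * charWidth, charWidth);
            }
        }
        return out;
    }

    // Converts the array to an image, enlarging each source pixel to scale x scale.
    static BufferedImage toImage(boolean[][] pixels, int scale, int fgRgb, int bgRgb) {
        int h = pixels.length;
        int w = pixels[0].length;
        BufferedImage img = new BufferedImage(w * scale, h * scale, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h * scale; y++) {
            for (int x = 0; x < w * scale; x++) {
                img.setRGB(x, y, pixels[y / scale][x / scale] ? fgRgb : bgRgb);
            }
        }
        return img;
    }
}
```

Building the whole string as one array first means the conversion to an image happens once per string rather than once per character.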