Sunday, 27 May 2018

An analogue joystick and implementing an SPI controller

It's been a while since my last post, the main reason being a house move. On the plus side this is good for my projects as I now have lots more room to work on them. On the negative side, at least as far as my hobbies are concerned, I've somewhat unexpectedly found myself doing some work for my old company.

I've found time to tackle a mini-project I've put off doing: building an analogue joystick and getting the MAXI09 to read its position.

Although I built the MAXI09 board more than two years ago, some of the hardware has not yet been tested. This includes the MCP3002 (PDF) SPI ADC IC, and the 9 pin joystick connector it is attached to - the connector is dual purposed for Atari style digital joysticks, which of course have been well tested.

First up was building the joystick. The starting point was an XBox analog thumb stick out of a game controller. The thumb stick portion of the controller consists of two potentiometers arranged perpendicular to each other, with the stick itself having a button in its base such that the entire stick is attached to the button. It is possible to buy such a stick, presumably salvaged from a game controller, either as a bare component or handily attached to a little PCB with headers for:

  1. Ground - one side of the potentiometers
  2. 5V - the other side of the pots
  3. X - the wiper on the first pot
  4. Y - the wiper on the second pot
  5. Fire - this is one side of a button integrated into the thumb stick, the other side of this button being attached to the ground pin
To make the joystick more comfortable to use and more sturdy, it has been enclosed in a small plastic project box. As it turned out, I used a separate fire button from a job lot of arcade-style buttons because the button within the stick is not, to me anyway, easy to operate.

A somewhat fuzzy picture of the completed mini-project:

And here's a picture of the insides of the joystick box:

After looking around for some multicore cable for the joystick, I settled on butchering a network cable for its CAT6 cable. This worked out quite well; the solid cable is easier to solder than stranded, but I still find "in air" soldering awkward and it took about half an hour to attach the 9 pin plug!

I did not bother to electrically test the joystick; instead I moved right along to attaching it to the MAXI09 board. It was at that point (and at no time before) that I spent more than 30 seconds looking at the MCP3002 datasheet. Pretty quickly I spotted a problem: unlike the DS1305 (PDF), the Real Time Clock, and the CAT25256 (PDF), the 32KByte EEPROM, this SPI ADC uses a bit-orientated protocol. The RTC and the EEPROM send and receive commands and data in whole bytes, and never at the same time, whereas the ADC starts sending a response after just a few bits of command have been sent from the master. From the datasheet:

From this you can see that the IC will start presenting the captured value after receiving the 5 bit command sequence, which looks like this, in the order they appear on the wire:
  1. A "don't care" bit; the diagram is not very clear but the first bit received after the master asserts the /CS signal is a don't care.
  2. Start (always a 1)
  3. SGL/DIFF; sets single ended or differential mode - this application using single ended mode, which is enabled with a 1
  4. ODD/SIGN; in single ended mode this sets which input is used (X or Y in this application)
  5. MSBF; a 1 sets Most Significant Bit First
After sending one dummy bit, the IC will immediately start sending bit 9 (the MSB) of the ADC result. Since this behaviour will no doubt catch out some people operating this IC from a byte-orientated SPI controller, as typically found in an MCU, it is covered in a special section of the datasheet where another diagram is provided:

This diagram illustrates how working with this bit-orientated behaviour is perfectly possible in a byte-centric SPI controller: as the command byte is sent out, a byte needs to be read back in. To obtain the value, bits can be squashed back together to produce the 10 bit result.
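As a rough illustration of the byte-wise approach, here is a minimal C sketch. It assumes the five command bits sit in the first byte in exactly the order listed earlier, and that a zero byte is sent second while the rest of the result is clocked in. The function names are made up for this example and are not part of MAXI09OS.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first byte clocked out to the MCP3002, following the bit order
 * listed above: don't care, Start, SGL/DIFF, ODD/SIGN, MSBF. The remaining
 * three bit times are padding, during which the IC is already replying. */
uint8_t mcp3002_command(uint8_t channel)
{
    assert(channel < 2);                /* the MCP3002 has two inputs */
    return (uint8_t)(0x40               /* Start bit, always 1 */
                   | 0x20               /* SGL/DIFF = 1: single ended */
                   | (channel << 4)     /* ODD/SIGN selects the input */
                   | 0x08);             /* MSBF = 1: MSB first */
}

/* Squash the two bytes read back into the 10 bit result. With the layout
 * above, bits 9 and 8 arrive in the bottom two bits of the first byte and
 * bits 7..0 fill the second byte. */
uint16_t mcp3002_result(uint8_t first, uint8_t second)
{
    return (uint16_t)(((first & 0x03) << 8) | second);
}
```

With this layout the first byte sent is 0x68 for one input and 0x78 for the other; which of those is X and which is Y depends on the wiring.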

The problem for MAXI09OS is that the syswrite command does not work like this. It is (or was) not possible to read in a byte at the same time as sending a byte to a device. Normally there is no reason to need to do this, but SPI has this interesting behaviour that an IC can send data at the same time as receiving it. Extending the SPI write sub to set the A register to the value obtained when shifting out the byte should have been straightforward enough.

Upon looking at the write code in the SPI driver in MAXI09 I hit a bit of a snag, or at least an annoyance: it wouldn't be possible to read in the byte in the write subroutine without completely rewriting it and making it extremely slow. This is because the write code efficiently uses both 8 bit accumulators. The only way to structure the write action to make it read in a byte at the same time would be to move the sending and receiving of a single bit into its own subroutine, making the code much slower than it is currently.

Because I wanted to see the ADC working, I hit upon the idea of a quick hack: make the write call only send the top 5 bits. It would then be possible for the subsequent sysread calls to read the data generated without missing out most of the 10 bits of value data. Of course with this hack in place it would not be possible to use the SPI driver with either the RTC or the EEPROM ICs. This worked fine, which was a big relief: the ADC was generating valid data, and the joystick itself was wired properly. However, I won't elaborate much on this because I pretty soon decided that, now rather than later, it was time to implement a proper SPI host controller in DISCo. This would make the sysread and syswrite subs inside the SPI driver completely trivial. Both would be the same, except that sysread always shifts out a zero byte.

I looked at two different, but related, SPI host controllers for inspiration.

The first was 65SPI by Daryl Rictor and the second was 65SPI/B by André Fachat. Both are similar and each has its pros and cons. I prefer the 65SPI/B overall as it has interrupt routing, a handy feature for propagating, say, Real Time Clock ticks through without consuming interrupt lines further up the chain. Conversely the 65SPI/B only supports four devices instead of the 65SPI's eight.

Both devices share a bunch of other features:

  • Support for all four SPI modes (CPOL and CPHA)
  • Support for dividing a system clock signal down to produce the SPICLK signal
  • Status register containing whether a byte is currently being clocked out, and whether the received byte has been read by the host
  • Interrupt or polled modes
  • Fast send and receive; receive without sending and vice-versa
I have implemented the simplest possible SPI host controller. It lacks all the above features and contains only two registers, SPIIN and SPIOUT, as well as the already implemented SPISELECT used for selecting which IC should be the target. The implementation is so trivial I can include it here:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use work.P_DISCO.ALL;

entity spishifter is
    port (
        SPIOUTREG : in T_BYTEREG; -- What we are shifting out
        SPISELECTS : in T_BYTEREG; -- What device is selected
        SPIINDATA : out STD_LOGIC_VECTOR (7 downto 0); -- What byte was received
        E : in STD_LOGIC; -- Ticker
        MISO : in STD_LOGIC_VECTOR (3 downto 0); -- The inbound lines from each IC
        MOSI : out STD_LOGIC; -- The outbound line
        SCLK : out STD_LOGIC -- The SPI clock line
    );
end entity;

architecture behavioral of spishifter is
    signal OUTSTATE : STD_LOGIC_VECTOR (7 downto 0); -- Currently shifting byte out
    signal INSTATE : STD_LOGIC_VECTOR (7 downto 0); -- Ditto for in
    signal SINGLEMISO : STD_LOGIC; -- The selected input
    signal COUNTER : STD_LOGIC_VECTOR (2 downto 0); -- Bit shifting counter (0..7)
    signal RUNNING : STD_LOGIC; -- Are we shifting?
    signal TICK : STD_LOGIC; -- In or Out halves
begin
    -- Obtain the selected IC's input line
    SINGLEMISO <= MISO (0) when SPISELECTS.DATA = x"00" else
                  MISO (1) when SPISELECTS.DATA = x"01" else
                  MISO (2) when SPISELECTS.DATA = x"02" else
                  MISO (3) when SPISELECTS.DATA = x"03" else
                  '0'; -- Nothing valid selected (fallback value assumed)

    process (E)
    begin
        if (E'Event and E = '0') then
            if (SPIOUTREG.nIE = '0') then
                -- Reset for byte transfer
                OUTSTATE <= SPIOUTREG.DATA; -- Grab byte we are sending
                INSTATE <= x"00"; -- Clear input
                COUNTER <= "000"; -- MSB will be first
                RUNNING <= '1'; -- Go!
                TICK <= '1'; -- Start in input half
            else
                if (RUNNING = '1') then
                    if (TICK = '0') then
                        -- Output; shift byte up one
                        OUTSTATE (7 downto 0) <= OUTSTATE (6 downto 0) & '0';
                        -- Next bit
                        COUNTER <= STD_LOGIC_VECTOR (unsigned (COUNTER) + 1);
                        if (COUNTER = "111") then
                            -- Done. Mark not running and grab input byte
                            RUNNING <= '0';
                            SPIINDATA <= INSTATE;
                        end if;
                    else
                        -- Input; shift byte up one grabbing selected MISO
                        INSTATE (7 downto 0) <= INSTATE (6 downto 0) & SINGLEMISO;
                    end if;
                    -- Invert
                    TICK <= not TICK;
                end if;
            end if;
        end if;
    end process;

    -- Send out MSB and set clock
    MOSI <= OUTSTATE(7) when RUNNING = '1' else '0';
    SCLK <= TICK when RUNNING = '1' else '0';
end architecture;

For what it's worth this is entirely my own code.

Fortunately all three types of SPI IC on the MAXI09 board support CPOL=0 CPHA=0 mode, as this is all that this code supports.

The SPIOUTREG and SPISELECTREG entities encapsulate a byte-wide addressable register. A key output of this entity is the nIE signal, which is low on (and only on) the system clock cycle in which the register is written to. This is the signal which starts the SPI sending (and receiving) action. The 8 bits are shifted out from this register, while the input is shifted into a plain vector; when the MPU reads a particular address in DISCo, the received byte is presented.
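For readers more at home in software, the shifting behaviour can be modelled in a few lines of C. This is only a sketch of the idea, not a translation of the VHDL: the callback type and the loopback "device" are invented for the example, and the clock halves are collapsed into one loop iteration per bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an attached IC: given the MOSI level for this
 * bit time, return the level the IC drives on MISO. */
typedef int (*spi_device_fn)(int mosi, void *ctx);

/* A device that simply echoes MOSI back, as if MISO were wired to MOSI. */
int spi_loopback(int mosi, void *ctx)
{
    (void)ctx;
    return mosi;
}

/* Exchange one byte, MSB first. Each bit time presents the top bit of the
 * outgoing byte, shifts the sampled MISO level into the bottom of the
 * incoming byte, then shifts the outgoing byte up one. */
uint8_t spi_exchange(uint8_t out, spi_device_fn device, void *ctx)
{
    uint8_t in = 0;
    int bit;

    assert(device != NULL);
    for (bit = 0; bit < 8; bit++) {
        int mosi = (out >> 7) & 1;
        in = (uint8_t)((in << 1) | (device(mosi, ctx) & 1));
        out = (uint8_t)(out << 1);
    }
    return in;
}
```

A read is then just an exchange with a zero byte going out, and a write is an exchange whose return value may be ignored.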

I am only a VHDL newbie, despite having played around with it, on and off, for a few years now and I welcome any tips on improving this code.

Critically, the RUNNING signal is not exposed to a register. Since it isn't exposed, the code in the SPI driver in MAXI09OS must wait (via a sequence of nop instructions) for the SPI host to shift out the byte. Calculating the minimum time that the MPU must wait was an interesting little exercise.

Since there are 8 bits to send, and each bit takes 2 machine cycles (TICK must go from 1 to 0 and back to 1), and the 6809 nop instruction also takes 2 machine cycles to execute, 8 nops are required. This means that the SPICLK line runs at 1MHz, which is nice and fast for my little micro. It means that SPI data could be sent, back to back, at a rate of about 100KB/sec, but of course the driver overhead in MAXI09 means bytes cannot be sent back to back.
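The arithmetic can be written out as a small C sketch. The 2MHz E clock passed in the checks is an assumption, implied by a 1MHz SPICLK at two machine cycles per bit; the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SPI_BITS_PER_BYTE  = 8,
    CYCLES_PER_SPI_BIT = 2,   /* TICK goes 1 -> 0 -> 1 */
    CYCLES_PER_NOP     = 2    /* 6809 nop */
};

/* Number of nops needed to cover one byte transfer, rounded up. */
unsigned spi_wait_nops(void)
{
    unsigned cycles = SPI_BITS_PER_BYTE * CYCLES_PER_SPI_BIT;
    return (cycles + CYCLES_PER_NOP - 1) / CYCLES_PER_NOP;
}

/* SPI clock rate for a given E clock rate. */
uint32_t spi_clock_hz(uint32_t e_clock_hz)
{
    assert(e_clock_hz > 0);
    return e_clock_hz / CYCLES_PER_SPI_BIT;
}

/* Ceiling on throughput if bytes really could be sent back to back. */
uint32_t spi_peak_bytes_per_sec(uint32_t e_clock_hz)
{
    return spi_clock_hz(e_clock_hz) / SPI_BITS_PER_BYTE;
}
```

Under that assumption the ceiling works out at 125,000 bytes per second, in the same ballpark as the rough figure quoted above.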

Below is a screenshot from my Saleae Logic 16. I did my testing with the DS1305, setting and getting the time. The screenshot shows the time being retrieved from the IC:

In the case of the RTC code (user/gettime.asm in the git repo), getchars was used to read 7 bytes. This has additional overhead, on top of the overhead from the driver switch in sysread. Nonetheless, the time taken to transfer a byte is about 40us, giving a transfer rate of 25KB/s, which isn't completely terrible for this micro.

Unfortunately I did not graph the previous bit-banged implementation of the SPI host, but it is possible to calculate the time taken to shift out one bit from the code:

               ldb #SCLK               ; clock high ...
               stb SPIOUT              ; ... set it
               ldb SPIIN               ; get state of miso
               rorb                    ; rotate into carry
               rola                    ; rotate back into a
               clr SPIOUT              ; clock low

I make this 2+5+5+2+2+7=23 machine cycles, more than an order of magnitude greater than the two taken inside DISCo to shift one bit. Of course this ignores setup for sending a byte, in both cases. And it also ignores, in the above bit-banging code, the requirement to shift in the received byte, as the byte to be sent is shifted out. All in all I think this is a fantastic improvement.
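The comparison can be checked with a few lines of C. This sketch only adds up the per-instruction cycle counts quoted above and scales them to a whole byte; like the text, it ignores per-byte setup. The names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Cycle counts for the six instructions in the bit-banged sequence above,
 * in order: ldb #, stb, ldb, rorb, rola, clr. */
static const unsigned bitbang_cycles[] = { 2, 5, 5, 2, 2, 7 };

/* Machine cycles to shift one bit with the bit-banged code. */
unsigned bitbang_cycles_per_bit(void)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < sizeof bitbang_cycles / sizeof bitbang_cycles[0]; i++)
        total += bitbang_cycles[i];
    return total;
}

/* Machine cycles to shift a whole byte at a given cost per bit. */
unsigned cycles_per_byte(unsigned cycles_per_bit)
{
    assert(cycles_per_bit > 0);
    return 8 * cycles_per_bit;
}
```

That gives 184 cycles per byte for the bit-banged code against 16 for the shifter in DISCo.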

After I'd verified that the SPI host was working by interfacing it with the DS1305 RTC, I set about interfacing with the analogue joystick. In the end I wrote a C program to do this (user/c/examples/joystick.c). This required adjusting the libmaxi09os glue for syswrite to get it to set a return value, which is the value read in by the (for example, and currently only) SPI driver spiwrite implementation. The only interesting line from this whole program is this one:

uint8_t pos = ((spidata[0] << 6) | (spidata[1] >> 2));

This joins up the value from the two bytes received from the IC, massages them together, and then discards the lowest two bits of the 10 bit value, leaving an 8 bit quantity for the joystick position.
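To see why that one-liner works, the sketch below compares it with the long-hand route of building the 10 bit value first and then dropping the bottom two bits. It assumes, as above, that only the bottom two bits of the first byte carry result data; the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The expression from joystick.c: the cast to uint8_t throws away whatever
 * was in the upper bits of the first byte once it has been shifted up. */
uint8_t pos_via_line(const uint8_t *spidata)
{
    assert(spidata != NULL);
    return (uint8_t)((spidata[0] << 6) | (spidata[1] >> 2));
}

/* The long-hand equivalent: mask out the two result bits in the first byte,
 * form the 10 bit value, then discard its lowest two bits. */
uint8_t pos_via_10bit(const uint8_t *spidata)
{
    uint16_t value;

    assert(spidata != NULL);
    value = (uint16_t)(((spidata[0] & 0x03) << 8) | spidata[1]);
    return (uint8_t)(value >> 2);
}
```

Both routes give the same 8 bit position whatever noise is present in the unused upper bits of the first byte.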

This works very well; my test program is able to sample the joystick position in the X and Y directions and show it on the console. It's obviously many times slower than reading the digital joystick position, but should be fast enough for a simple game, something I intend to tackle next...

Wednesday, 7 March 2018

Exploring C compilers for the 6809

Ever since I fired up my first 6809-based computer, I have wondered about the possibility of running programs written in something other than assembly language on it. Not that I don't love writing code in assembly. But for interest's sake, and because it could be a different kind of fun, I've been curious about running code written in higher level (relative to assembly) languages, especially programs written in C.

To make trying out external code - that is, code not stored in the EEPROM - quicker than when it is held on the Compact Flash, I have implemented an improvement over the existing method for transferring runnable files (essentially machine code subroutines) between my Linux box and the MAXI09 board. The old method involved transfers with XMODEM. Whilst this method largely worked, it was fiddly because XMODEM transfers need to be initiated on the sending side (the Linux box), which is not very convenient, especially if the MAXI09 console is being used.

What I have implemented in its place is a simple file transfer service whereby the name of the file to be transferred is given to the receiver, the MAXI09 board.

This screenshot shows the "getrun" command being used to download and run various binaries:
This means I can rebuild code then switch to minicom and immediately try it out, without having to go through the hassle of popping the Compact Flash out of the MAXI09 board, copying the programs across, then putting the Compact Flash back in the MAXI09 board. A big improvement!

The protocol is mind-bogglingly simple. In this description the client is the MAXI09 board and the server is the Linux box:
  1. Client sends a command byte 0x01, the "get" operation
    1. The rest of the sequence is command-specific though in any case only "get" has been implemented
  2. In the case of 0x01 the filename, with a trailing null byte, is sent next
  3. The server sends a response code: 0x01 means "data follows", 0x02 means "file not found", 0x03 means bad command, and 0x04 means the file is larger than 64KB
  4. Assuming the file exists 0x01 is followed by the file size as a 16 bit value, in big endian order
  5. Finally the file data itself is sent
    1. There are no acknowledgements sent from the receiving end as the file is sent; it is essentially sent blind
This last characteristic caused problems, initially, with data loss. It seems that with task switching going on during a transfer the circular buffer between the ISR and the sysread system call will sometimes overflow. This could be cured by either:
  • Breaking the transfers up into blocks (say 64 bytes) and acknowledging each block transferred
  • Introducing hardware flow control
  • Disabling task switching - but leaving interrupts enabled - for the duration of the file transfer
Disabling task switching is the solution I have currently employed. The first method would also likely work, but would slow down the transfers.

Figuring out the cause of the data loss took some time. Initially I suspected the 64 byte FIFO in the UART port in the SC16C654 (PDF) quad UART was overflowing due to interrupt starvation. But after checking the Line Status Register this turned out to not be the case, and it's obvious really: the baud rates used are slow compared to the speed of the CPU and the size of the UART's FIFO.

Instead the problem was an overflow of another FIFO: the circular buffer filled by the ISR and emptied by the sysread call in the UART driver. While other tasks are running, nothing is emptying this buffer so eventually it will overflow (the write pointer moves past the read pointer) causing data loss. The current solution is to insert a forbid at the start of the file transfer and a permit at the end. During the transfer interrupts will be serviced - needed so that the UART's FIFO is still emptied - but only the shell task, where the file transfer code runs, is scheduled. Thus the circular buffer is always emptied fast enough and no data loss occurs. The downside is that no other tasks are running for the duration of the transfer.
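The failure mode is easy to reproduce in a small C model of a circular buffer. This is a sketch only: the buffer size is an assumption, the names are invented, and unlike the driver described above it refuses and counts bytes that do not fit rather than letting the write pointer run past the read pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u   /* assumed size, for illustration only */

struct ring {
    uint8_t data[RING_SIZE];
    unsigned write;     /* next slot the ISR fills */
    unsigned read;      /* next slot the reader empties */
    unsigned dropped;   /* bytes lost because the buffer was full */
};

/* Producer side (the ISR in this model). Returns 1 if stored, 0 if lost. */
int ring_put(struct ring *r, uint8_t byte)
{
    unsigned next;

    assert(r != NULL);
    next = (r->write + 1u) % RING_SIZE;
    if (next == r->read) {      /* write pointer would catch the read pointer */
        r->dropped++;
        return 0;
    }
    r->data[r->write] = byte;
    r->write = next;
    return 1;
}

/* Consumer side (sysread in this model). Returns 1 if a byte was fetched. */
int ring_get(struct ring *r, uint8_t *byte)
{
    assert(r != NULL && byte != NULL);
    if (r->read == r->write)
        return 0;               /* nothing waiting */
    *byte = r->data[r->read];
    r->read = (r->read + 1u) % RING_SIZE;
    return 1;
}
```

If the producer runs 100 times while the consumer never gets scheduled, only the first 63 bytes survive, which is the situation the forbid and permit pair avoids.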

The fileserver software itself is based on the old flasher program; the one I use to update the EEPROM on the MAXI09 board. To avoid code duplication, the common routines (serial IO) have been broken out into their own module, which is linked by both the flasher program and the fileserver program. Anyone interested in this code can view it here. The code is not particularly pretty, but it works.
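To make the wire format listed earlier concrete, here is a minimal C sketch of the two ends of a "get": building the request and decoding the response header. It is not taken from the fileserver or the shell; the function and constant names are invented, and the serial IO itself is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CMD_GET = 0x01 };
enum {
    RESP_DATA_FOLLOWS = 0x01,
    RESP_NOT_FOUND    = 0x02,
    RESP_BAD_COMMAND  = 0x03,
    RESP_TOO_LARGE    = 0x04
};

/* Write a "get" request (command byte, filename, trailing null) into buf.
 * Returns the number of bytes to send, or 0 if buf is too small. */
size_t build_get_request(const char *filename, uint8_t *buf, size_t buflen)
{
    size_t namelen;

    assert(filename != NULL && buf != NULL);
    namelen = strlen(filename) + 1;     /* include the trailing null */
    if (buflen < 1 + namelen)
        return 0;
    buf[0] = CMD_GET;
    memcpy(buf + 1, filename, namelen);
    return 1 + namelen;
}

/* Decode the start of the server's reply. Returns the response code, or -1
 * if not enough bytes have arrived yet. For "data follows" the 16 bit big
 * endian file size is stored in *size. */
int parse_get_response(const uint8_t *resp, size_t len, uint16_t *size)
{
    assert(resp != NULL && size != NULL);
    if (len < 1)
        return -1;
    if (resp[0] != RESP_DATA_FOLLOWS)
        return resp[0];
    if (len < 3)
        return -1;
    *size = (uint16_t)((resp[1] << 8) | resp[2]);
    return RESP_DATA_FOLLOWS;
}
```

After a "data follows" header the receiver simply reads that many bytes, with no acknowledgements in either direction.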

While working with external commands, I thought about tackling a little project I've been wanting to complete for a while: running my old Snake game from within MAXI09OS. Not only would the game be started from the Shell (and read either from the Compact Flash or sent over the UART), but whilst the game is running other tasks would still be scheduled. Further, it would be terrific if it were possible to switch between the running game and the virtual consoles running other tasks. Finally, exiting the game should free up all resources and return the user to the virtual console which was used to start the game.

In summary, all this has been achieved. The game code was "ported" from its existing, bare metal, environment to use the facilities from MAXI09OS. One of the things I wanted to avoid was having the game code busy-wait - the previous iteration of the game spends most of its time in a delay loop implemented by decrementing a register until it reaches zero. This has been replaced with the use of a MAXI09OS timer. The joystick driver has also been rewritten to be event-based. Instead of returning the current state of the stick, sysread now returns only changes in position, which are themselves determined by hooking the main system tick. Changes in position generate a signal, which user code can wait on. The main loop for the Snake game thus waits on both timer and joystick events. As before, the game speeds up as the Snake eats the food. This is accomplished by shortening the interval between timer events each time the food is eaten. The interface to the V9958 itself uses the same APIs as the console driver, which happens to be roughly the same mechanism used in the Snake game. For performance reasons it's not really possible to place the V9958 behind a proper driver in the same way as, say, the UART is.

Implementing console switching while the game is running was interesting. This was accomplished by first shuffling the console video memory space around a bit. Previously the console font data and the screen data were spread across the 64KB video RAM memory space. Now the console fonts and screen data are within the first 32KB; the second 32KB is free for other purposes, for example the Snake game's own font and screen data. Actual switching was achieved by extending the console switching routine to adjust all video registers, including the video mode registers. This means that if a graphical mode is active and the user switches to a text-based virtual console, then it will be replaced with text mode. Previously only the screen memory address registers were updated to switch the display text to the new virtual console.

Because of possible conflicts between tasks wanting to use non text modes on the V9958, it is only possible to have one non text view active at a time. To claim ownership of the upper half of the video memory for use with alternative video modes, a task registers a video switch callback using the setgraphicsub subroutine. This callback is run, inside the console's ISR, to switch to an alternative video mode when a special "graphical console" function key is pressed, currently F10. In the case of the Snake game, this callback switches the video mode registers to the 32 column tile mode used by the game. When the user presses, say F1, the game switches back to 80 column text mode.

This all works rather well: it is possible to leave Snake running between games (or even during a game, if you are quick) and switch back to text mode to use the Shell, etc. All the while, the system's performance is not too adversely impacted.

I'm pleased enough with this little milestone for MAXI09OS that I've made a brief video. It shows some other things covered later in this blog post:

So, to C compilers.

I've pondered running programs written in something other than Assembly Language for a while now, and I've finally had some success. Whilst there are a number of C compilers that were produced for the 6809 over the years, there seem to be only two which are, more or less, currently being worked on:
  1. GCC, in the form of some unofficial patches
  2. CMOC, which is a compiler written specifically for the 6809
I've spent a fair amount of time looking at both compilers. They each have their pros and cons:

GCC pros:
  • Although they are close to 7 years old, the patches are against GCC 4.3.4, which though somewhat old, supports full C and C++ standards including C99 and C++98 with support for some of the features available in the later C11 and C++03 standard.
  • The generated code appears to perform better then CMOC generated code.
  • Not to diss CMOC, but GCC is a mature, fully featured, compiler.
  • C++ is available as well as C, though I have not used it. The STL is not available however.
GCC cons:
  • It requires the creation of a linker script in order to produce output in a form which can run on a "bare" system like MAXI09OS. Not a big issue.
  • It might be my build, but the shipped libgcc contains integer division routines which crash the system. I had to use an alternative implementation and manually link the module containing these routines in, or else the code could not contain expressions with division/modulus.
  • Position Independent Code support is patchy. It mostly works, but I had problems getting the compiler to generate sensible code when globals (which were arrays) were involved. I have not spent much time looking at the generated assembly, but it does not look complete.
  • There is no C library. Apparently it is possible to use the newlib C library, but I have not had any success getting it to build.
  • The 6809 patches are not being worked on anymore, it seems.
CMOC pros:
  • Very easy to use: no linker scripts, just point it at the code. The compiler mostly targets the CoCo, but generating "bare" machine-code files is easy with the Motorola SREC output format, which can be easily converted to a raw binary format with objcopy, a part of binutils.
  • Comes with a fairly minimal C library.
  • Being actively worked on; the last release (as of this writing) was 26th February. The author happily answered my queries about the software.
  • When dumping out the generated assembly files, the code is beautifully annotated with remarks relating to the original C code.
  • The inline assembly (required for calls to MAXI09OS routines) is a lot easier to use than GCC's.
CMOC cons:
  • Supports only a subset of C. Some missing features: bitfields (have never liked these anyway), and "const".
  • It seems to produce slower code than GCC.
On the subject of the speed of the generated code, my performance measurement was extremely crude. I tested a prime number solver and observed it ran about half the speed of the GCC-build. I have yet to look into why this is the case, but one factor against CMOC is that it passes all function parameters on the stack instead of through registers. I'm unsure if this can account for all the speed difference though.

Because I'm not entirely sure which compiler I will end up using for any larger projects, the build system I have implemented for building programs written in C supports both compilers. Getting a build from "the other" compiler is a simple matter of switching a Makefile variable.

A "c" directory in the MAXI09OS repository has been created which contains the relevant files for working on MAXI09OS code written in C. This is broken down into the following files and directories:

Makefile: This is a simple top level Makefile which runs make in the subdirectories. This include file contains shared Makefile commands used by the other subdirectories. The compiler to use (GCC or CMOC) is defined here.
libc/: My very, very trivial C library is here. I have only implemented a few functions: atoi, memcpy, memset, strlen. Also an snprintf implementation has been included, which is based on mini-printf.
libmaxi09os/: This directory contains the interface between programs written in C and the MAXI09OS assembly routines. Only a few MAXI09OS routines are currently exposed: putstrdefio, sysopen, sysread, syswrite and a couple of others. This linkage uses inline assembly: the C function wrapper extracts the parameters passed in (in CMOC's case via the stack) and then uses jsr to call the MAXI09OS routine via the jump table.
libgcc/: Contains the integer division routines required when GCC is the compiler being used.
examples/: Various test programs live here, including the prime number finder.

One of my current concerns lies in the overhead of calling MAXI09OS routines from C code. In a function like sysopen, the steps involved are roughly, assuming CMOC is used:
  1. The calling code pushes the 3 parameters onto the stack. These parameters are the device name pointer (a word), and the two parameter bytes (eg the UART port number and the Baud rate).
  2. The C wrapper then loads the correct registers by reading values off the stack. This is done with inline assembly, as is the next step.
  3. The inline assembly then calls through the MAXI09OS routine via an indirect jsr.
  4. The return value, if any, is stored into a variable held on the stack.
Since the inline assembly syntax is different between GCC and CMOC, libmaxi09os is pretty much implemented twice, once for each compiler.

Here is the code to the sysopen wrapper, assuming CMOC is being used:

DEVICE m_sysopen(char *name, uint8_t param2, uint8_t param1)
{
        DEVICE device;

        asm {
                pshs a,b
                ldx name
                lda param1
                ldb param2
                jsr [0xc008]
                stx device
                puls a,b
        }

        return device;
}

I don't believe anything can really be done about this overhead, short of using inline assembly at the calling point. But if that was done, the whole program might as well be written in assembly.

All told, I'm pretty happy about being able to write user code in C. I'm not sure where I'll use it though, yet.

So far I've written (or converted) a bunch of programs and have run them up on the board:
  • io.bin: A simple a. print a prompt, b. get a string, c. output the string demo.
  • gettime.bin: A rewrite of the SPI RTC time-display program.
  • prime.bin: Asks for a number, and then prints all the prime numbers between 2 and that number.
  • easter.bin: Prints the dates for Easter for the years between 1980 and 2020.
As usual, I have a bunch of ideas for what to work on next, including:
  • Tidy up keyboard support: need to add key debouncing, utilities for setting the key repeat parameters, and fix some other little issues.
  • Get my Citizen 120D+ 9 pin dot-matrix printer working. This needs a driver for the VIA parallel port, and some utilities to print files. I really want to do this!
  • Write a driver or other routines for the OPL2 sound IC. Not really sure of the best way to approach this though!
  • See about porting a more complex C program to MAXI09OS. A text adventure game might be a nice target.
Lots of ideas, and only so much time...

Sunday, 21 January 2018

Another MAXI09 is born and some OS progress

There's not been as much progress over the last few, actually six, months as I would have liked. The good news is that I now have a whole load more time available to work on my projects, including MAXI09-related things. So expect many more blog posts in the coming months!

There has been some progress on the OS which I will talk about later, but first up some news about the creation of another MAXI09 board.

After posting about the MAXI09 project on the retrobrewcomputing forum I initially had very little interest. Which is fair enough; MAXI09 is not for everyone. But a couple of months ago I received a very interesting response from someone who, like me, is a huge fan of the 6809. After sending a few emails back and forth he expressed an interest in building his own MAXI09 and, sure enough, he is now in possession of a running board. There are still a few things to get working, especially around the screen and keyboard, but his board is very nearly complete and he is able to interact with the system via its UART ports. Here's a picture of his MAXI09:

He goes by the handle ComputerDoc and you can read more about him here. He's very active in the retro computing scene and has built more than a few retro machines over the years.

Most of ComputerDoc's troubles with getting the board running were down to him using Windows for his development machine. I haven't ever tried using asxxxx under Windows, and while there's a build which works great, working with MAXI09 also involves being able to run the "flasher" program which, whilst it builds fine under CygWin, doesn't seem to run properly. So for now 'Doc is using a Linux box with his board. It would be great to get the system useable from a Windows environment, so at some point I will tackle that extra little project.

Work on MAXI09OS is continuing slowly.

I have been working on introducing a "clean" interface between user code (that is, utilities and such like) and the main system code. I did briefly consider using Software Interrupts, which is the most elegant and recommended solution, but the overhead of stacking and unstacking all registers on every syscall does not make this a high performance solution.

Instead I am using a simple vector table approach. Unlike in my previous "mini OS" made out of my old machine code monitor, the vector table is generated using a script. This script parses the symbol map which aslink produces and generates an array of symbol references. Thus each syscall is presented at a constant offset into the vector table (referred to as the subtable in the code) and a rebuild of the EEPROM which moves the actual routine address around will not break callers of the code - user code - which are calling through the vector instead of directly. This table resides at the very start of ROM, 0xc000 using the current memory map. Currently system variables, like the current task pointer, are also exported to user code but without the indirection of the vector table. This is a reasonable approach since the table of variables has changed very little during MAXI09OS's development. I might yet change this around and either hide some of these variables behind a syscall subroutine or vector them so they can be moved around if desired. Actually a third option exists: see if it's possible to live without exporting them, since most of them are fairly core to the system and do not hold things useful to "end user" code.
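The idea can be sketched in C as a table of function pointers indexed by a fixed slot number. This is a host-side model only: the two demo routines are invented, and the two-byte entry size is an assumption based on 6809 addresses being 16 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBTABLE_BASE 0xC000u   /* start of ROM in the current memory map */
#define VECTOR_SIZE   2u        /* assumption: one 16 bit address per entry */

/* Address of the vector for a given slot in the subtable. */
uint16_t subtable_vector(unsigned slot)
{
    return (uint16_t)(SUBTABLE_BASE + slot * VECTOR_SIZE);
}

/* Host-side model: callers know only the slot number, never the address of
 * the routine itself, so routines can move between builds. */
typedef int (*sub_fn)(int);

static int demo_double(int x) { return x * 2; }
static int demo_negate(int x) { return -x; }

static const sub_fn demo_subtable[] = { demo_double, demo_negate };

int call_sub(unsigned slot, int arg)
{
    assert(slot < sizeof demo_subtable / sizeof demo_subtable[0]);
    return demo_subtable[slot](arg);
}
```

Under the two-byte assumption, a call such as jsr [0xc008] would be going through slot 4 of the table.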

Because some globals need to be global across the EEPROM image (ie. the MAXI09OS) but not exported to user code, a convention has been introduced: if the subroutine starts with an underscore it will not be added to the subtable at build time. This is used, for example, by startup code to locate the address of the idler task code, which needs to be a global because it resides in a different file to the startup code. The nice thing is that the generation of these tables is completely automatic; they are produced by small perl scripts which are invoked as part of the build process by make.
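The idea behind the table generation can be sketched roughly as follows. This is an illustration in Python rather than the perl actually used, and the input format (one "name hexaddress" pair per line) and the symbol names in the usage example are assumptions; the real aslink map output is more elaborate.

```python
def build_subtable(map_text):
    """Return the exported (name, address) pairs, in input order.

    Assumed input: one "name hexaddress" pair per line (hypothetical format).
    """
    entries = []
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrecognised lines
        name, address = parts[0], int(parts[1], 16)
        if name.startswith("_"):
            continue  # underscore prefix: global within the ROM, not exported
        entries.append((name, address))
    return entries


def emit_subtable(entries):
    """Emit one 16-bit reference per syscall; slot N sits at byte offset 2*N."""
    lines = []
    for slot, (name, _address) in enumerate(entries):
        lines.append(f"\t.word {name}\t; subtable offset {2 * slot}")
    return "\n".join(lines)
```

For example, `build_subtable("sysopen c100\n_idler c200\nsysread c180\n")` keeps `sysopen` and `sysread` and drops `_idler`. Note that in this sketch the slot order simply follows the input order, so keeping offsets constant across rebuilds depends on that order being stable.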

To facilitate the vector table being at the start of ROM I had to make an improvement to the 256 byte boot loader I've previously written about. Previously the loader always jumped to 0xc000, whether it had just reprogrammed the EEPROM or not (the normal boot case). This was a simple approach but it meant it was not possible to chain-boot ROM images which had a regular reset vector stored in ROM at 0xfffe and 0xffff. The loader will now chain the real ROM by jumping through this vector. It achieves this by always copying itself into RAM when it starts. This is necessary because the loader program also resides at the top of the address space (it has to since the 6809 needs to boot it at start-up). Previously this copy to RAM, and running from there, was only done when rewriting the EEPROM. It's now performed on each start up since the loader always has to be able to access all of the real ROM including the top page where the reset vector is found. This approach for chaining up the OS feels nicer and is also more flexible, at the expense of a few milliseconds start up time.
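As a small illustration of what "jumping through the vector" means, here is a Python model of reading the 6809 reset vector from a 64KB memory image. The function name and the example address are made up for the sketch; only the vector location and byte order come from the 6809 itself.

```python
def reset_target(memory):
    """Return the address stored in the reset vector at 0xfffe/0xffff.

    memory is any indexable 64KB image; the 6809 stores the high byte first.
    """
    return (memory[0xFFFE] << 8) | memory[0xFFFF]
```

A ROM image whose last two bytes are 0xc0 and 0x20 would therefore be entered at 0xc020, rather than at a fixed 0xc000.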

The system routine table has been proven out by implementing some test "user programs", which are executed by the shell. At present the current directory of the shell task is searched, but this is a temporary hack: in the future external commands will reside in a specifically named directory. This code path is only entered if the command name entered is not one of the built-ins (cd, type, list, etc).

The rough sequence of steps to run an external command is as follows:
  1. Try to open a file in the current directory with the name obtained from the shell input stream, eg "test.bin". If this fails, bail with "file not found".
  2. If the file opened is not actually a plain file (maybe it's a directory) then bail with "not a file".
  3. Read the file into RAM (readfile sub in drivers/fs/minix.asm):
    1. Obtain the open file's length.
    2. Allocate that many bytes.
    3. Read from the file handle, that many bytes, into the new memory block, using the device agnostic getchars in lib/io.asm.
  4. Jump to the start of the allocated memory block as a subroutine.
  5. On return, free the memory allocated by readfile.
On return from the subroutine ie. the user program, the shell could do something with the register state, like treat the contents of the a register as an error value, or something similar. Currently it just loops back to the top of the shell input loop. Also, on entry to the subroutine the registers are not set up in any particular way; it would be nice if the x register contained the calling task's IO channel, and y contained the rest (sans command name) of the shell's command buffer. That way the command could process the rest of the command-line and extract arguments. However, none of the (three) external commands currently take arguments so that is not yet needed.
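The sequence of steps above can be modelled in a few lines of Python. This is only a sketch of the control flow, not the shell's actual code: the directory is a plain dictionary, `call` stands in for jumping to the loaded code, and all the names are hypothetical.

```python
class ShellError(Exception):
    """Raised with the message the shell would print."""


def run_external(name, directory, call):
    """Run an external command.

    directory maps names to ("file", bytes) or ("dir", None);
    call(block) stands in for jumping to the block as a subroutine.
    """
    entry = directory.get(name)
    if entry is None:
        raise ShellError("file not found")
    kind, data = entry
    if kind != "file":
        raise ShellError("not a file")
    block = bytearray(len(data))  # allocate exactly the file's length
    block[:] = data               # read the whole file into the new block
    try:
        return call(block)        # run the user program
    finally:
        block = None              # stands in for freeing the allocation
```

Whatever `call` returns is handed back to the caller here, which is where a real shell could inspect a register such as a for an error value.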

The following screenshot shows the shell being used to run the three external commands:
This screenshot also shows the operation of a new driver for the SPI "controller" within DISCo. It is still bit-banged, with DISCo acting as GPIO pins to the various SPI peripheral ICs. One small improvement I've implemented in DISCo's VHDL is that the SPI select pins are now configured via an address register, instead of individual pins being controlled by register bits written by 6809 code. This means the details of how a peripheral IC is selected is abstracted away in the VHDL SPI address decode logic. The address arrangement is as follows:

SPICLOCK        .equ 2
SPIEEPROM       .equ 3

All other values mean no peripheral IC is selected, with 0xff being the nominal address to use for this state. The peripheral IC address is set in the a register when the SPI device is opened via sysopen. Unlike other drivers with units (like the Quad UART) it is only possible to open a single instance of the SPI driver at a time. This is because the SPI data effectively travels on a shared bus. While SPI peripheral ICs might remember their state when deselected (allowing flipping between devices during multi-byte reads and writes, which could happen if two tasks open different SPI peripheral ICs) I'm not sure if this is the case or not and, for now at least, this restriction simplifies things. The SPI device works well, though regular readers will know that I fully intended to implement an "intelligent" SPI host controller within DISCo at some point.

I have also been working on the debug monitor.

Previously the monitor was a task that sat idle until wanted, at which point it disabled task switching and entered its interactive mode until it was exited. This worked well enough, but it was tied to a particular IO device (ie. a UART port or a virtual console). This made it inflexible.

Now the monitor is entered, with task switching disabled, by sending the break signal if using a UART port, or by hitting a magic key combination if using a virtual console. This is much more useful as the monitor can be entered regardless of the IO device used.

The implementation of this mechanism is kind of interesting. The OS subroutine sysread has been extended to report error values. Obviously not all devices will return these error states. Errors are indicated by returning Not Zero (ie. Zero means no error). The a register will contain one from a list of possible errors:

IO_ERR_OK       .equ 0                  ; io operation completed ok
IO_ERR_WAIT     .equ 1                  ; task should wait, no data yet
IO_ERR_EOF      .equ 2                  ; end of file reached
IO_ERR_BREAK    .equ 0xff               ; got a break signal

(Prior to this change, sysread returned Not Zero from the UART driver if no data was available indicating the caller should wait. This has now been extended with these new error states.)

Inside the UART driver, the break condition is detected by examining the Line Status Register. If a break is indicated, then the sysread action returns with the appropriate error. Similarly inside the console driver, IO_ERR_BREAK is set when the keyboard translation routine detects that the break combination has been pressed. Break is represented by a virtual ASCII sequence which is on the Help key when shifted. Thus pressing this key combination results in ASC_BREAK being entered into the buffer, which is then turned into the IO_ERR_BREAK condition within the console driver's sysread subroutine.

This system is not yet flawless. Phantom characters appear to be entered into the buffer when the break sequence is sent over the UART from the terminal program (minicom). Thus the input buffer needs to be cleared with a dummy Return key press before entering a monitor command. It's not clear if this is a coding issue, or if the break sequence itself is generating an additional real character, which is swallowed into the buffer. It's not a big problem, just slightly annoying. Here's a screenshot of the monitor being entered, two commands being run, and then exited:
The "BREAK!!!" message is printed as a reply to the break signal, inside the generic IO error handler in lib/io.asm. This routine deals with generating wait calls if an IO_ERR_WAIT is indicated, as well as dealing with break. This error handler in turn is called by the getchar routine, which is the generic wrapper for sysread. getchar is in turn used by all IO routines in the IO library, including getstr, which is the main routine for getting an input string from the IO device. It's thus possible to bypass the break detection, and implement it directly (or ignore it), by calling sysread directly and dealing with errors. Most user code will not use sysread directly however.
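The layering of getchar over sysread can be modelled like this. It is a Python sketch of the behaviour described, not a transcription of lib/io.asm: sysread is assumed to return an (error, byte) pair, the wait and break handlers are passed in as callables, and what happens after a break or an EOF is my guess for the purposes of the example.

```python
IO_ERR_OK = 0
IO_ERR_WAIT = 1
IO_ERR_EOF = 2
IO_ERR_BREAK = 0xFF


def getchar(sysread, wait, on_break):
    """Keep calling sysread until it yields a byte.

    sysread() is assumed to return (error, byte).
    """
    while True:
        error, byte = sysread()
        if error == IO_ERR_OK:
            return byte
        if error == IO_ERR_WAIT:
            wait()       # no data yet: put the task to sleep, then retry
        elif error == IO_ERR_BREAK:
            on_break()   # e.g. print "BREAK!!!" and enter the monitor
        else:
            return None  # EOF: assumed here to be handed back to the caller
```

Code that wants its own break handling would skip this wrapper and inspect the error value from sysread itself.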

I have also added a debugmemory command to help find problems with the memory allocator. This spits out the list of memory blocks in the system; the addresses of each block as well as the next pointer, length and free flag for each block. This was done for a very practical reason: at one point in the implementation of external commands, my memory allocator had a corruption problem.

As is the case with most bugs, the issue is obvious in hindsight.

As described above, the external command run routine reads the entire file (of raw machine code) into a newly allocated memory block, sized exactly as big as the file. The problem was the file did not include the buffers used for input data. For example, gettime.asm requires a buffer to hold the SPI bytes received from the DS1305 (PDF) Real Time Clock. But this data is not included in the resultant gettime.bin file, since the memory buffer used to hold the SPI data is only "reserved" using the .rmb assembly pseudo instruction, which only creates labels unless there is trailing data, in which case real bytes are used up inside the compiled machine-code file. The solution, albeit temporary, is to add a literal 0 byte value to the end of the gettime.bin file, via the .byte assembly pseudo instruction. This causes the file to contain space for the SPI data buffer.

The proper solution to this problem is to introduce a header to the .bin external command files. This header would contain things like what additional space is needed to hold these unset buffers. This is the purpose of the .bss segment found in proper executable and object files. This is data which has no initial value (usually zero'd out), but nonetheless needs to exist within a running process's address space.
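To make the header idea concrete, here is one way it might look, sketched in Python. The format is entirely hypothetical (no header exists yet): a single big-endian 16-bit field giving the number of extra, uninitialised bytes to allocate after the code.

```python
import struct

# Hypothetical header: one big-endian 16-bit "bss size" field.
HEADER = struct.Struct(">H")


def load_command(image):
    """Return a memory block holding the code followed by zeroed bss space."""
    (bss_size,) = HEADER.unpack_from(image, 0)
    code = image[HEADER.size:]
    block = bytearray(len(code) + bss_size)  # bytearray starts zero-filled
    block[:len(code)] = code
    return block
```

With this arrangement the allocation is the file's code length plus the declared bss size, so buffers reserved with .rmb at the end of the program would have real memory behind them without padding the file.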

It is not immediately clear what I will be working on next. In any case, real life has gotten in the way; my first priority at the moment is moving house. Once that is done I can turn my attention back to MAXI09. For a change of pace, I am thinking of working on a joystick. I want to build a nice digital joystick out of an arcade stick and some buttons I bought from eBay years ago. As a side project I want to make this joystick work with modern Macs/PCs/Linux using a USB adapter. I also want to see if I can get the 9 pin dot-matrix printer working. I bought that more than 2 years ago and it has so far sat gathering dust.

After that I want to work on a game. It will run within the OS, so should be a nice demonstration of the multitasking capabilities of the system. It is a lofty goal: after proving the ideas by porting my previously written Snake game to MAXI09OS, I am thinking about tackling a much more sophisticated arcade game...

Saturday, 22 April 2017

Code reorganisation, debug monitor, IDE, MinixFS

I have restructured the source code tree for MAXI09OS. Source files are now grouped in directories, with a directory for drivers, another one for the core of the OS, etc. The assembly source files are no longer stuck in a single directory. The makefiles are, by necessity, a little more complex as a result. After toying with using make in a recursive (PDF) fashion, I've instead settled on using include files which enumerate the build files.

One other improvement is that the source for the 256 byte boot loader, described previously, has been folded into the main MAXI09OS repo, along with the flasher tool for transferring new ROM images to the MAXI09 board.

All in all, it's a neater setup even if it is a little over engineered. I learned a bit more about make writing it as well. There are a few further improvements I could make if I wanted to, like a better header dependency system than simply having a full rebuild whenever one of the headers is changed, but since a full rebuild takes only seconds it's probably pointless.

Back in the OS code itself, I have been improving my monitor by folding it into the OS as a kind of debug task. Commands are now words instead of a single letter. So "dump" instead of "d". Much nicer. The "command table" is handled through a generic subroutine, so other environments, like the Shell, can use the same mechanism.

While in use, the monitor task is the only one which will be scheduled. It is entered by hitting return on whatever IO device it is running on, which is usually a UART port. Interrupts are still enabled, but a flag variable is incremented that causes the scheduler, when it runs under the main ticker interrupt, to always return causing the same task to run each time. In this state the monitor will see a mostly static system. This is the same mechanism (forbid()/permit()) used when a task needs to be the only task running in the system.
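The forbid()/permit() mechanism is easy to model. The following Python sketch assumes a simple round-robin scheduler, which is a simplification for illustration; only the counter that makes the scheduler keep returning the same task is taken from the description above.

```python
class Scheduler:
    """Toy round-robin scheduler with a forbid/permit nesting counter."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.current = 0
        self.forbid_count = 0

    def forbid(self):
        self.forbid_count += 1  # task switching is off while non-zero

    def permit(self):
        self.forbid_count -= 1

    def tick(self):
        """Called from the ticker interrupt; returns the task to run next."""
        if self.forbid_count == 0:
            self.current = (self.current + 1) % len(self.tasks)
        return self.tasks[self.current]
```

Using a counter rather than a boolean means forbid() and permit() calls can nest. Interrupts keep firing throughout; the ticker simply reschedules the same task.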

I've added a command that shows memory stats, and the tasks in the system, or just the information about a specific task as this screenshot shows:
The "Largest" value from the memory command warrants some elaboration. This is the size of the largest free memory block. Since the simple linked list approach to the memory management lacks a mechanism to coalesce free blocks, it is very prone to memory fragmentation, so the Largest value might well be considerably less than the total free memory. Even after coalescing adjacent free blocks, free memory could still be fragmented.
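A small Python model shows the difference between total free memory and the Largest value, and what coalescing would and would not fix. Blocks are represented as (length, is_free) pairs in address order, which is a simplification: the real block headers also carry a next pointer and take up space themselves.

```python
def memory_stats(blocks):
    """Return (total_free, largest_free) for (length, is_free) pairs."""
    free = [length for length, is_free in blocks if is_free]
    return sum(free), max(free, default=0)


def coalesce(blocks):
    """Merge runs of adjacent free blocks into single blocks."""
    merged = []
    for length, is_free in blocks:
        if is_free and merged and merged[-1][1]:
            merged[-1] = (merged[-1][0] + length, True)
        else:
            merged.append((length, is_free))
    return merged
```

With free blocks of 100 and 50 bytes side by side, a 30 byte allocation, and then another 80 bytes free, the total free is 230 but the largest block is only 100. Coalescing raises the largest to 150, which is still short of the total because the allocated block sits in the middle.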

I've also been working on making the monitor useful for exercising device drivers directly, without the need to write a specific test program.

With the sysopen command, it is possible to set the A register (which is usually the unit number) as well as the B register, which is used to set the baud rate in the UART driver but is otherwise not specifically assigned to a particular purpose.

The main motivation for doing this was to make it easier to write a driver for the IDE interface.

The IDE driver is for sector level access to the attached disk; the filesystem layer, described later, sits on top of it.

The same driver mechanism, and subroutines (sysopen, sysread, etc) are used for the IDE driver, except that in the case of sysread additional registers are used since the read is a sector read and not a byte read.

The following registers are used, in both sysread and syswrite:
  • X: the device handle, as usual
  • Y: the memory address to write to (sysread) or read from (syswrite)
  • U: the sector number (LBA) to start from
  • A: the count of sectors to read or write
Currently no interrupts are used so the IO operations busy-wait until the device is ready to send (or receive) the data. There are obstacles in MAXI09OS to using interrupts here, which I'll write about later. In reality this would only really matter if MAXI09 was ever attached to a very slow, old, hard disk. Whilst a CompactFlash is used the interrupt would fire, most likely, only a few iterations into the polling loop. Such is the speed of the MPU in MAXI09 relative to what a CF would more usually be attached to. All that said, getting interrupts working with the IDE interface would be nice.

I'm also using syscontrol for a few things (more will come later):
  • IDECTRL_IDENTIFY - 0: perform an identify command on the drive and fill out the 512 byte sector of information referenced by the Y register
  • IDECTRL_READ_MBR - 1: read sector 0 into device handle memory and copy partition information into a system table
The partition table, which is a section of 4 lots of 16 bytes within the MBR sector, contains start and length sector information about each partition, as well as other non pertinent data. The syscontrol action reads this in and uses it as sector offsets when doing IO on a partition. Currently no info about the partition table is returned with the syscontrol call. I will probably change this at some point so the user could view the table etc.
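For reference, the partition table parsing looks like this when sketched in Python. The layout is the standard MBR one: four 16 byte entries starting at byte 446, with the start sector and sector count stored as little-endian 32-bit values at offsets 8 and 12 of each entry. The function names and the unit numbering helper are mine, written to match the description of units given in this post.

```python
import struct

MBR_TABLE_OFFSET = 446  # standard location of the partition table
ENTRY_SIZE = 16


def parse_partition_table(mbr):
    """Return [(start_lba, sector_count), ...] for the four MBR entries."""
    if len(mbr) != 512:
        raise ValueError("an MBR sector is 512 bytes")
    table = []
    for i in range(4):
        base = MBR_TABLE_OFFSET + ENTRY_SIZE * i
        start, count = struct.unpack_from("<II", mbr, base + 8)
        table.append((start, count))
    return table


def unit_offset(table, unit):
    """Unit 0 is the whole disk; unit N is partition N."""
    return 0 if unit == 0 else table[unit - 1][0]
```

On the 6809, which is big-endian, each of these fields needs its bytes swapped after being read from the sector.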

Inside the sysopen call partitions map nicely to units, with unit 0 being used to access the entire disk. The partition table information is used to calculate an offset for each partition / unit. At present, the length of each partition is not enforced and accesses for the first partition could overlap into the second, etc. This would be trivial to honour if I wanted to write some extra code.

This screenshot shows the monitor being used to open the IDE device and issue the IDENTIFY command. The first 128 bytes of the resultant sector are dumped out, showing the vendor and model:
Here's a look at the monitor being used to:
  1. Open the entire disk
  2. Read the MBR
  3. Close the entire disk
  4. Open partition 1
  5. Read in the Minix superblock (2 sectors) into memory location 0x4000
  6. Dump out the first 128 bytes of the superblock
(The way I worked out that this was a MinixFS superblock was to spot the 0x8f13 sequence at offset 0x10. This is the magic number sequence for a Minix filesystem with 30 character file names, byte swapped since the format on disk is little-endian.)
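The start of the superblock can be decoded as follows. This Python sketch uses the Minix version 1 on-disk field order as I understand it; the dictionary keys are shortened versions of the usual field names, and only the fields up to the magic number are unpacked.

```python
import struct

MINIX_MAGIC_30 = 0x138F  # Minix v1 with 30 character file names

# Six 16-bit fields, one 32-bit field, then the 16-bit magic at offset 0x10.
SUPERBLOCK_HEAD = struct.Struct("<HHHHHHIH")
FIELD_NAMES = (
    "ninodes", "nzones", "imap_blocks", "zmap_blocks",
    "firstdatazone", "log_zone_size", "max_size", "magic",
)


def parse_superblock(raw):
    """Decode the leading little-endian superblock fields into a dict."""
    return dict(zip(FIELD_NAMES, SUPERBLOCK_HEAD.unpack_from(raw, 0)))
```

Read as little-endian the two bytes 0x8f 0x13 give 0x138f, which is why a byte-by-byte dump shows them the "wrong" way round.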

After implementing the low level IDE layer (and MBR reading) the next task was to write routines for dealing with the filesystem layer.

For anyone interested in the guts of this ancient filesystem I suggest you read this overview, as well as this look at some of the data structures. Needless to say, MinixFS version 1 and 2 are about the simplest form of UNIX filesystem, all the more interesting (for me and this project) because the important structure fields are 16 bits wide.

The functionality nicely splits in two modules:


This wraps a mounted Minix filesystem handle.

It contains code to parse the superblock structure (including lots of little to big endian swaps), and a routine to read an arbitrary 1KB FS block, which calls the underlying device (ie. the IDE layer) to do the reading at the calculated physical offset. This routine uses a block offset calculated when the filesystem is mounted (of course, the underlying IDE layer will apply its own partition-based offset). The filesystem-layer offset calculation uses fields from the superblock, including those which indicate the number of blocks used for the inode and data zone bitmaps.

There's also a routine to read in an arbitrary inode. This uses a simple cache; the last disk block of inodes read in. If the requested inode has already been read in then there won't be any disk access. 
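The inode lookup and its one-block cache can be sketched like this in Python. It assumes the Minix version 1 layout (32 byte inodes, numbered from 1, with the inode table following the boot block, superblock and the two bitmaps); the class and its `disk_reads` counter are hypothetical, the latter being there only to make the cache's effect visible.

```python
BLOCK_SIZE = 1024
INODE_SIZE = 32  # Minix v1 on-disk inode size
INODES_PER_BLOCK = BLOCK_SIZE // INODE_SIZE


class InodeReader:
    """Reads raw inodes, remembering the last inode block fetched."""

    def __init__(self, read_block, imap_blocks, zmap_blocks):
        self.read_block = read_block  # callable: block number -> 1KB of bytes
        # Block 0 is the boot block, block 1 the superblock, then the bitmaps.
        self.first_inode_block = 2 + imap_blocks + zmap_blocks
        self.cached_block_no = None
        self.cached_data = None
        self.disk_reads = 0

    def read_inode(self, number):
        index = number - 1  # inode numbers start at 1
        block_no = self.first_inode_block + index // INODES_PER_BLOCK
        if block_no != self.cached_block_no:
            self.cached_data = self.read_block(block_no)
            self.cached_block_no = block_no
            self.disk_reads += 1
        offset = (index % INODES_PER_BLOCK) * INODE_SIZE
        return self.cached_data[offset:offset + INODE_SIZE]
```

Reading inodes 1 to 32 in any order costs a single disk read here; inode 33 lives in the next block and replaces the cached one.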

An interesting aspect of this module is that it is possible to have multiple mounted file systems in the system at the same time. However to keep things simple the init task is responsible for mounting a system-wide root filesystem.

Also since an open "low level" device handle is the thing which is mounted, in theory the code written supports the adding of other block devices sitting under the filesystem code, say a SCSI disk attached to a controller.

Those good things said, I have not attempted to implement a VFS-type layer in MAXI09OS. Only Minix FS volumes can be mounted and the subroutine is called "mountminix". I did toy with writing more abstractions that would allow, hypothetically, the writing of a FAT16 layer without changing any user level code but in the end I concluded it would add yet more complexity and time and not really teach me anything useful.


This wraps an open "file", and presents the same entry points for an open file as the other drivers in the system, with the addition of a new sysseek call to move the file pointer around the file. It also contains other routines, for things like stat-ing a file by its filename.

The basic "thing which is opened" is a file referenced by its inode number. sysopen is called with X pointing to "minix", Y pointed at a mounted filesystem superblock handle obtained with mountminix, and U with the inode number. As usual the device/file handle is returned in X.

These handles are used for all types of "files". Opening one principally consists of locating and reading in its inode. This inode includes the "data zones" (block numbers) for the entry's data. For directories this data consists of an array of 32 byte directory entries AKA dirents. The structure of a dirent is trivial:
  1. 2 byte inode number
  2. 30 byte (max) null-padded filename
Thus to open a file by its filename the first thing to do is to open the directory the file is in. We have to start somewhere, and what we start with is the root directory inode, which in Minix is always inode 1. (Inode 0 is reserved.)

The dirent array is iterated through, possibly by reading in additional data zones (filesystem blocks) if the directory contains more entries than would fit in a single filesystem block.

If a match of filename is found, the inode number for the file becomes known and its inode structure can be read from disk. The content - data zones - of the file can then be read in using the data zone pointers.
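The filename search over a block of dirents can be sketched as follows, in Python. The 32 byte layout is the one given above; treating an inode number of 0 as an unused slot is my understanding of the Minix format rather than something taken from the MAXI09OS source, and the function name is hypothetical.

```python
import struct

DIRENT = struct.Struct("<H30s")  # 2 byte inode number, 30 byte padded name


def find_in_directory(data, filename):
    """Return the inode number for filename, or None if it is not present."""
    target = filename.encode("ascii")
    for offset in range(0, len(data) - DIRENT.size + 1, DIRENT.size):
        inode, raw_name = DIRENT.unpack_from(data, offset)
        if inode == 0:
            continue  # assumed: inode 0 marks an unused directory slot
        if raw_name.split(b"\0", 1)[0] == target:
            return inode
    return None
```

The returned inode number is then what gets handed to the inode reading routine to find the file's data zones.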

The following screenshot shows the monitor being used to:
  1. Open the IDE disk.
  2. Read the partition table.
  3. Close the IDE disk.
  4. Open partition 1.
  5. Mount the MinixFS.
  6. Open inode 1, the root directory.
  7. Read in the first few dirents and dump them out.
As well as simple linear reading through a file it is also possible to seek to an arbitrary byte position, and continue reading from there.

One of the massive tasks I've not yet even started is writing to the filesystem. This is a pretty big job and opens up all sorts of interesting scenarios that need dealing with. For instance, how to deal with two tasks, one which is writing to a file, whilst another has it open for reading? Things get very challenging, very fast.

The last thing I have been working on is the command-line Shell. Currently it is very, very basic. You can perform the following operations:
  1. Change directory to a named directory (cd)
  2. List the current directory (list)
  3. List in long form the current directory (list -l)
  4. Output files by name (type)
These are internal commands, effectively subroutines in the ROM. But external commands should be perfectly possible too. Since I can now read from a filesystem, if the user enters the filename for a command that filename will be opened, read into RAM, and jumped to as a subroutine of the Shell process.

The list functionality warrants a little discussion. In a simple listing, it is only necessary to iterate the dirents in a directory. The user cannot even tell whether the entries being listed are files or directories, since those attributes (the file "mode") are stored in the inode and not with the filenames. List in long form opens each file's inode and displays the type of the "file" (regular, directory, pipe, device node, etc), along with its file size, user and group etc. Thus list long, on MAXI09 as it is on a real UNIX/Linux, is more expensive than simply listing the names of the files.

I've yet to write a routine to output decimal numbers, so all the values for things like file size are still in hex.

The following screenshot shows the Shell being used to wander around the Minix filesystem, listing directories etc:
The Shell is coming along quite nicely. Of course, what's not shown here is that it is possible to have multiple Shells running at the same time, on the virtual consoles and on the UART ports. There's a few limitations not shown in the screenshot, like the fact that you can only change directory into a directory in the current directory, not to an arbitrary path.

So MAXI09OS is coming along quite nicely, though you still can't really "do" anything useful with it yet.

I think I'll take a break from working on the software side for a while now and switch to working on the hardware:
  • There's the SPI controller VHDL code in DISCo. I can then write a driver for it, and finally some utilities for setting and getting the date and time from the Real Time Clock.
  • The Citizen 120D+ printer I bought about a year ago has yet to even be powered on. I could have a go at writing a driver so I can print files to it, much like how they can be output to the console. This might also prove that I need to implement command redirection.
  • I could have a go at finishing the building of an analogue joystick I started a year or so ago, and experimenting with that
  • The keyboard controller can generate a reset signal on a particular key combination, but I've not even experimented with getting that working yet
I've had MAXI09 up and running for more than a year now, and it continues to keep on giving with no end in sight...