Let there be sprites! (The blurry bits:)
As it stood, the Z80 project's graphics system featured a 4 bitplane,
double-buffered 256x192 pixel display with hardware scroll in both
directions, scanline synchronized interrupts and a palette of 196 colours.
One feature I still wanted to add was sprite capabilities - moving
objects around the screen was taxing the CPU something chronic.
A couple of solutions came to mind: 1) Create an Amiga-like blitter unit
to shift display data a lot faster than the CPU (blitter accelerated software
sprites), or 2) Make a hardware sprite unit. As hardware sprites are overlaid "on top of" the display
data, they don't need to be manually drawn or erased from video RAM,
and additionally they could easily have their own colour palette. Option 2 promised to be
more interesting, more beneficial and less hassle to implement in hardware
and utilize in software once completed.
The next question then arose: How best to do it. I figured there were 3 choices:
- Use off-the-shelf simple logic chips for the whole thing. It seemed that
in order to have x number of sprites appear anywhere on the same scan line I'd
need x copies of the same circuit working in parallel. I wanted eight sprites
so it probably would have taken another tower of at least eight PCBs! Erm.. no :)
- Make a custom sprite chip. Unfortunately I wasn't familiar with Field
Programmable Logic Array or PLD chips at the time so.. err.. Next!
- Base it around a fast microcontroller. I'd recently acquired some Ubicom
SX28 chips and after a bit of research I concluded I could just about do
what I wanted with one of these (plus a few external logic ICs). Needless
to say, this was the way to go..
The SX28 sprite system firmware:
I decided the best (only?) approach to achieve my goal was for
the SX28 to prepare a complete scanline's worth (ie: a 256 pixel buffer)
of sprite data in advance of each raster line and clock that out in
sync with the pixels of the main display. Prior to each active scan line, the SX28 would clear an
internal 256 pixel buffer then read the x and y coordinates of each sprite,
see if they're on the current scanline and if so draw the correct slice
at the sprite's x location in the buffer.
There would need to be some simple masking when making up the buffer to handle transparent
sprite pixels. Also, the sprites would need to be clipped at the edges of the screen
in order to avoid ugly sudden appearances when a sprite comes into view.
Y-clipping would be almost automatic as the code would be selecting the correct
slice of each sprite anyway. Right side clipping would be easy - just stop
drawing if you reach the last byte of the scanline buffer. The left edge would
require an offset into the sprite data and a reduction of the number of pixels
to plot.
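The per-line buffer build described above can be sketched roughly as follows. This is an illustrative Python model, not the SX28 firmware: it keeps one list entry per pixel rather than packed nybbles, and the function name, the tuple layout, the use of negative x for sprites hanging off the left edge, and the "later sprites draw over earlier ones" ordering are all assumptions made for the example.

```python
SCREEN_WIDTH = 256
SPRITE_SIZE = 16  # 16x16 sprites, colour 0 = transparent


def build_scanline(sprites, scanline):
    """Return a 256-entry buffer of sprite colours for one raster line.

    `sprites` is a list of (x, y, image) tuples (hypothetical layout):
    x may be negative for a sprite partly off the left edge, and image
    is 16 rows of 16 colour values in the range 0..15.
    """
    buffer = [0] * SCREEN_WIDTH                      # cleared before each line
    for x, y, image in sprites:
        row = scanline - y
        if not 0 <= row < SPRITE_SIZE:               # Y clip: not on this line
            continue
        first = max(0, -x)                           # left clip: offset into the slice
        last = min(SPRITE_SIZE, SCREEN_WIDTH - x)    # right clip: stop at buffer end
        for i in range(first, last):
            colour = image[row][i]
            if colour != 0:                          # masking: 0 keeps what's underneath
                buffer[x + i] = colour
    return buffer
```

A sprite at x = -4 ends up contributing only its rightmost 12 pixels, starting at buffer position 0, which is the "offset plus reduced pixel count" case for the left edge.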
The number of sprites allowable per line would be limited by the amount of
time the SX28 had to generate a buffer during the horizontal border area
and also by the width of the sprites.
Originally I was hoping for eight 32x32 pixel sprites to be available per
line but the code wasn't fast enough - I settled for eight 16x16 sprites to
ensure a glitch-free safety margin between generating the line buffer and
the time when it needs to start clocking out pixels. There was no real limit
on the *height* of the sprites - I just made them fit in a 256 byte block
for ease of access. I could always get more than eight sprites
on screen by writing an external (Z80) multiplexing routine.
The sprites were to have 16 colours (well, 15 plus transparent) and
that meant each pixel would need 4 bits. Unlike the main display,
the sprite image data was organized in "chunky" mode - all bits came
from the same byte location - not merged from 4 separate planes.
The SX28 has 136 bytes of SRAM so 256 pixels worth of data would take
128 bytes, leaving 8 bytes free for system variables. One small problem.. using a
nybble per pixel would mean having to rotate image data by 4 bits (one pixel)
when positioning sprites at odd x coordinates. The "swap" instruction would
be one way to sort this out, but with the way things turned out I "cheated"
by specifying that pre-shifted data be put in the sprite image blocks
(as well as the normal byte-aligned nybbles). Therefore the sprite block
data format is:
Bytes 00-07: 16 nybbles for sprite line 0
Bytes 08-0F: 16 nybbles for sprite line 0 shifted right 4 bits (one pixel).
....
to..
....
Bytes F0-F7: 16 nybbles for sprite line 15
Bytes F8-FF: 16 nybbles for sprite line 15 shifted right 4 bits (one pixel).
(I designated the first 256 bytes of sprite RAM to be the "control block".
Rather than hold an actual sprite image, the x/y coords and definition number
of each of the eight sprite channels are read from / written to this location.)
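To make the block layout concrete, here is a small Python sketch of how the offset of the slice to draw could be worked out from the sprite line and the x coordinate. The function and constant names are made up for the example, and it assumes (as the text implies) that even x positions use the byte-aligned copy and odd x positions use the pre-shifted copy.

```python
BYTES_PER_SLICE = 8    # 16 pixels at 4 bits each
SLICES_PER_LINE = 2    # byte-aligned copy first, then the pre-shifted copy


def slice_offset(sprite_line, x):
    """Offset within a 256-byte sprite block of the slice to draw.

    sprite_line is 0..15; an odd x selects the pre-shifted copy.
    """
    if not 0 <= sprite_line < 16:
        raise ValueError("sprite_line must be 0..15")
    shifted = x & 1
    return (sprite_line * SLICES_PER_LINE + shifted) * BYTES_PER_SLICE
```

So line 0 at an even x starts at byte 00, line 0 at an odd x starts at byte 08, and line 15 at an odd x starts at byte F8, matching the table above.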
"Issues"
There was a slight snag with using two separate microcontrollers for the
sprite and bitmap data. Although they were both controlled from the same clock
source (albeit at different speeds) I had to make sure the two SX chips were
in sync when clocking out pixels. On my first attempt I was just waiting for
the video SX's X-border signal (ie: horizontal blanking) to change before
starting the sprite buffer dump. I soon found out this was not good enough.
A busy wait loop takes 4 clock cycles and the x-border signal could
change anywhere within this 4-clock group - this resulted in visible sub-pixel
sized jitter between the sprite and normal display data. The video SX's border
signal was arriving at perfectly regular intervals, the problem was that prior to
its busy wait loop, the "sprite SX" would have been generating the scanline buffer,
a process that takes a varying amount of time. The problem was solved by getting
the sprite SX to align its busy wait on 4-cycle periods, achieved by reading the
RTCC counter, inverting the value, ANDing 3 and executing that number of NOP
instructions prior to the busy wait loop.
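The arithmetic behind that alignment trick can be checked with a few lines of Python. This is only a model of the idea, not the firmware: it assumes RTCC ticks once per instruction cycle and that each NOP costs one cycle, and the function names are invented for the example. Any fixed overhead between reading the counter and entering the loop just shifts the result by a constant.

```python
def padding_nops(rtcc):
    """Number of NOPs to run before entering the busy-wait loop."""
    return ~rtcc & 3  # invert, keep the low two bits: 3 - (rtcc mod 4)


def loop_entry_phase(rtcc):
    """Position within the 4-cycle group at which the wait loop starts.

    Assumes one RTCC tick per instruction cycle and one cycle per NOP.
    """
    return (rtcc + padding_nops(rtcc)) % 4
```

Whatever value the counter holds when it is read, `loop_entry_phase` comes out the same, so the 4-cycle wait loop always samples the border signal at the same point within its group regardless of how long the buffer generation took.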
Glitchy sprite images also plagued early versions of the code. I'd used
a 10ns SRAM for the sprite data so the microcontroller wouldn't need to be
hanging around when reading sprite data but the SX's slightly
unpredictable I/O timing was spooning things up. Fixing this was really
just a case of restructuring the code so that the I/O delays I added
wouldn't need to be wasteful NOPs, though these were unavoidable in some places.
The Sprite system hardware:
Here's another of my scribbled schematics.
Port A of the SX is assigned to four control signals: 2 inputs
to sense the x and y border signals from the main video SX and 2 outputs:
the first is an address latch line, the second is a busy signal
that prevents conflicting access to sprite RAM.
Port B is an 8 bit input connected directly to the data bus of the
sprite data SRAM chip. The Z80 databus is also connected to the
sprite SRAM chip through a 74HC245 buffer (write only access).
Port C is configured as an 8 bit output used to set the 32KB SRAM
address lines during the horizontal border period and output
the 4 bit pixel values to the palette board during the active
display window.
When port C is producing SRAM addresses, address lines A8:A14
are latched at the beginning of each sprite read (to set
the "definition block" part of the address). The lower
8 bits are then controlled freely by the SX to access
image data within that sprite block. The SRAM's address
bus is also switched between that formed by the SX (and latch) and
the Z80 CPU by four 74HC157 data selectors - This allows the Z80
to upload sprite data to sprite RAM. I mapped sprite RAM at Z80
address 0, which is the same place as the ROM in my system.
Since there should never be any writes to the ROM, and there's
no reason to read sprite RAM, there's no conflict. The only
limitation is that you can only update the sprite RAM when
the SX is not accessing it - ie: during the buffer dump period
(active display window) or any time during the vertical border.
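As a rough illustration of the addressing scheme, the Python sketch below combines a latched block number (A8:A14) with the freely controlled low byte (A0:A7) into a 15-bit sprite RAM address. The function name is hypothetical and the model ignores the latch and selector timing entirely.

```python
BLOCK_SIZE = 256      # bytes per sprite definition block (A0:A7)
BLOCK_COUNT = 128     # 7 latched address bits (A8:A14) in a 32KB SRAM


def sprite_ram_address(block, offset):
    """15-bit SRAM address from the latched block number and the low byte."""
    if not 0 <= block < BLOCK_COUNT:
        raise ValueError("block must be 0..127")
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError("offset must be 0..255")
    return (block << 8) | offset
```

With this layout block 0 is the control block at addresses $0000-$00ff, and the last byte of block 127 is $7fff, the top of the 32KB SRAM.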
During the active raster window, the four LSBs of port C output the 4-bit pixel
values to the palette board. Here, they form address lines A4-A7 of
the colour palette SRAM (they're sync'd with the display data
with a 74HC574 latch chip first). This way, the transparency
(sprite colour 0) is handled "automatically" by the
address bus of the palette RAM. EG: Bitmap colours 0 to 15
in A0-A3 will be selected when the sprite pixel colour (A4-A7)
is zero. The only "complication" is that 16 addresses must be set
for each sprite colour so the sprite appears to have complete
priority over the bitmaps.
Palette SRAM address:
$00-$0f: bitmap colours 0-15 (16 values)
$10-$1f: sprite colour 1 (make all same)
$20-$2f: sprite colour 2 (make all same)
...
$f0-$ff: sprite colour 15 (make all same)
The sprite colours *can* be set at different values (within a set of 16)
in order to give the impression that certain bitmap colours have priority
over the sprites (or to create semi-transparent effects etc)
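The palette trick is easy to model in Python. In this sketch the names are made up and the colour values are just placeholders (the actual format of a palette entry isn't covered here); it shows how the address is formed from the two 4-bit inputs and how filling each group of 16 with one value gives the sprites full priority.

```python
def palette_address(sprite_colour, bitmap_colour):
    """Palette SRAM address: sprite pixel in A4-A7, bitmap pixel in A0-A3."""
    return ((sprite_colour & 0x0F) << 4) | (bitmap_colour & 0x0F)


def build_palette(bitmap_colours, sprite_colours):
    """Build a 256-entry palette giving sprites priority over the bitmap.

    bitmap_colours: 16 placeholder values for bitmap colours 0..15.
    sprite_colours: 15 placeholder values for sprite colours 1..15.
    """
    palette = list(bitmap_colours)           # $00-$0f: sprite pixel is transparent
    for colour in sprite_colours:
        palette.extend([colour] * 16)        # same value whatever the bitmap pixel is
    return palette
```

Looking up `palette[palette_address(0, b)]` returns bitmap colour `b`, while any non-zero sprite colour returns that sprite's value no matter what the bitmap pixel is. Writing different values within one group of 16 is what gives the priority and semi-transparency effects mentioned above.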
And that's about it - my SX28 sprite source code is here
if you want a look. It's probably useless for anything outside the Z80 Project, mind.