Screen 2 output
Tuesday, 27 July 2010 20:48
One of the things that sets MSX apart from many other systems is its use of separate memory for the display processor, memory that is not mapped into the address space of the Z80. While this is very efficient for the VDP, it is not so convenient for the programmer. Compare the following two pieces of code and the CPU time each takes (all cycle counts include the M1 wait state):
; write register A to address HL T-states
ld (hl),a ; 8
; write register A to VRAM address HL
push af ; 12
ld a,l ; 5
out ($99),a ; 12
ld a,h ; 5
or 64 ; 8
out ($99),a ; 12
pop af ; 11
out ($98),a ; 12
; Total: 77
While not every VRAM write necessarily has to be done like this, it is obvious that writing to VRAM takes ages compared to simple RAM writes. This makes it clear that writing to VRAM is best done in large blocks at a time. This article describes an effective way to do your screen handling while doing away with the added complexity of manual VDP writes.
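To put a rough number on why large blocks pay off, here is a minimal Python sketch of the amortised cost per byte. It assumes the address setup costs the 42 T-states of the five setup instructions in the listing above (without the push/pop needed to preserve A), and that every data byte is sent with an outi at 18 T-states, the figure used in the listings later in this article. The name `cost_per_byte` is made up for illustration, and VDP access-speed limits outside vblank are ignored.

```python
# T-state figures from this article (MSX, including the M1 wait state).
SETUP = 5 + 12 + 5 + 8 + 12   # ld a,l / out / ld a,h / or 64 / out
OUTI = 18                     # assumed cost of sending one data byte with outi

def cost_per_byte(block_size: int) -> float:
    """Average T-states per byte when one address setup is shared by a block."""
    return (SETUP + OUTI * block_size) / block_size

# One byte, one name table line, and the whole name table.
for n in (1, 32, 768):
    print(f"{n:4d} bytes: {cost_per_byte(n):.2f} T-states per byte")
```

For a full 768-byte name table the setup overhead all but disappears, which is the case the rest of the article builds on.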
My personal preference for screen handling is to maintain a 768-byte buffer in RAM. This makes random-access reading and writing simple and fast. It only requires the buffer to be written to VRAM in its entirety every one or two interrupts. For this you could do something like the following:
ScreenBuffer: equ $C000
;
; OutputScreen
;
; Input:
; Output:
; Changes: AF, BC, HL
;
OutputScreen:
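ld a,$00 ; low byte of VRAM address $1800
out ($99),a
ld a,$18 + 64 ; high byte, with bit 6 set to select writing
out ($99),a
ld hl,ScreenBuffer
ld bc,$0098 ; B = 0 (256 bytes per otir), C = data port $98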
otir
otir
otir
ret
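The two bytes sent to port $99 at the top of the routine can be modelled in a few lines of Python. This is only a sketch of the address setup as used in this article; the function name `vram_write_setup` is hypothetical, and it assumes a 14-bit address (16 KB of VRAM).

```python
def vram_write_setup(address: int) -> tuple:
    """Return the (low, high) bytes written to port $99 to set a VRAM write address.

    Assumes a 14-bit VRAM address, as used by the routines in this article.
    """
    if not 0 <= address < 0x4000:
        raise ValueError("address must fit in 14 bits")
    low = address & 0xFF
    high = (address >> 8) | 0x40   # bit 6 set, the same as "+ 64" in the listing
    return low, high

low, high = vram_write_setup(0x1800)   # name table address used above
print(f"${low:02X} ${high:02X}")
```

The same split into low byte and high byte with 64 added is what the `ld a,l` / `ld a,h` / `or 64` sequence in the first listing does for an arbitrary address in HL.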
The workings of this routine are simple. First it sets VRAM address $1800 for writing (the name table) and then it outputs 3 times 256 bytes. However, on MSX1 there is a problem. The VDP is not quick enough to handle the speed of otir unless it is in its vertical blank period. So the question becomes whether this code can run in that period. At 60 Hz, 192 of the 262 lines in the total frame are visible. This means that the number of T-states that the VDP spends in vertical blank is 3579545 / 60 * (262-192) / 262 = 15939. Otir takes 23 T-states per byte (18 for the last byte of each run), so the entire screen takes 17649 T-states. Oops, that won't work in a single vblank period.
But before splitting the code into a fast and a slow part, optimisation should be investigated. When checking the instruction timings on MAP, you can see that otir takes 5 T-states per byte more than outi. This extra time is used to perform the internal loop. This means that using 768 outis would be 3825 T-states faster than three otirs. That is a little excessive and not quite necessary, and for this kind of routine I like to use 32 outis so they fill exactly one line. The code could now be the following:
ScreenBuffer: equ $C000
;
; OutputScreen
;
; Input:
; Output:
; Changes: AF, BC, HL
;
OutputScreen: ; T-states
ld a,$00 ; 8
out ($99),a ; 12
ld a,$18 + 64 ; 8
out ($99),a ; 12
ld hl,ScreenBuffer ; 11
ld bc,$1898 ; 11
.line:
push bc ; 24x 12
[32] outi ; 24x 32x 18
pop bc ; 23x 11
djnz .line ; 23x 14
ret ; Total: 14749
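The budget arithmetic and the T-state column of this listing can be re-checked with a short Python sketch. It assumes outi costs 18 T-states, that otir costs 5 more per looping byte, and that the final byte of each otir costs the same as an outi, which is what the 3825 T-state saving quoted in the text implies. All variable names are made up for illustration.

```python
CPU_HZ = 3579545                      # Z80 clock in an MSX
FRAME_HZ = 60
TOTAL_LINES, VISIBLE_LINES = 262, 192

# T-states available while the VDP is outside the 192 visible lines.
vblank_budget = CPU_HZ / FRAME_HZ * (TOTAL_LINES - VISIBLE_LINES) / TOTAL_LINES

OUTI = 18
OTIR_LOOPING = OUTI + 5               # assumed: 5 extra T-states per looping byte

# Three otirs of 256 bytes each; the last byte of each run does not loop.
three_otirs = 3 * (255 * OTIR_LOOPING + OUTI)

# Unrolled version: setup, 24 pushes, 24 x 32 outis, 23 pops, 23 djnz.
setup = 8 + 12 + 8 + 12 + 11 + 11
unrolled = setup + 24 * 12 + 24 * 32 * OUTI + 23 * 11 + 23 * 14

print(int(vblank_budget), three_otirs, unrolled)
```

The otir version overshoots the budget, while the version with 32 outis per line fits with more than a thousand T-states to spare.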
With the T-state count up to the last outi totalling 14749, the code is now well within the vblank time. But this is still not optimal. The push/pop combination is a bit of a waste. It is really only there because outi decrements B. A better way would be to use a different register for the line counter and change djnz to a dec/jp nz combo, which would make the loop only 2 T-states slower than a bare djnz. But since we're optimising, we may as well go for an even quicker way.
Because both outi and djnz decrement B, it is decremented 33 times before every zero check by djnz. This makes the total count 24x33 = 792 = $318. While this is technically a 16-bit counter, it is not actually necessary to use the high byte, because the zero check only occurs every 33 decrements. B will pass zero somewhere within the 32 outis and wrap around to 255 without being checked. Only after the full 24 lines will djnz find B to be zero. Which brings the optimised solution to:
ScreenBuffer: equ $C000
;
; OutputScreen
;
; Input:
; Output:
; Changes: AF, BC, HL
;
OutputScreen: ; T-states
xor a ; 5
out ($99),a ; 12
ld a,$18 + 64 ; 8
out ($99),a ; 12
ld hl,ScreenBuffer ; 11
ld bc,$1898 ; 11
.line:
[32] outi ; 24x 32x 18
djnz .line ; 23x 14
ret ; Total: 14205
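The counter trick is easy to doubt, so here is a small Python simulation of register B that counts how many lines are output before djnz falls through. It is a sketch only: it models nothing but the 8-bit wrap-around of B, and the function name is hypothetical. The last lines repeat the T-state total from the listing and the slack against the 15939 T-state vblank budget.

```python
def lines_until_djnz_exits(b_start: int = 0x18, outis_per_line: int = 32) -> int:
    """Simulate B through the .line loop and return the number of lines output."""
    b = b_start
    lines = 0
    while True:
        b = (b - outis_per_line) & 0xFF   # 32 outis decrement B with no zero check
        lines += 1
        b = (b - 1) & 0xFF                # djnz decrements B once more...
        if b == 0:                        # ...and falls through only when B is zero
            return lines

lines = lines_until_djnz_exits()

# Setup, 24 x 32 outis and 23 djnz, up to the last outi.
total = (5 + 12 + 8 + 12 + 11 + 11) + 24 * 32 * 18 + 23 * 14
slack = 15939 - total
print(lines, total, slack)
```

Starting B at $18 is what makes the 792 decrements land on zero exactly at a djnz check; a different line length or line count would need the start value recomputed.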
This leaves a comfortable 1734 T-states to spare. Unfortunately this is not enough to also ram the sprite attribute table down the throat of the VDP, although you could consider appending a number of sprites that you want to keep on high-priority planes and outside the flicker code.
What is important to remember is that this type of code is unforgiving of timing problems. This routine must be started within 1734 T-states after vblank begins or you will get screen corruption. This means that you cannot put large portions of code between di/ei. Also a word of caution on the BIOS. Normally it takes about 200 to 250 T-states for the BIOS to call the $FD9A/$FD9F hooks. However, if you attach your code to $FD9F, you will still be a slave to whatever is installed to run on $FD9A. So your best bet is to write your own handler for $FD9A so your routine will always run first. Of course you can always write your own interrupt handler directly and skip the BIOS completely.
I hope this has been interesting and will be helpful to people who are starting to write games in assembly. Leave a comment if you have questions, comments or improvements.
Last Updated on Sunday, 28 November 2010 16:03
Comments
Hm, OK. Too bad. Then I have to think up a way to manage SAT mirroring myself ;)
For sprite handling you should be a little clearer about the scope of what you want to see. I must admit that I have skipped SAT rotation in sprite mode 2 because of the need to rotate the colour table as well. Possible, but usually not worth the effort.
Something about the music format may be interesting. Especially as an example of how simple things can be.
Perhaps for next article you could address collision detection (your info on MRC was very helpful) or on Sprite handling on screen 4 and up. The colour information in combination with SAT rotation/mirroring is a real burden for (starting) coders.
NB: Change the primary text colour of your web pages so that it has better contrast to the background!