LGT8F328P
https://github.com/dbuezas/lgt8fx - Arduino support
https://github.com/dbuezas/lgt8fx/tree/master/docs - manuals
https://www.nongnu.org/avr-libc/user-manual/inline_asm.html - avr-libc Inline Assembler Cookbook (inline asm with avr-gcc)
Getting it to work
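Runs as an ATmega328P, except for these things:
• use an upload (programming) speed of 57600 baud, not 115200
• use the internal 32 MHz RC oscillator with the clock prescaler set to 1:
        CLKPR = 0x80;   // enable prescaler change
        CLKPR = 0x00;   // divide by 1
  and set F_CPU=32000000L
• no EEPROM support (at least not an ATmega328P-compatible one)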
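The clock setting can be sanity-checked on the host. This minimal C sketch assumes that the LGT8F328P decodes the low CLKPR bits like the ATmega328P (division factor 2^CLKPS for CLKPS = 0..8) and that the internal oscillator runs at 32 MHz. `lgt_cpu_hz` and the macro names are hypothetical, not part of any core.

```c
#include <stdint.h>

/* Assumption: internal RC oscillator frequency, as stated in these notes. */
#define LGT_RC_HZ 32000000UL

/* Values written by the two-step CLKPR sequence. */
#define CLKPR_CHANGE_ENABLE 0x80u /* first write: unlock the prescaler */
#define CLKPR_DIV1          0x00u /* second write: CLKPS = 0, divide by 1 */

/* Hypothetical helper: system clock for a given CLKPR value, assuming the
   ATmega328P-style encoding (factor 2^CLKPS, valid for CLKPS = 0..8). */
static uint32_t lgt_cpu_hz(uint8_t clkpr_value)
{
    uint8_t clkps = clkpr_value & 0x0Fu; /* prescaler select bits 3:0 */
    if (clkps > 8u)
        return 0; /* reserved encoding on the ATmega328P */
    return (uint32_t)(LGT_RC_HZ >> clkps);
}
```

With CLKPR_DIV1 the result is 32000000, which is why F_CPU has to match; if F_CPU were left at 16000000L, for instance, delays and baud-rate calculations would be off by a factor of two.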
uDSC
• an arithmetic accelerator
• has a 16x16-bit multiplier and a 32-bit accumulator
• can transfer a register pair to the uDSC with one instruction; for example,
  this loads r22:r23 (r22 = low byte, r23 = high byte) into the DY register:
	out DSDY, r22
• can load uDSC registers directly from memory as 16-bit values: when you load
  a register from the address remapped with an offset of +0x2000 (i.e. you read
  the u16 at 0x0406 via 0x2406), a uDSC register is loaded instead
  • if MM in DSCR=0: the destination register number determines which uDSC
    register is loaded, i.e. `ld r2, Z` will load the `AL` register (r0, ...,
    r4 load DX, DY, ...)
  • if MM in DSCR=1: the register has to contain the I/O address of the uDSC
    register (I couldn't get this method working)
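For host-side test code it can help to spell these conventions out. The C sketch below only restates what the notes say: the byte order of a transferred register pair, the +0x2000 window, and the MM=0 register numbering as far as it is known (r0=DX, r1=DY, r2=AL, and r3=AH as used in the store example further down). All names are hypothetical and nothing here touches the hardware.

```c
#include <stdint.h>

/* Offset of the remapped window through which 16-bit loads go to the
   uDSC instead of a CPU register. */
#define DSC_REMAP_OFFSET 0x2000u

/* A register pair as `out DSDY, r22` transfers it:
   r22 is the low byte, r23 the high byte. */
static uint16_t dsc_pair(uint8_t r22, uint8_t r23)
{
    return (uint16_t)(((uint16_t)r23 << 8) | r22);
}

/* Address to read from so that the load lands in a uDSC register. */
static uint16_t dsc_remap(uint16_t addr)
{
    return (uint16_t)(addr + DSC_REMAP_OFFSET);
}

/* MM=0: the destination register number selects the uDSC register.
   Only r0..r3 are known here; the rest of the mapping is left open. */
typedef enum { DSC_DX, DSC_DY, DSC_AL, DSC_AH, DSC_UNKNOWN } dsc_reg_t;

static dsc_reg_t dsc_mm0_target(unsigned avr_reg)
{
    switch (avr_reg) {
    case 0: return DSC_DX;
    case 1: return DSC_DY;
    case 2: return DSC_AL;
    case 3: return DSC_AH;
    default: return DSC_UNKNOWN;
    }
}
```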
Gallery of uDSC bugs
Value not yet in register
-------------------------
Example of loading the DSDX register (I/O address 0x10, memory address 0x30):
  too_early:
        ldi r22,0x98
        ldi r23,0x7a
        out 16, r22
        ret
This gives you DX=0x0198, but the correct value is DX=0x7a98.
These two work:
  works_ok1:
        ldi r22,0x98
        ldi r23,0x7a
        nop
        out 16, r22
        ret
  works_ok2:
        ldi r23,0x7a
        ldi r22,0x98
        out 16, r22
        ret 
It seems that at the time the `out` instruction is executed and the uDSC
fetches the register pair r22:r23 (with r23 fetched first), the previous
instruction has not yet left the pipeline, so r23 still holds some undefined
value. Both working variants avoid writing r23 in the instruction right before
the `out`.
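The hypothesis can be turned into a small toy model to check that it explains all three sequences. This is an assumption-laden C sketch, not the documented pipeline: the high register of the pair is assumed to be fetched while a write from the immediately preceding instruction is still pending, and the low register later. The old r23 value 0x01 is chosen only to reproduce the DX=0x0198 reading; on the chip it is whatever r23 happened to contain. All names are hypothetical.

```c
#include <stdint.h>

/* Toy model: `out DSDX, rN` fetches rN+1 (high byte) first, and at that
   moment a register written by the previous instruction still shows its
   old value. The low register rN is fetched later and is up to date. */
typedef struct {
    uint8_t reg[32];  /* AVR register file as the CPU sees it */
    int last_written; /* register written by the previous instruction, or -1 */
    uint8_t last_old; /* value that register held before that write */
} toy_cpu_t;

static void toy_init(toy_cpu_t *c, uint8_t fill)
{
    for (int i = 0; i < 32; i++)
        c->reg[i] = fill;
    c->last_written = -1;
    c->last_old = 0;
}

static void toy_ldi(toy_cpu_t *c, int r, uint8_t value)
{
    c->last_old = c->reg[r];
    c->last_written = r;
    c->reg[r] = value;
}

static void toy_nop(toy_cpu_t *c)
{
    c->last_written = -1; /* the pending write has now completed */
}

/* Returns the 16-bit value the uDSC would latch for `out DSDX, r`. */
static uint16_t toy_out_pair(toy_cpu_t *c, int r)
{
    uint8_t hi = (c->last_written == r + 1) ? c->last_old : c->reg[r + 1];
    uint8_t lo = c->reg[r];
    c->last_written = -1;
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

Feeding it the instruction order of too_early yields 0x0198, while the orders of works_ok1 (extra nop) and works_ok2 (r23 loaded first) both yield 0x7a98.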
Register stores too late
------------------------
A value written from a uDSC register to the remapped (+0x2000) memory is
stored one clock too late: at the time the write is performed, the value from
the uDSC has not yet been latched, so the value of the previous store is
written instead (or 0x0000 if it's the first access).
   store_values:
        st  Z, r2	; writes 0x0000 to [Z] instead of r2 (DSAL)
        std Z+2, r3	; writes r2 (DSAL) to [Z+2]
        std Z+4, r4	; writes r3 (DSAH) to [Z+4]
        ret
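The one-store-late behaviour can be modelled as a single latch that is written out on the next store. A minimal C sketch under that assumption (it describes the observed effect, not a documented mechanism); the byte offsets Z, Z+2 and Z+4 become word indices 0, 1 and 2, and the names are hypothetical.

```c
#include <stdint.h>

/* Toy model of the late store: each store to the remapped window writes the
   value latched by the PREVIOUS store, then latches its own source value. */
typedef struct {
    uint16_t latched; /* 0x0000 before the first access */
} late_store_t;

static void late_store(late_store_t *s, uint16_t *mem, unsigned word_index,
                       uint16_t dsc_value)
{
    mem[word_index] = s->latched; /* what actually reaches memory */
    s->latched = dsc_value;       /* only becomes visible one store later */
}
```

Under this model the value of the last store in a sequence stays in the latch until another store follows, so a trailing extra store would be needed to get it into memory; that is a consequence of the model, not something verified on the chip.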
Some registers are not available immediately
--------------------------------------------
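   saturation_from_mul:
        out 16, r24	; DX = r24:r25
        out 17, r22
        ldi r24, 0x44
        out 1, r24	; start the mul operation
        ; nop		; uncomment to read the correct saturation value
        in r24, 2	; read the saturation register
        ret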
The saturation register isn't available immediately after the mul operation is
started (`out 1, r24`), but only one cycle later, so a `nop` is needed before
reading it.
Weird operation signedness rules
--------------------------------
It's not really clear what the "S" flag on "DA+DX*DY" does, besides affecting
the flags.
The DX and DY signedness bits are swapped: DX is controlled by S2 and DY by S1,
not the other way around.
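Because the bits are easy to mix up, a software reference with one signedness parameter per operand makes the intended mapping explicit when comparing against hardware results. A C sketch; `ref_mul16` is a hypothetical helper that models only the 16x16 product, not the "S" flag or how the product is accumulated into DA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret a 16-bit operand as signed (two's complement) or unsigned. */
static int64_t as_operand(uint16_t v, bool is_signed)
{
    return (is_signed && v >= 0x8000u) ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* Reference 16x16 product. Parameters are named after the operand, not the
   control bit: per these notes, S2 sets DX signedness and S1 sets DY. */
static int64_t ref_mul16(uint16_t dx, uint16_t dy, bool dx_signed, bool dy_signed)
{
    return as_operand(dx, dx_signed) * as_operand(dy, dy_signed);
}
```

For example, 0xffff * 2 is -2 with DX signed but 131070 with both operands unsigned, so a single test vector like this is enough to tell which bit controls which operand.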
Weird saturation rules
----------------------
Some operations on DA _saturate_ in DA, for example:
  DA_USUB_DY (DSIR=0x13): DA=15, DY=30 results in DA=0
  DA_ADD_DY (DSIR=0x37): DA=0xffff_fffe, DY=0x0005 results in DA=0xffff_ffff
The operation `DA = DA + DX * DY` doesn't saturate, so I suggest using it with
DX=+1 or DX=-1 instead of the dedicated `DA = DA +- DY` operations.
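The two observed results and the suggested workaround can be written as reference models. This C sketch reproduces only the two data points above; treating DA_ADD_DY as an unsigned clamp is inferred from the 0xffff_fffe example, and modelling `DA = DA + DX * DY` as wrapping modulo 2^32 (DX signed, DY unsigned) is an assumption, since the notes only say it doesn't saturate. Function names are hypothetical.

```c
#include <stdint.h>

/* DA_USUB_DY (DSIR=0x13) as observed: clamps at 0 instead of wrapping. */
static uint32_t ref_da_usub_dy(uint32_t da, uint16_t dy)
{
    return (da < dy) ? 0u : da - dy;
}

/* DA_ADD_DY (DSIR=0x37) as observed: clamps at 0xffff_ffff. */
static uint32_t ref_da_add_dy(uint32_t da, uint16_t dy)
{
    uint32_t sum = da + dy;               /* unsigned, wraps on overflow */
    return (sum < da) ? UINT32_MAX : sum; /* overflow detected: clamp */
}

/* DA = DA + DX*DY used as add/subtract with DX = +1 or -1.
   ASSUMPTION: wraps modulo 2^32, DX signed, DY unsigned. */
static uint32_t ref_da_mac(uint32_t da, int16_t dx, uint16_t dy)
{
    int64_t r = (int64_t)da + (int64_t)dx * (int64_t)dy;
    return (uint32_t)r; /* conversion to unsigned reduces modulo 2^32 */
}
```

With the same inputs as above, the MAC form gives 3 instead of 0xffff_ffff for the addition, and 0xffff_fff1 (that is, -15) instead of 0 for the subtraction.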
In general, there's no telling what caveats an operation has until you test
it, including all the extreme values.
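A small harness makes that kind of testing systematic: run every combination of boundary values through a reference function and through whatever returns the measured result, and count the differences. A C sketch; the operand lists are a suggested starting point, and the two example operations only stand in for a reference and a "measured" function (reading results back from the chip is not shown and would be your own code).

```c
#include <stddef.h>
#include <stdint.h>

/* Operation under test: takes DA and DY, returns the new DA. */
typedef uint32_t (*da_op_fn)(uint32_t da, uint16_t dy);

static const uint32_t DA_EXTREMES[] = {
    0x00000000u, 0x00000001u, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
};
static const uint16_t DY_EXTREMES[] = {
    0x0000u, 0x0001u, 0x7fffu, 0x8000u, 0xfffeu, 0xffffu
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Counts the DA x DY boundary combinations where `measured` differs
   from `reference`. */
static size_t count_mismatches(da_op_fn reference, da_op_fn measured)
{
    size_t bad = 0;
    for (size_t i = 0; i < COUNT(DA_EXTREMES); i++)
        for (size_t j = 0; j < COUNT(DY_EXTREMES); j++)
            if (reference(DA_EXTREMES[i], DY_EXTREMES[j]) !=
                measured(DA_EXTREMES[i], DY_EXTREMES[j]))
                bad++;
    return bad;
}

/* Example operations: saturating vs. wrapping unsigned addition. */
static uint32_t example_sat_add(uint32_t da, uint16_t dy)
{
    uint32_t sum = da + dy;
    return (sum < da) ? UINT32_MAX : sum;
}

static uint32_t example_wrap_add(uint32_t da, uint16_t dy)
{
    return da + dy; /* unsigned arithmetic wraps modulo 2^32 */
}
```

Comparing the saturating and wrapping additions this way flags exactly the combinations near 0xffff_ffff, which is where a behaviour like the DA_ADD_DY saturation shows up.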
Flags don't work
----------------
The flags do not work as expected: first, they don't seem to be numbered
correctly; second, they do not correlate with the value in the DA register.
MM=1 mode not working
---------------------
This may not be a bug, but I didn't manage to get the other memory-mapping
mode (MM=1) working.