Talk:PIC instruction listings

Split completed
I've been WP:BOLD and split the PIC article into the PIC instruction listings article. This significantly shortens the PIC article and removes unnecessary detail. This is similar in purpose and design to x86 instruction listings. -- Wonderfl (reply) 07:46, 15 January 2015 (UTC)

Encoding
Are the 12-, 13- and 14-bit instructions zero-extended to 16 bits in a binary file, or are they really 12/13/14 bits long? Or does this depend on the programmer of the microcontroller? – Sivizius (talk) 14:56, 15 September 2017 (UTC)
 * They're really 12/13/14 bits long in the microcontroller. When stored in byte-oriented memory like a desktop PC, they're padded to 16 bits.  The assembler adds the padding bits, and the flash programmer ignores them. 97.102.205.224 (talk) 04:34, 24 February 2024 (UTC)
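
A minimal Python sketch of the padding described in the reply above. It assumes a 14-bit core and assumes each padded word is stored as two bytes, low byte first; the byte order and the helper names are illustrative assumptions, not something stated in this thread.

```python
WORD_BITS = 14                       # assumed core width; 12 or 13 work the same way
WORD_MASK = (1 << WORD_BITS) - 1

def pad_words(words):
    """Store each instruction word in 16 bits (two bytes, low byte first)."""
    out = bytearray()
    for w in words:
        if w & ~WORD_MASK:
            raise ValueError(f"{w:#x} does not fit in {WORD_BITS} bits")
        out += w.to_bytes(2, "little")   # the top 2 bits are zero padding
    return bytes(out)

def unpad_words(data):
    """Recover the words, ignoring the padding bits as a flash programmer would."""
    return [int.from_bytes(data[i:i + 2], "little") & WORD_MASK
            for i in range(0, len(data), 2)]

image = pad_words([0x3FFF, 0x0000, 0x2805])
print(image.hex())            # ff3f00000528
print(unpad_words(image))     # [16383, 0, 10245]
```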

Padauk doesn't belong here
The Padauk section starts with "Although clearly PIC-derived, there are some significant differences:", then proceeds with a long list of substantial differences. I don't see the "clearly PIC-derived" at all. If Padauk is "clearly PIC-derived" then nearly any 8-bit architecture with a fixed-size instruction word would be "clearly PIC-derived". IMO, the Padauk devices should have their own article (maybe create the Padauk Technology article and put it there, since these µCs are their main product). SPTH (talk) 13:44, 16 December 2020 (UTC)


 * SPTH, I'm curious. What are these other "8-bit architecture with a fixed-size instruction word" processors you mentioned?
 * The Atmel AVR is the only one I can think of (most 8-bit architectures have variable-width instructions), and I agree that the Atmel AVR is not "PIC-derived". --DavidCary (talk) 08:28, 25 April 2022 (UTC)


 * The Xilinx PicoBlaze and the Lattice Mico8 are other examples of such architectures. SPTH (talk) 06:47, 17 June 2022 (UTC)


 * The differences listed are all relatively minor, and possibly deliberate to avoid patent suits. Some similarities (8-bit data path, separate code and data address spaces, fixed instruction size) are common among microcontrollers, but the following fundamental architectural similarities are not widely seen outside PIC-derived architectures:
 ** 1-operand accumulator machine with a single (absolute) addressing mode
 ** Oddball instruction word size, with half the bits dedicated to operand address
 ** Family of processors with slightly different instruction word sizes
 ** All 2-input ALU instructions come in memory-destination and accumulator-destination variants
 ** The only conditional instructions are skip instructions
 ** Load immediate & return instruction (RETLW) for ROM lookup tables
 ** Obvious equivalents for all five of the PICmicro instruction forms:
 *** Zero-operand special instructions
 *** One-operand memory–accumulator arithmetic instructions
 *** Four bit operations: set, clear, skip if set, skip if clear
 *** Unconditional call and branch instructions
 *** Accumulator–immediate instructions
 * Additional minor but suggestive similarities:
 ** In addition to the common decrement-and-skip-on-zero, both have increment forms as well
 ** Opcodes with an msbit of 0 almost all specify a memory operand
 *** The exception being zero-operand instructions, which have a large number of zero msbits, with all-zero being NOP
 ** Opcodes with an msbit of 1 do not have a memory operand, but are divided by the second msbit into:
 *** unconditional jump/call instructions (distinguished by the third msbit), and
 *** accumulator–immediate instructions
 * I mention, in fairness, that one-absolute-operand accumulator machines used to be more popular, e.g. the PDP-8 and Apollo guidance computers, but they're rare these days. I can't think of another microcontroller with that design.  Certainly not 8048, 8051, Atmel AVR, COP8, 68HC08, ST6/ST7, STM8, Hitachi H8, Toshiba TLCS, Epson S1C88, CR816, or Mitsubishi 740.  Nor 16-bit microcontrollers like MSP430, Zilog Z8 or RL78.
 * Since you mentioned Picoblaze and LatticeMico8 specifically, I'll describe them in more detail. They're so similar to each other that I can describe them together when contrasting them with the PIC/Padauk.
 * First, ways they are like the PIC & Padauk CPUs:
 ** They both have a fixed, non-multiple-of-8, instruction size (22 and 18 bits, resp.)
 ** The primary ALU status flags are C and Z. In particular, there's no N (sign) flag.
 * But in the following significant ways, they are very different from the PIC & Padauk CPUs:
 ** They're 2-address architectures with 16/32 general-purpose registers.
 ** All external memory and I/O registers are accessed load/store; there are no read-modify-write instructions.
 ** Most instructions come in register–register and register–immediate forms. (With the former padding the source register to 8 bits so it fits in the same field as the immediate constant.)
 *** This includes loads and stores, which offer the same choice of source (address) operands: register (indirect addressing) or immediate (absolute).
 ** Branches are conditional on the C and Z flags. Both have conditional calls; Picoblaze also has conditional return!
 * Padauk changes are significant (especially the RAM/IO split, and the redesign of the single-operand instructions), but they're obviously in furtherance of saving opcode space so the equivalent of the PICmicro 14-bit instruction set fits into a 13-bit instruction.
 * I would have thought the "PIC equivalent" column would make things clear. If the basic architectures didn't have an obvious isomorphism, such equivalents couldn't exist.
 * 97.102.205.224 (talk) 04:30, 24 February 2024 (UTC)

Holtek opcode files
The main reference at https://wuffs.org/blog/mouse-adventures-part-3 describes the Holtek-supplied *.fmt files and a disassembler which uses them to interpret binary code. But the author gives up on the %operand section of the files. Here's a bit more information, including the long (2-word) instructions.

The following is an excerpt from HT48R06A-1.fmt, a 14-bit device:

 %operand
 ;    fedc ba98 7654 3210
 ;  1 0000 0000 0111 1111
 ;  2 0000 0000 1111 1111
 ;  3 0000 0011 1111 1111
 ;  4 0000 00bb b111 1111
 ;                ele, val, cod, msk
 0, 3, 0,03fffh,  0, 0, 0, 3fffh    ; for dc/dw
 1, 7, 0, 07fh,   0, 0, 0,  07fh    ; byte [value]
 2, 3, 0, 0ffh,   0, 0, 0,  0ffh    ; imm value
 3, 1, 0, 03ffh,  0, 0, 0, 03ffh    ; far jump
 4, 5, 0, 03ffh,  0, 0, 7, 7,  0, 3, 0, 7fh    ; bit [ram].bit
 %mnemonic
 ;  size  code, mask      mnemonic...
 1,  0000h, 03fffh,    nop
 1,  0001h, 03fffh,    clr     wdt
 [...]
 1,  0200h, 03f80h,    sub     a, &1
 1,  0280h, 03f80h,    subm    a, &1
 1,  0300h, 03f80h,    add     a, &1
 1,  0380h, 03f80h,    addm    a, &1
 [...]
 1,  0a00h, 03f00h,    sub     a, &2
 1,  0b00h, 03f00h,    add     a, &2
 [...]
 1,  2000h, 03c00h,    call    &3
 1,  2800h, 03c00h,    jmp     &3
 1,  3000h, 03c00h,    set     &4
 1,  3400h, 03c00h,    clr     &4
 1,  3800h, 03c00h,    snz     &4
 1,  3c00h, 03c00h,    sz      &4

Here's a similar excerpt from HT69F360.fmt, a 16-bit device with long instructions:

 %operand
 0, 3, 0, 0ffffH,  0, 0, 0, 0ffffH
 1, 7, 0, 000ffH,  0, 0, 0, 007fH,  0, 7, 0eH, 1
 2, 3, 0,  0ffh,   0, 0, 0,  0ffh
 3, 1, 0, 03fffH,  0, 0, 0, 07ffh,  0, 0bh, 0eh, 3
 4, 5, 0, 007ffH,  0, 0, 7, 7,      0, 3,   0,   007fH,  0, 0ah, 0eH, 1
 5, 7, 0, 0ffffH,  0, 0, 0, 007fH,  0, 7,   0eH, 1,      1, 8,   0H,  0ffH
 6, 5, 0, 007ffffH, 0, 0, 7, 7,     0, 3,   0,   007fH,  0, 0ah, 0eH, 1,    1, 0bh, 0h, 0ffh
 %mnemonic
 1,  0000h, 0ffffH,                  nop
 [...]
 1,  0200h, 0bf80H,                  sub     a, &1
 1,  0280h, 0bf80H,                  subm    a, &1
 1,  0300h, 0bf80H,                  add     a, &1
 1,  0380h, 0bf80H,                  addm    a, &1
 [...]
 2,  08200h, 0bf80h, 0000h, 0000h,   lsub    a, &5
 2,  08280h, 0bf80h, 0000h, 0000h,   lsubm   a, &5
 2,  08300h, 0bf80h, 0000h, 0000h,   ladd    a, &5
 2,  08380h, 0bf80h, 0000h, 0000h,   laddm   a, &5
 [...]
 1,  0a00h, 0ff00H,                  sub     a, &2
 1,  0b00h, 0ff00H,                  add     a, &2
 [...]
 1,  2000h, 03800H,                  call    &3
 1,  2800h, 03800H,                  jmp     &3
 1,  3000h, 0bc00H,                  set     &4
 1,  3400h, 0bc00H,                  clr     &4
 1,  3800h, 0bc00H,                  snz     &4
 1,  3c00h, 0bc00H,                  sz      &4
 [...]
 2,  0b000h, 0bc00h, 0000h, 0000h,   lset    &6
 2,  0b400h, 0bc00h, 0000h, 0000h,   lclr    &6
 2,  0b800h, 0bc00h, 0000h, 0000h,   lsnz    &6
 2,  0bc00h, 0bc00h, 0000h, 0000h,   lsz     &6

I've included the entire operand section, and enough mnemonic lines to reference 6 of them.

The mnemonic lines include:
 * Opcode size, in words (1 for most instructions, 2 for long)
 * Opcode bits and mask, repeated for each word:
 ** The bits which are set in the instruction
 ** A mask of bits which are opcode (as opposed to operand) bits
 * The actual mnemonic, with &n (&1, &2, ...) acting as a placeholder for an operand of type n
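
A minimal Python sketch of how a disassembler might use the code/mask pairs described above. The rows are copied from the HT48R06A-1.fmt excerpt; the function name and the first-match rule are my assumptions, not necessarily how Holtek's tools or the referenced disassembler work.

```python
# (size, code, mask, mnemonic) rows copied from the HT48R06A-1.fmt excerpt
MNEMONICS = [
    (1, 0x0000, 0x3FFF, "nop"),
    (1, 0x0001, 0x3FFF, "clr wdt"),
    (1, 0x0200, 0x3F80, "sub a, &1"),
    (1, 0x0300, 0x3F80, "add a, &1"),
    (1, 0x0A00, 0x3F00, "sub a, &2"),
    (1, 0x2000, 0x3C00, "call &3"),
    (1, 0x2800, 0x3C00, "jmp &3"),
    (1, 0x3000, 0x3C00, "set &4"),
]

def lookup(word):
    """Return the first mnemonic whose opcode bits match, or None."""
    for _size, code, mask, text in MNEMONICS:
        # The mask selects the opcode bits; the remaining bits are operand.
        if (word & mask) == code:
            return text
    return None

print(lookup(0x0312))   # add a, &1
print(lookup(0x2805))   # jmp &3
print(lookup(0x1000))   # None (not in this partial table)
```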

The puzzle is interpreting the %operand lines. They're in blocks of 4 values, each composed of 3 small values followed by a larger bit-mask, and a variable number of blocks per line.

The first block is special. Its values are:
 * 1) The first value is the operand type.  Operand type 0 is used for dc/dw (define constant/define word) pseudo-operations.
 * 2) The second value is mysterious, but appears to be an operand syntax specification:
 ** 1 = Branch destination
 ** 3 = Immediate constant
 ** 5 = Bit number (the "bit [ram].bit" operand in the excerpt)
 ** 7 = RAM address (the "byte [value]" operand in the excerpt)
 * 3) The third value is always 0, so remains mysterious.
 * 4) The fourth value appears to be a mask of valid operand values.  Operand type 2 (immediate) always has a mask of 0ffh (1 byte), while type 1 corresponds to the RAM address range of the device in question (07fH or 0ffH).  Operand type 0's mask corresponds to the full ROM width (03fffH, 07fffH or 0ffffH).

Bit operands (operand syntax 5) are represented as bit numbers, computed as byte×8 + bit.

Subsequent blocks appear to follow a common pattern. Each block maps some bits from the operand value to the opcode. For each block, the values are:
 * 1) Opcode word. Usually 0, but may be 1 for long instructions.
 * 2) Operand lsbit.  The lowest bit of the block in the operand value.
 * 3) Opcode lsbit.  The lowest bit of the block in the opcode word.
 * 4) Block mask.  The (zero-based) mask which defines the width of this block.
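
Here is a minimal Python sketch of that mapping, under my reading of the four values. The function and constant names are made up for illustration; the block tuples are the trailing blocks of operand types 1 and 4 from the HT69F360.fmt excerpt.

```python
def pack(operand, blocks, words=1):
    """Scatter operand bits into opcode words.

    Each block is (opcode_word, operand_lsbit, opcode_lsbit, mask),
    in the order the four values appear in the .fmt file.
    """
    out = [0] * words
    for word, operand_lsb, opcode_lsb, mask in blocks:
        out[word] |= ((operand >> operand_lsb) & mask) << opcode_lsb
    return out

# Blocks after the first, for operand types 1 and 4 (HT69F360.fmt)
TYPE1 = [(0, 0, 0, 0x7F), (0, 7, 0x0E, 1)]
TYPE4 = [(0, 0, 7, 7), (0, 3, 0, 0x7F), (0, 0x0A, 0x0E, 1)]

# add a, [85h]: opcode 0300h plus a type-1 RAM address
print(hex(0x0300 | pack(0x85, TYPE1)[0]))          # 0x4305

# set [85h].3: opcode 3000h plus a type-4 bit number (byte*8 + bit)
print(hex(0x3000 | pack(0x85 * 8 + 3, TYPE4)[0]))  # 0x7185
```

Both results still match their mnemonic lines (0x4305 & 0bf80H = 0300h, 0x7185 & 0bc00H = 3000h), which is at least consistent with this interpretation.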

Notice operand type 1 (short RAM address). It's either a 7-bit (operand mask=07fh) or 8-bit (operand mask = 000ffH) value, with the low 7 bits copied straight through to the opcode, and the 8th bit (if present) sent to bit 14 of the opcode.

Compare operand type 4 (short bit address). This is similar to the above, but the low 3 bits (the bit number) are written to bits 7..9 (it's just a coincidence that the opcode lsbit and the 3-bit mask are both 7), while the other blocks are just like type 1 except the operand lsbits are all 3 bits more. The overall operand mask is also 3 bits wider.

Operand types 5 and 6 are the same as types 1 and 4, but 8 bits wider, with the 8 msbits written to bits 0..7 of opcode word 1.
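
Going the other way, a disassembler would gather the operand back out of both words. A minimal Python sketch using the type 6 blocks from the HT69F360.fmt excerpt; the helper name and the example word pair are hypothetical, constructed by hand from the lset line rather than taken from a real binary.

```python
def unpack(words, blocks):
    """Gather operand bits back out of the opcode words.

    Each block is (opcode_word, operand_lsbit, opcode_lsbit, mask).
    """
    operand = 0
    for word, operand_lsb, opcode_lsb, mask in blocks:
        operand |= ((words[word] >> opcode_lsb) & mask) << operand_lsb
    return operand

# Blocks after the first, for operand type 6 (long bit address)
TYPE6 = [(0, 0, 7, 7), (0, 3, 0, 0x7F), (0, 0x0A, 0x0E, 1), (1, 0x0B, 0, 0xFF)]

# Hand-built encoding of "lset [185h].2": word 0 = 0f105h, word 1 = 0001h
bit_number = unpack([0xF105, 0x0001], TYPE6)
byte_addr, bit = divmod(bit_number, 8)    # undo byte*8 + bit
print(hex(byte_addr), bit)                # 0x185 2
```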

Obviously, original research like this has no place in Wikipedia proper, but I'm leaving this here in case it helps anyone confirm the accuracy of an admissible source. 97.102.205.224 (talk) 05:30, 24 February 2024 (UTC)