PCM Hammer P04 Support Project

They go by many names, P01, P59, VPW, '0411 etc. Also covering E38 and newer here.
kur4o
Posts: 948
Joined: Sun Apr 10, 2016 9:20 pm

Re: PCM Hammer P04 Support Project

Post by kur4o »

I think there are 1300+ different OS numbers from 1996-2006 for 256 KB, 512 KB and 1 MB PCMs.
That doesn't include most of the earlier revisions, since those files are hard to find. With a good enough archive of files, I can rip all OS numbers and checksum locations to a CSV file, and we can make a chart of cross-referenced OSs that share the same OS code but have a different calibration part.

An archive with all kinds of dumps would really help.
antus
Site Admin
Posts: 8237
Joined: Sat Feb 28, 2009 8:34 pm
cars: TX Gemini 2L Twincam
TX Gemini SR20 18psi
Datsun 1200 Ute
Subaru Blitzen '06 EZ30 4th gen, 3.0R Spec B

Post by antus »

We could probably write a script to work through pcm.dat from winflash and gather the service numbers or hardware IDs (whichever is in there) for all of those OSs, then count how many there are against each ID. If there are, say, fewer than 3 against a specific ID, we can probably assume it's a mistake and remove it; if there are many more, we can group them by service number or hardware ID.
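
As a rough illustration of the count-and-threshold step only, here is a minimal C sketch. It assumes the (OS number, ID) pairs have already been pulled out of pcm.dat by some other means; the file's layout is not covered here, and `OsEntry`, `count_for_id` and `filter_common_ids` are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (OS number, ID) pair. How it is extracted from pcm.dat is not shown. */
typedef struct {
    uint32_t os;
    uint32_t id;   /* service number or hardware ID, whichever the file holds */
} OsEntry;

/* Count how many entries carry the given ID. */
size_t count_for_id(const OsEntry *entries, size_t n, uint32_t id)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (entries[i].id == id)
            ++count;
    return count;
}

/* Copy to 'out' only the entries whose ID occurs at least min_count times
   in 'in'. 'out' must have room for n entries. Returns the number kept. */
size_t filter_common_ids(const OsEntry *in, size_t n, size_t min_count,
                         OsEntry *out)
{
    assert(out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (count_for_id(in, n, in[i].id) >= min_count)
            out[kept++] = in[i];
    return kept;
}
```

Calling `filter_common_ids(entries, n, 3, out)` would drop every OS whose ID shows up fewer than 3 times. The quadratic counting is fine for a list of a thousand or so entries.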

I'm also thinking that for now, if Intel and AMD both read in 512 byte packets, we could set the upper limit to that instead of the 8 bytes in the current develop branch and continue testing from there. There is a good chance we'd get 512 KB bins from all P04s; just the early 256 KB ones would be messed up at the end, and the late 1 MB ones would be cut short.

Then we could move to getting this loader working, using it to load the existing read kernel. If it can load a loader, then load the read kernel with that loader, then we are well positioned to grow the kernel and add flash chip type detection, at which point we can handle different types and sizes of flash for read.
Have you read the FAQ? For lots of information and links to significant threads see here: http://pcmhacking.net/forums/viewtopic.php?f=7&t=1396

Post by kur4o »

Here are some basic instructions on how to get info from various P04 files with Universal Patcher.

Open the program and go to utilities->patcher.

The OS crc column is the checksum of the main code. If that matches between stock bin files with different OS numbers, then they can be considered identical in terms of crossflashing and tuning definitions; only the calibration differs. It is much like LS1 files, but here we don't have a p/n defined for each segment, only a p/n for the calibration area. So we can define a new OS p/n using the OS crc and some custom labeling, like p04-case1->list all cal p/n.

The Cs1 address and OS store address columns are needed for checksum calculation when changes are made to a bin file. These 2 dwords are excluded from the checksum, and the checksum is stored at the CS1 address [4 bytes long word sum].
The 3d tables column lists most of the 3D tables found in the bin and the number of rows in each [:13 = a 13-row 3D table].
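
To make the checksum description a bit more concrete, here is a minimal C sketch of a word sum that skips the two excluded dwords. It is only a guess at the shape of the calculation: it assumes big-endian 32-bit words (the 68k byte order), a plain wrapping sum, and a summed range passed in by the caller. Whether the value stored at the CS1 address is the sum itself or some complement of it is not stated above, and `os_word_sum` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit word, the byte order used by the 68k. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Sum every 32-bit word in [start, end), skipping the dwords at cs1_addr
   and os_store_addr. All addresses are assumed to be 4-byte aligned. */
uint32_t os_word_sum(const uint8_t *bin, size_t start, size_t end,
                     size_t cs1_addr, size_t os_store_addr)
{
    assert(start % 4 == 0 && cs1_addr % 4 == 0 && os_store_addr % 4 == 0);
    uint32_t sum = 0;
    for (size_t addr = start; addr + 4 <= end; addr += 4) {
        if (addr == cs1_addr || addr == os_store_addr)
            continue;                    /* excluded from the checksum */
        sum += read_be32(bin + addr);    /* wraps modulo 2^32 */
    }
    return sum;
}
```

After editing a bin, the idea would be to recompute this over the same range Universal Patcher reports and write the result back at the CS1 address, once the exact convention has been confirmed against a stock file.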
Attachments
get_V6_info_instructions.jpg

Post by antus »

I had a go at the CRC routines last night, but couldn't get a working and matching C implementation, even in x86 C, in one session. I tried a few novel ideas.

I got easy68k going. You can use easybin to load a pcm bin file to address 0, and name it the same as your easy68k source file so that it loads automatically, then use START ORG $FF8000 in easy68k to put a crc routine in 'pcm ram' where you can hard code the start and offsets. End the code with a TRAP #15 to stop the emu. Then you can run the code in easysim and look at the register that should hold the result to compare.

I also tried using hxd to load a bin file and export it to 'c' format, then put that in a C app that can be compiled for x86 or 68k, and compare the results against pcmhammer that way. The idea was to find a tight C implementation to trade size for speed. I was not able to find any algo that got the same result as pcmhammer. The difference is that pcmhammer processes 32 bits at a time instead of 8, and it's not clear what to tweak to get the same result without the lookup table.

So in the end I just modified the pcmhammer kernel to use -O3 optimisations. That doesn't create a working kernel, because the optimisations break the hardware interfaces, but I loaded the code into IDA and took a look. There are 0x2C3 bytes of code to cover all the crc functionality, and about another 260 or so bytes of memory used for the lookup table. There also seem to be some duplicated clr.l instructions that I can't see having any effect and that could probably be removed. But we would probably put that asm in any asm source code to get it going, then attempt to optimise it and shrink it down more later.

I also asked ChatGPT to generate such a routine in 68k. It did the usual ChatGPT thing of creating code that looks right, but it didn't assemble and didn't work. I talked it through the opcodes it got wrong, and it revised the code until it eventually assembled in easy68k, but there isn't enough code there for it to be a plausible implementation, so I'd have to call it a convincing fake.

I am wondering if we should change the crc algo back to standard crc32 and process 1 byte at a time instead of 4, so that standard algos work in the kernel and standard libraries work on the .net side too. This would make implementation a lot easier, but would probably increase the amount of processing by a factor of 4, making it 4 times slower, and there is the tradeoff. But the choice of CRC is arbitrary, so we do have some freedom to change the algo. We could possibly use a version of the block sum CRC too, which is optimised for pcm use, and maybe just expand it from 16 bits to 32 for the purpose of difference checking in flash blocks (an error rate of 1 in 65536 at 16 bits probably isn't good enough for this purpose).
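
For reference while weighing that tradeoff, here is a minimal C sketch of a byte-at-a-time, table-driven CRC next to the equivalent bit-at-a-time loop. It assumes the 0x04C11DB7 polynomial, a zero initial remainder, MSB-first processing and no final xor; it is not taken from the pcmhammer kernel, and the function names are made up. In this sketch the table costs 256 entries of 4 bytes each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC_POLY 0x04C11DB7u

static uint32_t crc_table[256];

/* Fill the table: entry i is the CRC of the single byte i. */
void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t r = i << 24;
        for (int bit = 0; bit < 8; ++bit)
            r = (r & 0x80000000u) ? (r << 1) ^ CRC_POLY : (r << 1);
        crc_table[i] = r;
    }
}

/* One table lookup per byte replaces the eight-step bit loop. */
uint32_t crc_table_driven(const uint8_t *data, size_t len)
{
    assert(crc_table[1] == CRC_POLY);   /* catches a missing crc_table_init() */
    uint32_t r = 0;
    for (size_t i = 0; i < len; ++i)
        r = (r << 8) ^ crc_table[(r >> 24) ^ data[i]];
    return r;
}

/* Bit-at-a-time version: no table, more work per byte. */
uint32_t crc_bitwise(const uint8_t *data, size_t len)
{
    uint32_t r = 0;
    for (size_t i = 0; i < len; ++i) {
        r ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; ++bit)
            r = (r & 0x80000000u) ? (r << 1) ^ CRC_POLY : (r << 1);
    }
    return r;
}
```

Both functions give the same result for the same input, so the choice between them is purely code size plus table memory against cycles per byte.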
Gampy
Posts: 2330
Joined: Sat Dec 15, 2018 7:38 am

Post by Gampy »

Check out test.cpp in the Kernels directory.

It is imperative that the Polynomial and Remainder values are initialised the same in both implementations ... Kernel and App.
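
A small C sketch of why that matters: the same bit-at-a-time CRC run with two different initial remainders gives different answers for the same data. This is not the content of test.cpp; `CrcParams`, `crc32_with` and `crc_params_match` are names invented for the illustration, and MSB-first processing with no final xor is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two values both sides must agree on (illustrative struct). */
typedef struct {
    uint32_t polynomial;
    uint32_t initial_remainder;
} CrcParams;

/* Bit-at-a-time CRC using the supplied polynomial and starting remainder. */
uint32_t crc32_with(const CrcParams *p, const uint8_t *data, size_t len)
{
    assert(p != NULL && (data != NULL || len == 0));
    uint32_t remainder = p->initial_remainder;
    for (size_t i = 0; i < len; ++i) {
        remainder ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; ++bit)
            remainder = (remainder & 0x80000000u)
                      ? (remainder << 1) ^ p->polynomial
                      : (remainder << 1);
    }
    return remainder;
}

/* Returns 1 when two parameter sets would produce comparable CRCs. */
int crc_params_match(const CrcParams *a, const CrcParams *b)
{
    return a->polynomial == b->polynomial &&
           a->initial_remainder == b->initial_remainder;
}
```

If, say, the kernel side started from 0 and the app side from 0xFFFFFFFF (hypothetical values), every block would appear to differ even when the flash contents were identical.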

-Enjoy
Intelligence is in the details!

It is easier not to learn bad habits, then it is to break them!

If I was here to win a popularity contest, their would be no point, so I wouldn't be here!

Post by antus »

OK, got us a starting point in straight C without a lookup table.

pcmhammer:

Code: Select all

000000-003FFF	AB5A2C04
test calc:

Code: Select all

$ ./crc32
AB5A2C04
C Code:

Code: Select all

#include <stdio.h>

#define WIDTH  (8 * 4)
#define TOPBIT (1u << (WIDTH - 1))
#define POLYNOMIAL 0x04C11DB7u

/* PCM image exported from HxD as a C array and compiled in separately. */
extern unsigned char bin[];

unsigned int crcSlow(const unsigned char *message, int nBytes)
{
    unsigned int remainder = 0;
    for (int byte = 0; byte < nBytes; ++byte)
    {
        /* Cast before shifting so the byte is not shifted as a signed int. */
        remainder ^= (unsigned int) message[byte] << (WIDTH - 8);
        for (unsigned char bit = 8; bit > 0; --bit)
        {
            if (remainder & TOPBIT) remainder = (remainder << 1) ^ POLYNOMIAL;
            else remainder = (remainder << 1);
        }
    }
    return (remainder);
}

int main(void)
{
    /* CRC of the first 0x4000 bytes, to compare with pcmhammer's 000000-003FFF. */
    printf("%08X\n", crcSlow(bin, 0x4000));
    return 0;
}

And for 68k, that crcSlow function compiles down to this. It should be a nice cheat sheet for writing a clean implementation.

Code: Select all

var_9           = -9
var_8           = -8
var_4           = -4
arg_0           =  8
arg_4           =  $C

                link    a6,#-$C
                clr.l   var_4(a6)
                clr.l   var_8(a6)
                bra.s   loc_8000020E
; ---------------------------------------------------------------------------

loc_800001B6:
                move.l  var_8(a6),d0
                add.l   arg_0(a6),d0
                movea.l d0,a0
                move.b  (a0),d0
                move.b  d0,d0
                andi.l  #$FF,d0
                moveq   #$18,d1
                lsl.l   d1,d0
                eor.l   d0,var_4(a6)
                move.b  #8,var_9(a6)
                bra.s   loc_80000204
; ---------------------------------------------------------------------------

loc_800001DA:
                move.l  var_4(a6),d0
                tst.l   d0
                bge.s   loc_800001F6
                move.l  var_4(a6),d0
                add.l   d0,d0
                move.l  d0,d1
                eori.l  #$4C11DB7,d1
                move.l  d1,var_4(a6)
                bra.s   loc_80000200
; ---------------------------------------------------------------------------

loc_800001F6:
                move.l  var_4(a6),d0
                add.l   d0,d0
                move.l  d0,var_4(a6)

loc_80000200:
                subq.b  #1,var_9(a6)

loc_80000204:
                tst.b   var_9(a6)
                bne.s   loc_800001DA
                addq.l  #1,var_8(a6)

loc_8000020E:
                movea.l var_8(a6),a0
                cmpa.l  arg_4(a6),a0
                blt.s   loc_800001B6
                move.l  var_4(a6),d0
                unlk    a6
                rts
; End of function crcSlow


Post by antus »

Or better, compiled with -O2, with my annotations:

Code: Select all

crcSlow:

arg_0           =  8
arg_4           =  $C

                link    a6,#0
                move.l  d2,-(sp)        ; save d2, it is used as a scratch register below
                movea.l arg_4(a6),a1    ; move nBytes to a1
                tst.l   a1              ; any work to do at all?
                ble.s   return_0        ; if not, exit immediately and return a 0
                movea.l arg_0(a6),a0    ; a0 = message (start address)
                lea     (a0,a1.l),a1    ; a1 = message + nBytes (end address)
                clr.l   d0              ; unsigned int remainder = 0

mainloop:                               ; for (int byte = 0; byte < nBytes; ++byte)
                clr.l   d1              ; clear d1 so the byte load is zero extended
                move.b  (a0)+,d1        ; message[byte] to register d1
                moveq   #$18,d2         ; WIDTH-8 to d2
                lsl.l   d2,d1           ; shift left copy of message[byte] by (WIDTH-8)
                eor.l   d1,d0           ; xor in to remainder
                moveq   #8,d1           ; for (unsigned char bit = 8; bit > 0; --bit)

bitloop:
                tst.l   d0              ; if (remainder & TOPBIT)
                blt.s   polynomial      ; then jump to the polynomial part
                add.l   d0,d0           ; remainder = (remainder << 1); (optimised by tricky use of an add opcode)
                subq.b  #1,d1           ; next bit
                bne.s   bitloop         ; any bits left?

mainloop_check:
                cmpa.l  a0,a1           ; reached the end address (byte == nBytes)?
                bne.s   mainloop        ; if not, iterate main loop
                move.l  (sp)+,d2        ; if so we are done.. restore d2
                unlk    a6              ; unlink stack frame
                rts                     ; return

polynomial:
                add.l   d0,d0           ; remainder = (remainder << 1)
                eori.l  #$4C11DB7,d0    ; ^ POLYNOMIAL;
                subq.b  #1,d1           ; next bit
                bne.s   bitloop         ; is there another bit in this byte? (loop)
                bra.s   mainloop_check  ; next main loop

return_0:
                clr.l   d0              ; save the value 0 to the return register
                move.l  (sp)+,d2        ; restore d2
                unlk    a6              ; unlink stack frame
                rts                     ; return

We could save a few bytes by removing the check for any work at the top of the code, the link a6,#0 / move.l d2,-(sp) that sets up the stack frame, the return_0 block used for the early bailout, and the unlk on the main exit path. We can assume that if we call the function we have work to do, and the error checking is going to be in the vpw handler and control flow logic elsewhere.

Remove:

Code: Select all

                link    a6,#0
                move.l  d2,-(sp)
......
                tst.l   a1              ; any work to do at all?
                ble.s   return_0        ; if not, exit immediately and return a 0
......
                move.l  (sp)+,d2        ; if so we are done.. clean up stack
                unlk    a6              ; unlink stack variables
......
return_0:
                clr.l   d0              ; save the value 0 to the return register
                move.l  (sp)+,d2        ; restore stack
                unlk    a6              ; unlink stack variables
                rts                     ; return

Post by antus »

Here is the same code, with the optimisation and set up for Easy68k. It can be loaded and assembled; then in easysim, load the test bin memory image and hit the play button. The emulator should run the crc function, break, and show the calculated CRC in D1, which should match the expected value.

So if we remove the code that sets the start address and length, and assume the function is called with that data already in the registers, and we remove the 2 opcodes at stop: (which are only there for easy68k), the crc32 function is down to only 44 bytes of code, and it uses no memory as it's registers only. :thumbup:
easy68k tested ok.png
easy68k crc32 test and memory image.zip
(338.43 KiB) Downloaded 40 times
edit: I probably didn't need to copy the CRC to D1, but there were some stability issues in Easy68k when I was getting it running. It looks like overwriting D1 was a bug I was triggering in the emulator, as now it's working fine and D0 seems to be fine every time. By default, a .S68 memory image with the same name as the executable is loaded, and by default RAM is empty. So if you give the bin's .s68 file the same name as the code, it gets loaded and is then overwritten/truncated to about 1 KB. This might be the bug, because you then need to go and load another copy of the bin, and things get strange after that. This is why I named the .s68 file differently from the one containing the code.
bubba2533
Posts: 498
Joined: Wed Apr 11, 2018 8:50 am
cars: 03 Chevy S10 Turbo V6

Post by bubba2533 »

Awesome work!
LS1 Boost OS V3 Here. For feature suggestions post in here Development Thread. Support future development ->Patreon.

Post by antus »

I found a couple of optimisations. I moved the init of D2 outside the main loop, as D2 isn't used anywhere else, so it doesn't need to be re-initialised every iteration. I also noticed it was doing the remainder = remainder << 1 in two places, so as a size (not speed) optimisation I put it before the branch out to the xor polynomial stage so the opcode doesn't need to be repeated in two locations, saving 2 bytes. I also took out the copy of the result to D1. So it's a little smaller, and easy68k benchmarks it at 7358458 cycles to run over $4000 bytes before the change and 7292926 after. So a small speed and size optimisation there.
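
The single-shift rearrangement can be sketched in C as below, next to the original shape for comparison. This is only an illustration of the idea, not the assembled code in the attachment; it assumes the same 0x04C11DB7 polynomial and zero initial remainder as the C test earlier in the thread, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLYNOMIAL 0x04C11DB7u
#define TOPBIT     0x80000000u

/* Original shape: the shift appears on both sides of the branch. */
uint32_t crc_two_shifts(const uint8_t *message, size_t nBytes)
{
    assert(message != NULL || nBytes == 0);
    uint32_t remainder = 0;
    for (size_t byte = 0; byte < nBytes; ++byte) {
        remainder ^= (uint32_t)message[byte] << 24;
        for (int bit = 8; bit > 0; --bit) {
            if (remainder & TOPBIT)
                remainder = (remainder << 1) ^ POLYNOMIAL;
            else
                remainder = (remainder << 1);
        }
    }
    return remainder;
}

/* Reworked shape: note the top bit, shift once, then xor only if it was set. */
uint32_t crc_one_shift(const uint8_t *message, size_t nBytes)
{
    assert(message != NULL || nBytes == 0);
    uint32_t remainder = 0;
    for (size_t byte = 0; byte < nBytes; ++byte) {
        remainder ^= (uint32_t)message[byte] << 24;
        for (int bit = 8; bit > 0; --bit) {
            uint32_t top = remainder & TOPBIT;
            remainder <<= 1;
            if (top)
                remainder ^= POLYNOMIAL;
        }
    }
    return remainder;
}
```

Both shapes return the same CRC. For scale, the two benchmark figures differ by 65532 cycles, which is roughly 4 cycles per byte over the $4000 (16384) bytes processed.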

Now that this easy68k thing is working, it might be possible to use it to help develop the loader we need. We would need to fake the data buffer as there is no DLC, but it'll be nice to be able to see the copy and handoff part run in the sim.
Attachments
easy68k crc32 test and memory image.zip
(338.32 KiB) Downloaded 49 times