Thanks, that's unexpected but neat. Wonder where the crossover is...

With larger inputs and larger B1 values, the speedup is even more dramatic:
$ echo "2^2267-1" | ./ecm -c 5 43e6
GMP-ECM 7.0.4 [configured with GMP 6.2.0, --enable-asm-redc] [ECM]
Input number is 2^2267-1 (683 digits)
Using B1=43000000, B2=198654756318, polynomial Dickson(12), sigma=0:12068850290356100037
Step 1 took 546568ms
Step 2 took 163563ms
vs.
$ echo "2^2267-1" | ./ecm -c 5 43e6
GMP-ECM 7.0.4 [configured with GMP 6.2.0, GWNUM 29.8, --enable-asm-redc] [ECM]
Due to incompatible licenses, this binary file must not be distributed.
Input number is 2^2267-1 (683 digits)
Using B1=43000000, B2=198654756318, polynomial Dickson(12), sigma=0:16623107045151173302
Step 1 took 250555ms
Step 2 took 163852ms
Over twice as fast in Step 1 (the B1 stage)! Step 2 is essentially unchanged.
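
For anyone who wants to redo the arithmetic, here is a quick Python sketch that pulls the "Step N took Mms" timings out of the two runs above and computes the ratios. The helper name `step_times_ms` is made up for illustration and is not part of GMP-ECM; the timings are the ones pasted in this post.

```python
import re

def step_times_ms(log: str) -> dict[int, int]:
    """Return {step number: milliseconds} parsed from 'Step N took Mms' lines."""
    return {int(n): int(ms) for n, ms in re.findall(r"Step (\d+) took (\d+)ms", log)}

# Timings copied from the two runs in this post (without and with GWNUM).
plain = step_times_ms("Step 1 took 546568ms\nStep 2 took 163563ms")
gwnum = step_times_ms("Step 1 took 250555ms\nStep 2 took 163852ms")

# Per-step speedup: time without GWNUM divided by time with GWNUM.
for step in sorted(plain):
    print(f"Step {step}: {plain[step] / gwnum[step]:.2f}x")

# Speedup for the whole curve (Step 1 + Step 2).
total = sum(plain.values()) / sum(gwnum.values())
print(f"Whole curve: {total:.2f}x")
```

Since Step 2 takes the same time either way, the gain for a full curve comes out lower than the Step 1 gain: roughly 1.7x overall versus roughly 2.2x for Step 1 alone.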