Skip to content

Generate better start values for nth-root#598

Open
czurnieden wants to merge 4 commits into
libtom:developfrom
czurnieden:faster_nroot
Open

Generate better start values for nth-root#598
czurnieden wants to merge 4 commits into
libtom:developfrom
czurnieden:faster_nroot

Conversation

@czurnieden
Copy link
Copy Markdown
Contributor

This is the implementation I talked about in DCIT/perl-CryptX#123 with @danaj
The runtime is up to standard now.
Needs some work (there is a TODO list at the top of mp_root_n.c, but mostly chores).

@danaj
Copy link
Copy Markdown

danaj commented Apr 24, 2026

I'm getting a failure with a = 2147483645^23 b = 23. The start value t2 = 2147483634 is too low.
[MacOS 26, M1, 64-bit, gcc-15]

@czurnieden
Copy link
Copy Markdown
Contributor Author

I had the angst-allowance at the wrong place. Should be added to the log(a).
Worst thing: had it right before ;-)

The number 2147483645^23 is well chosen: log2(2147483645^23) ~ 712.99999995364 so we got 712.99999995364/23 ~ 30.99999999798, which gets got caught in all of the rounding errors. Adding 1 (one) to o the log gives 713.99999995364/23 ~ 31.043478258852, resulting in a roughly 10% higher start-value in theory and is also larger than the actual root. 1/2should do it, too

I also found an nasty little bug if root^base == input, see #599

Thanks for taking a look!

@danaj
Copy link
Copy Markdown

danaj commented Apr 24, 2026

9999999999900000000000000 ^ 1/53 is coming into the loop with t2 = 2. We used to have the comment "Start value must be larger than root" and this isn't true since 2 is the result we're looking for. It doesn't seem to be working right, but with t2=3 it's fine. Sorry for not triaging past that.

@czurnieden
Copy link
Copy Markdown
Contributor Author

@danaj

Sorry for not triaging past that

Nah, no reason, I wrote most of it, so it's all on me ;-)

That seems to have been fixed by #599. Will put it in this PR now.

@czurnieden
Copy link
Copy Markdown
Contributor Author

Should work now. (famous last words)
Which leads to the question: worth it?

 - Implementation of a fixed point exp2 function to
   get better start-values
 - fixed treatment of perfect powers
 - fixed 0^(1/x) == 0 with x != 0
 - added test for b == 0 (division by zero)
@czurnieden
Copy link
Copy Markdown
Contributor Author

Could it be that I was a little bit behind? ;-)

@czurnieden
Copy link
Copy Markdown
Contributor Author

This branch was a bit of a quick hack. Seemed as if the fixed point exp2 function was not up to par. Fixed that with a thoroughly tested new one.

About 8 decimal digits over the whole domain [0,1[ , even near one.

exp2m1_errors

You don't not have to believe me, you can try for yourself, put some code below for your convenience.

A problem^Wnuisance is the variable precision of s_mp_fp_log which is the reason for some oddities in the actual code. Advantage: it is a Q0.32 using uint32_t which is always available since 8-bit is not supported anymore.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define MP_FRACBITSQ032 (sizeof(uint32_t) *(size_t)CHAR_BIT)

static uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
   uint32_t a_high, a_low, b_high, b_low, r_high, r_mida, r_midb, r_low, carry;
   /* We have uint64_t */
#if ( MP_DIGIT_BIT > 16)
   return (((uint32_t)((uint64_t)a*b)) >> MP_FRACBITSQ032);
#else
   a_high = a >> 16;
   a_low = a & 0xFFFF;
   b_high = b >> 16;
   b_low = b & 0xFFFF;

   r_high = a_high * b_high;
   r_mida = a_high * b_low;
   r_midb = a_low * b_high;
   r_low = a_low * b_low;

   carry = ((r_mida & 0xFFFF) + (r_midb & 0xFFFF) + (r_low >> 16)) >> 16;
   r_low = r_high + (r_mida >> 16) + (r_midb >> 16) + carry;
   return r_low;
 #endif
}


static uint32_t fixed_exp2_u32(uint32_t x)
{
   /* Remez coefficients for 2^x - 1 over [0, 1[ scaled to Q32 */
   uint32_t r;
   if (x == 0u) {
      return 0u;
   }

   r = 0xe5867;
   r = mulhi_u32(r, x) + 0x512969;
   r = mulhi_u32(r, x) + 0x27ab6f0;
   r = mulhi_u32(r, x) + 0xe33f3a6;
   r = mulhi_u32(r, x) + 0x3d7fbfd4;
   r = mulhi_u32(r, x) + 0xb17213ac;
   r = mulhi_u32(r, x) + 0xb;
   return r;
}


static double f2f(uint32_t a)
{
   /* 2^31 for Q1.31 4294967296*/
   return a / 4294967296.;
}

#include <math.h>
int main(void)
{
   double a, res, ref, aerr, rerr, maxabserr = 0.0, maxrelerr = 0.0;
   uint32_t x, start, end;

   start = 0x00000000;
   end   = 0xFFFFFFFF;
   fprintf(stderr,"Testing fixed_exp2 with inputs in [%.20f, %.20f)\n",
           f2f(start), f2f(end));

   printf("# input                abs. err.              rel. err.\n");
#ifdef FORGNUPLOT
   for (x = start; x < end; x+=0xffff) {
#else
   for (x = start; x < end; x++) {
#endif
      a = f2f(x);
      ref = exp2(a);
      res = f2f(fixed_exp2_u32(x)) + 1.0;
      aerr =  res - ref;
      rerr = fabs(aerr)/ref;
#ifdef FULLOUTPUT
      printf("%.15f %.15f  %.15f %.15f %.15f\n", a, res, ref, aerr, rerr);
#elif ((defined ERRORSONLY) || (defined FORGNUPLOT))
      printf("%.15f  %.15f %.15f\n", a, aerr, rerr);
#endif
      if (fabs(aerr) > fabs(maxabserr)) {
         maxabserr = aerr;
      }
      if (rerr > maxrelerr) {
         maxrelerr = rerr;
      }
   }
   fprintf(stderr, "max. abs. err = %g, max. rel. err. %g\n", maxabserr, maxrelerr);
   return EXIT_SUCCESS;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants