2016-06-05, 20:31   #1
debrouxl

Improving the queue management.

TL;DR: we're currently experiencing issues, we know we need to improve, and we need both a bit more design feedback and some coding help to speed up the process. TIA.

As you may have noticed in the past few weeks, the 14e and 15e parts of the NFS@Home grid have been suffering from a series of client hunger episodes. That is a consequence of the growing pains of feeding the 15e and especially the 14e clients with relatively small tasks. It's a problem we need to solve, preferably in a durable manner - and possibly at a wider scale than NFS@Home.

Part of the workload and pain comes from the fact that the process, inherited and slightly improved from RSALS, is largely manual. The tasks of creating entries, entering number information, changing the number state multiple times, expanding the range of q values, updating post-processor reservations and updating factor information are repetitive, wasteful and error-prone - and nowadays, they seem to fall on the shoulders of too few people.
Giving more people access to the management infrastructure should solve the immediate scalability problem, but does nothing about the repetitiveness of the process. We need to do better, and we know we can do better - for instance, Makoto Kamada's near-repdigit management pages let users reserve numbers and enter factors by themselves, which is a pretty good thing (whereas, as it stands, our management infrastructure gives access in an all-or-nothing fashion).

I started the discussion several weeks ago in the middle of the queue management topic, and jyb picked up on my post.

I have slowly been setting myself up to improve the management infrastructure: gathering the production code, starting preparatory tasks for unifying the 14e and 15e management databases and code, and setting up a reproducible testing environment. All of that is backed by a private GitHub repository, of course.
That's the current state of the work. The pages cannot be rendered yet because no stub has been made for the BOINC DB access code. But the real, functional work is going to begin soon, and I'd like to make sure I'm not missing any functional requirements to begin with - especially regarding the roles.

A summarized excerpt of the todo list is as follows:
  • factoring out the duplicated code and databases behind 14e and 15e, creating a third, unified page which shall be used for both queues for now, and possibly for the larger siever or new task types (distributed polynomial selection?) in the future;
  • building up the Role-Based Access Control system. I identified four initial roles:
    1. "admin" role: ability to delete numbers, to queue extra-wide ranges - should not be used most of the time;
    2. limited role 1 "queue managers": the existing single role minus the special abilities allowed by the admin role. Queue managers would be tasked with starting enough tasks to feed the grid, in a fair order if possible (use multiple number sources when there's enough ECM power, let ECM queues rebuild at other times, etc.);
    3. limited role 2 "scientist": say, William, Andrey and other project managers or frequent contributors (e.g. Sean), for creating entries and filling them in, but not moving them to QUEUED for SIEVING or later;
    4. limited role 3 "post-processor": post-processors could reserve tasks and post factor size information, and unreserve their own tasks, but not unreserve others' tasks;
    Of course, some user accounts could have multiple limited roles at once. And perhaps post-processor should be expanded with the ability to enter ECM work information, if we decide to track that, e.g. the way the near-repdigit project does.
  • automatically deducing some information from the poly, such as number value, SNFS / GNFS, difficulty, number of bits, etc.;
  • adding warnings upon events such as attempting to create an entry with the same internal name as an existing one, attempting to expand the range of q values above e.g. 500M for 14e (which probably indicates that an extra zero slipped in), or entering a polynomial containing a number whose value already exists (or, with some more work, one which has clear-cut parameter mismatches, such as 32-bit LP tasks with rlim < 100M, etc.).
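
To make the last two items more concrete, here is a minimal sketch of what the deduction and warning logic could look like. It is written in Python purely for illustration (the real code would be PHP), and it assumes GGNFS-style `key: value` poly lines with fields such as `n`, `type`, `rlim` and `lpbr`. The function names, the `Q_WARN_ABOVE` table and the warning texts are all hypothetical; only the 500M threshold for 14e and the "32-bit LPs with rlim < 100M" check come from the list above. SNFS difficulty is deliberately left out, since it cannot be read off the number alone.

```python
# Sketch only: field names assume GGNFS-style "key: value" poly lines.

def parse_poly(text):
    """Return a dict of lower-cased keys to string values; '#' starts a comment."""
    fields = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()
    return fields


def deduce_info(fields):
    """Derive what can be computed directly from the number itself."""
    n = int(fields["n"])
    return {
        "n": n,
        "digits": len(str(n)),
        "bits": n.bit_length(),
        # Taken from the poly when present; not inferred from the coefficients.
        "type": fields.get("type", "unknown").lower(),
    }


# Warn when q goes above this value. Only the 14e figure comes from the post.
Q_WARN_ABOVE = {"14e": 500_000_000}


def check_entry(name, fields, siever, q_end, existing_names, existing_values):
    """Return a list of human-readable warnings (empty if nothing looks off)."""
    warnings = []
    if name in existing_names:
        warnings.append("internal name already in use: " + name)
    if int(fields["n"]) in existing_values:
        warnings.append("a number with this value already exists")
    limit = Q_WARN_ABOVE.get(siever)
    if limit is not None and q_end > limit:
        warnings.append("q range ends above %d for %s (extra zero?)" % (limit, siever))
    if (fields.get("lpbr") == "32" and "rlim" in fields
            and int(fields["rlim"]) < 100_000_000):
        warnings.append("32-bit large primes with rlim < 100M")
    return warnings


# Toy example: a tiny "number" and a q range with one zero too many.
poly = "n: 1000003\ntype: snfs\nrlim: 50000000\nlpbr: 32\n"
fields = parse_poly(poly)
print(deduce_info(fields))
for w in check_entry("demo", fields, "14e", 5_000_000_000, {"demo"}, {1000003}):
    print("WARNING:", w)
```

Returning warnings as a list rather than raising errors keeps them advisory: the page can display them and still let a manager go ahead when the unusual value is intentional.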

The end goal is to eliminate repetitive tasks and make the project's management more scalable. And throwing out an idea: if tighter integration between the yoyo@home and NFS@Home queues is seen as beneficial to both projects, I'd say, why not?

For coders: the main technologies involved in the current infrastructure and the planned improved infrastructure are PHP 5.x and MySQL 5.5 (used by BOINC), currently on Ubuntu 14.04 LTS. IIRC, the web server is Apache 2.x, but the queue management code doesn't care about that, as long as it somehow sets PHP_AUTH_USER, for the not-yet-written RBAC.
The todo/wish list currently mentions database tables for users and their roles, but if the set of users is narrow and static enough (which it should be), hard-coded PHP associative arrays are clearly simpler...
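
As a rough illustration of the hard-coded option, here is a minimal sketch of the role lookup. It is in Python for brevity; the dictionaries map directly onto PHP associative arrays, and `user` stands for whatever the web server puts in PHP_AUTH_USER. The permission names, the user names and the role assignments are all made up for the example. The four roles themselves are the ones from the todo list.

```python
# Sketch only: permission names and users are hypothetical placeholders.

ROLE_PERMISSIONS = {
    "admin": {"delete_number", "queue_extra_wide_range"},
    "queue_manager": {"queue_for_sieving", "expand_q_range", "change_state"},
    "scientist": {"create_entry", "edit_entry"},
    "post_processor": {"reserve", "post_factor_info", "unreserve_own"},
}

# One account may hold several roles at once.
USER_ROLES = {
    "root_user": {"admin", "queue_manager"},
    "alice": {"scientist", "post_processor"},
    "bob": {"post_processor"},
}


def can(user, permission):
    """True if any role held by `user` grants `permission`."""
    roles = USER_ROLES.get(user, set())  # unknown users hold no roles
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def can_unreserve(user, reserved_by):
    """Post-processors may release their own reservations, not others'."""
    return can(user, "unreserve_own") and user == reserved_by


print(can("alice", "create_entry"))     # alice is a scientist
print(can("alice", "delete_number"))    # but not an admin
print(can_unreserve("bob", "alice"))    # bob cannot release alice's task
```

Checking permissions rather than role names in the page code means a later move to database tables would only replace the two dictionaries, not the call sites.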

Thanks in advance for your input and your help.

Last fiddled with by debrouxl on 2016-06-05 at 20:47