Notebook Instance Reverse SSH and HTTP Tunnels.
2019-10-15, 16:35 #2 chalsall
It sure is nice having rich interactive shell access into these instances... It makes development work much, much easier... BTW, I've found that running mprime concurrently with mfaktc has ***no*** impact on the latter's throughput, even with Kaggle's P100s.
2019-10-15, 20:10 #3 Dylan14
There is a typo in the sshd.pl script, line 129.
Code:
Log(1, "Note that when you first connect, you will have to anwer \"yes\"");
I think you meant to say "answer" instead. In PuTTY there is an option to just connect once to the tunnel, instead of saving the credentials. Not so much with ssh in Linux.
2019-10-15, 20:27 #4 chalsall
chalsall
If I May

"Chris Halsall"
Sep 2002

9674 Posts

Quote:
 Originally Posted by Dylan14 There is a typo in the sshd.pl script, line 129.
LOL... Thanks. Fixed.

And thanks for helping test this; very much appreciated!

Good to know about the PuTTY option; I'll include that in the further documentation I'm going to put on the site (with a smell checker enabled for that work...).

BTW, have you clicked on the link for the HTTP tunnel yet? Any additional information you think would be good there?

I plan to do some simple rrdtool work in the instance, to be able to plot CPU, GPU, RAM and FS usage during the lifespan of the instance. This will be displayed as a graph on the top of the page.

2019-10-15, 20:54 #5 Dylan14
I have the http tunnel open. One thing I would suggest is that the page should automatically refresh every so often (much like top does). Maybe once every 30 seconds or so. One advantage of this tunnel is the fact that you can very easily edit files using emacs, vi, etc... Whereas on the Colab site you can't really edit files so well.
2019-10-15, 21:02 #6 chalsall
Quote:
 Originally Posted by Dylan14 I have the http tunnel open. One thing I would suggest is that the page should automatically refresh every so often (much like top does). Maybe once every 30 seconds or so.
Yup... I plan to use some simple AJAX to update the data.

Quote:
 Originally Posted by Dylan14 One advantage of this tunnel is the fact that you can very easily edit files using emacs, vi, etc... Whereas on the Colab site you can’t really edit files so well.
Exactly! Also, "tail -f" is your friend!!!

Don't forget you can have multiple, parallel SSH sessions into your instance(s)...

2019-10-17, 00:58 #7 chalsall
So I got a little distracted with rrdtool today... So for anyone who's using my reverse tunnels, be sure to click on the HTTP link into your instance. I've added pretty graphs... This is about three hours into a GPU72_TF instance run, with mprime launched in parallel by way of the command line. No slow down in mfaktc. Also, I'm not sure it's been mentioned explicitly before, but so everyone knows Colab gives you one (1#) core, HT enabled. Kaggle gives you two (2#) cores, again hyperthreaded.
2019-10-17, 02:36 #8 EdH
EdH

"Ed Hall"
Dec 2009

53·71 Posts

Quote:
 Originally Posted by chalsall So for anyone who's using my reverse tunnels, be sure to click on the HTTP link into your instance. I've added pretty graphs... This is about three hours into a GPU72_TF instance run, with mprime launched in parallel by way of the command line. No slow down in mfaktc. Also, I'm not sure it's been mentioned explicitly before, but so everyone knows Colab gives you one (1#) core, HT enabled. Kaggle gives you two (2#) cores, again hyperthreaded.
I was just checking all this out while I was waiting for CADO-NFS to compile in my most recent test. Still many things I need to figure out, but all appears to be working as intended from my viewpoint.

Excellent!

2019-10-18, 21:41 #9 chalsall
Quote:
 Originally Posted by chalsall I've added pretty graphs...
So, today I spent some time with IPC between the tunneled instances and the iROOT server.

The same script which collects the CPU and RAM usage for display on the web pages served by the instances' web server also sends back the same data to iROOT every five minutes.

These are currently the values from a "vmstat" run, but I plan to increase the dataset to include uptime, number of logged-in users, file system status, etc. I ***won't*** be collecting data like "ps auxw" etc. Integer values only.
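For anyone curious what "integer values only" from a vmstat run might look like in practice, here is a minimal sketch. It is not the script running in the tunnels; the function names and the sample text are made up for illustration, and it assumes the usual procps vmstat layout (a banner line, a column-name line starting with "r", then sample rows).

```python
import subprocess
from typing import Dict

# Illustrative sample only; real numbers will differ on every instance.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 9876543  12345 678901    0    0     5    10  100  200 50  2 47  1  0
"""


def parse_vmstat(output: str) -> Dict[str, int]:
    """Map each vmstat column name to the integer in the last sample row."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    # The column-name line is the one whose first token is "r".
    header = next(ln.split() for ln in lines if ln.split()[0] == "r")
    values = lines[-1].split()
    return {name: int(value) for name, value in zip(header, values)}


def collect() -> Dict[str, int]:
    """Run vmstat once and parse it (assumes vmstat is on the PATH)."""
    result = subprocess.run(["vmstat"], capture_output=True, text=True, check=True)
    return parse_vmstat(result.stdout)
```

Calling parse_vmstat(SAMPLE) gives a flat dictionary of integers keyed by column name, which is the kind of payload that can be posted back without exposing process names or command lines.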

I will need to add this to the Privacy Policy statement on the site. And if anyone doesn't want this kind of data sent back, please let me know. There's a mechanism to "opt-out" of the telemetry stream, I just haven't added it to the iROOT UI yet.

I'm currently in the process of exposing the collected data on each "View Tunnel" page, by way of the same type of graphs you see from within the instance.

One nice thing about doing this is the data is persistent. I have the rrdtool database configured to store a year's worth of data (four-hour spans after a month).
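As a rough illustration of the arithmetic behind that retention, the sketch below works out the two round-robin archive (RRA) definitions such a setup would need. The 300-second step is an assumption taken from the five-minute reporting interval, the 30-day "month" and the AVERAGE consolidation with an xff of 0.5 are guesses, and rra_args is a made-up helper name; the actual database may be configured differently.

```python
from typing import List

STEP = 300     # assumed seconds between samples (five-minute telemetry)
DAY = 86_400   # seconds in a day


def rra_args(step: int = STEP, fine_days: int = 30, coarse_days: int = 365,
             coarse_span: int = 4 * 3600) -> List[str]:
    """Build RRA definitions in rrdtool's RRA:CF:xff:steps:rows form."""
    fine_rows = fine_days * DAY // step          # one row per raw sample
    steps_per_coarse = coarse_span // step       # samples folded into one row
    coarse_rows = coarse_days * DAY // coarse_span
    return [
        f"RRA:AVERAGE:0.5:1:{fine_rows}",
        f"RRA:AVERAGE:0.5:{steps_per_coarse}:{coarse_rows}",
    ]


print(rra_args())
# → ['RRA:AVERAGE:0.5:1:8640', 'RRA:AVERAGE:0.5:48:2190']
```

Under those assumptions the first archive keeps 8640 five-minute rows (30 days) and the second keeps 2190 four-hour rows (365 days), so the file size is fixed no matter how long the instances run.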

It was a bit annoying only having this data on the instance. I couldn't "auto-refresh" the web page (to simulate a "real-time" experience), or else immediately after the instance was shut down the page would reload with an error message.

Now we have a log of the instances coming up, working, shutting down.

Rinse and repeat...

2019-10-19, 01:40 #10 Dylan14
The graphs are a nice touch to the server page. Now, so far I have only played with a CPU instance with the ssh tunneling. If say, a person has a GPU instance, can the usage of the GPU be tracked as well (without too much effort, of course)? With that, we could potentially see how the GPUs are doled out in Colaboratory, so that we can better time the execution of our code to get the good T4s.
2019-10-19, 14:23 #11 chalsall
Quote:
 Originally Posted by Dylan14 If say, a person has a GPU instance, can the usage of the GPU be tracked as well (without too much effort, of course)?
Trivial to plot the data. The problem I'm having is I can't figure out how to get the data!

"!nvidia-smi" works fine from within the Notebook, but the same command at the console returns "Failed to initialize NVML: Driver/library version mismatch".

If anyone can tell me a command that works, it would be appreciated. If someone doesn't offer a simple fix, I'll look at bringing in a compatible version in the SSH payload.
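To be clear, the following does not fix the version mismatch. It is only a sketch of how the numbers could be pulled out once a working nvidia-smi is reachable from the console, using its CSV query mode. The choice of fields and the names parse_gpu_csv and read_gpu are illustrative assumptions, and only the first GPU's line is read.

```python
import subprocess
from typing import Dict, Optional

# Fields requested from nvidia-smi; chosen for illustration.
FIELDS = ["utilization.gpu", "memory.used", "temperature.gpu"]


def parse_gpu_csv(line: str) -> Dict[str, int]:
    """Turn one 'csv,noheader,nounits' line into integer readings."""
    values = [part.strip() for part in line.strip().split(",")]
    return {name: int(value) for name, value in zip(FIELDS, values)}


def read_gpu() -> Optional[Dict[str, int]]:
    """Query the first GPU; return None if nvidia-smi is missing or fails."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        return None  # binary not found or not executable
    if result.returncode != 0 or not result.stdout.strip():
        return None  # e.g. the NVML initialization error
    return parse_gpu_csv(result.stdout.splitlines()[0])
```

Returning None on failure lets the collector simply skip the GPU graph on CPU-only instances, or on consoles where NVML still refuses to initialize.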

