Android in wireguardland: pushing extensions down the firefox hole using OCR
Written on 2023-05-24
Chris Dawson is a dad and writer from Portland, OR. Personal blog

I’m really sorry for the tortured attempt at being Lewis Carroll, but let’s just admit there is a tiny bit of alliteration and move on.

This post describes one of my favorite hacks ever:

  1. On my Android phone, take a screenshot of the Android wireless debugging dialog.
  2. In the background, that screenshot is synced using syncthing to a desktop machine miles away.
  3. That desktop machine uses optical character recognition (OCR) to retrieve the IP address and port from the wireless debugging dialog, then connects to my Android device over a wireguard network.
  4. That desktop machine then pushes a Firefox add-on into the Firefox browser on my Android phone.

If, while reading this post, you think of a way to keep Firefox Android add-ons loaded persistently, I would love to hear about it, and I can keep the discoveries of this post fondly in the back of my head…

Plzat.me is a way to subscribe to Hacker News threads and people. When a thread you care about gets updated, you get an email summary. When a smart person comments on something interesting, you get an email that notifies you of that something.

To make this easy, Plzat.me works as a browser extension that adds a "@" sign next to threads when you visit Hacker News. You click on the "@" and you subscribe to that thread. You can install the add-on here.

Have you ever read something on HN from your mobile phone and then emailed it to yourself to read later? I wanted an easy way to track those as I jump back and forth between my phone and desktops. Plz@Me makes it easy to subscribe to those threads, organize them, and get updates automatically (so you can reply as soon as possible and keep the conversation going).

Firefox on Android works with add-on extensions! It is so liberating to have that kind of control over your mobile browsing experience. But the experience isn’t optimal when you are developing an add-on: if the browser gets restarted, you need to re-install the extension. And that happens often, because Android kills the browser when the system needs to reclaim memory. It’s easy to reinstall using adb and the web-ext tool, but that’s not possible when I’m out and about in the world with my phone.

I started to investigate whether there was a way to automatically connect to my phone when I needed the extension reinstalled. ADB has this really nifty feature called “wireless debugging.” You can turn on wireless debugging and then, from your desktop or laptop, pair your computer with the Android device. This works if you are sitting next to the paired computer, but it also works over wireguard, even if the machine and the phone are on different continents! That gave me the idea that perhaps I could automate installing the add-on without having my laptop with me, because I have a desktop machine in my office that is always connected.

But then I ran into some roadblocks. First, the only reliable way I could connect to my Android device was a two-step process: first the adb pair command, then the adb connect command. I assumed I could just copy the pair code into an SSH connection made from the same Android device. But that did not work: if you switch away from the dialog, the pair dialog disappears and you can no longer use the pair code to connect. I thought perhaps I could do something tricky like copy the code into my copy/paste buffer and then write an Android application to pull that code in the background, but the dialog does not have selectable text fields, and the thought of re-reading documentation on Android background services made my skin crawl. I was completely blocked no matter what I tried.
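For anyone who has not used wireless debugging before, the two-step dance looks roughly like this when driven from Node. This is a minimal sketch, not the author's script: `adbCommands`, `run`, and `pairAndConnect` are hypothetical helpers, and it assumes `adb` is on the PATH of the desktop machine. The pairing port and the connect port are different, which is why both appear.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: argument lists for the two adb steps.
// `adb pair` takes HOST:PORT followed by the pairing code;
// `adb connect` takes HOST:PORT, where PORT is the (different) connect port.
function adbCommands(host, pairPort, pairCode, connectPort) {
  return [
    ['pair', `${host}:${pairPort}`, String(pairCode)],
    ['connect', `${host}:${connectPort}`],
  ];
}

// Run a command, streaming its output; resolve with the exit code.
function run(cmd, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: 'inherit' });
    child.on('error', reject); // e.g. adb is not installed
    child.on('close', resolve);
  });
}

// Pair first, then connect, strictly in that order.
async function pairAndConnect(host, pairPort, pairCode, connectPort) {
  const exitCodes = [];
  for (const args of adbCommands(host, pairPort, pairCode, connectPort)) {
    exitCodes.push(await run('adb', args));
  }
  return exitCodes;
}
```

A call would look like `pairAndConnect('10.42.0.12', '46119', '080214', connectPort)`, where `connectPort` is whatever the Wireless debugging screen shows.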

The idea that got me on the jazz

I’m not going to apologize for the following tortured references to an 80s TV show, so please stop reading if that bothers you.

After my initial attempts failed, I had a crazy idea: what if I took a screenshot, used OCR to pull the PIN from it, and had that trigger the command on my adb machine? It felt like a lot of bizarre ideas that would all have to come together. But I watched so much A-Team when I was a kid that I could hear the phrase “I love it when a plan comes together,” and Hannibal felt like he was moving my hands like a marionette.

First thing, I needed to find a way to synchronize screenshots in the background without leaving the pair dialog. I investigated whether Google Photos could do that, but then I remembered the nightmare of getting an API key for Google services, and said “I ain’t getting on that plane!”

As if a Murdock psychosis came over me, I heard a voice telling me about a tool called syncthing. I went to the repo, ran the docker command, and it booted. I assumed there would be a lot of trouble forwarding ports and reading docs on this new service. But, amazingly, syncthing is so well designed and so smart that it just works.

To use syncthing with an android device you simply give the syncthing GUI your ID (which can be shared using a QR code). Then you accept the connection and start sharing folders. The Syncthing daemon/service automatically connects to the outside using a STUN server, so you do not need to do any port forwarding. It was so simple and I figured it all out organically within the GUI.

I started taking screenshots and could see them syncing right away in the daemon GUI. I next needed to figure out how to get a notification when a new file went up. That turned out to be really simple as well. The syncthing gui has a place to generate an API key; after a few minutes of searching I found this API endpoint: rest/stats/folder.

That endpoint returns JSON that includes a lastFile object for each folder, and inside that object is the filename. My screenshot sync folder has the ID rc10j-tuxzg, so I use that key. I then wrote a simple nodejs script that polls that JSON every two seconds. If the filename changes, it knows a new screenshot has arrived.

"rc10j-tuxzg": {
  "lastFile": {
    "at": "2023-05-22T01:04:47Z",
    "filename": "Screenshot_20230521-181547.png",
    "deleted": false
  },
  "lastScan": "2023-05-22T16:30:04Z"
}
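Here is a sketch of that polling loop with the change detection pulled out into a pure function, so it is easy to test against the JSON above. It assumes Node 18 or newer (for the built-in `fetch`); `checkFolder`, `fetchFolderStats`, and `watchFolder` are hypothetical names, and the base URL and API key are placeholders. Unlike a bare `setTimeout` loop, it waits for each check (including any OCR work in `onNewFile`) to finish before polling again, so one slow screenshot is never picked up twice.

```javascript
// Decide whether the folder's lastFile changed since the previous poll.
// `stats` is the parsed JSON from GET /rest/stats/folder.
// Returns the filename to remember next time and whether it is new.
function checkFolder(stats, folderId, previous) {
  const lastFile = stats && stats[folderId] && stats[folderId].lastFile;
  const current = lastFile ? lastFile.filename : previous;
  const changed = Boolean(lastFile) &&
    !lastFile.deleted &&
    previous !== undefined && // first poll: just remember the filename
    current !== previous;
  return { current, changed };
}

// Requires Node 18+ for the global fetch. baseUrl/apiKey are placeholders.
async function fetchFolderStats(baseUrl, apiKey) {
  const res = await fetch(`${baseUrl}/rest/stats/folder`, {
    headers: { 'X-API-Key': apiKey },
  });
  if (!res.ok) throw new Error(`Syncthing returned HTTP ${res.status}`);
  return res.json();
}

// Poll forever. Each check, including the onNewFile handler, finishes before
// the next one starts, and errors are logged rather than crashing the loop.
async function watchFolder({ baseUrl, apiKey, folderId, intervalMs = 2000, onNewFile }) {
  let previous;
  for (;;) {
    try {
      const stats = await fetchFolderStats(baseUrl, apiKey);
      const { current, changed } = checkFolder(stats, folderId, previous);
      previous = current;
      if (changed) await onNewFile(current);
    } catch (err) {
      console.error('Polling failed:', err.message);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage would be something like `watchFolder({ baseUrl: 'http://10.42.0.20:8384', apiKey: process.env.SYNCTHING_API_KEY, folderId: 'rc10j-tuxzg', onNewFile: (name) => console.log('new screenshot', name) })`.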

That felt like the easy part. The hard part, I imagined, was going to be getting the text out of that image. But I was wrong. I installed tesseract, an open source OCR tool from Google. Within a few seconds of usage, with only this command, I could retrieve all the text:

tesseract st/Screenshots/$FILENAME --psm 11 --oem 3 output  

I didn’t read a bunch of man pages; I just used the first example I saw. And then I had this in the output.txt file:

Pair with device  
  
Wi—Fi pairing code  
  
080214  
  
IP address & Port  
  
10.42.0.12:46119
  
Cancel  
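Tesseract can also skip the output.txt file entirely: passing `stdout` as the output base makes it print the recognized text instead. A minimal sketch of calling it that way from Node, assuming `tesseract` is on the PATH; `tesseractArgs` and `ocrToString` are hypothetical helpers. Reading stdout also means two OCR runs can never overwrite each other's output.txt.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// Same flags as the command above, but with "stdout" as the output base
// so tesseract prints the text instead of writing output.txt.
// --psm 11 = sparse text, --oem 3 = default engine mode.
function tesseractArgs(imagePath) {
  return [imagePath, 'stdout', '--psm', '11', '--oem', '3'];
}

// Resolve with the recognized text; rejects if tesseract exits non-zero.
async function ocrToString(imagePath) {
  const { stdout } = await execFileAsync('tesseract', tesseractArgs(imagePath));
  return stdout;
}
```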

It was trivial to process that text and convert it to adb pair 10.42.0.12:46119 080214 (adb pair takes the address and port first, then the pairing code). I added that to my nodejs script and ran it. I went back to my Android phone and took a screenshot. And, magically, the dialog closed and Android indicated the pairing command succeeded!
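The parsing step fits in a small pure function. This is a sketch rather than the exact code used here: `parsePairingText` and `adbPairArgs` are hypothetical, and instead of relying on blank-line separators it scans the non-empty lines, keeps only digits in the pairing code (assuming the six-digit code shown in the dialog), and matches the address with a regular expression so stray spaces from OCR do not matter.

```javascript
// Parse tesseract's text for the "Pair with device" dialog.
// Returns { pairCode, host, pairPort } or null if anything is missing.
function parsePairingText(text) {
  // Ignore blank lines and trailing whitespace from the OCR output.
  const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
  let pairCode;
  let host;
  let pairPort;
  lines.forEach((line, i) => {
    if (/pairing code/i.test(line) && lines[i + 1]) {
      // The code is on the line after the label; drop any non-digits.
      pairCode = lines[i + 1].replace(/\D/g, '');
    }
    const m = line.replace(/\s+/g, '').match(/^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$/);
    if (m) {
      host = m[1];
      pairPort = m[2];
    }
  });
  if (!pairCode || pairCode.length !== 6 || !host || !pairPort) return null;
  return { pairCode, host, pairPort };
}

// Build the `adb pair HOST:PORT CODE` argument list.
function adbPairArgs({ host, pairPort, pairCode }) {
  return ['pair', `${host}:${pairPort}`, pairCode];
}
```

Returning `null` on anything incomplete leaves it to the caller to ask for another screenshot rather than running adb with half a command.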

From there it was short strokes to get the rest working. Initially I thought that I could take one screenshot of the pair dialog (which pairs on a different port than the regular adb connect command uses) and get both, but the background behind the dialog was too dark for tesseract to read the adb connect IP and port. So, I ended up making my script work in two steps, with two screenshots: the first, of the Wireless debugging screen, records the connect port; the second, of the pairing dialog, supplies the pairing code and port, after which the script pairs and then connects. And the one problem I found with tesseract was that it often confused 1 and 7, so if I saw those digits heavily represented in the pairing PIN, I usually closed and re-opened the dialog.
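The two-screenshot flow is essentially a two-state machine. A minimal sketch, assuming the Wireless debugging screen's OCR text contains a line starting with "Wireless debugging" (the same check the syncthing script below relies on) and that the port follows an "IP address & Port" label on both screens; `createSession` is a hypothetical name.

```javascript
// State 1: waiting for the Wireless debugging screen (gives the connect port).
// State 2: waiting for the pairing dialog (gives the pairing port and code).
// Returns a feed(text) function that yields the full result after state 2,
// and null otherwise.
function createSession() {
  let connectPort = null;

  // Port from the line after `label`, e.g. "10.42.0.12:37123" -> "37123".
  const portAfter = (lines, label) => {
    const i = lines.findIndex((l) => l.startsWith(label));
    const next = i >= 0 ? lines[i + 1] : undefined;
    const m = next && next.replace(/\s+/g, '').match(/:(\d+)$/);
    return m ? m[1] : null;
  };

  return function feed(text) {
    const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
    if (connectPort === null) {
      if (lines.some((l) => l.startsWith('Wireless debugging'))) {
        connectPort = portAfter(lines, 'IP address & Port');
      }
      return null;
    }
    const codeIdx = lines.findIndex((l) => /pairing code/i.test(l));
    const pairCode = codeIdx >= 0 && lines[codeIdx + 1]
      ? lines[codeIdx + 1].replace(/\D/g, '')
      : null;
    const pairPort = portAfter(lines, 'IP address & Port');
    const result = pairCode && pairPort ? { connectPort, pairPort, pairCode } : null;
    // Start over either way, mirroring how the syncthing script resets its state.
    connectPort = null;
    return result;
  };
}
```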

Once the connection is established, the regular web-ext script automatically installs the extension and launches the browser. I use this ALL the time; it’s magical.

The only downside is that you need to be on a wireless network. If you are just on cellular it does not work, because wireless debugging requires wifi, even though my wireguard connection technically could push new files up. But, when I’m browsing HN, I generally want to be on wifi anyway.

My scripts

Here are the scripts I’m using:

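#!/run/current-system/sw/bin/bash
# Usage: plazatme.sh CONNECT_PORT PAIR_PORT PAIR_CODE
# The phone's address (10.42.0.12) is hard-coded.

echo "Pairing device"
adb pair "10.42.0.12:$2" "$3"
sleep 2
echo "Connecting to device"
adb connect "10.42.0.12:$1"
sleep 2
echo "Rebuilding"
cd ~/Projects/plzatme-extension || exit 1
npm run build || true   # keep going even if the build step fails
sleep 2

# Give everything time to settle, then open Hacker News in the phone's browser.
sleep 15
adb shell am start -a android.intent.action.VIEW -d https://news.ycombinator.com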

And, the syncthing script:

const axios = require('axios');
const { spawn } = require('child_process');
const fs = require('fs');

const { SYNCTHING_API_KEY } = process.env;

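// Call `cb` every `timeout` milliseconds. Each run finishes before the next
// one is scheduled, so a slow OCR pass never overlaps the next poll, and an
// error (for example, Syncthing being briefly unreachable) is logged instead
// of crashing the process with an unhandled rejection.
async function loop(timeout = 1000 * 1, cb) {
    try {
        await cb();
    } catch (err) {
        console.error('Polling failed:', err.message);
    }
    setTimeout( () => {
        loop(timeout, cb);
    }, timeout );
}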

let lastFilename;

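// Run tesseract on a synced screenshot and resolve with the recognized text.
// "output" is the output base name, so tesseract writes ./output.txt.
function ocr(filename) {
    return new Promise( (resolve, reject) => {
        // tesseract st/Screenshots/Screenshot_20230519-171555.png --psm 11 --oem 3 output
        const tesseract = spawn('tesseract', [
            `st/Screenshots/${filename}`,
            '--psm', '11',   // sparse text: find as much text as possible
            '--oem', '3',    // default OCR engine mode
            'output'
        ]);

        tesseract.on('error', reject);  // e.g. tesseract is not installed
        tesseract.on('close', (code) => {
            console.log(`child process exited with code ${code}`);
            if (code !== 0) {
                reject(new Error(`tesseract exited with code ${code}`));
                return;
            }
            resolve(fs.readFileSync("./output.txt").toString());
        });
    });
}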

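// Pull the port out of an "IP:PORT" line, e.g. "10.42.0.12:46119" -> "46119".
// Whitespace from OCR is stripped first; matching after the colon also
// handles ports shorter than five digits, which slice(-5) did not.
function getPort(line = '') {
    const match = line.replace(/\s+/g, '').match(/:(\d+)$/);
    return match ? match[1] : undefined;
}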

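let port;
// First screenshot: the "Wireless debugging" screen. Remember the port on
// the line after "IP address & Port"; that is the port adb connect needs.
function processForDebugging(text) {
    console.log('Processing for debugging');
    const lines = text.split(/\n\s*\n/).map( line => line.trim() );
    lines.forEach( (line, i) => {
        if (line.startsWith("Wireless debugging")) {
            mode = "debugging";
        } else if (mode === "debugging" && line.startsWith("IP address & Port")) {
            port = getPort(lines[i+1] || '');
        }
    });
}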

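let pairCode;
let pairPort;
// Second screenshot: the "Pair with device" dialog. Grab the pairing code
// and the pairing port, which differs from the connect port.
function processForPairing(text) {
    console.log('Processing for pairing');
    const lines = text.split(/\n\s*\n/).map( line => line.trim() );
    lines.forEach( (line, i) => {
        const next = lines[i+1] || '';
        if (line.includes("pairing code")) {
            // Keep only digits in case OCR picked up stray characters.
            pairCode = next.replace(/\D/g, '');
        }
        if (line.startsWith('IP address & Port')) {
            pairPort = getPort(next.replace(/\s+/g, ''));
        }
    });
}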

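let mode = undefined;
// Called with the OCR text of every new screenshot. The first screenshot
// (the Wireless debugging screen) records the connect port and sets mode to
// "debugging"; the second (the pairing dialog) runs the adb and push scripts.
function launchFirefoxExtension(text) {
    if (mode === 'debugging') {
        processForPairing(text);
        if (pairCode && port && pairPort) {
            console.log('Ready for it!', port, pairCode, pairPort );
            // Copy the port: the shared state is reset below, long before
            // plazatme.sh finishes and push.sh needs the port.
            const _port = port;
            const plzatme =
                  spawn( '/home/xrdawson/plazatme.sh',
                         [port, pairPort, pairCode ],
                       );

            plzatme.on('error', (err) => {
                console.error('Could not start plazatme.sh:', err.message);
            });
            plzatme.stderr.on('data', (data) => {
                // Drain stderr so the child never blocks on a full pipe.
                // console.error('PLZATMEE', data.toString());
            });
            plzatme.stdout.on('data', (data) => {
                console.log('PLZATME', data.toString());
            });
            plzatme.on('close', (code) => {
                console.log('All finished with adb script', code);
                const push = spawn( '/home/xrdawson/push.sh', [ _port ] );
                push.on('error', (err) => {
                    console.error('Could not start push.sh:', err.message);
                });
                push.stdout.on('data', (data) => {
                    console.log('PUSH', data.toString());
                });
                push.stderr.on('data', (data) => {
                    console.error('PUSH', data.toString());
                });
                push.on('close', (code) => {
                    console.log('Finished with push', code);
                });
            });
        } else {
            console.log('Not everything is there', port, pairPort, pairCode );
        }
        // Start over for the next pair of screenshots, whether or not this worked.
        mode = pairPort = pairCode = port = undefined;
    } else {
        processForDebugging(text);
    }
    // console.log('Got text', text );
}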

async function retrieveAndProcess() {
    const response = await axios( 'http://10.42.0.20:8384/rest/stats/folder', {
        headers: {
            'X-API-Key': SYNCTHING_API_KEY
        }
    });
    const json = response.data;
    // The screenshot folder's ID in Syncthing is rc10j-tuxzg.
    const folder = json['rc10j-tuxzg'];
    if (!folder || !folder.lastFile) {
        return;
    }
    const { filename } = folder.lastFile;
    const previous = lastFilename;
    // Record the filename before the slow OCR step, so a poll that runs while
    // tesseract is still working does not treat the same screenshot as new.
    lastFilename = filename;
    if (previous && previous !== filename) {
        console.log('Got change, processing new screenshot');
        const text = await ocr(filename);
        launchFirefoxExtension(text);
    }
}

async function start() {
    loop(1000 * 2, retrieveAndProcess );
}

start();

The push.sh script just runs this from the extension directory:

web-ext run --target=firefox-android --firefox-apk=org.mozilla.firefox --android-device=$DEVICE
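push.sh relies on a `$DEVICE` variable whose value isn't shown here. One plausible way to fill it, assuming the device serial is the `HOST:PORT` string that `adb connect` was given (which is how adb lists a device connected that way), is to build the same web-ext invocation from Node. `webExtArgs` and `pushExtension` are hypothetical helpers, and the extension directory is a placeholder.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical: assumes the adb serial is "HOST:PORT", as produced by
// `adb connect HOST:PORT`. The flags match the push.sh line above.
function webExtArgs(host, connectPort) {
  return [
    'run',
    '--target=firefox-android',
    '--firefox-apk=org.mozilla.firefox',
    `--android-device=${host}:${connectPort}`,
  ];
}

// Run web-ext from the extension directory (`cwd`), streaming its output.
function pushExtension(host, connectPort, cwd) {
  return new Promise((resolve, reject) => {
    const child = spawn('web-ext', webExtArgs(host, connectPort), { cwd, stdio: 'inherit' });
    child.on('error', reject); // e.g. web-ext is not installed
    child.on('close', resolve);
  });
}
```

`web-ext run` normally keeps running so it can reload the extension when files change, so the returned promise settles only when web-ext exits.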