
SonOTA – Flashing Itead Sonoff devices via original OTA mechanism

Long story short

There’s now a script that lets you flash your Sonoff device via the original internal OTA upgrade mechanism, so there is no need to open the device or solder anything to get your custom firmware onto it.

This isn’t perfect (yet) — please mind the issues at the end of this post!

https://github.com/mirko/SonOTA

Credits

First things first: Credits!
The problem with credits is you usually forget somebody and that’s most likely happening here as well.
I read around quite a lot, gathered information and partially don’t even remember anymore where I read what (first).

Of course I’m impressed by the entire Tasmota project and what it enables one to do with the Itead Sonoff and similar devices.

Special thanks go to khcnz who helped me a lot in a discussion documented here.

I’d also like to mention Richard Burtons, with whom I didn’t interact directly; I only read his blog. That guy apparently was too bored by all the amazing tech stuff he was doing for a living, so he took a medical degree and is now working as a doctor, has a passion for horology (meaning, he’s building a turret clock), is sailing regattas with his own RS200, decompiles and reverse-engineers proprietary bootloaders in his spare time and wrote a new bootloader called rboot for the ESP8266 as a side project.

EDIT: Jan Almeroth already reversed some of the protocol in 2016 and also documented the communication between the proprietary EWeLink app and the AWS cloud. Unfortunately I only became aware of that great post after I already finished mine.

Introduction Sonoff devices

Quite recently the Itead Sonoff series, a bunch of ESP8266-based IoT home automation devices, was brought to my attention.

The ESP8266 is a low-power SoC designed especially for IoT purposes. It’s sold by Espressif and features a 32-bit processor with the Xtensa instruction set (licensed from Tensilica), an ASIC IP core and onboard WiFi.

Those Sonoff devices using this SoC basically expect high voltage input and therefore contain an AC/DC (5V) converter, the ESP8266 SoC and a relay switching the high voltage output.
They’re sold as wall switches (“Sonoff Touch”), E27 socket adapters (“Slampher”), power sockets (“S20 smart socket”) or, as the most basic and cheapest model, all that in a simple case (“Sonoff Basic”).
They also offer a bunch of sensor devices measuring temperature, power consumption, humidity, noise levels, fine dust, etc.

Though I’m rather sceptical about the whole IoT (development) philosophy, I always was (and still am) interested in low-cost and power-saving home automation which is completely and exclusively under my control.

That implies I’m obviously not interested in some random IoT devices necessarily being connected to some Google/Amazon/Whatever cloud, even less so if sensitive data is transmitted without me knowing (but very well suspecting) what it’s used for.

Guess what the Itead Sonoff devices do? Exactly that! They even feature Amazon Alexa and Google Nest support! And of course you have to use their proprietary app to configure and control your devices, via the Amazon cloud.

However, as said earlier, they’re based on the ESP8266 SoC, around which a great deal of OpenSource projects evolved. For some reason especially the Arduino community pounced on that SoC, enabling a much broader range of people to play around with and program for those devices. Whether that’s a good and/or bad thing is surely debatable.

I’ll spare you the details about all the projects I ran into, there’s plenty of cool stuff out there.

I decided to go for the Sonoff-Tasmota project which is quite actively developed and supports most of the currently available Sonoff devices.

It provides an HTTP and MQTT interface and doesn’t need any connection to the internet at all. As MQTT server (called broker in MQTT terminology) I use mosquitto, which I’m running on my OpenWrt WiFi router.

Flashing custom firmware (via serial)

Flashing your custom firmware onto those devices, however, always requires opening them, soldering a serial cable, pulling GPIO0 down to get the SoC into programming mode (which, depending on the device type, again involves soldering) and then flashing your firmware via serial.

Side note: Why do all those projects describing the flashing procedure name an “FTDI serial converter” as a requirement? Every serial TTL converter does the job.
And apart from the fact that FTDI is not a product but a company, it’s a pretty shady one. I’d just like to remind you of the “incident” where FTDI released new drivers for their chips which intentionally bricked clones of their converters.

For how to manually flash via serial (even though firmware replacement via OTA (kinda) works now, you still might want to unbrick or debug your device), the Tasmota wiki provides instructions for each of the supported devices.

Anyway, as I didn’t want to open and solder every device I intend to use, I took a closer look at the original firmware and its OTA update mechanism.

Protocol analysis

The first thing the device does after being configured (meaning, it got configured by the proprietary app and thereby now has internet access via your local WiFi network) is to resolve the hostname `eu-disp.coolkit.cc` and attempt to establish an HTTPS connection.

Though the connection uses SSL, the device doesn’t do any server certificate verification, so splitting the SSL connection and man-in-the-middling it is fairly easy.

As a side effect I ported the MITM project sslsplit to OpenWrt and created a separate “interception” network on my WiFi router. Now I only need to join that WiFi network and all SSL connections get split, with their payload logged and provided on an FTP share. Intercepting SSL connections never felt easier.

Back to the protocol: We’re assuming at this point the Sonoff device was already configured (e.g. by the official EWeLink app) which means it has joined our WiFi network, acquired IP settings via DHCP and has access to the internet.

The Sonoff device sends a dispatch call as HTTPS POST request to eu-disp.coolkit.cc including some JSON encoded data about itself:


POST /dispatch/device HTTP/1.1
Host: eu-disp.coolkit.cc
Content-Type: application/json
Content-Length: 152

{
  "accept":     "ws;2",
  "version":    2,
  "ts":         119,
  "deviceid":   "100006XXXX",
  "apikey":     "6083157d-3471-4f4c-8308-XXXXXXXXXXXX",
  "model":      "ITA-GZ1-GL",
  "romVersion": "1.5.5"
}

It expects a host, also JSON encoded, as an answer

HTTP/1.1 200 OK
Server: openresty
Date: Mon, 15 May 2017 01:26:00 GMT
Content-Type: application/json
Content-Length: 55
Connection: keep-alive

{
  "error":  0,
  "reason": "ok",
  "IP":     "52.29.48.55",
  "port":   443
}

which is used to establish a WebSocket connection

GET /api/ws HTTP/1.1
Host: iotgo.iteadstudio.com
Connection: upgrade
Upgrade: websocket
Sec-WebSocket-Key: ITEADTmobiM0x1DaXXXXXX==
Sec-WebSocket-Version: 13


HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: q1/L5gx6qdQ7y3UWgO/TXXXXXXA=

which subsequently will be used for further interchange.
Payload via the established WebSocket channel continues to be encoded in JSON.
The messages coming from the device can be classified into action-requests initiated by the device (which expect acknowledgements by the server) and acknowledgement messages for requests initiated by the server.
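
The classification above can be sketched in a few lines of Python. This is only an illustration, assuming that device-initiated requests always carry an "action" key while acknowledgements carry an "error" key and no "action" (which matches the captures in this post, but isn't verified beyond them). The function name classify is made up for this sketch.

```python
import json


def classify(raw: str) -> str:
    """Classify a JSON message received from the device."""
    msg = json.loads(raw)
    # Assumption: requests initiated by the device contain an "action" key.
    if "action" in msg:
        return "action-request"
    # Assumption: acknowledgements contain "error" but no "action".
    if "error" in msg:
        return "acknowledgement"
    return "unknown"
```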

The first requests are action-requests coming from the device:

1) action: register

{
  "userAgent":  "device",
  "apikey":     "6083157d-3471-4f4c-8308-XXXXXXXXXXXX",
  "deviceid":   "100006XXXX",
  "action":     "register",
  "version":    2,
  "romVersion": "1.5.5",
  "model":      "ITA-GZ1-GL",
  "ts":         712
}

responded by the server with
{
  "error":       0,
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "config": {
    "hb":         1,
    "hbInterval": 145
  }
}

As can be seen, messages from the server side also have an apikey field. It can be any generated UUID other than the one used by the device, as long as it’s used consistently in that WebSocket session.

2) action: date

{
  "userAgent":  "device",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "deviceid":   "100006XXXX",
  "action":     "date"
}

responded with
{
  "error":      0,
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "date":       "2017-05-15T01:26:01.498Z"
}

Pay attention to the date format: it is some kind of ISO 8601, but the parser is really picky about it. While Python’s datetime.isoformat() function, for example, returns a string that includes microseconds, the parser on the device will just fail parsing that. It also always expects the (actually optional) timezone to be specified as UTC, and only as a trailing Z (though according to the spec “+00:00” would be valid as well).
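
A minimal sketch of producing a timestamp in the shape shown in the capture (millisecond precision, trailing Z). The function name device_date is made up here, and whether the device accepts anything beyond this exact shape is untested.

```python
from datetime import datetime, timezone


def device_date(now=None):
    """Format a timezone-aware datetime like "2017-05-15T01:26:01.498Z"."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC; "now" is expected to be timezone-aware.
    now = now.astimezone(timezone.utc)
    # Cut microseconds down to milliseconds and append a literal "Z".
    millis = now.microsecond // 1000
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % millis
```

Building the string by hand avoids both the microseconds and the “+00:00” suffix that isoformat() would emit for a UTC datetime.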

3) action: update (the device tells the server its switch status, the MAC address of the access point it is connected to, signal quality, etc.)
This message also appears every time the device status changes, e.g. when it got switched on/off via the app or locally by pressing the button.

{
  "userAgent":      "device",
  "apikey":         "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "deviceid":       "100006XXXX",
  "action":         "update",
  "params": {
    "switch":         "off",
    "fwVersion":      "1.5.5",
    "rssi":           -41,
    "staMac":         "5C:CF:7F:F5:19:F8",
    "startup":        "off"
  }
}

simply acknowledged with
{
  "error":      0,
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX"
}

4) action: query — the device queries potentially configured timers
{
  "userAgent":  "device",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "deviceid":   "100006XXXX",
  "action":     "query",
  "params": [
    "timers"
  ]
}

as there are no timers configured the answer simply contains a "params":0 KV-pair
{
  "error":      0,
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "params":     0
}

That’s it – that’s the basic handshake after the (configured) device powers up.
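
The four replies above could be generated by something like the following sketch. It is not the code from the SonOTA script; handshake_reply is a hypothetical helper, and the hb/hbInterval values are simply copied from the capture without knowing which other values the device would accept.

```python
import uuid


def handshake_reply(msg, server_apikey, date_string):
    """Build the server's reply to one of the four handshake requests."""
    reply = {
        "error": 0,
        "deviceid": msg["deviceid"],
        "apikey": server_apikey,
    }
    action = msg.get("action")
    if action == "register":
        # Values copied from the captured session.
        reply["config"] = {"hb": 1, "hbInterval": 145}
    elif action == "date":
        reply["date"] = date_string
    elif action == "query":
        # No timers configured.
        reply["params"] = 0
    elif action != "update":
        raise ValueError("unexpected action: %r" % action)
    return reply


# One UUID per WebSocket session, different from the device's own apikey.
server_apikey = str(uuid.uuid4())
```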

Now the server can tell the device to do stuff.

Requests from the server carry a sequence number, which the device echoes back in its acknowledgement so the response can be mapped back to the actual request. It appears to be a UNIX timestamp with millisecond precision, which doesn’t seem like the best source for generating a sequence number (duplicates, etc.) but seems to work well enough.

Let’s switch the relay:

{
  "action":     "update",
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "userAgent":  "app",
  "sequence":   "1494806715179",
  "ts":         0,
  "params": {
    "switch":     "on"
  },
  "from":       "app"
}

{
  "action":     "update",
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "userAgent":  "app",
  "sequence":   "1494806715193",
  "ts":         0,
  "params": {
    "switch":     "off"
  },
  "from":       "app"
}

As mentioned earlier, each action-request is answered with a proper acknowledgement.

And, finally, what the server is now also capable of doing is telling the device to update itself:
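
As a rough sketch, the switch requests and the matching of acknowledgements by sequence number could look like this. The helper names (make_sequence, switch_request, is_ack_for) are made up for illustration; the field values mirror the captured messages.

```python
import time


def make_sequence(now_ns=None):
    """Millisecond UNIX timestamp as a string, as seen in the captures."""
    if now_ns is None:
        now_ns = time.time_ns()
    return str(now_ns // 1_000_000)


def switch_request(deviceid, apikey, state, sequence=None):
    """Build an "update" request that switches the relay on or off."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    return {
        "action": "update",
        "deviceid": deviceid,
        "apikey": apikey,
        "userAgent": "app",
        "sequence": sequence or make_sequence(),
        "ts": 0,
        "params": {"switch": state},
        "from": "app",
    }


def is_ack_for(ack, request):
    """True if the device's acknowledgement belongs to the given request."""
    return ack.get("error") == 0 and ack.get("sequence") == request["sequence"]
```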

{
  "action":     "upgrade",
  "deviceid":   "100006XXXX",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "userAgent":  "app",
  "sequence":   "1494802194654",
  "ts":         0,
  "params": {
    "binList":[
      {
        "downloadUrl":  "http://52.28.103.75:8088/ota/rom/xpiAOwgVUJaRMqFkRBsoI4AVtnozgwp1/user1.1024.new.2.bin",
        "digest":       "1aee969af1daf96f3f120323cd2c167ae1aceefc23052bb0cce790afc18fc634",
        "name":         "user1.bin"
      },
      {
        "downloadUrl":  "http://52.28.103.75:8088/ota/rom/xpiAOwgVUJaRMqFkRBsoI4AVtnozgwp1/user2.1024.new.2.bin",
        "digest":       "6c4e02d5d5e4f74d501de9029c8fa9a7850403eb89e3d8f2ba90386358c59d47",
        "name":         "user2.bin"
      }
    ],
    "model":    "ITA-GZ1-GL",
    "version":  "1.5.5"
  }
}

After successful download and verification of the image’s checksum the device returns:
{
  "error":      0,
  "userAgent":  "device",
  "apikey":     "85036160-aa4a-41f7-85cc-XXXXXXXXXXXX",
  "deviceid":   "100006XXXX",
  "sequence":   "1495932900713"
}

The downloadUrl field should be self-explanatory (the subsequent HTTP GET requests to those URLs contain some more data as CGI parameters, which however can be omitted).

The digest is a SHA-256 hash of the file and the name is the partition onto which the file should be written.
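
To illustrate how the three fields fit together, here is a sketch that builds such an upgrade request for locally served images. The function names and the base_url layout are assumptions for this example; only the message structure is taken from the capture above.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, as used in the "digest" field."""
    return hashlib.sha256(data).hexdigest()


def upgrade_request(deviceid, apikey, sequence, base_url, images, model, version):
    """Build an "upgrade" request.

    images is a list of (name, content) tuples, e.g. ("user1.bin", b"...").
    base_url is wherever our own server offers the files (hypothetical).
    """
    bin_list = [
        {
            "downloadUrl": "%s/%s" % (base_url, name),
            "digest": sha256_hex(content),
            "name": name,
        }
        for name, content in images
    ]
    return {
        "action": "upgrade",
        "deviceid": deviceid,
        "apikey": apikey,
        "userAgent": "app",
        "sequence": sequence,
        "ts": 0,
        "params": {"binList": bin_list, "model": model, "version": version},
    }
```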

Implementing server side

After some early approaches I decided to go for a Python implementation using the tornado webserver stack.
This decision was mainly based on it providing functionality for HTTP (obviously) as well as websockets and asynchronous handling of requests.

The final script can be found here: https://github.com/mirko/SonOTA

==> Trial & Error

1st attempt

As user1.1024.new.2.bin and user2.1024.new.2.bin look almost the same, let’s just use the same image for both, in this case a Tasmota build.

MOEP! Boot fails.

Reason: The Tasmota build also contains the bootloader, which the Espressif OTA mechanism doesn’t expect to be in the image.

2nd attempt

Chopping off the first 0x1000 bytes which contain the bootloader plus padding (filled up with 0xAA bytes).

MOEP! Boot fails.

Boot mode 1 and 2 / v1 and v2 image headers

The (now chopped) image and the original upgrade images appear to have different headers; even the very first byte (the files’ magic byte) differs.

The original image starts with 0xEA while the Tasmota build starts with 0xE9.

Apparently there are two image formats (called v1 and v2 or boot mode 1 and boot mode 2).
The former (older) one — used by Arduino/Tasmota — starts with 0xE9, while the latter (and apparently newer one) — used by the original firmware — starts with 0xEA.

The technical differences are very well documented by the ESP8266 Reverse Engineering Wiki project; for the flash format and the v1/v2 headers, see in particular the SPI Flash Format wiki page.

The original bootloader only accepts images starting with 0xEA while the bootloader provided by Arduino/Tasmota only accepts such starting with 0xE9.
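
A quick way to check which kind of image you are holding is to look at that first byte. The following sketch also includes the bootloader chopping from the 2nd attempt; the function names are made up, and nothing beyond the magic byte is inspected.

```python
BOOTLOADER_SIZE = 0x1000  # bootloader plus padding, as described above
MAGIC_V1 = 0xE9           # Arduino/Tasmota images
MAGIC_V2 = 0xEA           # original firmware upgrade images


def image_format(image: bytes) -> str:
    """Return "v1", "v2" or "unknown" based on the magic byte."""
    if not image:
        raise ValueError("empty image")
    return {MAGIC_V1: "v1", MAGIC_V2: "v2"}.get(image[0], "unknown")


def strip_bootloader(image: bytes) -> bytes:
    """Drop the first 0x1000 bytes of a combined image."""
    return image[BOOTLOADER_SIZE:]
```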

3rd attempt

Converting Arduino images to v2 images

Easier said than done, as the Arduino framework doesn’t seem to be capable of creating v2 images and none of the common tools appear to have conversion functionality.

Taking a closer look at the esptool.py project however, there seems to be (undocumented) functionality.
esptool.py has the elf2image command which, according to the source, allows switching between conversion to v1 and v2 images.

When using elf2image and also passing the --version parameter (which normally prints out the version string of the tool), the --version parameter gets redefined and then expects an argument: 1 or 2.

Besides the sonoff.ino.bin file, the Tasmota project also creates a sonoff.ino.elf, which can now be used in conjunction with esptool.py and the elf2image command to create v2 images.

Example: esptool.py elf2image --version 2 tmp/arduino_build_XXXXXX/sonoff.ino.elf
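
If you want to script that step, a small wrapper could assemble the command line. This sketch only builds the argument list from the example above; elf2image_command is a hypothetical helper, and other esptool.py releases may handle the flag differently.

```python
def elf2image_command(elf_path, image_version=2):
    """Argument list for converting an ELF file to a v1 or v2 image."""
    if image_version not in (1, 2):
        raise ValueError("image_version must be 1 or 2")
    return ["esptool.py", "elf2image", "--version", str(image_version), elf_path]


# The list can be handed to subprocess.run(); here it is just printed.
print(" ".join(elf2image_command("tmp/arduino_build_XXXXXX/sonoff.ino.elf")))
```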

WORKS! MOEP! WORKS! MOEP!

Remember the upgrade-action passed a 2-element list of download URLs to the device, having different names (user1.bin and user2.bin)?

This procedure now only works if the user1.bin image is being fetched and flashed.

Differences between user1.bin and user2.bin

The flash on the Sonoff devices is split into 2 parts (simplified!) which basically contain the same data (user1 and user2). As OTA upgrades are proven to fail sometimes for whatever reason, the upgrade will always happen on the currently inactive part, meaning, if the device is currently running the code from the user1 part, the upgrade will happen onto the user2 part.
That mechanism is not invented by Itead, but actually provided as off-the-shelf OTA solution by Espressif (the SoC manufacturer) itself.

For 1MB flash chips the user1 image is stored at offset 0x01000 while the user2 image is stored at 0x81000.

And indeed, the two original upgrade images (user1 and user2) differ significantly.

If flashing a user2 image onto the user1 part of the flash the device refuses to boot and vice versa.

While there’s not much information about how user1.bin and user2.bin technically differ from each other, khcnz pointed me to an Espressif document stating:

user1.bin and user2.bin are [the] same software placed to different regions of [the] flash. The only difference is [the] address mapping on flash.

4th attempt

So apparently those 2 images must be created differently indeed.

Again it was khcnz who pointed me to different linker scripts used for each image within the original SDK.
Diffing
https://github.com/espressif/ESP8266_RTOS_SDK/blob/master/ld/eagle.app.v6.new.1024.app1.ld
and
https://github.com/espressif/ESP8266_RTOS_SDK/blob/master/ld/eagle.app.v6.new.1024.app2.ld
reveals that the irom0_0_seg differs (org = 0x40201010 vs. org = 0x40281010).

As Tasmota doesn’t make use of the user1/user2 ping-pong mechanism, it only creates images supposed to go to 0x1000 (= user1 partition).

So for creating a user2.bin image (in our case for a device having a 1MB flash chip and allocating (only) 64K for SPIFFS) we have to modify the following linker script accordingly:

--- a/~/.arduino15/packages/esp8266/hardware/esp8266/2.3.0/tools/sdk/ld/eagle.flash.1m64.ld
+++ b/~/.arduino15/packages/esp8266/hardware/esp8266/2.3.0/tools/sdk/ld/eagle.flash.1m64.ld
@@ -7,7 +7,7 @@ MEMORY
   dport0_0_seg :                        org = 0x3FF00000, len = 0x10
   dram0_0_seg :                         org = 0x3FFE8000, len = 0x14000
   iram1_0_seg :                         org = 0x40100000, len = 0x8000
-  irom0_0_seg :                         org = 0x40201010, len = 0xf9ff0
+  irom0_0_seg :                         org = 0x40281010, len = 0xf9ff0
 }
 
 PROVIDE ( _SPIFFS_start = 0x402FB000 );
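
The two origins in the diff line up with the flash offsets mentioned earlier (0x01000 for user1, 0x81000 for user2), which a bit of arithmetic confirms. In this sketch the base address 0x40200000 for memory-mapped flash and the extra 0x10 bytes are assumptions inferred from the numbers themselves, not taken from documentation.

```python
FLASH_MAP_BASE = 0x40200000  # assumption: where SPI flash is memory-mapped
HEADER_SKIP = 0x10           # assumption: inferred from the linker origins
USER1_OFFSET = 0x01000
USER2_OFFSET = 0x81000


def irom_origin(flash_offset):
    """Linker origin of irom0_0_seg for an image stored at flash_offset."""
    return FLASH_MAP_BASE + flash_offset + HEADER_SKIP


print(hex(irom_origin(USER1_OFFSET)), hex(irom_origin(USER2_OFFSET)))
```

Both origins differ by exactly 0x80000, the distance between the two image slots.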

So we now create a user1 image (without the above modification) and a user2 image (with the above modification) and convert them to v2 images with esptool.py as described above.

–> WORKS!

Depending on whether the original firmware was loaded from the user1 or user2 partition, it will fetch and flash the other image, telling the bootloader afterwards to change the active partition.

Issues

Mission accomplished? Not just yet…

Although our custom firmware is now flashed via the original OTA mechanism and running, the final setup differs in 2 major aspects from what we would get by flashing the device via serial:

  • The bootloader is still the original one
  • Our custom image might have ended up in the user2 partition

Each point alone already results in the Tasmota/Arduino OTA mechanism not working.
Additionally, since the bootloader stays the original one, it still only expects v2 images and still messes with us with its ping-pong mechanism.

This issue is already being addressed though, and how to solve it best is being discussed in the issue ticket mentioned at the very beginning.

Happy hacking!


intel 540s SSD fail

My intel SSD failed. Hard. As in: its content got wiped. But before getting way too theatrical, let’s stick to the facts first.

I upgraded my Lenovo ThinkPad X1 Carbon with a bigger SSD in the late summer this year — a 1TB intel 540s (M.2).

The BIOS of ThinkPads (and probably other brands as well) offers to secure your drive with an ATA password. This feature is part of the ATA specification and was already implemented and used back in the old IDE times (remember the X-BOX 1?).

With such an ATA password set, all read/write commands to the drive will be ignored until the drive gets unlocked. There’s some discussion about whether ATA passwords should or shouldn’t be used — personally I like the idea of $person not being able to just pull out my drive, modify its unencrypted boot record and put it back into my computer without me noticing.

With regard to current SSDs, the ATA password doesn’t just lock access to the drive but also plays a part in the FDE (full disk encryption) featured by modern SSDs. But back to what actually happened…

As people say, it’s good practice to frequently(TM) change passwords. So I did with my ATA password.

And then it happened. My data was gone. All of it. I could still access the SSD with the newly set password but it only contained random data. Even the first couple of KB, which were supposed to contain the partition table as well as unencrypted boot code, magically seem to have been replaced with random data. Perfectly random data.

So, what happened? Back to FDE of recent SSDs: They perform encryption on data written to the drive (decryption on reads, respectively) — no matter if you want it or not.
Encrypted with a key stored on the device — with no easy way of reading it out (hence no backup). This is happening totally transparently; the computer the device is connected to doesn’t have to care about that at all.

And the ATA password is used to encrypt the key the actual data on the drive is encrypted with. Password encrypts key encrypts data.
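
That chain can be illustrated with a toy model. To be clear: this is not how intel’s controller implements it. The key derivation, the XOR “wrapping” and the hash-based stream cipher are stand-ins chosen so the example runs with the Python standard library only, and all names are made up.

```python
import hashlib


def derive_kek(password: str, salt: bytes) -> bytes:
    """Key-encryption key derived from the (ATA) password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher; encryption and decryption are the same operation."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return xor(data, stream)


def change_password(wrapped_key: bytes, old: str, new: str, salt: bytes) -> bytes:
    """Re-wrap the data key under a new password; the data key stays the same."""
    data_key = xor(wrapped_key, derive_kek(old, salt))
    return xor(data_key, derive_kek(new, salt))
```

In this model a password change only re-wraps the data key, so the stored data stays readable. Replace the data key itself, and everything on the drive decrypts to garbage.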

Back to my case: No data, just garbage. Perfectly random garbage. First idea on what happened, as obvious as devastating: the data on the drive gets read and decrypted with a different key than it initially got written and encrypted with. If that’s indeed the case, my data is gone.

This behaviour is actually advertised as a feature. intel calls it “Secure Erase“. No need to overwrite your drive dozens of times like in the old days to ensure the data has irreversibly vanished in the end. No, just wipe the key your data is encrypted with and done. And exactly this seems to have happened to me. I am done.

Fortunately I made backups. Some time ago. Quite some time ago. Of a few directories. Very few. Swearing. Tears. I know, I know, I don’t deserve your sympathies (but I’d still appreciate!).

Anger! Whose fault is it?! Who to blame?!

Let’s check the docs on ATA passwords, which appear to be very clear — from the official Lenovo FAQ:

“Will changing the Master or User hard drive password change the FDE key?”
– “No. The hard drive passwords have no effect on the encryption key. The passwords can safely be changed without risking loss of data.”

Not my fault! Yes! Wait, another FAQ entry says:

“Can the encryption key be changed?”
– “The encryption key can be regenerated within the BIOS, however, doing so will make all data inaccessible, effectively wiping the drive. To generate a new key, use the option listed under Security -> Disk Encryption HDD in the system BIOS.”

Double-checking the BIOS if I unintentionally told my BIOS to change the FDE key. No, I wasn’t even able to find such a setting.

Okay — intermediate result: either buggy BIOS telling my SSD to (re)generate the encryption key (and therewith “Secure Erase” everything on it) or buggy SSD controller, deciding to alter the key at will.

Google! Nothing. Frightening reports about the disastrous “8MB”-bug on the earlier series 320 devices popped up. But nothing on series 540s.

If nothing helps and/or there’s nobody to blame: go on Twitter!

Some Ping-Pong:

Then…

https://twitter.com/IntelSSD/status/791299306892308480

Wait, what?! That’s a known issue? I didn’t find a damn thing in the whole internets! Tell me more!

And to my surprise – they did. For a minute. Shortly before having the respective tweets deleted.

Let’s take a look at what my phone cached:

The deleted tweets contain a link http://intel.ly/2eRl73j which resolves to https://security-center.intel.com/advisory.aspx?intelid=INTEL-SA-00055&languageid=en-fr which is an advisory seemingly describing exactly what happened to me:

“In systems with the SATA devsleep feature enabled, setting or resetting the user or master password by the ATA security feature set may cause data corruption.”

Later on:

“Intel became aware of this issue during early customer validation.”

I guess I just became aware of being part of the “early customer validation”-program. This issue: Personally validated. Check.

Ok, short recap:

  • intel has a severe bug causing data loss on 540s SSD and – according to the advisory – other series as well
  • intel knows about it (advisory dates to 1st of August)
  • intel doesn’t seem to be eager to spread the word about it
  • affected intel SSDs are sold with the vulnerable firmware version
  • nobody knows a damn thing about it (recall the series 320 issue which was big)

Meanwhile, I could try to follow up on @lenovo’s tips:

Sounds good! Maybe, just maybe, that could bring my data back.

Let’s skip the second link, as it contains a dedicated Windows software I’d love to run, but my Windows installation just got wiped (and I’m not really keen on reinstalling and thereby overwriting my precious maybe-still-not-yet-permanently-lost data).

The first link points to an ISO file. Works for me! Until it crashes. Reproducibly. This ISO reproducibly crashes my Lenovo X1 Carbon 3rd generation. Booting from USB thumb-drive (officially supported it says), as well as from CD. Hm.

For now I seem to have to conclude with the following questions:

  • Why can’t I find a damn thing about this bug in the media?
  • Why did intel delete its tweets referencing this bug?
  • Why doesn’t the firmware updater do much besides crashing my computer?
  • Why didn’t I do proper backups?!
  • How do I get my data back?!?1ß11

 

PS: Before I clicked the Publish button I again set up a few search queries. Found my tweets.
