Fault-Tolerant RL Octocopter — Karolina Dubiel
Want to follow along? I'd love it if you tuned in for the updates I'll post on [X](https://x.com/karolina_dubiel) and [LinkedIn](https://www.linkedin.com/in/karolinadubiel).
Building a custom octocopter from scratch -- 0 prior hardware experience, idea to flying drone in 2.5 weeks. Designed in Fusion 360, CNC-milled from G10 fiberglass and carbon fiber, and assembled by hand. The end goal: an RL-trained controller that can sustain flight through single, dual, and quad motor failures in simulation, deployed zero-shot to hardware.
Phases
- ✓ Phase I -- CAD a custom octocopter design, CNC cut it, and assemble the frame, motors, and propellers.
- ✓ Phase II -- Wire up electronics and take flight as a regular FC-powered octocopter.
- ◉ Phase III -- Develop and train an RL policy capable of supporting the octocopter through regular flight and dual-motor failures.
- ○ Phase IV -- Complete the sim-to-real transition and achieve RL-powered flight. Sustain flight after shutting off 2 motors randomly in field tests.
* * *
Build Log
Day 30: Learning curve
Jun 28, 2026
**The drone flies through all single, dual, and SOME TRIPLE motor failures!!** But it's sim-only, and the path to get here had significant learning curves.
Surviving dual motor failures (2x speed)
Surviving worst-case 2-motor failure (2x speed)
Before training the huge policy with domain randomization and all of the bells and whistles, I decided to train a sim-only policy, which would be ready in about an hour and a half on CPU. I'm really glad I did this, because it definitely didn't work out the first time. Here's a (maybe too transparent) timeline of the process:
| # | What I tried | Result |
| --- | --- | --- |
| 1 | Baseline PPO, high exploration, always 2 faults. Ran overnight. | Failed. Entropy climbed 11→22 the whole time, and it crashed at 20M when trying to save (it was fine, I had checkpoints, but still). |
| 2 | Lowered exploration, kept always-2-faults and no curriculum (straight-to-dual). | This one seemed to be working. Looked broken at 2M (crashing at step 7, very negative entropy -- Gaussian differential entropy can go negative when variance collapses), but trained through it. I killed it deliberately because I wanted to add single-motor failures first. |
| 3 | Added a hover->single->dual curriculum. | Broke everything. Turned out the curriculum's 4M steps of pure hover was exposing two latent bugs (failures from step 0 gave enough training signal to bulldoze past them). |
| 4 | (not a config -- an operator error) | A zombie training process from the night before was running alongside the new one, both writing the _same_ checkpoint filenames. |
| 5 | Residual actions (u = hover + action). | Commands got driven to ±3, saturated and lopsided -- tip-over in 7 steps. |
| 6 | Stripped/re-added the curriculum, low exploration. | 0% at every checkpoint -- and it crashed at _step 7 every single time_, even during the pure-hover phase. That's what finally told me that this bug had to be systemic. |
| 7 | The two real fixes (below). | Worked. Hover learned by 0.5M steps, 100% survival on hover/single/dual by ~9.5M. |
Everything from #3 onward -- the entropy panic, the residual detour, the curriculum back-and-forth -- was me poking at symptoms of two underlying bugs.
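The entropy numbers in the timeline are easier to read with the closed form for a Gaussian in hand. The sketch below assumes a diagonal Gaussian policy over 8 motor commands and that the logged entropy is summed over action dimensions; both are assumptions about the training setup, and `diag_gaussian_entropy` is a hypothetical helper name.

```python
import math

def diag_gaussian_entropy(stds):
    """Differential entropy (in nats) of a diagonal Gaussian with the given per-dimension stds."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * s * s) for s in stds)

N_MOTORS = 8  # one action dimension per motor (assumed)

unit_std = diag_gaussian_entropy([1.0] * N_MOTORS)
collapsed = diag_gaussian_entropy([0.05] * N_MOTORS)
# Per-dimension std below which the entropy of that dimension turns negative.
zero_crossing_std = 1.0 / math.sqrt(2.0 * math.pi * math.e)

print(f"std = 1.00 -> {unit_std:.2f} nats")
print(f"std = 0.05 -> {collapsed:.2f} nats")
print(f"entropy crosses zero at std = {zero_crossing_std:.3f}")
```

Under those assumptions, a unit standard deviation gives roughly 11.35 nats, close to where run #1 started, and any per-motor standard deviation below about 0.24 pushes the entropy negative. A negative value is therefore a sign of a narrow action distribution, not a numerical bug by itself.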
**1. The actions were getting stuck.** The Gaussian policy outputs unbounded means, but the env hard-clips commands to [0,1] and PPO computes gradients on the _unclipped_ value -- so once a motor drifts past the clip edge there's no corrective gradient pulling it back inside, and it stays there. With 8 motors this produces a lopsided tip-over, and no hyperparameter fixes it because it's not a hyperparameter problem. Fix: squash through tanh as a residual around hover throttle, so commands can't saturate and an untrained net already hovers. This alone bumped untrained survival from 7 steps to 205.
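A minimal sketch of a tanh-squashed residual action, to make the idea concrete. The hover value of 0.18 is a placeholder, the asymmetric scaling on either side of hover is one possible choice, and `squash_action` is a hypothetical name; none of these are taken from the actual training code.

```python
import numpy as np

HOVER_THROTTLE = 0.18  # hypothetical hover command in [0, 1]; not a measured value

def squash_action(raw_action, hover=HOVER_THROTTLE):
    """Map unbounded policy outputs to motor commands centred on hover.

    tanh bounds each output to (-1, 1). Each side of zero is then scaled
    separately so that -1 maps to 0, 0 maps to hover, and +1 maps to 1
    (up to floating-point rounding).
    """
    a = np.tanh(np.asarray(raw_action, dtype=np.float64))
    span = np.where(a >= 0.0, 1.0 - hover, hover)
    return hover + span * a

print(squash_action(np.zeros(8)))                # all motors at hover
print(squash_action([-50.0, -3.0, 3.0, 50.0]))   # extremes stay bounded
```

Because the squashing is part of the differentiable path, the gradient with respect to the raw output shrinks near the limits but never becomes exactly zero the way it does behind a hard clip, and a zero-output network commands hover on every motor.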
**2. Staying alive paid nothing.** An open-loop hover test printed `r = 0.00` every step: at the ~1.9m the drone actually settles, the +0.1 survival bonus was exactly cancelled by the -0.1 altitude penalty. Since the drone is marginally stable every episode eventually crashes (-10), so "hover 200 steps then crash" and "crash immediately" had the _same_ return. Fix: bump `r_survive` 0.1 → 1.0, so hovering pays +0.9/step and PPO finally has a reason to stay up.
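The cancellation is easy to check by hand. This sketch assumes an undiscounted return and a constant 0.1 altitude penalty at the settle height, which simplifies the real reward; `episode_return` is a hypothetical helper.

```python
def episode_return(steps_alive, r_survive, altitude_penalty=0.1, crash_penalty=-10.0):
    """Undiscounted return for an episode that hovers for `steps_alive` steps and then crashes."""
    return steps_alive * (r_survive - altitude_penalty) + crash_penalty

for r_survive in (0.1, 1.0):
    long_hover = episode_return(200, r_survive)
    instant_crash = episode_return(0, r_survive)
    print(f"r_survive={r_survive}: hover 200 then crash = {long_hover:+.1f}, "
          f"crash immediately = {instant_crash:+.1f}")
```

With `r_survive = 0.1` both episodes return -10, so the policy gradient has nothing to prefer. With `r_survive = 1.0` the 200-step hover returns +170 against -10 for the immediate crash.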
**Results**
Final policy is a 43.4k-parameter MLP.
!Image 3: Learning curve across 20M training steps
Learning curve across 20M steps — survival and reward by fault class
It even generalizes to 3-motor failures it was never trained on, as long as recovery is physically possible. Even when I killed 3 adjacent motors (physically unrecoverable), it fought for 7.2 seconds and sank instead of tumbling.
Surviving triple motor failures (out-of-distribution)
One nice surprise: the "uncompensatable-yaw" cases (two same-spin motors 90° apart, where in theory the drone should just spin freely) aren't actually uncompensatable. The policy holds heading to within ~13°/s in all of them -- a slow drift, not a free spin. My heuristic for flagging those was too pessimistic.
Next step: the real, sim-to-real-able policy!
Day 25: Being a physicist, aka swinging my drone from the kitchen table
Jun 23, 2026
I modeled and printed a GPS mount, which was the final component I needed to lock down my total system mass. Now that the drone is fully complete, I was able to collect the full system identification data on the drone. I haven't been able to dedicate as much time as I would've wanted to this drone for the past ~1.5 weeks, but I hope to lock back in now that it's a software-only task for the next little bit.
!Image 4: Yaw bifilar pendulum setup (I promise the strings are parallel and straight IRL)
Yaw bifilar pendulum setup
(I promise the strings are parallel IRL)
!Image 5: GPS and receiver mount
GPS and receiver mount
Here are the results and calculations of my system identification. They're not completely perfect or scientific, but I'm hoping domain randomization can make up for any inaccuracies. I'll be using motor/propulsion characterization information from the manufacturer and have also collected CoM info.
I = (T² · M · g · d²) / (4π² · L) → I = 0.73948 · d² · T²
M = 1.177 kg · L = 0.39552 m · d = half wire sep. (m) · T = time for 20 osc ÷ 20 (s)
**Measurements**
Each trial timed over 20 oscillations (T = total / 20). Wire separations were measured as full width and halved for d.
7 trials each; pink = worst 2 dropped before averaging.
| Axis | T1 | T2 | T3 | T4 | T5 | T6 | T7 | Avg×20 (s) | T (s) | Wire sep. (mm) | d (m) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Roll | 12.97 | 12.87 | 13.15 | 12.92 | 12.9 | 12.82 | 12.92 | 12.906 | 0.6453 | 332.592 | 0.166296 |
| Pitch | 12.77 | 12.64 | 12.82 | 12.63 | 12.75 | 12.84 | 12.82 | 12.76 | 0.638 | 332.592 | 0.166296 |
| Yaw | 13.84 | 13.52 | 13.52 | 13.47 | 13.52 | 13.4 | 13.65 | 13.536 | 0.6768 | 380.314 | 0.190157 |
**Results**
| Axis | T (s) | d (m) | I (kg·m²) |
| --- | --- | --- | --- |
| Roll | 0.6453 | 0.166296 | 0.008516 |
| Pitch | 0.6380 | 0.166296 | 0.008324 |
| Yaw | 0.6768 | 0.190157 | 0.012250 |
Day 22: Standing on business and changing of plans
Jun 20, 2026
The drone now has 8 legs to land on, which should make test flights significantly less scary. Although I never optimized this drone for weight when designing it (the body plates could have more cutouts and other optimizations could've been made), I've now decided to be more mindful about each component going forward. In total, the 8 legs weigh 7.67g.
CAD of the legs
Legs printing
I've also made the decision to no longer have a separate microcontroller for now. I'd originally planned to bolt a separate companion computer onto the drone to run the RL policy and feed motor commands to Betaflight over MSP -- first a Raspberry Pi 4, then a Teensy when the Pi looked like a bad fit. The problem: no matter which board I picked, my architecture needs to send 8 direct per-motor commands, and doing that over MSP fights Betaflight's safety model (motors don't reliably stop on disarm or link loss). So I'm scrapping the separate microcontroller -- my flight controller is already an STM32H743 (480MHz M7), so I'm just compiling the policy straight into the Betaflight firmware on the board I already have, which also kills a big chunk of my loop latency. Papers that inspired this: "Learning to Fly in Seconds" (Eschmann et al., RA-L 2024) and Neuroflight (Koch et al., 2019).
I'll try this out and re-evaluate if I run into significant blocks.
Day 17: The eagle has landed (taken off)!
Jun 15, 2026
Today, exactly 2.5 weeks after the kickoff of my initial idea, I officially completed the pipeline from concept -> flying octocopter with 0 prior hardware or CAD experience. When I started this project, I had never flown a drone before.
I haven't done anything except hover yet, and there's no microcontroller on board, so this is a completely regular octocopter with no RL abilities at all. I also have yet to make a GPS mount, mount my antenna, or strap anything down nicely, so there's a lot of work to do before this thing can properly fly safely.
YouTube link to flight video ->
!Image 8: Configuring everything in Betaflight
Configuring everything in Betaflight
We have liftoff!
To be clear: currently, if this drone lost a single motor in flight, it would probably stay up. Octocopters are famously tolerant to single motor failure for two reasons:
**1.** There's huge thrust overcapacity: 8 motors at ~125 gf load each at hover means losing one drops total capacity from ~11,000 gf to ~9,750 gf, still enough to maintain a healthy 2:1 thrust-to-weight ratio up to nearly 5 kg of drone weight (we're at 1 kg).
**2.** Betaflight's PID loop runs at several kHz and doesn't need to know _why_ the drone is tilting -- it just sees the gyro reporting a roll, and commands the remaining motors on the low side to push harder. The yaw imbalance gets partially compensated by Betaflight's mixer redistributing throttle between the surviving CW and CCW motors. The drone would stay airborne with maybe a slow yaw drift and degraded responsiveness, but it usually wouldn't fall out of the sky unless other bad things happened.
The problem is that this only really works for _one_ motor. As soon as a second motor dies (especially two same-rotation motors at 90° from each other), the static mixer breaks down. It keeps demanding thrust from two dead motors, and the drone becomes uncontrollable. That's the failure mode RL is supposed to fix.
P.S. -- while setting everything up in Betaflight, I set the startup chime of the drone to be Mask Off by Future :D
Also: the flat 8-arm frame has a much larger effective disc area than a typical quad, which means strong ground effect -- air pushed down by the rotors compresses against the floor and bounces thrust back up, so the drone feels weirdly buoyant near the ground (you can see it floating in the video below).
Floating on its own ground-effect cushion
**A common question: why not MPC?**
A lot of people on X have asked this -- to be honest, the primary reason is that I specifically wanted an RL project and designed this drone around that goal. I came up with the fault-tolerant octocopter concept as a vehicle for learning RL on real hardware, not the other way around.
That said, there are real engineering arguments for RL here:
- **Inference cost:** MPC solves an optimization problem at every timestep, which is a lot of computation for an RPi commanding 8 motors. An RL policy is a single forward pass through a ~50k-parameter network, which would probably take around 1 ms or less.
- **Unknown failure state:** from my understanding, MPC normally needs to know what the system is doing, and without a dedicated fault detector, this would create extra work for me. The RL policy learns to infer failure state implicitly from the gap between commanded and observed behavior.
- **Model mismatch tolerance:** MPC is only as good as its model. My cheap motors probably aren't perfectly identical and the inertia tensor I measure will only be approximate. Heavy domain randomization during RL training explicitly teaches the policy to handle model error. An MPC controller built on the same uncertain model doesn't get that for free.
MPC is probably the more reliable choice for a project like this, but not the most fun option :D If I can't get RL working, MPC is absolutely my fallback -- and at that point I'd probably treat this whole attempt as useful data collection for a model anyway.
Day 13: What's next -- training an RL policy
Jun 11, 2026
Since posting on X, I've gotten many DMs asking exactly how I want to approach the next phase of this project: making the drone fly with RL. Here's the plan I have so far.
* * *
Most importantly, the RL policy will directly command all 8 motors at 50 Hz over a serial link to the flight controller with no traditional PID loop in the path. This is the only architecture that gives the policy full authority to reallocate thrust when motors fail.
I'm focusing on six unique failure classes (ignoring rotational equivalence): single motor, adjacent pair (45°, mixed CW/CCW), 90° same-type, 135° mixed, 180° same-type, and full ESC loss (each ESC controls its own quad). The hardest case is the 90° same-type failure, because it's the only one that hits both problems simultaneously: a yaw torque imbalance (the two dead motors were the same spin direction) and a spatial asymmetry in the remaining thrust geometry.
!Image 9: The single- and dual-motor failures that I want to support, plus ESC loss
The single- and dual-motor failures that I want to support, plus ESC loss
Losing two same-spin motors leaves 2 CW and 4 CCW running (or vice versa), yaw-torque imbalanced 2:1 at equal throttle. Balancing them forces the CW motors to run at 2× the per-motor thrust of the CCW motors. At 1393 gf max per motor, the yaw-balanced thrust ceiling works out to 5,572 gf -- enough to maintain a 2:1 thrust-to-weight ratio up to ~2.8 kg of drone weight (we're at 1 kg). The remaining 6 motors span a 270° arc, so roll and pitch authority still exists. The worst case is survivable -- the drone would be spinning, but it could still hover to a soft landing.
| | Full 8-motor | 90° same-type (6 motors) |
| --- | --- | --- |
| Max total thrust | 11,144 gf | 5,572 gf (yaw balanced) |
| CW motor load at hover | ~9% | ~18% |
| Max drone weight at 2:1 T/W | ~5.6 kg | ~2.8 kg |
| Yaw authority | full | near zero |
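The thrust-ceiling figures can be reproduced with a few lines of arithmetic. This sketch assumes every motor's yaw reaction torque is proportional to its thrust with the same constant, and it ignores roll and pitch balance entirely; the function names are made up for illustration.

```python
MAX_THRUST_GF = 1393.0  # per-motor maximum quoted in the post

def yaw_balanced_ceiling(n_cw, n_ccw, max_thrust=MAX_THRUST_GF):
    """Largest total thrust (gf) for which the CW and CCW thrust sums are equal."""
    limiting_side = min(n_cw, n_ccw) * max_thrust  # the smaller group saturates first
    return 2.0 * limiting_side

def max_weight_kg(total_thrust_gf, thrust_to_weight=2.0):
    """Heaviest drone (kg) that keeps the given thrust-to-weight ratio."""
    return total_thrust_gf / thrust_to_weight / 1000.0

def hover_load_fraction(n_side, weight_gf=1000.0, max_thrust=MAX_THRUST_GF):
    """Per-motor fraction of max thrust on one spin side when each side carries half the weight."""
    return (weight_gf / 2.0) / n_side / max_thrust

for label, n_cw, n_ccw in (("full 8-motor", 4, 4), ("90 deg same-type", 2, 4)):
    ceiling = yaw_balanced_ceiling(n_cw, n_ccw)
    print(f"{label}: ceiling {ceiling:.0f} gf, "
          f"max weight at 2:1 {max_weight_kg(ceiling):.2f} kg, "
          f"CW hover load {hover_load_fraction(n_cw):.0%}")
```

The same function gives 8,358 gf for a single failure (3 CW and 4 CCW), which is the yaw-balanced ceiling in that case and is lower than the raw 7-motor capacity.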
**Simulation**
I'm building the sim in MuJoCo, because it runs fast on a CPU and I have a Mac, which rules out Isaac Lab and basically everything else NVIDIA-shaped. For a single rigid body with 8 thrust points, MuJoCo is more than enough, and I can run ~128 environments in parallel on my laptop.
The model itself comes from measurements, not the CAD. I'll be gathering data on:
- Total mass
- Inertia tensor via the bifilar pendulum test
- Motor thrust curves
- Motor time constant
- Hover throttle point
I'm also adding two things to my sim environment that I keep reading are what actually kill sim-to-real transfer for motor-level control:
**1.** Motor lag: real motors take 20–50 ms to reach a commanded speed. In sim, thrust changes instantly unless you model it. A policy that learns with instant motors learns to twitch.
**2.** Loop latency: on the real drone, there's ~15–30 ms between the IMU reading and thrust actually changing (serial read, inference, serial write, ESC response). If I train with zero latency, the policy will oscillate the second it touches hardware. This one scares me the most, so it's getting randomized aggressively (the policy trains against a delay that changes every episode and jitters within episodes).
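Both effects can be modelled with very little code. The sketch below is a simplified illustration, not the actual sim: it uses a fixed whole-step command delay and a first-order lag, whereas the plan above calls for a delay that also jitters within an episode. `LaggedMotor` and the 500 Hz step size in the usage lines are hypothetical.

```python
from collections import deque
import math

class LaggedMotor:
    """First-order thrust lag behind a fixed command delay (simplified sketch)."""

    def __init__(self, dt, time_constant, delay_steps, initial=0.0):
        # Exact discretisation of a first-order lag over one step of length dt.
        self.alpha = 1.0 - math.exp(-dt / time_constant)
        self.thrust = initial
        # Pre-fill the queue so the first `delay_steps` updates still see the initial command.
        self.pending = deque([initial] * delay_steps)

    def step(self, command):
        self.pending.append(command)
        delayed = self.pending.popleft()  # the command issued `delay_steps` steps ago
        self.thrust += self.alpha * (delayed - self.thrust)
        return self.thrust

# Usage: 2 ms sim step, 30 ms motor time constant, 20 ms loop latency (10 steps).
motor = LaggedMotor(dt=0.002, time_constant=0.030, delay_steps=10)
response = [motor.step(1.0) for _ in range(100)]
print(response[9], response[10], response[24])
```

A step command produces no thrust change for the first 10 steps, then rises toward the target, reaching about 63% of it one time constant after the delay ends.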
!Image 10: The high-level plan moving forward
The high-level plan moving forward
Everything else physical gets randomized too: mass ±10%, per-motor thrust constants ±15% (cheap motors are not identical, I own eight data points proving this), center of mass, battery sag over a flight, sensor noise[[4]](https://karolina.mgdubiel.com/drone/#day14-ref-4).
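A per-episode sampler for the ranges mentioned so far might look like the sketch below. Uniform distributions, the nominal values, and the dictionary keys are assumptions made for illustration; center of mass, battery sag, and sensor noise are left out because no ranges are given for them here.

```python
import numpy as np

N_MOTORS = 8

def sample_episode_params(rng, nominal_mass_kg=1.0, nominal_thrust_gf=1393.0):
    """Draw one episode's physical parameters from uniform ranges (assumed distribution)."""
    return {
        # mass +/- 10%
        "mass_kg": nominal_mass_kg * rng.uniform(0.90, 1.10),
        # independent +/- 15% thrust scale for each motor
        "max_thrust_gf": nominal_thrust_gf * rng.uniform(0.85, 1.15, size=N_MOTORS),
        # motor lag of 20-50 ms
        "motor_time_constant_s": rng.uniform(0.020, 0.050),
        # loop latency of 15-30 ms
        "loop_latency_s": rng.uniform(0.015, 0.030),
    }

rng = np.random.default_rng(0)
params = sample_episode_params(rng)
for name, value in params.items():
    print(name, value)
```

Calling the sampler at every environment reset means no two episodes share the same drone, which is the point of domain randomization: the policy cannot overfit to one set of physical constants.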
**Training**
PPO[[1]](https://karolina.mgdubiel.com/drone/#day14-ref-1) via PufferLib. I looked at SAC since it's more sample-efficient, but sample efficiency solves a problem I don't have -- my sim steps are nearly free. PPO with a pile of parallel environments is what almost every sim-to-real flight paper I've read actually shipped, and it plays nicer with heavy randomization. (Also: an X reply told me "puffer. just puffer. trust me.")
Two more decisions I stole from sim-to-real literature:
**1.** The critic gets to cheat. During training, the value network sees ground truth the real drone will never have, like which motors are dead, the exact thrust constants, and true velocity. The actor only sees what real sensors provide. The critic gets thrown away after training, so this costs nothing at deployment. (This is called asymmetric actor-critic[[2]](https://karolina.mgdubiel.com/drone/#day14-ref-2), and I've read that it makes a huge difference when the physics are randomized this hard.)
**2.** No fault detector (for now). The policy sees its last 5 observation/action frames and has to figure out failures on its own, from the gap between what it commanded and what the drone did.
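The two decisions above translate into two different input vectors. In this sketch the 10-dimensional observation is a placeholder, and `FrameHistory` and `critic_input` are hypothetical names; it only shows how the actor and critic views could be assembled, not how the real environment does it.

```python
from collections import deque
import numpy as np

class FrameHistory:
    """Keeps the last `n_frames` (observation, action) pairs as one flat actor input."""

    def __init__(self, obs_dim, act_dim, n_frames=5):
        empty = np.zeros(obs_dim + act_dim)
        # Start with all-zero frames so the input size is constant from the first step.
        self.frames = deque([empty.copy() for _ in range(n_frames)], maxlen=n_frames)

    def push(self, obs, last_action):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(np.concatenate([obs, last_action]))

    def actor_input(self):
        return np.concatenate(list(self.frames))

def critic_input(actor_obs, motor_alive, thrust_constants, true_velocity):
    """Training-only value-network input: the actor's view plus privileged simulator state."""
    return np.concatenate([actor_obs, motor_alive, thrust_constants, true_velocity])

history = FrameHistory(obs_dim=10, act_dim=8)  # obs_dim is a placeholder
history.push(np.ones(10), np.zeros(8))
actor_obs = history.actor_input()
value_obs = critic_input(actor_obs, np.ones(8), np.ones(8), np.zeros(3))
print(actor_obs.shape, value_obs.shape)
```

Only `actor_input()` needs to exist on the drone; `critic_input` depends on simulator state and is used during training alone.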
Under a same-type dual failure the drone physically cannot hold its heading -- the torques don't balance at any throttle combination. The right behavior is to give up on yaw, spin slowly about vertical, and stay level. If the reward punishes spinning, the policy sacrifices roll and pitch chasing a heading it can't have. Mueller & D'Andrea showed the same thing for quads losing a motor[[3]](https://karolina.mgdubiel.com/drone/#day14-ref-3) -- their recovering quad spins the whole time. Mine will too, on purpose.
**Deployment**
If the policy shows promising survival rates in sim, it'll get exported to ONNX and run on the RPi 4 (I think. Any opinions on this vs other microcontroller options?). The network is ~45k parameters, which is under a millisecond of inference, so the Pi is not the bottleneck. The 50 Hz loop will read attitude and gyro over serial, run the policy, and write 8 motor commands.
Then, the actual experiment: fly, kill motors from the transmitter, and find out if millions of simulated crashes taught it anything!
* * *
1. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal Policy Optimization Algorithms," _arXiv:1707.06347_, 2017.
2. L. Pinto, M. Andrychowicz, P. Welinder, W. Zaremba, and P. Abbeel, "Asymmetric Actor Critic for Image-Based Robot Learning," _RSS_, 2018.
3. M. W. Mueller and R. D'Andrea, "Stability and control of a quadrocopter despite the complete loss of one, two, or three propellers," _IEEE ICRA_, 2014.
4. J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World," _IROS_, 2017.
Day 11: Insider traitor-ing
Jun 9, 2026
Flashing the firmware didn't go as planned -- the USB-C input port on my H743 AeroSelfie FC is broken. This isn't the biggest deal in the world; everything on that board was pre-soldered and detaching it for return was pretty easy. The annoying part is that a broken FC is pretty critical-path, and nothing can progress until the new one comes in (thankfully soon!)
I attached the standoffs and top body plate to the drone to get an idea of what everything would look like all together and weigh the assembled drone.
!Image 11: The drone with the top plate and standoffs attached
The drone with the top plate
and standoffs attached
!Image 12: The traitor FC in question
The traitor FC in question
The drone weighs exactly 1kg with a mounted battery (this weight includes everything except the flight controller, which is negligible). Each motor produces approximately 950gf of thrust at 70% throttle on a fully charged 6S battery, and up to 1393gf at full throttle. Across all 8 motors, that's 7,600gf -- 7.6kg of thrust -- at 70% throttle alone, against 1kg of weight, which gives a thrust-to-weight ratio of 7.6:1. To hover, I only need 125gf per motor, which is around 15-20% throttle. That means the drone has enormous headroom above hover -- at 70% throttle it's producing nearly 8x what it needs to stay airborne. This is really, really good! An overpowered drone = way more leeway to tolerate (or ideally, fully recover from and maintain normal flight during) motor loss.
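The headroom numbers above follow directly from the per-motor figures. This is only the arithmetic from the paragraph written out; `thrust_to_weight` is a hypothetical helper.

```python
N_MOTORS = 8
WEIGHT_GF = 1000.0            # all-up weight quoted above
THRUST_AT_70_PCT_GF = 950.0   # per motor at 70% throttle on 6S
MAX_THRUST_GF = 1393.0        # per motor at full throttle

def thrust_to_weight(per_motor_thrust_gf, n_motors=N_MOTORS, weight_gf=WEIGHT_GF):
    """Total thrust divided by weight for identical motors."""
    return per_motor_thrust_gf * n_motors / weight_gf

hover_per_motor_gf = WEIGHT_GF / N_MOTORS
hover_thrust_fraction = hover_per_motor_gf / MAX_THRUST_GF

print(f"T/W at 70% throttle: {thrust_to_weight(THRUST_AT_70_PCT_GF):.1f}:1")
print(f"T/W at full throttle: {thrust_to_weight(MAX_THRUST_GF):.1f}:1")
print(f"hover thrust per motor: {hover_per_motor_gf:.0f} gf "
      f"({hover_thrust_fraction:.0%} of max thrust)")
```

Hover needs about 9% of each motor's maximum thrust even though it sits at roughly 15-20% throttle, because thrust does not scale linearly with throttle.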
I'm kind of blocked until the new FC arrives. I'm not used to blockers like this (given my all-software background), but I'm reminding myself that it's just part of hardware to have stuff like this happen. Annoying ≠ discouraging, and I'm really excited to see this drone hover soon.
Day 9: Wired up!
Jun 7, 2026
I soldered the flight controller, two ESCs, GPS, battery wires, and receiver together! The drone is theoretically able to hover now, but I haven't tested that yet :D
!Image 13: The circuit diagram that I drew for wiring everything up
The circuit diagram that I drew for wiring everything up
!Image 14: The fully soldered drone!
The fully soldered drone!
I'm pretty inexperienced with soldering, so this part took me longer than any of the CAD/assembly so far. Given that I've never assembled electronics together this way, it was difficult for me to imagine how everything would fit together and to solder everything neatly. I ended up deciding to just have one battery wire, which I sandwiched between the two ESCs, so that it could serve them both.
The ESCs and flight controller sit on top of each other to simplify my center of gravity. Once I fly this drone as a regular octocopter, I'll also have to mount a Raspberry Pi or Jetson Nano (open to feedback here!) onboard to run the inference. I plan on sticking this board to the bottom of the top plate.
!Image 15: Soldering in progress ...
Soldering in progress ...
!Image 16: Attaching the capacitors
Attaching the capacitors
**Before I test hovering, I'll need to:**
- Flash Bluejay firmware to both ESCs
- Configure Betaflight: set the mixer to Octocopter Flat X, ESC protocol to DSHOT600, enable the accelerometer, and dial in conservative rates and arming parameters
- Configure failsafe behavior for if RC signal drops mid-flight
- Balance check: find the battery position that centers the CoM and lock it down
Day 6: Superglued, taped down, and ready to solder
Jun 4, 2026
As soon as the jig was printed, I used it to align the arms of the drone and filled any gaps in between with superglue.
!Image 17: The phase wires for the motors are now taped down and the frame is perfectly aligned
The phase wires for the motors are now taped down and the frame is perfectly aligned
!Image 18: Filling the space between the arms with superglue while the drone is in the jig
Filling the space between
the arms with superglue while
the drone is in the jig
!Image 19: The drone sitting in the 3D printed jig
The drone sitting in the 3D printed jig
The arms were fully stable once the superglue set, which means I won't have any vibrational issues due to the imperfect arm alignment that I was worried about earlier. My original plan was to start soldering all the electronics today, but I'm still waiting on a soldering iron shipment. Planning to start wiring everything as soon as materials and my full-time job allow :D
Supergluing the arms in the jig required loosening the screws to get the arms to fully pop into the jig supports. One of the screws got stuck, snapped, and had to be drilled out :( Crisis averted with very minimal damage to the frame, though!
Traitor screw
Day 4: Getting jiggy with it
Jun 2, 2026
I continued with assembly, screwing the 8 motors to the arms and the 8 propellers to the motors.
!Image 21: Assembled drone frame with motors and propellers
The assembled drone frame (arms, bottom
and middle plates, motors, and propellers)
!Image 22: A screenshot from a timelapse of me attaching the motors + propellers to the body
A screenshot from a video of me
attaching the motors + propellers
to the body
A small problem: if you tug really hard, some of the arms wiggle a bit, even when fully screwed together and tightened. I think this is due to the fact that I set the cut tolerance as 0.1mm in the CAD, not knowing how precise the CNC mill would be. For the future, a better tolerance would be 0.05mm or 0.08mm. Any wiggle room in the arms can cause vibrations when flying, which could mess up my flight dynamics and make RL-based flying impossible.
The solution for this is to 3D print a 0-tolerance assembly jig to hold the arms in perfect position while the center of the drone is superglued together. Here's the design of said jig -- it'll be printed and ready to use soon:
!Image 23: CAD of the assembly jig
CAD of the assembly jig
Day 3: Assemble!
Jun 1, 2026
After a weekend away at Pinnacles National Park, I screwed the body of the drone together: the 8 arms, the bottom plate, and the middle plate.
!Image 24: Work in progress ... screwing all of the arms together
Work in progress ... screwing all
of the arms together
Day 1: CAD, CNC milling, and humble beginnings
May 30, 2026
While on a recent vacation in Guatemala, I came up with the idea for this project from a hammock on the shores of Lake Atitlán. Immediately, I ordered (most of) the necessary parts on Amazon and started ideating on exactly how to go about building a fully RL-powered, intelligently fault-tolerant octocopter.
I have **never** done a substantial hardware project before. I have never CADed, I've soldered once, and I have no experience with drone flight controllers, speed controllers, or anything in that domain. **I have never flown a drone before.** I have never trained an RL policy as complex as the one required for this project.
I got started thanks to hours spent on Google, Reddit, Claude, and talking to Tomas, an AE major who helped with every CAD and machine shop question I had.
The first two steps of this project were both started and completed today:
1. CAD of the drone's body and arms in Fusion 360
2. CNC milling forms out of G-10 fiberglass (arms) and 5mm carbon fiber (body)
!Image 25: Fusion 360 CAD render with all eight motors placed on the frame
Fusion 360 view of the finished CAD -- full octocopter with third-party motor/propeller .step files imported into my design
!Image 26: Top-down layout of the arm geometry in CAD
Intertwined arm geometry layout
The arms intertwine in the center of the drone for stability. They're sandwiched between a flower-shaped bottom plate and a larger body plate on top. I decided on flower cutouts for the carbon fiber body.
!Image 27: Arms drawing exported for CNC milling
Arms -- prepared for the CNC mill
!Image 28: Body plate drawing exported for CNC milling
Body plates -- prepared for the CNC mill
After that, it was time to CNC cut. This was my first time in a machine shop :D
I ended up having to re-cut the arms because the drill was going too fast and pushed the G-10 plate as it was cutting. I learned that cutting at ~20% speed when using thick materials is a much better idea than having to re-cut due to going too fast.
!Image 29: CNC toolpath simulation for the arm cuts
The CNC mill cutting out the arms
!Image 30: CNC mill cutting the carbon fiber body plate
Me preparing the CNC mill
!Image 31: Freshly CNC-cut parts laid out
Freshly cut G-10 and carbon fiber parts
Made with 💖 by Karolina Dubiel. Last updated June 2026.
FAQ
Why an octocopter?
When a quadcopter loses a motor, it has to give up yaw authority entirely to stay airborne — Mueller & D'Andrea (2014) showed you can recover stable flight, but only by letting the whole frame spin. An octocopter has enough actuator redundancy that in most dual-motor-loss cases (the exception being two motors 90° apart of the same rotation direction) the remaining motors can still produce the full range of forces and torques, so the drone can fly completely normally with the right policy — no yaw sacrifice required.
Why'd you pick this project?
I wanted a real RL project on real hardware and designed the drone around that goal — the fault-tolerant octocopter is a vehicle for learning RL on hardware, not the other way around.
Why not MPC?
Honestly, the primary reason is that I specifically wanted an RL project. That said, there are real engineering arguments: MPC solves an optimization at every timestep (expensive on a RPi commanding 8 motors vs. a single forward pass through a ~50k-parameter network), it normally requires knowing the failure state explicitly (the RL policy infers it implicitly from the gap between commanded and observed behavior), and it's only as good as its model — domain randomization during RL training explicitly teaches robustness to the model error my cheap motors and approximate inertia tensor will introduce. MPC is probably more reliable, but less fun. It's my fallback if RL doesn't pan out.