Author: TohumAB

  • Beyond OCR: How LLMs Are Transforming Structured PDF Extraction


    Introduction

    Extracting structured information from PDFs is a common challenge in many industries. Consider a Customs Declaration Form filled with item descriptions, quantities, and values—capturing these details accurately is crucial for compliance and downstream processing. Traditionally, organizations have relied on Optical Character Recognition (OCR) to digitize such forms. However, recent advances in artificial intelligence, particularly through Large Language Models (LLMs), offer a new approach to reading PDFs directly and preserving their structure. This article compares traditional OCR-based parsing with direct LLM-based PDF reading and explains why LLMs are emerging as a powerful solution for structured document extraction.


    How Traditional OCR Parses PDFs

    OCR Workflow: Traditional OCR software converts scanned pages or PDF content into text. Essentially, the OCR engine detects characters and words in an image, outputting a plain text transcription. For example, an OCR engine might read a customs form and output a text block containing all the form’s words line by line. Modern OCR tools can achieve high accuracy on clean, typed documents and have long been a staple for digitizing text.

    Limitations: OCR “sees” only text—it does not deeply understand a document’s layout or context. It does not inherently know that one piece of text is a customer name and another is an address; it merely recognizes them as isolated words on the page. As a result, preserving spatial relationships and the structure of data can be challenging. For instance, if a field’s value spans multiple lines (such as a 12-digit number broken over two lines), many OCR systems will treat each line separately. In borderless tables, the lack of explicit separators means that column boundaries can be lost, resulting in merged or misaligned output. To mitigate these issues, OCR-based workflows typically require post-processing steps—using positional data and template-based rules to reconstruct the original structure—which can be brittle when document layouts vary.
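To make the brittleness concrete, the positional post-processing step often starts with something like the sketch below: grouping OCR word boxes into lines by vertical position. The `OcrWord` structure and `groupIntoLines` helper are illustrative inventions, not any particular OCR library's API, and the fixed pixel tolerance shows exactly where such rules become fragile when layouts vary.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A word as a typical OCR engine reports it: text plus a bounding box.
struct OcrWord {
    std::string text;
    int x, y;  // top-left corner of the bounding box, in pixels
};

// Group words whose y coordinates fall within `tolerance` pixels of a
// line's first word into that line, then order each line left-to-right.
// The hard-coded tolerance is the brittle part: a slightly different
// scan resolution or page skew breaks the grouping.
std::vector<std::vector<OcrWord>> groupIntoLines(std::vector<OcrWord> words,
                                                 int tolerance = 5) {
    std::sort(words.begin(), words.end(),
              [](const OcrWord& a, const OcrWord& b) { return a.y < b.y; });
    std::vector<std::vector<OcrWord>> lines;
    for (const auto& w : words) {
        if (!lines.empty() && w.y - lines.back().front().y <= tolerance) {
            lines.back().push_back(w);  // same visual line
        } else {
            lines.push_back({w});       // start a new line
        }
    }
    for (auto& line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const OcrWord& a, const OcrWord& b) { return a.x < b.x; });
    }
    return lines;
}
```

Every document family ends up needing its own tolerance, field templates, and merge rules on top of this, which is the maintenance burden LLM-based extraction aims to remove.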


    How LLMs Read PDFs Differently

    LLM Workflow: Large Language Models approach the problem from a language understanding perspective rather than pure pattern recognition. One common approach is to feed OCR-extracted text (with formatting cues) into an LLM and ask it to interpret the content to extract structured data. More advanced methods involve multimodal LLMs that take the raw document (as an image or PDF) as input—integrating both visual and linguistic analysis. In both cases, the LLM is not merely transcribing characters; it is interpreting the document much like a human would, taking into account context, layout, and meaning.

    Understanding Context and Structure: Because LLMs are trained on vast amounts of text and, in some cases, layout information, they inherently understand common formats and language patterns. An LLM can infer that a sequence of words represents an address or that a list of numbers forms a table column. This means the LLM can group and label information in one go, outputting structured data (such as JSON) directly. For instance, when processing a customs declaration, an LLM might output:

    json
    {
      "Total Value": 10000,
      "Currency": "USD",
      "ItemList": [
        { "Description": "Item A", "Quantity": 10, "Price": 50 },
        { "Description": "Item B", "Quantity": 5, "Price": 100 }
      ]
    }

    LLMs excel at using context to resolve ambiguities. If an item description spans multiple lines, a well-prompted LLM can understand that the continuation belongs with the original entry rather than representing a new item. This holistic approach allows LLMs to maintain the document’s structure with much less manual intervention.

    Adapting to Layout Variations: Because LLMs rely on contextual and semantic cues, they are more adaptable to layout variations. Whether a form labels a field as “Total Value” or “Grand Total,” an LLM can recognize the intent behind the data. This flexibility means one model can handle multiple form types without extensive reprogramming—a significant advantage over rigid, rule-based OCR systems.


    OCR Challenges with Complex Documents

    Consider some common pain points of OCR when dealing with structured PDFs like customs forms:

    • Loss of Spatial Context: OCR outputs a stream of text without clear indicators of spatial relationships. Important groupings—such as which value belongs to which field—can be lost, requiring additional logic to reassemble the data.
    • Borderless or Complex Tables: Many business documents use spacing rather than drawn grid lines to define tables. OCR engines often misinterpret such layouts, breaking multi-line rows into separate entries or merging adjacent columns incorrectly.
    • Multi-line Fields: Fields like addresses or product descriptions that span multiple lines are another challenge. Traditional OCR may treat each line as a distinct entry, breaking the continuity of the data.
    • Extensive Post-Processing Needs: Extracting structured data from raw OCR output often requires custom rules, pattern matching, and heuristics. This not only adds complexity but also demands ongoing maintenance as document formats evolve.

    LLMs to the Rescue: Why They Excel for Structured Forms

    LLM-based processing addresses many of these challenges by integrating contextual understanding into the extraction process:

    • Contextual Extraction: An LLM doesn’t just see words—it understands their meaning. By interpreting the content as a whole, an LLM can accurately associate values with their respective fields, reducing the need for extensive post-processing.
    • Preserving Structure: With appropriate prompts, LLMs maintain the grouping of related data. For example, details for each line item in a customs declaration remain associated, ensuring that labels and values are correctly paired.
    • Handling Layout Variations: LLMs are less sensitive to variations in document format. Their flexibility allows them to extract the correct information even when the layout changes from one document to another.
    • Reduced Manual Rules: Instead of writing and maintaining a myriad of custom scripts for each document type, developers can rely on a well-crafted prompt to guide the LLM. This simplification reduces development overhead and speeds up deployment.

    Recent advances in document understanding using transformer-based models have demonstrated that combining text and layout information significantly improves extraction performance. Models designed specifically for document processing have shown marked improvements in handling complex, multi-line, and borderless table data.


    Performance Showdown: Accuracy, Speed, and Efficiency

    Accuracy: LLM-based extraction tends to deliver higher accuracy for complex documents. While state-of-the-art OCR systems can achieve high accuracy on clean text, they often leave an error margin when extracting structured data. By leveraging context, LLMs can significantly reduce these errors in real-world applications.

    Speed: Traditional OCR is optimized for raw text extraction and can process dozens of pages per second. LLM-based methods, while computationally heavier and slightly slower on a per-page basis, often deliver structured data directly—eliminating time-consuming post-processing steps. For many business workflows, a few extra seconds per document is a small price to pay for the gains in accuracy and automation.

    Efficiency and Scalability: OCR is typically less resource-intensive, making it cost-effective for large-scale deployments. However, while LLMs demand more compute resources, they can enhance overall operational efficiency by reducing the need for manual corrections and custom parsing rules. Moreover, the adaptability of LLMs to new document formats without extensive reprogramming translates into long-term savings in time and development effort.


    Real-World Impact on Business Workflows

    For business professionals, data scientists, and developers, the difference between OCR and LLM-based extraction is not just technical—it’s about operational efficiency and data quality. For example, one large customs authority that adopted an LLM-driven document processing system reported dramatically faster form processing and a significant reduction in errors. By automating the extraction process, they were able to process forms more quickly, minimize compliance issues, and free up human resources for more complex tasks.

    Moreover, the increased data accuracy from LLM-based extraction means fewer downstream errors, less manual intervention, and faster access to reliable data for decision-making. In an era where timely and accurate information is critical, the benefits of LLM-powered extraction can translate directly into a competitive advantage.


    Conclusion & Key Takeaways

    The evolution from traditional OCR to LLM-based PDF reading represents a significant leap in document processing technology. Key takeaways include:

    • Different Philosophies: Traditional OCR is effective at basic text extraction but struggles with context and layout, while LLMs understand text within its broader context, preserving relationships and ensuring data integrity.
    • Structured Data Integrity: In applications like Customs Declaration Forms, maintaining the structure of multi-field data is critical. LLMs excel at keeping related data elements correctly paired, thereby improving overall accuracy.
    • Performance Considerations: While OCR offers speed and low computational cost, LLMs provide a richer, more accurate output that often justifies the extra processing time. Recent advances in document understanding demonstrate that transformer-based models can significantly reduce extraction errors.
    • Impact on Workflows: By automating complex document extraction, LLM-based systems streamline operations, reduce manual corrections, and enable faster, more reliable access to critical data—directly enhancing business efficiency and decision-making.

    In summary, while traditional OCR remains a useful tool for simple text extraction, LLMs are proving to be more effective for extracting structured data from complex documents. For organizations dealing with diverse and intricate document formats, the shift toward LLM-based processing represents a strategic advancement that can drive significant operational improvements.

  • From Concept to Code: My Iterative Journey to a Multiplexing Module with ChatGPT


    I started with a pretty straightforward wish. I had an ESP32 with just one accessible analog pin, but I needed to read 16 analog channels. I explained my predicament to ChatGPT’s o3-mini-high model, saying, “I only have one channel, but I need to read 16 channels.” ChatGPT responded with several options—external ADCs, multiplexing, and more. I considered the choices and decided that using an analog multiplexer, specifically the CD74HC4067, was the best route for me. That was the spark that lit the fire for a much deeper discussion.

    Step 1: The Initial Wish

    I began by stating that I wanted to use the ESP32 to read 16 analog channels even though I had only one analog input available. ChatGPT quickly provided several options. It mentioned using analog multiplexers, external ADCs, and even reassigning pins if possible. In the end, I chose to use the CD74HC4067 multiplexer. At that point, I hadn’t yet detailed my desired software architecture or any timing requirements. Later, I asked for possible architectural designs and a set of requirements to support my goal.

    Step 2: Architectural Design and Requirements

    I then asked ChatGPT to come up with an architectural design for a software module that would interface with the CD74HC4067. It presented a layered design that separated hardware abstraction, multiplexer control, ADC reading, and task integration. I appreciated the structure—it was thorough and organized—but I noticed it was missing details on real-time performance and compile‑time parameterization. So I requested that every adjustable parameter be defined as a preprocessor directive, and that the design include an RTOS Task/Thread approach with a queue to hold a timestamp for each set of 16 channel readings.
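One detail the multiplexer-control layer has to get right is the select-line encoding: the CD74HC4067 routes input channel N to its common output when its four select pins S0–S3 carry the binary value of N. A minimal, host-testable sketch of that mapping (the `muxSelectBits` name is mine; the final module does the same thing with `digitalWrite` calls):

```cpp
#include <array>
#include <cstdint>

// Decode a channel number (0-15) into the CD74HC4067's four select-line
// levels: bit 0 drives S0, bit 1 drives S1, bit 2 drives S2, bit 3 drives S3.
std::array<bool, 4> muxSelectBits(uint8_t channel) {
    return { (channel & 0x01) != 0,   // S0
             (channel & 0x02) != 0,   // S1
             (channel & 0x04) != 0,   // S2
             (channel & 0x08) != 0 }; // S3
}
```

For example, channel 13 (binary 1101) needs S0 HIGH, S1 LOW, S2 HIGH, and S3 HIGH.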

    Step 3: Adding RTOS and Timestamps

    With the idea now clearly taking shape, I confirmed that I was going to use the CD74HC4067. I specifically asked that a dedicated RTOS task be created to sequentially scan all 16 channels and that each full set of readings include a timestamp indicating when the reading was finished. ChatGPT updated the design accordingly, incorporating an ADC & sampling manager layer and specifying that the data package would contain both the channel data and the timestamp. At this point, I felt the design was solid but still needed more specific timing calculations.

    Step 4: Timing Calculations and Sampling Modes

    Next, I asked for a calculation of how many milliseconds it would take to read all 16 channels. ChatGPT provided several example scenarios based on different settling and ADC conversion delays. I liked the explanation but then pushed further: I wanted to add a parameter (set at compile time) that could take values like Fast Mode, Moderate Mode, or HighRes. This parameter would automatically adjust the multiplexer settling time and ADC conversion delay according to datasheet figures and previous measurements. The design evolved further with these added requirements.
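The arithmetic behind those scenarios is simple: a full scan costs 16 times one settling delay plus one conversion delay. A small sketch, using the per-mode figures that ended up in the final header (2/11 µs Fast, 5/22 µs Moderate, 20/50 µs HighRes):

```cpp
#include <cstdint>

// Full-scan duration: each of the 16 channels pays one multiplexer
// settling delay plus one ADC conversion delay.
constexpr uint32_t scanTimeUs(uint32_t settleUs, uint32_t convUs,
                              uint32_t channels = 16) {
    return channels * (settleUs + convUs);
}

// With the mode timings from the final header:
//   Fast:     scanTimeUs(2, 11)  = 16 * 13 = 208 us  (~0.2 ms)
//   Moderate: scanTimeUs(5, 22)  = 16 * 27 = 432 us  (~0.4 ms)
//   HighRes:  scanTimeUs(20, 50) = 16 * 70 = 1120 us (~1.1 ms)
```

Even the HighRes mode stays comfortably inside the 10 ms sampling interval the module later defaults to.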

    Step 5: Complete Module Implementation

    Then I requested a complete module—both .cpp and .h files—that incorporated all the requirements so far, including unit test functions. ChatGPT produced a detailed module, with all the adjustable parameters defined as preprocessor directives and real timing values chosen based on datasheet data (for instance, using a conversion time of around 11 µs for the ESP32). It even came with unit tests. While I was impressed with the thoroughness, I noticed that the code still used some blocking waits (for example, calls like ets_delay_us), which wasn’t acceptable for my real-time needs.

    Step 6: The Non‑Blocking Mandate

    Finally, I insisted that no blocking waits be used anywhere in the code. I demanded that every timing delay must be implemented in a non‑blocking manner using a state-machine approach, with the code yielding to the scheduler rather than busy-waiting. ChatGPT revised the design one last time, converting the sampling task into a non‑blocking state machine that checks elapsed time and yields control using vTaskDelay(0) and taskYIELD(). With that, the design was complete, meeting all my requirements.
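Stripped of the FreeRTOS plumbing, the pattern ChatGPT converged on boils down to: record a start time, then on every pass through the state machine check elapsed time and yield if the deadline has not been reached. A host-testable sketch of that core idea (the `DeadlineUs` name is illustrative, not part of the module's API):

```cpp
#include <cstdint>

// A non-blocking deadline: arm() records when the wait began; expired()
// answers immediately, so the caller can yield to the scheduler instead
// of busy-waiting inside a delay call.
class DeadlineUs {
public:
    void arm(uint64_t nowUs, uint64_t delayUs) {
        start_ = nowUs;
        delay_ = delayUs;
    }
    bool expired(uint64_t nowUs) const { return nowUs - start_ >= delay_; }

private:
    uint64_t start_ = 0;
    uint64_t delay_ = 0;
};
```

In the final task, states such as STATE_WAIT_SETTLE are exactly this check, followed by vTaskDelay(0)/taskYIELD() whenever the deadline is still pending.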


    Reflections on AI as a System Architect, Developer, and Hardware Developer

    My journey—from stating a simple problem to receiving a complete, non‑blocking module—has been nothing short of fascinating. I learned that, through iterative conversation, I could convert a vague idea into a detailed set of requirements and a production-ready code solution. While some might argue that AI might miss subtle hardware nuances or integration issues, I found that the rapid iteration allowed me to focus on refining the design rather than reinventing every detail.

    I now see that AI can act as both a system architect and a developer. It can quickly suggest modular designs, help define functional and non‑functional requirements, and even generate code that adheres to real-time, non-blocking constraints. Of course, human oversight remains essential for verifying and adapting AI output to real-world hardware challenges. But overall, the workflow becomes significantly faster, leaving more time for testing, optimization, and creative problem solving.

    You can see the requirements and the full code below.


    Functional Requirements:

    • Hardware Initialization: The module shall initialize all ESP32 digital pins for controlling the CD74HC4067 multiplexer and properly configure the ADC.
    • Multiplexer Channel Selection: The module shall enable selection of any channel (0–15) via digital outputs.
    • ADC Reading: The module shall perform ADC conversions from the multiplexer’s output.
    • RTOS Task for Sampling: A dedicated, non‑blocking FreeRTOS task shall sequentially sample all 16 channels.
    • Data Packaging: After a full scan, the module shall package the 16 ADC readings along with a timestamp (in milliseconds) into a data package.
    • Queue Integration: Each data package shall be posted to a FreeRTOS queue for asynchronous processing.
    • Parameterization: Every adjustable parameter (sampling interval, ADC conversion delay, multiplexer settling delay, queue length, pin assignments, etc.) must be defined as a preprocessor directive.
    • Sampling Mode Parameter: The module shall support a compile‑time selectable parameter (MODE_FAST, MODE_MODERATE, or MODE_HIGHRES) that adjusts the timing settings (such as settling time and ADC conversion delay) based on datasheet data and practical measurements.
    • Non‑Blocking Operation: All timing delays must be implemented in a non‑blocking manner (using a state-machine design and yielding to the scheduler) with no blocking waits.
    • Error Handling: The module shall detect and report errors (such as hardware initialization failures and queue overflows).

    Non‑Functional Requirements:

    • Real-Time Performance: The sampling task must operate with minimal jitter and be assigned sufficient priority.
    • Modularity & Maintainability: The design must separate hardware control, sampling logic, and RTOS interfacing to facilitate maintenance.
    • Efficiency: The module must conserve CPU cycles by avoiding blocking operations.
    • Scalability: The design should allow future enhancements (e.g., additional filtering or averaging) with minimal changes.
    • Testability: The module’s functionality must be verifiable through unit tests.

    Final Code

    Below is the complete final module code:

    MultiplexerModule.h

    cpp
    
    #ifndef MULTIPLEXER_MODULE_H
    #define MULTIPLEXER_MODULE_H
    
    #include <Arduino.h>
    #include <freertos/FreeRTOS.h>
    #include <freertos/queue.h>
    #include <freertos/task.h>
    #include <stdint.h>
    
    /*
      Module Parameters and Sampling Modes
    
      All adjustable parameters are defined as preprocessor directives.
      The values below are chosen based on published datasheet data and practical measurements.
    */
    
    // Number of multiplexer channels.
    #ifndef NUM_CHANNELS
      #define NUM_CHANNELS 16
    #endif
    
    // Sampling mode selection.
    typedef enum {
      MODE_FAST,
      MODE_MODERATE,
      MODE_HIGHRES
    } SamplingMode_t;
    
    #ifndef SAMPLING_MODE
      #define SAMPLING_MODE MODE_FAST
    #endif
    
    // Timing parameters (in microseconds) based on sampling mode.
    #if (SAMPLING_MODE == MODE_FAST)
      #define MUX_SETTLE_TIME_US     2    // Minimal settling delay.
      #define ADC_CONVERSION_TIME_US 11   // Measured ~11 µs conversion time.
    #elif (SAMPLING_MODE == MODE_MODERATE)
      #define MUX_SETTLE_TIME_US     5
      #define ADC_CONVERSION_TIME_US 22
    #elif (SAMPLING_MODE == MODE_HIGHRES)
      #define MUX_SETTLE_TIME_US     20
      #define ADC_CONVERSION_TIME_US 50
    #else
      #error "Invalid SAMPLING_MODE selected"
    #endif
    
    // Full-scan sampling interval (in milliseconds).
    #ifndef SAMPLING_INTERVAL_MS
      #define SAMPLING_INTERVAL_MS 10   // Adjust as required.
    #endif
    
    // Queue length for data packages.
    #ifndef QUEUE_LENGTH
      #define QUEUE_LENGTH 10
    #endif
    
    // Hardware pin assignments (adjust as needed for your wiring).
    #ifndef MUX_S0_PIN
      #define MUX_S0_PIN  25
    #endif
    #ifndef MUX_S1_PIN
      #define MUX_S1_PIN  26
    #endif
    #ifndef MUX_S2_PIN
      #define MUX_S2_PIN  27
    #endif
    #ifndef MUX_S3_PIN
      #define MUX_S3_PIN  14
    #endif
    #ifndef MUX_EN_PIN
      #define MUX_EN_PIN  12   // Optional enable pin (active LOW). If unused, tie low.
    #endif
    #ifndef MUX_ANALOG_PIN
      #define MUX_ANALOG_PIN  34  // Example: an ADC1 channel on the ESP32.
    #endif
    
    // Data package structure that holds a full set of readings with a timestamp.
    typedef struct {
      uint16_t channelReadings[NUM_CHANNELS];  // ADC reading for each channel.
      uint32_t timestamp_ms;                   // Timestamp (in ms) when the scan finished.
    } DataPackage_t;
    
    // Public API class declaration.
    class MultiplexerModule {
    public:
      // Initialize the module: configure hardware, create queue, and start the sampling task.
      static void init();
    
      // Stop the sampling task and free allocated resources.
      static void stop();
    
      // Retrieve a data package from the module’s queue (non-blocking).
      static bool getData(DataPackage_t* pkg, TickType_t waitTime = 0);
    
      // (Optional) Set sampling mode at runtime.
      // NOTE: Dynamic mode changes are not supported because all timing parameters are compile-time.
      static void setSamplingMode(SamplingMode_t mode);
    
      // Unit test function (compiled only if UNIT_TEST is defined).
      #ifdef UNIT_TEST
      static void runUnitTests();
      #endif
    
    private:
      // Enumeration of states in the non-blocking sampling state machine.
      typedef enum {
        STATE_SELECT_CHANNEL,
        STATE_WAIT_SETTLE,
        STATE_ADC_READ,
        STATE_WAIT_ADC,
        STATE_NEXT_CHANNEL,
        STATE_FULL_SCAN_WAIT
      } SamplingState_t;
    
      // The non-blocking state-machine-based sampling task.
      static void samplingTask(void* pvParameters);
    
      // Helper function to set the multiplexer to the specified channel.
      static void selectMuxChannel(uint8_t channel);
    
      // FreeRTOS queue handle for data packages.
      static QueueHandle_t muxQueue;
      
      // Task handle for the sampling task.
      static TaskHandle_t samplingTaskHandle;
    
      // Current sampling mode (for documentation; timing parameters remain compile-time).
      static SamplingMode_t currentMode;
    };
    
    #endif // MULTIPLEXER_MODULE_H 

    MultiplexerModule.cpp

    cpp
    
    #include "MultiplexerModule.h"
    #include <esp_timer.h>  // for esp_timer_get_time()
    
    // Static member definitions.
    QueueHandle_t MultiplexerModule::muxQueue = NULL;
    TaskHandle_t MultiplexerModule::samplingTaskHandle = NULL;
    SamplingMode_t MultiplexerModule::currentMode = SAMPLING_MODE;
    
    // Helper function: set multiplexer control pins based on the channel (0–15).
    void MultiplexerModule::selectMuxChannel(uint8_t channel) {
      digitalWrite(MUX_S0_PIN, (channel & 0x01) ? HIGH : LOW);
      digitalWrite(MUX_S1_PIN, (channel & 0x02) ? HIGH : LOW);
      digitalWrite(MUX_S2_PIN, (channel & 0x04) ? HIGH : LOW);
      digitalWrite(MUX_S3_PIN, (channel & 0x08) ? HIGH : LOW);
    }
    
    // Non-blocking sampling task implemented as a state machine.
    void MultiplexerModule::samplingTask(void* pvParameters) {
      DataPackage_t pkg;
      uint8_t currentChannel = 0;
      SamplingState_t state = STATE_SELECT_CHANNEL;
      uint64_t stateStartTime = esp_timer_get_time();  // in microseconds
      uint64_t fullScanStartTime = esp_timer_get_time();
      uint16_t adcTemp = 0;
    
      // Run the state machine forever without blocking waits.
      for (;;) {
        uint64_t now = esp_timer_get_time();
    
        switch (state) {
          case STATE_SELECT_CHANNEL:
            selectMuxChannel(currentChannel);
            digitalWrite(MUX_EN_PIN, LOW);
            stateStartTime = now;
            state = STATE_WAIT_SETTLE;
            break;
    
          case STATE_WAIT_SETTLE:
            if (now - stateStartTime >= MUX_SETTLE_TIME_US) {
              state = STATE_ADC_READ;
            } else {
              vTaskDelay(0);
            }
            break;
    
          case STATE_ADC_READ:
            adcTemp = analogRead(MUX_ANALOG_PIN);
            stateStartTime = now;
            state = STATE_WAIT_ADC;
            break;
    
          case STATE_WAIT_ADC:
            if (now - stateStartTime >= ADC_CONVERSION_TIME_US) {
              pkg.channelReadings[currentChannel] = adcTemp;
              state = STATE_NEXT_CHANNEL;
            } else {
              vTaskDelay(0);
            }
            break;
    
          case STATE_NEXT_CHANNEL:
            currentChannel++;
            if (currentChannel < NUM_CHANNELS) {
              state = STATE_SELECT_CHANNEL;
            } else {
              pkg.timestamp_ms = (uint32_t)(esp_timer_get_time() / 1000);  // time (ms since boot) when the scan finished
              xQueueSend(muxQueue, &pkg, 0);
              currentChannel = 0;
              fullScanStartTime = esp_timer_get_time();
              state = STATE_FULL_SCAN_WAIT;
            }
            break;
    
          case STATE_FULL_SCAN_WAIT: {
              uint32_t elapsed_ms = (uint32_t)((esp_timer_get_time() - fullScanStartTime) / 1000);
              if (elapsed_ms >= SAMPLING_INTERVAL_MS) {
                state = STATE_SELECT_CHANNEL;
              } else {
                vTaskDelay(0);
              }
            }
            break;
    
          default:
            state = STATE_SELECT_CHANNEL;
            break;
        }
        taskYIELD();
      }
    }
    
    // Public initialization: configure pins, create the queue, and start the sampling task.
    void MultiplexerModule::init() {
      pinMode(MUX_S0_PIN, OUTPUT);
      pinMode(MUX_S1_PIN, OUTPUT);
      pinMode(MUX_S2_PIN, OUTPUT);
      pinMode(MUX_S3_PIN, OUTPUT);
      pinMode(MUX_EN_PIN, OUTPUT);
      digitalWrite(MUX_EN_PIN, LOW);
      pinMode(MUX_ANALOG_PIN, INPUT);
    
      muxQueue = xQueueCreate(QUEUE_LENGTH, sizeof(DataPackage_t));
      if (muxQueue == NULL) {
        Serial.println("Error: Could not create multiplexer queue.");
        return;
      }
    
      BaseType_t result = xTaskCreatePinnedToCore(
        samplingTask,
        "MuxSamplingTask",
        4096,
        NULL,
        1,
        &samplingTaskHandle,
        1
      );
      if (result != pdPASS) {
        Serial.println("Error: Could not create sampling task.");
      }
    }
    
    // Public stop function.
    void MultiplexerModule::stop() {
      if (samplingTaskHandle != NULL) {
        vTaskDelete(samplingTaskHandle);
        samplingTaskHandle = NULL;
      }
      if (muxQueue != NULL) {
        vQueueDelete(muxQueue);
        muxQueue = NULL;
      }
    }
    
    // Retrieve a data package from the queue (non-blocking).
    bool MultiplexerModule::getData(DataPackage_t* pkg, TickType_t waitTime) {
      if (muxQueue == NULL) return false;
      return (xQueueReceive(muxQueue, pkg, waitTime) == pdPASS);
    }
    
    // Public function to set the sampling mode (runtime changes are not supported).
    void MultiplexerModule::setSamplingMode(SamplingMode_t mode) {
      Serial.println("Runtime mode change not supported; set SAMPLING_MODE at compile time.");
    }
    
    #ifdef UNIT_TEST
    // Unit tests.
    void MultiplexerModule::runUnitTests() {
      Serial.println("Running unit tests for MultiplexerModule...");
      for (uint8_t ch = 0; ch < NUM_CHANNELS; ch++) {
        bool b0 = (ch & 0x01) != 0;
        bool b1 = (ch & 0x02) != 0;
        bool b2 = (ch & 0x04) != 0;
        bool b3 = (ch & 0x08) != 0;
        Serial.print("Channel ");
        Serial.print(ch);
        Serial.print(" expects S0=");
        Serial.print(b0 ? "HIGH" : "LOW");
        Serial.print(", S1=");
        Serial.print(b1 ? "HIGH" : "LOW");
        Serial.print(", S2=");
        Serial.print(b2 ? "HIGH" : "LOW");
        Serial.print(", S3=");
        Serial.println(b3 ? "HIGH" : "LOW");
      }
      uint32_t expected_us = NUM_CHANNELS * (MUX_SETTLE_TIME_US + ADC_CONVERSION_TIME_US);
      Serial.print("Expected full-scan time (µs): ");
      Serial.println(expected_us);
      if (sizeof(DataPackage_t) != (NUM_CHANNELS * sizeof(uint16_t) + sizeof(uint32_t))) {
        Serial.println("Error: DataPackage_t size mismatch.");
      } else {
        Serial.println("DataPackage_t size verified.");
      }
      Serial.println("Unit tests completed.");
    }
    #endif  // UNIT_TEST

  • How AI Is Turbocharging Entrepreneurship: Cutting Costs and Saving Time


    Imagine turning your big idea into a working prototype—without having to hire a full engineering team. Thanks to rapidly evolving AI tools, what once took multiple specialized roles can now be done by a single founder or a lean, scrappy crew. The result? Lower costs, faster time to market, and a direct path from concept to pilot.

    The Numbers Speak Volumes

    • Up to 55% Faster Coding: According to a 2022 GitHub survey, AI-assisted coding can drastically speed up development.

    • 20–30% Lower Operational Costs: A McKinsey report found that integrating AI into product development slashes expenses.

    • 30–40% Faster Time to Market: A Forrester analysis indicates a sizable jump in speed from concept to pilot thanks to AI-driven prototyping.

    Why This Matters for You

    1. Solo, But Scalable: AI lets founders wear multiple hats, from coding to marketing, with minimal outside help.

    2. Instant Prototyping: Whether you’re pitching to investors or testing a new feature, AI helps you spin up just enough functionality to demonstrate real potential.

    3. Stay in Control: AI isn’t a magic wand—human oversight is crucial. You still need to validate outputs, refine code, and ensure your product meets real-world needs.

    Seize the Moment—It Won’t Last Forever

    Here’s the kicker: the window of opportunity might be shorter than you think. As AI gets more powerful, the competitive edge offered by adopting it early will shrink. Eventually, when everyone can do everything, the advantage disappears. Right now, though, AI can still be your startup’s secret weapon—helping you out-innovate slower, bulkier competitors.

    What’s Your Next Move?

    • If you’ve got a burning idea: Prototype it!

    • If you’re on the fence: Experiment now, before this advantage becomes the norm.

  • How Tohum AB Transformed Technical Requirements, Tackled Technical Debt, and Turned Learning into a Career‑Growth Engine



    1. Why We Wrote This Story

    Every start‑up eventually discovers that its technology is only half the product; the other half is the knowledge of the engineers who bring that technology to life. At Tohum AB—operating from Göteborg, Sweden and İzmir, Türkiye—we design field‑ready soil‑measurement nodes that pair an ESP32 brain with precision NPK sensors, Li‑ion batteries, high‑efficiency solar panels, SIM‑based back‑haul, and MQTT data pipelines. As the feature list grew, we found ourselves juggling firmware forks, hurried PCB revisions, and an ever‑lengthening “things we’ll fix later” backlog.

    This article is a retrospective on how we:

    1. Defined our technical requirements in a way that balanced agronomic accuracy, ultra‑low‑power needs, and manufacturing constraints.
    2. Audited and quantified our technical debt, from buggy BLE drivers to duplicated power‑tree schematics.
    3. Built a skill‑matrix learning plan that turns gaps into growth opportunities rather than stress points.
    4. Embedded career development into our daily engineering rhythm so that the future of every employee is treated as seriously as the future of each product.

    We hope the playbook is useful whether you’re architecting your first sensor node or scaling a fleet of tens of thousands of devices.


    2. Framing the Mission—From “A Cool Gadget” to Measurable Outcomes

    Our founding vision was straightforward: Give farmers affordable, telemetry‑rich insight into soil health so they can optimize fertilizer and irrigation with scientific precision. Turning that vision into an engineering backlog required a language all stakeholders could understand—marketing, field agronomists, firmware hackers, and hardware designers alike.

    We therefore wrote each top‑level requirement as a user‑visible outcome followed by engineering metrics:

    • “A field technician can install a probe and see data in the dashboard within 5 min.”
      • 2.4 GHz Wi‑Fi provisioning or BLE Soft‑AP; fallback to pre‑loaded APN for cellular.
      • Initial soil pH, EC, temperature values must appear in Grafana within 300 s.
    • “Nodes must operate unattended for one full Nordic winter without battery replacement.”
      • Average current draw ≤ 80 µA over 24 h; deep‑sleep quiescent ≤ 5 µA.
      • 6 W panel + 3 500 mAh 18650 must meet energy budget at 55° N with 2 h insolation.
    • “Data gaps caused by coverage loss must be backfilled automatically.”
      • 4 MiB LittleFS ring buffer; SD card optional.
      • MQTT QoS 1 resend with exponential back‑off on reconnect.
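    The QoS 1 resend policy in the last bullet can be sketched as an exponential back‑off schedule. The 2 s base and 300 s cap below are illustrative assumptions, not values from our firmware:

```python
import random

def backoff_delays(base_s=2.0, cap_s=300.0, attempts=6, jitter=0.0):
    """Exponential back-off schedule for MQTT reconnect attempts.

    The delay doubles after each failed attempt and is clamped at cap_s,
    so a long coverage outage never pushes retries past a few minutes.
    Optional jitter spreads simultaneous reconnects across a fleet.
    """
    delays = []
    delay = base_s
    for _ in range(attempts):
        delays.append(min(cap_s, delay) + random.uniform(0, jitter))
        delay *= 2
    return delays

# With a 2 s base, the first six retries wait 2, 4, 8, 16, 32, 64 s.
print(backoff_delays())
```

    On reconnect, the buffered LittleFS records are republished oldest‑first at QoS 1, letting the broker deduplicate by packet identifier.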

    With outcomes clear, trade‑offs became explicit: a bigger local buffer adds BOM cost; more frequent sampling increases agronomic resolution but drains the battery. Because every requirement could be traced to concrete numbers, we avoided the trap of “nice‑to‑have” features sneaking into revision A boards.
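    A quick sanity check of the winter requirement, using the figures above (80 µA average draw, 3 500 mAh pack). The 0.7 cold‑weather derate here is an assumed factor for illustration, not a measured one:

```python
def battery_runtime_days(capacity_mah, avg_current_ua, derate=0.7):
    """Days the pack alone sustains the node at the average draw.

    derate accounts for cold-temperature capacity loss and converter
    inefficiency; 0.7 is an assumed figure, not a bench measurement.
    """
    usable_mah = capacity_mah * derate
    drain_mah_per_day = avg_current_ua / 1000.0 * 24
    return usable_mah / drain_mah_per_day

# 3500 mAh 18650 at the 80 uA budget, derated for winter:
print(round(battery_runtime_days(3500, 80)))  # -> 1276 days
```

    Even before counting the panel's 2 h of insolation, the pack alone covers several winters, which is what made the ≤ 80 µA budget the load‑bearing number in the requirement.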


    3. Breaking Down the Technology Stack

    Next, we decomposed the solution into six focus areas, each championed by a “tech steward” who owned requirements, risk registers, and Jira epics:

    Focus Area              Core Technology                              Success Metric
    Processing & RTOS       ESP32‑S3, FreeRTOS, ESP‑IDF                  100 % CPU utilization < 20 ms per cycle
    Soil Sensors            Ion‑selective electrode array                ±2 % full‑scale accuracy vs lab ref.
    Energy                  3.7 V Li‑ion, MPPT buck‑boost solar charger  ≥ 95 % charge efficiency at 250 mA
    Connectivity            SIM7600 LTE‑CAT‑M1 + Wi‑Fi                   > 98 % successful publish ratio
    Cloud & Protocol        MQTT/TLS to AWS IoT Core                     Latency < 3 s p95
    Enclosure & Compliance  IP67 PC‑ABS + potting                        Pass CE, FCC, RoHS, IP67 spray

    A single‑page “Architecture Canvas” visualized dependencies: the soil sensor board feeds I²C data to the MCU; the power subsystem reports SoC via coulomb counter; the modem shares the UART bus with RS‑485 debugging; the cloud rules engine fans data into InfluxDB for Grafana.

    This canvas became the table of contents for all subsequent design docs, preventing siloed thinking and ensuring every engineer knew who to ping when a spec changed.


    4. Quantifying Technical Debt—Turning Anecdotes into Numbers

    Like many start‑ups, we had accumulated “we’ll fix this after demo day” shortcuts:

    • Firmware
      • Hard‑coded APNs in three different source trees.
      • Copy‑pasted ADC routines with magic scaling constants.
    • Hardware
      • Three resistor value changes patched with blue‑wire in pilot batch.
      • No consistent naming convention for test points.
    • Process
      • One‑off Bash scripts to flash 50 units on the production line.
      • Tribal knowledge about which SDK commit worked with which board revision.

    We adopted a three‑step audit similar to Ward Cunningham’s technical‑debt metaphor:

    1. Inventory – an all‑hands “debt safari” produced 126 items in a Confluence page.
    2. Severity & Interest Rate – each item scored 1‑5 for impact and 1‑5 for cost to fix later.
    3. Principal Estimate – hours required if fixed immediately.

    The result looked like:

    Debt Item               Impact  Interest  Principal (h)  Weighted Score
    BLE driver fork           5        5          40              1 000
    Inconsistent ADC scale    4        3           8                 96
    Flash script              2        2           6                 24

    High‑scoring items were promoted to sprint backlogs. Lower ones remained in the “snow‑bank” but with explicit expiry dates. This quantitative lens depersonalized discussions: the numbers decided priority, not the loudest voice.
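    The weighted scores above are simply impact × interest × principal; a few lines reproduce the ranking that drove sprint priority:

```python
def weighted_score(impact, interest, principal_h):
    """Debt priority: impact (1-5) x interest (1-5) x hours to fix now."""
    return impact * interest * principal_h

debt = [
    ("BLE driver fork",        5, 5, 40),
    ("Inconsistent ADC scale", 4, 3, 8),
    ("Flash script",           2, 2, 6),
]
ranked = sorted(debt, key=lambda d: weighted_score(*d[1:]), reverse=True)
for name, imp, intr, hours in ranked:
    print(f"{name}: {weighted_score(imp, intr, hours)}")
# BLE driver fork: 1000
# Inconsistent ADC scale: 96
# Flash script: 24
```

    Multiplying rather than adding deliberately amplifies items that are both painful and expensive to defer, which is why the BLE fork dwarfed everything else.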


    5. Mapping Skills to the Product—The T‑Shaped Matrix

    Parallel to the debt audit, we realized some “debt” was really a skill deficit. We borrowed the five‑level framework you saw earlier (Absolute Beginner → Expert) and listed 25 competencies (C/C++, ESP‑IDF, FreeRTOS, Git, PlatformIO, Linux CLI, Unity Test, CI/CD, Jira/Confluence, Electronics, PCB design, Power budgeting, Battery management, Cellular comms, IoT protocols, TLS, Sensor calibration, Data logging, OTA, Control theory, Python/Bash, Cloud dashboards, Regulatory compliance, Documentation tooling).

    Each engineer self‑assessed and a tech lead moderated for calibration. The outcomes formed a heat map:

    Name   C/C++ ESP‑IDF LTE  PCB  MQTT Cloud ...
    Ali      4      3      2    5    4     2
    Elif     2      2      4    3    3     5
    Mert     5      4      1    2    3     2
    

    We call this the T‑shaped Matrix—everyone deep in one spike, broad in the basics. It instantly highlighted risk: if Ali (LTE level 2) went on vacation, cellular issues might stall a sprint. It also illuminated growth paths: Elif could mentor cloud while learning ESP‑IDF from Mert.
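    Using the matrix excerpt above, a short script can surface those bus‑factor risks automatically. The "at least two engineers at Level 3" threshold is our assumption for illustration:

```python
def bus_factor_risks(matrix, floor=3):
    """Flag skills where at most one engineer is at or above `floor`.

    matrix maps engineer name -> {skill: level (1-5)}. Returns a sorted
    list of (skill, engineers at floor or above) for risky skills.
    """
    skills = {s for levels in matrix.values() for s in levels}
    risky = []
    for skill in sorted(skills):
        capable = [n for n, lv in matrix.items() if lv.get(skill, 0) >= floor]
        if len(capable) <= 1:
            risky.append((skill, capable))
    return risky

team = {
    "Ali":  {"C/C++": 4, "ESP-IDF": 3, "LTE": 2, "PCB": 5, "MQTT": 4, "Cloud": 2},
    "Elif": {"C/C++": 2, "ESP-IDF": 2, "LTE": 4, "PCB": 3, "MQTT": 3, "Cloud": 5},
    "Mert": {"C/C++": 5, "ESP-IDF": 4, "LTE": 1, "PCB": 2, "MQTT": 3, "Cloud": 2},
}
print(bus_factor_risks(team))
# [('Cloud', ['Elif']), ('LTE', ['Elif'])]
```

    Both flagged skills point at Elif, which is exactly the vacation scenario described above, and it is what prompted the mentoring pairings in the next section's learning plan.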


    6. Crafting the Learning Plan—From Gaps to Goals

    A learning plan is credible only if it ties back to product needs and individual motivation. We took three concrete steps:

    1. Skill Buckets Aligned to Road‑map Milestones
      Q3: secure boot and flash encryption → everyone to Level 3 in mbedTLS.
      Q4: delta‑OTA rollout → two firmware devs to Level 4 in data logging and partition artistry.
    2. Embedded Practice
      – Pair‑programming sessions twice a week rotating mentor/mentee.
      – “Friday Spike”: 4‑hour sandbox to hack on anything related to the weekly theme—Kalman filter mock‑ups, PlantUML board diagrams, etc.
    3. Visible Progress Metrics
      – Monthly “show the thing you learned” lightning talks.
      – Jira Goals board with personal epics like “Elif Level‑up ESP‑IDF to 3: OTA example shipped.”

    We allocated 10 % of sprint velocity explicitly to learning tasks. Finance initially blinked at the capacity hit, but retention numbers improved: voluntary turnover dropped to 0 % over 18 months.


    7. The Career Development Loop—Engineering, Not HR, First

    Traditional appraisal cycles felt inadequate. We replaced them with a lightweight Career Kanban:

    Column       Definition of Done
    Explore      Identify learning goal tied to product milestone
    Plan         Pick mentor, resources, deadline
    Execute      Complete course/book/pair‑task
    Demonstrate  Show working code, schematic, or doc
    Reflect      15‑min retro with mentor; update skill matrix

    Engineers move cards at their own pace; leads step in only to unblock or celebrate. This fluidity mirrors how pull requests work—learning is just another branch.


    8. Results in the Field—Fewer Downtimes, Happier Farmers

    Twelve months after launching the framework:

    • Bug backlog shrank by 48 %.
    • Firmware release cadence improved from quarterly to monthly.
    • Battery runtime exceeded the original winter target by 18 %.
    • Support tickets for cellular dropouts fell by 60 %.

    Most tellingly, time from spec to first validated prototype fell from eight weeks to four and a half. Engineers credited cross‑training: the PCB designer who reached Level 3 in Python wrote a calibration script that saved a week of test‑bench labor.


    9. Caring for the Future—Why Learning Is Part of Our ESG Charter

    We position continuous learning not just as an HR perk but as an environmental and social imperative:

    • Environmental – Better‑designed firmware means fewer field visits and lower CO₂ footprint.
    • Social – Upskilled employees become local tech ambassadors, boosting regional innovation.
    • Governance – Documented skill matrices and transparent promotions reduce bias and strengthen compliance.

    Each quarter we publish an internal “Learning Impact Report” showing correlations between training hours and key OKRs (bug density, energy efficiency). The board reviews it alongside revenue numbers—proof that people metrics earn equal board‑room airtime.


    10. Lessons Learned & Tips for Other Teams

    1. Write outcome‑driven requirements first; technology choices follow naturally.
    2. Quantify technical debt so arguments become math, not emotion.
    3. Visual skill matrices surface talent bottlenecks faster than org charts.
    4. Allocate focused time (we chose 10 %) to make learning credible.
    5. Tie learning to live code—a merged PR teaches more than any MOOC.
    6. Celebrate small wins; lightning talks trump slide decks for sharing tacit know‑how.
    7. Publicly connect learning to product KPIs to win executive sponsorship.

    11. What’s Next for Tohum AB

    Our 2026 road map includes NB‑IoT + LoRa dual connectivity, AI‑driven nutrient predictions, and a rugged v6 enclosure targeting IP69K wash‑down farming operations. Each frontier spawns new skill‑matrix columns—ultra‑low‑power ML, LoRaWAN MAC deep dives, high‑pressure ingress labs. The framework lives on: we’ll repeat the requirement‑debt‑learning cycle with every leap.


    12. A Final Word on People

    Silicon chips age; solar panels degrade; even the richest soil eventually needs replenishment. The only asset that compounds value indefinitely is human potential. By engineering our learning process with the same rigor we apply to schematics and code, we discovered a virtuous loop: better products fund more learning, which yields better careers, which in turn produce better products.

    If you take one thing from our story, let it be this: invest in your engineers like you invest in your battery chemistry. Both store energy, but only one can invent the future.