BLOG

Beyond Von Neumann

Toward a unified deterministic architecture

For over 50 years, our industry has been bound by the Von Neumann model — from CPUs and GPUs to specialized AI accelerators. Even as we added complexity through speculation, prediction, and out-of-order execution, performance came at the cost of efficiency and predictability.

In my new VentureBeat article, I introduce a paradigm I call Deterministic Execution — a cycle-accurate approach that eliminates speculation and unifies scalar, vector, and matrix compute under one deterministic scheduler.

By orchestrating compute and memory with precise timing, we can achieve higher throughput, lower power, and simpler hardware — a foundation for the next generation of AI and general-purpose processors.

By Dr. Thang Minh Tran, CEO/CTO Simplex Micro | September 4th, 2025

Rethinking Scoreboards

A path forward for AI-era CPUs

In the pursuit of higher performance and tighter power budgets, AI accelerators are pushing microarchitectures to their limits. Deep pipelines and speculative execution bring power penalties and complexity—but what if we’ve been overlooking a simpler, more elegant solution?

In my latest article on SemiWiki, I explore how a refined scoreboard architecture—rooted in classic CPU design—can offer a scalable, energy-efficient path forward. By enabling precise instruction scheduling and eliminating speculative overhead, we can unlock instruction-level parallelism without the usual cost.
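To make the idea concrete, here is a deliberately minimal Python sketch of the classic scoreboard principle the article builds on — tracking when each register's pending result becomes valid, so instructions issue only once their operands are known to be ready, with no speculation. The class, field names, and latencies are illustrative inventions, not the refined architecture described in the article.

```python
class Scoreboard:
    """Minimal scoreboard: records the cycle at which each
    register's in-flight result becomes available."""

    def __init__(self, num_regs=32):
        self.ready_at = [0] * num_regs  # cycle each register is ready

    def can_issue(self, srcs, cycle):
        # Issue only when every source operand has been written back.
        return all(self.ready_at[r] <= cycle for r in srcs)

    def issue(self, dst, latency, cycle):
        # Record when the destination register will hold a valid result.
        self.ready_at[dst] = cycle + latency


sb = Scoreboard()
sb.issue(dst=1, latency=3, cycle=0)     # r1 <- load, ready at cycle 3
print(sb.can_issue(srcs=[1], cycle=1))  # False: r1 still in flight
print(sb.can_issue(srcs=[1], cycle=3))  # True: issue with no speculation
```

Because readiness is known exactly, there is nothing to mispredict and nothing to flush — the scheduling cost is a simple table lookup rather than speculative wakeup logic.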

By Dr. Thang Minh Tran, CEO/CTO Simplex Micro | July 1st, 2025

Speculative Execution

Rethinking the approach to CPU scheduling in AI data centers

In his latest SemiWiki article, Simplex Micro CEO/CTO Thang Tran explores how predictive execution can replace speculative execution in modern data centers, offering a more efficient, cost-effective solution. This shift not only reduces power consumption and silicon overhead but also eliminates the need for expensive HBM, transforming the future of AI infrastructure.

As AI workloads continue to grow, rethinking the traditional speculative execution model is essential for staying competitive. In his article, Thang dives into how predictive execution can streamline CPU scheduling, helping data centers scale more efficiently and sustainably.

By Jonah McLeod | May 7th, 2025

Predictive Load Handling

Solving a quiet bottleneck in modern DSPs

Memory stalls: the silent killer in DSPs for embedded AI.

When we talk performance, it’s easy to focus on MACs, vector width, or clock speed. But in the trenches of edge AI—voice, radar, low-power vision—it’s latency that quietly derails everything.

My latest article explores a persistent bottleneck most DSP toolchains overlook: non-cacheable memory and the precise timing demands it imposes.

Traditional approaches rely on deterministic scratchpads and TCM. But if the data isn't there exactly when it's needed, the pipeline stalls, IPC crashes, and power is wasted.

Enter Predictive Load Handling—a technique that doesn’t try to guess what address will be accessed, but instead predicts when the data will be available.

It’s a subtle shift with major implications for real-time inference.
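The shift can be caricatured in a few lines of Python. The region names and latencies below are invented for illustration (they are not from the article): instead of guessing an address or a value, the pipeline predicts the arrival cycle of each load and slots dependent work at that cycle rather than stalling.

```python
# Hypothetical latency table for non-cacheable regions, in cycles.
REGION_LATENCY = {"tcm": 1, "sram": 8, "ddr": 40}

def predicted_arrival(issue_cycle, region):
    """Predict WHEN the loaded data will be available, not WHAT it is."""
    return issue_cycle + REGION_LATENCY[region]

def schedule(load_issue_cycle, region, dependent_ops):
    """Slot each dependent op at the first cycle its data is ready,
    instead of stalling the whole pipeline on the load."""
    ready = predicted_arrival(load_issue_cycle, region)
    return [(op, ready + i) for i, op in enumerate(dependent_ops)]

print(schedule(0, "ddr", ["mac", "acc"]))  # [('mac', 40), ('acc', 41)]
```

The cycles between issue and arrival are then free for independent work — which is exactly where the stalled IPC and wasted power were hiding.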

By Jonah McLeod | April 17th, 2025

Even HBM Isn’t Fast Enough All the Time

Why latency-tolerant architectures matter in the age of AI supercomputing

High Bandwidth Memory (HBM) has become the defining enabler of modern AI accelerators.

From NVIDIA’s GB200 Ultra to AMD’s MI400, every new AI chip boasts faster and larger stacks of HBM, pushing memory bandwidth into the terabytes-per-second range. But beneath the impressive specs lies a less obvious truth: even HBM isn’t fast enough all the time. And for AI hardware designers, that insight could be the key to unlocking real performance.

By Jonah McLeod | April 7th, 2025

RISC-V News and Insights

September 1st, 2025
August 25th, 2025
July 1st, 2025
May 7th, 2025
April 17th, 2025
April 7th, 2025
March 18th, 2025
April 10th, 2024
September 21st, 2023
– This collection of online lectures was led by Thang Tran and provides over 6 hours of insight into vector processor design
June 23rd, 2023
– The 512-bit OoO RISC-V vector processor was architected and designed by Thang Tran
May 20th, 2023
May 18th, 2023

RISC-V Areas of Innovation

SCALABILITY

RISC-V processors can scale from low-power, low-performance devices to high-performance computing systems, making it a versatile technology for a range of applications. This scalability is achieved through the modular design of the ISA, which allows for customization and optimization at all levels of the system.

MODULARITY

The RISC-V ISA is modular, which means that it can be customized to meet the needs of different applications and markets. This modularity allows developers to design RISC-V processors that are optimized for specific use cases, such as low-power IoT devices or high-performance computing systems.

EXTENSIBILITY

RISC-V’s extensible ISA enables customers to add their own unique instructions and ‘secret sauce’ to gain a competitive edge over industry rivals, while also future-proofing their technology with the ability to add new features and capabilities over time without requiring changes to the core architecture.

LOW POWER

RISC-V’s simple ISA enables low-power techniques such as multi-level clock-gating and multiple power domains, optimizing processor design for energy efficiency in mobile and IoT devices.

SECURITY

RISC-V processors can incorporate security features at the hardware level, which can help to protect against cyber threats such as hacking and malware. This hardware-level security is achieved through features such as memory protection and encryption, which are integrated into the design of the processor.

FLEXIBILITY

RISC-V’s modular and extensible ISA allows for flexible customization at every level, enabling unique designs tailored to a wide variety of applications and customers.

INTEROPERABILITY

The RISC-V ecosystem is built upon a broad range of companies belonging to the RISC-V International consortium, enabling seamless interoperability with other technologies and systems.

LOW BARRIER TO ENTRY

RISC-V’s open-source and scalable design, with its low barrier to entry, makes it an ideal choice for IoT development, encouraging innovation and creating a diverse ecosystem of developers and applications, from hobbyists and startups to industry leaders.

Download Our White Paper on Time-based Microprocessors

In the pursuit of performance, modern microprocessor design has become increasingly complex, often leading to higher power consumption and design challenges. At Simplex Micro, we believe there is a more efficient path forward.

In this white paper, CEO/CTO Dr. Thang Tran details a novel, time-based architecture that utilizes status scheduling to achieve high-performance, out-of-order execution without the overhead of traditional dynamic schedulers.

Discover how this “fire-and-forget” methodology simplifies the pipeline, reduces power consumption, and provides a scalable foundation for next-generation custom and vector processors.
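As a toy illustration of the fire-and-forget idea — our own sketch, not the mechanism from the white paper — each instruction can be stamped at issue time with the cycle its result will exist, so no dynamic wakeup, broadcast, or replay logic is needed afterward:

```python
def fire_and_forget(program, latency):
    """Assign each instruction a fixed execute cycle at issue time.
    program: ordered list of (dst, srcs) register tuples.
    latency: execute-to-result cycles, per instruction index.
    Returns {instruction index: execute cycle} — decided once,
    never revisited."""
    ready_at = {}  # register -> cycle its value exists
    schedule = {}
    for i, (dst, srcs) in enumerate(program):
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        schedule[i] = start                 # fire...
        ready_at[dst] = start + latency[i]  # ...and forget
    return schedule


prog = [("r1", []), ("r2", ["r1"]), ("r3", ["r1", "r2"])]
print(fire_and_forget(prog, latency=[3, 1, 1]))  # {0: 0, 1: 3, 2: 4}
```

Because every timing decision is made once at issue, the hardware analog needs no content-addressable wakeup network — which is where much of a dynamic scheduler’s power and complexity lives.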

To receive a copy of the full white paper, please complete the form below.