This project is a basic hardware accelerator for high-throughput arithmetic, specifically the multiply-accumulate (MAC) operations common in DSP and AI workloads. Written from scratch in SystemVerilog, the module operates independently of the main CPU. It uses an autonomous pipeline and elastic data buffering to keep the MAC datapath busy on as many clock cycles as possible.
In digital signal processing, compute elements frequently face "data starvation" when memory fetches cannot keep up with execution speed. Furthermore, deep accumulation loops in standard ALUs are prone to integer overflow: the result wraps around and flips sign, which can cause phase inversion in audio or corrupted weights in AI models. This project was built to address both the memory-compute mismatch and the arithmetic instability at once.
Designed a dual-buffer system to decouple data ingestion from the processing core.
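The decoupling idea can be illustrated with a small cycle-by-cycle software model. This is only a behavioral sketch in Python, not the project's RTL: it assumes the two buffers are FIFOs holding one operand stream each, that the core fires only when both hold data, and that a full buffer stalls its source. The names `offer`, `simulate`, and the depth of 4 are hypothetical.

```python
from collections import deque

def offer(stream, idx, buf, depth):
    """Try to move stream[idx] into buf; return the next stream index.

    None in the stream models a cycle where memory delivers nothing.
    A full buffer stalls the source (the index does not advance).
    """
    if idx >= len(stream):
        return idx
    item = stream[idx]
    if item is None:
        return idx + 1
    if len(buf) < depth:
        buf.append(item)
        return idx + 1
    return idx  # backpressure: hold the item for the next cycle

def simulate(stream_a, stream_b, depth=4, max_cycles=1000):
    """Return a list of (cycle, a, b) tuples, one per MAC operation fired."""
    buf_a, buf_b = deque(), deque()
    ia = ib = 0
    fired = []
    for cycle in range(max_cycles):
        sources_done = ia >= len(stream_a) and ib >= len(stream_b)
        if sources_done and not (buf_a and buf_b):
            break
        # Core side: consume one operand pair only if both buffers hold data.
        if buf_a and buf_b:
            fired.append((cycle, buf_a.popleft(), buf_b.popleft()))
        # Ingestion side: runs independently of the core.
        ia = offer(stream_a, ia, buf_a, depth)
        ib = offer(stream_b, ib, buf_b, depth)
    return fired

# Stream A arrives early and then has two empty cycles; stream B arrives late.
ops = simulate([1, 2, 3, None, None, 4], [None, None, 10, 20, 30, 40])
```

In this run the core fires on four consecutive cycles even though stream A delivers nothing on two of them, because the buffered operands absorb the gap.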
The core computational unit, stripped down for maximum frequency scaling.
Overflowing results clamp to MAX_POS or MAX_NEG, ensuring mathematically stable output streams.
The top-level wrapper acts as a micro-sequencer. It autonomously drives the datapath enable signals based purely on buffer occupancy. A synchronous shift register tracks data through the pipeline and asserts the output valid flag in step with the 2-cycle pipeline latency.
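The valid-tracking shift register can be sketched as a short behavioral model. This is an illustration in Python rather than the actual SystemVerilog, and the function name `valid_pipeline` is hypothetical; it assumes the enable bit is sampled on each clock edge and the last stage of the register drives the output valid flag.

```python
def valid_pipeline(enables, latency=2):
    """Model a synchronous shift register that delays 'enable' by `latency` cycles.

    enables[c] is the datapath enable in cycle c; the returned list holds the
    output valid flag seen in each cycle.
    """
    stages = [False] * latency
    valids = []
    for en in enables:
        valids.append(stages[-1])           # flag visible during this cycle
        stages = [bool(en)] + stages[:-1]   # clock edge: shift the new bit in
    return valids

# Enable asserted in cycles 0, 1 and 3; valid follows in cycles 2, 3 and 5.
flags = valid_pipeline([1, 1, 0, 1, 0, 0])
```

Because the flag is just a delayed copy of the enable, it stays aligned with the data no matter how irregularly the buffers allow the datapath to fire.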
Developed a rigorous simulation environment to verify the logic. The test suite heavily stressed the saturation boundaries, intentionally flooding the accumulator with edge-case values (e.g., 0x7FFFFFFF + 0x7FFFFFFF) to confirm that the clamping logic triggered correctly without stalling the pipeline.
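A reference model of the clamping behavior makes the edge case concrete. The sketch below is a Python illustration, not the project's testbench: it assumes a 32-bit signed accumulator (inferred from the 0x7FFFFFFF example) and uses hypothetical helper names `sat_add` and `wrap_add` to contrast saturation with plain two's-complement wraparound.

```python
MAX_POS = 0x7FFFFFFF    # largest 32-bit signed value (assumed width)
MAX_NEG = -0x80000000   # smallest 32-bit signed value (assumed width)

def sat_add(acc, addend):
    """Add with saturation: clamp the exact sum into [MAX_NEG, MAX_POS]."""
    total = acc + addend  # Python integers never overflow, so this is exact
    return max(MAX_NEG, min(MAX_POS, total))

def wrap_add(acc, addend):
    """Add with 32-bit two's-complement wraparound, as an unprotected ALU would."""
    total = (acc + addend) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total

# The edge case from the text: two maximum positive values.
saturated = sat_add(0x7FFFFFFF, 0x7FFFFFFF)   # stays at MAX_POS
wrapped = wrap_add(0x7FFFFFFF, 0x7FFFFFFF)    # flips to a negative value
```

With wraparound the sum of two large positive values comes out as -2, while the saturating version holds at MAX_POS, which is the behavior the stress tests check for.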
Sustained Throughput
CPU Overhead
Overflow Immune