
Architecture

kavyasreedhar edited this page Oct 25, 2019 · 48 revisions

Lake Hardware Architecture



Top-level Parameters

  • data_width - number of bits in a word

  • memory_width - number of bits that can be stored at an address in SRAM

  • memory_depth - depth of memory macro

  • num_ports - number of ports on the memory macro (single-port, two-port, or dual-port)

  • partial_write - indicates whether partial writes are supported (the full memory_width does not have to be written at each address)

  • num_banks - number of memory banks

  • iterator_support - maximum number of nested range/stride loops supported

  • interconnect_in - number of input ports from the interconnect

  • interconnect_out - number of output ports to the interconnect
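The top-level parameters can be captured as a single configuration object. The sketch below is a hypothetical Python container (the class name LakeParams and the derived property words_per_address are illustrative, not part of Lake); it enforces the divisibility requirement listed under the Aggregator limitations and derives the number of words stored per SRAM address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeParams:
    """Hypothetical container for the top-level parameters listed above."""
    data_width: int        # bits per word
    memory_width: int      # bits stored at one SRAM address
    memory_depth: int      # number of addresses in the memory macro
    num_ports: int         # number of ports on the memory macro
    partial_write: bool    # whether less than memory_width may be written per address
    num_banks: int         # number of memory banks
    iterator_support: int  # maximum number of nested range/stride loops
    interconnect_in: int   # input ports from the interconnect
    interconnect_out: int  # output ports to the interconnect

    def __post_init__(self) -> None:
        # Mirrors the Aggregator limitation: one SRAM address must hold a whole number of words.
        if self.memory_width % self.data_width != 0:
            raise ValueError("memory_width must be divisible by data_width")

    @property
    def words_per_address(self) -> int:
        """Number of words packed into one SRAM address (memory_width / data_width)."""
        return self.memory_width // self.data_width


# Illustrative values only.
p = LakeParams(16, 64, 512, 1, False, 2, 6, 1, 1)
print(p.words_per_address)  # 4
```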


Aggregator

Description: Groups input words together and outputs a group of (memory_width / word_width) words, matching the SRAM width, to store at a single SRAM address.
Operation: SIPO (serial-in, parallel-out)
Parameters:

  • clk
  • word_width
  • memory_width

Inputs:

  • input_pixels - one input word (word_width bits)

Outputs:

  • aggregated_output - array of (memory_width / word_width) words to store at an address in SRAM
  • full - the aggregated output word is full (all memory_width / word_width words have been collected)

Algorithms: N/A
Current Limitations:

  • memory_width should be divisible by word_width
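As a rough illustration of the SIPO behavior, the following behavioral Python model (not RTL) accepts one word per call, standing in for one clock cycle, and returns the aggregated group once memory_width / word_width words have arrived, which is when full would be asserted. The class name and the assumption that the first word received occupies index 0 of aggregated_output are illustrative only.

```python
from typing import List, Optional


class Aggregator:
    """Behavioral sketch of the SIPO aggregator (one word accepted per call)."""

    def __init__(self, word_width: int, memory_width: int) -> None:
        if memory_width % word_width != 0:
            raise ValueError("memory_width must be divisible by word_width")
        self.word_width = word_width
        self.words_per_address = memory_width // word_width
        self._shift: List[int] = []  # words collected so far for the current address

    def step(self, input_pixel: int) -> Optional[List[int]]:
        """Accept one word; return aggregated_output when full, else None."""
        mask = (1 << self.word_width) - 1
        self._shift.append(input_pixel & mask)  # keep only word_width bits
        if len(self._shift) == self.words_per_address:
            group, self._shift = self._shift, []  # emit the group and start a new one
            return group
        return None


agg = Aggregator(word_width=16, memory_width=64)
outs = [agg.step(x) for x in range(8)]
# outs == [None, None, None, [0, 1, 2, 3], None, None, None, [4, 5, 6, 7]]
```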

Transpose Buffer

Description: Reorganizes rows of pixels read from SRAM addresses into columns that can be output to registers in the interconnect.
Operation: PISO (parallel-in, serial-out), using a double buffer so that words from SRAM can be stored in one buffer while columns are output from the other to the interconnect shift registers
Parameters:

  • clk
  • rst
  • word_width
  • memory_width
  • interconnect_out

Inputs:

  • sram_input - data read from an address in SRAM
  • valid_input - vector with one entry per word of sram_input, indicating which words of sram_input should currently be stored in the transpose buffer

Outputs:

  • valid_col_pixels - column of pixels sent to the interconnect output ports
  • read_valid - output data is valid
  • stencil_valid - entire stencil is valid
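The double-buffering idea can be sketched behaviorally in Python. This model assumes that one buffer holds interconnect_out SRAM rows and that, once full, it is drained one column per call while the other buffer is refilled; it omits valid_input and stencil_valid, and the class and method names are hypothetical.

```python
from typing import List, Optional


class TransposeBuffer:
    """Behavioral sketch of a double-buffered transpose buffer (PISO)."""

    def __init__(self, word_width: int, memory_width: int, interconnect_out: int) -> None:
        self.words_per_row = memory_width // word_width  # pixels per SRAM read
        self.rows_per_buffer = interconnect_out          # one row per output port
        self._fill: List[List[int]] = []   # buffer being written from SRAM
        self._drain: List[List[int]] = []  # buffer being read out as columns
        self._col = 0                      # next column to output from _drain

    def _maybe_swap(self) -> None:
        # Swap roles once the fill buffer is full and the drain buffer is exhausted.
        drained = not self._drain or self._col >= self.words_per_row
        if len(self._fill) == self.rows_per_buffer and drained:
            self._drain, self._fill, self._col = self._fill, [], 0

    def write_row(self, sram_input: List[int]) -> None:
        """Store one SRAM read (a row of pixels) into the fill buffer."""
        if len(self._fill) == self.rows_per_buffer:
            raise RuntimeError("fill buffer full; drain buffer not yet empty")
        self._fill.append(list(sram_input))
        self._maybe_swap()

    def read_column(self) -> Optional[List[int]]:
        """Return the next column (valid_col_pixels), or None when read_valid would be low."""
        if not self._drain or self._col >= self.words_per_row:
            return None
        column = [row[self._col] for row in self._drain]
        self._col += 1
        self._maybe_swap()
        return column
```

Writing rows [1, 2] and [3, 4] and then reading yields columns [1, 3] and [2, 4]; rows written in the meantime accumulate in the other buffer and are output once the first buffer has drained.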

Algorithms: N/A
Current Limitations:


Input Reorder Buffer (Aggregator Allocation)

Description: Implements the mapping algorithm that, given the input address and input word data, determines which aggregator the word should be passed to and where within that aggregator's output the word should be placed. In effect, this reorder buffer uses the input address and the output access pattern to decide at which SRAM address the current input word is stored and at which position among the words stored at that address.
Operation: Effectively implements a configurable switching network with limited access-aware buffering
Parameters:

  • data_width
  • memory_width
  • num_banks
  • num_ports
  • interconnect_in
  • partial_write
  • num_aggregators

Inputs:

  • clk
  • rst
  • in_word_data
  • input address
  • output access pattern: range, stride, starting address

Outputs:

  • out_word_data - the word determined to be next in the "optimal" order for the aggregator
  • agg_index - which aggregator to store out_word_data in
  • word_index - position of the word within one SRAM address (i.e., which index of the aggregator output this word should occupy so that it is stored in the optimal ordering in SRAM)

Algorithms:

  • determination of agg_index and word_index (position within one SRAM word) for in_word_data
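The page leaves this algorithm open. One simple possibility, sketched here in Python under loudly stated assumptions, is to store words in SRAM in the order the output access pattern will read them: the word's rank in the output stream fixes its SRAM address and word_index, and aggregators are assigned to SRAM addresses round-robin. The single-loop access pattern, the round-robin policy, and the function names are all hypothetical.

```python
from typing import Dict, Tuple


def output_order(start: int, rng: int, stride: int) -> Dict[int, int]:
    """Map each address of a 1-D output access pattern to its position in read order."""
    return {start + i * stride: i for i in range(rng)}


def place_input_word(
    input_addr: int,
    order: Dict[int, int],
    words_per_address: int,
    num_aggregators: int,
) -> Tuple[int, int, int]:
    """Return (agg_index, word_index, sram_addr) for one input word.

    Hypothetical policy: lay words out in SRAM in output read order so that
    consecutive reads hit consecutive SRAM addresses; assign SRAM addresses
    to aggregators round-robin.
    """
    rank = order[input_addr]                 # position in the output stream
    sram_addr = rank // words_per_address    # SRAM address that will hold the word
    word_index = rank % words_per_address    # slot within that address
    agg_index = sram_addr % num_aggregators  # round-robin aggregator choice
    return agg_index, word_index, sram_addr


order = output_order(start=0, rng=8, stride=2)  # reads addresses 0, 2, ..., 14
print(place_input_word(6, order, 4, 2))   # (0, 3, 0)
print(place_input_word(10, order, 4, 2))  # (1, 1, 1)
```

Input words that the access pattern never reads raise a KeyError in this sketch; a real design would need a policy for them.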

Current Limitations:
Related Literature:


Output Reorder Buffer (Transpose Buffer Allocation)

Description: Implements the mapping algorithm that, given the output access pattern, determines which SRAM address to read from and which transpose buffer should receive the data read from that address (a group of memory_width / word_width words).
Operation:
Parameters:

  • word_width
  • memory_width
  • num_banks
  • num_ports
  • partial_write

Inputs:

  • clk
  • rst
  • num_transpose_bufs
  • output access pattern: range, stride, starting addr

Outputs:

  • data_out - data read from SRAM at an address determined by this module
  • tb_index - which transpose buffer to store data_out in

Algorithms:

  • determination of the read address for data_out
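A minimal Python sketch of read-address determination, assuming a single range/stride loop, that word address w is stored at SRAM address w // words_per_address, that a new read is issued only when the pattern moves to a different SRAM address, and that reads are distributed to transpose buffers round-robin. None of these policies come from the page; the function name is hypothetical.

```python
from typing import Iterator, Optional, Tuple


def sram_reads(
    start: int, rng: int, stride: int, words_per_address: int, num_transpose_bufs: int
) -> Iterator[Tuple[int, int]]:
    """Yield (read_addr, tb_index) pairs for a 1-D output access pattern."""
    last_addr: Optional[int] = None
    reads = 0
    for i in range(rng):
        word_addr = start + i * stride
        read_addr = word_addr // words_per_address
        if read_addr != last_addr:  # skip redundant reads of the same SRAM address
            yield read_addr, reads % num_transpose_bufs  # round-robin tb_index
            reads += 1
            last_addr = read_addr


print(list(sram_reads(0, 4, 3, 4, 2)))  # words 0, 3, 6, 9 -> [(0, 0), (1, 1), (2, 0)]
```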

Current Limitations:
Related Literature:


Memory Storage Wrapper / Mapping Table

Description: The memory storage wrapper is similar to an algorithmic memory and includes a mapping table. The wrapper is the RTL that provides virtual ports on top of the underlying physical memory.
Additional memory modules (likely with the same depth as the main memory) can be used to keep track of data written in the case of a write conflict. The mapping table records where the data for an address has actually been stored and maps the intended address to the actual address for future reads. Note that the number of memory modules grows linearly with the number of write ports. Other implementations using some form of mapping table have also been discussed.
Operation: To be decided; the literature review identified several possible implementations.
Parameters:

  • num_ports
  • memory_width
  • num_banks

Inputs:

  • clk
  • rst
  • Read/Write signal
  • Data to be written (for writes)
  • Address to read from / write to

Outputs:

  • output data from SRAM

Modified structures:

  • on a write conflict, the mapping table records where the write data was actually stored, so that a later read of the original address returns the correct data
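One possible scheme matching this description, similar in spirit to live-value-table multiported memories, gives each write port its own memory module (so module count grows linearly with write ports) and keeps a mapping table from address to the module holding the latest value. The Python model below is a behavioral sketch of that idea, not the chosen Lake implementation; the class name and the tie-break for two same-cycle writes to one address are assumptions.

```python
from typing import List, Tuple


class MappedMemory:
    """Behavioral sketch: multi-write-port memory built from per-port modules."""

    def __init__(self, memory_depth: int, num_write_ports: int) -> None:
        # One module per write port; each is as deep as the logical memory.
        self.modules: List[List[int]] = [[0] * memory_depth for _ in range(num_write_ports)]
        self.mapping_table: List[int] = [0] * memory_depth  # address -> module index

    def write(self, writes: List[Tuple[int, int, int]]) -> None:
        """Apply one cycle of writes, each given as (port, addr, data).

        Each port writes only its own module, so simultaneous writes never
        contend for a physical port. If two ports hit the same address in one
        cycle, the later entry in `writes` wins (assumed tie-break).
        """
        for port, addr, data in writes:
            self.modules[port][addr] = data
            self.mapping_table[addr] = port  # remember where the live value is

    def read(self, addr: int) -> int:
        """Follow the mapping table to the module holding addr's latest data."""
        return self.modules[self.mapping_table[addr]][addr]


mem = MappedMemory(memory_depth=8, num_write_ports=2)
mem.write([(0, 3, 11), (1, 5, 22)])  # two writes in the same cycle
mem.write([(1, 3, 33)])              # address 3 now lives in module 1
print(mem.read(3), mem.read(5))      # 33 22
```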

Algorithms: To be decided; the literature review identified several possible implementations.
Current Limitations:
Related Literature:
Memory Storage Wrapper:

Mapping Table:


Addressing Logic

Description: Generates SRAM addresses from an access pattern.
Operation: Loops through each range/stride pair to compute offsets, which are added to the start address to produce SRAM addresses.
Parameters:

  • iterator_support

Inputs:

  • access pattern: range, stride, start addr

Outputs:

  • SRAM address

Algorithms: N/A

Current Limitations:

  • cannot have more than iterator_support nested loops
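The nested range/stride loop can be sketched as a Python generator: each address is the start address plus the sum, over all loop levels, of that level's index times its stride, and patterns deeper than iterator_support are rejected. The convention that ranges[0] and strides[0] describe the innermost loop is an assumption, as is the function name.

```python
from itertools import product
from typing import Iterator, Sequence


def generate_addresses(
    start: int, ranges: Sequence[int], strides: Sequence[int], iterator_support: int
) -> Iterator[int]:
    """Yield SRAM addresses for a nested range/stride access pattern.

    Assumed convention: ranges[0]/strides[0] describe the innermost loop.
    """
    if len(ranges) != len(strides):
        raise ValueError("ranges and strides must have the same length")
    if len(ranges) > iterator_support:
        raise ValueError("access pattern needs more nested loops than iterator_support")
    # product() varies its last argument fastest, so pass loops outermost-first.
    for idx in product(*(range(r) for r in reversed(ranges))):
        # idx is outermost-first; reverse it to pair each index with its stride.
        offset = sum(i * s for i, s in zip(reversed(idx), strides))
        yield start + offset


# Inner loop: range 3, stride 1; outer loop: range 2, stride 10.
print(list(generate_addresses(100, [3, 2], [1, 10], iterator_support=6)))
# [100, 101, 102, 110, 111, 112]
```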