Architecture
- Top-level Parameters
- Aggregator
- Transpose Buffer
- Input Reorder Buffer
- Output Reorder Buffer
- Memory Storage Wrapper
- Addressing Logic
Top-level Parameters
- data_width - number of bits in a word
- memory_width - number of bits that can be stored at an address in SRAM
- memory_depth - depth of the memory macro (number of addresses)
- num_ports - number of ports on the memory macro (single, two, dual)
- partial_write - indicates whether partial write is supported (the full memory_width does not have to be written at each address)
- num_banks - number of memory banks
- iterator_support - maximum number of range/stride loops supported
- interconnect_in - number of input ports from the interconnect
- interconnect_out - number of output ports to the interconnect
Aggregator
Description: Groups input words and outputs them as one SRAM-width group of (memory_width / word_width) words to store at a single address in SRAM
Operation: SIPO (serial-in, parallel-out)
Parameters:
- clk
- word_width
- memory_width
Inputs:
- input_pixels - word bits
Outputs:
- aggregated_output - array of (memory_width / word_width) words to store at an address in SRAM
- full - the aggregated SRAM-width word is full
Algorithms: N/A
Current Limitations:
- memory_width should be divisible by word_width
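The SIPO behavior described above can be illustrated with a small behavioral model. This is only a sketch, not the RTL: the class name `Aggregator`, the `push` method, and the choice to return `None` while the group is incomplete are assumptions made for illustration. Each call to `push` stands for one clock cycle.

```python
class Aggregator:
    """Behavioral SIPO sketch: collects words until one SRAM row is filled."""

    def __init__(self, word_width: int, memory_width: int) -> None:
        # Mirrors the stated limitation: memory_width must divide evenly.
        if memory_width % word_width != 0:
            raise ValueError("memory_width must be divisible by word_width")
        self.word_width = word_width
        self.words_per_row = memory_width // word_width
        self._words = []

    def push(self, input_pixels: int):
        """Shift in one word; return (aggregated_output, full)."""
        mask = (1 << self.word_width) - 1
        self._words.append(input_pixels & mask)
        if len(self._words) == self.words_per_row:
            out, self._words = self._words, []
            return out, True
        # Assumption: no output is presented until the group is complete.
        return None, False


agg = Aggregator(word_width=16, memory_width=64)
outputs = [agg.push(w) for w in (1, 2, 3, 4)]
```

With a 64-bit memory_width and 16-bit words, `full` is raised on every fourth word, and the aggregated output carries the four words in arrival order.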
Transpose Buffer
Description: Organizes rows of pixels stored at SRAM addresses into columns that can be output to registers in the interconnect.
Operation: PISO (parallel-in, serial-out), using a double buffer to store words from SRAM while simultaneously outputting columns to the interconnect shift registers
Parameters:
- clk
- rst
- word_width
- memory_width
- interconnect_out
Inputs:
- sram_input - data read from an address in SRAM
- valid_input - vector the same length as sram_input, indicating which words in sram_input should currently be stored in the transpose buffer
Outputs:
- valid_col_pixels - column of pixels to interconnect output ports
- read_valid - output data is valid
- stencil_valid - entire stencil is valid
Algorithms: N/A
Current Limitations:
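As an illustration of the transpose buffer's row-to-column behavior, the following is a minimal behavioral sketch under several assumptions: the names `TransposeBuffer`, `write_row`, `read_col`, and `num_rows` are hypothetical, valid_input masking and stencil_valid are omitted, and the double buffer is modeled as one bank that fills with rows while a second bank drains columns.

```python
class TransposeBuffer:
    """Behavioral sketch: rows in from SRAM, columns out to the interconnect."""

    def __init__(self, words_per_row: int, num_rows: int) -> None:
        self.words_per_row = words_per_row
        self.num_rows = num_rows      # assumption: rows per transposed block
        self._fill = []               # bank being written from SRAM
        self._drain = []              # columns waiting to be output

    def write_row(self, sram_input) -> None:
        if len(sram_input) != self.words_per_row:
            raise ValueError("sram_input has the wrong number of words")
        self._fill.append(list(sram_input))
        if len(self._fill) == self.num_rows:
            if self._drain:
                # Real hardware would stall here; the sketch just reports it.
                raise RuntimeError("output bank not yet drained")
            # Swap banks: the full block becomes a list of columns.
            self._drain = [list(col) for col in zip(*self._fill)]
            self._fill = []

    def read_col(self):
        """Return (valid_col_pixels, read_valid)."""
        if self._drain:
            return self._drain.pop(0), True
        return None, False


tb = TransposeBuffer(words_per_row=3, num_rows=2)
tb.write_row([1, 2, 3])
early = tb.read_col()            # nothing valid until the block is complete
tb.write_row([4, 5, 6])
cols = [tb.read_col() for _ in range(4)]
```

Two rows of three words come out as three columns of two words, after which read_valid drops until the next block has been loaded.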
Input Reorder Buffer
Description: Maps each input word, given its input address and data, to the aggregator it should be passed to and to its position within one output of that aggregator. In effect, this reorder buffer uses the input address and the output access pattern to determine the SRAM address at which the current input word should be stored and its position among the words stored at that address.
Operation: Effectively implements a configurable switching network with limited access-aware buffering
Parameters:
- data_width
- memory_width
- num_banks
- num_ports
- interconnect_in
- partial_write
- num_aggregators
Inputs:
- clk
- rst
- in_word_data
- input address
- output access pattern: range, stride, starting address
Outputs:
- out_word_data - word determined to be next in the "optimal" order for the aggregator
- agg_index - which aggregator to store word_data in
- word_index - position of the word within one SRAM address (the index of the aggregator output at which this word must sit to be stored in the optimal order in SRAM)
Algorithms:
- agg_index and word_index (position within one SRAM address) determination for in_word_data
Current Limitations:
Related Literature:
- https://ieeexplore.ieee.org/document/7827608
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6732285
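To make the meaning of agg_index and word_index concrete, here is a deliberately simple sketch. It assumes a plain row-major layout and ignores the output access pattern entirely, so it is not the mapping algorithm this module is meant to implement; the function name `place_word` and the round-robin assignment of SRAM rows to aggregators are assumptions for illustration only.

```python
def place_word(input_address: int, words_per_row: int, num_aggregators: int):
    """Return (agg_index, word_index) for a row-major layout (assumption)."""
    sram_row = input_address // words_per_row    # SRAM address the word lands in
    agg_index = sram_row % num_aggregators       # rows rotate across aggregators
    word_index = input_address % words_per_row   # position within that SRAM row
    return agg_index, word_index


placements = [place_word(a, words_per_row=4, num_aggregators=2) for a in (0, 5, 9)]
```

An access-aware version would choose word_index and agg_index so that words read together by the output access pattern end up at the same SRAM address.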
Output Reorder Buffer
Description: Maps the output access pattern to the SRAM address to read from and to the transpose buffer that should receive the (memory_width / word_width) words read from that address.
Operation:
Parameters:
- word_width
- memory_width
- num_banks
- num_ports
- partial_write
Inputs:
- clk
- rst
- num_transpose_bufs
- output access pattern: range, stride, starting addr
Outputs:
- data_out - data read from SRAM at the address determined by this module
- tb_index - which transpose buffer to store data_out in
Algorithms:
- read address determination for data_out
Current Limitations:
Related Literature:
- https://ieeexplore.ieee.org/document/7827608
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6732285
Memory Storage Wrapper
Description: The memory storage wrapper is similar to algorithmic memory and includes a mapping table. The wrapper is the RTL that supports virtual ports on top of the underlying physical memory.
Additional memory modules (likely the same size as the memory depth) can be used to keep track of data written in the case of a write conflict. The mapping table records where the data for an address has actually been stored and maps the intended address to the actual address for future reads. Note that the number of memory modules grows linearly with the number of write ports. Other implementations that use some form of mapping table have also been discussed.
Operation: To be decided; the literature review offers several possible implementations.
Parameters:
- num_ports
- memory_width
- num_banks
Inputs:
- clk
- rst
- Read/Write signal
- Data to be written if Write
- Address to read from / write to
Outputs:
- output data from SRAM
Modified structures:
- on a write conflict, the mapping table records where the write data was actually stored, so that a later read of the original address returns the correct data
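Since the implementation is still to be decided, the following is only one possible reading of the mapping-table idea, sketched as a toy model. It assumes a main memory with a single write port plus one spare module of the same depth, and a one-bit-per-address mapping table; the names `MappedMemory`, `write_cycle`, and `in_spare` are hypothetical.

```python
class MappedMemory:
    """Toy model: two virtual write ports on single-write-port memories."""

    def __init__(self, depth: int) -> None:
        self.main = [None] * depth
        self.spare = [None] * depth       # extra module, same depth as main
        self.in_spare = [False] * depth   # mapping table: where the data lives

    def write_cycle(self, writes) -> None:
        """Apply up to two (addr, data) writes issued in the same cycle."""
        if len(writes) > 2:
            raise ValueError("this sketch models two virtual write ports")
        main_port_used = False
        for addr, data in writes:
            if not main_port_used:
                self.main[addr] = data
                self.in_spare[addr] = False
                main_port_used = True
            else:
                # Conflict on the main write port: redirect to the spare
                # module and record the actual location in the mapping table.
                self.spare[addr] = data
                self.in_spare[addr] = True

    def read(self, addr: int):
        # The mapping table selects the module holding the latest data.
        return self.spare[addr] if self.in_spare[addr] else self.main[addr]


mem = MappedMemory(depth=8)
mem.write_cycle([(3, "a"), (5, "b")])   # second write is redirected
first = (mem.read(3), mem.read(5), mem.in_spare[5])
mem.write_cycle([(5, "c")])             # later write goes to main again
second = (mem.read(5), mem.in_spare[5])
```

Supporting one more simultaneous write in this scheme would take one more spare module, which matches the linear relationship noted in the description.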
Algorithms: To be decided; the literature review offers several possible implementations.
Current Limitations:
Related Literature:
Memory Storage Wrapper:
- http://yuba.stanford.edu/~sundaes/Papers/DesignCon-AlgMem.pdf
- http://yuba.stanford.edu/~sundaes/Presentations/IBMSTS2013.pdf
- https://patentimages.storage.googleapis.com/d0/4d/05/079d66ba234a10/US8266408.pdf
Mapping Table:
Addressing Logic
Description: Generates SRAM addresses from an access pattern.
Operation: Loops through the range and stride of each iterator to compute the offset that is added to the start address to form each SRAM address.
Parameters:
- iterator_support
Inputs:
- access pattern: range, stride, start addr
Outputs:
- SRAM address
Algorithms: N/A
Current Limitations:
- cannot have more than iterator_support nested loops
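The range/stride loop nest can be sketched in software as follows. This is an illustrative model rather than the hardware implementation: the function name `generate_addresses` is hypothetical, and the sketch assumes that the first entry of `ranges` and `strides` is the outermost loop.

```python
from itertools import product


def generate_addresses(ranges, strides, start_addr, iterator_support):
    """Yield SRAM addresses for a nested range/stride access pattern."""
    if len(ranges) != len(strides):
        raise ValueError("ranges and strides must have the same length")
    if len(ranges) > iterator_support:
        raise ValueError("more nested loops than iterator_support allows")
    # product() varies the last index fastest, i.e. the innermost loop.
    for indices in product(*(range(r) for r in ranges)):
        offset = sum(i * s for i, s in zip(indices, strides))
        yield start_addr + offset


addrs = list(generate_addresses(ranges=[2, 3], strides=[8, 1],
                                start_addr=4, iterator_support=6))
# addrs == [4, 5, 6, 12, 13, 14]
```

Here the inner loop walks three consecutive addresses and the outer loop jumps by eight, so the pattern reads the first three words of two rows that are eight addresses apart.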