Philips TMS320C6713 Car Stereo System User Manual


 
SPRA921
6 TMS320C6713 Digital Signal Processor Optimized for High Performance Multichannel Audio Systems
Table 1. C6713 Benchmark Performance

Algorithm                    Description                           Parameter Values   Cycles  Time
---------------------------  ------------------------------------  -----------------  ------  -------
Biquad filter (IIR filter    nx input/output cycles                nx = 60            316     1.4 µs
direct form II)                                                    nx = 90            436     1.9 µs
Real FIR filter              nh coefficients,                      nh = 24, nr = 64   802     3.6 µs
                             nr output samples                     nh = 30, nr = 50   795     3.5 µs
IIR filter                   nr number of output samples           nr = 64            443     2.0 µs
IIR lattice filter           nr number of samples,                 nk = 10, nr = 100  4125    18.3 µs
                             nk number of reflection coefficients
Dot product                  nx number of values                   nx = 512           281     1.2 µs
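The Time column in Table 1 appears to be the cycle count divided by the CPU clock. The sketch below reproduces that arithmetic, assuming the benchmarks were timed at the 225 MHz maximum device frequency; the table itself does not state the clock, so CPU_HZ is an assumption, and the names CYCLES and cycles_to_us are illustrative only.

```python
# Cycle counts copied from Table 1, keyed by (algorithm, parameter values).
CYCLES = {
    ("Biquad filter", "nx = 60"): 316,
    ("Biquad filter", "nx = 90"): 436,
    ("Real FIR filter", "nh = 24, nr = 64"): 802,
    ("Real FIR filter", "nh = 30, nr = 50"): 795,
    ("IIR filter", "nr = 64"): 443,
    ("IIR lattice filter", "nk = 10, nr = 100"): 4125,
    ("Dot product", "nx = 512"): 281,
}

# Assumption: the table was measured at the 225 MHz maximum clock.
CPU_HZ = 225e6


def cycles_to_us(cycles, cpu_hz=CPU_HZ):
    """Convert a CPU cycle count to execution time in microseconds."""
    return cycles / cpu_hz * 1e6


if __name__ == "__main__":
    for (name, params), cycles in CYCLES.items():
        print(f"{name:20s} {params:18s} {cycles:5d} cycles {cycles_to_us(cycles):5.1f} us")
```

Under that assumption, every computed time rounds to the value printed in the table.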
3 Two-Level Cache
3.1 Cache Overview
The TMS320C6713 device uses an efficient two-level real-time cache for internal program
and data storage. The cache delivers high performance without the cost of large arrays
of on-chip memory. Its efficiency lets low-cost, high-density external memory,
such as SDRAM, be as effective as on-chip memory.
The first level of the memory architecture has dedicated 4K-byte instruction and data caches,
L1I and L1D respectively. The L1I is direct-mapped, whereas the L1D provides 2-way
associativity to handle multiple types of data. The second level (L2) consists of a total of 256K
bytes of memory. 64K bytes of this can be configured in one of five ways:
64K 4-way associative cache
48K 3-way associative cache, 16K mapped RAM
32K 2-way associative cache, 32K mapped RAM
16K direct-mapped cache, 48K mapped RAM
64K mapped RAM
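The five configurations follow one pattern: each cache way takes 16K bytes out of the configurable 64K, and whatever is left stays mapped RAM. The sketch below enumerates them under that reading. The 16K way size is inferred from the list, not stated as a rule in the text, and the function name l2_split is hypothetical.

```python
L2_TOTAL_KB = 256         # total L2 memory (from the text)
L2_CONFIGURABLE_KB = 64   # portion selectable as cache or mapped RAM (from the text)
WAY_KB = 16               # assumption: each cache way is 16K bytes, inferred from the list

MAX_WAYS = L2_CONFIGURABLE_KB // WAY_KB


def l2_split(ways):
    """Return (cache_kb, mapped_ram_kb) of the configurable 64K for 0..4 cache ways."""
    if not 0 <= ways <= MAX_WAYS:
        raise ValueError(f"ways must be between 0 and {MAX_WAYS}")
    cache_kb = ways * WAY_KB
    return cache_kb, L2_CONFIGURABLE_KB - cache_kb


if __name__ == "__main__":
    for ways in range(MAX_WAYS, -1, -1):
        cache_kb, ram_kb = l2_split(ways)
        print(f"{ways} way(s): {cache_kb:2d}K cache, {ram_kb:2d}K mapped RAM")
    # The other 256K - 64K = 192K bytes are outside this choice.
    print(f"outside the configurable region: {L2_TOTAL_KB - L2_CONFIGURABLE_KB}K")
```

Choosing fewer ways trades cache capacity and associativity for directly addressable on-chip RAM.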
Dedicated L1 caches eliminate conflicts for memory resources between the program and
data buses. A unified L2 memory provides flexible memory allocation between program and
data for accesses that miss in L1.
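To illustrate the difference between the direct-mapped L1I and the 2-way L1D described in this section, the sketch below computes which cache set an address maps to. The line sizes (64 bytes for L1I, 32 bytes for L1D) are assumptions made for illustration and are not given in this text; set_index is a hypothetical helper, not a device API.

```python
L1_BYTES = 4 * 1024   # 4K bytes each for L1I and L1D (from the text)
L1I_LINE = 64         # assumption: L1I line size in bytes
L1D_LINE = 32         # assumption: L1D line size in bytes


def num_sets(cache_bytes, ways, line_bytes):
    """Number of sets in a cache of the given size, associativity, and line size."""
    return cache_bytes // (ways * line_bytes)


def set_index(addr, cache_bytes, ways, line_bytes):
    """Set that a byte address maps to."""
    return (addr // line_bytes) % num_sets(cache_bytes, ways, line_bytes)


if __name__ == "__main__":
    # Direct-mapped L1I: addresses 4K apart map to the same set, which holds
    # only one line, so fetching one evicts the other.
    print(set_index(0x0000, L1_BYTES, 1, L1I_LINE),
          set_index(0x1000, L1_BYTES, 1, L1I_LINE))

    # 2-way L1D: addresses 2K apart map to the same set, but the set holds
    # two lines, so both can remain cached at once.
    print(set_index(0x0000, L1_BYTES, 2, L1D_LINE),
          set_index(0x0800, L1_BYTES, 2, L1D_LINE))
```

With these assumed line sizes both caches have 64 sets; the 2-way L1D simply keeps two candidate lines per set.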
3.2 Cache Hides Off-Chip Latency
The external memories that interface to the TMS320C6713 may operate at a maximum of
100 MHz, while the device operates at a 225 MHz maximum frequency. All external memory
devices also have significant start-up latencies. For example, SDRAMs typically
have a read latency of 2-4 bus cycles. The lower frequency and additional latency of
external memories would normally degrade processor performance significantly. Retrieving
data from on-chip L2 memory has much lower latency than retrieving it from external memory,
so the intermediate L2 cache hides this latency from the user. Using the fast L2
memory to cache the slower external memories reduces the latency of external accesses by a
factor of five.
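The clock mismatch can be made concrete by converting the SDRAM read latency from bus cycles into CPU cycles. The sketch below does only that conversion, using the 100 MHz and 225 MHz maximum frequencies quoted above. It is not a model of the cache or of the factor-of-five figure, and the function names are illustrative.

```python
CPU_MHZ = 225.0   # maximum device frequency (from the text)
BUS_MHZ = 100.0   # maximum external memory frequency (from the text)


def bus_cycles_to_ns(bus_cycles, bus_mhz=BUS_MHZ):
    """Duration of a number of external bus cycles, in nanoseconds."""
    return bus_cycles * 1000.0 / bus_mhz


def bus_to_cpu_cycles(bus_cycles, cpu_mhz=CPU_MHZ, bus_mhz=BUS_MHZ):
    """CPU cycles that elapse during a number of external bus cycles."""
    return bus_cycles * cpu_mhz / bus_mhz


if __name__ == "__main__":
    # SDRAM read latency range quoted in the text: 2-4 bus cycles.
    for bus_cycles in (2, 3, 4):
        print(f"{bus_cycles} bus cycles = {bus_cycles_to_ns(bus_cycles):.0f} ns "
              f"= {bus_to_cpu_cycles(bus_cycles):.2f} CPU cycles")
```

At these clocks, each bus cycle of read latency costs the CPU 2.25 cycles before any data arrives, which is the delay the L2 cache is meant to hide.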