
Data security concurrent with homogeneous by AES algorithm in SSD controller

Lingyan Fan 1, Jianjun Luo 1a), Hailuan Liu 2, and Xuan Geng 3

1 Micro-Electronics Research Institute, Hangzhou Dianzi University, No.1, 2nd Avenue, Xiasha District, Hangzhou City, 310018, China
2 Sage Microelectronics Corporation, 910 Campisi Way, Campbell, CA 95008, USA
3 Department of Electronic Engineering, College of Information Engineering of Shanghai Maritime University, Shanghai, 201306, China

a) Jianjun.Luo@hdu.edu.cn

Abstract: AES is one of the most popular encryption and decryption algorithms for data security applications. At the same time, data randomization (or “homogeneous”) technology is applied to reduce the bit error rate (BER) of MLC and TLC flash memory. Here, the AES algorithm is found to be an efficient replacement for the orthogonal polynomials that normally carry out the homogeneous function by scrambling data. This paper puts forward a novel hardware architecture that provides both homogeneous and data encryption/decryption functions concurrently with an embedded AES hardware engine, while getting rid of the randomization engine based on a Linear Feedback Shift Register (LFSR). This makes the flash controller simpler and reduces the die size, because an independent homogeneous hardware engine is no longer necessary in a flash memory system in which the AES security algorithm is embedded. Finally, an SSD controller designed with this architecture was silicon proven.

Keywords: NAND flash memory, homogenization, AES

Classification: Storage technology

References

[1] R. H. Morelos-Zaragoza: The Art of Error Correcting Coding (Wiley Press, West Sussex, 2006) 2nd ed. 39.

[2] Federal Information Processing Standards Publication 197, Specification for the ADVANCED ENCRYPTION STANDARD (AES) (2001).

[3] J. L. Massey: IEEE Trans. Inf. Theory 15 (1969) 122. DOI:10.1109/TIT.1969.1054260

[4] Micron: L74A NAND flash memory data sheet, Rev.E3/11EN (2009).

[5] Micron: M73A NAND Flash memory data sheet, Rev.B7/10EN (2010).

[6] M. Dworkin: Recommendation for Block Cipher Modes of Operation Methods and Techniques, NIST Special Publication 800-38A 2001 Edition.

© IEICE 2014
DOI: 10.1587/elex.11.20140535
Received June 5, 2014
Accepted June 9, 2014
Publicized June 26, 2014
Copyedited July 10, 2014

LETTER IEICE Electronics Express, Vol.11, No.13, 1–11


1 Introduction

NAND flash memory dominates today’s mobile storage market. For example, SD/MMC cards with flash memory inside replaced film in cameras all over the world ten years ago. In the same way, the Solid State Disk (SSD) is showing its ambition to replace the traditional motor-driven hard disk in high-end computing systems including, but not limited to, notebook computers and network servers.

It is well known that NAND flash is not a perfect medium. It contains a large number of randomly scattered bad blocks and requires on-the-fly error correction. These limitations are dramatically worse in Multi-Level Cell (MLC) and Triple-Level Cell (TLC) NAND flash than in Single-Level Cell (SLC) flash. Therefore, MLC flash demands a higher error correction (ECC) ability than SLC flash. Many ECC algorithms have been developed in the industry. BCH, brought up by Bose, Ray-Chaudhuri and Hocquenghem separately [1], is regarded as one of the most popular. Many of today’s flash controllers declare an ECC ability of 24 bit, 48 bit or even 72 bit per 1K byte payload with the BCH algorithm. Disappointingly, there are still failure cases in MLC and TLC flash memory caused by a BER beyond this enhanced ECC ability. Low-Density Parity-Check (LDPC) codes were then applied, with higher expectations of ECC ability.

When the bit error models of MLC and TLC are studied according to their physical working mechanism, a technology called “Homogeneous” is applied to reduce the probability that bit errors are generated and to relieve the pressure on the ECC algorithm. Although homogeneous is a concept that is also applied in other fields, it plays a more important role in flash controllers.

In the meantime, data security is becoming more important in today’s storage systems. The Advanced Encryption Standard (AES) [2] is the well-known algorithm for data security.

Here an AES engine is designed as an embedded module that takes not only the role of encryption/decryption but also concurrently carries out the homogeneous function to reduce the flash memory bit error rate, so it is not necessary to embed both an AES engine and an independent homogeneous engine in SD/MMC or SSD controllers.

2 Homogeneous technology

Bit errors in basic flash memory are regarded as randomly distributed. This is accurate enough for SLC flash memory and a decent estimate for the MLC or TLC BER. The situation is very similar to BER analysis in communication systems. However, the much higher bit error rate in MLC and TLC compared with SLC is obviously not caused by this kind of random bit error mechanism; “Burst Error” and “Inter-Page Error” become the dominant error cases.

There are two logic bits stored in a physical memory cell in MLC flash memory. These two bits are actually recognized by four voltage levels on the same floating gate. This means SLC has a larger noise margin. Therefore, the much higher bit error rate is doubtlessly an inherent vice of MLC and TLC.

In communication systems, random errors and burst errors can be suppressed by scrambling the data packets. Here, the same technology, so-called Homogeneous, is introduced to deal with bit errors in flash memory. Most of the mathematical algorithms that were originally developed as white noise sources can be applied for the Homogeneous purpose because the basic theory is the same.

It should be noted that Homogeneous cannot correct any bit errors. Homogeneous only helps distribute error bits more evenly and cuts the “peak” BER down to an average level. Assume a BCH algorithm protects a 1K byte data payload, and a certain TLC flash memory asks for a K bit ECC ability as defined by the peak number of error bits. With homogeneous, a flash controller can deal with it using an ECC ability as small as 2K/3 or even K/2 bits.

Here is an example of an algorithm that realizes homogeneous (randomization) by the data scrambling method with a Linear Feedback Shift Register (LFSR) [3]. A Fibonacci implementation of the polynomial in Eq. (1) can be a free-running LFSR in hardware. In Fig. 1, the initial value of the LFSR seed (D0-D15) shall be FFFF.

$$\mathrm{LFSR} = X^{15} + X^{5} + X^{4} + X^{3} + X^{1} + 1 \qquad (1)$$
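As a rough software illustration of this scrambling idea, the Python sketch below runs a free-running shift register seeded with FFFF and XORs its output stream with the data. The wiring is an assumption, because the circuit of Fig. 1 is not reproduced here: each term X^k of Eq. (1), including the constant term as X^0, is read as a feedback tap on flip-flop Dk of a 16-bit register D0-D15. The names `lfsr_bits` and `scramble` are hypothetical.

```python
from typing import Iterator

# Assumed wiring (not confirmed by the paper): every term X^k of Eq. (1),
# with the constant term taken as X^0, taps flip-flop Dk of a 16-bit register.
TAPS = (15, 5, 4, 3, 1, 0)
WIDTH = 16
MASK = (1 << WIDTH) - 1


def lfsr_bits(seed: int = 0xFFFF) -> Iterator[int]:
    """Yield the free-running output bit stream, taken from D15."""
    state = seed & MASK
    while True:
        feedback = 0
        for tap in TAPS:
            feedback ^= (state >> tap) & 1
        yield (state >> (WIDTH - 1)) & 1
        # Shift towards D15 and feed the XOR of the taps back into D0.
        state = ((state << 1) | feedback) & MASK


def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """XOR every data byte with the next eight bits of the LFSR stream."""
    bits = lfsr_bits(seed)
    out = bytearray()
    for byte in data:
        key = 0
        for _ in range(8):
            key = (key << 1) | next(bits)
        out.append(byte ^ key)
    return bytes(out)


page = bytes(16)             # a long run of identical bits
stored = scramble(page)      # what would be written to flash
restored = scramble(stored)  # the same XOR stream undoes the scrambling
print(stored.hex(), restored == page)
```

Because the operation is a plain XOR with a seed-determined stream, applying `scramble` a second time with the same seed restores the original data, which is how a read path could undo the write-path homogenization.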

NAND flash memory is accessed in units of a “Page”, whose size depends on the flash memory part number, for example 2K bytes or 8K bytes. Unfortunately, the bits in the same cell of MLC or TLC flash memory are not located on the same page. The 2-bit information in each cell of MLC flash is distributed over two associated pages: the LSB page (Least Significant Bit Page) and the MSB page (Most Significant Bit Page). These two associated pages are called Paired Pages or Shared Pages. Flash memory chips produced by different vendors, or even different flash part numbers from the same vendor, have different paired page structures. For example, MT29F64G08CBAA, an MLC NAND flash memory by Micron [4, 5], has the Paired Page structure shown in Fig. 2.

Because the 2-bit information is held in the same floating gate, the states of the two pages affect one another. This kind of error happens between the paired pages and is therefore called “Inter-Page Error”.

Homogeneous is enhanced here to act on the paired pages through a group of orthogonal polynomials, and finally to eliminate, or minimize, the influence between the paired pages. Each page is scrambled by one of the polynomials. The property of orthogonal polynomials makes the paired pages “separated” (shown in Fig. 3). Here, $K_i$ is a logic switch selecting the corresponding data stream, and

$$0 \le i < \text{Max. Page Number in a block}, \quad \text{or} \quad 0 \le i < \text{Max. Page Number in a paired page set}.$$

Fig. 1. Hardware engine for polynomial Eq. (1)

Each polynomial is a hardware engine consisting of D flip-flops and exclusive-OR gates in a real silicon chip. The homogeneous function is normally implemented by a group of polynomial engines. It can be simplified to a single polynomial if that polynomial can provide a group of white noise patterns separated by a distance longer than the flash page size. The polynomial (1), with its physical circuit in Fig. 1, is a good example that meets this requirement. Starting from different seeds, which are the reset values of the D flip-flops, the polynomial (1) can generate a group of vectors, and each set of vectors (pattern) can be applied to homogenize the corresponding flash page.

Assume the target flash is TLC type with 256 pages per block and 8K bytes per page. At least eight seeds should be prepared for eight vectors, and each vector with its corresponding seed is applied to one flash page, because the paired pages consist of eight physical flash pages. Of course, the vectors should be kept 8K bytes apart. A list of eight seeds is given in Fig. 4 as an example.
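The per-page seeding described above might be sketched as follows. The eight seed values are placeholders and are not the values of Fig. 4. The tap wiring (terms of Eq. (1) read as taps on a 16-bit register) and the use of `page_number % 8` to pick the position inside a paired page set are likewise simplifying assumptions, since the real mapping depends on the flash part.

```python
# Placeholder seeds for illustration only; the real list is given in Fig. 4.
HYPOTHETICAL_SEEDS = [0xFFFF, 0x1B2C, 0x3D4E, 0x5F60,
                      0x7182, 0x93A4, 0xB5C6, 0xD7E8]
TAPS = (15, 5, 4, 3, 1, 0)  # assumed wiring of Eq. (1) onto D0-D15


def keystream(seed: int, nbytes: int) -> bytes:
    """Generate nbytes of scrambling pattern from one seed."""
    state = seed & 0xFFFF
    out = bytearray()
    for _ in range(nbytes):
        key = 0
        for _ in range(8):
            feedback = 0
            for tap in TAPS:
                feedback ^= (state >> tap) & 1
            key = (key << 1) | (state >> 15)           # output bit is D15
            state = ((state << 1) | feedback) & 0xFFFF
        out.append(key)
    return bytes(out)


def homogenize(page_data: bytes, page_number: int) -> bytes:
    """Scramble (or descramble) one page with the seed assigned to it."""
    seed = HYPOTHETICAL_SEEDS[page_number % len(HYPOTHETICAL_SEEDS)]
    stream = keystream(seed, len(page_data))
    return bytes(d ^ k for d, k in zip(page_data, stream))


# The same host data written to the eight pages of one paired set
# ends up as eight different stored patterns.
host_data = bytes(32)
stored = [homogenize(host_data, page) for page in range(8)]
print(len(set(stored)))
```

The point of the sketch is only that identical host data no longer produces identical cell states across a paired page set; it does not check the 8K byte distance requirement between vectors.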

3 Efficiency of homogeneous

A lab test result is shown in Fig. 5. The test targets are two flash memory chips randomly selected from one thousand samples of Micron’s TLC type flash, which has 64G bit density and the following features:

• Page size: 8K bytes
• Block size: 256 pages
• Density: 8G bytes, 4K blocks, 1M pages

Fig. 2. An example of paired pages

Fig. 3. A group of orthogonal polynomials for homogeneous

To keep the investigation simple, no wear-leveling algorithm was involved; firmware controlled all operations, writing data into the whole flash chip and reading it back for comparison with a bit error counter. A BCH algorithm with 1K byte data length was applied; therefore, one round of whole-chip error scanning took eight mega (8M) BCH operations.

Each flash chip was tested with 15 rounds of whole-space scanning. This means a total of $K = 1.2 \times 10^{8}$ BCH error detections. In Fig. 5, each point shows the number of detections falling in a range of error bits. For example, point X is the number N of BCH results indicating 5, 6, 7 or 8 error bits. The number of detections is rescaled by “Log (N)”. For the lowest range, N is very close to K because the majority of BCH results indicated no error or fewer than 4 error bits.
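The total follows from the chip geometry quoted above. Rounding 8M to $8 \times 10^{6}$ is an assumption made here to match the stated figure; with binary prefixes the product would be about $1.26 \times 10^{8}$.

$$K = 15 \times \frac{8\ \text{G bytes}}{1\ \text{K byte}} = 15 \times 8\,\text{M} \approx 1.2 \times 10^{8}$$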

Fig. 4. An example of eight seeds for homogenization in TLC flash memory

Fig. 5. Bit errors with and without homogeneous


From the curves in Fig. 5, the ECC ability has to be beyond 60 bit per 1K byte payload to obtain minimum reliability when homogeneous is not applied, while a 40 bit or even 36 bit ECC ability is good enough with Homogeneous.

4 Data security by AES algorithm

The Advanced Encryption Standard (AES) is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001. Having been adopted by the U.S. government, AES superseded the Data Encryption Standard (DES) and became one of the most popular data security algorithms. Basically, it is a symmetric-key algorithm, in which the same key is used for both encrypting and decrypting the data.

Federal Information Processing Standards Publication 197 (FIPS-197) describes the AES algorithm in detail. As FIPS-197 defines, the AES encryption algorithm uses fairly straightforward techniques for substitution and permutation, except for the MixColumns routine. The MixColumns routine uses special addition and multiplication, which are based on mathematical field theory. In particular, AES is based on a field called GF($2^8$). For the AES algorithm, the irreducible polynomial is:

$$m(x) = x^{8} + x^{4} + x^{3} + x + 1 \qquad (2)$$

For both its Cipher and Inverse Cipher, the AES algorithm uses a round function composed of four different byte-oriented transformations:

1) SubBytes() transformation: a non-linear byte substitution that operates independently on each byte of the state using a substitution table (S-box). This is shown in Fig. 6.

2) ShiftRows() transformation: the bytes in the last three rows of the state are cyclically shifted over different numbers of bytes as in Eq. (3), where Nb is the number of columns.

$$S'_{r,c} = S_{r,\,(c + \mathrm{shift}(r,Nb)) \bmod Nb} \quad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb \qquad (3)$$

3) MixColumns() transformation: it operates on the state column by column, treating each column as a four-term polynomial as in Eq. (4), where a(x) is a fixed polynomial.

$$s'(x) = a(x) \otimes s(x) \qquad (4)$$

4) AddRoundKey() transformation: a round key is added to the state by a simple bitwise XOR operation, Eq. (5). The round key is generated from the AES key; after AddRoundKey, the data is encrypted by the round key.

$$[S'_{0,c}, S'_{1,c}, S'_{2,c}, S'_{3,c}] = [S_{0,c}, S_{1,c}, S_{2,c}, S_{3,c}] \oplus [w_{round \cdot Nb + c}] \quad \text{for } 0 \le c < Nb \qquad (5)$$

Fig. 6. S-box

There are several operation modes for block ciphers; these were invented to provide confidentiality for messages of arbitrary length. They are the ECB, CBC, OFB and CFB modes [6]. These modes, with the exception of ECB, require an initialization vector (IVector), a sort of ‘dummy block’, to kick off the process for the first real block and also to provide some randomization for the process.
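The round-function arithmetic of Eq. (2) to Eq. (5) can be illustrated with a short Python sketch. It follows FIPS-197 [2] rather than any detail of the hardware in this paper: the fixed MixColumns polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02} and the row offsets shift(r, 4) = r are taken from that standard, and all function names are hypothetical. SubBytes() is left out because it needs the full S-box table of Fig. 6.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x in GF(2^8), reducing modulo m(x) = 0x11B (Eq. 2)."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def shift_rows(state):
    """Eq. (3): row r of the state (a list of rows) is rotated left by r bytes."""
    nb = len(state[0])
    return [[row[(c + r) % nb] for c in range(nb)]
            for r, row in enumerate(state)]


def mix_column(col):
    """Eq. (4): multiply one 4-byte column by the fixed polynomial a(x)."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(2, s0) ^ gf_mul(3, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(2, s1) ^ gf_mul(3, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(2, s2) ^ gf_mul(3, s3),
        gf_mul(3, s0) ^ s1 ^ s2 ^ gf_mul(2, s3),
    ]


def add_round_key(col, key_word):
    """Eq. (5): XOR one state column with one 4-byte round-key word."""
    return [s ^ w for s, w in zip(col, key_word)]


# Worked multiplication from FIPS-197: {57} * {83} = {c1}
print(hex(gf_mul(0x57, 0x83)))
```

Addition in this field is a plain XOR, which is why AddRoundKey() and the sums inside `mix_column` need no carry handling.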

The IVector does not need to be secret in most cases, but it is important to make sure that an IVector is never reused with the same key. In CBC mode, the IVector must, in addition, be randomly generated during encryption.

Fig. 7 is a diagram showing the normal implementation. First, the data is encrypted by AES, and then it is homogenized by an exclusive-OR operation with the output stream selected from a group of orthogonal polynomials.

5 Double functions of AES engine in flash controller

Although the AES algorithm was not developed to randomize signals or to produce white noise, it does encrypt data and make the data “unpredictable” for attackers. The AES-encrypted data can therefore be regarded as random patterns, which is the theoretical target of Homogeneous. Thus, it is worthwhile to study AES encryption as a homogeneous algorithm. If the result is positive, a designer can get rid of the polynomials for homogeneous in any system that already contains AES security.

Fig. 7. AES function in MLC/TLC flash storage system with Homogeneous

A group of orthogonal polynomials, or a set of different seeds for one polynomial, is assigned to break the interactions among the paired pages. For the AES method, a unique security key for each of the paired pages provides the same benefit. Of course, such keys should be functions of the page number, expressed as follows:

$$\mathrm{Key} = F(\text{page number}) \qquad (6)$$

Furthermore, if a system designer does not like “changing keys”, considering the risk or difficulty of key management, there is a simpler alternative method of changing the IVector:

$$\mathrm{IVector} = F(\text{page number}) \qquad (7)$$

For the AES-256 algorithm, the key is 256 bits and the IVector is 128 bits. Here, two simple but effective assignments are applied for lab verification:

(1) Key = 16 byte original key + page number
Assume the original key is 256’h1234_0000_0000_*_0000; then the encryption keys for the pages in a physical block are listed as follows:
256’h1234_0000_0000_*_0000,
256’h1234_0000_0000_*_0001,
256’h1234_0000_0000_*_0002,
256’h1234_0000_0000_*_0003,
......

(2) IVector = any preset value + page number.
The simplest preset value is zero, so the IVectors can be listed as follows:
128’h0000_0000_0000_0000,
128’h0000_0000_0000_0001,
128’h0000_0000_0000_0002,
128’h0000_0000_0000_0003,
......
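The two assignments above can be sketched in Python as follows. This is only an illustration of Eq. (6) and Eq. (7) under stated assumptions: the “+” is read as integer addition on the big-endian value, the result wraps at the field width, and the helper names are hypothetical. The AES engine that would consume these values is not shown.

```python
def add_page_number(base: bytes, page_number: int) -> bytes:
    """Return base + page_number as a big-endian value of the same width."""
    width = len(base)
    value = (int.from_bytes(base, "big") + page_number) % (1 << (8 * width))
    return value.to_bytes(width, "big")


# 256'h1234_0000_..._0000 from assignment (1): a 32-byte AES-256 key.
BASE_KEY = bytes.fromhex("1234") + bytes(30)
# Preset value of zero from assignment (2): a 16-byte IVector.
BASE_IV = bytes(16)


def page_key(page_number: int) -> bytes:
    """Eq. (6): the key as a function of the page number."""
    return add_page_number(BASE_KEY, page_number)


def page_iv(page_number: int) -> bytes:
    """Eq. (7): the IVector as a function of the page number."""
    return add_page_number(BASE_IV, page_number)


for page in range(4):
    print(page, page_key(page).hex(), page_iv(page).hex())
```

Either function gives every page in a block its own value, so a controller would need only the page address to regenerate the same key or IVector on read.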

Fig. 8 is the diagram of using AES as a double-function engine for both data security and Homogeneous. All IVectors in Fig. 8 can have preset values as described above.

Fig. 8. AES engine taking double functions

6 Benefit of applying AES with double functions

The test result given in Fig. 9 shows that AES suppresses the bit errors in a TLC flash application just as the polynomial method does. Curve 0 is the test result without Homogeneous, and Curve 1 is the result with homogeneous by a polynomial with a set of seeds. These two curves are the same as those in Fig. 5. Although the difference is tiny, Curve 2 and Curve 3 are both a little lower than Curve 1, indicating that the two AES implementations have slightly better effects than the polynomial method. Therefore, the AES method brings the first benefit: getting rid of special polynomials for homogeneous in data-secured NAND flash memory applications.

Another benefit brought by homogeneous is that, whether it is implemented by AES with double functions or by scrambling with polynomials, homogeneous lowers the peak number of bit errors. This greatly helps reduce the gate count or silicon die size of the ECC engine, and finally cuts the cost.

The complexities of the syndrome computation and Chien search in the BCH ECC engine are proportional to the number of correctable errors and the codeword length. A BCH engine that can detect and correct a maximum of E bit errors needs an equivalent gate count G for a certain semiconductor process. G can be roughly estimated as follows:

$$G = g \times f(E) \qquad (8)$$

Here, g is a constant for a dedicated process.

In the above sample, without homogeneous or AES, the TLC flash required a 72 bit or even higher ECC ability. Homogeneous with AES or polynomials made it as low as 48 bit for the equivalent reliability. The logic synthesis result is shown in Table I: the gate count of the 48 bit ECC engine is 75% of that of the 72 bit ECC engine.

For high-speed implementation, the ECC engine consumes a great deal of real silicon die area. This is significant for Solid State Disk (SSD) controller design because the controller manages multiple channels of flash memory. The most common SSD controller specifications declare eight or ten channels running in parallel with high performance. That is to say, an SSD controller must embed eight or even more ECC hardware modules. Reducing the gate count of the ECC engine is therefore very helpful in designing a cost-effective SSD controller.

Fig. 9. AES took the role of homogeneous

Furthermore, there must be an AES engine in an SSD controller with a security function, and an AES engine can also randomize data, so an independent homogeneous hardware engine does not need to be implemented. In this way, the SSD controller becomes more cost-effective by getting rid of the group of orthogonal polynomials for homogeneous in Fig. 3. Although the gate count (or die size) reduction in each channel is not much (almost 1% compared with that of the ECC engine in Table I), the whole architecture becomes simpler.

Finally, an SSD controller for TLC flash memory may have an area, including the AES engine and an ECC engine with a maximum of 72 bit error endurance, of 15856213 according to Table I.
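The figures quoted in this section can be cross-checked against the Total Area row of Table I. The sketch below simply repeats that arithmetic; the dictionary name is hypothetical, and the area units are whatever the synthesis report uses, which the paper does not state.

```python
# Total Area row of Table I.
TOTAL_AREA = {
    "AES": 3847744,
    "Polynomial": 39372,
    "ECC (48 bit)": 9090051,
    "ECC (72 bit)": 12008469,
}

# Relative size of the 48 bit ECC engine against the 72 bit one.
ecc_ratio = TOTAL_AREA["ECC (48 bit)"] / TOTAL_AREA["ECC (72 bit)"]

# AES engine plus one 72 bit ECC engine, as quoted in the text.
aes_plus_ecc72 = TOTAL_AREA["AES"] + TOTAL_AREA["ECC (72 bit)"]

print(f"48 bit vs 72 bit ECC area: {ecc_ratio:.1%}")
print(f"AES + ECC (72 bit) area:  {aes_plus_ecc72}")
```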

An SSD controller with a data security function was designed with the methodologies described in this paper. A total of five BCH ECC modules, each able to correct 48 error bits, were embedded. The AES engine was designed to play the roles of both data security and homogeneous concurrently, with performance up to 300MB/s. Fig. 10 is a snapshot of the real silicon chip under a microscope. This SSD controller, measuring 3628um × 3956um in die size, has one SATA-II interface and ten flash channels in total. The maximum throughput tested was 260MB/s with sequential read burst at the desired reliability.

Table I. Comparison of die size parameters

Reported Area             AES        Polynomial   ECC (48 bit)   ECC (72 bit)
Combinational Area        317092     3685.        1090211        1337294
Non-combinational Area    36375      780.80       147947         217508
Net Interconnect Area     3494277    34906        7851892        10453666
Total Cell Area           353466     4466         1238158        1554803
Total Area                3847744    39372        9090051        12008469

Note:
(1) Evaluated by tool of Design Compiler of Synopsys & library of Shanghai Semiconductor Manufacturing International Corp (SMIC).
(2) Maximum frequency of the circuits was set at 300MHz.

Fig. 10. The snap-shot of the SSD controller

7 Conclusion

Homogeneous is effective in suppressing the bit error rate in MLC or TLC flash memories and improves the reliability of storage systems protected by ECC algorithms. AES, which plays the major role of data encryption and decryption engine in most storage devices, was found to provide the homogeneous function concurrently. An AES hardware module was built to provide both data security and homogeneous functions, so that an independent homogeneous module that scrambles data is no longer necessary. In order to verify the described analysis, a SATA-II SSD controller design was turned into a real silicon chip, and this novel structure was silicon proven.

Acknowledgments

This work was supported in part by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ12F01001, and the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y200803237 and No. Y200701938).
