    A reliable methodology for high throughput identification of a mixture of crystallographic phases from powder X-ray diffraction data

    Laurent Allan Baumes, *a Manuel Moliner, a Nicolas Nicoloyannis b and Avelino Corma *a

    Received 18th July 2008, Accepted 23rd July 2008. First published as an Advance Article on the web 11th August 2008

    DOI: 10.1039/b812395k

    Because the inherent complexity of zeolites, together with the use of high throughput technology, makes the performance of existing solutions for the automatic identification of mixtures of crystallographic phases questionable, an adequate full profile search-match approach is presented, and its reliability is demonstrated and illustrated on the very complicated case of the zeolite ITQ-33.

    The discovery of new microporous crystalline structures involves a considerable experimental effort, which can be diminished by using high throughput (HT) techniques. Despite the reduction of experimental time and cost per solid, HT technology has added substantial difficulties to the analysis of the powder X-ray diffraction (XRD) data of the resultant products. The associated loss of data quality and increase in data volume make the routine procedures, both manual and software-assisted, inadequate. Time constraints and the related complexity highlight the need for reliable and robust systems able to identify each component of mixtures of crystallographic phases in a fully automated way. When dealing with inherently complex materials like zeolites, the capability of existing solutions becomes questionable. Widely used to characterize crystallographic structure, crystallite size (grain size) and preferred orientation in powdered samples, powder diffraction is intended not only to identify crystalline materials by comparing diffraction data against a database but also to detect the formation of an unknown phase, even over broad ranges of synthesis conditions and variables. Because user involvement is expected to be minimized, at least in the first steps of screening, in order not to slow down the whole HT process, the reliability and robustness of the methodology become of outstanding significance. Improving reliability principally means decreasing the number of identification errors, while a relatively greater weight should be assigned to the mismatch of phases presenting higher levels of crystallinity, and to false negatives in the detection of unknown phases. Closely related, we refer to robustness as the capability of a technique to perform successfully over a variety of problems, i.e. it should not be restricted to datasets with special characteristics. Thus, we first describe the weaknesses of existing methodologies, and then present an improved approach. Its originality and power are verified and illustrated with the very complex case of the zeolite ITQ-33.¹ Both the mathematical proof of the robustness and the results obtained on real studies make this approach a very promising and reliable methodology.

    Search-match approaches for the recognition of crystallographic phases from XRD can be divided into (a) peak search and indexing, and (b) full profile solutions. Based on the examination of the respective advantages and drawbacks of the two kinds of approaches summarized in Table 1, the former is discarded, and we have focused on true full profile systems. The principal difficulty encountered when dealing with full patterns is the adequate design of the criterion used for the matching, i.e. the comparison, taking into account that one specific structure can present large differences in peak intensities and XRD angles, depending on its level of crystallinity, crystallite size, and chemical composition. In addition, synthesized powders presenting mixtures of phases and impurities make this even more complicated. Similarity measures are the key component that quantifies how similar or dissimilar the samples are. A suitable criterion should (i) accurately manage the influence of the highest peaks on the overall measure, (ii) detect growing phases despite their weak peak intensities, (iii) handle x-shift between samples, including when the shift is not constant along the 2θ range inside a given pattern (see Fig. S8-b in the ESI), (iv) limit the number of user interactions for settings and pre-treatment decisions, and (v) treat the amorphous phase in the same way as the other phases while remaining consistent.

    a Instituto de Tecnologia Quimica, UPV-CSIC, Universidad Politecnica de Valencia, Avda de los Naranjos s/n, 46022 Valencia, Spain. E-mail: [email protected]; [email protected]; Fax: +34 9638 7789; Tel: +34 9638 77800
    b Laboratoire ERIC, 5 Av. Pierre Mendes France, Universite Lumiere Lyon 2, 69676 Bron, France
    Electronic supplementary information (ESI) available: Proof of ATW reliability; warping path and ATW formula; optimization with a genetic algorithm; synthesis of the hexamethonium study. See DOI: 10.1039/b812395k

    Communication, CrystEngComm (www.rsc.org/crystengcomm). This journal is © The Royal Society of Chemistry 2008. CrystEngComm, 2008, 10, 1321–1324

    The proposed methodology, called adaptable time warping² (ATW), is a two-step approach that can be applied to intrinsically ordered data such as X-ray diffractograms. The central criterion is first optimized in order to tackle all the previously mentioned points, taking into account the knowledge about the problem, i.e. the selection of possible phases. Then it is applied to incoming full profiles to identify the phases, including amorphous and unknown ones. The preliminary combination of the following two components gives the method all its strength: a very flexible distance based on the dynamic time warping (DTW) model, and a learning system that automatically tunes the distance parameters according to the specificities of the selected phases. DTW³ is a time series alignment algorithm originally developed for speech recognition in the 1970s.⁴ Rather than comparing the value of the input pattern at time t to a selected reference pattern at time t, an algorithm is used that searches the space of mappings from the time sequence of the input to that of the reference, so that the total distance is minimised. This is not always a linear mapping (see Fig. 1 (right)); for example, we may find that time t1 in the input corresponds to time t1 + 5 in the reference, whereas t2 in the input stream corresponds to t2 − 3 in the reference. The computation of a warping distance requires a warping path, which defines the sequence of pairs of points that are matched together (see 'Warping path and ATW formula' in the ESI). In ATW, this special feature makes it possible both to optimally manage a variable shift, by computing the distance between points that do not occur at the same moment, and to highlight the important traits of each reference pattern, i.e. taking simultaneously into account the entire selection of possible phases, to find which 2θ angles make a given phase distinguishable, by assigning weights to each pair in the warping sequence. Note that one point can be either totally ignored during the distance computation or matched with one or several points. To do this, ATW uses $P$, a $t \times t$ matrix, as the set of parameters (i.e. weights) required to compute the distance, with $P = [P_{i,j}]$, $P_{i,j} \in \mathbb{R}^{+}$, $\forall\, i, j \in [1, t]$, and $t$ the number of intensities. $P$ is optimized by a learning system that examines the available patterns, and is modified by a genetic algorithm⁵ (GA) in order to maximize the recognition rate of the phases. After the method has been correctly trained, i.e. the specificities of the phases have been captured, the unseen samples are analyzed. The algorithm labels each sample with all the phases present. In order to identify the different crystallographic phases contained in the experimental samples, the algorithm only uses the diffraction data from available zeolites (from the laboratory's internal database and theoretical patterns). Each time a new sample is characterized, its relationship to all previously stored materials is examined through distances in order to assign its crystallographic phases. Such an approach, which follows the instance-based learning (IBL) protocol, can predict not only the pure or majority phase but also mixtures of phases, ordered by decreasing crystallinity. To do so, the algorithm computes the distances to the neighbours, and the output phases are ordered according to these distances. The design of ATW makes any warping or non-warping distance a particular case of ATW; for example, the Euclidean distance is defined by $P$ as the unit matrix (see 'Proof of ATW reliability' in the ESI). For a given classification problem (i.e. dataset), the optimal parameters $P$ imply that ATW gives results at least equal or superior to all other distances used.
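The alignment idea behind DTW can be sketched in a few lines. This is a minimal illustration of the classic dynamic-programming DTW recurrence only, not the ATW formula given in the ESI: the function name `dtw_distance`, the squared-difference local cost, the optional `window` argument and the toy intensity lists are all assumptions made for the example.

```python
import math


def dtw_distance(x, y, window=None):
    """Classic DTW distance between two intensity sequences.

    window is the largest allowed index offset |i - j| between matched
    points (a band around the diagonal); None means unconstrained.
    """
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)
    # The band must be wide enough to reach the final cell.
    window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Hypothetical toy patterns: the same peak, shifted by one step.
reference = [0, 0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 5, 1, 0, 0]

lock_step = dtw_distance(reference, shifted, window=0)  # same-index matching
warped = dtw_distance(reference, shifted)               # free alignment
```

With `window=0` only points at the same index are matched, so the value reduces to the ordinary Euclidean distance; without the constraint, the shifted peak is aligned with the reference peak and the distance drops to zero for this toy pair.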

    HT technology in combination with chemical knowledge and data analysis⁶,⁷ has allowed the synthesis of a unique zeolite structure that includes extra-large 18MR pores connected with medium 10MR pores (ITQ-33). This study required the generation of 192 diffractograms with a parallelized XRD to follow the formation of 8 different crystallographic phases, and numerous mixtures of phases, depending on the synthesis conditions. The new crystalline phase had to be discovered in a very narrow range of conditions within the whole experimental space. The Si/Ge ratio was broadly varied, producing variations in the peak positions (see Fig. S8 in the ESI). Among other characteristics, our methodology is expected to handle such shifts in peak position, which are common when dealing with samples having different Si/Ge, Si/Al, and Si/B ratios. The results obtained during the synthesis of zeolites using hexamethonium as an organic structure directing agent (OSDA) are a perfect example for testing the ability of ATW to classify such materials. Fig. 1 (left), which represents the complete experimental phase diagram of the investigation of ITQ-33, shows the occurrence of each competing phase as a function of the composition of the starting gel. Despite the non-linearity of the system, the range of composition in which the different phases are formed is clearly defined. The evaluation of the proposed strategy aims at recovering the complete phase diagram from HT powder diffraction in a fully automated manner.

    Table 1 Comparison of search-match approaches

    Peak search and indexing
      Pattern: Reduced (stick)
      Methods: d-spacing and intensity
      Associated techniques: Hanawalt (a); Fink; Diffract AT (b)
      Advantages: Low storage requirement; Speed of search; Ease of database building
      Drawbacks: Peak determination (c); Number of peaks to consider; Weak peaks are discarded

    Full profile
      Pattern: Full
      Methods: Similarity based on Euclidean distance
      Associated techniques: Statistics; PCA, MMDS; Clustering
      Advantages: Full use of information
      Drawbacks: No commercial database; Local exptl patterns collected; Decision for pre-processing

    (a) 8 strongest peaks. (b) Intermediate approach. (c) Overlap, shoulders.

    Fig. 1 (Left) Phase diagram of the entire research space. (Right) Adapting the distance calculation for the series.

    The commercial software PolySnap2 from Bruker-AXS, written by Gilmore et al., is selected as the reference for comparison for the following reasons: it is representative and highly relevant considering the current state of the art in HT identification of phases through full profile examination; thoroughly detailed explanations are available;⁸ case studies have reported excellent results; a free time-limited demo version is accessible; and it merges a broad range of techniques, among them: (i) principal component analysis (PCA) and multi-dimensional scaling (MDS), principally used as data visualization tools for exploring the similarities or dissimilarities between patterns; (ii) cluster analysis such as hierarchical or fuzzy clustering, aiming at creating subsets of data so that the patterns in each group ideally share some common trait; and (iii) parametric and non-parametric statistical tests such as Pearson, Spearman, or Kolmogorov–Smirnov (KS).⁸ It can be noticed that most of these techniques or criteria use the Euclidean distance as the basis of their calculation. PolySnap2 analyses the data, automatically sorts the full patterns into clusters, and identifies unusual samples which may be unknown structures. As PolySnap2 offers an automatic analysis where user interaction is minimized to a few options ('allow an x-shift' and 'check for amorphous' have been selected), this mode is chosen for comparison.
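The remark that such criteria rest on the Euclidean distance can be made concrete for the Pearson correlation. The sketch below is only an illustration under stated assumptions: the helper names (`zscore`, `pearson`, `squared_euclidean`) and the two toy intensity lists are hypothetical, and nothing here reproduces PolySnap2 internals. It checks the standard identity that, for two patterns of length n standardized to zero mean and unit (population) variance, the squared Euclidean distance equals 2n(1 − r), where r is the Pearson coefficient.

```python
import math


def zscore(values):
    """Standardize to zero mean and unit population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def pearson(x, y):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def squared_euclidean(x, y):
    """Point-by-point (same-index) squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical toy patterns: one peak, shifted by one step in the second.
pattern_a = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0]
pattern_b = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0]

r = pearson(pattern_a, pattern_b)
d2 = squared_euclidean(zscore(pattern_a), zscore(pattern_b))
identity_gap = abs(d2 - 2 * len(pattern_a) * (1 - r))
```

Because both quantities compare intensities at the same index only, a small peak shift lowers the correlation exactly as it raises the Euclidean distance.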

    We illustrate the methodology for a mixture of crystallographic phases occurring during the synthesis of a zeolite. It can also be applied to the synthesis of MOFs or any other type of crystalline material. The results obtained with the methodology presented here are also compared with those from what is probably the best method reported so far in the literature.⁸

    ATW is applied to the hexamethonium dataset, and the resulting classification error is below 3% (see Table 2). After verification, we could observe that the algorithm is more accurate than our manual classification; the algorithm considers the amorphous material as another class, even when the amorphous content is the impurity. The other two errors came from two mixtures ITQ-22/ITQ-24 that were predicted by the algorithm, while the real phase only contained ITQ-22. The reason is that, for these two zeolites, the distance to the ITQ-24 diffractogram is at the limit of significance. The comparison of ATW and PolySnap2 results for the hexamethonium study (see Table S7 in the ESI) shows a clear improvement in the error rate when ATW is employed (the ATW error rate is 92% lower than that of PolySnap2), principally when mixtures of phases and not completely crystalline phases are present (see Fig. S9 in the ESI). This has also been verified with another case of lower complexity (Beta study) with only two crystalline phases, showing similar difficulties when using the PolySnap2 method (see Table S4 and Fig. S5 in the ESI). Moreover, we have empirically verified with 20 benchmarks and two other real datasets of zeolite investigations that ATW effectiveness is always at least equivalent to that of all other distances used, including the well-known DTW (see Fig. S1 and Table S2 in the ESI).
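The error rate quoted above can be recomputed from the confusion matrix. The sketch below lists the non-zero cells of Table 2 as (predicted, real) pairs; the cell placement follows one reading of the table that is consistent with the errors described in the text, and the dictionary layout and variable names are assumptions made for the example.

```python
# (predicted class, real class) -> number of samples; non-zero cells only.
confusion = {
    ("Amorphous", "Amorphous"): 55,
    ("ITQ-22", "ITQ-22"): 36,
    ("ITQ-24", "ITQ-24"): 17,
    ("EU-1", "EU-1"): 3,
    ("SSZ-31", "SSZ-31"): 15,
    ("Lamellar", "Lamellar"): 46,
    ("Lamellar/ITQ-24", "Lamellar/ITQ-24"): 8,
    ("ITQ-24/ITQ-33", "ITQ-24/ITQ-33"): 2,
    ("ITQ-17/ITQ-24", "ITQ-17/ITQ-24"): 2,
    ("ITQ-22/ITQ-24", "ITQ-22/ITQ-24"): 2,
    ("ITQ-22/ITQ-24/ITQ-33", "ITQ-22/ITQ-24/ITQ-33"): 1,
    # Off-diagonal cells: predicted label differs from the real one.
    ("ITQ-22/ITQ-24", "ITQ-22"): 2,
    ("ITQ-24/Amorphous", "ITQ-24"): 2,
    ("SSZ-31/Amorphous", "SSZ-31"): 1,
}

total = sum(confusion.values())
errors = sum(count for (predicted, real), count in confusion.items()
             if predicted != real)
error_rate = errors / total
```

Under this reading, 5 of the 192 samples fall off the diagonal, i.e. about 2.6%, in line with the stated error below 3%.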

    In conclusion, we have shown that the ATW methodology can not only be successfully used to automatically identify mixtures of crystallographic phases but is also able to detect unknown structures. This makes ATW an innovative and robust approach. The lack of adapted methodologies for series has induced a new field of investigation called temporal data mining.⁹ ATW robustness places the methodology as a leading strategy in this domain. Considering the numerous applications in chemistry, especially when using characterization data (i.e. series), ATW appears as a very promising methodology that can help those working in the synthesis of novel crystalline materials.

    Table 2 Confusion matrix with the real phases in the experimental design (columns) versus predicted classes obtained with ATW (rows). A dot marks an empty cell.

    Predicted \ Real        Am.  ITQ-22  ITQ-24  EU-1  SSZ-31  Lamellar  Lamellar+24  24+33  17+24  22+24  22+24+33  24+Am.  SSZ-31+Am.  Total
    Amorphous (Am.)          55       .       .     .       .         .            .      .      .      .         .       .           .     55
    ITQ-22                    .      36       .     .       .         .            .      .      .      .         .       .           .     36
    ITQ-24                    .       .      17     .       .         .            .      .      .      .         .       .           .     17
    EU-1                      .       .       .     3       .         .            .      .      .      .         .       .           .      3
    SSZ-31                    .       .       .     .      15         .            .      .      .      .         .       .           .     15
    Lamellar                  .       .       .     .       .        46            .      .      .      .         .       .           .     46
    Lamellar/ITQ-24           .       .       .     .       .         .            8      .      .      .         .       .           .      8
    ITQ-24/ITQ-33             .       .       .     .       .         .            .      2      .      .         .       .           .      2
    ITQ-17/ITQ-24             .       .       .     .       .         .            .      .      2      .         .       .           .      2
    ITQ-22/ITQ-24             .       2       .     .       .         .            .      .      .      2         .       .           .      4
    ITQ-22/ITQ-24/ITQ-33      .       .       .     .       .         .            .      .      .      .         1       .           .      1
    ITQ-24/Amorphous          .       .       2     .       .         .            .      .      .      .         .       0           .      2
    SSZ-31/Amorphous          .       .       .     .       1         .            .      .      .      .         .       .           0      1
    Total                    55      38      19     3      16        46            8      2      2      2         1       0           0

    Experimental

    Hexamethonium is used as a structure directing agent (SDA). An initial experimental factorial design (3 × 4³) is selected.¹ Si/Ge, T(III)/(Si + Ge), OH⁻/(Si + Ge), and H₂O/(Si + Ge) are the synthesis variables. This design considers the following four molar ratios (number of levels in parentheses): Si/Ge (4) ranging from 2 to 30; B/(Si + Ge) (4) from 0 to 0.05; OH⁻/(Si + Ge) (3) from 0.1 to 0.5; and H₂O/(Si + Ge) (4) from 5 to 30. The total number of samples synthesized and characterized is 192. The flexibility of hexamethonium allows different conformations that stabilize diverse competing structures, such as EU-1, ITQ-17, ITQ-22, ITQ-24, SSZ-31, a lamellar phase, and the new structure ITQ-33.
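The size of the factorial design can be checked by enumerating it. In the sketch below only the number of levels per variable comes from the text; the individual level values are not listed there, so each level is represented by a placeholder index, and the dictionary keys and variable names are assumptions made for the example.

```python
from itertools import product

# Number of levels per synthesis variable, as stated in the text.
levels = {
    "Si/Ge": 4,
    "B/(Si+Ge)": 4,
    "OH-/(Si+Ge)": 3,
    "H2O/(Si+Ge)": 4,
}

# Placeholder level indices (0, 1, ...) stand in for the actual molar ratios.
design = list(product(*(range(count) for count in levels.values())))
n_samples = len(design)
```

Each tuple in `design` is one gel composition of the full factorial grid, and the count matches the 192 samples reported.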

    Acknowledgements

    Laurent A. Baumes especially thanks Nicolas Nicoloyannis, who was one of the directors of his PhD thesis and a friend. The EU Commission FP6 (TOPCOMBI Project) is gratefully acknowledged. We also thank Santiago Jimenez for his scientific collaboration on the hITeQ platform.

    Notes and references

    1 A. Corma, M. J. Díaz-Cabañas, J. L. Jordá, C. Martínez and M. Moliner, Nature, 2006, 443, 842–845.
    2 R. Gaudin and N. Nicoloyannis, in 5th Int. Conf. Machine Learning and Applications (ICMLA06), IEEE Computer Society, Los Alamitos, CA, USA, 2006, 213–218, ISBN 0-7695-2735-3.
    3 (a) D. J. Berndt and J. Clifford, KDD Workshop, 1994; (b) E. Keogh, in Tutorial in 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD04), Seattle, WA, USA, 2004.
    4 (a) V. M. Velichko and N. G. Zagoruyko, Int. J. Man-Machine Studies, 1970, 2, 223; (b) H. Sakoe and S. Chiba, IEEE Trans. Acoustics Speech Signal Process., 1978, 43–49; (c) C. S. Myers and L. R. Rabiner, Bell Syst. Tech. J., Sept. 1981, 60(7), 1389–1409.
    5 (a) D. E. Goldberg, The Design of Innovation: Lessons from and for Competent Genetic Algorithms, Addison-Wesley, Reading, MA, 2002; (b) L. M. Schmitt, Theor. Comput. Sci., 2001, 259, 1–61; (c) M. D. Vose, The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA, 1999; (d) D. Whitley, Stat. Comput., 1994, 2, 65–85.
    6 (a) M. Moliner, J. M. Serra, A. Corma, E. Argente, S. Valero and V. Botti, Microporous Mesoporous Mater., 2005, 78, 73–81; (b) O. B. Vistad, D. E. Akporiaye, K. Mejland, R. Wendelbo, A. Karlsson, M. Plassen and K. P. Lillerud, Stud. Surf. Sci. Catal., 2004, 154A, 731–738; (c) A. Cantin, A. Corma, M. J. Díaz-Cabañas, J. L. Jordá and M. Moliner, J. Am. Chem. Soc., 2006, 128, 4216–4217; (d) A. Corma, M. Moliner, J. M. Serra, P. Serna, M. J. Díaz-Cabañas and L. A. Baumes, Chem. Mater., 2006, 18, 3287–3296.
    7 (a) C. Klanner, D. Farrusseng, L. A. Baumes, M. Lengliz, C. Mirodatos and F. Schüth, Angew. Chem., Int. Ed., 2004, 43, 5347–5349; (b) F. Schüth, L. A. Baumes, F. Clerc, D. Demuth, D. Farrusseng, J. Llamas-Galilea, C. Klanner, J. Klein, A. Martinez-Joaristi, J. Procelewska, M. Saupe, S. Schunk, M. Schwickardi, W. Strehlau and T. Zech, Catal. Today, 2006, 117, 284–290; (c) L. A. Baumes, J. Comb. Chem., 2006, 8, 304–314.
    8 (a) C. J. Gilmore, G. Barr and J. Paisley, J. Appl. Crystallogr., 2004, 37, 231–242; (b) G. Barr, W. Dong and C. J. Gilmore, J. Appl. Crystallogr., 2004, 37, 243–252; (c) G. Barr, W. Dong and C. J. Gilmore, J. Appl. Crystallogr., 2004, 37, 658–664.
    9 (a) W. Lin, M. A. Orgun and G. J. Williams, Australasian Data Mining Workshop, Macquarie Univ. and CSIRO Data Mining, 2002; (b) C. M. Antunes and A. L. Oliveira, Workshop on Temporal Data Mining, at the 7th Int. Conf. on Knowledge Discovery and Data Mining (KDD01), San Francisco, CA, 2001.
