Standard multivariate analysis tools such as principal component analysis (PCA) or correspondence analysis (CA) are very useful for summarizing a single set of numerical data as a few simple, interpretable factors. These methods are available in numerous commercial and freeware packages. The growing number of molecular databases freely available on the Internet raises the question of how to take advantage of information from different data sets. Combining such data requires more sophisticated multivariate analysis tools that can analyse more than one data set simultaneously, and these methods are less common than the usual PCA and CA. This paper illustrates the results that can be obtained by crossing two data sets with the ADE package (Analysis of Environmental Data; Thioulouse et al., 1995), in which these methods are implemented. We have attempted to cross information on amino-acid physico-chemical properties with information on protein composition.
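To make concrete what "summarizing a single set of numerical data as factors" involves, here is a minimal sketch of a normed PCA (columns centred and scaled to unit variance, uniform row weights) computed with NumPy's SVD. It is an illustrative assumption, not the ADE implementation; the function name `normed_pca` and the synthetic table are hypothetical.

```python
import numpy as np


def normed_pca(table, n_axes=2):
    """Normed PCA of a table (rows = individuals, columns = variables).

    Sketch only: uniform row weights 1/n, columns centred and scaled to unit
    (population) variance, principal axes obtained by SVD.
    """
    x = np.asarray(table, dtype=float)
    n = x.shape[0]
    centred = x - x.mean(axis=0)
    std = centred.std(axis=0)      # population standard deviation (ddof=0)
    std[std == 0] = 1.0            # avoid dividing by zero for constant columns
    z = centred / std
    # Squared singular values of Z / sqrt(n) are the eigenvalues of the
    # correlation matrix Z^T Z / n, i.e. the inertia carried by each axis.
    _, s, vt = np.linalg.svd(z / np.sqrt(n), full_matrices=False)
    eigenvalues = s ** 2
    axes = vt[:n_axes].T           # principal axes (unit-norm column loadings)
    row_scores = z @ axes          # coordinates of the rows on those axes
    return eigenvalues, axes, row_scores


rng = np.random.default_rng(1)
data = rng.normal(size=(20, 5))    # synthetic table: 20 rows, 5 variables
eigenvalues, axes, scores = normed_pca(data)
print(eigenvalues.round(3), eigenvalues.sum().round(3))
```

In a normed PCA the eigenvalues sum to the number of variables, so dividing each eigenvalue by that number gives the share of total inertia carried by its factor.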
Multivariate analysis has revealed that between-species variability in protein composition is low (Grantham et al., 1980), at least when compared with between-species variability in codon usage. Three main interpretable factors underlie the variability in the composition of E. coli proteins (Lobry and Gautier, 1994). In decreasing order of importance, these factors are protein hydrophobicity, the expressivity level of the corresponding genes, and the aromaticity of the proteins themselves. The situation for amino-acid physico-chemical properties is less clear, because the main factors are not readily identified (Sneath, 1966), and the data sets analysed and the methods used differ from author to author. From a data set of 134 qualitative amino-acid properties, Sneath (1966) tentatively identified the first three factors as aliphaticity, hydrogenation, and aromaticity. From a data set of 20 quantitative properties, Sjöström and Wold (1985) identified the first three factors as lipophilicity, side-chain bulk, and electronic properties. Kidera et al. (1985) found that 10 orthogonal factors were sufficient to represent almost all the variability of 188 published indices, showing that these indices are highly redundant. Nakai et al. (1988), applying hierarchical cluster analysis to 222 amino-acid indices, found four main clusters of amino-acid features: alpha and turn propensities, beta propensities, hydrophobicity, and other physico-chemical properties.
Co-inertia (or co-structure) analysis (Chessel and Mercier, 1993; Dolédec and Chessel, 1994) is a "data coupling" approach to multivariate analysis: it allows two data sets to be analysed simultaneously. In agronomy and ecology, these data sets are often an environmental table (physico-chemical variables) and a floristic or faunistic table (species abundances) measured at the same sampling points. Many methods have been suggested for analysing such data (see the review by Chessel and Mercier, 1993); from a theoretical point of view, one of the simplest is co-inertia analysis. Tucker (1958) described such an analysis under the name of inter-battery factor analysis for the case of two PCA tables. The method has also been proposed as an alternative to canonical analysis for environmental data (Gittins, 1985), and was generalized to any type of table (quantitative, qualitative, or contingency) by Mercier (1991). It is also similar to the canonical correspondence analysis (CCA) of ter Braak (1986) and to the partial least squares (PLS) regression method used by Wold et al., 1987; Hellberg et al., 
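The core computation can be sketched for the simplest case mentioned above, two PCA tables (Tucker's inter-battery setting). A pair of co-inertia axes u and v maximises the covariance between the row scores Xu and Yv under unit-norm constraints, and this is solved by a singular value decomposition of the cross table X^T D Y. The sketch below assumes centred, unscaled columns and uniform row weights D = I/n; other table types (for example CA, with its own row and column weights) change D and the column metrics and are not covered. The function name `coinertia` is hypothetical and this is not the ADE implementation. As an extra not discussed in the text, it also returns the RV coefficient, a standard normalised measure of the co-structure between the two tables.

```python
import numpy as np


def coinertia(x, y, n_axes=2):
    """Co-inertia analysis of two tables sharing the same rows.

    Sketch of the two-PCA case: columns centred (not scaled), uniform row
    weights D = I / n, identity column metrics.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both tables must have the same rows")
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross = xc.T @ yc / n                       # X^T D Y (p x q covariance table)
    u, s, vt = np.linalg.svd(cross, full_matrices=False)
    co_inertia = float(np.sum(s ** 2))          # trace(X^T D Y Y^T D X)
    # RV coefficient: co-inertia normalised by each table's own "self co-inertia".
    vx = xc.T @ xc / n
    vy = yc.T @ yc / n
    rv = co_inertia / np.sqrt(np.sum(vx ** 2) * np.sum(vy ** 2))
    return {
        "eigenvalues": s[:n_axes] ** 2,         # co-inertia carried by each axis
        "x_axes": u[:, :n_axes],                # unit-norm co-inertia axes for X
        "y_axes": vt[:n_axes].T,                # unit-norm co-inertia axes for Y
        "x_scores": xc @ u[:, :n_axes],         # rows projected onto the X axes
        "y_scores": yc @ vt[:n_axes].T,         # rows projected onto the Y axes
        "rv": rv,
    }


# Usage on synthetic tables: y is built as a noisy linear function of x.
rng = np.random.default_rng(2)
x = rng.normal(size=(20, 4))
y = x @ rng.normal(size=(4, 3)) + 0.1 * rng.normal(size=(20, 3))
res = coinertia(x, y)
print(res["eigenvalues"], res["rv"])
```

The covariance between the first pair of row scores equals the first singular value, which is why each axis's co-inertia is reported as the square of that value.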
Co-inertia analysis was applied to the two data sets described in the Data Sets section. The data were arranged in two tables sharing the same 20 rows (the amino acids): the first has 402 columns (physico-chemical and biological properties), and the second has 999 columns (E. coli proteins).
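Because both tables share the same 20 rows, the cross table X^T D Y has 402 × 999 entries, but after column centring its rank is at most n − 1 = 19, so at most 19 co-inertia axes carry non-zero co-inertia, however many columns the tables have. The sketch below checks this on synthetic random tables of the same shapes; they are placeholders, not the actual data sets. It again assumes centred columns and uniform row weights, which may differ from the weighting used in the actual analysis, and the helper name `centre_columns` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic placeholders with the shapes given in the text (NOT the real data):
properties = rng.normal(size=(20, 402))                        # 20 amino acids x 402 properties
composition = rng.poisson(5.0, size=(20, 999)).astype(float)   # 20 amino acids x 999 proteins


def centre_columns(table):
    """Subtract each column's mean (uniform row weights assumed)."""
    return table - table.mean(axis=0)


n = properties.shape[0]
cross = centre_columns(properties).T @ centre_columns(composition) / n
singular_values = np.linalg.svd(cross, compute_uv=False)
# Count singular values that are not numerically zero.
rank = int(np.sum(singular_values > 1e-10 * singular_values[0]))
print(cross.shape, rank)
```

Centring removes one dimension from the 20-dimensional row space, which is why the count stops at 19 even though the SVD returns 402 singular values.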



