Workflow schema of the pipeline: The data pipeline for developing the oil crops metabolic network involves configuration, cleaning of raw DNA sequences, clustering and assembly, functional annotation, and data mining and visualization.

Seeds of four oil crops (sesame, peanut, rape and soybean) were harvested at different developmental stages and mixed for subsequent mRNA extraction and library construction. After sequencing, a total of 248,522 oil crop EST sequences derived from four cDNA libraries were used to construct the database. Quality control of the raw DNA sequences was performed with Phred, cross_match and RepeatMasker to remove sub-standard reads, vector and adapter sequences, and repeat sequences, followed by EST-trimmer to eliminate 3' polyA tails and 100 bp EST reads. The Phrap program was then used to cluster the overlapping ESTs into contigs.
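
The trimming step can be illustrated with a minimal Python sketch. It is not the EST-trimmer program itself: the function names, the minimum polyA run length of 10, and the reading of the 100 bp figure as a minimum-length cutoff are all assumptions made for illustration.

```python
def trim_poly_a(seq: str, min_run: int = 10) -> str:
    """Remove a trailing run of A's if it is at least min_run long.

    min_run is a hypothetical parameter, not a value from the text.
    """
    stripped = seq.rstrip("Aa")
    if len(seq) - len(stripped) >= min_run:
        return stripped
    return seq


def clean_ests(records: dict, min_len: int = 100, min_run: int = 10) -> dict:
    """Trim 3' polyA tails, then drop reads shorter than min_len.

    Assumption: the 100 bp figure is treated as a minimum read length.
    """
    cleaned = {}
    for name, seq in records.items():
        trimmed = trim_poly_a(seq, min_run)
        if len(trimmed) >= min_len:
            cleaned[name] = trimmed
    return cleaned
```

Called on a dictionary that maps read names to sequences, `clean_ests` returns only the reads that are still long enough after trimming. Trimming is done before the length check so that a read is judged by its informative portion rather than by its tail.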

The clustered unigenes were compared with the Nr, Swiss-Prot, TrEMBL and COG databases using the default settings of the BLASTX program (NCBI, ftp://ftp.ncbi.nlm.nih.gov/blast) and mapped to Gene Ontology (GO) terms using the BLAST results and the GO annotation database with an E-value threshold of 1e-05. Additional information and GO terms were obtained by comparing the sequences to the InterPro database with the InterProScan tool to identify protein signatures. The unigenes were translated in six reading frames, and the annotation information was mapped to knowledge bases such as the KEGG pathways. To reconstruct and visualize the metabolic network based on the connection matrix of reactions, Cytoscape and yEd (a Java graph editor from the company yWorks) were used as layout tools for network construction.
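
As a rough illustration of applying the E-value threshold, the following Python sketch keeps the best hit per unigene. It assumes the hits are in BLAST's 12-column tabular format, where the E-value is the eleventh column; the text does not state which output format was used. The function name `best_hits` and the inclusive comparison against 1e-05 are likewise assumptions.

```python
import csv
import io


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Return {query: (subject, evalue)} for the lowest-E-value hit per query.

    Assumes 12-column tab-separated BLAST output:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, E-value, bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # hit does not pass the threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best
```

The resulting mapping from unigene to best-matching database entry is the kind of table that could then be joined against a GO annotation file, although the exact mapping procedure used in the pipeline is not described here.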



