dr. sc. Mario Cindrić
mag. sci Marija Nišavić
dr. sc. Amela Hozić


  Protein identification method in the biopharmaceutical industry
  designed as an industrial standard for protein
  expression/production quality control


The LC/MS(/MS) technique currently used for protein identification is based on database matching of non-derivatized peptide signals recorded in positive ion mode. Although widely used, this approach does not always provide sufficient sequence coverage to identify a protein sample unambiguously.

We have introduced a peptide derivatization method that allows spectra to be recorded in both positive and negative ion modes in order to increase peptide sequence coverage. This method for unambiguous protein identification could be used in a wide variety of industrial and research applications.

The method we present comprises four main steps (Figure 1):

  1. Single protease digestion with trypsin
  2. N-terminal peptide derivatization with 5-formylbenzene-1,3-disulfonic acid (EP2710380 A1, US 8647880)
  3. UPLC/MSE experiments in positive and negative ion mode
  4. Data processing (PLGS)

Figure 1. Schematic representation of the method workflow.

Post-column addition of reagent A enables negative ionization of derivatized and non-derivatized peptides. Examples of UPLC chromatograms with and without post-column introduction of reagent A (0.4 µL flow rate) are shown in Figures 2 and 3.

Figure 2. UPLC chromatogram of derivatized BSA digest in negative ion mode without post-column addition of reagent A.

Figure 3. UPLC chromatogram of derivatized BSA digest with post-column addition of reagent A.

a – Reagent A is a 0.05% (w/w) solution of CAF reagent (5-formylbenzene-1,3-disulfonic acid) in isopropanol.


To compare the sequence coverage obtained with our method with that of the conventional one, 12 protein samples were analyzed: bovine serum albumin (BSA), transferrin (Trn), erythropoietin (Epo), lysozyme (Lyz), aldolase (Als), β-casein (Csn), isoleucyl-tRNA synthetase (IleRS), nucleotidyl transferase (CCA), TEV protease (TEV), leucyl-tRNA synthetase (LeuRS), elongation factor Tu (EF-Tu) and seryl-tRNA synthetase (SerRS). For Epo and Trn, glycan moieties were enzymatically removed with PNGase F prior to trypsin digestion, using the protocol described by Cindrić et al. (ref. 1).

Sample Preparation

The sample preparation protocol is given in the Appendix, in the section Sample preparation.

System and Method conditions

1. LC

LC system: nanoACQUITY UPLC® (Waters, Milford, MA, USA)
Trapping column: nanoACQUITY UPLC® 2G-V/M Symmetry® C18 Trap Column, 100 Å, 5 μm, 180 μm × 20 mm (Waters, Milford, MA, USA, p/n: 186006527)
Trapping conditions: Isocratic delivery of aqueous 0.1% formic acid, at 15 μL/min for two minutes
Analytical column: ACQUITY UPLC® BEH130 C18, 130 Å, 1.7 μm, 100 μm × 100 mm Column (Waters, Milford, MA, USA, p/n: 186003546)
Column temperature: 40 °C
Elution flow rate: 1 µL/min
Mobile phase A: aqueous 0.1% formic acid
Mobile phase B: 0.1% formic acid in 95% acetonitrile
Gradient: 0.1% to 99% solvent B in 30 minutes
Run time: 30 minutes
Sample injection volume: 4 µL
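
The LC settings above can be turned into a few derived quantities. The following is a minimal sketch, assuming the gradient is a linear ramp (the source gives only the start point, end point and duration); the function names `percent_b` and `delivered_volume_ul` are hypothetical helpers introduced for illustration.

```python
def percent_b(t_min: float, start: float = 0.1, end: float = 99.0,
              duration_min: float = 30.0) -> float:
    """Return %B at time t_min.

    Assumption: the ramp from `start` to `end` is linear; the source states
    only "0.1% to 99% solvent B in 30 minutes".
    """
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time outside the gradient window")
    return start + (end - start) * (t_min / duration_min)


def delivered_volume_ul(flow_ul_per_min: float, minutes: float) -> float:
    """Volume delivered at a constant flow rate (µL/min x min = µL)."""
    return flow_ul_per_min * minutes


# Trapping step: 15 µL/min for two minutes.
trap_volume_ul = delivered_volume_ul(15.0, 2.0)

# Analytical run: 1 µL/min for the 30-minute run time.
elution_volume_ul = delivered_volume_ul(1.0, 30.0)

# Mobile phase composition halfway through the run (linear-ramp assumption).
mid_gradient_b = percent_b(15.0)
```

Under these assumptions the trapping step flushes the trap column with 30 µL of aqueous 0.1% formic acid, which is several times the 4 µL injection volume.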

2. MS

Instrument: SYNAPT G2-Si mass spectrometer (Waters, Milford, MA, USA)

Instrument parameters:
Analyser mode: Resolution
Capillary: 3.5 kV
Source temperature: 80 °C
Nanoflow gas pressure: 1 bar
Cone voltage: 40 V
MSE parameters:
Acquisition time: 30 min
Mass range: 50 to 4000 Da
Data format: Continuum
Scan time: 1 s  

MSE data were acquired in positive ion mode for non-derivatized samples and in both positive and negative ion modes for derivatized samples. Following the standard MSE procedure, the collision cell energy alternated between low energy (4 eV), to collect peptide precursor (MS) data, and elevated energy (ramped from 20 to 40 eV), to obtain peptide fragmentation (MSE) data. The lock spray channel (1 ng/μL leucine enkephalin in 50:50 isopropyl alcohol/water containing 0.1% formic acid) was sampled every 1 min to ensure high mass accuracy.

Data processing

The acquired data were processed with ProteinLynx Global Server software (PLGS; v. 3.0.1, Waters). Peak lists were generated after deisotoping and deconvolution. A separate database containing the sequence of each investigated protein was created, and the data were searched with trypsin as the digestion reagent and up to three missed cleavages allowed. Peptide and fragment tolerances were set to automatic. Oxidation (M) and dehydration (S, T) were allowed as variable modifications for all protein data sets, while deamidation (N) was added for the Epo and Trn data sets and phosphorylation for the Csn data set.
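
To illustrate what a trypsin search with up to three missed cleavages considers, here is a minimal in-silico digestion sketch. It is not the PLGS implementation: the cleavage rule (C-terminal to K or R, but not before P) is a commonly used convention assumed here, and the function name `tryptic_peptides` and the toy sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence: str, max_missed: int = 3):
    """List (start, end, peptide) tuples for a tryptic digest.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    Positions are 0-based, with `end` exclusive.
    """
    # Internal cleavage positions (the protein C-terminus is added separately).
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]

    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k + 1 consecutive fragments corresponds to k missed cleavages.
        last = min(i + 1 + max_missed, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptides.append((bounds[i], bounds[j], sequence[bounds[i]:bounds[j]]))
    return peptides


# Toy sequence for illustration only (not one of the analyzed proteins).
toy = "MKTAYRGGKPLLK"
fully_cleaved = [p for _, _, p in tryptic_peptides(toy, max_missed=0)]
with_missed = tryptic_peptides(toy, max_missed=3)
```

Allowing missed cleavages enlarges the list of candidate peptides that can be matched against the recorded spectra, at the cost of a larger search space.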

For the derivatized peptides, an N-terminal reagent modifier was created for 5-formylbenzene-1,3-disulfonic acid and used as a fixed modification in the workflow parameters.
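
A fixed modification is defined by its mass shift. The sketch below estimates the monoisotopic shift for the N-terminal modifier under one explicit assumption that is not stated in the source: that derivatization proceeds by reductive amination (consistent with the use of NaBH3CN in the sample preparation), so that the peptide gains the reagent's elemental composition minus one oxygen. The values actually entered in PLGS are not given in this document, and the helper `monoisotopic_mass` is hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "O": 15.99491462,
    "S": 31.97207100,
}


def monoisotopic_mass(composition: dict) -> float:
    """Sum the monoisotopic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count
               for element, count in composition.items())


# 5-formylbenzene-1,3-disulfonic acid, free acid: C7H6O7S2.
REAGENT = {"C": 7, "H": 6, "O": 7, "S": 2}

# Assumption: reductive amination, R-NH2 + OHC-Ar -> R-NH-CH2-Ar.
# Net gain on the peptide = reagent - O (water lost, two hydrogens added).
ADDED_TO_PEPTIDE = dict(REAGENT, O=REAGENT["O"] - 1)

reagent_mass = monoisotopic_mass(REAGENT)
modification_shift = monoisotopic_mass(ADDED_TO_PEPTIDE)
```

Under this assumption each derivatized peptide would appear roughly 250 Da heavier than its non-derivatized form, which is the kind of fixed offset a search engine applies to every N-terminus.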

Results and Discussion

Comparing the percentage of protein sequence coverage obtained with the conventional method (non-derivatized, + mode) with that obtained with our method (derivatized, +/- mode), we found that our method provided higher sequence coverage for each of the 12 analyzed proteins. The results are shown in Figure 4 and summarized in Table 1. More detailed information about the coverage of the analyzed proteins can be found in a separate document (Appendix_2015-05-19).

Up to 59% higher sequence coverage

Figure 4. Protein sequence coverage (%) calculated after PLGS data processing and database matching of non-derivatized and derivatized peptides acquired in MSE analyses in positive and positive/negative ion modes, respectively.

Our method provided sequence coverage ranging from 88 to 100% (96% on average), in contrast to the conventional method, which ranged from 31 to 94% (80% on average). On average, the derivatization method provided 16 percentage points higher sequence coverage, with the best result obtained for the Csn sample, where the difference in sequence coverage reached 59 percentage points.
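
Sequence coverage can be computed as the percentage of residues covered by at least one matched peptide. The following is a minimal sketch, assuming that coverage in +/- mode is taken over the union of peptides matched in either ion mode; how PLGS combines the two data sets is not described here. The function name `sequence_coverage` and the toy intervals are hypothetical and are not measured data.

```python
def sequence_coverage(seq_len: int, matched) -> float:
    """Percentage of residues covered by at least one matched peptide.

    `matched` is an iterable of (start, end) positions, 0-based, end exclusive.
    Overlapping peptides are counted once per residue.
    """
    covered = [False] * seq_len
    for start, end in matched:
        for i in range(max(0, start), min(seq_len, end)):
            covered[i] = True
    return 100.0 * sum(covered) / seq_len


# Toy 20-residue protein with illustrative peptide matches (not real results).
positive_mode = [(0, 8)]
negative_mode = [(5, 15)]

coverage_positive = sequence_coverage(20, positive_mode)
coverage_combined = sequence_coverage(20, positive_mode + negative_mode)

# Average improvement reported in the text, in percentage points.
average_gain_points = 96 - 80
```

Because peptides that ionize poorly in one mode can still be matched in the other, the union can only keep or increase the number of covered residues, never reduce it.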

Table 1. Protein sequence coverage (%) calculated after database matching of non-derivatized and derivatized peptides acquired in MSE analyses in positive and positive/negative ion modes, respectively.

Columns: Non-derivatized (+ mode) %; Derivatized (+/- mode) %


  1. M. Cindrić, L. Bindila, T. Čepo, J. Peter-Katalinić. Mass Spectrometry-Based Glycoproteomic Approach Involving Lysine Derivatization for Structural Characterization of Recombinant Human Erythropoietin. J. Proteome Res., 2006, 5 (11), 3066–3076.
  2. Mario Cindric, Zdenko Hamersak, Ivana Dodig. Method of detection of amino acid sequence and/or identification of peptides and proteins, by use of a new derivatization reagent and synthesis of 5-formyl-benzene-1,3-disulphonic acid as derivatization reagent, EP2710380 A1. 


Sample preparation

  1. For tryptic digestion, 0.05 M Na2HPO4 buffer solution of pH 8.0 containing 25 µg of protein was introduced into a polypropylene tube. 25 µL of a 0.02 mg/mL trypsin solution was added to the sample, and the mixture was incubated at 37 °C for 18 h.

  2. For denaturing, 125 µL of an 8 M guanidine hydrochloride solution was added to the sample and mixed well.

  3. For reduction, 10 µL of a 1 M solution of dithiothreitol was added and mixed well.

  4. The capped tube was placed in boiling water for 1 min, then allowed to cool to room temperature.

  5. The sample containing the tryptic peptide mixture was divided into two equal volumes, purified using 100 μL ZipTips and dried in a SpeedVac. After drying, each tube contained 10 µg of the evaporated sample.

  6. One sample was resuspended in 100 μL of 0.1% formic acid to reach a final concentration of 0.1 mg/mL and placed in a vial for the UPLC/MS analysis.

  7. The other sample was reconstituted in 30 µL of CAF derivatization solution. The derivatization solution contains 12.5 mM 5-formylbenzene-1,3-disulfonic acid disodium salt hydrate (p.a. synthetic product, Ruđer Bošković Institute, Cindrić et al. (ref. 2)) and 95.5 mM NaBH3CN in 10 mM KH2PO4, pH 5. The tube containing the sample was closed, placed in a styrofoam holder and derivatized in a household microwave oven at 180 W for 20 minutes (2 × 10 min cycles). Finally, the sample was diluted with ultrapure water to a final concentration of 0.1 mg/mL.
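
The quantities in the protocol above can be cross-checked with simple unit arithmetic. This is a minimal sketch with hypothetical helper names; the water volume in the last step is an assumption (10 µg of peptide in the tube and negligible volume change on mixing), since the protocol states only the target concentration.

```python
def mass_ug(volume_ul: float, conc_mg_per_ml: float) -> float:
    """Mass in µg from volume (µL) and concentration (mg/mL); µL x mg/mL = µg."""
    return volume_ul * conc_mg_per_ml


def conc_mg_per_ml(mass_in_ug: float, volume_ul: float) -> float:
    """Concentration in mg/mL from mass (µg) and volume (µL); µg/µL = mg/mL."""
    return mass_in_ug / volume_ul


# Step 1: 25 µL of 0.02 mg/mL trypsin added to 25 µg of protein.
trypsin_ug = mass_ug(25.0, 0.02)
substrate_per_enzyme = 25.0 / trypsin_ug  # protein mass per unit trypsin mass (w/w)

# Step 6: 10 µg of peptides resuspended in 100 µL of 0.1% formic acid.
underivatized_conc = conc_mg_per_ml(10.0, 100.0)

# Step 7: 10 µg in 30 µL of derivatization solution, diluted to 0.1 mg/mL.
# Assumption: volumes are additive and no sample is lost during derivatization.
final_volume_ul = 10.0 / 0.1
water_to_add_ul = final_volume_ul - 30.0
```

With these inputs the digestion uses an enzyme-to-substrate ratio of about 1:50 (w/w), and both the non-derivatized and derivatized samples end up at the same nominal concentration, so the 4 µL injections load comparable amounts.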