Train Data Generation

Module to generate training data.

yalign.train_data_generation.training_alignments_from_documents(document_a, document_b)

Returns an iterable of SentencePairs to be used for training. The inputs document_a and document_b are both lists of Sentences made from a parallel corpus.
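To make the idea of "sentence pairs for training" concrete, here is a minimal, self-contained sketch. It does not call YAlign and is not its implementation. It assumes that the i-th sentence of document_a is the translation of the i-th sentence of document_b, and it further assumes that training data mixes correctly aligned pairs with deliberately misaligned ones; the documentation above does not state either point. The names Pair and sketch_training_pairs are hypothetical, and plain strings stand in for Sentence objects.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for a sentence pair; NOT YAlign's SentencePair class.
Pair = namedtuple("Pair", ["a", "b", "aligned"])


def sketch_training_pairs(document_a, document_b, seed=None):
    """Yield aligned and misaligned pairs from two parallel sentence lists.

    Assumption: document_a[i] and document_b[i] are translations of each other.
    """
    if len(document_a) != len(document_b):
        raise ValueError("parallel documents must have the same length")
    rng = random.Random(seed)
    n = len(document_a)
    for i in range(n):
        # Positive example: sentences at the same index.
        yield Pair(document_a[i], document_b[i], True)
        if n > 1:
            # Negative example (assumed): pair with a sentence at another index.
            j = rng.choice([k for k in range(n) if k != i])
            yield Pair(document_a[i], document_b[j], False)
```

For example, list(sketch_training_pairs(["a0", "a1"], ["b0", "b1"], seed=0)) yields four pairs, two flagged as aligned and two flagged as misaligned.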

yalign.train_data_generation.training_scrambling_from_documents(document_a, document_b)

Returns a tuple (scrambled_a, scrambled_b, correct_alignments) where:

  • scrambled_a is a scrambled version of document_a.

  • scrambled_b is a scrambled version of document_b.

  • correct_alignments are all the correct sentence alignments that exist between scrambled_a and scrambled_b.
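
The shape of the returned tuple can be illustrated with a small stand-in that shuffles two parallel lists and records where each aligned pair ends up. This is a sketch under stated assumptions, not YAlign's code: it assumes the documents have equal length with document_a[i] aligned to document_b[i], and it represents each alignment as an (index_in_scrambled_a, index_in_scrambled_b) tuple, which may differ from the representation YAlign actually returns. The function name scramble_parallel_documents is hypothetical.

```python
import random


def scramble_parallel_documents(document_a, document_b, seed=None):
    """Shuffle two parallel documents and track the surviving alignments.

    Hypothetical stand-in; assumes document_a[i] aligns with document_b[i].
    """
    if len(document_a) != len(document_b):
        raise ValueError("parallel documents must have the same length")
    rng = random.Random(seed)
    n = len(document_a)

    # Independent random orderings of the original indices.
    order_a = list(range(n))
    order_b = list(range(n))
    rng.shuffle(order_a)
    rng.shuffle(order_b)

    scrambled_a = [document_a[i] for i in order_a]
    scrambled_b = [document_b[j] for j in order_b]

    # Map each original index to its new position in scrambled_b.
    new_pos_b = {orig: pos for pos, orig in enumerate(order_b)}

    # For every position in scrambled_a, find where its partner landed.
    correct_alignments = [
        (pos_a, new_pos_b[orig]) for pos_a, orig in enumerate(order_a)
    ]
    return scrambled_a, scrambled_b, correct_alignments
```

With this representation, every (i, j) in correct_alignments satisfies the property that scrambled_a[i] and scrambled_b[j] were at the same index in the original documents.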