Gene Editing - 题目

Background

Humanity has developed various gene-editing technologies, the most traditional of which requires "restriction endonucleases" (restriction enzymes). These enzymes can recognize specific nucleotide sequences and cleave the phosphodiester bonds connecting adjacent nucleotides at designated sites, creating sequence breaks known as "ends." Only matching ends can be joined using DNA ligase.

Suppose we want to replace a gene segment $B$ on a vector $V$ with a gene segment $A$. In gene-editing techniques using restriction enzymes, the following operations are typically required:

Select a restriction enzyme recognition site at each end of gene $A$. These recognition sites should also exist at the corresponding ends of gene $B$.
Treat gene $A$ with the restriction enzymes corresponding to the selected recognition sites, causing its ends to produce the corresponding sticky ends. Purify the processed gene $A$.
Use the same restriction enzymes to cut the recognition sites on vector $V$, causing gene $B$ to detach from vector $V$. Purify the vector $V'$ from which gene $B$ has been removed.
Mix vector $V'$ with gene $A$ and use DNA ligase to reconnect the broken phosphodiester bonds.

It is worth noting that if the two recognition sites produce the same ends upon cleavage, then in step 4, vector $V'$ might ligate to itself, producing a vector that contains neither gene $A$ nor gene $B$; alternatively, gene $A$ might be inserted into $V'$ in reverse, also producing an incorrect vector. Therefore, in practice, restriction enzymes that produce different ends are typically chosen to process gene $A$ and vector $V$.

Problem Statement

In the year 3032, humanity discovered an alien civilization, HD1048576d, that has mastered gene-editing technology. Of course, this gene-editing technology is limited to the genes of alien organisms living on the planet HD1048576d. The smallest unit recognizable by human gene-editing technology is a single base in a DNA sequence, while the smallest unit recognizable by the alien civilization's gene-editing technology is a single "noicleobase" in their gene sequence. For convenience, we can represent a single noicleobase as a positive integer starting from $1$, so a segment of an alien life form's gene sequence can be represented as a sequence of positive integers.

For a gene sequence of an alien life form of length $n$ (let its positive integer representation be $s_1, s_2, \cdots, s_n$), the gene-editing process of the HD1048576d civilization is as follows:

Select a region $[l, r]$ to be edited, i.e., replace the subsequence $s_l, s_{l+1}, \cdots, s_r$ in the original sequence;
Choose a pair of indices $(i, j)$ that span the region to be replaced (i.e., $1 \le i < l$ and $r < j \le n$), and mass-produce the new sequence $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$ corresponding to the subsequence $s_i, \cdots, s_j$ after editing;
Use the corresponding specificity recognition tool to disconnect the subsequence $s_i, \cdots, s_j$ from the original sequence and attach $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$ into the sequence to obtain the target gene sequence.

It should be noted that in step 2, the chosen pair of indices must correspond to a unique noicleobase combination. That is, the ordered pair $(i', j')$ satisfying $s_{i'} = s_i, s_{j'} = s_j$ and $i' < j'$ must be unique (i.e., it must be $(i, j)$), otherwise the specificity recognition tool might cut other segments of the gene sequence; additionally, $s_i \neq s_j$, otherwise the specificity recognition tool might only cut a single noicleobase.

Furthermore, since producing new gene sequences incurs significant costs, the alien civilization wishes to minimize the length of the gene sequence that needs to be produced. Clearly, minimizing this length is equivalent to minimizing the length of the gene subsequence being cut out, so in practice, the optimal solution is generally calculated by minimizing the length of the cut-out gene subsequence.

Now, they want to test the intelligence of human civilization, and you have been chosen from among many high school students to solve this problem.

Simplified Problem:

Given $n, l, r$ and a sequence $a$, find $\min(y - x + 1)$ such that $x < l \le r < y$ and there does not exist any $1 \le x' \le y' \le n$ satisfying $(x, y) \neq (x', y')$ and $(a_x, a_y) = (a_{x'}, a_{y'})$.

Input

The input is read from standard input.

The first line contains three positive integers $n, l, r$, representing the length of the gene sequence to be edited, the left endpoint of the region to be edited, and the right endpoint of the region to be edited. It is guaranteed that $3 \le n \le 10^6$ and $1 < l \le r < n$.

The second line contains $n$ positive integers $s_1, \cdots, s_n$, representing the gene sequence to be edited in terms of positive integers. It is guaranteed that the identifier $s_i$ for each noicleobase is in the range $[1, 10^6]$.

Output

Output to standard output.

If there exists a gene sequence cutting scheme that satisfies the constraints of the alien civilization's gene-editing technology, output a single positive integer representing the minimum length of the gene subsequence cut out among all valid schemes. Otherwise, output -1, indicating that no such cutting scheme exists.

Examples

Input 1

10 4 6
2 1 4 7 4 8 3 6 4 8

Output 1

Note 1

The optimal scheme is to cut 1 4 7 4 8 3. It can be proven that there is no better cutting scheme that satisfies the technical constraints.

A scheme with a shorter cut length is 4 7 4 8 3, but in this scheme, the specificity recognition tool might disconnect 4 8 3, leading to unexpected mutations in the target gene sequence; therefore, this cutting scheme does not satisfy the technical constraints.

Input 2

(See attached file)

Output 2

(See attached file)

Note 2

This sample satisfies the special properties of Subtask 2.

Input 3

(See attached file)

Output 3

(See attached file)

Note 3

This sample satisfies the special properties of Subtask 4.

Constraints

For $100\%$ of the data, it is guaranteed that $3 \le n \le 10^6$, $\forall 1 \le i \le n, 1 \le s_i \le 10^6$, and $1 < l \le r < n$.

There are 5 subtasks in this problem. You must pass all test cases in a subtask to receive the corresponding score. The table below shows the data scale and properties for each subtask.

Subtask	Score	$n \le$	$s_i \le$	Special Property
$1$	$5$	$1000$	$1000$	$\times$
$2$	$10$	$1000$	$10^6$
$3$	$25$	$10^6$	$1000$
$4$	$30$		$10^6$	$\checkmark$
$5$	$30$		$10^6$	$\times$

The "Special Property" in the table above is: $s_1, \cdots, s_{l-1}$ are all distinct, and $s_{r+1}, \cdots, s_n$ are all distinct.

QOJ.ac

QOJ

#3271. Gene Editing

Background

Problem Statement

Input

Output

Examples

Input 1

Output 1

Note 1

Input 2

Output 2

Note 2

Input 3

Output 3

Note 3

Constraints

Discussions

About Discussions

Open Discussions 0

Issues

About Issues

Active Issues 0

Closed/Resolved Issues 0