French orthography encompasses the spelling and punctuation of the French language. It is based on a combination of phonemic and historical principles. The spelling of words is largely based on the pronunciation of Old French c. 1100–1200 AD, and has stayed more or less the same since then, despite enormous changes to the pronunciation of the language in the intervening years. Even in the late 17th century, with the publication of the first French dictionary by the Académie française, there were attempts to reform French orthography.
This has resulted in a complicated relationship between spelling and sound, especially for vowels; a multitude of silent letters; and many homophones, e.g. saint/sein/sain/seing/ceins/ceint (all pronounced [sɛ̃]) and sang/sans/cent (all pronounced [sɑ̃]). This is conspicuous in verbs: parles (you speak), parle (I speak / one speaks) and parlent (they speak) all sound like [paʁl]. Later attempts to respell some words in accordance with their Latin etymologies further increased the number of silent letters (e.g., temps vs. older tans – compare English "tense", which reflects the original spelling – and vingt vs. older vint).
Nevertheless, there are rules governing French orthography which allow for a reasonable degree of accuracy when pronouncing French words from their written forms. The reverse operation, producing written forms from pronunciation, is much more ambiguous. The French alphabet uses a number of diacritics, including the circumflex, diaeresis, acute, and grave accents, as well as ligatures. A system of braille has been developed for people who are visually impaired.
⟨w⟩ and ⟨k⟩ are rarely used except in loanwords and regional words. /w/ is usually written ⟨ou⟩; /k/ is usually written ⟨c⟩ anywhere but before ⟨e, i, y⟩, ⟨qu⟩ before ⟨e, i, y⟩, and sometimes ⟨que⟩ at the ends of words. However, ⟨k⟩ is common in the metric prefix kilo- (originally from Greek χίλια khilia "a thousand"), e.g. kilogramme, kilomètre, kilowatt, kilohertz.
Diacritics
The diacritics used in French orthography are the acute accent (⟨◌́⟩; accent aigu), the grave accent (⟨◌̀⟩; accent grave), the circumflex (⟨◌̂⟩; accent circonflexe), the diaeresis (⟨◌̈⟩; tréma), and the cedilla (⟨◌̧⟩; cédille). Diacritics have no effect on the primary alphabetical order.
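As a rough illustration of the statement that diacritics do not affect the primary alphabetical order, the following Python sketch builds a sort key by stripping combining marks. The function name `primary_key` is hypothetical, and the sketch covers only the primary level: it does not break ties between words that differ only in their accents, and it does not expand ligatures.

```python
import unicodedata


def primary_key(word: str) -> str:
    """Return a sort key that ignores diacritics and case (primary level only)."""
    # NFD splits an accented letter into base letter + combining mark,
    # e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks (Unicode category Mn), keeping base letters.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.casefold()


words = ["zèbre", "été", "eux", "à", "abri"]
print(sorted(words, key=primary_key))
# ['à', 'abri', 'été', 'eux', 'zèbre']
```

A plain `sorted(words)` would instead order by code point and place "à" and "été" after "zèbre".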
An acute accent over ⟨e⟩ represents /e/. An ⟨é⟩ in modern French is often used where a combination of ⟨e⟩ and a consonant, usually ⟨s⟩, would have been used formerly, e.g. écouter < escouter.
A grave accent over ⟨a⟩ or ⟨u⟩ is primarily used to distinguish homophones: à ("to") vs. a ("has"); ou ("or") vs. où ("where"; note that ⟨ù⟩ is only used in this word). A grave accent over ⟨e⟩ indicates /ɛ/ in positions where a plain ⟨e⟩ would be pronounced /ə/ (schwa). Many verb conjugations contain regular alternations between ⟨è⟩ and ⟨e⟩; for example, the accent mark in the present tense verb lève /lɛv/ distinguishes the vowel's pronunciation from the schwa in the infinitive, lever /ləve/.
A circumflex over ⟨a, e, o⟩ indicates /ɑ, ɛ, o/, respectively, but the distinction between ⟨a⟩ /a/ and ⟨â⟩ /ɑ/ is being lost in Parisian French, where the two merge as [a]. In Belgian French, ⟨ê⟩ is pronounced [ɛː]. Most often, the circumflex indicates the historical deletion of an adjacent letter (usually ⟨s⟩ or a vowel): château < castel, fête < feste, sûr < seur, dîner < disner (in medieval manuscripts many letters were often written as diacritical marks, e.g. the circumflex for ⟨s⟩ and the tilde for ⟨n⟩). It has also come to be used to distinguish homophones, e.g. du ("of the") vs. dû (past participle of devoir "to have to do something (pertaining to an act)"); however, dû is in fact written thus because of a dropped ⟨e⟩: deu (see Circumflex in French). Since the 1990 orthographic changes, the circumflex on ⟨i⟩ and ⟨u⟩ can be dropped unless it distinguishes homophones, e.g. chaîne becomes chaine, but sûr ("sure") does not change, to avoid ambiguity with the word sur ("on").
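The 1990 rule for the circumflex on ⟨i⟩ and ⟨u⟩ can be sketched in Python as follows. The function name and the exception set are assumptions made for illustration: the set contains only the two homophone-distinguishing words mentioned in this article (sûr, dû) and is not a complete list of the words that keep their circumflex.

```python
import unicodedata

# Only circumflexed i and u are affected; â, ê and ô are left alone.
DROP_CIRCUMFLEX = str.maketrans({"î": "i", "û": "u", "Î": "I", "Û": "U"})

# Illustrative, incomplete set of words that keep the circumflex
# because it distinguishes them from a homophone (sur, du).
HOMOPHONE_EXCEPTIONS = frozenset({"sûr", "dû"})


def apply_1990_circumflex(word: str) -> str:
    """Return the 1990 recommended spelling of a single word (sketch)."""
    # Compose the input so that "u" + combining circumflex becomes "û".
    word = unicodedata.normalize("NFC", word)
    if word.casefold() in HOMOPHONE_EXCEPTIONS:
        return word
    return word.translate(DROP_CIRCUMFLEX)


for w in ["chaîne", "coût", "abîme", "sûr", "château", "fête"]:
    print(w, "->", apply_1990_circumflex(w))
# chaîne -> chaine
# coût -> cout
# abîme -> abime
# sûr -> sûr
# château -> château
# fête -> fête
```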
A diaeresis over ⟨e, i, u, y⟩ indicates a hiatus between the accented vowel and the vowel preceding it, e.g. naïve /naiv/, Noël /nɔɛl/. The diaeresis may also indicate a glide/diphthong, as in naïade /najad/.
The combination ⟨oë⟩ is pronounced in the regular way if followed by ⟨n⟩ (Samoëns /samwɛ̃/); an exception to this is Citroën /sitʁoɛn/.
The combination ⟨aë⟩ is either pronounced /aɛ/ (Raphaël, Israël) or /a/ (Staël); it represents /ɑ̃/ if it precedes ⟨n⟩ (Saint-Saëns [sɛ̃sɑ̃(s)]).
A diaeresis on ⟨y⟩ only occurs in some proper names and in modern editions of old French texts, e.g. Aÿ /ai/ (commune in Marne, now Aÿ-Champagne), Rue des Cloÿs (alley in the 18th arrondissement of Paris), Croÿ /kʁwi/ (family name and hotel on the Boulevard Raspail, Paris), Château du Feÿ /dyfei/ (near Joigny), Ghÿs /ɡis/ (name of Flemish origin spelt ⟨Ghijs⟩ where cursive ⟨ij⟩ looked like ⟨ÿ⟩ to French clerks), L'Haÿ-les-Roses /lajlɛʁoz/ (commune between Paris and Orly airport), Pierre Louÿs /luis/ (author), Eugène Ysaÿe /izai/ (violinist/composer), Moÿ-de-l'Aisne /mɔidəlɛn/ (commune in Aisne and a family name), and Le Blanc de Nicolaÿ /nikɔlai/ (an insurance company in eastern France).
The diaeresis on ⟨u⟩ appears in the Biblical proper names Archélaüs /aʁʃelay/, Capharnaüm /kafaʁnaɔm/ (with ⟨üm⟩ for /ɔm/ as in words of Latin origin such as album, maximum, or chemical element names such as sodium, aluminium), Emmaüs /ɛmays/, Ésaü /ezay/, and Saül /sayl/, as well as French names such as Haüy /aɥi/. Since the 1990 orthographic changes, the diaeresis in words containing ⟨guë⟩ (such as aiguë /eɡy/ or ciguë /siɡy/) can be moved onto the ⟨u⟩: aigüe, cigüe, and by analogy may be used in verbs such as j'argüe. Without a diaeresis, the ⟨ue⟩ would be silent (or a schwa in accents which retain one): Aigues-Mortes /ɛɡ(ə)mɔʁt(ə)/.
In addition, words of German origin retain their umlaut (⟨ä, ö, ü⟩) if applicable but often use French pronunciation, such as Kärcher (/kɛʁʃɛʁ/ or /kaʁʃɛʁ/, trademark of a pressure washer).
A cedilla under ⟨c⟩ indicates that it is pronounced /s/ rather than /k/. Thus je lance "I throw" (with ⟨c⟩ for /s/ before ⟨e⟩), je lançais "I was throwing" (⟨c⟩ would represent /k/ before ⟨a⟩ without the cedilla). The cedilla is only used before ⟨a, o, u⟩, e.g. ça /sa/. A cedilla is not used before ⟨e, i, y⟩, since they already mark the ⟨c⟩ as /s/, e.g. ce, ci, cycle.
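The interaction between ⟨c⟩, ⟨ç⟩ and the following vowel can be sketched as a small Python function. This is a simplified illustration, not a pronunciation engine: the names `base_letter` and `c_sound` are hypothetical, accented vowels are assumed to behave like their base letter, and digraphs such as ⟨ch⟩ and sequences such as ⟨cœ⟩ are not handled.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the lowercase base letter of ch, without its diacritic."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def c_sound(word: str, index: int) -> str:
    """Return "s" or "k" for the letter c or ç at word[index] (sketch)."""
    letter = word[index]
    if letter in "çÇ":
        # The cedilla always marks /s/.
        return "s"
    if letter not in "cC":
        raise ValueError("word[index] must be c or ç")
    following = word[index + 1] if index + 1 < len(word) else ""
    # A plain c is soft only before e, i, y (assumed to include é, è, etc.).
    if following and base_letter(following) in "eiy":
        return "s"
    return "k"


print(c_sound("lance", 3))    # s
print(c_sound("lançais", 3))  # s
print(c_sound("cycle", 0))    # s
print(c_sound("cycle", 2))    # k
```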
A tilde (⟨◌̃⟩) above ⟨n⟩ is occasionally used in French for words and names of Spanish origin that have been incorporated into the language (e.g., El Niño, piñata). Like the other diacritics, the tilde has no impact on the primary alphabetical order.
Diacritics are often omitted on capital letters, mainly for technical reasons (they are not directly available on AZERTY keyboards). However, both the Académie française and the Office québécois de la langue française reject this usage and confirm that "in French, the accent has full orthographic value",[1] with an exception for acronyms but not for abbreviations (e.g., CEE, ALENA, but É.-U.).[2] Nevertheless, diacritics are often ignored in word games, including crosswords, Scrabble, and Des chiffres et des lettres.
Ligatures
The ligatures ⟨æ⟩ and ⟨œ⟩ are part of French orthography. For collation, these ligatures are treated like the sequences ⟨ae⟩ and ⟨oe⟩ respectively.
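A minimal Python sketch of this collation behaviour expands each ligature to its two-letter sequence before comparing. The names `LIGATURES` and `expand_ligatures` are illustrative assumptions, and the sketch ignores diacritics and the finer levels of a full collation algorithm.

```python
# Map each ligature to the letter sequence it is collated as.
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def expand_ligatures(word: str) -> str:
    """Replace æ and œ by ae and oe for sorting purposes."""
    return "".join(LIGATURES.get(ch, ch) for ch in word)


words = ["œuf", "ode", "obus", "cœur", "coexister", "colle"]
print(sorted(words, key=lambda w: expand_ligatures(w).casefold()))
# ['cœur', 'coexister', 'colle', 'obus', 'ode', 'œuf']
```

Sorting by raw code point would instead place cœur after colle, because ⟨œ⟩ (U+0153) sorts after every unaccented Latin letter.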
Æ
⟨æ⟩ (French: e dans l'a, a-e entrelacé or a, e collés/liés) is rare, appearing only in some words of Latin and Greek origin like tænia, ex æquo, cæcum, æthuse (also known as dog's parsley).[3] It generally represents the vowel /e/, like ⟨é⟩.
The sequence ⟨ae⟩ appears in loanwords where both sounds are heard, as in maestro and paella.[4]
Œ
⟨œ⟩ (French: e dans l'o, o-e entrelacé or o et e collés/liés) is a mandatory contraction of ⟨oe⟩ in certain words. Some of these are native French words, with the pronunciation /œ/ or /ø/, e.g. chœur "choir" /kœʁ/, cœur "heart" /kœʁ/, mœurs "mores, customs" /mœʁ, mœʁs/, nœud "knot" /nø/, sœur "sister" /sœʁ/, œuf "egg" /œf/, œuvre "work (of art)" /œvʁ/, vœu "vow" /vø/. It usually appears in the combination ⟨œu⟩; œil /œj/ "eye" is an exception. Many of these words were originally written with the digraph ⟨eu⟩; the ⟨o⟩ in the ligature represents a sometimes artificial attempt to imitate the Latin spelling: Latin bovem > Old French buef/beuf > Modern French bœuf.
⟨œ⟩ is also used in words of Greek origin, as the Latin rendering of the Greek diphthong ⟨οι⟩, e.g. cœlacanthe "coelacanth". These words used to be pronounced with /e/, but in recent years a spelling pronunciation with /ø/ has taken hold, e.g. œsophage /ezɔfaʒ/ or /øzɔfaʒ/, Œdipe /edip/ or /ødip/, etc. The pronunciation with /e/ is often seen to be more correct.
When ⟨œ⟩ is found after ⟨c⟩, the ⟨c⟩ can be pronounced /k/ in some cases (cœur), or /s/ in others (cœlacanthe).
⟨œ⟩ is not used when both letters contribute different sounds. For example, when ⟨o⟩ is part of a prefix (coexister), or when ⟨e⟩ is part of a suffix (minoen), or in the word moelle and its derivatives.[5]
Digraphs and trigraphs
French digraphs and trigraphs have both historical and phonological origins. In the first case, it is a vestige of the spelling in the word's original language (usually Latin or Greek) maintained in modern French, e.g. the use of ⟨ph⟩ in téléphone, ⟨th⟩ in théorème, or ⟨ch⟩ in chaotique. In the second case, a digraph is due to an archaic pronunciation, such as ⟨eu⟩, ⟨au⟩, ⟨oi⟩, ⟨ai⟩, and ⟨œu⟩, or is merely a convenient way to expand the twenty-six-letter alphabet to cover all relevant phonemes, as in ⟨ch⟩, ⟨on⟩, ⟨an⟩, ⟨ou⟩, ⟨un⟩, and ⟨in⟩. Some cases are a mixture of these or are used for purely pragmatic reasons, such as ⟨ge⟩ for /ʒ/ in il mangeait ('he ate'), where the ⟨e⟩ serves to indicate a "soft" ⟨g⟩ inherent in the verb's root, similar to the significance of a cedilla to ⟨c⟩.
Spelling to sound correspondences
Some exceptions apply to the rules governing the pronunciation of word-final consonants. See Liaison (French) for details.
Consonants
Consonants and combinations of consonant letters
Spelling
Major value (IPA)
Examples of major value
Minor values (IPA)
Examples of minor values
Exceptions
Foreign words
-bs, -cs (in the plural of words ending in silent ⟨b⟩ or ⟨c⟩), -ds, -fs (in œufs, bœufs, and plurals of words ending in a silent ⟨f⟩), ‑gs, -ps, -ts
Ø Island, mesdames, mesdemoiselles, Descartes (also /j/), messieurs (not considered double s), messeigneurs (not considered double s), Debusclin (see also sch)
essence, effet, henné recherche, secrète, repli (before ⟨ch⟩+vowel or a consonant (except ⟨l, r⟩) followed by ⟨l, r⟩)
/e/ et, pieds (and any other noun plural ending in (consonant other than t)+s) /a/ femme, solennel, fréquemment (and other adverbs ending in -emment)[8] /œ/ Gennevilliers (see also -er) (see also ae)
les, nez, clef, mangez, (and any form of a verb in the second person plural that ends in -ez), assez (see also -er, -es), mesdames, mesdemoiselles (also /ɛ/), Descartes (also /ɛ/), eh, prehnite
que, de, je (in monosyllables), quatre, parle, chambre, répondre, hymne, indemne, syntagme (after two or more consonants of which the last is r, l, m or n), presque, puisque, quelque (the compound adjective pronouns ending in -que) (see also ae)
the suffix -tié, all conjugated forms of verbs with a radical ending in -t (augmentions, partiez, etc.) or derived from tenir, and all nouns and past participles derived from such verbs and ending in -ie (sortie, divertie, etc.)
^1 These combinations are pronounced /j/ after ⟨a, e, eu, œ, ou, ue⟩, all but the last of which are pronounced normally and are not influenced by the ⟨i⟩. For example, in rail, ⟨a⟩ is pronounced /a/; in mouiller, ⟨ou⟩ is pronounced /u/. ⟨ue⟩, however, which only occurs in such combinations after ⟨c⟩ and ⟨g⟩, is pronounced /œ/ as opposed to /ɥɛ/, e.g. orgueil /ɔʁɡœj/, cueillir /kœjiʁ/, accueil /akœj/, etc. These combinations are never pronounced /j/ after ⟨o, u⟩, except -⟨uill⟩- (/ɥij/), e.g. aiguille /eɡɥij/, juillet /ʒɥijɛ/, where the vowel + ⟨i⟩ + ⟨ll⟩ sequence is pronounced normally, although as usual, the pronunciation of ⟨u⟩ after ⟨g⟩ and ⟨q⟩ is somewhat unpredictable: poil, huile, équilibre [ekilibʁə] but équilatéral [ekɥilateʁal], etc.
Words from Greek
The spelling of French words of Greek origin is complicated by a number of digraphs which originated in the Latin transcriptions. The digraphs ⟨ph, th, ch⟩ normally represent /f,t,k/, respectively, in Greek loanwords; and the ligatures ⟨æ⟩ and ⟨œ⟩ in Greek loanwords represent the same vowel as ⟨é⟩ (/e/). Further, many words in the international scientific vocabulary were constructed in French from Greek roots and have kept their digraphs (e.g. stratosphère, photographie).
History
The Oaths of Strasbourg from 842 is the earliest text written in the early form of French called Romance or Gallo-Romance.
Roman
The Celtic Gaulish language of the inhabitants of Gaul disappeared progressively over the course of Roman rule as the Latin language began to replace it. Vulgar Latin, a generally lower register of Classical Latin spoken by the Roman soldiers, merchants and even by patricians in quotidian speech, was adopted by the natives and evolved slowly, taking the forms of different spoken Roman vernaculars according to the region of the empire.
In the 9th century, the Romance vernaculars were already quite far from Latin. For example, footnotes were necessary to understand the Bible, written in Latin. The languages found in the manuscripts dating from the 9th century to the 13th century form what is known as Old French (ancien français). With the consolidation of royal power, beginning in the 13th century, the Francien vernacular, the langue d'oïl variety then in use in the Île-de-France (the region around Paris), gradually prevailed over the other languages and evolved toward Classical French. These languages continued to evolve until Middle French (moyen français) emerged between the 14th and 16th centuries.[13]
Middle French
During the Middle French period (c. 1300–1600), modern spelling practices were largely established. This happened especially during the 16th century, under the influence of printers. The overall trend was towards continuity with Old French spelling, although some changes were made under the influence of changed pronunciation habits; for example, the Old French distinction between the diphthongs ⟨eu⟩ and ⟨ue⟩ was eliminated in favor of consistent ⟨eu⟩,[a] as both diphthongs had come to be pronounced /ø/ or /œ/ (depending on the surrounding sounds). However, many other distinctions that had become equally superfluous were maintained, e.g. between ⟨s⟩ and soft ⟨c⟩ or between ⟨ai⟩ and ⟨ei⟩. It is likely that etymology was the guiding factor here: the distinctions ⟨s/c⟩ and ⟨ai/ei⟩ reflect corresponding distinctions in the spelling of the underlying Latin words, whereas no such distinction exists in the case of ⟨eu/ue⟩.
This period also saw the development of some explicitly etymological spellings, e.g. temps ("time"), vingt ("twenty") and poids ("weight") (note that in many cases, the etymologizing was sloppy or occasionally completely incorrect; vingt reflects Latin viginti, with the ⟨g⟩ in the wrong place, and poids actually comes from Latin pensum, with no ⟨d⟩ at all; the spelling poids is due to an incorrect derivation from Latin pondus). The trend towards etymologizing sometimes produced absurd (and generally rejected) spellings such as sçapvoir for normal savoir ("to know"), which attempted to combine Latin sapere ("to be wise", the correct origin of savoir) with scire ("to know").
Modern French spelling was codified in the late 17th century by the Académie française, based largely on previously established spelling conventions. Some reforms have occurred since then, but most have been fairly minor. The most significant changes have been:
Adoption of ⟨j⟩ and ⟨v⟩ to represent consonants, in place of former ⟨i⟩ and ⟨u⟩.
Addition of a circumflex accent to reflect historical vowel length. During the Middle French period, a distinction developed between long and short vowels, with long vowels largely stemming from a lost /s/ before a consonant, as in même (cf. Spanish mismo), but sometimes from the coalescence of similar vowels, as in âge from earlier aage, eage (early Old French *edage < Vulgar Latin *aetaticum, cf. Spanish edad < aetate(m)). Prior to this, such words continued to be spelled historically (e.g. mesme and age). Ironically, by the time this convention was adopted in the 19th century, the former distinction between short and long vowels had largely disappeared in all but the most conservative pronunciations, with vowels automatically pronounced long or short depending on the phonological context (see French phonology).
Use of ⟨ai⟩ in place of ⟨oi⟩ where pronounced /ɛ/ rather than /wa/. The most significant effect of this was to change the spelling of all imperfect verbs (formerly spelled -⟨ois⟩, -⟨oit⟩, -⟨oient⟩ rather than -⟨ais⟩, -⟨ait⟩, -⟨aient⟩), as well as the name of the language, from françois to français.
In October 1989, Michel Rocard, then-Prime Minister of France, established the High Council of the French Language (Conseil supérieur de la langue française) in Paris. He designated experts – among them linguists, representatives of the Académie française and lexicographers – to propose standardizing several points, a few of those points being:
The uniting hyphen in all compound numerals
e.g. trente-et-un
The plural of compound words, the second element of which always takes the plural s
e.g. un après-midi, des après-midis
The circumflex ⟨ˆ⟩ disappears on ⟨u⟩ and ⟨i⟩ except for when it is needed to differentiate homophones
e.g. coût (cost) → cout, abîme (abyss) → abime but sûr (sure) because of sur (on)
The past participle of laisser followed by an infinitive verb is invariable (now works the same way as the verb faire)
elle s'est laissée mourir → elle s'est laissé mourir
The experts set to work quickly. Their conclusions were submitted to Belgian and Québécois linguistic policy organizations. They were likewise submitted to the Académie française, which endorsed them unanimously, saying:
"Current orthography remains that of usage, and the 'recommendations' of the High Council of the French language only enter into play with words that may be written in a different manner without being considered as incorrect or as faults."[citation needed]
The changes were published in the Journal officiel de la République française in December 1990. At the time the proposed changes were considered to be suggestions. In 2016, schoolbooks in France began to use the newer recommended spellings, with instruction to teachers that both old and new spellings be deemed correct.[14]
In France and Belgium, the exclamation mark, question mark, semicolon, colon, percentage mark, currency symbols, hash, and guillemets all require a thin space between the punctuation mark and the material it adjoins. Outside of France and Belgium, this rule is often ignored. Computer software may aid or hinder the application of this rule, depending on the degree of localisation, as this spacing differs from that of most other Western punctuation.
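As a rough sketch of how software might apply this spacing rule, the following Python function inserts a narrow no-break space (U+202F) before the marks ! ? ; : % and inside guillemets. The choice of U+202F for the thin space and the function name `french_spacing` are assumptions; currency symbols and the hash are not handled, and the sketch would wrongly alter colons in URLs or in times such as 12:30.

```python
import re

NNBSP = "\u202f"  # narrow no-break space, assumed here to be the thin space

# Ordinary, no-break, and narrow no-break spaces that may already be present.
_SPACES = r"[ \u00a0\u202f]*"


def french_spacing(text: str) -> str:
    """Normalise the spacing around French punctuation (sketch)."""
    # One thin space before a run of high punctuation such as "?" or "?!".
    text = re.sub(r"(?<=\S)" + _SPACES + r"([!?;:%]+)", NNBSP + r"\1", text)
    # One thin space after an opening guillemet and before a closing one.
    text = re.sub("«" + _SPACES, "«" + NNBSP, text)
    text = re.sub(_SPACES + "»", NNBSP + "»", text)
    return text


print(french_spacing("Vraiment ? Oui !") == "Vraiment\u202f? Oui\u202f!")  # True
print(french_spacing("«Bonjour»") == "«\u202fBonjour\u202f»")              # True
```

The function is idempotent: applying it to text that already follows the rule leaves the text unchanged.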
The hyphenation rule is not uniformly observed in official names: both la Côte-d'Ivoire and la Côte d'Ivoire are found, for example, and la Côte d'Azur usually has no hyphens.
The names of Montreal Metro stations are consistently hyphenated when suitable, but those of Paris Métro stations mostly ignore this rule. (For more examples, see Trait d'union.)